Confidence Score of LLM Using Python

Researchers automated LLM reasoning strategy design and cut token usage by 69.5%

Researchers from Meta and Google built AutoTTS to automatically discover optimal LLM reasoning strategies, cutting token ...

LLMs believe false statements even after explicit warnings that they’re false

New research on so-called “negation neglect” finds that LLMs in a roughly analogous situation don’t behave that way. They appear to learn from the statistical patterns in their training text more than ...

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

DeepSWE puts GPT-5.5 atop the AI coding leaderboard while raising new questions about Claude Opus, SWE-Bench Pro, and ...

Decrypt

This Half-Gigabyte AI Model Runs Local Agents on Your Phone

OpenBMB's 1B-parameter model MiniCPM 5 brings MCP support and agentic tool use to on-device AI, but it has trouble with logic ...