Feed · AI Feed

X · @elonmusk X / Twitter 6 days ago

RT WELT: Without approval for release in Germany: Elon Musk puts an entire Uwe Boll film online on X http://to.welt.de/QRrI3OS

X · @ylecun X / Twitter 6 days ago

RT Daniel Jeffries: It doesn't take much skill to game this out and see where we're going if we keep this up. Kneecap our own companies. Gate everythi…

X · @teortaxesTex X / Twitter 6 days ago

RT Coocoo: Re @teortaxesTex I wonder what the average European have to say about chINeSE oVERcaPACiTy right now https://x.com/LEDx2000/status/20698583…

RT Coocoo: Re @teortaxesTex I wonder what the average European have to say about chINeSE oVERcaPACiTy right now https://x.com/LEDx2000/status/2069858340440584277 Fml: @CeliaBedelia Yesterday i found out, someone even made a…

X · @teortaxesTex X / Twitter 6 days ago

> you cannot roll out air conditioning across Europe at the scale and speed needed to solve immediate problems such as the current heatwave …I guess …

MarkTechPost Tech Media 6 days ago

Build a Nanobot-Style AI Agent in Google Colab with Tool Calling, Session Memory, Skills, and MCP Servers

In this tutorial, we build a lightweight personal AI agent inspired by the architecture of nanobot, runnable entirely in Google Colab. We start from a provider…

HF Daily Papers Papers 6 days ago

CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate a…

Hacker News (front page) Communities 6 days ago

Why current LLM costs are not sustainable

Article URL: https://aditya.patadia.org/p/ai-and-cloud-costs Comments URL: https://news.ycombinator.com/item?id=48683588 Points: 28 # Comments: 7

X · @teortaxesTex X / Twitter 6 days ago

The yet unconquered frontier for open weights LLMs:

The yet unconquered frontier for open weights LLMs: GDP: Kind of questions where GLM 5.2 trips over (so does Opus 4.6, only Opus 4.8 does a pretty…

X · @ylecun X / Twitter 6 days ago

RT Daniel Jeffries: There will be more and more pressure and more money and lobbying behind open soon. It is not a "two big companies have all the pow…

llama.cpp releases Infrastructure 6 days ago

b9804

mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check (#23082) mamba2: remove hardcoded 2x expansion factor, support any expand value mamba2: remove invalid…

LessWrong AI Communities 6 days ago

Don't ignore the car crashes, and remember your freshman CS

Car crashes kill over 35,000 people in the US every year. Plane crashes, on the other hand, kill ~350. Despite this, we have shows like Mayday/Air…

X · @swyx X / Twitter 6 days ago

RT lily zhang: http://x.com/i/article/2069838668370673665

METR Tech Media 6 days ago

Summary of METR's predeployment evaluation of GPT-5.6 Sol

Note on independence: This evaluation was conducted under a standard NDA. Due to the sensitive information shared with METR as part of this evaluation, OpenAI’s comms…

r/LocalLLaMA Communities 6 days ago

KLD is flawed in abliteration.

I've noticed while creating my abliteration engine that KL is a flawed metric because it can be represented so many different ways, it depends completely on…

X · @SakanaAILabs X / Twitter 6 days ago

RT Takashi Ishida // ICML 2026: Excited to share CoffeeBench!!☕️☕️☕️ We evaluate LLM agents in a 90-day B2B coffee supply-chain economy spanning…

RT Takashi Ishida // ICML 2026: Excited to share CoffeeBench!! ☕️☕️☕️ We evaluate LLM agents in a 90-day B2B coffee supply-chain economy spanning farmers, roasters, and retailers, where these…

X · @hardmaru X / Twitter 6 days ago

RT Takashi Ishida // ICML 2026: Excited to share CoffeeBench!!☕️☕️☕️ We evaluate LLM agents in a 90-day B2B coffee supply-chain economy spanning…

RT Takashi Ishida // ICML 2026: Excited to share CoffeeBench!! ☕️☕️☕️ We evaluate LLM agents in a 90-day B2B coffee supply-chain economy spanning farmers, roasters, and retailers, where these…

r/LocalLLaMA Communities 6 days ago

Anyone tried Ornith-1.0 9B?

Should I even give it a chance over "qwopus3.5 9b v3.5" or "qwopus3.5 9b coder"? Has anyone tried it? Submitted by /u/BothYou243

r/MachineLearning Communities 6 days ago

How’re you deploying LLMs in production now-a-days? What’s the best and most affordable way? [D]

I've been developing an AI product using LLM APIs (from OpenRouter) but want to deploy an open-source LLM in my own Prod env. which I can…

Hacker News (front page) Communities 6 days ago

Micron locks in historically high memory prices for five years

Article URL: https://www.theregister.com/systems/2026/06/25/micron-locks-in-historically-high-memory-prices-for-five-years/5261854 Comments URL: https://news.ycombinator.com/item?id=48683041 Points: 16 # Comments: 7

Hacker News (front page) Communities 6 days ago

US Govt to individually approve who gets GPT 5.6

Article URL: https://old.reddit.com/r/LocalLLaMA/comments/1ufo0un/us_govt_to_individually_approve_who_gets_gpt_56/ Comments URL: https://news.ycombinator.com/item?id=48683021 Points: 10 # Comments: 4

r/LocalLLaMA Communities 6 days ago

Ornith 1.0 – terminology and concepts explained (basic)

I made a quick guide for myself while wanting to try the new models, so I share it with you. It's pretty basic, but it may…

X · @hardmaru X / Twitter 6 days ago

RT Sakana AI: Re In CoffeeBench, six agents interact through email and transactions, and each company aims to maximize its profit. When a society arrives in which LLM agents run businesses…

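RT Sakana AI: Re In CoffeeBench, six agents interact through email and transactions, and each company aims to maximize its profit. When a society arrives in which LLM agents run businesses, how will cooperation, competition, and sometimes fraud emerge? CoffeeBench is also a testbed for observing exactly that.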

X · @hardmaru X / Twitter 6 days ago

RT Sakana AI: Sakana AI, together with KPMG AZSA LLC, has released CoffeeBench, a new benchmark that evaluates the long-term business management capabilities of LLM agents.…

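RT Sakana AI: Sakana AI, together with KPMG AZSA LLC, has released CoffeeBench, a new benchmark that evaluates the long-term business management capabilities of LLM agents. Blog: https://sakana.ai/coffee-bench/ In the real economy, businesses in which companies trade with one another on an ongoing basis matter as much as businesses that sell directly to consumers. CoffeeBench simulates a coffee-industry supply chain with six companies in total (farmers, roasters, and retailers), each run by an LLM agent. Over 90 days, the agents negotiate prices, place orders, manage inventory, and so on, aiming to maximize net profit. When the latest LLMs compete in the same environment, business results diverge widely…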

HF Daily Papers Papers 6 days ago

Discretizing Reward Models

Despite their widespread use, the role of reward models in shaping reinforcement learning is poorly understood. Reward models offer a tempting promise: they automatically estimate response…

Feed 6,424 posts
