What's Missing From LLM Chatbots: A Sense of Purpose
LLM-based chatbots’ capabilities have been advancing every month. These improvements are mostly measured by benchmarks like MMLU, HumanEval, and MATH (e.g., Claude 3.5 Sonnet, GPT-4o). However, as…
Huge thanks to @josephpollack for bringing this amazing demo! Now Yi-Coder's power is at your fingertips! Joseph Pollack: 🙋🏻‍♂️ hey there folks, just released a coding…
FastAPI, FastHTML, Next.js, SvelteKit, and thoughts on how coding assistants influence builders' choices.
We've heard awesome feedback on our Sep 4 Yi-Coder release and are so glad the community finds it helpful! Here's more scoop 🍦 on our tech blog: "Meet…
RT VentureBeat: Yi-Coder: The open-source AI that wants to be your coding buddy https://venturebeat.com/ai/yi-coder-the-open-source-ai-that-wants-to-be-your-coding-buddy/
By Technical Program Manager Christopher Fiorelli. At Ai2, we’re continually looking for ways to expand the impact of AI research. Through collaboration with DSRI, Ai2 was invited…
We’re introducing OLMoE, jointly developed with Contextual AI, which is the first mixture-of-experts model to join the OLMo family. OLMoE brings two important aspects to the…
Turning models into products runs into five challenges
Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.
Blog written by Yuling Gu. Looking for an interpretable explanation evaluation tool that can automatically characterize the explanation capabilities of modern LLMs? Meet Digital Socrates at ACL…
The last few years of AI development have shown the power and potential of generative AI. Naturally, these leaps in machine intelligence have opened existential questions…
Along with our rebrand, we’re excited to debut a new release note process. Because we’re making regular updates and new asset roll-outs in our open ecosystem…
Interim report on ongoing work on mechanistic anomaly detection
🔥 Meet Yi-Large Turbo: the powerful, cost-effective upgrade to Yi-Large. Faster and more affordable at only $0.19 per 1M tokens for input and output. Ideal for…
Introduction: Imagine yourself a decade ago, jumping directly into the present shock of conversing naturally with an encyclopedic AI that crafts images, writes code, and debates philosophy.…
Building and evaluating an open-source pipeline for auto-interpretability
An Open Course on LLMs, Led by Practitioners
How speculation gets laundered through pseudo-quantification
After studying how companies deploy generative AI applications, I noticed many similarities in their platforms. This post outlines the common components of a generative AI platform,…
Hallucination in large language models usually refers to the model generating unfaithful, fabricated, inconsistent, or nonsensical content. As a term, hallucination has been somewhat generalized to…
Rethinking AI agent benchmarking and evaluation
Writing up results from a recent project
Achieving even more surgical edits than LEACE without concept labels at inference time.
What We’ve Learned From A Year of Building with LLMs