Weights & Biases LLM-Evaluator Hackathon – Hackathon Judge
Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon
For an AI model to be useful in specific contexts, it often needs access to background knowledge.
Exploring the implementation details of muTransfer
In the past three months since Qwen2’s release, numerous developers have built new models on the Qwen2 language models,…
A new benchmark to measure the impact of AI on improving science
Great to see how easy it is to build a search page with Yi-Coder and Cursor! Check out this useful tutorial! Second State: Write a Search Webpage…
Preliminary benchmark results for the new OpenAI o1 models.
The book was published in September 2024.
LLM-based chatbots’ capabilities have been advancing every month. These improvements are mostly measured by benchmarks like MMLU, HumanEval, and MATH (e.g. sonnet 3.5, gpt-4o). However, as…
Huge thanks to @josephpollack for bringing this amazing demo! Now Yi-Coder's power is at your fingertips! Joseph Pollack 🎗️: 🙋🏻‍♂️ hey there folks, just released a coding…
FastAPI, FastHTML, Next.js, SvelteKit, and thoughts on how coding assistants influence builders' choices.
We've heard awesome feedback on our Sep 4 Yi-Coder release and are so glad the community finds it helpful! Here's more scoop 🍦 on our tech blog -- "Meet…
RT VentureBeat: Yi-Coder: The open-source AI that wants to be your coding buddy https://venturebeat.com/ai/yi-coder-the-open-source-ai-that-wants-to-be-your-coding-buddy/
By Technical Program Manager Christopher Fiorelli. At Ai2, we’re continually looking for ways to expand the impact of AI research. Through collaboration with DSRI, Ai2 was invited…
We’re introducing OLMoE, jointly developed with Contextual AI, which is the first mixture-of-experts model to join the OLMo family. OLMoE brings two important aspects to the…
Turning models into products runs into five challenges
Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.
Blog written by Yuling Gu. Looking for an interpretable explanation evaluation tool that can automatically characterize the explanation capabilities of modern LLMs? Meet Digital Socrates at ACL…
The last few years of AI development have shown the power and potential of generative AI. Naturally, these leaps in machine intelligence have opened existential questions…
Along with our rebrand, we’re excited to debut a new release note process. Because we’re making regular updates and new asset roll-outs in our open ecosystem…
Interim report on ongoing work on mechanistic anomaly detection
🔥 Meet Yi-Large Turbo: the powerful, cost-effective upgrade to Yi-Large. Faster and more affordable at only $0.19 per 1M tokens for input and output. Ideal for…
Imagine yourself a decade ago, jumping directly into the present shock of conversing naturally with an encyclopedic AI that crafts images, writes code, and debates philosophy.…
Building and evaluating an open-source pipeline for auto-interpretability