Using LLM-as-a-Judge For Evaluation: A Complete Guide
Many of the recent advances in large language models (LLMs) have been powered by human feedback, usually in the form of preference datasets. Think of preferences…
Look at and label your data, build and evaluate your LLM-evaluator, and optimize it against your labels.
“Theory of Mind” (ToM) is the ability to understand that others have their own thoughts and beliefs, even when they differ from ours — a skill…
Empowering conservation efforts through innovative technologies and global collaboration. A vessel captured by NASA’s Landsat 8. Skylight’s computer vision models leverage this imagery to identify suspicious behavior,…
We are proud to present the latest model ⚡️Yi-Lightning ⚡️ now #6 in the world, higher than the original GPT-4o released 5 months ago. Also humbled…
We're thrilled to unveil Yi-Lightning and Yi-Lightning-Lite, our latest proprietary models! Both are now accessible via API at https://platform.lingyiwanwu.com and featured in @lmarena_ai's Chatbot Arena (https://lmarena.ai/).…
Interim report on ongoing work on mechanistic anomaly detection
GPT-NeoX now supports post-training thanks to a collaboration with SynthLabs.
What's in the book and how we wrote it
A central goal of the OLMo project is to use our experience to contribute to an open science of LM pretraining to provide a foundation for…
An Architect model describes how to solve the coding problem, and an Editor model translates that into file edits. This Architect/Editor approach produces SOTA benchmark results.
Merge pull request #617 from 01-ai/Haijian06-patch-1 Update README.md
Merge pull request #616 from 01-ai/nlmlml-patch-3 Update README.md
Merge pull request #615 from 01-ai/Haijian06-patch-1 Update README.md
Merge pull request #614 from 01-ai/Haijian06-patch-1 Update README.md
Merge pull request #613 from 01-ai/Haijian06-patch-1 Update README.md
Merge pull request #612 from 01-ai/Haijian06-patch-1 Update README.md
Merge pull request #611 from 01-ai/Haijian06-patch-1 Update README.md
Merge pull request #610 from 01-ai/Haijian06-patch-2 Update README_cn.md
Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon
For an AI model to be useful in specific contexts, it often needs access to background knowledge.
Exploring the implementation details of muTransfer
In the past three months since Qwen2’s release, numerous developers have built new models on the Qwen2 language models,…