Applying massive language models in the real world with Cohere
A little less than a year ago, I joined the awesome Cohere team. The company trains massive language models (both GPT-like and BERT-like) and offers them…
This is part 2 of what to do when facing a limited amount of labeled data for supervised learning tasks. This time we will get some…
The latest batch of language models can be much smaller yet achieve GPT-3…
When facing a limited amount of labeled data for supervised learning tasks, four approaches are commonly discussed.
[Updated on 2022-03-13: add expert choice routing.] [Updated on 2022-06-10]: Greg and I wrote a shortened and upgraded version of this post, published on OpenAI Blog:…
Understanding the building blocks and design choices of graph neural networks.
What components are needed for building learning algorithms that leverage the structure and properties of graphs?
[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)]. [Updated on 2022-08-27:…
After five years, Distill will be taking a break.
The goal of contrastive representation learning is to learn such an embedding space in which similar sample pairs stay close to each other while dissimilar ones…
Reprogramming Neural CA to exhibit novel behaviour, using adversarial attacks.
Introducing the Explainable AI Cheat Sheet, your high-level guide to the set of tools and methods that helps humans understand AI/ML models and their predictions. I…
Weights in the final layer of common visual models appear as horizontal bands. We investigate how and why.
When a neural network layer is divided into multiple branches, neurons self-organize into coherent groupings.
Large pretrained language models are trained over a sizable collection of online data. They unavoidably acquire certain toxic behavior and biases from the Internet. Pretrained language…
We report the existence of multimodal neurons in artificial neural networks, similar to those found in the human brain.
Neural Cellular Automata learn to generate textures, exhibiting surprising properties.
We present techniques for visualizing, contextualizing, and understanding neural network weights.
Reverse engineering the curve detection algorithm from InceptionV1 and reimplementing it from scratch.
A family of early-vision neurons reacting to directional transitions from high to low spatial frequency.
By visualizing the hidden state between a model's layers, we can get some clues as to the model's "thought process". Figure: Finding the words to say…
[Updated on 2021-02-01: Updated to version 2.0 with several works added and many typos fixed.] [Updated on 2021-05-26: Add P-tuning and Prompt Tuning in the “prompt…
Interfaces for exploring transformer language models by looking at input saliency and neuron activation. Explorable #1: Input saliency of a list of countries generated by a…
Neural networks naturally learn many transformed copies of the same feature, connected by symmetric weights.