Feed · AI Feed

Jay Alammar Tech Media March 7, 2022

Applying massive language models in the real world with Cohere

A little less than a year ago, I joined the awesome Cohere team. The company trains massive language models (both GPT-like and BERT-like) and offers them…

Lilian Weng Tech Media February 20, 2022

Learning with not Enough Data Part 2: Active Learning

This is part 2 of what to do when facing a limited amount of labeled data for supervised learning tasks. This time we will get some…

Jay Alammar Tech Media January 3, 2022

The Illustrated Retrieval Transformer

Summary: The latest batch of language models can be much smaller yet achieve GPT-3…

Lilian Weng Tech Media December 5, 2021

Learning with not Enough Data Part 1: Semi-Supervised Learning

When facing a limited amount of labeled data for supervised learning tasks, four approaches are commonly discussed.

Lilian Weng Tech Media September 24, 2021

How to Train Really Large Models on Many GPUs?

[Updated on 2022-03-13: add expert choice routing.] [Updated on 2022-06-10]: Greg and I wrote a shorter and upgraded version of this post, published on the OpenAI Blog:…

Distill.pub Papers September 2, 2021

Understanding Convolutions on Graphs

Understanding the building blocks and design choices of graph neural networks.

Distill.pub Papers September 2, 2021

A Gentle Introduction to Graph Neural Networks

What components are needed for building learning algorithms that leverage the structure and properties of graphs?

Lilian Weng Tech Media July 11, 2021

What are Diffusion Models?

[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)]. [Updated on 2022-08-27:…

Distill.pub Papers July 2, 2021

Distill Hiatus

After five years, Distill will be taking a break.

Lilian Weng Tech Media May 31, 2021

Contrastive Representation Learning

The goal of contrastive representation learning is to learn an embedding space in which similar sample pairs stay close to each other while dissimilar ones…

Distill.pub Papers May 6, 2021

Adversarial Reprogramming of Neural Cellular Automata

Reprogramming Neural CA to exhibit novel behaviour, using adversarial attacks.

Jay Alammar Tech Media May 4, 2021

Explainable AI Cheat Sheet

Introducing the Explainable AI Cheat Sheet, your high-level guide to the set of tools and methods that helps humans understand AI/ML models and their predictions. I…

Distill.pub Papers April 8, 2021

Weight Banding

Weights in the final layer of common visual models appear as horizontal bands. We investigate how and why.

Distill.pub Papers April 5, 2021

Branch Specialization

When a neural network layer is divided into multiple branches, neurons self-organize into coherent groupings.

Lilian Weng Tech Media March 21, 2021

Reducing Toxicity in Language Models

Large pretrained language models are trained over a sizable collection of online data. They unavoidably acquire certain toxic behavior and biases from the Internet. Pretrained language…

Distill.pub Papers March 4, 2021

Multimodal Neurons in Artificial Neural Networks

We report the existence of multimodal neurons in artificial neural networks, similar to those found in the human brain.

Distill.pub Papers February 11, 2021

Self-Organising Textures

Neural Cellular Automata learn to generate textures, exhibiting surprising properties.

Distill.pub Papers February 4, 2021

Visualizing Weights

We present techniques for visualizing, contextualizing, and understanding neural network weights.

Distill.pub Papers January 30, 2021

Curve Circuits

Reverse engineering the curve detection algorithm from InceptionV1 and reimplementing it from scratch.

Distill.pub Papers January 27, 2021

High-Low Frequency Detectors

A family of early-vision neurons reacting to directional transitions from high to low spatial frequency.

Jay Alammar Tech Media January 19, 2021

Finding the Words to Say: Hidden State Visualizations for Language Models

By visualizing the hidden state between a model's layers, we can get some clues as to the model's "thought process".

Lilian Weng Tech Media January 2, 2021

Controllable Neural Text Generation

[Updated on 2021-02-01: Updated to version 2.0 with several works added and many typos fixed.] [Updated on 2021-05-26: Add P-tuning and Prompt Tuning in the “prompt…

Jay Alammar Tech Media December 17, 2020

Interfaces for Explaining Transformer Language Models

Interfaces for exploring transformer language models by looking at input saliency and neuron activation. Explorable #1: Input saliency of a list of countries generated by a…

Distill.pub Papers December 8, 2020

Naturally Occurring Equivariance in Neural Networks

Neural networks naturally learn many transformed copies of the same feature, connected by symmetric weights.
