AI Feed

Lilian Weng Tech Media June 23, 2023

LLM Powered Autonomous Agents

Building agents with an LLM (large language model) as their core controller is a cool concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as…

Chip Huyen Tech Media June 7, 2023

Generative AI Strategy

I had a lot of fun preparing the talk: “Leadership needs us to do generative AI. What do we do?” for Fully Connected. The idea for…

Jay Alammar Tech Media May 9, 2023

Generative AI and AI Product Moats

Here are eight observations I’ve shared recently on the Cohere blog, along with videos that go over them. Article: What’s the big deal with Generative AI? Is…

Lilian Weng Tech Media March 15, 2023

Prompt Engineering

Prompt Engineering, also known as In-Context Prompting, refers to methods for communicating with an LLM to steer its behavior toward desired outcomes without updating the…

Lilian Weng Tech Media January 27, 2023

The Transformer Family Version 2.0

Many new Transformer architecture improvements have been proposed since my last post on “The Transformer Family” about three years ago. Here I did a big refactoring…

Lilian Weng Tech Media January 10, 2023

Large Transformer Model Inference Optimization

[Updated on 2023-01-24: add a small section on Distillation.] Large transformer models are mainstream nowadays, creating SoTA results for a variety of tasks. They are powerful…

Jay Alammar Tech Media January 1, 2023

Remaking Old Computer Graphics With AI Image Generation

Can AI Image generation tools make re-imagined, higher-resolution versions of old video game graphics? Over the last few days, I used AI image generation to reproduce…

Jay Alammar Tech Media October 4, 2022

The Illustrated Stable Diffusion

Translations: Chinese, Vietnamese. (V2 Nov 2022: Updated images for more precise description of forward diffusion. A few more images in this version) AI image generation is…

Lilian Weng Tech Media September 8, 2022

Some Math behind Neural Tangent Kernel

Neural networks are well known to be over-parameterized and can often easily fit data with near-zero training loss while still generalizing decently on the test dataset. Although…

Lilian Weng Tech Media June 9, 2022

Generalized Visual Language Models

Processing images to generate text, such as image captioning and visual question-answering, has been studied for years. Traditionally such systems rely on an object detection network…

Lilian Weng Tech Media April 15, 2022

Learning with not Enough Data Part 3: Data Generation

Here comes Part 3 on learning with not enough data (Previous: Part 1 and Part 2). Let’s consider two approaches for generating synthetic data for…

Jay Alammar Tech Media March 7, 2022

Applying massive language models in the real world with Cohere

A little less than a year ago, I joined the awesome Cohere team. The company trains massive language models (both GPT-like and BERT-like) and offers them…

Lilian Weng Tech Media February 20, 2022

Learning with not Enough Data Part 2: Active Learning

This is part 2 of what to do when facing a limited amount of labeled data for supervised learning tasks. This time we will get some…

Jay Alammar Tech Media January 3, 2022

The Illustrated Retrieval Transformer

Discussion: Discussion Thread for comments, corrections, or any feedback. Translations: Korean, Russian Summary: The latest batch of language models can be much smaller yet achieve GPT-3…

Lilian Weng Tech Media December 5, 2021

Learning with not Enough Data Part 1: Semi-Supervised Learning

When facing a limited amount of labeled data for supervised learning tasks, four approaches are commonly discussed.

Lilian Weng Tech Media September 24, 2021

How to Train Really Large Models on Many GPUs?

[Updated on 2022-03-13: add expert choice routing.] [Updated on 2022-06-10]: Greg and I wrote a shorter and upgraded version of this post, published on OpenAI Blog:…

Distill.pub Papers September 2, 2021

Understanding Convolutions on Graphs

Understanding the building blocks and design choices of graph neural networks.

Distill.pub Papers September 2, 2021

A Gentle Introduction to Graph Neural Networks

What components are needed for building learning algorithms that leverage the structure and properties of graphs?

Lilian Weng Tech Media July 11, 2021

What are Diffusion Models?

[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)]. [Updated on 2022-08-27:…

Distill.pub Papers July 2, 2021

Distill Hiatus

After five years, Distill will be taking a break.

Lilian Weng Tech Media May 31, 2021

Contrastive Representation Learning

The goal of contrastive representation learning is to learn an embedding space in which similar sample pairs stay close to each other while dissimilar ones…

Distill.pub Papers May 6, 2021

Adversarial Reprogramming of Neural Cellular Automata

Reprogramming Neural CA to exhibit novel behaviour, using adversarial attacks.

Jay Alammar Tech Media May 4, 2021

Explainable AI Cheat Sheet

Introducing the Explainable AI Cheat Sheet, your high-level guide to the set of tools and methods that helps humans understand AI/ML models and their predictions. I…

Distill.pub Papers April 8, 2021

Weight Banding

Weights in the final layer of common visual models appear as horizontal bands. We investigate how and why.
