X · @AndrewYNg June 4, 2026 · X / Twitter

New course on serving LLMs efficiently -- how do you serve models to many concurrent users at low latency and reasonable cost? This short course is built with @RedHat and taught by @cedricclyburn.

Efficient LLM serving requires efficient memory management. A 70B-parameter model takes ~140 GB just to load the weights. On
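
The ~140 GB figure is consistent with storing each weight in 16 bits (2 bytes), e.g. FP16 or BF16. The post does not state the precision, so that is an assumption here, and `weight_memory_gb` is a hypothetical helper name used only for illustration. A rough sketch of the arithmetic:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.

    Assumes every parameter uses the same width (default 2 bytes, i.e.
    16-bit weights). Ignores KV cache, activations, and runtime overhead.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # 70B parameters at 2 bytes each
    print(weight_memory_gb(70e9))
    # Same model if weights were stored in 1 byte each (8-bit quantization)
    print(weight_memory_gb(70e9, bytes_per_param=1))
```

This counts weights only; memory needed while serving requests comes on top of it.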
