Patterns for Building Cybersecurity Evals
A sandboxed target, inputs that influence task difficulty, tools, and a grader.
Build a threat model, discover vulnerabilities, verify, triage, and patch.
Context as infra, taste as config, verification for autonomy, scale via delegation, closing the loop.
An eventful year of progress in health and career, while making time for travel and reflection.
Label some data, align LLM-evaluators, and run the eval harness with each change.
Based on what I've learned from role models and mentors at Amazon.
An LLM that can converse in English & item IDs, and make recommendations w/o retrieval or tools.
Evaluation metrics, how to build eval datasets, eval methodology, and a review of several benchmarks.
Recsys & search are converging with LLMs via semantic IDs, data augmentation, and unified foundation models.
What makes a good leader? What do good leaders do? And commando, soldier, and police leadership.