
Bitwise AI · YouTube

A Google Director Open-Sourced His Claude Skills, 51K Stars on Github

How Addy Osmani packaged 14 years of Google engineering judgment into 23 markdown files -- and what is actually worth stealing.

Posted

June 11th

1 month ago

Duration

05:44

Format

Essay

educational

Views

327

35 likes

Big Idea

The argument in one line.

The addyosmani/agent-skills repo matters not because all 23 files are novel, but because it proved that packaging senior engineering judgment as portable markdown is itself a distribution strategy -- one that reached 27,000 stars before the tech press noticed and stands at 51,000 today.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use Claude Code, Cursor, Codex, or Gemini CLI and want an honest verdict on whether installing this plugin is worth it.
You are building or customizing your own agent skills or CLAUDE.md and want a reference for what Google-grade engineering habits look like codified.
You want to understand what doubt-driven development actually means as an agent instruction, not as a philosophy.

SKIP IF…

You want a step-by-step tutorial on using any specific skill -- this is a review and audit, not a walkthrough.
You already follow Addy Osmani closely and have read the repo yourself.

TL;DR

The full version, fast.

Addy Osmani open-sourced his complete AI agent workflow as 23 markdown skill files totaling 253KB. The three skills with genuine teeth are doubt-driven development (a second agent attacks every non-trivial decision with a clean context, max 3 rounds, reviewer never sees the original conclusion), context engineering (a 5-level load hierarchy that forces a structured confusion block when sources conflict instead of letting the agent silently pick), and source-driven development (detect version from lockfile, fetch official docs, cite the URL in code comments -- Stack Overflow and training data are explicitly banned). The whole set loads into Claude Code, Gemini CLI, and opencode from a single shared folder. The honest audit: zero benchmarks, most files are standard practice wearing a lanyard, maybe 4 of 23 have genuinely new ideas. The recommendation even Hacker News agrees on: treat it as reference, steal three skills, adapt them to how you actually work.


Chapters

Where the time goes.

00:00 – 00:19

01 · Intro

51K stars, growing 781/day, number 1 GitHub trending. Sets up the credibility and velocity of the repo.

00:19 – 00:56

02 · Who Is Addy Osmani

14 years Chrome DevTools, Google director, wrote the JavaScript Design Patterns book, latest project is Beyond Vibe Coding.

00:56 – 01:23

03 · Inside the Repo

23 skills, 22 covering the full software lifecycle, 1 meta-router skill. 253KB -- Google engineering handbook compressed.

01:23 – 01:55

04 · What's a Skill, Anyway

A skill is a folder with a SKILL.md file: name, description, instructions. Open standard -- Codex, Cursor, Copilot, Gemini CLI all load the same format. One-liners load at startup; full files load on match.

01:55 – 02:31

05 · Doubt-Driven Development

16KB, the biggest skill. A second agent with clean context attacks every non-trivial decision. Claim/Extract/Doubt/Reconcile/Stop, max 3 rounds. Reviewer never sees the original conclusion.

02:31 – 02:58

06 · Context Engineering

Agents mostly fail from wrong context, not low intelligence. Five-level load hierarchy. When sources conflict, agent surfaces a confusion block and asks -- never silently picks.

02:58 – 03:27

07 · Source-Driven Development

Detect version from lockfile, fetch official docs, implement, cite URL in code comment. Banned: Stack Overflow, blog posts, training data.

03:27 – 04:05

08 · One Repo, Three Agents

Same skills/ folder serves Claude Code (545B plugin.json, 7 slash commands), Gemini CLI (mirrored TOML files), opencode (10-byte symlink). AGENTS.md: skills=how, personas=who, commands=when.

04:05 – 04:37

09 · The Honest Audit

Zero benchmarks, no with/without comparisons. Git workflow skill triggers on any code change, always. Model can drop any markdown rule. 4 of 23 files have genuinely new ideas.

04:37 – 05:08

10 · The Lineage

January: Karpathy fan repo 172K stars (he did not publish it). Feb 3: Pocock 124K. Feb 15: Osmani 51K. Entire genre is 5 months old.

05:08 – 05:31

11 · What to Steal

One command installs everything, but the better move is reference not dependency. Steal doubt-driven-development, confusion block, interview-me. Adapt, do not bulk-install.

05:31 – 05:44

12 · Outro

A file. A folder. A format. Whose workflow goes viral next?

Atomic Insights

Lines worth screenshotting.

Agents mostly fail from wrong context, not low intelligence -- that is the diagnosis the entire context-engineering skill is built on.
The doubt loop never shows the reviewer your original conclusion, because showing it biases the reviewer toward agreement.
If you cannot write the claim compactly, you have a vibe, not a decision -- the entry test for the doubt-driven development loop.
Source-driven development bans Stack Overflow, blog posts, and the model's own training data as primary sources.
Confidence is not evidence -- when an agent cannot verify something, it ships an explicit unverified block instead of bluffing.
The opencode port of the entire skills folder is a 10-byte symlink; the Claude Code adapter is a 545-byte plugin.json.
One skills folder loads natively into Claude Code, Gemini CLI, and opencode -- it is an open standard, not a Claude exclusive.
23 skills do not flood context because agents only read the one-liner descriptions at startup; full files load on match.
The repo hit 27K stars before the press noticed, got 2 points on Hacker News in April, then 376 points on the front page in May after a blog post.
Only 4 of 23 files contain genuinely new ideas; the rest is good engineering practice with a famous name attached.
The personal-workflow-as-viral-repo genre is five months old: Karpathy (Jan) to Pocock (Feb 3) to Osmani (Feb 15).
Senior engineering judgment -- the kind absorbed over years of code review -- now ships as a markdown repo you install in one command.

Takeaway

Three skills worth stealing from 23.

WHAT TO LEARN

The repo has 23 files but genuine novelty lives in roughly four of them -- and the honest audit the video runs is itself the lesson about how to evaluate any viral engineering resource.

Doubt-driven development forces you to write a compact falsifiable claim before any review loop starts -- a vibe fails the entry test, which is the point.
The reviewer never sees your original conclusion because showing it biases them toward agreement; clean context is the whole mechanism.
Context engineering treats agent failures as a context problem, not an intelligence problem -- wrong information loaded at the wrong priority causes most errors developers blame on the model.
When sources conflict, the right protocol is a structured confusion block that asks the human to resolve it, not a silent pick that buries the ambiguity.
Source-driven development bans not just Stack Overflow but the model's own training data as a primary source -- confidence in memory is not the same as verified, version-specific documentation.
A single SKILL.md folder can serve Claude Code, Gemini CLI, and opencode simultaneously because the format is an open standard; agent lock-in is a choice, not a constraint.
Most viral engineering repos are good checklists with a famous name on them -- the test for genuine novelty is whether any idea would be non-obvious to a senior engineer who had not seen it before.
The personal-workflow-as-GitHub-repo is a five-month-old genre; the distribution insight (markdown as a credibility vehicle) may outlast the specific content of any individual repo.

Glossary

Terms worth knowing.

SKILL.md: A markdown file with a name, one-line description, and detailed instructions that defines a single agent skill. The description is always loaded; the full file loads only when a task matches.
Doubt-driven development: An agent pattern where a second agent with a clean context attacks every non-trivial decision in a Claim/Extract/Doubt/Reconcile/Stop loop, capped at three rounds.
Confusion block: A structured output format the agent uses when sources conflict, presenting options A/B/C and asking the user to resolve the ambiguity instead of silently choosing.
Source-driven development: An agent protocol that requires detecting the exact dependency version from a lockfile, fetching official docs, implementing, and citing the source URL in a code comment. Stack Overflow and training data are banned as primary sources.
Context engineering: The discipline of deciding what information to load into an agent context and in what priority order. The skill defines a five-level hierarchy: rules files, specs, relevant source, error output, conversation.
Hyrum's Law: The observation that with enough users, all observable behaviors of a system will be depended on by somebody, regardless of what the spec says.
Beyoncé Rule: A Google engineering rule: if you liked it, you should have put a test on it. Referenced in the repo to enforce test coverage on any behavior you care about preserving.
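
The two-stage loading described in the SKILL.md entry above (descriptions always loaded, full file only on match) can be sketched roughly as follows. This is a minimal illustration, not how any particular agent implements it: the `Skill` class, the `triggers` keyword set, and the keyword-overlap matcher are all hypothetical stand-ins, since real agents decide matches from the description itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Skill:
    name: str
    description: str          # one-liner, always in context
    body: str                 # full SKILL.md instructions, loaded on match
    triggers: FrozenSet[str]  # hypothetical keyword set, not part of the format


def startup_context(skills: List[Skill]) -> List[str]:
    """What the agent holds at startup: one line per skill, no bodies."""
    return [f"{s.name}: {s.description}" for s in skills]


def load_for_task(task: str, skills: List[Skill]) -> List[str]:
    """Add the full body only for skills whose triggers overlap the task."""
    words = set(task.lower().split())
    matched = [s.body for s in skills if words & s.triggers]
    return startup_context(skills) + matched
```

With 23 skills, the startup cost under this scheme is 23 short lines; the 16KB doubt-driven-development body would enter the context only when a task matched it.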

Resources

Things they pointed at.

00:00 · link · addyosmani/agent-skills

00:24 · book · Beyond Vibe Coding (Addy Osmani)

04:01 · link · agentskills.io

04:37 · link · andrej-karpathy-skills (fan repo)

04:55 · link · mattpocock/skills

04:06 · link · Hacker News discussion (376 pts)

Quotables

Lines you could clip.

02:24

“If you cannot write the claim compactly, you have a vibe, not a decision.”

Standalone aphorism, no setup needed, lands hard in under 5 seconds. Best use: TikTok hook.

03:18

“Confidence is not evidence.”

Four-word line, complete thought, universally applicable. Best use: IG reel cold open.

04:01

“Skills are the how. Personas are the who. Commands are the when. Write your judgment once. Run it in any agent.”

Punchy three-part structure, clear architecture summary. Best use: newsletter pull-quote.

05:01

“Senior engineering judgment, the thing you used to absorb over years of code review, now ships as a markdown repo.”

Thesis line of the video, captures the cultural shift in one sentence. Best use: newsletter pull-quote.

The Script

Word for word.


Addy Osmani, fourteen years on Chrome's developer tools, now a director at Google, just open-sourced how he works with AI agents.

23 markdown files, a quarter of a megabyte, 51,000 stars. And the biggest file in there: a skill that tells your agent to doubt everything it just did.

If you learned JavaScript design patterns from a book, it was probably his. His latest one is literally called Beyond Vibe Coding. So when this repo appeared in February, his 49,000 GitHub followers noticed.

27,000 stars before the press did. Someone submitted it to Hacker News in April.

Two points. Then Addy blogged about it in May, hit the front page with 376, and the repo never slowed down.

Today, it's at 51,000 stars, climbing almost 800 a day, number one on GitHub trending.

Open it up and there's no framework in sight. 23 skills, 22 covering the software life cycle from define and plan through build, verify, review, ship, plus one meta skill that routes between them.

The whole thing reads like Google's engineering handbook compressed into a quarter megabyte: Hyrum's Law, the Beyoncé Rule, 100-line change sizing. If it mattered in a Google code review, it's in here somewhere.

Quick primer because this format is the actual story. A skill is just a folder with a SKILL.md file in it. A name, a description, and instructions.

It started at Anthropic, but it's an open standard now. Codex, Cursor, Copilot, Gemini CLI, all load the same format.

And no, 23 skills don't flood your context window. Agents only read the one-line descriptions at startup. The full file loads when a task actually matches.

Now the headliner, doubt-driven development. 16 kilobytes, the biggest skill in the repo.

Before any nontrivial decision stands, your agent spawns a second agent with a clean context to attack it. Claim, extract, doubt, reconcile, stop.

Three rounds maximum. The clever part: the reviewer never sees the original conclusion, because handing it your answer biases it toward agreement.

And my favorite line in the file: if you can't write the claim compactly, you have a vibe, not a decision. But does any of this fix why agents actually fail?

Because they mostly fail from wrong context, not low intelligence. That's what the context engineering skill targets.

A five-level hierarchy of what to load, in order: rules files, specs, relevant source, error output, conversation. The standout rule: when sources conflict, the agent can't silently pick one. It has to stop and show you a structured confusion block.

Option A, option B, option C, and ask. Skill three: source-driven development. Detect the exact version from your dependency file, fetch the official docs, implement, and cite the URL in a code comment.

The banned list of primary sources is the fun part. Stack Overflow, blog posts, and the model's own training data.

As the file puts it, confidence is not evidence. Can't verify it? The agent ships an explicit unverified block instead of bluffing.

Here's the part nobody covers. This one skill set loads into three different agents. For Claude Code, a tiny plugin manifest maps the skills folder plus seven slash commands.

One command installs the whole thing from the marketplace. For Gemini CLI, the same commands are mirrored as TOML files and skills install natively.

And the opencode port? It's a symlink, 10 bytes pointing at the same folder. A seven-kilobyte AGENTS.md file ties it together.

Skills are the how. Personas are the who. Commands are the when.

Write your judgment once. Run it in any agent. Time for the honest audit.

There are zero benchmarks in this repo. No with and without comparisons for any of it.

And plenty of these files are standard practice wearing a lanyard. The git workflow skill triggers on, quote, "any code change, always." Hacker News said the quiet part out loud.

The model can drop any rule in your markdown whenever it likes. The genuinely new ideas live in maybe four of the 23 files. The rest is a very good checklist with a famous name on it.

Zoom out and there's a pattern. January: a fan repo distills Karpathy's CLAUDE.md notes, 172,000 stars, and he didn't even publish it.

February 3, Matt Pocock ships his skills folder, 124,000. Twelve days later, Addy. This entire genre is five months old.

Senior engineering judgment, the thing you used to absorb over years of code review, now ships as a markdown repo. So worth installing? One command adds the whole plugin if you want it.

But the better move, and even Hacker News agrees, is to treat it as reference, not a dependency. Steal doubt-driven development. Steal the confusion block.

Steal interview-me, which interrogates you until it's 95% sure what you want. Adapt them to how you work. Karpathy's notes became a file.

Pocock shipped a folder. Addy shipped a format. Whose workflow goes viral next?

Drop the repo in the comments, and subscribe if you want it audited.

The Hook

The bait, then the rug-pull.

A Google director put his entire AI agent workflow on GitHub. Not a blog post, not a talk -- 23 markdown files, 253KB, the kind of thing that passes 27,000 stars before the press notices and sits at 51,000 today.

Frameworks

Named ideas worth stealing.

01:55 · model

Doubt-Driven Development

Claim
Extract
Doubt
Reconcile
Stop

A second agent with a clean context attacks every non-trivial decision. Max 3 rounds. Reviewer never sees original conclusion. Requires a compact falsifiable claim to start.

Steal for: Architecture decisions, major refactors, ambiguous specs -- any engineering decision that warrants adversarial review
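
The loop above can be sketched in a few lines. This is an illustrative reading under stated assumptions, not the repo's implementation: the breakdown names the five steps but does not define them, so the mapping in the comments, the `Decision` fields, the `MAX_CLAIM_WORDS` threshold, and the `reviewer`/`reconcile` callables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ROUNDS = 3        # from the breakdown: three rounds maximum
MAX_CLAIM_WORDS = 40  # assumption: an arbitrary cutoff for "compact"


@dataclass
class Decision:
    claim: str           # Claim: compact, falsifiable statement
    evidence: List[str]  # facts handed to the reviewer
    conclusion: str      # the original answer, never shown to the reviewer


Reviewer = Callable[[str, List[str]], List[str]]        # returns objections
Reconciler = Callable[[Decision, List[str]], Decision]  # revises the decision


def doubt_loop(decision: Decision, reviewer: Reviewer,
               reconcile: Reconciler) -> Tuple[Decision, int]:
    if len(decision.claim.split()) > MAX_CLAIM_WORDS:
        raise ValueError("claim is not compact: a vibe, not a decision")
    rounds = 0
    while rounds < MAX_ROUNDS:
        rounds += 1
        # Extract: pass only claim and evidence, so the reviewer starts
        # from a clean context and cannot anchor on the conclusion.
        objections = reviewer(decision.claim, list(decision.evidence))  # Doubt
        if not objections:
            break                                     # Stop: nothing left to attack
        decision = reconcile(decision, objections)    # Reconcile
    return decision, rounds
```

The design point worth keeping is the reviewer's signature: the conclusion is simply not an argument, so the bias toward agreement has no path in.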

02:31 · list

Context Engineering Hierarchy

rules files
specs
relevant source
error output
conversation

Five-level priority order for context loading. Key rule: when sources conflict, surface a structured confusion block and ask -- never silently pick.

Steal for: Any CLAUDE.md or system prompt that specifies what context an agent should prioritize
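
As a rough sketch of the hierarchy and the conflict rule above: the five level names come from the breakdown, but the `ConfusionBlock` shape, the function names, and the wording of the question are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Load order as listed in the breakdown, highest priority first.
LOAD_ORDER = ["rules files", "specs", "relevant source", "error output", "conversation"]


@dataclass
class ConfusionBlock:
    question: str
    options: Dict[str, str]  # "A"/"B"/"C" -> what each source says


def ordered_context(items: Dict[str, str]) -> List[str]:
    """Return the available context items in hierarchy order."""
    return [items[level] for level in LOAD_ORDER if level in items]


def resolve(key: str, claims: Dict[str, str]) -> Union[str, ConfusionBlock, None]:
    """claims maps a source level to the value it asserts for `key`."""
    values = set(claims.values())
    if not values:
        return None
    if len(values) == 1:
        return next(iter(values))  # sources agree, nothing to ask
    # Sources disagree: surface every option instead of silently picking one.
    options = {
        label: f"{source} says {value}"
        for label, (source, value) in zip("ABCDE", claims.items())
    }
    return ConfusionBlock(f"Sources disagree on {key!r}. Which is correct?", options)
```

Note that `resolve` does not fall back to the higher-priority source on conflict. That matches the rule as the breakdown states it: the hierarchy orders loading, while disagreement goes back to the human.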

02:58 · list

Source-Driven Development

Detect version from lockfile
Fetch official docs
Implement
Cite URL in code comment

Four-step protocol for grounding implementation in verified, version-specific documentation. Bans Stack Overflow, blog posts, and training data as primary sources.

Steal for: Any agent that writes code against external dependencies -- especially useful for fast-changing APIs
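
The first and last steps of the protocol above can be sketched as follows, assuming an npm `package-lock.json` (lockfileVersion 2 or 3, where installed packages appear under `packages["node_modules/<name>"]`). The function names and the comment formats are hypothetical, the docs URL is supplied by the caller rather than fetched, and the fallback mirrors the "unverified block" idea from the video without claiming to match the repo's actual wording.

```python
import json
from typing import Optional


def detect_version(lockfile_text: str, package: str) -> str:
    """Read the exact installed version of `package` from package-lock.json text."""
    lock = json.loads(lockfile_text)
    entry = lock.get("packages", {}).get(f"node_modules/{package}")
    if entry is None or "version" not in entry:
        raise LookupError(f"{package} not found in lockfile")
    return entry["version"]


def citation_comment(package: str, version: str,
                     docs_url: Optional[str] = None) -> str:
    """Build the comment to place above code written against `package`."""
    if docs_url is None:
        # No official source was checked: say so explicitly instead of bluffing.
        return f"// UNVERIFIED: no official docs checked for {package}@{version}"
    return f"// Source: {docs_url} ({package}@{version})"
```

Pinning the version before looking anything up is the step that matters: it keeps the citation tied to the API that is actually installed rather than the latest one.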

CTA Breakdown

How they asked for the click.

VERBAL ASK

05:29 · subscribe

“Drop the repo in the comments, and subscribe if you want it audited.”

Low-pressure ask after the value delivery, framed as a mutual exchange. No merch, no link in bio.


Storyboard

Visual structure at a glance.

00:00 · hook · open
00:19 · context · who
00:56 · value · repo tour
01:55 · value · doubt loop
02:31 · value · context hierarchy
02:58 · value · source rules
03:27 · value · three agents
04:05 · value · honest audit
04:37 · value · lineage
05:08 · cta · steal these
05:31 · cta · outro
