Shipped Solo
Build and ship a real software product as a one-person AI studio, in 8 weeks.
What you walk away with
By Demo Day you have a live product reachable by strangers: an App Store listing, a site where people can buy or download it, a public link. Something a person who has never met you can actually get.
You also have the phase artifacts that prove you ran the method: a calibration map, a competitor and pricing analysis, a spec, an adversarial QA report, an interrogation log, user-test findings, and a one-page honesty-and-transparency statement.
How it works
The format. Two sessions a week, about two hours each, run as a studio: roughly 45 minutes of concept, then 75 minutes of hands-on building and critique. You do the real build work between sessions. Class time is for teaching the move, then working and critiquing.
Who it is for. Builders: people who want to ship products, not casual AI users. You do not need to be an engineer, but you do need to be willing to learn the under-the-hood vocabulary of whatever you build. The method is for the operator who is not an expert in every domain they ship into.
What you need. A capable frontier-model subscription (for example Claude) with access to a light, mid, and heavy model tier; a real development environment for your target (Mac or iOS, web, and so on); and a genuine personal pain point to build against, which is the motivation engine.
Curriculum
Unit 1: Foundations
Week 1-2 · Sessions 1-4. The mindset, the operator, and the studio mechanics, before anyone builds.
01 The motorcycle: scope, not speed
AI as a motorcycle, not a faster bicycle: it changes the composition of what one person can attempt, and it changes what a mistake costs. The framework as protective gear for the human. AI fails like a person, not a calculator; the false-zero baseline; the coworker model. (Thesis Section 1; 2.1-2.2.)
Thesis Section 1 and 2.1-2.2. Kusano et al. (Waymo vs human-driver injury benchmarks) as the honest-baseline case.
"Calculator vs coworker" log: capture five AI outputs over the week, mark which you trusted blindly and which you sanity-checked, and write why. Bring two or three real personal pain points to Session 3.
02 Augmentation, and the evaluation muscle
Replacement fails; augmentation is the frame. The genius-intern model. Why the method is for builders: error compounds across multi-step work, and frontier models clear hour-long expert tasks only about half the time. The eroding ability to tell good output from merely plausible output. (Thesis 2.3-2.5.)
Thesis 2.3-2.5. Dziri et al., "Faith and Fate" (compounding error); Kwa et al. / METR (task-length success rates); Forrester Predictions 2026 (layoff-then-rehire). Optional: Zohar et al., "Against frictionless AI."
Take a multi-step task you would hand to AI. Estimate a per-step success rate, compute the compounded failure over the chain, and name where an undetected failure would land.
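The arithmetic in this exercise can be sketched in a few lines. The sketch assumes each step succeeds or fails independently, which real chains often violate, so treat the result as a rough estimate. The function name and the ten-step, 95%-per-step figures are illustrative assumptions, not numbers from the thesis or the cited papers.

```python
def compounded_failure(step_success_rates):
    """Chance that at least one step in a chain fails.

    Assumes steps are independent: the chain succeeds only if every
    step succeeds, so chain success is the product of the step rates.
    """
    chain_success = 1.0
    for rate in step_success_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"success rate must be between 0 and 1, got {rate}")
        chain_success *= rate
    return 1.0 - chain_success


# Hypothetical chain: ten steps, each estimated at 95% reliable.
estimated_rates = [0.95] * 10
print(f"Chance of at least one failure: {compounded_failure(estimated_rates):.0%}")
# prints "Chance of at least one failure: 40%"
```

A step that looks safe at 95% still leaves the ten-step chain failing about two times in five, which is why the exercise also asks where an undetected failure would land.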
03 The operator: four skills and the free critique
What actually qualifies the operator. The four skills: role fluency, signal discrimination, the questioning skill, and observing a failure the person cannot see. The free questioning that AI allows (zero social cost), and the art-school critique that makes it usable. Why the method transmits. (Thesis Section 3.)
Thesis Section 3. Takeuchi & Nonaka, "The New New Product Development Game," and the Scrum Guide (cross-functional roles).
Role self-audit: map your strengths and gaps across the roles a product needs (design, PM, engineering, marketing, research). Decide which roles you will staff with AI. Bring your strongest pain point as a candidate product.
04 Working with AI: the studio as an organization
Agents as roles (threads you treat as colleagues), standing vs disposable roles, hub-and-spoke, context hygiene, model tiering, sycophancy and its limits, and the operator's own failure modes (discipline decaying under deadline). (Thesis Section 6.)
Thesis Section 6. Liu et al., "Lost in the Middle," and Chroma's "Context Rot" (why scope context); Sharma et al. and Ibrahim et al. (sycophancy, and that people choose it).
Stand up your studio: create your role-threads (a lead engineer, a strategist, and an adversarial QA you will spin up fresh), write your anti-sycophancy custom instructions, and decide your light, mid, and heavy model tiers.
Unit 2: Calibrate, Research, Spec
Week 3-4 · Sessions 5-8. The first three phases. By the end, every student has a validated wedge and a spec.
05 Phase 1: Calibrate
Calibrate yourself against the AI, not the AI against you: test it where you can grade the output, so you learn its edges before trusting it where you cannot. The cross-domain inference and its named weakness. (Thesis 5.1.)
Thesis 5.1. Anthropic, "Demystifying evals for AI agents" (eval culture, the nearest prior).
Run a calibration test in a domain you know cold: try the task naively, then decomposed to mirror your expert process; document where it failed and where it matched your own work. Produce a personal reliability map.
06 Phase 2: Research (the sieve)
Up-front desk research to find a wedge before spending a build token. Don't lead the witness: ask so the AI can tell you no. The death stages (shipped, validated, explored) and why most ideas should die here, cheap. (Thesis 5.2; 8.1; 8.5.)
Thesis 5.2, 8.1, 8.5. Ries, The Lean Startup (MVP and validated learning); Marmer et al., Startup Genome (premature scaling).
Throw your idea at the desk and make the AI argue against it. Produce a real competitor matrix and pricing matrix. Reach a go/kill call.
07 The kill, and locking your product
Kill discipline: kill fast when the obstacle outweighs the value, knowing you can always revive. The product hunger games. Mapping deaths to obstacle types (compute, payoff, scope, opportunity cost). (Thesis 8.4-8.5.)
Thesis 8.4-8.5. Revisit your own matrices.
Present your wedge for brutal-but-kind critique (the art-school move). Defend it or kill it. Lock the product you will ship.
08 Phase 3: Spec (groom it like a Jira ticket)
A spec composes three things the naive prompt collapses: the data, the intent, and the verification (how the worker proves the output is right). The spec-writer role and model tiering for it. Named priors: Spec-Driven Development and eval-driven development; the operator position the method keeps. (Thesis 5.3.)
Thesis 5.3. GitHub, "Spec-Driven Development with AI" (the open-source toolkit); revisit the Anthropic evals guide for the verification half.
Write a full spec for your MVP's core flow with data, intent, and verification separated. Have a cheap model groom it into the directive. Run it.
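One way to keep the three parts from collapsing into a single prompt is to hold them as separate fields and refuse to produce a directive while any is empty. The sketch below is a minimal illustration under that assumption; the `Spec` class, its method names, and the note-saving example are hypothetical and are not taken from the thesis or from any Spec-Driven Development toolkit.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """A spec that keeps data, intent, and verification as separate fields."""
    data: list[str]          # what the worker is given: schemas, files, examples
    intent: str              # what the output is for
    verification: list[str]  # how the worker proves the output is right

    def missing_parts(self) -> list[str]:
        """Return the names of any parts that are still empty."""
        missing = []
        if not self.data:
            missing.append("data")
        if not self.intent.strip():
            missing.append("intent")
        if not self.verification:
            missing.append("verification")
        return missing

    def to_directive(self) -> str:
        """Render the spec as plain text, with each part under its own heading."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"spec is incomplete, missing: {', '.join(missing)}")
        lines = ["DATA"] + [f"- {item}" for item in self.data]
        lines += ["", "INTENT", self.intent]
        lines += ["", "VERIFICATION"] + [f"- {check}" for check in self.verification]
        return "\n".join(lines)


# Hypothetical core flow: saving a note in a note-taking app.
spec = Spec(
    data=["notes.json schema: id, title, body, created_at"],
    intent="Saving a note makes it appear at the top of the list.",
    verification=[
        "Save a note, reload the app, confirm the note is still first.",
        "Saving an empty note is rejected with a visible message.",
    ],
)
print(spec.to_directive())
```

The rendered text is only a starting point for the grooming step; the design choice here is that a spec with no verification entries fails loudly instead of running.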
Unit 3: Build and Interrogate
Week 5-6 · Sessions 9-12. Students build their MVP and learn to interrogate the tool.
09 Phase 4: Build MVP (core flow first)
Build only the core user flow, get it right, then add features one at a time. Complexity compounds; the spaghetti trap; default only when certain, otherwise a toggle. Incremental building preserves the mental model that lets you debug. (Thesis 5.4.)
Thesis 5.4. Revisit Dziri et al. (why complexity decays output).
Build your day-one MVP: the single core flow, nothing else, working end to end.
10 Hub-and-spoke and adversarial QA
The lead engineer as a manager that writes the briefs that spin up its own reports; a shared source of truth (decision records plus a handoff doc); adversarial QA as a fresh instance that audits cold and writes a prioritized risk report, no code. (Thesis 6.1-6.2; QA origin in 5.4.)
Thesis 6.1-6.2.
Run a clean-instance adversarial QA audit on your MVP. Get a prioritized risk report. Patch the criticals.
11 Phase 6: Interrogate
Two modes: interrogate to learn (no error implied) and interrogate to catch. The expertise-keyed catch. The questioning technique that forces the AI to defend the options it rejected ("why not the others"). The automated form, Chain-of-Verification, routed to the Phase 4 QA role; the human does it live. (Thesis 5.6.)
Thesis 5.6. Dhuliawala et al., Chain-of-Verification; Shneiderman, human-centered AI (a boundary marker).
Interrogation drill. First, learn mode: take a step in your build you cannot fully follow and go command-by-command. Then catch mode: take a decision that touches a domain you actually know and find the context-blind error.
12 Build lab and integration
The six phases are one move applied at six breakpoints: decompose into chunks the intern cannot fumble, give explicit phasing and evaluation criteria, stay in the loop. The WhisperPad build history as a model of incremental layering. (Thesis Section 4; 7.2.)
Thesis Section 4; 7.2.
Open build lab: add one or two features incrementally on top of your core flow without breaking it. Instructor and peer critique. Goal: a build ready to put in front of users.
Unit 4: Test, Launch, Ship
Week 7-8 · Sessions 13-16. Students test with real users, decide harden / add / kill, launch, and ship.
13 Phase 5: Test (real users, fast)
Get it in front of real users fast; your own heuristic evaluation is fast but blinded by your accumulated familiarity. The researcher's reflex: hear confusion and conclude "that is a real problem," not "the user does not get it." The modes arc as a worked finding. Document research only when an audience will act on it. (Thesis 5.5; 7.3.)
Thesis 5.5, 7.3. Medlock et al., the RITE method; Amershi et al., Guidelines for Human-AI Interaction.
Run a small user test (about five people, mixed technical background); change the product between sessions. Capture the friction you had gone blind to.
14 Iterate, kill-or-harden, and the worked case
The harden / add / kill decision, made on evidence. The full WhisperPad arc as the worked example: injury to MVP to the modes finding to the App Store rejection to the two-SKU fork to the reversal. (Thesis Section 7.)
Thesis Section 7 (the full case study).
Decide: harden, add, or kill, and act. If hardening, ship the fix. If killing, write the kill rationale and the condition that would revive it.
15 Launch and studio economics
Distribution as a strategic choice (the two-SKU fork; store reach vs direct margin); the mechanics of shipping yourself (notarization, payments, updates, licensing); narrative and press as the lever a solo operator has; the numbers: cost structure, break-even, and undercutting a funded incumbent on a focused use case. (Thesis 7.4-7.8; 8.6-8.7.)
Thesis 7.4-7.8, 8.6-8.7. Optional: Apple's App Store Small Business Program; the Wispr Flow funding coverage.
Stand up a real distribution channel where a stranger can get or buy your product. Set your pricing. Draft your honest launch narrative.
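When setting a price, the break-even comparison between a store channel and direct sales can be roughed out as below. This is a sketch only: every number (fixed costs, price, per-unit cost, and both fee rates) is a hypothetical placeholder to replace with your own, and the function name is illustrative.

```python
import math


def break_even_units(fixed_costs, price, variable_cost_per_unit, channel_fee_rate):
    """Units that must be sold before revenue covers fixed costs.

    channel_fee_rate is the fraction of the price kept by the store or
    payment processor (for example 0.15 for a 15% cut).
    """
    net_per_unit = price * (1 - channel_fee_rate) - variable_cost_per_unit
    if net_per_unit <= 0:
        raise ValueError("each sale loses money at this price; break-even is unreachable")
    return math.ceil(fixed_costs / net_per_unit)


# Hypothetical figures: 600 in fixed costs, a 20 price, 1 of cost per unit sold.
store_units = break_even_units(600, 20, 1, channel_fee_rate=0.15)   # assumed store cut
direct_units = break_even_units(600, 20, 1, channel_fee_rate=0.05)  # assumed processor cut
print(store_units, direct_units)
# prints "38 34"
```

The gap between the two counts is the direct-margin side of the trade-off; it says nothing about how many more strangers the store might reach, which has to be weighed separately.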
16 Demo Day: ship it
AI transparency and the honest stance: disclose how AI was used, account for sycophancy, and be precise about what is demonstrated versus claimed (the n=1 honesty). What one operator can and cannot prove. (Thesis Sections 9-10.)
Thesis Sections 9-10.
Ship. Each student presents a live product reachable by strangers, tells the build story, states honest claims (demonstrated vs not), and attaches a one-page transparency note on how AI was used. Public launch.
Assessment
Students are assessed on the method, not on commercial success. The graded artifacts:
- Calibration map (S5)
- Competitor and pricing analysis with a go/kill decision (S6-S7)
- MVP spec with data, intent, and verification separated (S8)
- Working MVP and an adversarial QA report (S9-S10)
- Interrogation log, both modes (S11)
- User-test findings and a harden / add / kill decision (S13-S14)
- A shipped, live product reachable by strangers (S16)
- A one-page AI-transparency and honest-claims statement (S16)
A defensible kill, fully documented, is a passing outcome. Shipping something nobody else can use, because you are the only person who can operate it, is not.
The compressed variant
For an intensive, or a cohort arriving with an already-validated idea, the course folds to 12 sessions over 6 weeks: merge Sessions 1-2 (foundations), 6-7 (Research and the kill), 9-10 (Build and QA), and 13-14 (Test and iterate). This keeps every phase and the ship, but assumes students move faster between sessions and arrive with a pain point already in hand. The risk is the Test phase, where compressed cohorts most often skip the discipline the method exists to enforce, so it is protected.