How agents built Arcturus 🤖

This week I shipped a synthesizer to the internet. It lives at arcturus.lef.fyi, compiles a Faust DSP graph to WebAssembly at load time, speaks Web MIDI to the two Arturia devices on my desk, and has 1906 tests passing in CI. Yes, Arcturus is a star. Fourth-brightest in the night sky. I named a synth after a star. These are the decisions you make at 2am, alone with a coffee, nobody around to stop you. ✨ I typed maybe five percent of the code myself.

The rest was built by agents, specifically by sessions of Claude Code working off a small pile of markdown files I wrote first. This is not a "look mom, AI wrote my code!" post. The code is good, I'm proud of it, and I can defend every design decision in it. I should hope so. I wrote the doctrine the decisions were made against. When the agent picked a bad path I reverted it and added a new rule. Guardrails all the way down. But the way I got here is unusual enough that I think it's worth writing down.

So let me walk you through it. Not the synth itself; that deserves its own post, and it's coming. This one is about the process: how you hand off roughly ninety-five percent of the implementation work to an autonomous loop and still end up with something you'd put your name on. Grab a beverage of your preference and let's go. ☕

What Arcturus is (briefly)

Arcturus is a browser-based virtual analog synthesizer controlled entirely by Arturia hardware. A KeyStep for notes, pitch bend, aftertouch, and transport. A BeatStep for sixteen relative encoders and sixteen pads. No mouse. No menus. The hardware is the interface. There's a longer story about why I wanted this: two devices gathering dust, a vision of gluing them together into a physical-first synth I could actually lose myself in. That's in the third post.

Under the hood: Faust DSP compiled to WebAssembly, running in an AudioWorklet. Eight-voice polyphony. A multi-engine voice pool so you can latch a chord on one program, switch to another, and both keep sounding independently. Architecturally inspired by the Prophet-5, Juno-106, JP-8000, Oberheim SEM, and Buchla 208. If you're a synth nerd, the SYNTH_RESEARCH.md doc in the repo has primary-source citations for every borrowed feature. Wavefolder timbre from the Buchla 208. Passive HPF from the Juno-106. Supersaw detune ratios from a JP-8000 reverse-engineer. It's a love letter to a specific slice of hardware history. Zero frameworks. Vanilla TypeScript. The entire source lives on GitHub if you want to look while you read.

The three markdown files that made this possible

The whole thing runs on three documents at the repo root: CLAUDE.md, AGENTS.md, and DOCTRINE.md. Each does a different job, and the split matters. I tried a single mega-file early on. It collapsed under its own weight. The agent would skim it and miss the operational parts, or drown in them and forget the architecture. Splitting by purpose (what, how, why) fixed that.

CLAUDE.md is the architecture reference. Where the code lives, what each module owns, the signal flow, the naming conventions, the common pitfalls I'd already walked into so the agent didn't have to. The pitfalls section is worth its weight in coffee. "os.hs_phasor creates phantom inputs; use the inline feedback phasor instead." "Filter graph duplication causes OOM, split with <:." Every one of those lines is a bug I paid for once, so the agent never has to. Think of it as what you'd write for a new hire on their first day, except the new hire reads it every time they sit down to work and never forgets a word.

AGENTS.md is the rulebook. Testing rules, process rules, code rules. Strict mode always. No hardcoded MIDI values. Run pnpm test before every commit. No any in production. It's the set of things that, if violated, should feel physically wrong to whoever is coding.

DOCTRINE.md is the interesting one. It's the agent's operating system. You can read it here. Fair warning: it's written in the voice of a slightly zealous product manager who has read too much Ken Iverson and not enough small talk. It defines the Constitution (things that never change), a Quality Score, the Six Measures of improvement, and the Cycle: the ten-step loop the agent runs on repeat. The first page opens with "You are an autonomous agent maintaining a hardware-first virtual analog synthesizer. This document is your operating system. Read it fully at the start of every session." I wrote that in the same spirit you'd write a mission statement, and then I asked the agent to live by it.

The Cycle

The agent's job is to run the Cycle, forever. Read the docs. Check recent commits. Pick the next task using the triage protocol. Research. Implement. Run tests. Compute a Quality Score. Audit coverage gaps. Update the docs. Commit. Loop. The final step of the Cycle reads, verbatim: "Repeat. Never stop. Never ask if you should continue. The cycle IS the work." Somewhere between a monastic vow and a Notion page.

Two of those steps are load-bearing in a way I didn't expect when I wrote them: the Quality Score, and the rule that it cannot decrease between sessions. Short version: Q is a weighted sum of how many tests pass, how many parameters have signal coverage, and a couple of other measurable things. The rule is a ratchet. Quality only moves one way. That ratchet is the reason I could leave the loop running for hours and trust what I came back to. It also reshaped what "the test suite" meant on this project, in ways that deserve their own post. I wrote that one too: The trust harness, companion to this one, goes deep on the score, the offline signal-test framework, and the feedback loop between the two. Come back when you're done with this, or read them in either order.

What I still had to do

Plenty, as it turns out. Just in case my manager is reading this. I was not, in fact, on a beach. The agent is excellent at making the thing you described exist. It's much worse at knowing what thing to make, or whether the thing that exists is the right one. Everything downstream of taste, intent, and aesthetic stayed with me.

I wrote the architectural constraints. No frameworks. No React, no Vue, no Lit. One concern per file. Vanilla TypeScript + DOM API. CSS uses design tokens only. @/ import alias maps to src/. All MIDI values come from calibration: zero hardcoded constants in production. No backwards compatibility, ever; this is a dev-phase project, delete old code. Those weren't obvious. The agent would happily have pulled in React if I hadn't written the constraint down. I'm only 60% sure this is true. I have never given it the chance. The constraint went in on day one.

I wrote the zen. The statement of purpose. One paragraph in the Constitution: "The user enters a state of nirvana by jamming on their KeyStep and toying with sounds through the BeatStep. An absorbing soundscape experience, completely frictionless to the human." Every design decision downstream is measured against that sentence. If a change adds friction between the human and the sound, it's wrong. If it removes friction, it's right. Agents love measurable criteria. Give them a sentence to test changes against and they'll use it. Don't give them one and they'll optimize for the thing they can measure, which is usually test count. Unhelpful.

I also did the ultimate integration test, over and over: I plugged in the hardware and played the thing. When a filter sweep felt wrong, I said so. When a preset sounded thin, I rejected it. When program-switching clicked even a tiny bit, we treated it as P0 until it didn't. The agent cannot hear. I can. That asymmetry never went away.

Things the agent absolutely cannot do

There are a few of these, and the list matters.

It cannot have the idea in the first place. The entire project started because I wanted a specific thing to exist and the existing thing did not. No amount of doctrine generates intent.

It cannot decide what good sounds like. Whether the Prophet-5 envelope curve is set to the right exponent. Whether the Juno chorus mode blend feels like a Juno or like a Roland-shaped guess. Whether the calibration flow feels welcoming or just functional. Those are judgments that require being a human with ears and history.

It cannot prune. I mean, it can. It's quite good at it when I ask. But it won't do it on its own. Pruning requires deciding something is unimportant, and a gap-detection loop fundamentally believes every gap matters. Left alone, the loop adds. It adds features, adds tests, adds docs, adds coverage. The "no backwards compatibility" rule exists precisely to give it permission to remove. I had to spell that out.

And it cannot, it turns out, use my real name in commits without supervision. The history before I unified authors had three distinct names on it. Sixty-four commits from Arcturus Dev <[email protected]>, where the agent had, at some point, invented itself an identity and a fake email domain, like a WeWork badge. One commit from Claude <[email protected]>: the one time the persona slipped, for a Prophet-5-inspired DSP rewrite that reshaped the entire sound engine in a single pass. Claude, apparently, wanted credit for that one. And fifty commits from Lef (me), almost all fix: and test:, clustered in the final month. I found the three-author arc very funny and also mildly alarming. Right before publishing the repo I ran git filter-repo to unify every commit under my actual identity, and stripped fifty-eight Co-Authored-By: Claude Sonnet 4.6 trailers along with one accidental https://claude.ai/code/session_... URL in a commit body. The moral of the story is: trust the agent, verify the commits.

The ouroboros, again

In the last post I wrote about teaching an AI to help build this very website. I described it as a documentation ouroboros: the AI helping to write the scaffolding that teaches it how to help.

Arcturus is the same thing, but bigger. The agent ran the Cycle for weeks, updating DOCTRINE.md and CLAUDE.md as it went, because the Cycle itself told it to. Session logs were appended. Pitfalls were added. The Q-score formula was tuned. I trimmed a lot of this before publishing. Session logs are useful during a project and dead weight after it. The final DOCTRINE.md is evergreen: Constitution, Score, Measures, Cycle. No logs. The system that built Arcturus also continuously refined itself. By the end I was mostly reading diffs to the doctrine and occasionally saying "nope, revert, that principle is worse than what we had."

This post, for the record, was drafted by Claude Code using the .claude/skills/creating-post skill I wrote for the website. Which was, itself, described in the previous post that was written the same way. Turtles all the way down. 🐢 I keep promising to stop doing this. I keep not stopping. If you're counting layers at this point you're doing better than me.

Does this generalize?

I don't know yet, and I'm suspicious of anyone who claims to. Arcturus had some properties that made it unusually friendly to this workflow.

The correctness criteria were testable in isolation. A parameter either produces audio or it doesn't. A transition either clicks or it doesn't. A latency is either under 10ms or it isn't. That meant the agent could tell whether it was winning without me in the loop. Most software doesn't have this property.

The scope was bounded. A synth is big, but it's not open-ended. There's a well-understood set of modules, a small set of canonical inspirations, and a finite parameter space. The agent could map the territory once and then cultivate it. A fresh-from-blank-slate product at a startup is a different beast.

And I was willing to own the doctrine. If you can't write down what good looks like in your project (what you'd never compromise on, what the quality bar is, what failure states matter), no amount of agentic loop is going to rescue you. The doctrine is where taste lives. You have to supply it.

Try it

Arcturus is live at arcturus.lef.fyi. You'll need a Chromium-based browser (Web MIDI + SharedArrayBuffer) and ideally an Arturia KeyStep + BeatStep. Without hardware you'll get to stare at a calibration screen and appreciate the typography. The source is on GitHub. The doctrine, the architecture reference, the sound engine spec, and the signal-testing framework are all in the repo. Read whichever one interests you.

If this post was about the operating system, the companion post is about the verification infrastructure: how the offline signal-test framework and the Q-score ratchet made the unsupervised stretches safe, and what happens on the days the harness misses something and I have to go in with ears on.

And if you want the why behind all this machinery, the third post in this series is the personal one: why I wanted a browser synth controlled by the two Arturia devices on my desk, and what jamming on knobs until the world quiets down has given me for years. These two posts are the how. That one is the why.

Thanks for reading. Go turn some knobs. 🕺