Talking to a codebase: notes from a week of voice sessions
A small experiment: one of us spent a working week using Sesori with the keyboard off. Not strictly off — there’s still typing on the laptop side when you’re sitting at a desk — but every interaction with the assistant happened by speaking. Every prompt, every approval, every “no, undo that, try the other thing.”
These are the notes from that week.
Day 1: the symbols problem
The first day was rough, in the way any new modality is rough. Voice transcription is genuinely good now, but it does not know that when you say “underscore,” you mean _, not the literal word underscore. It does not know whether “open paren” should be a paren or the words written out. We spent the first morning correcting the assistant’s interpretation of our own dictation more than actually working.
The fix wasn’t a technical one. It was just: stop dictating code. Let the assistant write the code. Use voice for the things voice is good at — intent, direction, evaluation — and let the model handle the syntax.
By lunchtime on day one, the rhythm had shifted from
“function space get user space open paren id colon string close paren…”
to
“Add a getUser helper that takes an id and returns the row from the users table. Throw if not found.”
The second version is shorter to say and easier to think about, and the result is the same.
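For illustration, here is roughly the kind of TypeScript a prompt like that might produce. It is a sketch, not the assistant’s actual output: the UserRow shape and the in-memory usersTable are assumptions standing in for a real schema and database.

```typescript
// Hypothetical row shape; the real schema is not shown in these notes.
interface UserRow {
  id: string;
  name: string;
}

// Stand-in for the users table, assumed to be an in-memory map
// so the sketch runs without a database.
const usersTable = new Map<string, UserRow>([
  ["u1", { id: "u1", name: "Ada" }],
]);

// Returns the row for `id` from the users table; throws if not found.
function getUser(id: string): UserRow {
  const row = usersTable.get(id);
  if (row === undefined) {
    throw new Error(`User not found: ${id}`);
  }
  return row;
}
```

Every clause of the spoken prompt maps onto something in the function: the name, the parameter, the lookup, and the throw. None of the punctuation had to be said out loud.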
Day 3: the rhythm
By Wednesday, voice felt natural for a particular slice of the work — the slice where you know what you want and you’re directing a more-or-less competent collaborator who can do the typing. The assistant writes a draft. We read it on the screen, decide whether it’s right, and either approve it or describe what’s off. “That’s good, but pull the validation into its own function.” “That’ll work. Add a test.”
The eyes still do all the reading. The voice does all the steering.
This is, looking back, the same shape as how a senior engineer often works with a junior one in a pair-programming session. The senior isn’t typing. They’re saying “try this,” “no, undo,” “yes, that,” while the junior drives the keyboard. We had accidentally rebuilt that dynamic, with a model in the junior seat.
The surprising thing
Voice forced us to be more deliberate before speaking, because we couldn’t do the thing we usually do — start typing a sentence, see how it comes out on the screen, hit backspace, restructure mid-thought. With voice you commit. So we found ourselves pausing for a second before each prompt, working out what we actually wanted, and then saying it.
The prompts got better. Significantly better. The same instinct that makes rubber-duck debugging work — the act of putting the problem into words — applies when the duck is an AI assistant and you’re saying the words out loud.
Where it broke down
Two places, both unsurprising.
The first was dense refactors. When we needed to move five things between four files in a specific order, the verbal description got tangled almost immediately. “Move the validation from user.ts line 14 — no, the second occurrence — into the new validators module, but keep the import in user.ts because the auth flow still…” It was easier to describe by pointing, which voice does not do.
The second was naming. We are particular about names, and the difference between getUser, findUser, loadUser, and fetchUser is real to us in a way it isn’t to the assistant. Negotiating naming over voice — when there’s no easy way to say “no, italic fetch, with the connotation of going somewhere to get it” — was harder than it should have been.
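To make the distinction concrete, here is one common reading of those four verbs, sketched in TypeScript. The notes above only spell out the connotation of fetch, so everything else here is an assumed convention for illustration, including the User type and the cache and store maps.

```typescript
// All names and behaviors below are illustrative assumptions.
interface User {
  id: string;
  name: string;
}

// Stand-ins for an in-memory cache and a slower persistent store.
const cache = new Map<string, User>([["u1", { id: "u1", name: "Ada" }]]);
const store = new Map<string, User>([["u2", { id: "u2", name: "Grace" }]]);

// findUser: a search that may come back empty, so absence is a normal result.
function findUser(id: string): User | undefined {
  return cache.get(id);
}

// getUser: the caller expects the user to exist, so absence is an error.
function getUser(id: string): User {
  const user = findUser(id);
  if (user === undefined) {
    throw new Error(`User not found: ${id}`);
  }
  return user;
}

// loadUser: brings a record from storage into memory as a side effect.
function loadUser(id: string): User {
  const user = store.get(id);
  if (user === undefined) {
    throw new Error(`User not in store: ${id}`);
  }
  cache.set(id, user);
  return user;
}

// fetchUser: goes somewhere else to get it, hence the Promise.
// A real version would make a network call; this one only simulates it.
async function fetchUser(id: string): Promise<User> {
  return loadUser(id);
}
```

On screen the differences show up in the return types and side effects. Spoken aloud, each one collapses into a near-synonym that is easy to mishear or misinterpret.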
So: useful?
Yes, in a specific way. Voice is not a replacement for typing. It’s a complement to it — particularly good for the direction phase of work, less good for the fine-tuning phase. The combination of voice for intent and a screen for review is much better than either alone.
That’s the working theory after one week. For a complete guide to setting up OpenCode on your phone and using this workflow in practice, see OpenCode mobile.