much rather wake up, eat a coffee cake

In an office building, in the middle of somewhere, a company sits.

It is building a product. An ambitious product. Software to end all software - a tool to integrate all the systems.

The team is (the expected amount of) competent, the runway is short but exists, and the CEO has sold a dozen impossible features, two dozen possible but extremely difficult ones, some plausible ones, and one that the prototype can successfully perform.

So business as usual for a fledgling product company.

And the company is having their - business as usual - weekly sprint planning.

"Everyone's here except Apricot?" Orange frowns. "He should be in the office today. Anyone seen him?"

"Not today, he missed lunch. But it's still five minutes until the scheduled meeting. He's never actually late."

And about fifteen seconds later, Apricot stumbles out of a storage room of cleaning supplies. There are more and less subtle signs that he has just woken up.

"Good morning everyone!" he proclaims. "Orange, how did the call with the" - yaaaaawn - blinking - "eeehm, bank API provider go?"

"Afternoon. Quite well, I'd say. We'll still need to negotiate an actual contract, but we're getting access to the sandbox environment today, so you can start integrating it immediately."

He flips open his laptop, navigating to the kanban board tracking high-level tasks. The section marked "backlog" has been expanding at a worrying pace. He nervously reorders some of the items at the bottom before turning his eyes to the "in progress" section. "Ok, let's go through the updates. Vanilla, you first?"

Vanilla also has her laptop open.

"Morning! And yeah that works. Just a second." 

Keyboard noises. Vanilla quickly browses through the assigned tickets and checks what was marked completed last week.

"The new interface prototype is progressing. Completed navigable prototypes for three different approaches, but we're not quite sure which one to commit to for further iteration and for building the more complex features."

"The next steps for the interface development are otherwise clear for this week, but I'm unsure how last week's plans compare to the new backlog items. Might need to address that, and also choose one of the three approaches today."

"We should probably have a meeting after the weekly to discuss the approaches, then. You and Raspberry, at least."

Orange moves on without waiting for any acknowledgement.

"Blueberry, you're next. How's the status page shaping up? The internal one, I mean."

"Grafana configuration is now mostly done. I just got the HTTP endpoint error and timing graphs to show on our custom status page as well."

Blueberry smiles briefly before checking the assigned ticket.

"I'm supposed to work on outage alarms next, but we haven't specified that at all. Do we actually want PagerDuty-style phone calls when something goes wrong? Seems a bit early for that, but it would be good to have the integration ready for when we have actual uptime targets. Not a priority right now, I'd say."

"Agreed on prioritization. We need a way to notice if something's broken though. Maybe we should still send emails for those? Or we could have a status screen in the lobby?"

Apricot thinks that the CEO looks a bit too eager to have a status screen.

"A Slack bot would probably be the best way to do this," he interjects.

Nod. "Let's do that."

Orange waits until Blueberry nods too, and then moves on to the next person.

"Raspberry? You were looking at the interface, too?"

The person in the room whose laptop has the most stickers turns their gaze towards Orange.

"Yup definitely.

I think the multiple-prototype approach has already provided some info. I'd say it doesn't look reasonable to keep building on anything besides the chat-based solutions. We just can't fit the desired amount of breadth into a clean visual interface.

I'm kind of lost on what to fiddle with next. Could look into feature-specific interfacing?"

"Fwiw I agree that the non-chat solutions probably aren't worth the hassle, and even with a big UI team they might still be worse than the chat-based ones. So I'd also vote for the chat-based ones, if we're counting. Dunno what's most important to do next, but you could maybe think through how interactive vs. noninteractive we want the chatting to be for multi-step operations. Maybe check out the recent advances in programming agents and think about how that generalizes to our potential userbase?"

"That sounds urgent enough to me. Orange, you agree with the chat / text-based UI direction?"

"Ehh sure. We can go forward with that. By the way, I saw a cool computer use demo from a competitor last week, they had a... eeemm" *pause* "this thing in it. Sort of like a window that floated around and explained what it was doing. We should have that kind of thing as well."

Orange creates a new ticket. "I'm assigning it to you, Raspberry. Take a look at it when you have time to spare."

"Ok Apricot, you next."

Apricot opens up the real issue tracker. The one Orange doesn't know about. A keyboard shortcut brings up an automatically generated summary of all his ongoing work.

"Last week, we finished the remaining subtasks on cross-device authentication. QR code, email magic link, passkey, SSO and plain old username-password with 2FA are all working now. The login UI is still a bit ugly but fully functional, and the UX itself is as good as it gets. Session revocation is implemented too, with a bare-bones UI: just a list of devices and revoke buttons."

He reads the notes further.

"The email magic links are sent from a temporary address and sometimes go to junk mail. The 2FA recovery code and password reset flows still need a bit of work. I need to think about the threat model for account recovery a bit more before I'm confident about actually implementing any of that."

*Mentally pats himself on the back.* That's a great excuse to take Wednesday off.

*Frown.* (A practiced one, and perhaps a pretended one too; he doesn't know anymore.)

"Now that we have the multi-device setup working, we need to consider syncing between devices too. Ideally the user could control their PC from their phone, for instance."

A small pause, to see if anyone comments on that.

No one does.

"Ok that's all from me. Prune, you next?"

Prune nods.

"The CD publishing pipelines are fully set up for every platform except iOS. I'll set that up next. We'll also need a notification on the web frontend whenever an update is available."

Tap tap tap the table. A good way to remember things.

"The cloud-provided speech recognition is quite expensive. I'm going to experiment a bit with on-device support."

A quick glance at Apricot.

"The backend support for paying with a linked credit card is almost done. Just needs sensible limits on when it should confirm with the user. 'cot, could you handle that?"

"Yep. We should probably make it dynamic using the bank account APIs, but for now I think we just make the default to be..." pause "maybe we should just ask this directly, when the user links the card?" frown "We still need a default, let's go with fifty. I'll handle that."

"Thanks. That's all from me."

"Ok thanks everyone. I gotta make some calls now, but I'd be quite interested to see the interface demo later today."

He looks expectantly at Vanilla.

Nods.

"Yeah we'll get it prepared with Raspb."

"I guess we can start by decrypting what the boss meant by the competitor thing, actually."

Navigates to the competitor's website.

"Huh right these fellows at least get their marketing budget spent. 20 old world bucks that the feature doesn't actually exist in their prod."

Click click.

"Not taking that one."

"Oh here it is." Play.

Yawn. Blink blink.

"It's not doing anything new."

"You say that every time."

"Look at this part though:"

Scrubs back a few seconds, points at the screen.

“See how it pops up a little sidebar when the agent starts touching the calendar integration? It’s showing the planned steps before and during the time when it does the stuff. That’s not nothing. Most agent demos just… go, and then you find out afterwards that it booked you a flight to London.”

Pauses the video.

“The actual overlay thing is whatever, Orange can have his floating window. But the preview-before-commit pattern for multi-step operations is exactly the kind of thing we were just talking about — how interactive vs. noninteractive the chat should be. This is one answer to that. You don’t make the user approve every step, but you show them the plan and let them veto.”

“We could proto that in like two hours if we scope it well. Show Orange that instead of the shiny narration thing and he probably won’t know the difference.”

"Hmm. You're right yeah I can see something here now."

Opens Claude and Neovim side by side in the product repo.

"You work on the UX and the demo prep for Orange; I'll lay the groundwork for this hacky proto, and we can probs get it ready in time."

As others start discussing the next steps, Apricot gets not one but three smoothies from the fridge, quickly but carefully avoiding the ones with banana. He then flops onto a nearby beanbag chair, puts headphones on, fiddles a bit with his phone to get some music on, kicks off shoes, theatrically gurgles two of the smoothies, cracks his neck, adjusts position, downheartedly closes about twenty tabs of media more interesting than the work itself, and opens a new terminal window.

Time to get stuff done.

Start with the payment backend confirmation limits. The user settings table needs a new column. The first idea is simply confirm_payments_over INTEGER NOT NULL DEFAULT 50. Apricot mentally pinches himself. That will not do. Handling money is hard; any simple solution is always wrong.

What are the actual considerations about the limit, and storing it?

  • Currency - It's probably enough to have a single currency and convert as needed. But it has to be user-local currency.
  • Fractional denominations - Probably required for some currencies. YAGNI? Naah, better to support that immediately.
  • Default value - Should we differentiate between the field holding its initial default value and the user explicitly setting it to that value? Seems like a good idea. That will likely mean moving the default from the db level to the application level. Fortunately, with a decent abstraction layer a single source of truth can be retained.

A quick look at the Postgres docs shows that the built-in money type is plagued by locale settings. Apricot dislikes anything with locale settings. DECIMAL should do the job just fine.

So maybe the columns could be confirm_payments_over_amount DECIMAL and confirm_payments_over_currency TEXT. Using TEXT instead of an enum or a fixed-length string shouldn't be too expensive. Oh, but now the names look all wrong, as _amount doesn't read like a suffix but like part of the name. It should parse as (((confirm payments) over) amount), but now it reads ((confirm payments) (over amount)). That's not good.

A small bikeshedding alarm is raised and quickly silenced by Apricot's brain. This is the actual fun part, after all.

Ok, perhaps the name could be payment_confirmation_threshold_amount? He quickly googles "threshold" just to make sure it's spelled correctly. Such a long, ugly name. How about insignificant_payment_amount? A bit better, maybe, aesthetically. Less descriptive. Increased job security. Not funny. A thesaurus could have some synonyms for this. Perhaps minor? Nah. Ok, just the simplest thing, then: purchase_confirm_amount and purchase_confirm_currency. That will do.

It would be nice to have both in the same field, to make sure they're never mutated separately. Not doable on the db level, but on the Rust level, sure. Actually, perhaps this whole thing could be a reusable CurrencyAmount struct. Yes. That sounds good; we'll definitely need that in other places too.
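A minimal Rust sketch of the kind of CurrencyAmount struct Apricot has in mind. It assumes, beyond what the story says, that amounts are held as integer minor units plus a count of minor-unit digits (instead of a decimal library type) and that the currency is an ISO 4217 code string; all names are hypothetical. Because the fields are private and `new` is the only constructor, amount and currency can only ever be set together, and invalid pairs are rejected up front.

```rust
/// Hypothetical sketch: a money amount that always travels with its currency.
/// Stored as integer minor units (e.g. cents) so no floating point is involved.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CurrencyAmount {
    minor_units: i64,  // e.g. 5000 for 50.00
    minor_digits: u32, // e.g. 2 for USD/EUR, 0 for JPY
    currency: String,  // assumed ISO 4217 code, e.g. "EUR"
}

#[derive(Debug, PartialEq, Eq)]
pub enum CurrencyAmountError {
    NotPositive,
    EmptyCurrency,
}

impl CurrencyAmount {
    /// The only way to build a value: amount and currency are validated together.
    pub fn new(minor_units: i64, minor_digits: u32, currency: &str) -> Result<Self, CurrencyAmountError> {
        if minor_units <= 0 {
            return Err(CurrencyAmountError::NotPositive);
        }
        let currency = currency.trim();
        if currency.is_empty() {
            return Err(CurrencyAmountError::EmptyCurrency);
        }
        Ok(Self {
            minor_units,
            minor_digits,
            currency: currency.to_uppercase(),
        })
    }

    pub fn currency(&self) -> &str {
        &self.currency
    }

    /// Renders the amount as a plain decimal string such as "50.00",
    /// which a DECIMAL column would accept.
    pub fn to_decimal_string(&self) -> String {
        if self.minor_digits == 0 {
            return self.minor_units.to_string();
        }
        let scale = 10i64.pow(self.minor_digits);
        format!(
            "{}.{:0width$}",
            self.minor_units / scale,
            self.minor_units % scale,
            width = self.minor_digits as usize
        )
    }
}
```

Usage: `CurrencyAmount::new(5000, 2, "usd")` yields an amount that renders as "50.00" with currency "USD", while a zero amount or a blank currency code is refused before it ever reaches the database.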

Ok, what about the defaults, then? I want to show the default value to the user. I don't want to hardcode a big list of defaults for each currency. And I want round numbers in each currency. Perhaps take 50 USD as the base value, convert it to the user's currency, and round it to one significant digit? That sounds good, but we can't just recompute it each time the user makes a payment, as the default needs to stay the same. This is getting complicated; can we ever change the default? Users might rely on it staying the same. Oh well, I think we need to prompt the users directly about that.

Ok so new attempt. Keep the amount and currency non-nullable, and populate them on user account creation. Keep a separate boolean flag about if the user ever set the value. That works, good.

Apricot writes the backend code for it. In the user creation hook, he calls a currency conversion API and rounds the result to a single significant digit. Surely that'll work for all values just fine; it cannot change the value by more than about 5%. Then he adds an API endpoint to fetch the value. This one is simple: just return the data directly from the db.

After thinking for a while, he adds some check constraints on the db columns: the limit value must be positive, and the currency string non-empty.

Then the change API. Is any validation needed? Likely not more than the check constraints above. A simple SQL UPDATE statement will do. And an insert into the audit events table too, including the conversion rate, in case there's ever a bug or lawsuit or vague mental gestures.
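A sketch of the default-limit computation as described: 50 USD converted to the user's currency and rounded to one significant digit, stored together with a flag recording whether the user ever chose the value. The rate parameter (local units per 1 USD), the minor-unit representation, and every name below are hypothetical; the rounding is done on integers, half up, so the stored default is deterministic.

```rust
/// Base limit in US cents (50.00 USD).
const BASE_LIMIT_USD_CENTS: u64 = 50_00;

/// Rounds an integer to one significant digit, half up: 4637 -> 5000, 14999 -> 10000.
fn round_to_one_significant_digit(value: u64) -> u64 {
    let mut magnitude = 1u64;
    while value / magnitude >= 10 {
        magnitude *= 10;
    }
    (value + magnitude / 2) / magnitude * magnitude
}

/// Converts the USD base limit into the user's currency and rounds it.
/// `usd_to_local` is local-currency units per 1 USD (assumed to come from
/// an external conversion API); `minor_digits` is e.g. 2 for EUR, 0 for JPY.
fn default_limit_minor_units(usd_to_local: f64, minor_digits: u32) -> u64 {
    let base_usd = BASE_LIMIT_USD_CENTS as f64 / 100.0;
    let local_minor = base_usd * usd_to_local * 10f64.powi(minor_digits as i32);
    round_to_one_significant_digit(local_minor.round() as u64)
}

/// Settings as kept by the application: amount and currency are always
/// populated, and a separate flag records whether the user ever set them.
#[derive(Debug)]
struct PurchaseConfirmSettings {
    amount_minor_units: u64,
    currency: String,
    user_has_set: bool,
}

/// Computed once, at account creation, so the default never drifts with exchange rates.
fn settings_for_new_user(currency: &str, usd_to_local: f64, minor_digits: u32) -> PurchaseConfirmSettings {
    PurchaseConfirmSettings {
        amount_minor_units: default_limit_minor_units(usd_to_local, minor_digits),
        currency: currency.to_string(),
        user_has_set: false,
    }
}
```

For example, a rate of 0.92 EUR per USD gives 46.00 EUR, stored as 50.00 EUR, and 150 JPY per USD gives 7500 JPY, stored as 8000 JPY. In the worst case one-significant-digit rounding moves a value by roughly a third (14,999 minor units become 10,000).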

Ok, then the settings UI. He hates frontend, but fortunately this is rather simple. Which decimal separator is used? An input with type "number" and a step of 0.01 should handle that just fine. Does it need a reset-to-default button? Hopefully not.

It's still missing tests, but time to commit first. New branch: feature/payment-confirm-limits. Pre-commit hook complains about formatting, so cargo fmt and re-run.

Then the tests. This is something an AI agent can be trusted with; after all, he has already manually checked that it works. He prompts Claude to generate a couple of tests for the backend. The AI attempts to refactor the code so that the API call to the currency conversion service can be mocked. Apricot doesn't like that; it makes the code less readable. But then again, it's an external API that requires an API token to access. We should probably wrap that ourselves anyway, for caching and failover. Ok fine, proceed with the refactor.

Then some tests for the frontend too. These are always a mess, but he plugs (or rather lets Claude Code plug) a couple of lines into the E2E test suite for this.

Ok, that's all. Phew. Commit, push, new pull request, prompt Claude to write the PR description. He waits for the CI to color itself green and requests a review from Prune.

That took almost an hour. Maybe he could be done for today? He'll head out for a late lunch, at least.

Vanilla sinks deep into thought over the prototype she just promised to build.

It's generally a hard problem to solve well. LLMs don't actually formulate or execute their plans in discrete, numbered steps, which is what the interface would definitely prefer. The fact that planning and plan execution are disorganized is a large part of why LLMs end up just... buying tickets to London.

Oh well, time for the classic: a mediocre solution for a mediocre problem. The model they use to sanity-check the personal assistant's plans (the "meta model", which takes an "educated" guess that a plan contains nothing actively harmful to the user's goals, like accidentally emptying their mailbox; the product has to provide some additional value to beat an out-of-the-box *claw solution) can also be used to take a guess at the plan's steps.

Uhh, actually, what do the frontier labs do about this... Idle clicking on the Claude and ChatGPT chat UIs. Right, they use their chain-of-thought summarizer model to provide a sentence-long description of what the AI is currently doing. They could do that too, but then the planned step names and the summarized step names might not actually correlate. Ugh.

The boss-man's not going to celebrate if the prototype gets blocked 5 minutes in by a tradeoff decision...

Guess the meta model could just check against the list of steps every few seconds or so, and then update based on what the current state of the executor agent's CoT looks like. Perhaps with a summarizer in the middle; the summarization was in the backlog anyway. But since it's not done, it has to be skipped today.

Uhh so it's going to be:

- After the execution plan is generated, prompt the model for a simple step-split of the plan

- Periodically feed the execution plan plus some of the latest execution agent CoT to the meta model, so that it outputs which step of the plan is currently being executed... preferably the meta model should only output a step id here. So add step-id generation (preferably descriptive, like price-compare or confirm-purchase) to the first part, and ensure the ids are unique after they've been retrieved from the AI.
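The step-id part of that plan could look roughly like this in Rust: deduplicating the ids the model returns by suffixing repeats, and accepting the meta model's periodic answer only if it is exactly one of the known ids. The function names and the "-2", "-3" suffix scheme are assumptions for illustration; the model calls and the polling loop are left out.

```rust
use std::collections::HashSet;

/// Makes model-generated step ids unique by normalizing them and appending
/// "-2", "-3", ... to repeats, e.g. ["price-compare", "Price-Compare"]
/// becomes ["price-compare", "price-compare-2"].
fn make_step_ids_unique(ids: &[String]) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(ids.len());
    for id in ids {
        let base = id.trim().to_lowercase();
        let mut candidate = base.clone();
        let mut n = 2;
        while seen.contains(&candidate) {
            candidate = format!("{base}-{n}");
            n += 1;
        }
        seen.insert(candidate.clone());
        out.push(candidate);
    }
    out
}

/// Maps the meta model's reply to a known step index. Anything that is not
/// exactly one of the known ids (after trimming whitespace and backticks) is
/// rejected with None, so the UI can keep showing the previous step instead
/// of a hallucinated one.
fn parse_current_step(reply: &str, step_ids: &[String]) -> Option<usize> {
    let reply = reply.trim().trim_matches('`').trim().to_lowercase();
    step_ids.iter().position(|id| *id == reply)
}
```

A reply like "`confirm-purchase`" maps to that step's index, while an id that was never in the plan returns None rather than moving the progress indicator.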

Fine enough for a prototype, probably.

And now it's time for some programming. 

Switches to the Claude Code terminal. Let Claude Code make its own plan, see if anything in it is better than what she came up with, then either merge the two or go with her own plan, and let CC write most of the code.