Programming in a sane world

Kalmin pushes their hair behind their ear with an annoyed huff. The computer in the corner hums. Just a few more tests to get passing, and their implementation will be complete. And it's only — they glance at the clock in the corner of their monitor — 22 in the morning. Fuck.

They push the curtain aside, and peer over the dark landscape. Already, they can see a pre-dawn brightness creeping across the sky.


"Come on, Kalmin," Regi encourages. "We need to go to bed. Don't make me take the lead."

She points out that the proxy server is not more important than their health.


Kalmin hits the key combination to lock their computer, and shrugs off the blanket that they'd wrapped around their shoulders.

"Fine," they agree. "We can finish up tomorrow."

They stretch, their elbow popping. They get up and walk over to the sink to brush their teeth, but now that their flow is broken, they can feel the fatigue they'd been ignoring. They silently prompt Regi to take over, and start to doze.


Regi brushes their teeth with precise, efficient movements. She rinses their toothbrush, and sets it in its designated location with the correct orientation. She strips off their robe and throws it into the laundry hamper in one smooth motion, and then slips into bed.

She adjusts the pillows to support their neck correctly, then closes her eyes, relaxes all of their voluntary muscles, and focuses on their breathing. They are asleep in 30 seconds.


It is Emmy who rises. She lets off a huge yawn, and throws off their covers. She pulls the blinds open, and lets the late morning sun warm her skin.

She throws on a skirt, grabs their computer, and pops down to the cafeteria for breakfast, because Regi will be stern with her if she doesn't eat. She grabs a bowl of spicy oatmeal and a bagel spread with cream cheese, and reads flash fiction while she eats.


After breakfast, she heads back up to their room. She slots their computer back into the dock, and switches back to their IDE. Apparently the fuzzer found a failure overnight.

She pops open the debugger, and runs the test case through to the crash.

"Huh. A memory allocation failure in the request parser?" she says to herself.

She switches back over to the parsing code in question, and highlights all of the places where it allocates.

"Kalmin, you're an idiot."


"What did I do?" Kalmin asks, still waking up.


Regi prompts Emmy with a wordless reminder to be kind, and then fades back again.


"There's no limit on how many headers the client can send," she says, pointing at the part of the code in question. "So this loop is unbounded. And since it allocates more space from the arena for each decoded header, it can use all of the assigned memory."
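The bug Emmy is pointing at can be sketched in a few lines. This is purely illustrative — the names and the budget mechanism are invented, not from their codebase — but it shows the shape of the flaw: each decoded header is charged against a fixed arena, and nothing limits how many headers the client may send.

```python
class ArenaExhausted(Exception):
    """Raised when the parser's fixed memory budget runs out."""

def parse_headers(lines, arena_budget):
    """Decode headers, charging each one against a fixed byte budget.

    The loop is unbounded: it runs once per header the client sends,
    and each iteration consumes arena space, so a client can exhaust
    the budget simply by sending enough headers.
    """
    headers = []
    remaining = arena_budget
    for line in lines:
        name, _, value = line.partition(":")
        cost = len(name) + len(value)  # space "allocated" for the decoded header
        if cost > remaining:
            raise ArenaExhausted(
                "client sent more headers than there is available memory")
        remaining -= cost
        headers.append((name.strip(), value.strip()))
    return headers
```

Each header individually fits, but enough of them together exceed the arena — exactly the failure the fuzzer found.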


"Oh, I see. Yeah, that's a problem. I don't think there's anything we can do in that case to continue processing, though. We can't drop headers because that would alter the request," they point out.


Emmy squints at the code.

"Can we re-architect this bit to stream the headers? So that they'll be passed on to the upstream server with a constant amount of local storage?"


Kalmin shakes their head. "No, because the standard doesn't guarantee how soon routing-related headers need to appear. If the routing header comes at the end, we have to buffer the whole header block."


Emmy snaps her fingers decisively. "Then I agree, there's nothing to do except return an error."

She hits a key chord and defines this failure to be expected. When the parser runs out of memory, the failure will just bubble up to the protocol layer, which will return an error code and close the connection. Their IDE autofills a new debugging metric so they will be able to check if this happens in the real world, and then re-runs their test suite.
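A hedged sketch of that policy — the metric name and error code below are made up for illustration, and `MemoryError` stands in for the arena-exhaustion failure: the protocol layer treats the parser's out-of-memory condition as expected, records it, and answers the client with an error before closing the connection.

```python
METRICS = {"parser_arena_exhausted": 0}  # stand-in for the autofilled debug metric

def handle_connection(parse_request, close_connection):
    """Run the parser; on memory exhaustion, count the event, return an
    error code to the client, and close the connection instead of crashing."""
    try:
        request = parse_request()
    except MemoryError:  # modeling the parser running out of arena space
        METRICS["parser_arena_exhausted"] += 1
        close_connection()
        return "ERR_REQUEST_TOO_LARGE"  # illustrative error code
    return request
```

The failure no longer propagates as a crash; it becomes an ordinary, countable outcome of handling a connection.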


She tabs over to their list of test cases, and jumps to the first failing one.

"What's next, header compression?"


"Yeah, that sounds right to me. I was in the middle of reading the compressor documentation when we left off last night," Kalmin agrees.


They work on the rest of the implementation together, and then post it publicly. It fits their use case perfectly, but maybe someone else will find it useful too.



Approximately 0.0014 of requests to their server farm fail. This is normal — there are plenty of sources of malformed, malicious, or just plain broken requests on the Network. But it would be nice if it were lower, and Uþenor doesn't have any urgent reliability fires to put out at the moment.

He pulls up their aggregate fleet statistics. At any given time, 0.01 of the servers have debug logging for any given component turned on. High enough to get useful statistical aggregates, low enough not to negatively impact throughput or latency very much.

He runs a query — which debug trace events are most strongly correlated with failures anywhere in the protocol stack?

It's a query he runs about 31 times a year, but the answers are always changing as their fleet and the software running on it evolve.

Today, the answer comes from their authentication proxy — a tracepoint called "client sent more headers than there is available memory". That seems straightforward enough.
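One way to sketch that query — purely illustrative, since the story leaves the fleet tooling unspecified: for every tracepoint seen in the sampled debug logs, estimate how often requests containing it fail, and rank by that rate.

```python
from collections import Counter

def rank_tracepoints_by_failure(requests):
    """requests: iterable of (tracepoints: set of names, failed: bool).

    Returns tracepoint names ordered by observed failure rate, so the
    entry most strongly associated with failures comes first."""
    seen = Counter()
    failed_with = Counter()
    for tracepoints, failed in requests:
        for tp in tracepoints:
            seen[tp] += 1
            if failed:
                failed_with[tp] += 1
    return sorted(seen, key=lambda tp: failed_with[tp] / seen[tp], reverse=True)
```

With the 0.01 sampling rate, the counts are estimates rather than exact tallies, but the ranking is what matters for finding where to dig.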


In their fleet, this proxy sits between them and an exterior API, adding in their authentication information. So all the 'clients' should be their own code, connecting out. Uþenor opens their code and finds the API definition, and then asks the IDE what the maximum bound on the size of an outgoing request is.

His computer churns, querying and aggregating all of the places that call into this API. After a moment, the IDE gives its answer: +∞.

Uþenor frowns. He clicks into the answer to see the corresponding proof tree. It looks like most of the headers have a fixed maximum size, but there is one header where the computer was unable to establish an upper bound. He clicks through the layers of the derivation, and eventually ends up at the customer name field of the database.

He blinks and backs up a few steps to read more carefully.


It looks like one of the headers they send embeds the client's name, for auditing purposes. That's straightforward enough. But the customer name field is unbounded in size, so a customer could theoretically enter a name long enough to cause memory exhaustion on the proxy.

If one customer's requests were failing consistently, though, he would have expected that to trigger an alert.

He checks what the longest name in the database is. It is "ᛗᛟᚾ-Ekrishameni Joint Projects Legacy Sponsorship Fund (Distant Island Accountancy Division)". He checks to see whether requests made on their behalf are failing.

Not all of them are, but their requests are associated with a higher failure rate than other customers. About 0.24 of their requests fail. The service generates a lot of retries, and things grind on, but he's definitely on to something.


The question is why their requests are failing intermittently. Perhaps there's another variable-length header which sometimes goes over the limit?

He switches back to looking at the proof tree, and this time asks the IDE to also show the minimum value. There are a handful of headers where the maximum and the minimum differ. Mystery solved, he thinks.

Just to be sure, he filters the failing requests for this client by the lengths of the other headers. Strangely, all but one of them turn out to sit at their minimum value. The one that does vary changes in length, but long values aren't perfectly correlated with failures, which is weird.


He selects one of the failing requests, and drops it into his debugger. He runs it up to the point of failure, and then single-steps backwards to see what was happening right before that.

The header data is weird-looking. Most of the headers should be plain text, but this is clearly not text. He looks at the stack trace for context, and realizes that the headers are compressed.


He stares at the failing requests and plays with the numbers for a moment, before he realizes what's happening.

How much the headers can be compressed depends on how much structure the compressor can squeeze out of them — which in turn means that it is the content of the headers that determines how long the compressed headers are, and therefore whether the proxy runs out of memory.

That's so cursed.
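The effect Uþenor has spotted is easy to demonstrate with any general-purpose compressor (zlib here, as an illustration; the story's compressor is unnamed): two header payloads of identical uncompressed length compress to very different sizes, so a fixed memory budget is exceeded or not depending on what the headers contain.

```python
import zlib

repetitive = b"x-audit-name: " + b"AB" * 500        # highly compressible payload
varied = b"x-audit-name: " + bytes(range(250)) * 4  # much less compressible

assert len(repetitive) == len(varied)  # same uncompressed size

small = len(zlib.compress(repetitive))
large = len(zlib.compress(varied))
# Whether the compressed block fits in the arena now depends on the
# *content* of the headers, not just their nominal length.
assert small < large
```

Which is why a single customer's requests can fail intermittently: the same-length name compresses differently depending on everything packed around it.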


He drums his fingers on the table for a moment, thinking about how to resolve this.

He drops a constraint in the database interface, asserting that no customer names are longer than 300 characters. The constraint sits after the database layer, so customers with longer names will still be able to sign up; it will just send an alert when they do. He writes an explanatory message on the alert, and attaches a default patch to bump it up to 1000 characters when it eventually becomes a problem.

With a maximum size for the customer names now known, he asks his IDE to produce the maximum possible header size again. It thinks for a while and then gives him an answer. He adds another constraint to the external API definition that this value must be less than the memory capacity assigned to the proxy instances.

That constraint immediately fails and pages him, which is good. He goes into the proxy configuration, and bumps the memory usage settings up by a factor of 1.2. The constraint is satisfied.
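A minimal sketch of that constraint check — the function names and the paging hook are invented for illustration: the statically derived worst-case request size must fit in a proxy instance's memory, a violation pages someone, and the remedy is the configuration bump.

```python
def capacity_constraint_ok(max_request_bytes, proxy_memory_bytes, page):
    """Return True if the worst-case request fits; otherwise page and fail."""
    if max_request_bytes >= proxy_memory_bytes:
        page("worst-case request size exceeds proxy memory")
        return False
    return True

def bump_memory(proxy_memory_bytes, factor=1.2):
    """The fix applied here: raise the memory setting by a constant factor."""
    return round(proxy_memory_bytes * factor)
```

The point of wiring the check to a pager is that the constraint keeps holding as the code evolves: any future change that pushes the bound past capacity surfaces immediately instead of as intermittent production failures.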


Over the next day, their average request failure rate drops down to 0.0011. Until a power brownout takes out one of their operations centers, and then Uþenor has other problems.
