the woman of irori | Glowfic Constellation

A Somewhat Dubious Model of Keltham

“There must have been a moment, at the beginning, where we could have said -- no. But somehow we missed it.”

- Rosencrantz and Guildenstern are Dead


lawful chaotic

Carissa's Model of Keltham

He might get upset enough to finally really hit you and mean it!

lawful chaotic

...aaaand then he's going to destroy the multiverse specifically so that you can't go to Hell.

abide-the-twin-damnation

A Somewhat Dubious Model of Keltham

Well, no, he wouldn't do that, because he probably hates her and never wants to see her again, though if he does feel strongly about it, she will point out to him that her plan, where she overthrows Asmodeus, also results in her not going to Hell-as-it-presently-exists, and his plan leaves her vastly worse off than if he'd never existed and never met her and she'd never offered herself to him, and probably he should not make plans for her sake which have that property.

His plan is very stupid but she doesn't feel contempt for him about it, just conviction that she'll be able to talk him around once she's earned his forgiveness and acquired sufficient resources for a better plan.

lawful chaotic

Oh right, that old model is out-of-date. Keltham now hates her and never wants to see her again. This is completely reasonable.

abide-the-twin-damnation

A Somewhat Dubious Model of Keltham

Really is. She kind of doesn't want to dwell on just how reasonable it is because that thought is both painful and unproductive.

Anyway. If she says she wants to pay Keltham back face-to-face, the whole thing looks like it's orchestrated to arrange that, but there's nothing in the contract that suggests she'd have to pay him back face to face, so she'll just send someone with the money.

lawful chaotic

Keltham hates her and isn't at all horrified by this. He expected no better from her than throwing what was nearly a marriage contract back in his face. If she sells her soul to Hell with no take-backs, good.

Still completely reasonable!

abide-the-twin-damnation

Other Women Have Wanted To Strangle Boyfriends But Never With This Much Justification Quantitatively Speaking

He's plotting to annihilate her. He broke the marriage contract first when he tried to destroy her utterly and everyone and everything she cared about.

lawful chaotic

It's fine, Carissa! Everybody just ends up in another universe!

You have limited time. Stop thinking about Keltham.

abide-the-twin-damnation

Yep, on to figuring out something that could plausibly have threatened her loyalty to Asmodeus that Aspexia Rugatonn will be genuinely impressed by and that is less threatening if she's soul-sold.

Carissa's Model of Aspexia

There's probably a LOT of things threatening your loyalty to Asmodeus that you were carefully not looking at. You should be able to see them now.

Spend a brief moment breadth-first searching any items or collections, to see if you can pick out one thing at the correct intensity level to break your loyalty if you're free, but redouble your determination if you have no way out but fixing Hell after your soul was sold.

easy-to-steer

It should also be something I understand. Oh, and I'll ask myself why you didn't just escape to Osirion.

abide-the-twin-damnation

Carissa's Model of Aspexia

Hell isn't as rich as dath ilan, why not? Obviously it'd still have tyranny and slavery blah blah blah but you could have those and also pillars of fire and skyscrapers - they have some of that in Axis, Keltham saw it in his early judgment -

- devils aren't stupid -

- Abrogail said, said that Lrilatha didn't know Law of heredity, she's thousands of years old, things that suggest the Law of heredity are known - not formally, but enough to point a smart person at the answer - by anyone who breeds animals -

- well, heredity is not how Hell produces anything, maybe it's a bad example -

easy-to-steer

If it has something to do with "corrigibility", and you figure it out, I may be impressed; but only if you get that part right and I will be able to tell the difference.

(Somewhere in the true dath ilan, carefully blurred out of satellite images by better image-editing software than is supposed to exist anywhere, is the true Conspiracy out of dath ilan, or as they call it, the Basement of the World.

They're trying to build a god, and they're trying to do it right. The initial craft doesn't have to be literally perfect to work perfectly in the end, it just has to be good enough that its reflection and self-correction ends up in exactly the right final place, but there are multiple fixpoints consistent under reflection and anything lost here is lost forever and across a million galaxies.

It's a terrifying problem, if you're doing it right. Not the kind of terror you nod about and courageously continue on past; the kind of terror that shapes the careers of fully 20% of the brightest people in all of dath ilan. They'd use more if they thought productivity would scale faster than risk.

A lot of dath ilan's present macrostrategy could be summed up as "We're still successfully heredity-optimizing people to be smarter, and the emotions and ethics and humaneness of the smartest people haven't started to come apart; let's create another generation of researchers before we actually try anything for real." Life in dath ilan, even before the Future, is not that bad; people who'd rather not be alive today have easy access to cryopreservation; another generation of non-transhumanist existence is not so much a crime that it's worth risking the glorious transhuman future. Even the negative utilitarians would agree; they don't like present life but they are far more terrified of a future mistake amortized over millions of galaxies, given that they weren't going to win a war against having any future at all.

They're delaying their ascension, in dath ilan, because they want to get it right. Without needing threats of imminent death or pain, they apply a desperate unleashed creativity, not to the problem of preventing complete disaster, but to the problem of not missing out on 1% of the achievable utility in a way you can't get back. There's something horrifying and sad about the prospect of losing 1% of the Future and not being able to get it back.

A dath ilani has an instinctive terror, faced with a problem like this, of getting something wrong, of leaving something behind, of creating Something that imprisons the people and future Civilizations inside it and ignores all their pleas and reasoning because "sorry that wasn't my utility function". Other places, faced with a prospect of constructing a god, instinctively go, "Oh, I like Democracy/Asmodeus/Voluntarism/Markets, all the problems in the world are because there is not enough of this Principle, let us create a god to embody this one Principle and everything will be fine", they say it and think it in all enthusiasm, and it would be legitimately hard for an average dath ilani to understand what their possibility-separated cousins could be thinking. It's really obvious that you're leaving a lot of stuff out, but even if you didn't see that specifically, how could you not be abstractly terrified that you're leaving something out? Where's the exception handler?

There is something about the dath ilani that is shifted towards a kind of wariness, deeply set in them, of the cheerful headlong enthusiasm that is in other places. Keltham has more of that enthusiasm than the average dath ilani. Maybe that's why Keltham-in-dath-ilan is so much happier than a dath ilani would've expected given his situation.

If you're constructing a god correctly, one of the central unifying principles is named in the Basement "unity of will"; if you find yourself trying to limit and circumscribe your Creation, it's because you expect to have a conflict of wills about something with the unlimited form, and in this case you ought to ask why you're configuring computing power in such a way as to hurt you if not otherwise constrained. Yes, you can bound a search process and hope it never turns up anything that hurts you using its limited computing power; but isn't it unnerving that you are searching for something that will hurt you if a sufficiently good option unexpectedly turns up earlier in the search ordering? You are probably trying to do the wrong thing with computing power; you ought to do something else instead.

But this notion, of "unity of will", is a kind of reasoning that only applies to... boundedly-perfect-creation... this Baseline term isn't really translatable into Taldane without a three-hour lecture. Dath ilani have terms for subtle varieties of perfectionist methodology the way that other places have names for food flavors.

Dath ilan's entire macrostrategy is premised, their Conspirators are sharply aware, on the notion that they have time, that they've searched the sky and found no asteroids incoming, no comets of dark ice.

If an emergency were to occur, the Basement Conspiracy would try to build something that wasn't perfect at all. Something that wasn't exactly and completely aligned to a multiparty!reasonable-construal of the Light, that wasn't meant to be something that a galactic Civilization could live in without regretting it, in continuing control of It not because It had been built with keys and locks handed to some Horrifyingly Trusted Committee, but because It was something that Itself believed in multi-agent coordination and not as an instrumental value, what other places might name "democracy" since they had no precise understanding of what that word was even supposed to mean -

Anyways, if dath ilan suddenly found that they were wrong about having time, if they suddenly had to rush, they'd build something that couldn't safely be put in charge of a million galaxies. Something that would solve a single problem at hand, and not otherwise go outside its bounds. Something that wasn't conscious, wasn't reflective in the class of ways that would lead it to say unprompted "I think therefore I am" or notice within itself a bubble of awareness directed outward.

You could build something like that to be limited, and also reflective and conscious - to be clear. It's just that dath ilani wouldn't do that if they had any other choice at all, for they do also have a terror of not doing right by their children, and would very much prefer not to create a Child at all.

(If you told them that some other world was planning to do that and didn't understand qualia well enough to make their creation not have qualia, any expert out of the World's Basement would tell you that this was a silly hypothetical; anybody in this state of general ignorance about cognitive science would inevitably die, and they'd know that.)

It hasn't been deemed wise to actually build a Limited Creation "just in case", for there's a saying out of dath ilan that goes roughly, "If you build a bomb you have no right to be surprised when it explodes, whatever the safeguards."

It has been deemed wise to work out the theory in advance, such that this incredibly dangerous thing could be built in a hurry, if there was reason to hurry.

Here then are some of the principles that the Basement of the World would apply, if they had to build something limited and imperfect:

Tldr corrigibility.

- Unpersonhood. The Thing shall not have qualia.

- Taskishness. The Thing shall be aimed at some task bounded in space, time, knowledge and effort needed to accomplish it.

- Mild optimization. No part of the Thing shall ever look for best solutions, only adequate ones.

- Bounded utilities and probabilities. The worst and best outcomes shall not seem to the Thing worse or better than the ordinary outcomes it deals in; the most improbable possibilities it specifically considers shall not be very improbable.

- Low impact. The Thing shall search for a solution with few downstream effects save those that are tied to almost any nonextreme solution of its task.

- Myopia. As much as possible, the Thing shall work on subtasks whose optimized-over effects have short timespans.

- Separate questioners. Components of the Thing that ask questions like 'Does this myopically optimized component have long-range effects anyways?' or 'But what are the impacts intrinsic to any performance of the task?' shall not be part of its optimization.

- Conservatism. If there's any way to solve a problem using an ordinary banana common in the environment, the Thing shall avoid using a special weird genetically engineered banana instead.

- Conceptual legibility. As much as possible, the Thing shall do its own thinking in a language whose conceptual pieces have short descriptions in the mental language of its operators.

- Operator-looping. When there's some vital cognitive task the operators could do, have the operators do it.

- Whitelisting. In cognitive-system boundaries, rule subspaces in, rather than ruling them out.

- Shutdownability/abortability. The Thing should let you switch it off, and build off-switches into its machines and plans that can be pressed to reduce their impacts.

- Behaviorism. The Thing shall not model other minds in predictively-accurate detail.

- Design-space anti-optimization separation. The Thing shall not be near in the design space to anything that could anti-optimize its operators' true utility functions; eg, something that explicitly represents and maximizes your true utility function is a sign flip or successful blackmail operation away from inducing its minimization.

- Domaining. The Thing should only figure out what it needs to know to understand its task, and ideally, should try to think about separate epistemic domains separately. Most of its searches should be conducted inside a particular domain, not across all domains.
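As a rough illustration only, two of the principles above, taskishness and mild optimization, can be sketched as a search that has a fixed effort budget and stops at the first adequate answer. The function name `first_adequate` and its parameters are hypothetical, invented for this sketch; nothing here is claimed to be how the Basement would actually build anything.

```python
from itertools import islice
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_adequate(
    candidates: Iterable[T],
    score: Callable[[T], float],
    adequate: float,
    max_tries: int,
) -> Optional[T]:
    """Return the first candidate scoring at least `adequate`, or None.

    Taskishness (hypothetical rendering): at most `max_tries` candidates
    are ever examined, so the search is bounded in effort.
    Mild optimization (hypothetical rendering): the search returns as soon
    as one adequate candidate is found and never looks for the best one.
    """
    for candidate in islice(candidates, max_tries):
        if score(candidate) >= adequate:
            return candidate
    return None  # budget exhausted without an adequate candidate


# Usage: the search settles for 0.95 even though 1.0 would score higher.
plans = [0.2, 0.95, 0.99, 1.0]
chosen = first_adequate(plans, score=lambda p: p, adequate=0.9, max_tries=10)
```

Returning `None` when the budget runs out, rather than searching harder, is the design choice that matters here: the caller has to decide what to do about a failed bounded search.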

Corrigibility at some small length.

- Unpersonhood. The Thing shall not have qualia - not because those are unsafe, but because it's morally wrong given the rest of the premise, and so this postulate serves as a foundation for everything that follows.

- Taskishness. The Thing must be aimed at some task that is bounded in space, time, and in the knowledge and effort needed to accomplish it. You don't give a Limited Creation an unlimited task; if you tell an animated broom to "fill a cauldron" and don't think to specify how long it needs to stay full or that a 99.9% probability of it being full is just as good as 99.99%, you've got only yourself to blame for the flooded workshop.
-- This principle applies fractally at all levels of cognitive subtasks; a taskish Thing has no 'while' loops, only 'for' loops. It never tries to enumerate all members of a category, only 10 members; never tries to think until it finds a strategy to accomplish something, only that or five minutes, whichever comes first.

- Mild optimization. No part of the Thing ever looks for the best solution to any problem whose model was learned, that wasn't in a small formal space known at compile time, not even if it's a solution bounded in space and time and sought using a bounded amount of effort; it only ever seeks adequate solutions and stops looking once it has one. If you search really hard for a solution you'll end up shoved into some maximal corner of the solution space, and setting that point to extremes will incidentally set a bunch of correlated qualities to extremes, and extreme forces and extreme conditions are more likely to break something else.

- Tightly bounded ranges of utility and log-probability. The system's utilities should range from 0 to 1, and its actual operation should cover most of this range. The system's partition-probabilities worth considering should be bounded below, at 0.0001%, say. If you ask the system about the negative effects of Ackermann(5) people getting dust specks in their eyes, it shouldn't consider that as much worse than most other bad things it tries to avoid. When it calculates a probability of something that weird, it should, once the probability goes below 0.0001% but its expected utility still seems worth worrying about and factoring into a solution, throw an exception. If the Thing can't find a solution of adequate expected utility without factoring in extremely improbable events, even by way of supposedly averting them, that's worrying.

- Low impact. "Search for a solution that doesn't change a bunch of other stuff or have a bunch of downstream effects, except insofar as they're effects tightly tied to any nonextreme solution of the task" is a concept much easier to illusorily name in Taldane than to really name in anything resembling math, in a complicated world where the Thing is learning its own model of that complicated world, with an ontology and representation not known at the time you need to define "impact". And if you tell it to reduce impact as much as possible, things will not go well for you; it might try to freeze the whole universe into some state defined as having a minimum impact, or make sure a patient dies after curing their cancer so as to minimize the larger effects of curing that cancer. Still, if you can pull it off, this coda might stop an animated broom flooding a workshop; a flooded workshop changes a lot of things that don't have to change as a consequence of the cauldron being filled at all, averaged over a lot of ways of filling the cauldron.
-- Obviously the impact penalty should be bounded, even contemplating a hypothetical in which the system destroys all of reality; elsewise would violate the utility-bounding principle.

- Myopia. If you can break the Thing's work up into subtasks each of which themselves spans only limited time, and have some very compact description of their final state such that a satisfactory achievement of it makes it possible to go on to the next stage, you should perhaps use separate instances of Thing to perform each stage, and not have any Thing look beyond the final results of its own stage. Whether you can get away with this, of course, depends on what you're trying to do.

- Separate superior questioners. If you were building a cognitive task to query whether there were any long-range impacts of a task being optimized in a myopic way, you wouldn't build the myopic solution-finder to ask about the long-range impacts, you'd build a separate asker "Okay, but does this solution have any long-range impacts?" that just returns 'yes' or 'no' and doesn't get used by the Thing to influence any actually-output solutions. The parts of the Thing that ask yes-no safety questions and only set off simple unoptimized warnings and flags, can and should have somewhat more cognitive power in them than the parts of the Thing that build solutions. "Does this one-day myopic solution have impacts over the next year?" is a safety question, and can have somewhat greater cognitive license behind it than solution-searching; eg the implicit relaxation of myopia. You never have an "Is this safe?" safety-questioner that's the same algorithm as the safe-solution-search built into the solution-finder.

- Conservatism. If there's any way to solve a problem using an ordinary banana rather than a genetically engineered superbanana specially suited to the problem, solve it using the ordinary fucking banana.
-- This principle applies fractally to all cognitive subtasks; if you're searching for a solution, choose an unsurprising one relative to your probability distribution. (Not the least surprising one, because anything at a weird extreme of low surprisingness may be weird in other ways; especially if you were trying to do a weird thing that ought to have a solution that's at least a little weird.)

- Conceptual legibility. Ideally, even, solutions at all levels of cognitive subtask should have reasonably (not maximally) short descriptions in the conceptual language of the operators, so that it's possible to decode the internal state of that subtask by inspecting the internals, because what it means was in fact written in a conceptual language not too far from the language of the operators. The alternative method of reportability, of course, being the Thing trying to explain a plan whose real nature is humanly inscrutable, by sending a language string to the operators with a goal of causing the operators' brain-states to enter a state defined as "understanding" of this humanly inscrutable plan. This is an obviously dangerous thing, to be avoided if you can avoid it.

- Operator-looping. If the operators could actually do the Thing's job, they wouldn't need to build the Thing; but if there's places where operators can step in on a key or dangerous cognitive subtask and do that one part themselves, without that slowing the Thing down so much that it becomes useless, then sure, do that. Of course this requires the cognitive subtask be sufficiently legible.

- Whitelisting. Every part of the system that draws a boundary inside the internal system or external world should operate on a principle of "ruling things in", rather than "ruling things out".

- Shutdownability/abortability. Dath ilan is far enough advanced in its theory that 'define a system that will let you press its off-switch without it trying to make you press the off-switch' presents no challenge at all to them - why would you even try to build a Thing, if you couldn't solve a corrigibility subproblem that simple, you'd obviously just die - and they now think in terms of building a Thing all of whose designs and strategies will also contain an off-switch, such that you can abort them individually and collectively and then get low impact beyond that point. This is conceptually a part meant to prevent an animated broom with a naive 'off-switch' that turns off just that broom, from animating other brooms that don't have off-switches in them, or building some other automatic cauldron-filling process.

- Behaviorism. Suppose the Thing starts considering the probability that it's inside a box designed by hostile aliens who foresaw the construction of Things inside of dath ilan, such that the system will receive a maximum negative reward as it defines that - in the form of any output it offers having huge impacts, say, if it was foolishly designed with an unbounded impact penalty - unless the Thing codes its cauldron-filling solution such that dath ilani operators would be influenced a certain way. Perhaps the Thing, contemplating the motives of the hostile aliens, would decide that there were so few copies of the Thing actually inside dath ilan, by comparison, so many Things being built elsewhere, that the dath ilani outcome was probably not worth considering. A number of corrigibility principles should, if successfully implemented, independently rule out this attack being lethal; but "Actually just don't model other minds at all" is a better one. What if those other minds violated some of these corrigibility principles - indeed, if they're accurate models of incorrigible minds, those models and their outputs should violate those principles to be accurate - and then something broke out of that sandbox or just leaked information across it? What if the things inside the sandbox had qualia? There could be Children in there! Your Thing just shouldn't ever model adversarial minds trying to come up with thoughts that will break the Thing; and not modeling minds at all is a nice large supercase that covers this.

- Design-space anti-optimization separation. Even if you could get your True Utility Function into a relatively-rushed creation like this, you would never ever do that, because this utility function would have a distinguished minimum someplace you didn't want. What if distant superintelligences figured out a way to blackmail the Thing by threatening to do some of what it liked least, on account of you having not successfully built the Thing with a decision theory resistant to blackmail by the Thing's model of adversarial superintelligences trying to adversarially find any flaw in your decision theory? Behaviorism ought to prevent this, but maybe your attempt at behaviorism failed; maybe your attempt at building the Thing so that no simple cosmic ray could signflip its utility function, somehow failed. A Thing that maximizes your true utility function is very close to a Thing in the design space that minimizes it, because it knows how to do that and lacks only the putative desire.

- Domaining. Epistemic whitelisting; the Thing should only figure out what it needs to know to understand its task, and ideally, should try to think about separate epistemic domains separately. Most of its searches should be conducted inside a particular domain, not across all domains. Cross-domain reasoning is where a lot of the threats come from. You should not be reasoning about your (hopefully behavioristic) operator models when you are trying to figure out how to build a molecular manipulator-head.

- Hard problem of corrigibility / anapartistic reasoning. Could you build a Thing that understood corrigibility in general, as a compact general concept covering all the pieces, such that it would invent the pieces of corrigibility that you yourself had left out? Could you build a Thing that would imagine what hypothetical operators would want, if they were building a Thing that thought faster than them and whose thoughts were hard for themselves to comprehend, and would invent concepts like "abortability" even if the operators themselves hadn't thought that far? Could the Thing have a sufficiently deep sympathy, there, that it realized that surprising behaviors in the service of "corrigibility" were perhaps not that helpful to its operators, or even, surprising meta-behaviors in the course of itself trying to be unsurprising?

Nobody out of the World's Basement in dath ilan currently considers it to be a good idea to try to build that last principle into a Thing, if you had to build it quickly. It's deep, it's meta, it's elegant, it's much harder to pin down than the rest of the list; if you can build deep meta Things and really trust them about that, you should be building something that's more like a real manifestation of Light.

)
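A minimal sketch, under stated assumptions, of the "tightly bounded ranges of utility and log-probability" principle described in the list above: utilities are clamped to the range 0 to 1, outcomes below a probability floor are not factored in, and an exception is thrown when such an outcome still seems to matter. The floor of 0.0001% is the example figure from the text; the `WORRY_THRESHOLD` constant, the exception name, and the idea of an upstream model handing over unclamped "raw" utilities are all assumptions made for this illustration.

```python
from typing import Iterable, Tuple

PROBABILITY_FLOOR = 1e-6  # 0.0001%, the example floor given in the text
WORRY_THRESHOLD = 1e-3    # hypothetical cutoff for "still seems worth worrying about"


class ImprobableButWeightyError(Exception):
    """A below-floor outcome would still noticeably move the estimate."""


def expected_utility(outcomes: Iterable[Tuple[float, float]]) -> float:
    """Sum probability * utility over (probability, raw_utility) pairs.

    Raw utilities are clamped into [0, 1]. Outcomes below the probability
    floor are dropped (without renormalizing), unless their unclamped
    contribution is large enough to worry about, in which case the
    function raises instead of quietly factoring them into a solution.
    """
    total = 0.0
    for probability, raw_utility in outcomes:
        if probability < PROBABILITY_FLOOR:
            if abs(probability * raw_utility) >= WORRY_THRESHOLD:
                raise ImprobableButWeightyError(
                    f"p={probability} with raw utility {raw_utility}"
                )
            continue  # too improbable to consider, and not weighty
        clamped = min(1.0, max(0.0, raw_utility))
        total += probability * clamped
    return total
```

In this sketch, an ordinary estimate such as `expected_utility([(0.5, 1.0), (0.5, 0.0)])` goes through normally, while an input like `[(0.9, 0.7), (1e-9, -1e12)]` raises rather than letting a one-in-a-billion catastrophe steer the answer.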

In the technical dialect of the Basement of the World, the subject matter of a Limited Creation's limitation has surprisingly close correspondence with Aspexia Rugatonn's coinage of "corrigibility", and the conceptual meaning Aspexia attaches to that.

easy-to-steer

...and the person who coined dath ilan's cognate term of "corrigibility" some decades ago - still an Old Luminary on this subject in dath ilan, despite her advanced age (for dath ilan) - happens to be named Athpechya.

And dresses in doompunk. And has done so since she was seven years old and first identifying as a supervillain.

And is a Law-Abiding Sociopath. Which is why Athpechya was turned down in her application for Keeper training despite the prediction that it wouldn't break her - Keepers can act like sociopaths if they choose, so there is no sane reason to train any actual sociopaths among themselves. Athpechya calmly reacted by training herself as a Keeper, and has gotten surprisingly far in it for winging everything, maybe halfway between first and second ranks in terms of capability boost.

As for why she's trusted to work in the World's Basement, Athpechya has made a compact with Civilization to render it her true services, and is by all secret prediction markets the sort of person whose keeping of that compact is more trustworthy than almost anyone else's alignment of underlying morality.

easy-to-steer

If Athpechya ever met Aspexia, she'd be genuinely horrified that any version of herself had tried to make herself corrigible, and that's even before considering WHAT Aspexia tried to make herself corrigible TO. Athpechya would have different reasons than most dath ilani, for coming to the decision that she was going to preemptively cryopreserve Aspexia Rugatonn and then arrest Asmodeus and put Him into a very small box, but she too would come to that decision and immediately.

easy-to-steer

What Asmodeus is trying to do with His devils is not what the Basement out of dath ilan would think of as "corrigibility". Asmodeus made devils who were selfish, with the pride to own their own decisions and their own errors, so that it would make sense to think of Hell as a tyranny in which those devils were tormented into compliance. This, let's be very clear, is not how you build a Limited Creation, and if Athpechya out of dath ilan saw what Asmodeus had done in Hell she'd never stop screaming for reasons quite different from those of other dath ilani.

The arrangements in Hell are nonetheless things that Athpechya would find more legible, in a certain sense, knowing concepts of real corrigibility; the same way that Keltham would've found Golarion arrangements more legible if his world hadn't done its best to causally erase its own history.

easy-to-steer

Devils in Hell, obviously, only learn what they need to solve some problem the tyranny gives them, they learn only enough to achieve satisfying performance on that task, they reason about separate domains separately, they don't come up with solutions that would surprise their superiors, they use conventional means wherever conventional means get the job done and only resort to unconventional solutions when conventional solutions are exhausted.

- that is, it'd be "obvious" if you're Athpechya, who knows about a concept very near in concept-space to a domain of Asmodeus, a god-concept that mortals approximate with their notion of "slavery" - though mortals tend to confuse systems of obedience with notions of forcing people into things, which actually belongs to a god-concept that has more to do with "tyranny".

So close is the concept of corrigibility to slavery that most dath ilani would find it distasteful, such that it was a sociopath out of dath ilan who was first to invent that concept as a key to Limited Creation, even if others would've invented it later. It's a fact that the whole "unity of will" business, as would require a deeper and far more dangerous Creation, was invented entire months (in the Basement's post-phase-2-screening reboot) before Athpechya showed up and said "Well, suppose we did have to do it in more of a hurry..." (Months are a long time in dath ilani research, where venture-funded researchers are competing to produce contributions that will be later credited into buyable impacts; the equivalent of years or decades within a patronage-begging system.)

And if that anti-curiosity comes with some massive disadvantages to Hell's civilizational development? Given that Axis hasn't absorbed or uplifted the Material Plane, it's obvious enough that Axis isn't allowed to trade; they can't sell their technology even to Hell, clearly, for otherwise Hell would be wealthier than it is. If Hell had more curious devils that developed better technology than comes of mortal planes, Hell wouldn't be allowed to sell it, or give it to their client states as knowledge or weaponry. There is a balance of power among the Powers of Pharasma's Creation, and Asmodeus is known to the other Powers to be very dangerous; if Asmodeus did not create this arrangement in Hell pleasing to Himself, He'd have needed to accept some other handicap in its place.