Continuing from: Day 24 (20) / Morning Lecture
Now, the nice thing about working with infinities is that you don't have to be very precise at all, and you'll still get exactly the right answer at the end!
Instead of taking x^N, let's use the 'rising powers', okay that didn't translate so probably is not a thing in Golarion, or at least the Share Language donor didn't know it.
Instead of taking x * x * x * ... * x, with N copies of x, let's consider the expression x * (x + 1) * (x + 2) ... until you've got N of those.
Let's denote that x`N`.
Obviously, if x is any finite fraction of the way towards infinity, x`N` = x^N exactly. Like, when you're working with trillions, a trillion squared is nearly the same number as a trillion times (a trillion plus one), proportionally speaking; that's approximately true, so with infinities it's exactly true.
Example with x=10, N=3:
10`3`
= 10*11*12
= (10)(10+1)...(10+(3-1)) (3 terms total)
= (x)(x+1)...(x+(N-1)) (x`N` with x=10, N=3)
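(A quick finite-numbers sketch in Python, for anyone who wants to poke at the notation off to the side; the helper name rising() is just an illustrative choice:)

from math import prod

def rising(x, n):
    # finite stand-in for x`n` = x * (x+1) * ... * (x+n-1); an empty product (n=0) gives 1
    return prod(range(x, x + n))

assert rising(10, 3) == 10 * 11 * 12 == 1320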
Now observe that:
11`3` - 10`3`
= 11*12*13 - 10*11*12
= (13 - 10) * (11 * 12)
= 3 * 11`2`
By pointwise generalization from 10 to x and 3 to N:
(x + 1)`N` - (x)`N`
= (x + 1)(x + 2)...(x + N) - (x)(x+1)...(x + N - 1)
= ((x + N) - x) * (x+1)...(x + N - 1)
= N * (x+1)`N - 1`
Rearrange:
(x + 1)`N` - (x)`N` = N * (x+1)`N - 1`
==>
x`N` + N*(x+1)`N-1` = (x+1)`N`
We now have an N-1 term in our rule, which would be bad if N were 0. Eliminate all subtraction whenever possible! In case something ends up 0 or negative that didn't start out that way!
In this case we'll just redescribe the number N as being some other number N plus one, so N-1 goes to "N" and N goes to "N+1". We coulda done that originally, but it woulda been needlessly more symbols.
x`N` + N*(x+1)`N-1` = (x+1)`N`
==>
x`N+1` + (N+1)*(x+1)`N` = (x+1)`N+1`
Sanity check with x=10, N=2:
x`N+1` + (N+1)*(x+1)`N` = (x+1)`N+1`
=> 10`2+1` + (2+1)*(10+1)`2` = (10+1)`2+1`
=> 10*11*12 + 3*(11*12) = 11*12*13
PASSED
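(The same recurrence, checked over a sweep of small finite values; a Python sketch reusing the same kind of rising() helper as above:)

from math import prod

def rising(x, n):
    # finite stand-in for x`n`
    return prod(range(x, x + n))

# x`N+1` + (N+1)*(x+1)`N` = (x+1)`N+1`, for many small x and N
for x in range(20):
    for n in range(6):
        assert rising(x, n + 1) + (n + 1) * rising(x + 1, n) == rising(x + 1, n + 1)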
Then, by induction on f(x) = x`N+1`:
0`N+1` = 0 (base case)
(x+1)`N+1` = x`N+1` + (N+1)*(x+1)`N`
==>
x`N+1` = (N+1) * 1`N` + (N+1) * 2`N` + ... + (N+1) * x`N`
(can optionally add a starting term of (N+1)*0`N` since this always equals zero)
Sanity check with x=3, N = 2:
(3)(4)(5) = 3(0)(1) + 3(1)(2) + 3(2)(3) + 3(3)(4)
60 = 0 + 6 + 18 + 36
PASSED
Divide both sides by N+1:
x`N+1` / (N + 1) = 1`N` + 2`N` + 3`N` + ... + x`N`
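(And the summed-up form, checked the same way; again just a finite Python sketch:)

from math import prod

def rising(x, n):
    # finite stand-in for x`n`
    return prod(range(x, x + n))

# x`N+1` = (N+1) * (1`N` + 2`N` + ... + x`N`)
for x in range(30):
    for n in range(5):
        assert rising(x, n + 1) == (n + 1) * sum(rising(k, n) for k in range(1, x + 1))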
Approximations are exactly correct at infinity:
(INF)`N` = (INF)^N
...and this is also exactly true once you get any substantial fraction of the way towards infinity. Like, even (1,000,000)`3` starts to look pretty close to (1,000,000)^3.
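(Checking that with actual numbers, as a quick Python sketch:)

# (1,000,000)`3` versus (1,000,000)^3
x = 1_000_000
rising_value = x * (x + 1) * (x + 2)
power_value = x ** 3
assert abs(rising_value - power_value) / power_value < 1e-5   # they differ by about 0.0003%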
As for all the finite numbers at the start of the sequence, where the equality isn't exact, they collectively comprise 0% of anything and can be ignored.
Conclusion:
0^N + 1^N + 2^N + 3^N + ... + INF^N = INF^(N+1) / (N+1).
So if you stack up an infinite number of squares with sides 0, 1, 2, 3... you will get exactly a third of an infinitely large cube, and more generally, if you stack up an infinite number of N-hypercubes with sides 0, 1, 2, 3... you'll get exactly 1 / (N+1) of an infinite (N+1)-hypercube.
Which, in case anybody's forgotten by now, was what they needed to prove that:
1/INF * [ (0/INF)^N + (1/INF)^N + (2/INF)^N + ... + (INF/INF)^N ] = 1/(N+1)
...which in turn is how you:
- Represent an infinite number of possible hypotheses, about how far from 0% LEFT to 100% LEFT the first Ball 0 could have landed;
- Each hypothesis with equal prior probability 1/infinity;
- And then update on seeing N balls go LEFT;
- As would have a likelihood of p^N for each hypothesis a fraction p of the way between the left side and right side.
Also in case anybody's forgotten, if you imagine N balls plus the 0 ball getting randomly ordered, the chance of the 0 ball coming rightmost in the ordering - hence, all other balls being LEFT of it - is 1/(N+1). Which is how they originally knew that 1/(N+1) was the desired end result, for the likelihood of seeing all N balls land on the LEFT.
Also also in case anybody's forgotten, the reason they needed this sum was to renormalize the relative probabilities into absolute probabilities. That's another step along the direct path to deriving a prediction about the next ball observed.
After seeing N balls go LEFT, the posterior distribution ends up proportional to a distribution where each hypothesis p has weight p^N.
But if you forget to normalize that prior-times-likelihood distribution, and ask what happens if every part of the distribution then contributes p likelihood to the next ball going LEFT, you'd first add one extra factor of p everywhere, and then calculate:
1/INF * [ (0/INF)^(N+1) + (1/INF)^(N+1) + ... + (INF/INF)^(N+1) ] = 1/(N+2)
After seeing 100 balls go LEFT, you shouldn't calculate a 1/102 chance of the next ball going LEFT - you should think it's more likely than not for the next ball to go LEFT - so something has gone wrong here; and what's wrong is that we didn't normalize the prior-times-likelihood distribution.
The pre-normalized distribution sums to 1/(N+1); the normalized distribution should sum to 1. So we divide everything through by 1/(N+1), meaning we multiply by (N+1), to make it sum to 1. For this example, that means multiplying everything uniformly by 101 to normalize it. Which yields a 101/102 chance of the next ball going LEFT, which is what it should be.
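(A finite-grid sketch of that whole update in Python; the grid size and variable names are arbitrary illustrative choices:)

# hypotheses p = 0/M, 1/M, ..., M/M; update on N = 100 LEFTs; normalize; predict the next ball
M, N = 100_000, 100
weights = [(k / M) ** N for k in range(M + 1)]   # likelihood p^N; the uniform prior is a constant factor that cancels on normalizing
total = sum(weights)
posterior = [w / total for w in weights]          # now sums to 1
next_left = sum(post * (k / M) for k, post in enumerate(posterior))
assert abs(next_left - 101 / 102) < 1e-3          # about 0.990, not 1/102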
Though, Keltham quickly remarks, the principle of making the posterior sum to 1, is something that only applies within that whole metahypothesis. If you see balls going LEFT RIGHT LEFT RIGHT repeating, you might form, after a few steps, the hypothesis that the ball goes LEFT RIGHT repeating, which can't be expressed within that whole metahypothesis. All of this reasoning is taking place inside the metahypothesis about the balls having some stable but otherwise not-further-scrutable fractional propensity to go LEFT or RIGHT, with all such propensities between 0 and 1 being equally likely on priors. This whole metahypothesis then may do better or worse compared to some other hypotheses.
He's not saying that the metahypothesis absolutely has to be true, by saying that the sum of all the hypotheses inside it should be 1. If you started believing that, you'd never in principle be able to notice any other pattern like LEFT RIGHT repeating.
Rather, this whole metahypothesis is itself being entertained as one of many in-principle possible metahypotheses about how things could be, and the probabilities of all the hypotheses inside it summing to 1, are to be understood as being what we get when we entertain the possibility, condition on, reason inside, the metahypothesis being true.
After seeing N balls go left, the whole metahypothesis has racked up a joint probability of 1/(N+1) and scored log(1/(N+1)) in terms of how well it's doing. If you previously started out with any prior credence on an incompatible metahypothesis, like 'Actually maybe the ball just always goes LEFT or RIGHT, both equally credible on priors,' that other metahypothesis may be doing better. In fact, the always-left-or-always-right metahypothesis has now racked up probability 1/2 and a score of log(1/2) = -1, which is probably quite a bit better than log(1/(N+1)) if you've seen a lot of balls go LEFT. You'd probably be believing in that metahypothesis a bit more, now, even if it started out less meta-credible than the other.
"Why do we need that - other hypothesis, metahypothesis - about there being hypotheses where the ball always goes LEFT or RIGHT? Aren't those hypotheses already inside - the metahypothesis about all the fractions of times a ball could go LEFT or RIGHT?"
"Like, 0/INF is that it never goes left, and INF/INF is that it always goes left... am I missing something?"
"Both those hypotheses have prior probabilities of 1/INF, remember? That's equal to zero. So the metahypothesis thinks it's infinitely unlikely that the fraction of times the ball goes left is exactly equal to 1 or 0. It never notices that's true, no matter how many examples it sees."
"If it sees 100 balls go left, it'll get a joint prediction of 1/101. If it sees 10,000 balls go left, it'll get a joint prediction of 1/10,001. In the limit, the probability that it assigns to a sequence of left-going balls, continuing forever, is 0. Before you've rolled a single ball, the metahypothesis is already absolutely certain that it won't just go left every time."
"The alternative metahypothesis that the ball has a 50% chance of always going left, and 50% chance of always going right, loses all of its probability and scores negative infinity as soon as it sees any sequence with a mix of left and right balls. But if you present that metahypothesis with a sequence of balls going left forever, no matter how long that continues, it predicted that with 50% probability from the start."
Keltham will now zoom in closer on what happens to the 'any fraction between 0 and 1 is equally likely' metahypothesis when it updates.
Let's say that we see 10 balls go LEFT.
After seeing 10 balls go LEFT, the metahypothesis as a whole has a cumulative score of 1/(N+1) = 1/11, or equivalently, if you asked this metahypothesis at the dawn of time what probability it assigned to the first 10 balls going left, it would answer "1/11". Equivalently, isomorphically, these are stating the same thing: if you multiply out the prior distribution (1/INF everywhere) and the likelihood distribution (p^10 everywhere), you'd get a new distribution p^10/INF which, if you summed over the infinitely many terms inside it, would sum to 1/11.
Continuing to reason within this metahypothesis, one of the internal subhypotheses must by assumption be true, and so within the metahypothesis the posterior probabilities of the (infinite collection of) subhypotheses must sum to 1.
Which you'd obviously do by dividing all the terms through by 1/11, that is, multiplying them by 11.
The distribution...
(0/INF)^10 / (INF * 1/11) + (1/INF)^10 / (INF * 1/11) + ... + (INF/INF)^10 / (INF * 1/11)
...will, indeed, sum to 1.
To be clear, by having the distribution sum to 1, we are assuming-for-the-sake-of-extrapolation that some hypothesis in the collection must be true. But no matter how far the posterior updates on finite data, every individual hypothesis in the collection will still have posterior 0.
After rolling 1000 balls and seeing 985 go left, there's finite probability between 0.98 and 0.99, but 0 probability on the fraction being exactly 0.985.
That's just how this metahypothesis rolls. If you wanted a metahypothesis that could put little finite dots of probability mass on all the big round numbers like 0.985, 1/2pi, and so on, that would be a much more complicated ask.
What we ask about, having updated, is not so much the probability of individual hypotheses - which is always 0 - but the new shape of the curve.
If you see 10 balls go LEFT, the individual hypothesis 0.9 has now scored around 1/e, 37% or so, compared to 1 for the individual hypothesis 1.0. If you see 100 balls go LEFT, raise all that to the tenth power; now 0.9 has less than 0.01% of the cumulative likelihood of 1.0. On the scale of the curve, this means that over 99.99% of the probability is now concentrated into the interval between 90% and 100%. Though, of course, still 0% is concentrated at exactly 100%.
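(The same finite grid can be used to check that concentration claim; a Python sketch:)

# posterior weights proportional to p^100: how much mass lies between 0.9 and 1.0?
M = 100_000
weights = [(k / M) ** 100 for k in range(M + 1)]
total = sum(weights)
mass_above = sum(w for k, w in enumerate(weights) if k / M >= 0.9) / total
assert mass_above > 0.9999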
Keltham will Prestidigitate-sketch a few rough images of what curves like these should maybe probably look like, after updating on various numbers of LEFT and RIGHT balls observed. Keeping in mind that Keltham's not a 'computer', and can't draw the curves perfectly.
If you're willing to assume your audience can already freely wield the machinery of derivatives and antiderivatives and understand on an intuitive level everything that they mean, then sure, you can solve the problem more quickly using overpowered tools like that.
Integrating x^N gives you x^(N+1)/(N+1), which evaluated from 0 to 1 will give you 1/(N+1).
Given a mix of N LEFTs and M RIGHTs, however, and the need to renormalize the posterior over that, they might find themselves in a bit more trouble when it comes to deriving the integral...
Integral over p from 0 to 1: [ (1 - p)^M * p^N dp ] = M! * N! / (M + N + 1)!
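(For the suspicious, a crude numeric check of that integral against the factorial formula, in Python; the particular M and N are arbitrary:)

import math

# integral from 0 to 1 of (1 - p)^M * p^N dp, versus M! * N! / (M + N + 1)!
M_right, N_left = 4, 7
steps = 1_000_000
riemann = sum((1 - k / steps) ** M_right * (k / steps) ** N_left for k in range(steps)) / steps
exact = math.factorial(M_right) * math.factorial(N_left) / math.factorial(M_right + N_left + 1)
assert abs(riemann - exact) < 1e-6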
Keltham doesn't actually remember the exact proof, it's been a while, but he's pretty sure the introduction he used was the one his teachers used on him. So it's possibly the right introduction for getting the more complicated proof in the least painful way, once he starts trying to remember that?
And in any case, Keltham hasn't yet been told he's allowed to assume that everyone in this class has a full intuition for derivatives and integrals. Keltham hasn't taught them a calculus class himself, and who knows what weird malfunctions exist in the Golarion way of teaching it.
Why, it wouldn't shock him, at this point, if people are just being told to memorize formulas and not really taught to intuit how anything works!
So he's just talking directly about the infinity of possible hypotheses between 0 and 1, and how to sum up priors-times-likelihoods and posterior-predictions from those, rather than using calculus to abstract over that. Abstracting over that only works to teach mathematical intuition if people already know what's being abstracted away.
Keltham is in favor of people understanding things using calculus and continuous distributions, to be clear. So long as they can also understand the ideas in terms of a countably infinite collection of individual hypotheses each with probability 1/INF. You don't want to start identifying either representation or methodology of analysis with the underlying idea!
Which idea is just: a metahypothesis where any fraction from 0 to 1 seems equally plausible on priors. The hypotheses inside it get updated on observations of LEFT and RIGHT, which have different likelihoods for different propensity-fractions; the allocation of posterior probability over fractions shifts and gets normalized back to summing to 1; and that changes the predictions for LEFT and RIGHT on successive rounds, having updated on all of the previous rounds.
Keltham will now go up to the wall and spend a bit of time figuring out how to derive the rest of the Rule of Succession, for which there's no simple or obvious calculus-based proof known to anyone in this room anyways.
Thankfully, they know what all the correct answers have to be! Those come from much simpler combinatoric arguments about new balls ending up randomly ordered anywhere between the left boundary, all the LEFT balls, the 0 ball, all the RIGHT balls, and the right boundary.
Eventually Keltham does succeed in deriving (again) (but this time proving it using dubious infinitary arguments, instead of clear and simple combinatorics, so they can see what's happening with priors and posteriors and likelihoods behind the scenes) that indeed:
If you start out thinking any fraction of LEFT and RIGHT between 0 and 1 is equally plausible on priors, and you see experimental results going LEFT on N occasions and going RIGHT on M occasions, the prediction for the next round is (N+1)/(N+M+2) for LEFT and (M+1)/(N+M+2) for RIGHT.
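(And one last grid sketch pulling the whole thing together; the counts of 7 LEFTs and 3 RIGHTs are an arbitrary illustration:)

# uniform prior over p on a finite grid, likelihood p^N * (1-p)^M, posterior prediction for the next LEFT
GRID = 100_000
N_left, M_right = 7, 3
weights = [(k / GRID) ** N_left * (1 - k / GRID) ** M_right for k in range(GRID + 1)]
total = sum(weights)
next_left = sum(w * (k / GRID) for k, w in enumerate(weights)) / total
assert abs(next_left - (N_left + 1) / (N_left + M_right + 2)) < 1e-3   # = 8/12 = 2/3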