Elementary statistical stuff
Moderator: Alyrium Denryle
Elementary statistical stuff
I've been having trouble with some fairly simple mathematical stuff. The basic trouble stems from the Infinite Dice found in an RPG I'm fond of. It uses six-sided die, but is still tricky, as I'll show you below.
First, these "infinite" dice are rolled as norma. Any result from 1 to 5 is tallied together normally. At six, however, the die is removed and instead two new dice are rolled. For example, if you roll two dice and get a 2 and a 6, you keep the 2 but roll two more dice, and if one of them should become a 6, you roll two more, et cetera.
As a consequence, the roll is open-ended and can potentially reach infinite value. This system is good for introducing uncertainty into a game.
Now, the idea in this game is to simulate abstracted "skills" and determine whether an attempted action has been successfully performed, or not. Typically, three "infinite" dice are rolled and compared to the skill value. If the result is below or equal, the action succeeded.
Now... how do I calculate probabilities based on that? For example, say that I have a "skill value" of 15; what's my probability of succeeding at a roll of three "infinite" dice?
First, these "infinite" dice are rolled as norma. Any result from 1 to 5 is tallied together normally. At six, however, the die is removed and instead two new dice are rolled. For example, if you roll two dice and get a 2 and a 6, you keep the 2 but roll two more dice, and if one of them should become a 6, you roll two more, et cetera.
As a consequence, the roll is open-ended and can potentially reach infinite value. This system is good for introducing uncertainty into a game.
Now, the idea in this game is to simulate abstracted "skills" and determine whether an attempted action has been successfully performed, or not. Typically, three "infinite" dice are rolled and compared to the skill value. If the result is below or equal, the action succeeded.
Now... how do I calculate probabilities based on that? For example, say that I have a "skill value" of 15; what's my probability of succeeding at a roll of three "infinite" dice?
Björn Paulsen
"Travelers with closed minds can tell us little except about themselves."
--Chinua Achebe
"Travelers with closed minds can tell us little except about themselves."
--Chinua Achebe
Let's start by getting the probability of each number on a single die (we only need to go as high as 13 for purposes of this, but you'll see how it extends)
After the first roll, 1-5 have probability 1/6 and 1/6 possibility of a split.
The average value of the split is 2 times the average value of the first roll (each of the two rolls is identical to the first).
we can do a little algebra to get the mean:
AVERAGE = (1 + 2 + 3 + 4 + 5 + 2*(AVERAGE))/6
AVERAGE = AVERAGE/3 + 15/6
AVERAGE = (3/2)(5/2) = 15/4 = 3.75, not all that impressively more than the normal average of 3.5.
This is the average roll, not a specific probability...
as for the individual probabilities, you'll want to make a table calculating the probability of having a total of N splits:
no splits: 5/6
one split: 1/6 * (5/6)^2... i.e. first roll splits, neither of the splits splits.
two splits: 1/6 * 2 *(5/6)(1/6) * (5/6)^2... i.e. first roll splits, either the first of the split dice splits, or the second of the split dice splits.. and then neither of the resulting dice split.
Beyond 14 splits, you will cease to have contributions to the probability of rolling a 15 because if each die came up as a 1 then you'd still go over.
So, it's a finite problem, but large if done naively.
Is there a cleverer way to do it? Yes, using networks and some algebra... but it's more complicated.
After the first roll, 1-5 have probability 1/6 and 1/6 possibility of a split.
The average value of the split is 2 times the average value of the first roll (each of the two rolls is identical to the first).
we can do a little algebra to get the mean:
AVERAGE = (1 + 2 + 3 + 4 + 5 + 2*(AVERAGE))/6
AVERAGE = AVERAGE/3 + 15/6
AVERAGE = (3/2)(5/2) = 15/4 = 3.75, not all that impressively more than the normal average of 3.5.
This is the average roll, not a specific probability...
as for the individual probabilities, you'll want to make a table calculating the probability of having a total of N splits:
no splits: 5/6
one split: 1/6 * (5/6)^2... i.e. first roll splits, neither of the splits splits.
two splits: 1/6 * 2 *(5/6)(1/6) * (5/6)^2... i.e. first roll splits, either the first of the split dice splits, or the second of the split dice splits.. and then neither of the resulting dice split.
Beyond 14 splits, you will cease to have contributions to the probability of rolling a 15 because if each die came up as a 1 then you'd still go over.
So, it's a finite problem, but large if done naively.
Is there a cleverer way to do it? Yes, using networks and some algebra... but it's more complicated.
To be even more specific, construct a spreadsheet in which each cell represents a state of the system -- one dimension is the number of dice left to be rolled, and the other is the total score so far.
In each cell put the probability that this state will be passed through.
How do you do this? Use the formula
P(S, R) = (P(S, R-1) + sum(i=1...5)(P(S-i, R+1)))/6
with S the score so far and R the rolls remaining. The first part takes into account the possibility of increasing the rolls by rolling a six; the second part takes into account increasing the score by rolling anything else.
Spread this over a region bounded by the cells of 0 score or 0 rolls on one corner and the maximum roll you're interested in on the opposite corner
Then you put a '1' in the (0,1) slot for a single roll, or in the (0, 3) slot for a three-die roll....
and then you'll see the probability for getting any given result laid out along the 0 rolls left axis.
The answer to the question you asked would be the sum of the entries from (3,0) to (15,0).
In each cell put the probability that this state will be passed through.
How do you do this? Use the formula
P(S, R) = (P(S, R-1) + sum(i=1...5)(P(S-i, R+1)))/6
with S the score so far and R the rolls remaining. The first part takes into account the possibility of increasing the rolls by rolling a six; the second part takes into account increasing the score by rolling anything else.
Spread this over a region bounded by the cells of 0 score or 0 rolls on one corner and the maximum roll you're interested in on the opposite corner
Then you put a '1' in the (0,1) slot for a single roll, or in the (0, 3) slot for a three-die roll....
and then you'll see the probability for getting any given result laid out along the 0 rolls left axis.
The answer to the question you asked would be the sum of the entries from (3,0) to (15,0).
- Kuroneko
- Jedi Council Member
- Posts: 2469
- Joined: 2003-03-13 03:10am
- Location: Fréchet space
- Contact:
The expected value for n 6-sided infinite dice is indeed 15n/4, as drachefly calculated. Given n dice that do not split, the number of ways k can occur is the coefficient A(n,k) of x^k in (x+x^2+x^3+x^4+x^5)^n. Recognizing the multinomial series, this can be shown to be A(n,k) = \sum_{j=0}^n{ (-1)^k C(n,j)C(k-5j-1,k-5j-n) }, where C(⋅,⋅) is the binomial coefficient. Most of the terms in the summation do not contribute, so calculating it is actually simpler than it looks. Let's call the probability that n dice will have exactly m splits N(n,m). If one lets B(n,k) = \sum_{j=0}^n{ A(n,j) }, then the probability of getting a roll less than or equal to k is obviously P(n,k) = N(n,0)B(n,k)/5^n + N(n,1)B(n+1,k)/5^{n+1} + N(n,2)B(n+2,k)/5^{n+2} + ... . The values of B(n,15) for 0<n<16 are 5,25,125,555,1753,3745,5595,6075,4915,2993,1365,455,105,15,1, and zero everywhere else. The values of N(n,m) are not too difficult to calculate for n = 3, but I'm more interested in the general case. I'll come back to it when I have the time.
- Faram
- Bastard Operator from Hell
- Posts: 5271
- Joined: 2002-07-04 07:39am
- Location: Fighting Polarbears
drachefly, Kuroneko
You guys scare me, I cannot understand what you just did. Not that I even tried.
You guys scare me, I cannot understand what you just did. Not that I even tried.
[img=right]http://hem.bredband.net/b217293/warsaban.gif[/img]
"Either God wants to abolish evil, and cannot; or he can, but does not want to. ... If he wants to, but cannot, he is impotent. If he can, but does not want to, he is wicked. ... If, as they say, God can abolish evil, and God really wants to do it, why is there evil in the world?" -Epicurus
Fear is the mother of all gods.
Nature does all things spontaneously, by herself, without the meddling of the gods. -Lucretius
"Either God wants to abolish evil, and cannot; or he can, but does not want to. ... If he wants to, but cannot, he is impotent. If he can, but does not want to, he is wicked. ... If, as they say, God can abolish evil, and God really wants to do it, why is there evil in the world?" -Epicurus
Fear is the mother of all gods.
Nature does all things spontaneously, by herself, without the meddling of the gods. -Lucretius
- Kuroneko
- Jedi Council Member
- Posts: 2469
- Joined: 2003-03-13 03:10am
- Location: Fréchet space
- Contact:
If p = 5/6 and q = 1/6, the recursion N(n,m) = \sum_{k=1}^m{ C(n,k) p^{n-k} q^k N(2k,m-k) } is easy to crunch on a computer. The following are the result for N(3,k) for k=0..12: 125/216, 625/2592, 3125/31104, 109375/2519424, 390625/20155392, 4296875/483729408, 1955078125/470184984576, 1396484375/705277476864, 10791015625/11284439629824, 5125732421875/10968475320188928, 30360107421875/131621703842267136, 45343017578125/394865111526801408, 14736480712890625/255872592269367312384, while the numbers of ways for n dice (2<n<16) to give a number less than 15 being: 125, 555, 1753, 3745, 5595, 6075, 4915, 2993, 1365, 455, 105, 15, 1. For example, the probability that three dice do not split is 125/216, and given that they do not split the chance that they give a result no greater than 15 is 125/5^3 = 1. Summing up over all 13 possibilities, the exact probability that three six-sided infinite dice give a result less than or equal to 15 is 220338478078181111717/255872592269367312384, which is about 86.11%.
- Kuroneko
- Jedi Council Member
- Posts: 2469
- Joined: 2003-03-13 03:10am
- Location: Fréchet space
- Contact:
Here's the theoretical probability of success on a roll of three die for a given skill level. They were computed up to the 14th split, meaning the first column was computed exactly and then rounded to the given accuracy, while the second column is not so precise. But one can still expect about eight significant figures even on the last one, so it doesn't really matter--real dice are not so free of bias as to conform to the theoretical probabilities that precisely anyway. If the skill level is less than 3, the probability is obviously zero. At level 32, probability of failure is about 1:410.
Code: Select all
Skill Probability Skill Probability
3 0.00462962962963 18 0.93443738710635
4 0.01890432098765 19 0.94887846588940
5 0.04825745884774 20 0.95995718060006
6 0.09857530927704 21 0.96853184309759
7 0.17623497474026 22 0.97526809879586
8 0.27425675761272 23 0.98053143289542
9 0.38469468457422 24 0.98463864437347
10 0.49852451682333 25 0.98785227817520
11 0.60551935831302 26 0.99037872190187
12 0.69411151068915 27 0.99237162864844
13 0.76512796619991 28 0.99394276645260
14 0.82004602700010 29 0.99518292610286
15 0.86112575060881 30 0.99616400973047
16 0.89156328323673 31 0.99694182305504
17 0.91566803166137 32 0.99755932844732
- Kuroneko
- Jedi Council Member
- Posts: 2469
- Joined: 2003-03-13 03:10am
- Location: Fréchet space
- Contact:
If you're wondering how the recursion N(n,m) = \sum_{k=1}^m{ C(n,k) p^{n-k} q^k N(2k,m-k) }, p = 5/6 (probability a die does not split), q = 1/6 (probability that a die splits), where N(n,m) is again the probability that n die will have exactly m splits, think of it this way:
(1) It is possible to have exactly one split on the first roll, and then the two replacement dice cascade into m-1 splits. Since there are C(n,1) = n possibilities as to which die split, the probability of the former event is C(n,1)p^{n-1}q^1, and of the latter N(2,m-1).
(2) It is also possible to have exactly two splits on the first roll, and then the four replacement dice giving m-2 splits. There are C(n,2) = n(n-1)/2 possibilities as to which two dice split, which probability C(n,2)p^{n-2}q^2, followed by N(4,m-2).
(3) Three dice may split on the first roll and the six replacement dice giving m-3 splits. Thus, C(n,3)p^{n-3}q^3 N(6,m-3).
...
(m) Exactly m might split on the first roll, with the 2m replacement dice giving no splits. Thus, C(n,m)p^{n-m}q^m N(2m,0).
Summing over all C(n,k) p^{n-k} q^k N(2k,m-k) gives the above equality. Note that N(n,0) = (5/6)^n. That's how the probabilities 25/216, 625/2592, 3125/31104, ... were generated (it helps to have a computer for this). Since exactly m splits on n dice gives a total of n+m dice with five possible values each, what this allows one to do is to reduce the problem to a substantially easier one: what is the probability of rolling exactly k using m five-sided die? If that number is B(m,k)/5^m, then the answer to the original question is P(n,k) = N(n,0)B(n,k)/5^n + N(n,1)B(n+1,k)/5^{n+1} + ..., i.e., the probability of no splits and rolling a k on the n dice, plus the probability of one split and rolling a k on the n+1 dice, plus ... .
(1) It is possible to have exactly one split on the first roll, and then the two replacement dice cascade into m-1 splits. Since there are C(n,1) = n possibilities as to which die split, the probability of the former event is C(n,1)p^{n-1}q^1, and of the latter N(2,m-1).
(2) It is also possible to have exactly two splits on the first roll, and then the four replacement dice giving m-2 splits. There are C(n,2) = n(n-1)/2 possibilities as to which two dice split, which probability C(n,2)p^{n-2}q^2, followed by N(4,m-2).
(3) Three dice may split on the first roll and the six replacement dice giving m-3 splits. Thus, C(n,3)p^{n-3}q^3 N(6,m-3).
...
(m) Exactly m might split on the first roll, with the 2m replacement dice giving no splits. Thus, C(n,m)p^{n-m}q^m N(2m,0).
Summing over all C(n,k) p^{n-k} q^k N(2k,m-k) gives the above equality. Note that N(n,0) = (5/6)^n. That's how the probabilities 25/216, 625/2592, 3125/31104, ... were generated (it helps to have a computer for this). Since exactly m splits on n dice gives a total of n+m dice with five possible values each, what this allows one to do is to reduce the problem to a substantially easier one: what is the probability of rolling exactly k using m five-sided die? If that number is B(m,k)/5^m, then the answer to the original question is P(n,k) = N(n,0)B(n,k)/5^n + N(n,1)B(n+1,k)/5^{n+1} + ..., i.e., the probability of no splits and rolling a k on the n dice, plus the probability of one split and rolling a k on the n+1 dice, plus ... .
If you didn't understand my instructions on how to make the spreadsheet, I have made it myself for my own curiosity (I have a thing for random number distributions). So, if you want to get the general answer without having to grok what Kuroneko is getting at (which is very interesting, but it might not be your cup of tea), I can post it in either excel or appleworks format... on monday (must leave for class momentarily).
- Kuroneko
- Jedi Council Member
- Posts: 2469
- Joined: 2003-03-13 03:10am
- Location: Fréchet space
- Contact:
Hmm... solving the recurrence exactly turned out to be quite possible. If N(n,m) is the probability that n dice have exactly m splits, with the probability of split for each die q (p = 1-q), then the recurrence relation is N(n,m) = \sum_{k=1}^m{ C(n,k) p^{n-k} q^k N(2k,m-k) }, as above. This turns out to be N(n,m) = n C(n+2m-1,m-1)/m p^{n+m} q^m. For the six-sided die above, p = 5/6 and q = 1/6. For example, N(3,5) = 3 C(12,4)/5 (5/6)^8 (1/6)^5 = 4296875/483729408 ~ 0.89% is the probability that three six-sided dice will split five times.
That simplifies computations greatly, but the number of ways the dice will give a number less than k still involves a double summation. Perhaps it has a simpler form as well.
That simplifies computations greatly, but the number of ways the dice will give a number less than k still involves a double summation. Perhaps it has a simpler form as well.