Is it possible to distinguish between digits of PI and digits of a pseudorandom sequence, without using any PI digit lookup table or calculating digits of PI? For example, a 100,000 digit sequence of consecutive digits in PI from any starting point compared to a sequence of 100,000 pseudorandom digits.
I made an algorithm that shows a bias of roughly 1%, varying with the comparison sequence length, when comparing digits of PI to digits of my pseudorandom sequence, so I was just curious! :D
There are definitely some periodic patterns in pi. The algorithm I made is a compression algorithm that compresses based on periodicity, and pi is about 1% more compressed than the pseudorandom sequence up to the number of digits I have been able to analyze so far, which is a 1 million digit sequence length.
I tested it at many different scales and starting points in the sequence, here are the results if interested:
The column on the left (0, 100, 1000, 10000, etc.) is the first element of the sequence that was compressed, and the top row (100, 1000, 10000, up to 800000) is the total length of consecutive digits that were compressed.
pseudorandom sequence results:
formatting link
pi sequence results:
formatting link
graph showing the percent difference in compression of the pi sequences versus the pseudorandom sequences:
formatting link
I would like to test it on some sequences with lengths longer than 1 million, ideally up to 10 million at least, to see what that curve does.
My compression algorithm can identify periodic patterns in sequences, so that is what was detected in the pi sequence, whereas the pseudorandom sequence is the noisiest sequence I have compressed so far.
If anyone has a sequence they think might be noisier (less periodic) than the pseudorandom sequence I tested, please send me a link to it and I will compress it to see how it compares to the pseudorandom one.
Also any other sequences considered random that you think might have some patterns are welcome :D
I will release the algorithm opensource at some point if it is useful.
It looks like, as the comparison sequence lengths increase, the PI digits appear closer to random, i.e. in the graph on the right here:
formatting link
There is an apparent decay towards 1, which if reached would mean equal compression in my algorithm for pi and the pseudorandom sequence. I tested up to 1 million digits just now and there is further decay towards 1: the 800000 digit sequence had a compression ratio of 1.01284538 between the pi and pseudorandom sequence, and the 1 million digit sequence had a ratio of 1.008441754. For a 10 million length sequence I think it will approach 1 but I don't know for sure.
Anyway, for sequence lengths between 10,000 and 1 million, there is a clear difference between the compression for pi and the pseudorandom sequence, tested over multiple starting points within 1 million digit sequences. So to check if a sequence is from pi or from a pseudorandom sequence, it can be subdivided if necessary into 1 million digit lengths and checked with this algorithm.
The fact that there is a small periodicity in the pi sequence that diminishes with sequence length is interesting I think.
You misunderstand the nature of probability if you think that a low probability of occurrence of some sequence means that it should NEVER occur within a sample much shorter than that probability would suggest.
"Near equal" distribution is what random requires. There are lots of requirements and so far the decimal expression of pi fits them all. Your search for periodicity will clearly show some results for shorter sequences. That is actually required by the rules of random sequences. But as you found, as you increase the length of the data examined, the detected periodicity fades away. It will never reach zero though (or 1 as the case may be) because once it is there, any finite sequence will still contain that. But as you increase the length of the sequence the random segments with periodicity will show up at all the other rates making them all equal as you approach infinity.
If you really want to show this, take the 10 million digit sequence and look for the actual periods in arbitrary sized pieces. Each one will show periods with different cycle lengths.
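A rough sketch of that check, for anyone who wants to try it. This is not the compression algorithm discussed in this thread; it is a plain lag-matching test, and the names `lag_match_rate` and `strongest_lags` are made up for illustration. For independent uniform decimal digits, the chance that a digit equals the digit `lag` places later is 0.1, so the sketch reports the lags whose match rate strays furthest from 0.1 in each piece. The demo uses Python's own pseudorandom digits as a stand-in for whatever sequence you want to examine.

```python
import random

EXPECTED_RATE = 0.1  # chance that two independent uniform decimal digits match


def lag_match_rate(digits: str, lag: int) -> float:
    """Fraction of positions i where digits[i] == digits[i + lag]."""
    pairs = len(digits) - lag
    if lag < 1 or pairs < 1:
        raise ValueError("lag must be at least 1 and shorter than the sequence")
    matches = sum(1 for i in range(pairs) if digits[i] == digits[i + lag])
    return matches / pairs


def strongest_lags(digits: str, max_lag: int, top: int = 3):
    """Return the `top` (lag, rate) pairs whose rate deviates most from 0.1."""
    rates = [(lag, lag_match_rate(digits, lag)) for lag in range(1, max_lag + 1)]
    rates.sort(key=lambda item: abs(item[1] - EXPECTED_RATE), reverse=True)
    return rates[:top]


if __name__ == "__main__":
    # Stand-in data: replace `sequence` with the digits you want to examine.
    rng = random.Random(1)
    sequence = "".join(rng.choice("0123456789") for _ in range(20000))
    piece_size = 5000  # arbitrary piece size, chosen only for the demo
    for start in range(0, len(sequence), piece_size):
        piece = sequence[start:start + piece_size]
        print(start, strongest_lags(piece, max_lag=50))
```

If the apparent periods are only chance, the lags reported for each piece should differ from piece to piece rather than repeat.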
Thanks, I agree that is likely. But I am also using the pseudorandom sequence as a comparison to cancel out the effects of randomly appearing periodicity at each sequence length, and I found that pi has more periodicity than the pseudorandom sequences checked so far (up to a length of 1 million digits), even though over longer sequences it will likely converge to the same periodicity as the pseudorandom sequence.
But how do you explain the possible increased "local" periodicity of pi compared to a pseudorandom sequence?! I would like to get some real world random noise data to compare, at least 10million digits.
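One way to put a comparison like this on firmer ground is to measure the candidate against many pseudorandom sequences instead of a single one, so the natural spread of compressed sizes is known. The sketch below is only an illustration under assumptions: it uses `zlib` as a stand-in compressor (not the periodicity-based algorithm from this thread), Python's `random` module as the pseudorandom source, and the helper names are hypothetical.

```python
import random
import statistics
import zlib


def compressed_size(digits: str) -> int:
    """Size in bytes of the zlib-compressed digit string (stand-in compressor)."""
    return len(zlib.compress(digits.encode("ascii"), 9))


def random_digits(length: int, rng: random.Random) -> str:
    """Pseudorandom decimal digit string of the given length."""
    return "".join(rng.choice("0123456789") for _ in range(length))


def z_score_vs_random(candidate: str, trials: int = 30, seed: int = 0) -> float:
    """How many standard deviations the candidate's compressed size lies from
    the mean compressed size of `trials` pseudorandom strings of equal length.
    Negative values mean the candidate compressed better than the references."""
    rng = random.Random(seed)
    sizes = [compressed_size(random_digits(len(candidate), rng)) for _ in range(trials)]
    spread = statistics.stdev(sizes)
    if spread == 0:
        raise ValueError("reference sizes have no spread; use longer sequences or more trials")
    return (compressed_size(candidate) - statistics.mean(sizes)) / spread


if __name__ == "__main__":
    other_rng = random.Random(12345)
    print("random candidate:", z_score_vs_random(random_digits(10000, other_rng)))
    print("periodic candidate:", z_score_vs_random("0123456789" * 1000))
```

A score within a few standard deviations of zero is what chance alone would produce; only a score well outside that range, repeated across starting points, would suggest real structure.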
I think your "periodicity" is not real. You are seeing expected small apparent "predictability" which is not valid for longer sequences, or even the same across smaller sequences. You can't seem to accept that. If there were some predictability in the digits of pi, you would see the same results for random length sequences.
No, there are no periodic patterns in pi - at least, none found so far (and others have tested /far/ better than you). It has not been proven to be "normal" (meaning, in effect, that its digits are random), but it is strongly suspected.
Any given pseudorandom sequence will, of course, have some pattern.
And you will always expect to see some difference in the compressibility of different sequences, even if they are random. The question is, are those differences statistically relevant?
For example, if you toss a perfect coin 100 times, you should not be surprised if you get 55 heads and 45 tails, and that is not an indication of bias in the coin.
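The coin example can be worked through numerically. This is a minimal sketch using only the standard binomial model; the function names are made up for illustration. It computes how many standard deviations 55 heads is from the expected 50, and the exact probability of a result at least that far from 50 in either direction.

```python
import math


def binomial_z(successes: int, trials: int, p: float = 0.5) -> float:
    """Standard deviations between the observed count and the expected count."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (successes - mean) / sd


def two_sided_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact probability of a count at least as far from the mean as observed."""
    mean = trials * p
    deviation = abs(successes - mean)
    total = 0.0
    for k in range(trials + 1):
        if abs(k - mean) >= deviation:
            total += math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
    return total


if __name__ == "__main__":
    print("z =", binomial_z(55, 100))
    print("two-sided probability =", two_sided_tail(55, 100))
```

For 55 heads in 100 tosses the result is one standard deviation from the mean, and a fair coin lands at least that far from 50 more than a third of the time, which is why it is no evidence of bias.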
I am sure google will tell you. I believe you can easily find example sequences to download for your tests.
Here's an update showing higher resolution. A sine wave pattern seems to have appeared in the graphs. Some of the points have only 5 samples for averaging, so I think it may need more averaging still to show whether there is a definite pattern and not just noise.
The higher resolution graphs seem to show that there might be a sine wave modulation in the bias of compression difference between pi and the pseudorandom sequence over the compressed sequence length used.
So the compression difference modulates up and down as compressed sequence length increases!
The patterns are nothing compared to the patterns in other irrationals.
e = 2.7 1828 1828 4590 4523
4 digits in a row repeat! And 45x2=90 and 45/2=23 (after a carry increments the 3).
sqrt(2) = 1.4 14 213562373095 0488 01688
2 digits repeat at the start. Then 0 and 88, first with 2^2 in the middle (0 4 88), then with 2^4 in the middle (0 16 88).
You find similar patterns in other square roots and the golden ratio (related to sqrt of 5). But those numbers are all equal to mathematical expressions. There is no mathematical expression equal to pi. It's impossible for a mathematical expression to produce randomness, but pi is defined by physical space, and physical space is intrinsically capable of randomness.
Have you found obvious periodic patterns in pi? In the first 32 digits there are no zeros and too many threes, but in the long run it appears random AFAIK, and allegedly according to statistical analysis.
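The remark about the first 32 digits can be checked with a simple count. The sketch below hardcodes the first 32 digits of pi (the leading 3 plus 31 decimals) and computes a chi-square statistic against a uniform expectation of 3.2 occurrences per digit; the function name is made up for illustration. For reference, with 9 degrees of freedom the usual 5% threshold is about 16.92.

```python
from collections import Counter

# First 32 digits of pi, including the leading 3.
PI_FIRST_32 = "31415926535897932384626433832795"


def digit_chi_square(digits: str):
    """Return (counts per digit, chi-square statistic against a uniform spread)."""
    counts = Counter(digits)
    expected = len(digits) / 10
    statistic = sum((counts.get(d, 0) - expected) ** 2 / expected for d in "0123456789")
    return counts, statistic


if __name__ == "__main__":
    counts, statistic = digit_chi_square(PI_FIRST_32)
    for d in "0123456789":
        print(d, counts.get(d, 0))
    print("chi-square =", statistic)
```

The count does show no zeros and seven threes, but the statistic stays well below the threshold, so a sample this small gives no grounds to call the digits non-random.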