algorithm that can distinguish between a sequence of PI digits and a sequence of pseudorandom digits of equal length


Jamie M 10 years ago

Hi,

Is it possible to distinguish between digits of PI and digits of a pseudorandom sequence, without using any PI digit lookup table or calculating digits of PI? For example, a 100,000 digit sequence of consecutive digits in PI from any starting point compared to a sequence of 100,000 pseudorandom digits.

cheers, Jamie


Jamie M 10 years ago

I made an algorithm that shows a bias of roughly 1%, varying with the comparison sequence length, when comparing digits of PI to digits of my pseudorandom sequence, so I was just curious! :D

cheers, Jamie


John Larkin 10 years ago

I read somewhere that the digits of pi pass all known tests for randomness. So, no.


Jamie M 10 years ago

Hi,

There are definitely some periodic patterns in pi. The algorithm I made is a compression algorithm that compresses based on periodicity, and pi is about 1% more compressed than the pseudorandom sequence up to the number of digits I have been able to analyze so far, which is a 1 million digit sequence length.

I tested it at many different scales and starting points in the sequence, here are the results if interested:

The column on the left (0, 100, 1000, 10000 etc.) is the first element of the sequence that was compressed, and the top row (100, 1000, 10000, up to 800000) is the total length of consecutive digits that were compressed.

pseudorandom sequence results:

formatting link

pi sequence results:

formatting link

graph showing the percent different compression of the pi sequences versus the pseudorandom sequences:

formatting link

I would like to test it on some sequences with longer lengths than 1 million, ideally up to 10 million at least, to see what that curve does.

My compression algorithm can identify periodic patterns in sequences so that is what was detected in the pi sequence, whereas the pseudorandom sequence is the noisiest sequence I compressed so far.
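For illustration only, here is a minimal Python sketch of this kind of comparison. Jamie's algorithm was not posted, so zlib is used as a stand-in compressor (an assumption), and the names random_digits, compressed_size and size_ratio are hypothetical. A strictly periodic digit string stands in for a "patterned" candidate, since no pi digits are loaded here.

```python
import random
import zlib

def random_digits(n, seed):
    """Return n pseudorandom decimal digits as a string (Mersenne Twister)."""
    rng = random.Random(seed)
    return "".join(rng.choice("0123456789") for _ in range(n))

def compressed_size(digits):
    """Size in bytes of the zlib-compressed digit string (stand-in compressor)."""
    return len(zlib.compress(digits.encode("ascii"), 9))

def size_ratio(control, candidate):
    """Ratio above 1 means the candidate compressed smaller than the control."""
    return compressed_size(control) / compressed_size(candidate)

control = random_digits(100_000, seed=1)
other_random = random_digits(100_000, seed=2)
periodic = "0123456789" * 10_000  # same length, fully periodic

# Two independent pseudorandom sequences should land close to 1.
print("random vs random:  ", size_ratio(control, other_random))
# A truly periodic sequence compresses far better than the control.
print("random vs periodic:", size_ratio(control, periodic))
```

To test a real candidate, the digits of interest would replace `periodic`. A general-purpose compressor may well be less sensitive to weak periodicity than a purpose-built one, so a ratio near 1 from zlib would not by itself rule out the effect described above.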

If anyone has a sequence they think might be noisier (less periodic) than the pseudorandom sequence I tested please send me a link to it and I will compress it to see how it compares to the pseudorandom one.

Also any other sequences considered random that you think might have some patterns are welcome :D

I will release the algorithm opensource at some point if it is useful.

cheers, Jamie


Jamie M 10 years ago

Hi,

It looks like, as the comparison sequence lengths increase, the PI digits appear closer to random, i.e. in the graph on the right here:

formatting link

There is an apparent decay towards 1, which if reached would mean equal compression in my algorithm for pi and the pseudorandom sequence. I tested up to 1 million digits just now and there is further decay towards 1: the 800000 digit sequence had a 1.01284538 ratio difference and the 1 million digit sequence had a 1.008441754 ratio difference between the pi and pseudorandom sequences. For a 10 million length sequence I think it will approach 1 but I don't know for sure.

Anyway, for sequence lengths between 10,000 and 1 million, there is a clear difference between the compression for pi and for the pseudorandom sequence, tested over multiple starting points within 1 million digit sequences. So to check whether a sequence is from pi or from a pseudorandom sequence, it can be subdivided if necessary into 1 million digit lengths and checked with this algorithm.

The fact that there is a small periodicity in the pi sequence that diminishes with sequence length is interesting I think.

cheers, Jamie


Jamie M 10 years ago

Here is a semi-periodicity with a very low probability of occurrence, in the digits of 1/pi, from the page:

formatting link

4444444444444 : from 917,885,346,865-th of 1/pi
4444444444444 : from 1,828,219,364,949-th of 1/pi

Thirteen consecutive 4s in the digits of 1/pi, and they appear again at about 1.99177 times (almost 2x) the first sequence index.

cheers, Jamie


Adrian Jansen 10 years ago

You misunderstand the nature of probability if you think that a low probability of occurrence of some sequence means that it should NEVER occur within a sample much shorter than the interval that probability implies.
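A rough worked version of that point, as a hedged Python sketch. It assumes the digits behave like independent uniform digits and that the search covered digits up to the end of the second run quoted above; the real search range of the linked page is not known here, and the helper name expected_occurrences is hypothetical.

```python
def expected_occurrences(total_digits, run_length):
    """Expected count of ONE specific run_length-digit string among
    total_digits independent uniform digits (linearity of expectation)."""
    positions = total_digits - run_length + 1
    return positions / 10 ** run_length

first_index = 917_885_346_865
second_index = 1_828_219_364_949

# Assumption: the search stopped right after the second 13-digit run.
digits_searched = second_index + 12

thirteen_fours = expected_occurrences(digits_searched, 13)
any_repeated_digit = 10 * thirteen_fours  # thirteen 0s, thirteen 1s, ... thirteen 9s

print("expected runs of thirteen 4s:      ", thirteen_fours)
print("expected runs of any thirteen-same:", any_repeated_digit)
print("index ratio:", second_index / first_index)
```

Under these assumptions the expected number of thirteen-digit runs of some repeated digit comes out at a little under two, so finding a couple of them in a search of this size is not by itself evidence of structure.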


DAB 10 years ago

Have you seen this:


Jamie M 10 years ago

Hi,

Ya was reading that today hehe

cheers, Jamie


Jamie M 10 years ago

Hi,

Just because there is a near equal (approaching equal) distribution of the digits 0 to 9 in pi doesn't mean that it can't still be partly periodic too.
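A small Python sketch of this distinction, using an artificial sequence rather than pi: the string "0123456789" repeated has a perfectly equal digit distribution and is also completely periodic. The helper name digit_chi_square is hypothetical, and the statistic is the ordinary Pearson chi-square against a uniform split.

```python
from collections import Counter

def digit_chi_square(digits):
    """Pearson chi-square statistic of digit counts against a uniform 0-9 split."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in "0123456789")

periodic = "0123456789" * 1000

# Every digit appears exactly 1000 times, so the frequency test sees nothing.
print(digit_chi_square(periodic))          # 0.0
# Yet the sequence repeats with period 10.
print(periodic[10:] == periodic[:-10])     # True
```

So a digit-frequency test alone cannot rule periodicity in or out; it takes tests that look at the order of the digits, not just their counts.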

cheers, Jamie


DAB 10 years ago

I stumbled on it yesterday.


rickman 10 years ago

"Near equal" distribution is what random requires. There are lots of requirements and so far the decimal expansion of pi fits them all. Your search for periodicity will clearly show some results for shorter sequences. That is actually required by the rules of random sequences. But as you found, as you increase the length of the data examined, the detected periodicity fades away. It will never reach zero though (or 1 as the case may be) because once it is there, any finite sequence will still contain it. But as you increase the length of the sequence, the random segments with periodicity will show up at all the other rates, making them all equal as you approach infinity.

If you really want to show this, take the 10 million digit sequence and look for the actual periods in arbitrary sized pieces. Each one will show periods with different cycle lengths.
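One way that experiment might be sketched in Python, shown here on pseudorandom digits so it runs without any pi data. The measure used (fraction of digits equal to the digit `lag` places later) and the names match_rate and strongest_lag are assumptions for illustration, not the method used earlier in the thread.

```python
import random

def match_rate(digits, lag):
    """Fraction of positions i where digits[i] == digits[i + lag]."""
    pairs = len(digits) - lag
    hits = sum(1 for i in range(pairs) if digits[i] == digits[i + lag])
    return hits / pairs

def strongest_lag(digits, max_lag=50):
    """Return (lag, rate) for the lag in 1..max_lag with the highest match rate."""
    candidates = ((lag, match_rate(digits, lag)) for lag in range(1, max_lag + 1))
    return max(candidates, key=lambda item: item[1])

rng = random.Random(0)
digits = "".join(rng.choice("0123456789") for _ in range(20_000))

# Split into equal pieces and report the apparent "period" of each piece.
chunks = [digits[i:i + 2_000] for i in range(0, len(digits), 2_000)]
for index, chunk in enumerate(chunks):
    lag, rate = strongest_lag(chunk)
    print(f"chunk {index}: strongest lag {lag}, match rate {rate:.3f}")
```

For random digits the match rate at any lag should sit near 0.1, and the best lag in each chunk is just whichever one happened to fluctuate highest, so it would typically differ from chunk to chunk. A real period would instead show the same lag winning in every chunk.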

Rick


Jamie M 10 years ago

Hi,

Thanks, I agree that is likely. But I am also using the pseudorandom sequence as a comparison, to cancel out the effects of randomly appearing periodicity at each sequence length, and I found that pi has more periodicity than the pseudorandom sequences checked so far (up to a length of 1 million digits). I agree that over longer sequences it will likely converge to the same periodicity as the pseudorandom sequence.

But how do you explain the possible increased "local" periodicity of pi compared to a pseudorandom sequence?! I would like to get some real world random noise data to compare, at least 10million digits.
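One hedged way to put a number on "more than the pseudorandom sequence" is to compress many independent pseudorandom sequences and see how much their compressed sizes scatter among themselves. The Python sketch below again uses zlib as a stand-in compressor (the thread's algorithm was not posted); the names compressed_size_for_seed and z_score, the sequence length and the number of seeds are all arbitrary choices for illustration.

```python
import random
import statistics
import zlib

def compressed_size_for_seed(seed, length=20_000):
    """Compressed size in bytes of `length` pseudorandom digits from one seed."""
    rng = random.Random(seed)
    digits = "".join(rng.choice("0123456789") for _ in range(length))
    return len(zlib.compress(digits.encode("ascii"), 9))

# Control distribution: many pseudorandom sequences of the same length.
sizes = [compressed_size_for_seed(seed) for seed in range(30)]
mean_size = statistics.mean(sizes)
spread = statistics.stdev(sizes)

def z_score(candidate_size):
    """How many control standard deviations the candidate lies from the control mean."""
    return (candidate_size - mean_size) / spread

print("control mean:", mean_size, "bytes, standard deviation:", spread)
print("relative spread:", spread / mean_size)
```

A candidate sequence of the same length would be compressed the same way and passed to `z_score`. A difference only counts for much if it is large compared with the scatter between the controls, which is why a single pseudorandom sequence is a weak baseline.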

cheers, Jamie


rickman 10 years ago

I think your "periodicity" is not real. You are seeing expected small apparent "predictability" which is not valid for longer sequences, or even the same across smaller sequences. You can't seem to accept that. If there were some predictability in the digits of pi, you would see the same results for random length sequences.

Rick


David Brown 10 years ago

No, there are no periodic patterns in pi - at least, none found so far (and others have tested /far/ better than you). It has not been proven to be "normal" (meaning, in effect, that its digits are random), but it is strongly suspected.

Any given pseudorandom sequence will, of course, have some pattern.

And you will always expect to see some difference in the compressibility of different sequences, even if they are random. The question is, are those differences statistically relevant?

For example, if you toss a perfect coin 100 times, you should not be surprised if you get 55 heads and 45 tails, and that is not an indication of bias in the coin.
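The coin example can be checked exactly with a few lines of Python (the function name prob_at_least is just for illustration):

```python
from math import comb

def prob_at_least(heads, tosses):
    """Exact P(X >= heads) for a fair coin tossed `tosses` times."""
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses

one_sided = prob_at_least(55, 100)
print("P(at least 55 heads in 100):", one_sided)        # roughly 0.18
print("P(55 or more of either face):", 2 * one_sided)   # roughly 0.37
```

So a fair coin gives a split at least as lopsided as 55/45 more than a third of the time.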

I am sure google will tell you. I believe you can easily find example sequences to download for your tests.


David Eather 10 years ago

To be 90% certain you are detecting a bias and not a random artifact, you would need to test something like 10,000 samples.


Jan Panteltje 10 years ago

On a sunny day (Tue, 27 Oct 2015 18:04:37 -0700) it happened Jamie M wrote in :

The circle is round. If Pi was not what it is it would not be round and you could tell where you were at the surface.


Jamie M 10 years ago

Hi,

Here's an update showing higher resolution. A sine wave pattern seems to have appeared in the graphs. Some of the data points are averaged over only 5 samples, so I think it may need more averaging still to show whether there is a definite pattern and not just noise.

Previous low resolution graphs:

formatting link

New higher resolution graphs:

formatting link

Updated data:

formatting link

cheers, Jamie


Jamie M 10 years ago

The higher resolution graphs seem to show that there might be a sine wave modulation in the bias of compression difference between pi and the pseudorandom sequence over the compressed sequence length used.

So the compression difference modulates up and down as compressed sequence length increases!

formatting link

cheers, Jamie


Tom Del Rosso 10 years ago

The patterns are nothing compared to the patterns in other irrationals.

e = 2.7 1828 1828 4590 4523

4 digits in a row repeat! And 45x2=90 and 45/2=23 (after a carry increments the 3).

sqrt(2) = 1.4 14 213562373095 0488 01688

2 digits repeat at the start. Then 0 and 88, first with 2^2 in the middle (0 4 88), then with 2^4 in the middle (0 16 88).

You find similar patterns in other square roots and the golden ratio (related to sqrt of 5). But those numbers are all equal to mathematical expressions. There is no mathematical expression equal to pi. It's impossible for a mathematical expression to produce randomness, but pi is defined by physical space, and physical space is intrinsically capable of randomness.

Have you found obvious periodic patterns in pi? In the first 32 digits there are no zeros and too many threes, but in the long run it appears random AFAIK, and allegedly according to statistical analysis.

What kind of compression algorithm?
