Hi,
The prime numbers can be stored with a density approaching 1 prime per bit by using a primorial set of equations. When the primes are compressed like this, any patterns in the distribution of the primes becomes apparent to see compared to the uncompressed primes in the natural numbers. Compression algorithms are related to pattern recognition so this maximally compact representation of the primes will have the best chance to find patterns in the prime numbers.
The way it works is using a primorial set of formulas to encode the primes into a binary sequence. If the prime occurs at a given index it is a 1 in the sequence and if there is no prime at a given index then there is a 0. The decoder only needs to know which specific primorial was used to encode the binary sequence, and it can then decompress the binary sequence directly into the prime numbers.
Here is an example of compressing the primes into one bit per prime using primorial 30
First step is to find all coprimes of primorial 30 and remove all the unique prime factors.
So for primorial 30 this creates a list with 8 elements:
1,7,11,13,17,19,23,29This has a count of 8 and that is the maximum number of primes that can occur in any 30 digit spacing to infinity with the digit spacings being ie 1 to 30, 31 to 60, 61 to 90 etc.. That is an interesting result in itself to show that there can only be at most 8 primes in any block of 30 numbers in that sequence.
The primorial list of 8 elements are used to create this set of equations:
y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29
These give the offsets within each 30 number block of where the prime numbers can occur, ie the primes can only occur at those 8 specific offset (1,7,11,13 etc) locations in each 30 number block, except for the first block that has primes 2,3,5 (prime factors of 30)
Now that a primorial has been chosen and the list of elements have been found, the next step to create the binary sequence is to use each of the eight equations to set the bits in the binary sequence to 1 or 0 depending on if there is a prime at each location. Ie for y = 30x + 1 for x=0 to x=maxSize, check all y if they are prime and create the binary sequence using all 8 equations.
Here is how the primes are compressed with this primorial 30 binary compression:
input primes:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113output binary sequence before being written has 8 bits per block of 30, ceil(113/30)=4 blocks required max = 8*4= 32 bits
block1 block2 block3 block4
00000000 00000000 00000000 00000000Each block corresponds to a consecutive 30 number group 1 to 30, 31 to
60, 61 to 90, 91 to 120The prime factors of 30 which are 2,3,5 aren't encoded into the block,
So to fill block1, set x to zero and check each y if it is prime:
y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29
y=1 y=7 y=11 y=13 y=17 y=19 y=23 y=29
All are prime except for 1, so the first 8bit binary block looks like this:
01111111For the second binary block do the same thing but set x to 1 and check each y if it is prime:
y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29
y=31 y=37 y=41 y=43 y=47 y=49 y=53 y=59
All are prime except for 49, so the second 8bit binary block looks like this (where the zero indicates offset 30x+19=49 is not prime)
11111011For the third binary block do the same thing but set x to 2 and check each y if it is prime:
y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29
y=61 y=67 y=71 y=73 y=77 y=79 y=83 y=89
All are prime except for 77, so the third 8bit binary block looks like this (where the zero indicates offset 30x+17=77 is not prime)
11110111For the fourth binary block do the same thing but set x to 3 and check each y if it is prime:
y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29
y=91 y=97 y=101 y=103 y=107 y=109 y=113 y=119
All are prime except for 91 and 119 so the fourth 8bit binary block looks like this (where the zeros indicate offsets 30x+1=91 and 30x+29 are both not prime)
01111110So all four blocks can be put together (or constructed up to "maxsize")
block1 block2 block3 block4
00000000 00000000 00000000 00000000becomes:
block1 block2 block3 block4
01111111 11111011 11110111 01111110This encodes all primes up to 119 with just 32bits stored.
It is interesting to look for patterns in this maximally compressed representation of the primes, ie with an additional compression algorithm applied to it etc, or by turning it into a raster with each bit representing a pixel or each byte representing a greyscale pixel etc (just for fun)
SECOND STEP:
To decode the binary sequence it must be know what primorial was used to create it and then it can be decoded. So ideally the sequence will be stored as a file named: compressedPrimes30Primorial.bin etc to indicate which primorial was used, as different primorials will make different binary sequences of compressed primes (larger primorials will encode the primes more efficiently, as the ratio of the count of coprimes of large primorials to the primorial size approaches log(n), just like the prime number counting formula.
So once the primorial is known and a binary sequence to decode exists, the first step is to regenerate the primorial 8 equations again, just like the encode step above, by finding the coprimes of the primorial, and removing the prime factors to get the same 8 equations:
y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29
The decoder will provide a "lookup table" of all the primes with the highest compression possible.
First step is to decode block1, so set x=0 and then take the block1 compressed digits: 01111111
y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29
The 8bits serves as a mask to remove the first equation, and then solve the equations for x=0
y = 30x + 1 (masked - no calculation done as not prime) y = 7 y = 11 y = 13 y = 17 y = 19 y = 23 y = 29
So before entering those primes first the special prime factors of primorial 30 (2,3,5) have to be entered into the prime list being constructed, then the 7 new primes can be added to give the sequence of the primes up to 29 (just from 8 encoded bits!)
2,3,5,7,11,13,17,19,23,29Now more primes can be appended on or looked up at a desired range by using the compressed binary sequence.
Ie to add on more primes set x=1 and then take the block2 compressed digits: 11111011
The 8bits serves as a mask to remove the sixth equation, and then solve the equations for x=1
y = 31 y = 37 y = 41 y = 43 y = 47 y = 30x + 19 (masked - no calculation done as not prime) y = 53 y = 59
Ok so now we can add 7 more primes.
2,3,5,7,11,13,17,19,23,29 becomes 2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59looks good so far just like the primes so the decoder works fine at least on paper. Using a very large primorial is a good idea too as the prime density per bit (or bits required per prime) is more compact.
Very large primes can still be encoded with just a single bit, and only BigInteger math is required once in the last step for the formula y = m * x + b but the binary sequence size is small! :D
cheers, Jamie