encoder and decoder for compact representation of the prime numbers

Hi,

The prime numbers can be stored with a density approaching 1 prime per bit by using a primorial set of equations. When the primes are compressed like this, any patterns in the distribution of the primes becomes apparent to see compared to the uncompressed primes in the natural numbers. Compression algorithms are related to pattern recognition so this maximally compact representation of the primes will have the best chance to find patterns in the prime numbers.

The way it works is using a primorial set of formulas to encode the primes into a binary sequence. If the prime occurs at a given index it is a 1 in the sequence and if there is no prime at a given index then there is a 0. The decoder only needs to know which specific primorial was used to encode the binary sequence, and it can then decompress the binary sequence directly into the prime numbers.

Here is an example of compressing the primes into one bit per prime using primorial 30

First step is to find all coprimes of primorial 30 and remove all the unique prime factors.

So for primorial 30 this creates a list with 8 elements:

1,7,11,13,17,19,23,29

This has a count of 8 and that is the maximum number of primes that can occur in any 30 digit spacing to infinity with the digit spacings being ie 1 to 30, 31 to 60, 61 to 90 etc.. That is an interesting result in itself to show that there can only be at most 8 primes in any block of 30 numbers in that sequence.

The primorial list of 8 elements are used to create this set of equations:

y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29

These give the offsets within each 30 number block of where the prime numbers can occur, ie the primes can only occur at those 8 specific offset (1,7,11,13 etc) locations in each 30 number block, except for the first block that has primes 2,3,5 (prime factors of 30)

Now that a primorial has been chosen and the list of elements have been found, the next step to create the binary sequence is to use each of the eight equations to set the bits in the binary sequence to 1 or 0 depending on if there is a prime at each location. Ie for y = 30x + 1 for x=0 to x=maxSize, check all y if they are prime and create the binary sequence using all 8 equations.

Here is how the primes are compressed with this primorial 30 binary compression:

input primes:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113

output binary sequence before being written has 8 bits per block of 30, ceil(113/30)=4 blocks required max = 8*4= 32 bits

block1 block2 block3 block4

00000000 00000000 00000000 00000000

Each block corresponds to a consecutive 30 number group 1 to 30, 31 to

60, 61 to 90, 91 to 120

The prime factors of 30 which are 2,3,5 aren't encoded into the block,

So to fill block1, set x to zero and check each y if it is prime:

y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29

y=1 y=7 y=11 y=13 y=17 y=19 y=23 y=29

All are prime except for 1, so the first 8bit binary block looks like this:

01111111

For the second binary block do the same thing but set x to 1 and check each y if it is prime:

y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29

y=31 y=37 y=41 y=43 y=47 y=49 y=53 y=59

All are prime except for 49, so the second 8bit binary block looks like this (where the zero indicates offset 30x+19=49 is not prime)

11111011

For the third binary block do the same thing but set x to 2 and check each y if it is prime:

y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29

y=61 y=67 y=71 y=73 y=77 y=79 y=83 y=89

All are prime except for 77, so the third 8bit binary block looks like this (where the zero indicates offset 30x+17=77 is not prime)

11110111

For the fourth binary block do the same thing but set x to 3 and check each y if it is prime:

y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29

y=91 y=97 y=101 y=103 y=107 y=109 y=113 y=119

All are prime except for 91 and 119 so the fourth 8bit binary block looks like this (where the zeros indicate offsets 30x+1=91 and 30x+29 are both not prime)

01111110

So all four blocks can be put together (or constructed up to "maxsize")

block1 block2 block3 block4

00000000 00000000 00000000 00000000

becomes:

block1 block2 block3 block4

01111111 11111011 11110111 01111110

This encodes all primes up to 119 with just 32bits stored.

It is interesting to look for patterns in this maximally compressed representation of the primes, ie with an additional compression algorithm applied to it etc, or by turning it into a raster with each bit representing a pixel or each byte representing a greyscale pixel etc (just for fun)

SECOND STEP:

To decode the binary sequence it must be know what primorial was used to create it and then it can be decoded. So ideally the sequence will be stored as a file named: compressedPrimes30Primorial.bin etc to indicate which primorial was used, as different primorials will make different binary sequences of compressed primes (larger primorials will encode the primes more efficiently, as the ratio of the count of coprimes of large primorials to the primorial size approaches log(n), just like the prime number counting formula.

So once the primorial is known and a binary sequence to decode exists, the first step is to regenerate the primorial 8 equations again, just like the encode step above, by finding the coprimes of the primorial, and removing the prime factors to get the same 8 equations:

y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29

The decoder will provide a "lookup table" of all the primes with the highest compression possible.

First step is to decode block1, so set x=0 and then take the block1 compressed digits: 01111111

y = 30x + 1 y = 30x + 7 y = 30x + 11 y = 30x + 13 y = 30x + 17 y = 30x + 19 y = 30x + 23 y = 30x + 29

The 8bits serves as a mask to remove the first equation, and then solve the equations for x=0

y = 30x + 1 (masked - no calculation done as not prime) y = 7 y = 11 y = 13 y = 17 y = 19 y = 23 y = 29

So before entering those primes first the special prime factors of primorial 30 (2,3,5) have to be entered into the prime list being constructed, then the 7 new primes can be added to give the sequence of the primes up to 29 (just from 8 encoded bits!)

2,3,5,7,11,13,17,19,23,29

Now more primes can be appended on or looked up at a desired range by using the compressed binary sequence.

Ie to add on more primes set x=1 and then take the block2 compressed digits: 11111011

The 8bits serves as a mask to remove the sixth equation, and then solve the equations for x=1

y = 31 y = 37 y = 41 y = 43 y = 47 y = 30x + 19 (masked - no calculation done as not prime) y = 53 y = 59

Ok so now we can add 7 more primes.

2,3,5,7,11,13,17,19,23,29 becomes 2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59

looks good so far just like the primes so the decoder works fine at least on paper. Using a very large primorial is a good idea too as the prime density per bit (or bits required per prime) is more compact.

Very large primes can still be encoded with just a single bit, and only BigInteger math is required once in the last step for the formula y = m * x + b but the binary sequence size is small! :D

cheers, Jamie

Reply to
Jamie Morken
Loading thread data ...
7,19,23,29,31,37,41,43,47,53,59

As the density of primes drops asymptotically towards zero for large primes, it doesn't matter how large a number of you choose, the packing efficiency will also drop towards zero.

Sylvia.

Reply to
Sylvia Else

I do not think so, you could just give n for the nth prime. This is the most compact representation. Decoding will be difficult, of cou rse. If you cannot access a full table you could store only the primes for power s of two, or similar, then you have a starting point for the search.

Andreas

Reply to
acd

Hi,

Ya the packing density is optimal for an infinite primorial as there are log(n) coprimes for the infinite primorial (just the sequence of primes)

This is similar to base(e) being the highest information density base, ie it is higher than base2 or base3 (base3 is the highest information density storage integer)

cheers, Jamie

Reply to
Jamie Morken

You can't just treat infinity as if it were a number. It's not. There is no such thing as an infinite primorial. All primorials are finite.

Sylvia.

Reply to
Sylvia Else

A primorial number is a product of primes, if there are infinite primes, then there is a primorial that is the product of them all.

cheers, Jamie

Reply to
Jamie Morken

I found lots of the same stuff of what I was talking about for encoding and decoding the primes:

formatting link

formatting link

formatting link

formatting link

formatting link

formatting link

cheers, Jamie

Reply to
Jamie Morken

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.