Thanks for the comments (Andrew & JJ). Let me explain a bit. You asked for this! :-)
I don't need an FFT because I don't need to multiply numbers. The sieving can be performed on 'small' integers (64-bits gives a very large scope) using addition, subtraction, left shifting by one bit and comparisons.
No multiplication and no floating-point makes it much easier.
Proth sieving has some unique characteristics that make it different from the Mersenne primes research.
The main goal is to maximise numbers sieved based on time. This splits into two seperate goals:-
- Maximising the speed of each individual siever.
- Maximising the number of sievers on the FPGA.
Obviously pipelining may be required to complete one iteration of the loop within one clock cycle, but then that may increase the number of slices used by each siever, bringing down the number of sievers that I can place on the FPGA.
I'm playing around with a Spartan-3 starter kit as that is the only platform I have for development. This is a hobby project (not University project), if I can get reasonable performance then I may consider moving up to a larger FPGA with more gates (to get more sievers).
I'll only ever be working with dev kits. I don't intend (unless the performance is stunning) that I'd ever consider to build anything with multiple FPGAs, although my inner-geek would love to have a box with lots of blinkenlights and serious amounts of processing power.
The 50MHz comment was purely my brain not working. I had glanced at the description and the mentioning of a 50MHz oscillator had stuck in my brain. I then assumed that the final solution would be limited to 50MHz. I've no idea how quick an XC3S200 can actually run, or would run with this 'design'. (could anyone give a rough guess?)
As for the actual problem I want to solve, here's a 'quick' description of Proth sieving and why I want to do it.
Proth numbers are numbers of the form k*2^n+1 where
2^n > k. k should be odd (for obvious reasons).
A Sierpinski number (of the second kind) is a value of k so that k*2^n+1 is composite for all k.
k=78557 will always produce composite numbers for any positive n you pick. This has been proven.
To prove that a k is not a Sierpinski number one must find a value for n such that k*2^n+1 produces a prime number.
The Sierpinski conjecture is that 78557 is the lowest value of k possible. Of the 39278 other odd candidates, primes have been found for all but 9.
The conjecture is regarded as true by most people and so primes for the other 9 k-values are expected to be found.
There's more on the Sierpinski problem here:-
formatting link
(That URL looks a bit suspect but trust me on this. There used to be 17 candidate k-values but the above project has found primes for 8 of them.)
The above site is running a distributed attack (a la SETI, Cancer@Home, etc) to find primes.
The numbers involved are extremely large. The most recent prime was found for k=27653 at n=9167433.
That's 27653*2^9167433+1, a nice 2,759,677 digits in decimal.
Trying to work with those kinds of things on a FPGA would be madness.
Primality of these numbers can be proven with a Proth Test.
formatting link
can be useful too if you want to read up any more on this.
Trying to prove primality of 2.7 million digit numbers is not an easy task. It can take a month on a 3GHz Intel P4. The equation involved is: let N=k*2^n+1 and if:- a^((N-1)/2) mod N = N-1 then N is prime. The value of a used is usually the lowest odd prime that is not a factor of k. Since the remaining k-values being tested are all prime then a=3 usually works.
Now consider the info above. In the above case N is a 2.7 million digit number. (N-1)/2 may have the same, or one digit less than N. Still an insanely large number.
3^99 (a 2 digit number) results in a 49 digit number.
3^999 (a 3 digit number) results in a 478 digit number.
Now consider 3^N where N has 2.7 million digits.
Big.
Anyway, on to the sieving.
So the project has produced a large number of k,n pairs that need to be tested. Each Proth test takes a long time.
Sieving is the process of removing candidate numbers by trial division with known primes. For example, one of my computers is checking the range of primes from 882000000000000 to 883000000000000 and, amongst others, it's found that 882347066940847 is a factor of 24737*2^80174983+1.
It has found 17 unique factors in the week that it has been running. That's 17 k,n pairs that do not need to be proth tested. Those proth tests would have run for 17 months on the same machine only to not find any primes.
This is the value of sieving.
OK, so an FPGA won't be good at dealing with large (i.e.
10 million bit) numbers. So why am I here clogging up this newsgroup with my ramblings?
It's because Proth numbers (k*2^n+1) have some properties that make sieving easy.
Say I'm checking when k=10223 and my prime of choice is p=882319864863617.
I compute the remainder of (k*2)+1 mod p. Which, when testing such large primes, is simply (k*2)+1.
r = 10223*2+1 = 20447 (that's for n=1)
Starting with (k*2^n)+1
- subtract the 1 we added on =k*2^n
- double the number (k*2^n)*2 = k*2^(n+1)
- add one = (k*2^(n+1))+1
More importantly, these actions can be performed on the remainder without having to keep track of the actual value of k*2^n+1. So:-
r_next = (2*(r-1))+1 mod p = 2r-1 mod p
The 'mod p' part is quite expensive. Since we are only dealing with a doubling of the previous r (which we knew was guaranteed to be less than p) we know that, at most, r could grow to less than 2p. This is why 'mod p' is replaced by a simple subtraction of p if r >= p.
If r is ever equal to 1 then it will always remain 1, we can stop processing (for the FPGA I wouldn't even bother passing this on as a work unit).
If r is ever equal to 0 then it means that p is factor of k*2^n+1 at whatever n the algorithm has iterated to. We can stop here and record the value of n.
So with relatively little effort we can check for factors for very large values of n without having to keep track of the large (multi-million digit) values. The snag is that we have to compute r for every n along the way.
This is why I can't run a coarser run by skipping values. Each iteration must be performed and the value of r checked. There is no way to predict future values of r without computing them.
There's another stop condition, if we ever get back to the r value we started at (when n=1) then we know we can stop. The r values will just follow the same pattern again. At this point I'd like to stop and record the fact that no factor was found (otherwise I would have stopped before) for this k,p pair.
The modulus operation involved has an important bearing on the size of this loop. Say I'm checking a number against p=3. Consider all of the possible values of r and what they transform to:-
2x-1 mod 3
0 -> (2*0)-1 mod 3 == 2.
1 -> (2
*1)-1 mod 3 == 1.
2 -> (2*2)-1 mod 3 == 0.
The maximum number of values r can have is p, but r=1 is a special case, so the maximum loop size would be p-1.
Some values of p can produce smaller sub-loops.
2x-1 mod 7:
0-> (2*0)-1 mod 7 == 6.
1-> (2
*1)-1 mod 7 == 1.
2-> (2*2)-1 mod 7 == 3.
3-> (2
*3)-1 mod 7 == 5.
4-> (2*4)-1 mod 7 == 0.
5-> (2
*5)-1 mod 7 == 2.
6-> (2*6)-1 mod 7 == 4.
0->6->4->0
1->1
2->3->5->2
So if my initial value of r had been 2 I could loop endlessly without ever finding a factor (r=0). This is why it's important to check to see if we've returned to the initial value of r.
The sieving effort involved in the SoB project above has put an arbitrary limit on the upper bound of n of 50 million. Proth testing a k,n pair where n~50M would take a long long time, but sieving up to that range is relatively cheap. If, two years down the road, the project is looking at n values up at that range we'll be thankful that we've sieved up there.
As I said above, the maximum size of the loop is p-1. So for p=882319864863617 this would mean a maximum loop size of 882,319,864,863,616.
This is why 99.9% of the runs will completely iterate up to n=50,000,000.
It's actually closer to 99.999999%. From the numbers above there is a 1 in 17,646,397 chance that p will be a factor of k*2^n+1 with n between 1 and 50,000,000. (This isn't strictly true due to the possibility of smaller sub-loops, but gives you an idea of the numbers involved).
Also, by recording the value of r when I do reach the end of the loop it means I can continue this at a later date. I'd only need r at n=1 (trivial to compute) and the value of r at a given n and I can continue on my merry way.
The faster we can sieve, the faster we can rule out more and more k,n pairs. The less pairs we have the faster sieving becomes (although only slightly) and so we can sieve out higher and higher primes.
*THIS* is why I'm interested in an FPGA helping out. They have the capability of being much much faster than traditional CPUs.
If you've read this far then thank you. Reward yourself with a coffee/tea/walk/cigarette/nap/whatever.
Ta, -Alex