fastest FPGA

Hello. I have a task: attempt to crack a cryptographic hash function with a brute-force attack, and I would like to implement it in an FPGA. How can I get the fastest FPGA on the market today? Is the Altera Nios II dev kit, Stratix II edition (EP2S60), the right choice? By the way, can these devices (EP2S60) be overclocked? If yes, how?

Reply to
hypermodest

To a first approximation, by spending the most money.

No.

Yes.

By increasing the clock frequency beyond the Fmax reported by the development tools.

Reply to
Eric Smith

OK, but what if I use a single-chip solution?

Reply to
hypermodest

Do you intend the brute force method to use many, many parallel units? How much do you want to spend?

You may run into power issues before you can consider overclocking. You could design so much high speed logic in a huge part that you can only run x% of the part at Fmax or all of the part at x% of Fmax.
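The trade-off above can be sketched with a first-order model. This is only an illustration, assuming dynamic power scales linearly with both the number of active units and the clock frequency, and ignoring static power; the function name and the numbers are made up for the example.

```python
def throughput_under_power_cap(units, fmax_hz, cap_fraction):
    """Compare two ways of staying inside a power budget.

    Assumption: dynamic power is proportional to (active units * clock),
    static power is ignored, and each unit tests one key per clock.
    """
    # Option 1: run only a fraction of the units, at full Fmax.
    partial_fabric = int(units * cap_fraction) * fmax_hz
    # Option 2: run every unit, at a fraction of Fmax.
    slowed_clock = units * fmax_hz * cap_fraction
    return partial_fabric, slowed_clock


# Hypothetical figures: 256 units, 200 MHz Fmax, budget for half the power.
partial, slowed = throughput_under_power_cap(256, 200_000_000, 0.5)
print(partial, slowed)
```

Under these assumptions both options give the same keys-per-second figure, so the power budget rather than Fmax sets the ceiling.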

Reply to
John_H

First, you have to decide how much logic you need, i.e. how much money you want to spend. Then you have to look at the two leading manufacturers, which are -in order of size and speed- Xilinx and Altera

or -if you need lots of multipliers and/or accumulators- the appropriate size Virtex-4 SX part.

If you are after max speed, you hardly need a microprocessor, but both companies offer a soft microprocessor, it's called MicroBlaze in Xilinx.

Good luck, sounds like a fun project. Peter Alfke, Xilinx

Reply to
Peter Alfke

H.Modest,

First, any processor, soft or hard is far too slow to be of any use (be it NIOSII or 405PPC). The design will have to be done massively parallel, all in logic.

Virtex 5 is sampling now (65nm), and represents the latest that technology can offer.

There are the 5VLX30, 50, 85 and 110 sampling...

With the 5VLX110 you have:

17280 slices (Virtex-5 slices are organized differently from previous generations. Each Virtex-5 slice contains four 6-input LUTs and four flip-flops.)

64 DSP48E (Each DSP48E slice contains a 25 x 18 multiplier, an adder, and an accumulator.)

128 36-Kbit BRAM blocks (Virtex-5 block RAMs are fundamentally 36 Kbits in size. Each block can also be used as two independent 18-Kbit blocks.)

etc.

With the clock tree supporting up to 550 MHz speeds (of course the design has to meet timing, and so on to find the speed it can really operate at).


With the 6-input LUT able to be an SRL32 (32-bit shift register), there are all sorts of tricks one can use to speed up crypto algorithms. Combined with the DSP48E, I suspect that Virtex 5 will see some of the speediest encryption and decryption cores in the future.

Farms of FPGAs to do brute-force decryption have been proposed, and some have actually been built (or so the conference papers claim). The use of FPGA farms for decryption is no longer "new" and may be the reason why triple DES is no longer recommended for new equipment (cracking 2^112 by brute force is now considered "too easy"?). Even AES128 is being skipped for new designs by those who feel that the difference between 2^112 and 2^128 is just not enough (for example, we use AES256 for our bitstream decryptor).

There is even a cracking farm that proposes the use of low-cost Spartan 3 FPGAs (cracking on a budget?) at modest speeds. The Spartan 3 is probably three times slower (at best), but since a cracking farm requires very little communication, being massively parallel just means having many devices. Why not pick the least expensive device, and just use a ton of them?

If you desire the fastest possible logic, with the lowest power, right now the Virtex 5 65nm FPGA can not be beat (as there is no other 65nm offering at this time by anyone, anywhere).

Until there is something to compare it with, you really have only one choice.

There is no such thing as "over-clocking" an FPGA: either it meets timing and works, or it doesn't. You may need very exotic cooling in order not to melt down the device at speeds like 550 MHz with all of the logic toggling. The industrial temperature spec requires the junction to be kept below 100C. Commercial grade must be kept below 85C.

You could increase the clock rate till the device fails to operate correctly, or can not be cooled, but in this application it would be very difficult to know if it wasn't operating correctly! Best to design it to work where it is supposed to work.

Austin


Reply to
Austin Lesea

Woah. Is that really the case? These "farms" must be enormous: if you have 2 million instances of your cracking units running at, say, 500 MHz (both of which seem sort of optimistic to me), you're still "only" doing a quadrillion attempts per second. That's still many, many orders of magnitude short of covering the search space in any reasonable amount of time. Obviously, I'm not a crypto guy, but are these proposals really for "brute force" farms?
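As a rough check of the arithmetic above, here is a small Python sketch. It assumes every unit tests one key per clock cycle and that the whole key space is searched; the function name is only illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_to_search(key_bits, units, clock_hz):
    """Worst-case time to try every key of the given width.

    Assumption: each unit tests exactly one key per clock cycle.
    """
    rate = units * clock_hz          # keys tested per second
    return 2 ** key_bits / rate


# 2 million units at 500 MHz is 1e15 keys per second.
des_seconds = seconds_to_search(56, 2_000_000, 500_000_000)
tdes_years = seconds_to_search(112, 2_000_000, 500_000_000) / SECONDS_PER_YEAR
print(f"2^56:  {des_seconds:.1f} seconds")
print(f"2^112: {tdes_years:.3e} years")
```

With these numbers a 56-bit space falls in about a minute, while a 112-bit space still takes on the order of 10^11 years, which is the gap being questioned here.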

--Josh

PS- too lazy to g

Reply to
Josh Model

I got a few questions for you:

1st: do you really need to brute-force it?
2nd: how much time do you have?
3rd: what is your budget? I assume you are not working for the NSA or a similar YAT (Yet Another TLA).
4th: do you really have the ambition to learn FPGA development for a simple homework assignment?

Given the simplicity of the algorithm and given that your search space contains "only" 2**64 keys, yes, it can be cracked... but I am pretty sure that your simple hash function can be cracked by means other than brute force.

Having said that, there are a lot of people on the Internet (some even on this newsgroup) doing this kind of thing with very cheap FPGAs. I am sure the Xilinx/Altera/Lattice/Actel/Quicklogic/Atmel guys are more than happy to point out that their latest FPGA is the best one on the market. But the question is, even if you could afford the 10,000 USD it would cost, would you really need it? Could you really handle that beast?

regards, -Burns

Reply to
burn.sir

Right on John ... :)

As I've noted before, you seriously need to "derate" large Xilinx FPGAs for designs that have a very high percentage of active logic. Since they assume around 15-20% of the design will be active, it's very easy to be unable to get power into the devices within the spec'ed voltage margins, or to keep them cool if you do. Single-point thermal monitoring on the die may not be enough to readily identify that other portions of the die are well above the temperature limit once you are aggressively cooling the part.

Packing the device from edge to edge with active logic will cause problems. At both high speeds and high density, many of the larger parts are simply not usable.

Reply to
Totally_Lost

Hej,

That farm Austin is talking about sounds like our project ;-) Have a look at it at

formatting link
There you can find some conference papers as well.

Right. But it's cheap.

Exactly. We have a system built from 120 Spartan3-1000 FPGAs doing all communication in full parallel on a 64-bit backplane.

[promotion] Currently we are working on an application development framework for both the FPGAs and the host computer. If you are interested in the platform, feel free to drop me an email or contact Jan Pelzl instead (pelzl @ crypto.rub.de without the blanks, of course); he is responsible for the project. [/promotion]

Cheers /Chris

--
Christian Schleiffer
Communication Security (COSY)
Dept. of Electr. Eng. & Information Science
Ruhr-University Bochum, Germany
http://www.crypto.rub.de
cschleiffer@crypto.rub.de
Reply to
Christian Schleiffer

Sorry, make that 2**60. Of course, a professional cryptographer could get that down to 2**10 or so in a matter of minutes.

Speaking of those shady types, I happen to know a really good one. I can ask him for help in exchange for that EP2S60 board ;)

(kidding, kidding. besides, who wants a FPGA that is not supported in the QII webpack?)

-Burns

Reply to
burn.sir

I have no other idea :)

Let's say, one week..

Between $1k and $2k.

Yes, FPGA is really interesting area for me.

You're probably right. But good, modern crypto hash functions are designed precisely to resist cryptanalysis, leaving brute force as the last resort. Also, I don't have enough knowledge of cryptography to do complex analysis.

I don't know yet..

Reply to
hypermodest

Between $1k and $2k.

I'm not the pioneer :-)

formatting link

Reply to
hypermodest

It would be cool to use as many parallel units as will fit in a single FPGA chip.

Between $1k and $2k.

Reply to
hypermodest

I cannot do any cryptanalysis yet. And I need to test about 2^48 values. In other words, I need a counter, a block that takes the counter value as input and produces the hashed value as output, a comparator to test whether the result is correct, and something to stop all this stuff and begin to yell :-)
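A software model of that structure might look like the sketch below. It is only an illustration: `toy_hash` is a made-up 48-bit placeholder, not the hash function under discussion, and a real FPGA design would run many such pipelines in parallel rather than a single loop.

```python
MASK48 = (1 << 48) - 1


def toy_hash(value):
    """Hypothetical 48-bit placeholder hash, NOT the poster's function."""
    x = (value * 0x5DEECE66D + 0xB) & MASK48
    return x ^ (x >> 17)


def brute_force(target, hash_fn, start=0, stop=1 << 48):
    """Model of the counter -> hash block -> comparator -> stop chain."""
    for counter in range(start, stop):      # the counter
        if hash_fn(counter) == target:       # hash block feeding the comparator
            return counter                   # stop everything and "yell"
    return None                              # range exhausted without a match


# Small demonstration over a deliberately tiny range.
target = toy_hash(12345)
print(brute_force(target, toy_hash, stop=100_000))
```

In hardware, the `range` becomes a free-running counter, and splitting `start`/`stop` across units is one simple way to share the search space between parallel blocks.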

Reply to
hypermodest

Well, let's say you have a search space of 2**60. You create a hash block that tests one key every clock cycle, you clock your system at 200 MHz, and you put 256 of these blocks in the same FPGA:

2**60 / (200 000 000 * 256 * 3600 * 24 * 7) ~= 37 weeks (*)

but for 2**48 you will need only about 1.5 hours.
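The estimate above can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the post (one key per clock per block, 256 blocks, 200 MHz, full search of the key space); the function name is illustrative.

```python
def search_seconds(key_bits, clock_hz, blocks):
    """Worst-case seconds to try every key, one key per clock per block."""
    return 2 ** key_bits / (clock_hz * blocks)


SECONDS_PER_WEEK = 3600 * 24 * 7

weeks_60 = search_seconds(60, 200_000_000, 256) / SECONDS_PER_WEEK
hours_48 = search_seconds(48, 200_000_000, 256) / 3600

print(f"2**60: {weeks_60:.1f} weeks")   # roughly 37 weeks
print(f"2**48: {hours_48:.2f} hours")   # roughly 1.5 hours
```

These are worst-case figures; on average the match is found after searching half the space, so the expected time is about half of each value.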

Of course, if you are totally new to FPGAs, there is no chance in hell that you can design a cracker so powerful in a week either.

I know people do this for $99 (Xilinx Spartan-3 Starter Kit). And I know people do it even better for $300-500 (some Terasic Altera kit).

I think some major security chip was cracked using the former kit. But don't expect to break AES, 3DES or even vanilla DES with that kind of hardware.

Me too. I could do [and too often do] things like this just to learn new stuff.

But are you here to learn or to solve a problem?

The hash function you presented in sci.crypt looks anything but modern. I wouldn't even use it to hash strings into a hash table :)

If you just want to learn, get a simple starter kit from Xilinx or Altera. But don't expect to master them in a month or so. The Stratix kit is overkill for you (and for most of us, for that matter).

And while we are at it, AES128 is still safe. I bet the X-men will disagree, but mostly for marketing reasons :)

burns

(*) I have a feeling I did something wrong; these calculations usually end up with answers around millions of years or so :(
Reply to
burn.sir

I'm here to learn by solving problems :-)

Anyhow, no one is answering me there. The "+" operation in one place leaves a vacuum in the idea-generating part of my mind.

That's why I'm here, to decide what to take.

Reply to
hypermodest

Again, I have no idea how to do this, but I still have hope.

Huh, no thanks.

What is QII webpack?

Reply to
hypermodest

hypermodest wrote:

So to get that job done in a week you need to evaluate about 500 million hash functions a second?
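A quick sketch of where that figure comes from, assuming the 2^48 search space and one-week deadline mentioned earlier in the thread. The 200 MHz clock and the one-hash-per-clock pipeline in the second step are assumptions for illustration only, not properties of any particular core.

```python
import math

SECONDS_PER_WEEK = 3600 * 24 * 7


def required_rate(key_bits, deadline_seconds):
    """Hashes per second needed to cover the whole space before the deadline."""
    return 2 ** key_bits / deadline_seconds


def units_needed(rate, clock_hz):
    """Units required if each unit finishes one hash per clock (assumption)."""
    return math.ceil(rate / clock_hz)


rate = required_rate(48, SECONDS_PER_WEEK)
print(f"{rate:.3e} hashes per second")   # about 4.65e8
print(units_needed(rate, 200_000_000))   # hypothetical fully pipelined units
```

A core that needs many clocks per hash would need proportionally more instances than this idealised count.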

These are 48-bit hashes? What is your hash function?

MD5 runs at almost 1 Gbps in about 1000 LUTs. You need something like 25 instances of that. Easy.

Kolja Sulimma

Reply to
Kolja Sulimma

I've packed some big FPGAs pretty densely with designs working at some pretty high clock rates (usually in the neighborhood of the maximum rate at which a 16-bit carry chain with inputs registered in adjacent slices can be clocked in that device) and have never had to derate a part. I've had a few that I needed to aggressively cool, but none that I've ever had to derate. Most of those designs required hand placement and careful design to meet timing. I certainly wouldn't call the large parts unusable for dense high-performance designs. Demanding of careful design, yes. Unusable, no.

Reply to
Ray Andraka
