Looking for fast AES cores with low latency

Hi,

Since the initial rash of AES / Rijndael cores a few years ago, I haven't seen much research at the high speed end.

Does anyone know how low the latency is for a recent high-end core in a current FPGA family? A quick web search turns up plenty of heavily pipelined implementations with poor latency, but none that are designed for low latency.

Thanks, Allan

Reply to
Allan Herriman

What kind of frequency / latency are you looking for?

Most cores can pretty easily be "de-pipelined" to reduce latency, at the cost of a lower clock frequency ...

Sylvain

Reply to
Sylvain Munaut

I implemented the AES algorithm several months ago and tried to find the highest achievable clock frequency. However, if you use the GF (Galois field) calculation, the cost in FPGA resources may be lower.

Reply to
IDDLife

Hi Allan, the minimum latency of an AES core (at a reasonable clock frequency) is limited by the number of rounds (iterations) needed. That number depends mainly on the key length:

128-bit key: 10 rounds
192-bit key: 12 rounds
256-bit key: 14 rounds

There is an initial round 0, but its latency can be eliminated by design. So the latency of a simple AES-128 core will always be at least 10 clock cycles. If you have enough chip area to unroll the rounds into a pipeline, only the first block sees that full latency. All following blocks complete on each following clock cycle because of the data pipelining in the unrolled architecture.
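To put numbers on the two architectures described above, here is a rough back-of-the-envelope model in Python. It assumes one round per clock cycle and ignores round 0, key setup and I/O; the 200 MHz clock is a made-up figure for illustration, not a measured result for any device.

```python
# Rounds per key length as listed above; the AES block is always 128 bits.
ROUNDS = {128: 10, 192: 12, 256: 14}
BLOCK_BITS = 128

def iterative_core(key_bits, clock_mhz):
    """One round of logic reused each clock: a new block every `rounds` cycles."""
    rounds = ROUNDS[key_bits]
    latency_ns = rounds * 1000.0 / clock_mhz
    throughput_gbps = BLOCK_BITS * clock_mhz / rounds / 1000.0
    return latency_ns, throughput_gbps

def unrolled_pipelined_core(key_bits, clock_mhz):
    """All rounds laid out as pipeline stages: same latency for a given block,
    but a new block can enter (and leave) on every clock cycle."""
    rounds = ROUNDS[key_bits]
    latency_ns = rounds * 1000.0 / clock_mhz
    throughput_gbps = BLOCK_BITS * clock_mhz / 1000.0
    return latency_ns, throughput_gbps

# Hypothetical 200 MHz clock, AES-128.
print(iterative_core(128, 200.0))
print(unrolled_pipelined_core(128, 200.0))
```

The point of the sketch is that unrolling with pipeline registers multiplies throughput for independent blocks but does not shorten the latency of any single block.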

You may take a look at this paper:

formatting link

Please keep in mind that the clock frequencies given in this paper are examples only, for the old Virtex-E FPGAs. Current FPGAs perform much better.

Best regards, Eilert


Reply to
backhus

I did some tests today...

I unrolled our (conventional) 14 round implementation into one big mess of combinatorial logic with FFs at either end and ran it through the tools:

V5, using 8.2 software: Par spat the dummy after six hours, claiming it was too hard. I added a bunch of area constraints. It's still running.

StratixII gave sixty-something ns (=14MHz clock) in the slowest speed grade, but that was without timing constraints. A version with a 30ns clock constraint is still running.

At 14MHz, feedback modes give about 1.8Gb/s encryption throughput (one 128-bit block per clock). I guess that's enough for GbEthernet, but we already know GbE can be done with a conventional pipelined AES implementation.
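The 1.8Gb/s figure follows directly from one 128-bit block per clock in the fully unrolled core. A quick check in Python (the function name is just for illustration):

```python
def unrolled_feedback_throughput_gbps(clock_mhz, block_bits=128):
    """Fully combinatorial core in a feedback mode: one block per clock cycle."""
    return block_bits * clock_mhz / 1000.0

print(unrolled_feedback_throughput_gbps(14))  # 1.792
```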

I'll post the results tomorrow.

Regards, Allan

Reply to
Allan Herriman

Allan, do you want to encrypt the data from the GbEthernet interface? Is the GbEthernet interface on the same FPGA board? If not, even if you find the maximum frequency for the AES algorithm, you should consider the delay of the OS.

Reply to
IDDLife

Hi Allan. You've hit the point. That is exactly why there are no such designs in real life.

You can always trade combinatorial delay against latency, but you have to look for the solution that best suits your needs.

Now look at your example result at 14MHz. The theoretical data throughput is the same as that of an iterative design running at 14*14MHz = 196MHz, which I think is a clock frequency that modern FPGAs can achieve. But with the iterative design you save about 90% of the area and don't have to worry so much about moving the data from one clock domain to another. In the best possible case you can run the AES core and all other circuits at the same (high) clock frequency.

The same is true of the S-boxes in an AES design. They are often built from block RAMs, out of convenience. But there are published solutions that use very small combinatorial circuits. These solutions have the disadvantage of large delays (20 to 30 ns), which reduces the clock rate of the whole AES design. Now what do you do in such a case? Find out how to pipeline that solution. If you can raise the clock frequency into a range that fits the overall design, you save all those valuable and scarce BRAMs. It may cost you some clock cycles of additional latency, but whether that is a problem depends on the application.
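For anyone wondering what the S-box looks like without a lookup table: by the AES definition it is the multiplicative inverse in GF(2^8) followed by a fixed affine transform. The sketch below is a plain software model of that arithmetic in Python. The function names are mine, and it says nothing about the area or delay of any particular published circuit.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse as a^254 (0 maps to 0 by convention)."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def sbox(x):
    """AES S-box: GF(2^8) inverse, then the affine transform with constant 0x63."""
    inv = gf_inv(x)
    out = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
               ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        out |= bit << i
    return out

print(hex(sbox(0x00)), hex(sbox(0x53)))
```

In hardware the inverse is the expensive part, which is where the long combinatorial path mentioned above comes from.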

So back to your original posting's title: complex cores with low latency have high combinatorial delays. The problems that arise from such solutions are in most cases larger than the benefits, if there are any at all.

Have a nice synthesis, Eilert


Reply to
backhus

Ah yes. I have come to the same conclusion.

A few years ago, I designed what I believe was the first 10Gb/s AES256 encryptor on the market. It used CTR mode, because that was the only mode suitable to run at those rates in the FPGAs that were available then. I recall thinking that feedback modes (e.g. CFB) would be possible at 10Gb/s in FPGAs in a few years' time. I'll try this test again when the next generation of FPGAs comes out.

For the crypto naive: The throughput of a block cypher with feedback is determined by the delay through the block cypher calculation, because each block cannot start until the previous block's result is available. Pipelining is good for getting impressive clock numbers, but in feedback modes it actually hurts throughput.
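A simple way to see why pipelining hurts a single feedback stream: every pipeline register adds setup and clock-to-output overhead to the total delay through the core. The Python model below is only a sketch; the 70 ns logic delay and 1 ns per-register overhead are assumed round numbers, not measurements.

```python
BLOCK_BITS = 128

def clock_mhz(logic_delay_ns, stages, reg_overhead_ns):
    """Clock rate when the logic is cut evenly into `stages` pipeline stages."""
    period_ns = logic_delay_ns / stages + reg_overhead_ns
    return 1000.0 / period_ns

def feedback_throughput_gbps(logic_delay_ns, stages, reg_overhead_ns):
    """Single feedback stream: the next block waits for the whole pipeline."""
    period_ns = logic_delay_ns / stages + reg_overhead_ns
    block_latency_ns = stages * period_ns
    return BLOCK_BITS / block_latency_ns  # bits per ns is the same as Gb/s

# Assumed figures: 70 ns of logic, 1 ns overhead per register stage.
for stages in (1, 14):
    print(stages,
          clock_mhz(70.0, stages, 1.0),
          feedback_throughput_gbps(70.0, stages, 1.0))
```

Under these assumptions the 14-stage version clocks much faster but delivers less feedback-mode throughput than the single-stage version, because the register overhead is paid 14 times per block.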


Thanks to all who responded, Allan.

Reply to
Allan Herriman


You should be able to process multiple data streams, though. (Similar to the multithreading processors popular a few years ago.) I would expect that anyone needing such high speed would have more than one document to encrypt or decrypt.

-- glen

Reply to
glen herrmannsfeldt

I wish it could work that way. But the problem is that you don't use 10Gb/s encryptors to encrypt "documents", just a single continuous stream (or context) at 10Gb/s.

Well, at least that's the way our customers use them.

On a brighter note, it is possible to interleave CFB, so that each "engine" only has to process every 2nd 128-bit block (for a two-way interleave). This is discussed in Schneier.
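A minimal Python sketch of the interleaving idea, to show the data flow only. Block i is chained to ciphertext block i - k instead of i - 1, so k engines can each take every k-th block. Loud assumptions: `toy_block_encrypt` is a SHA-256 based stand-in and is NOT AES, the function names are made up, and using one IV per engine is a choice of this sketch.

```python
import hashlib

BLOCK = 16  # bytes, i.e. one 128-bit block

def toy_block_encrypt(key, block):
    """Stand-in for the AES block encryption (NOT AES, illustration only).
    CFB only ever uses the cipher in the encrypt direction."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_interleaved_encrypt(key, ivs, plaintext_blocks):
    """k-way interleaved CFB, k = len(ivs): engine i % k handles block i."""
    k = len(ivs)
    feedback = list(ivs)  # one feedback register per engine
    out = []
    for i, p in enumerate(plaintext_blocks):
        engine = i % k
        c = xor(p, toy_block_encrypt(key, feedback[engine]))
        feedback[engine] = c  # this engine's next input is its own last output
        out.append(c)
    return out

def cfb_interleaved_decrypt(key, ivs, ciphertext_blocks):
    k = len(ivs)
    feedback = list(ivs)
    out = []
    for i, c in enumerate(ciphertext_blocks):
        engine = i % k
        out.append(xor(c, toy_block_encrypt(key, feedback[engine])))
        feedback[engine] = c
    return out

key = b"k" * 16
ivs = [bytes([1]) * BLOCK, bytes([2]) * BLOCK]  # two-way interleave
blocks = [bytes([n]) * BLOCK for n in range(6)]
ct = cfb_interleaved_encrypt(key, ivs, blocks)
assert cfb_interleaved_decrypt(key, ivs, ct) == blocks
```

With a k-way interleave each engine has k block times to finish its calculation, so the per-engine delay requirement is relaxed by a factor of k.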

Regards, Allan

Reply to
Allan Herriman


I would think that would be dangerous with block chaining: if you miss one block, you are stuck. Presumably it can work with error correction.

Still, it seems surprising that you have only one stream.

I would think you would want more than two way, though.

-- glen

Reply to
glen herrmannsfeldt
