Bitstream compression

Hey all--

So my experience with the native bitstream compression algorithms provided by the FPGA vendors has been that they don't actually achieve all that much compression. This makes sense given that the expansion has to be accomplished in the FPGA hardware, and so can't be all that aggressive.

I generally find myself programming FPGAs from microcontrollers, and so I've got clock cycles and RAM to throw at the problem to try to do better. We've had some limited luck implementing really stupid RLE, just compressing the runs of 0x00 and 0xFF and calling it a day. But I was wondering whether anyone's found a better, more universal approach.
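
For what it's worth, the entire expansion side of that scheme is only a few lines of C. A sketch (the escape format and the next_byte()/emit() helpers are made up for illustration):

/* Trivial RLE expander: streaming, no malloc.  Invented format:
 * 0x00 or 0xFF is followed by a run-length byte (1..255);
 * any other byte is a literal. */
#include <stdint.h>

extern int next_byte(void);   /* next compressed byte, -1 at end */
extern void emit(uint8_t b);  /* clock one byte into the FPGA */

void rle_expand(void)
{
    int c;
    while ((c = next_byte()) >= 0) {
        if (c == 0x00 || c == 0xFF) {
            int n = next_byte();          /* run length */
            while (n-- > 0)
                emit((uint8_t)c);
        } else {
            emit((uint8_t)c);
        }
    }
}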

The ideal would be a compression/expansion library with fairly good compression ratios, where the expansion side could operate without malloc, on the fly, on a stream of data rather than requiring large buffers. The compression side could use phenomenal amounts of compute power and RAM, since it would be happening on the PC. But the goal would be to decompress on something like an LPC1754 (32kB RAM total).

Anyone have any thoughts?

--
Rob Gaddi, Highland Technology
Email address is currently out of order
Reply to
Rob Gaddi

It generally won't be possible to beat the vendor-provided basic compression by more than a factor of ~1.5 or so, and a 1.5x gain wouldn't really improve anything.

Just try 7zip on the FPGA binaries and see for yourself.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Reply to
Vladimir Vassilevsky

Just did, on an FPGA with an admittedly small fill ratio. The uncompressed bitstream for an XC3S200 is 1,047,616 bits. Using Xilinx bitstream compression gets it down to 814,336 bits, or about 100kB.

7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45. Native size is 11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of configuration. If I could get a 3:1 or 4:1 compression ratio, it would be a pretty radical savings in flash size.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
Reply to
Rob Gaddi

A design done for a client, using Virtex 6, with gzip -9 for compression. Figures are from memory and are approximate.

Uncompressed: 7.5MBytes
Compressed: a few 10s of kB (empty FPGA)
Compressed: 500kBytes (mostly empty FPGA)
Compressed: 2MBytes (mostly full FPGA)
Compressed: 7.5MBytes (with bitstream encryption)

Note: there's no point using compression with encryption, as the encrypted images don't compress.

I used gzip as the decompressor is open source and fairly "light".
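
Much of inflate's RAM goes to the sliding window, and you can bound that at compression time by using a smaller window. A sketch of the decoder setup using the standard zlib API (the windowBits value here is just an example; it must match or exceed what deflateInit2() used on the PC side):

#include <zlib.h>

/* windowBits 9 means a 512-byte window; +16 selects gzip
 * framing.  Smaller window -> less decoder RAM, slightly
 * worse ratio. */
int gunzip_init(z_stream *strm)
{
    strm->zalloc = Z_NULL;    /* or hooks into a static pool */
    strm->zfree  = Z_NULL;
    strm->opaque = Z_NULL;
    strm->next_in  = Z_NULL;
    strm->avail_in = 0;
    return inflateInit2(strm, 9 + 16);    /* Z_OK on success */
}

After that it's the usual inflate() loop: feed next_in/avail_in from flash and drain next_out a chunk at a time.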

The Xilinx built-in compression doesn't do a good job because it merely joins identical frames. The chance of getting identical frames is low for any design that uses a reasonable amount of the fabric. (If you're not using most of the fabric, you could be using a smaller, cheaper device.) OTOH, if you are using bitstream encryption, the built-in compression is the only compression you can use and it will be better than nothing.

Regards, Allan

Reply to
Allan Herriman

From memory, Altera has much better "native" compression than Xilinx, with typical compression ratios of about 50%.

Regards,

Thomas

formatting link

Reply to
Thomas Entner

The algorithm that 7-Zip uses internally for .7z file compression is LZMA:

formatting link

One characteristic of LZMA is that its decoder is much simpler than the encoder, making it well-suited for your application. You would be most interested in the open-source LZMA SDK:

formatting link

They provide an ANSI C implementation of a decoder that you can port to your platform. Also, there is a reference application for an encoder (I believe also written in C). I have used this in the past to compress firmware for embedded systems with good success. I use the encoder as a post-build step to compress the firmware image into an LZMA stream (note that the compression is not done with 7-Zip, as the .7z format is a full-blown archive; the reference encoder just gives you a stream of compressed data, which is what you want). The resulting file is then decompressed on the embedded target at firmware update time. The decoder source code is most amenable to porting to 32-bit architectures; I have implemented it on the LPC2138 ARM7 device (with the same 32 KB of RAM as your part) as well as a few AVR32UC3 parts.

A couple of other things. First, I originally did this ~4 years ago with a much older version of the SDK; it's possible that things have changed since then, but it should still be worth a look. LZMA provides good compression ratios with a decoder that, in my experience, runs well on embedded platforms. Second, you do have to be careful with the parameters you use at the encoder if you want to bound the amount of memory required at the decoder. Specifically, watch the "dictionary size" you give the encoder. As you might expect, a larger dictionary gives you better compression ratios, but the target running the decoder will need at least that much memory (e.g. an 8 KB dictionary size will require at least 8 KB of RAM for the decoder).
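
To give a feel for the decoder side, the streaming loop with the SDK's LzmaDec interface looks roughly like this. Treat it as a sketch: the names are from the SDK version I used (the allocator shape in particular has changed across versions), and the fill/sink callbacks are placeholders for your flash reader and FPGA writer.

#include <stdlib.h>
#include "LzmaDec.h"

/* Allocator handed to LzmaDec_Allocate(); it is only called
 * twice (probability model + dictionary), so on a 32 KB part
 * you can serve it from a static arena instead of malloc. */
static void *sz_alloc(void *p, size_t size) { (void)p; return malloc(size); }
static void sz_free(void *p, void *a) { (void)p; free(a); }
static ISzAlloc g_alloc = { sz_alloc, sz_free };

/* props: the 5 property bytes at the start of a .lzma stream
 * (an 8-byte uncompressed size follows them in the file).
 * fill/sink are hypothetical callbacks; fill sets *n to the
 * bytes actually read and returns 0 on success. */
int lzma_expand(const Byte *props,
                int (*fill)(Byte *buf, SizeT *n),
                void (*sink)(const Byte *buf, SizeT n))
{
    CLzmaDec dec;
    Byte in[128], out[128];
    SizeT in_have = 0, in_pos = 0;
    ELzmaStatus status;

    LzmaDec_Construct(&dec);
    if (LzmaDec_Allocate(&dec, props, LZMA_PROPS_SIZE, &g_alloc) != SZ_OK)
        return -1;
    LzmaDec_Init(&dec);

    for (;;) {
        if (in_pos == in_have) {          /* refill input buffer */
            in_have = sizeof in;
            if (fill(in, &in_have) != 0 || in_have == 0)
                break;
            in_pos = 0;
        }
        SizeT out_len = sizeof out;
        SizeT in_len = in_have - in_pos;
        if (LzmaDec_DecodeToBuf(&dec, out, &out_len, in + in_pos,
                                &in_len, LZMA_FINISH_ANY, &status) != SZ_OK)
            break;
        in_pos += in_len;
        sink(out, out_len);
        if (status == LZMA_STATUS_FINISHED_WITH_MARK)
            break;
    }
    LzmaDec_Free(&dec, &g_alloc);
    return 0;
}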

Jason

Reply to
Jason

The problem is that Huffman-based algorithms don't gain that much. I recall someone having lots of success using an RLE-like scheme on nibbles (4 bits) instead of whole bytes.

Look here:

formatting link
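
The nibble idea is easy to prototype. Here's a sketch with an invented packing (almost certainly not exactly the scheme that poster used): each control byte carries a data nibble in the high half and a repeat count in the low half, and decoded nibbles are re-packed into bytes (the encoder is assumed to pad to a whole byte).

#include <stdint.h>

extern int next_byte(void);   /* -1 at end of stream */
extern void emit(uint8_t b);

void nibble_rle_expand(void)
{
    int c;
    unsigned acc = 0, have = 0;           /* nibble re-packer */
    while ((c = next_byte()) >= 0) {
        unsigned val = (c >> 4) & 0xF;    /* nibble to repeat */
        unsigned run = c & 0xF;           /* repeat count, 1..15 */
        while (run-- > 0) {
            acc = (acc << 4) | val;
            if (++have == 2) {            /* two nibbles = one byte */
                emit((uint8_t)acc);
                acc = have = 0;
            }
        }
    }
}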

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
Reply to
Nico Coesel

Interesting.

I tried to compress JBCs from a heavily used Altera Cyclone and got only about 1.5x compression.

As for the compression algorithm, something like a bit-oriented LZSS or LZW would be easy to implement. The decompression part is trivial. If you don't care about speed, the compression part is very simple, too. I don't know if further sophistication of the algorithm would make much of a difference.
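
To illustrate how trivial the expansion side is, here's a byte-oriented LZSS expander in the classic Okumura style with a 4 kB ring buffer and no malloc. The token format (flag byte, then literals or 12-bit-offset/4-bit-length pairs) is an assumption for the sketch, not a standard:

#include <stdint.h>

#define WIN    4096                   /* power of two */
#define MINLEN 3                      /* shortest encoded match */

extern int next_byte(void);           /* -1 at end of stream */
extern void emit(uint8_t b);

void lzss_expand(void)
{
    static uint8_t win[WIN];
    unsigned wpos = 0;
    int c, flags = 0, nbits = 0;

    for (;;) {
        if (nbits == 0) {             /* fetch next flag byte */
            if ((flags = next_byte()) < 0) return;
            nbits = 8;
        }
        if (flags & 1) {              /* flag set: literal byte */
            if ((c = next_byte()) < 0) return;
            emit((uint8_t)c);
            win[wpos++ & (WIN - 1)] = (uint8_t)c;
        } else {                      /* flag clear: (offset,len) */
            int hi = next_byte(), lo = next_byte();
            if (hi < 0 || lo < 0) return;
            unsigned off = ((unsigned)hi << 4) | ((unsigned)lo >> 4);
            unsigned len = ((unsigned)lo & 0xF) + MINLEN;
            while (len-- > 0) {
                uint8_t b = win[off++ & (WIN - 1)];
                emit(b);
                win[wpos++ & (WIN - 1)] = b;
            }
        }
        flags >>= 1;
        nbits--;
    }
}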

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Reply to
Vladimir Vassilevsky

Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.

formatting link

The LZO library implements a number of algorithms with the following characteristics:

  • Compression is comparable in speed to deflate compression.
  • Very fast decompression.
  • Requires an additional buffer during compression (of size 8 kB or 64 kB, depending on compression level).
  • Requires no additional memory for decompression other than the source and destination buffers.
  • Allows the user to adjust the balance between compression quality and compression speed, without affecting the speed of decompression.
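
A sketch of the decode call with miniLZO (the subset distribution of the library). Note that LZO works a block at a time into a destination buffer rather than as a pure stream, so on a small part you'd compress the bitstream as independent chunks sized to fit RAM:

#include "minilzo.h"

/* Returns decompressed length, or -1 on error.  The safe
 * variant bounds-checks against out_max; wrkmem is unused
 * for decompression, hence NULL. */
int lzo_unpack(const unsigned char *in, lzo_uint in_len,
               unsigned char *out, lzo_uint out_max)
{
    lzo_uint out_len = out_max;
    if (lzo_init() != LZO_E_OK)       /* once at startup, really */
        return -1;
    if (lzo1x_decompress_safe(in, in_len, out, &out_len, NULL) != LZO_E_OK)
        return -1;
    return (int)out_len;
}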

Regards.

Reply to
Noob


Some of the following benchmarks may be of interest.

formatting link
formatting link
formatting link

Regards.

Reply to
Noob


Got the following in an email from Magnus Eriksson, who pled lack of Usenet access:

--------------

A bunch of years ago I read about something called "pucrunch", for compressing programs for use on the C64, or in modern parlance perhaps "a CPU and memory restricted target system" -- it has a 1 MHz CPU with few registers, and 64K RAM in total.

Pucrunch uses LZ77 and run-length encoding, with some tweaks; that is what the author decided was the right compromise between compression ratio and memory usage. And it is a "cruncher", which means that the end result is a self-decompressing binary, one that unpacks a program to RAM, and the decompressing code has to stay out of its way as far as possible -- I believe the generic 6502 decompression routine takes up only _354 bytes_ of code, if I'm reading the code comments correctly.

The catch is that you'll have to write your own decompression routine (but that should hardly come as a surprise). There is pseudocode in the article, and 6502 and Z80 code linked. The compressor is just one file of C, so it should be easy to test the compression ratio first, at least.

It will almost certainly be worse than 7-zip, but just how well it does on a bitstream might be interesting to see.

You can find it here:

Article: An Optimizing Hybrid LZ77 RLE Data Compression Program, aka Improving Compression Ratio for Low-Resource Decompression

formatting link

Some related things in the parent directory too:

formatting link

Hope that might be of some help, or inspiration to further experimentation.

Take care, Magnus

Reply to
Rob Gaddi

I've done a very simple byte-oriented RLE thing on Xilinx chips and got config files that were 20 to 40% as big as the original binaries. Even a pretty full chip has long runs of 0x00 and 0xFF bytes in the bitstream. The uP code to decompress this and bang the FPGA is pretty simple. In serial bit-bang mode, decompressing and configuring is faster than uncompressed, because spitting out a long string of identical bits can be done as a fast special case... just wiggle the config clock!
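
Something like this for the special case (the pin helpers are placeholders for whatever your uP provides):

#include <stdint.h>

extern void set_din(int level);       /* drive the DIN pin */
extern void pulse_cclk(void);         /* one CCLK pulse */

static void shift_byte(uint8_t b)     /* normal path, MSB first */
{
    for (int i = 7; i >= 0; i--) {
        set_din((b >> i) & 1);
        pulse_cclk();
    }
}

static void shift_run(uint8_t b, unsigned count)
{
    if (b == 0x00 || b == 0xFF) {     /* all bits identical: */
        set_din(b & 1);               /* set DIN once... */
        for (unsigned n = 8 * count; n; n--)
            pulse_cclk();             /* ...and just wiggle CCLK */
    } else {
        while (count--)
            shift_byte(b);
    }
}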

John

Reply to
John Larkin
