compressing Xilinx bitstreams

Did you ever try to compress these files? I totally agree with you that they _look_ easy to compress, but they aren't. I tried RLE, but that will only save 5% to 10%. ZIP does a little better: I just tried to compress a .bit file for a 400k-gate Xilinx device and it reduced the size by 26%, but you'll also need to have room for the ZIP decompression code...

--
Reply to nico@nctdevpuntnl (punt=.)
You'll find businesses and shops at www.adresboekje.nl
Reply to
Nico Coesel

But with a little knowledge of the structure maybe we can do better than blind RLE or whatever. Surely any structure which the eye can see can be efficiently encoded?

e.g. "There will be lots of repeats for unused LUTs. These are coded as abc and should be decoded as xyz"

Reply to
Tim

I used something similar with a Spartan2-50 and an MC9S12DP256 (Motorola Star12, 256 KB flash):

  1. rbt -> bin file
  2. compress with LZ77
  3. split it into blocks of max. 16 KB (because of the Star12 banking)
  4. convert to C sources: const unsigned char data_00[]={ ......... };
  5. compile it into the Star12 application
  6. on startup of the Star12, the whole RAM is used for decompression; program the FPGA over JTAG
  7. start the µC application code
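Steps 2 and 7 of this flow might look roughly like the pair below. This is only a sketch: a toy LZ77 with a fixed 3-byte token (offset, length, literal) and a 255-byte window. Frank's actual token format and window size aren't given, so every detail of the format here is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy LZ77. Each token is 3 bytes: match offset (0 = no match),
 * match length, then one literal byte. Window = 255 bytes.
 * Illustrative only -- not Frank's actual format. */
size_t lz77_compress(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t ip = 0, op = 0;
    while (ip < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = ip > 255 ? ip - 255 : 0;
        for (size_t j = start; j < ip; j++) {
            size_t len = 0;
            /* extend while bytes match and a literal still fits after */
            while (len < 255 && ip + len + 1 < n && in[j + len] == in[ip + len])
                len++;
            if (len > best_len) { best_len = len; best_off = ip - j; }
        }
        out[op++] = (uint8_t)best_off;
        out[op++] = (uint8_t)best_len;
        out[op++] = in[ip + best_len];      /* literal after the match */
        ip += best_len + 1;
    }
    return op;
}

/* The decompressor is the part that would run on the Star12 at startup. */
size_t lz77_decompress(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t ip = 0, op = 0;
    while (ip + 3 <= n) {
        uint8_t off = in[ip], len = in[ip + 1], lit = in[ip + 2];
        ip += 3;
        for (uint8_t k = 0; k < len; k++) { out[op] = out[op - off]; op++; }
        out[op++] = lit;
    }
    return op;
}
```

Step 3 then just slices the compressed buffer into 16 KB banks, and step 4 dumps each slice as a const unsigned char array.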

But sorry, I have no good information about the ratio. In the beginning I used an XC2S200, whose image was definitely too big while the design was nearly empty (ratio 25%). Now I use the XC2S50 and the design uses 98% of the FPGA. The only thing I know is that the compressed image uses at most 3 banks, so the ratio is between 45% and 68% of the original size.

Frank

Reply to
Frank Benoit

Another poster claims huge space savings by using a special tool. I haven't looked into it.

That's the problem: the routing software smears the entire design over the entire FPGA if it can. You can specify that unused space be left out of the bit file, but you'll see the length vary with every routing run. Perhaps the best space saver is to constrain the router to a part of the FPGA that is just big enough to contain your design, and then specify that the unused stuff be left out.

Reply to
Nico Coesel

Probably because the compressor looks for repeating bytes, while here we only have repeating stretches of 0s. What might work as a first step is to re-code the file into numbers giving the number of 0 bits between 1s:

00100000101000000000010000011000000000001 would turn into 2 - 5 - 1 - 10 - 5 - 0 - 11.

Stretches of 0 more than 254 bits long could be encoded as 255, meaning 255 zeroes and no 1, with the next number giving more 0s. 1-[255 0s]-1 would code to 255 0 in that case.

The resulting bytes are probably easier to Huffman-compress. Or it may pay to do this only for 0 runs up to 16 long, coding these as bytes with values 0-15 (not as nibble pairs; subsequent nibbles probably don't have any relationship).
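The gap coding described above (one output byte per 1-bit giving the preceding zero count, with 255 as a "zeros only, keep counting" escape) could be sketched like this. A '0'/'1' string stands in for the real bitstream for readability; this is my reading of the scheme, not code from the poster:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emit one byte per '1' bit: the number of '0' bits since the last '1'.
 * A byte of 255 means "255 zeros and no 1 yet"; the count continues in
 * the following byte. */
size_t gap_encode(const char *bits, uint8_t *out)
{
    size_t op = 0;
    unsigned run = 0;
    for (const char *p = bits; *p; p++) {
        if (*p == '0') {
            if (++run == 255) { out[op++] = 255; run = 0; }
        } else {
            out[op++] = (uint8_t)run;
            run = 0;
        }
    }
    return op;   /* trailing zeros after the last '1' are dropped here */
}

size_t gap_decode(const uint8_t *codes, size_t n, char *bits)
{
    size_t bp = 0;
    for (size_t i = 0; i < n; i++) {
        memset(bits + bp, '0', codes[i]);
        bp += codes[i];
        if (codes[i] != 255)               /* 255 = escape, emits no '1' */
            bits[bp++] = '1';
    }
    bits[bp] = '\0';
    return bp;
}
```

On the example above, gap_encode turns the 41-bit string into the seven bytes 2 5 1 10 5 0 11.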

Thomas

Reply to
Zak

I tried my simple run-encoder. On various designs I have around, it achieved compression ratios of 0.56 at best and 1.04 at worst (i.e., compressed was bigger than uncompressed!). The worst was a fairly dense XC2S400 BGA part, whose rbt file had hardly any long runs of anything. Even pkzip only managed to crunch the binary config image to 0.74 on that one. It looks to me like the newer Xilinx chip files tend to be less compressible... they seem to have fewer runs. So maybe there's no very-simple-to-unpack scheme that's generally useful.

Needs more thought someday, I guess.

John

Reply to
John Larkin

This makes sense. I haven't tried it, though. I presume(d) ZIP looks at the bits instead of the bytes. Still, don't feel lucky just because you've seen a lot of contiguous '1's and '0's.

Here is a wild idea: another way of compressing the file may be to strip the frame headers (which are repeated at the start of each frame and can easily be added back during decompression) and sort the resulting data frames. The next step is compressing the result, but not going from left to right: go from top to bottom and compress column after column. Because of the sorting, the fewest changes from 0 to 1 are to be expected within a column. Decompressing, however, would require a fair amount of memory, so the data also has to be divided into blocks so that only one block at a time needs to be decompressed. IIRC it doesn't matter in which order the data frames are loaded, as long as the command frames are in the right place.
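A rough way to see why the sorting step could help: sort the frames, walk the frame matrix column by column, and count 0/1 transitions; fewer transitions means longer runs for whatever compressor follows. Everything below (the 4-byte frame size, the function names) is made up for illustration; real Xilinx frames are much longer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FRAME_LEN 4   /* bytes per data frame; illustrative only */

static int frame_cmp(const void *a, const void *b)
{
    return memcmp(a, b, FRAME_LEN);
}

/* Walk the frames column by column (bit position by bit position, top to
 * bottom) and count 0<->1 transitions. Fewer transitions = longer runs in
 * the column-major stream, so it should RLE/Huffman-compress better. */
size_t column_transitions(const uint8_t *frames, size_t nframes)
{
    size_t trans = 0;
    int prev = -1;
    for (size_t bit = 0; bit < FRAME_LEN * 8; bit++)
        for (size_t f = 0; f < nframes; f++) {
            int b = (frames[f * FRAME_LEN + bit / 8] >> (7 - bit % 8)) & 1;
            if (prev >= 0 && b != prev)
                trans++;
            prev = b;
        }
    return trans;
}
```

Sorting the frames first, e.g. with qsort(frames, nframes, FRAME_LEN, frame_cmp), groups similar frames together, so the bits of each column end up in long runs.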

Xilinx has some thorough information on their programming datastream on their website.

Reply to
Nico Coesel

As noted before, Ralph Kuhnert, a student of mine, did.


He achieved 30% to 70% compression just using RLE on XC4K data. You probably applied the RLE to bytes, as a previous poster suggested. That does not help because the Xilinx data is not byte-aligned. (In the histograms you can see, for example, that runs of 19 consecutive 1s are quite common across all designs. This probably represents some CLB data, an unused LUT or something like that.) You need to encode the individual bits. What worked very well for the XC4K is to use 4 bits per codeword to encode either a zero followed by 0 to 13 ones, or 14 ones.
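A sketch of that 4-bit codeword scheme as I read it: codewords 0-13 decode to a zero followed by that many ones, and codeword 14 decodes to 14 ones with no zero. A '0'/'1' string stands in for the real bitstream, each codeword occupies a whole byte for clarity, and the sketch assumes every run of ones is preceded by a zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 4-bit codeword per output byte (packing two per byte is omitted
 * for clarity). 0..13 = a zero followed by that many ones; 14 = 14 ones. */
size_t cw_encode(const char *bits, uint8_t *out)
{
    size_t op = 0;
    const char *p = bits;
    while (*p) {
        if (*p++ != '0')              /* sketch assumes each run of ones */
            return 0;                 /* is preceded by a zero           */
        unsigned ones = 0;
        while (*p == '1') { ones++; p++; }
        out[op++] = (uint8_t)(ones % 14);        /* zero + remainder ones */
        for (unsigned m = ones / 14; m > 0; m--)
            out[op++] = 14;                      /* 14 more ones, no zero */
    }
    return op;
}

size_t cw_decode(const uint8_t *codes, size_t n, char *bits)
{
    size_t bp = 0;
    for (size_t i = 0; i < n; i++) {
        if (codes[i] != 14)
            bits[bp++] = '0';
        unsigned ones = codes[i] == 14 ? 14 : codes[i];
        memset(bits + bp, '1', ones);
        bp += ones;
    }
    bits[bp] = '\0';
    return bp;
}
```

A zero followed by 19 ones (the common run mentioned above) costs two codewords, 5 then 14: 8 bits instead of 20 once the nibbles are packed.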

Kolja Sulimma

Reply to
Kolja Sulimma

I see that no one has addressed the basic issue of just how compressible these files are, and when you can expect to achieve good or bad compression.

The runs of 1's and 0's are typically located in areas of the bitstream that represent unused portions of the chip. So for designs that are sparse, you can get high levels of compression, not unlike the basic form of compression that Xilinx provides in the Virtex chips (frame compression). But as the utilization of the chip goes up, the bitstream becomes more random and its compressibility goes down. It largely does not matter how you compress the data: once the chip is largely used, you won't be able to get much compression.

So in the end, compression will help you reduce the size of your bitstream when the design is much smaller than the chip (in which case you could use a smaller chip), but if your design grows, the compression will be reduced and you will end up needing nearly as large a memory as for an uncompressed bitstream. So choosing a smaller chip may be a better solution if your memory will ultimately limit the size of your design: a smaller FPGA reduces the size of the bitstream and also costs less.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman
