Forgive me if this has been asked before, but does anybody have comments or links to simple methods of compressing/decompressing Xilinx configuration bitstreams? I've been perusing a few of my .rbt files, and they have long runs of 1s and 0s (interestingly, some designs seem to be mostly 1s, others mostly 0s). I'd think that something very simple might achieve pretty serious (maybe
2:1-ish) compression without a lot of runtime complexity. We generally run a uP from EPROM, with the uP code and the packed Xilinx config stuff in the same EPROM, and the uP bit-bangs the Xilinx FPGA at powerup time. So a simple decompressor would be nice.
There was discussion on this some months ago. It might show up here:
formatting link
You can run standard ZIP tools on the files to get a quick 'practical limit' indication. We did some work with run-length compression, which is very simple (simple enough to code into a CPLD) but has medium compression gains. ISTR about half the gains of ZIP.
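For that 'practical limit' check, a throwaway PC-side script does the job. A minimal sketch using Python's standard zlib (the file name in the usage note is just an example):

```python
import zlib

def zip_ratio(data: bytes) -> float:
    """Rough 'practical limit': zlib level-9 output size over input size."""
    return len(zlib.compress(data, 9)) / len(data)
```

e.g. `zip_ratio(open("design.bin", "rb").read())`; a result near 0.5 matches the roughly 2:1 figure mentioned above.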
It could be improved with an RLC compiler/optimiser that looked for the best patterns/lengths for that chip, or even for that bitstream, as you could store the RLC params as a 'header'; but we did not go that far.
Be a good project for Xilinx to do as an app note :)
It would make sense to target to the pattern-repeat sizes on devices like FPGAs.
The bit generation tool (bitgen) has an option to compress the .bit file. I use this when I'm loading over JTAG, to save time. I assume Xilinx has info on in-system programming with a compressed .bit file.
However, I've observed the same phenomenon as you: when I zip a .bit file it is usually less than 50% of the original size. My guess is that even a trivial run-length encoding would be helpful. There are plenty of resources on Lempel-Ziv compression on the web:
see
formatting link
If you get it working please post/send the result.
First, please be aware that the ASCII .rbt file is 8x the size of the simple .bin file. Check the bitgen options and you'll find the ability to generate a straight binary file: 1s and 0s at the bit level, not the ASCII-character level. Compression beyond that may be what you're looking for, but please start with the binary file.
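To illustrate the 8x point: the data lines of an .rbt file are runs of ASCII '0'/'1' characters, so packing 8 characters into one byte recovers the .bin size. A sketch, under the assumption that header lines contain characters other than '0'/'1' and can be skipped on that basis:

```python
def rbt_to_bin(rbt_text: str) -> bytes:
    """Pack the ASCII 0/1 body of an .rbt file into raw bytes, MSB first.

    Assumption: header lines contain characters other than '0'/'1',
    so any line made up purely of 0/1 is treated as bitstream data.
    """
    bits = "".join(line.strip() for line in rbt_text.splitlines()
                   if line.strip() and set(line.strip()) <= {"0", "1"})
    # 8 bits per output byte, zero-padded at the tail: 8x smaller than ASCII.
    return bytes(int(bits[i:i + 8].ljust(8, "0"), 2)
                 for i in range(0, len(bits), 8))
```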
Yes, but a compressor targeted at FPGA content should be more efficient, and use fewer resources, than a generic compressor.
One engineer's reasonable is another's excessive :)
There are two main classes of uC loader:
Ones that store the compressed stream on-chip, and so can expect to have good random access for things like decompression tables. Large-code uCs also tend to have larger RAM.
Ones that store the compressed stream in low-cost serial flash. In this class, table handling is not as easy. The uC used here can be as minuscule as a PIC10F in SOT-23; the PIC10F starts at 16 bytes of RAM and 256 words of code space. CPLDs are also used for loaders, and they can do simple decompression.
No links, but have you considered simple run-length limiting? I can think of at least one scheme that would be guaranteed sub-optimal from a compression standpoint but that wouldn't take much code: just encode any string of 0xff or 0x00 bytes as that byte followed by a count, so that 0x00 0x00 0x00 0x00 becomes 0x00 0x04, for instance. You have the overhead that a lone 0x00 becomes 0x00 0x01, and you also can't encode runs that span byte boundaries, but you may be happy with it nonetheless.
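That scheme, as I read it, in throwaway PC-side Python (the 255 cap on the count byte is my assumption, so one byte suffices for the count):

```python
RUN_BYTES = (0x00, 0xFF)

def rle_encode(data: bytes) -> bytes:
    """Runs of 0x00/0xFF become (byte, count); all other bytes pass through.

    Counts are capped at 255 (an assumption) so one count byte is enough;
    a lone 0x00 or 0xFF costs two bytes, as noted above.
    """
    out, i = bytearray(), 0
    while i < len(data):
        b = data[i]
        if b in RUN_BYTES:
            n = 1
            while i + n < len(data) and data[i + n] == b and n < 255:
                n += 1
            out += bytes((b, n))
            i += n
        else:
            out.append(b)
            i += 1
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Inverse of rle_encode."""
    out, i = bytearray(), 0
    while i < len(data):
        b = data[i]
        if b in RUN_BYTES:
            out += bytes([b]) * data[i + 1]
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

The decode side is the part that has to fit in the uP, and it is only a compare and a repeated store.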
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Of course. We have a little utility, vaguely like a linker, that gobbles up Motorola .s28 files and Xilinx .rbt files and builds a ROM image, all properly squashed into bits. It's cute... it even saves the beginning of the .rbt ASCII header in the ROM image for FPGA version verification. My observation was that the bits themselves include long runs of 1s or 0s.
I'd like to design a board using a 28-pin EPROM (space is at a premium here) but plan hooks for using a bigger Xilinx chip some day, and then I'd run out of ROM space to store the config bits. So a compression scheme would give us the margin to use the small EPROM.
Suppose the compressed data were an array of bytes. If the MS bit of a byte is 0, the remaining 7 bits are to be loaded verbatim; if the MS bit is 1, the other 7 bits specify a run of up to 63 1's or 0's (one bit to pick the value, six for the length).
Something like that; the exact numbers may need tuning. Very easy to unpack, not hard to encode. I'd have to test some actual config files to see how well something like this could compress.
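A model of that unpacker, under one reading of the format (bit 6 selecting the run value and bits 5..0 the length is my guess at the field split; as said above, the exact numbers may need tuning):

```python
def unpack(packed: bytes) -> str:
    """Expand the byte stream into a string of bits.

    MSB clear: the low 7 bits are literal bits, emitted verbatim.
    MSB set:   bit 6 selects the run value (0 or 1), bits 5..0 give
               the run length (1..63). Field widths are assumptions.
    """
    bits = []
    for b in packed:
        if b & 0x80:                       # run byte
            value = "1" if b & 0x40 else "0"
            bits.append(value * (b & 0x3F))
        else:                              # 7 literal bits
            bits.append(format(b, "07b"))
    return "".join(bits)
```

On the uP this is one test of the top bit and either a shift loop or a counted loop, so the decompressor stays tiny.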
It should be straightforward to generate some RLL compression and decompression code. You might want to test the algorithms on a PC to make sure that the decompressed output ends up the same as the uncompressed input. A garbled bitstream can have the same effect as the MC6800 HCF opcode...
================================
Greg Neff VP Engineering
*Microsym* Computers Inc. snipped-for-privacy@guesswhichwordgoeshere.com
Can't help you on the Xilinx front, but many of Altera's newest chips (Cyclone, Stratix II) support on-the-fly decompression of the bitstream. The Quartus software compresses the bitstream which is then programmed into the device using pretty much any of the many methods of programming available, and the chip's configuration controller will decompress the bitstream that it sees. This typically achieves a 1.9-2.3:1 compression ratio, depending on the device utilization, RAM contents and such.
Some of our programming devices also can decompress bitstreams on-the-fly, allowing bitstream compression for other chip families that do not support decompression internally.
See the Configuration Handbook Volume 2
formatting link
for a detailed description of device programming and compression options.
This 'compression' merely merges identical frames. The probability of getting identical frames in a well-utilised FPGA isn't very high, so this doesn't result in much reduction in file size.
Some experiments I did a few years ago (on Virtex-E and Virtex-2 files) indicated that this compression made subsequent compression by tools such as gzip *worse*. It is, however, the only way to speed up JTAG loading.
Interesting. A student of mine did the same as a semester project. The goal was to find a compression method simple enough that the logic to program an FPGA from a NOR flash would fit into an XC9536.
For XC4K FPGAs he was very successful. With run-length encoding of 1s only, he achieved compressed sizes in the range of 50% to 65% of the original, which is almost as good as zip.
The Virtex family seems to use its configuration bits a lot more efficiently (encoded switch configurations?). He could not find any simple solution for those.
Maybe you could have allowed him to also use a 9572 ( or XC2C64) ?
If there was a big change between families, it sounds like Xilinx followed the same path and did a simple reduction in bit encoding with some small RLC; after all, XC9536-level resources are minuscule in an FPGA.
While this doesn't exactly answer your question, the new Xilinx XCFP serial PROMs support storage of compressed bitstream data. The data is compressed when you translate to the PROM format and the PROM does the decompression before delivery to the FPGA.
formatting link
John Larkin wrote:
> Forgive me if this has been asked before, but does anybody have
My application uses a 128Kb-flash micro (M16C) to program a SpartanIIE-100 in slave parallel mode. Back "then", external serial config memory was outrageously expensive. Besides, by keeping everything inside the main micro, I can now remote-upgrade my FPGA code very easily.
To free some more Kb for a new requirement, I compressed the bit-stream with a simple LZW implementation, and decompressed on the fly before sending to FPGA.
Here are some results from a real design.
Starting from the uncompressed design BIN file (CAMERA.BIN: 107980 bytes), LZW compression with different "dictionary table" bit lengths:

BITS   SIZE
 10   73853
 11   63283
 12   61442
 13   58012
 14   56803
Same design, but the BIN file has already been compressed by ISE (CAMERA_COMP.BIN: 97944 bytes; this is what I was using before, and it left 30 KB for my app code). After LZW compression:

BITS   SIZE
 10   73475
 11   65442
 12   62865
 13   58795
 14   57687
As expected, you get slightly better results starting from uncompressed BIN stream, when using simple compression algorithms.
Even after adding the decompression code, I saved tens of KB of flash to implement more features at no cost. Initial configuration has slowed down, of course: before, I was pumping data out to the FPGA as fast as possible with string-move assembly instructions; now I have to decompress on the fly. Time stays in the hundreds-of-ms range, though.
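For anyone wanting to reproduce the numbers above on a PC first: a minimal LZW pair with the dictionary capped at 2^bits entries. This is a model of the idea, not the M16C code; real storage would additionally pack the codes into bits-wide fields rather than keep them as ints.

```python
def lzw_compress(data, bits=12):
    """Minimal LZW; dictionary capped at 2**bits entries, codes as ints."""
    max_entries = 2 ** bits
    table = {bytes([i]): i for i in range(256)}
    out, w = [], b""
    for c in data:
        wc = w + bytes([c])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            if len(table) < max_entries:
                table[wc] = len(table)
            w = bytes([c])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes, bits=12):
    """Inverse of lzw_compress, rebuilding the string table on the fly."""
    max_entries = 2 ** bits
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for k in codes[1:]:
        entry = table[k] if k in table else w + w[:1]  # KwKwK corner case
        out += entry
        if len(table) < max_entries:
            table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)
```

Checking that `lzw_decompress(lzw_compress(x)) == x` on the real .bin file, as suggested earlier in the thread, is cheap insurance against a garbled bitstream.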
RAM usage during decompression: from the bit width of the string table, you define the table size as the smallest prime number larger than 2^bits.
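That sizing rule is easy to compute up front; a quick helper (the function name is mine):

```python
def lzw_table_size(bits):
    """Smallest prime larger than 2**bits, used as the hash-table size."""
    def is_prime(n):
        if n % 2 == 0:
            return n == 2
        d = 3
        while d * d <= n:
            if n % d == 0:
                return False
            d += 2
        return True

    n = 2 ** bits + 1
    while not is_prime(n):
        n += 1
    return n
```

So a 12-bit table, for instance, needs slightly more than 4096 slots.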
Tried it, but found the files aren't reduced in size much and, more importantly, the software required to decompress the file eats away all the savings for a 400k device. In other words: unless you have more than around half a million gates of configuration data, it's not worth it.
--
Reply to nico@nctdevpuntnl (punt=.)
Businesses and shops can be found at www.adresboekje.nl
Where there are lots of 1's. Other hunks of this file are almost all 1's. So what we need is a not-very-general compression scheme, with the only "dictionary" entry being "the following is a hunk of 1's". So the decompressor could be very simple.