Forgive me if this has been asked before, but does anybody have comments or links to simple methods of compressing/decompressing Xilinx configuration bitstreams? I've been perusing a few of my .rbt files, and they have long runs of 1s and 0s (interestingly, some designs seem to be mostly 1s, others mostly 0s). I'd think that something very simple might achieve pretty serious (maybe
2:1-ish) compression without a lot of runtime complexity. We generally run a uP from EPROM, with the uP code and the packed Xilinx config stuff in the same EPROM, and the uP bit-bangs the Xilinx FPGA at powerup time. So a simple decompressor would be nice.
There was discussion on this some months ago. It might show up here:
formatting link
You can run standard ZIP tools on the files to get a quick 'practical limit' indication. We did some work with run-length compression, which is very simple (simple enough to code into a CPLD) but has medium compression gains. ISTR about half the gains of ZIP.
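For that 'practical limit' check, a throwaway PC-side script does the job. A minimal sketch using Python's standard zlib (the file name in the usage note is just an example):

```python
import zlib

def zip_ratio(data: bytes) -> float:
    """Rough 'practical limit': zlib level-9 output size over input size."""
    return len(zlib.compress(data, 9)) / len(data)
```

e.g. `zip_ratio(open("design.bin", "rb").read())`; a result near 0.5 matches the roughly 2:1 figure mentioned above.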
It could be improved with an RLC compiler/optimiser that looked for the best patterns/lengths for that chip, or even for that bitstream, as you could store the RLC params as a 'header'; but we did not go that far.
Be a good project for Xilinx to do as an app note :)
It would make sense to target to the pattern-repeat sizes on devices like FPGAs.
The bit generation tool (bitgen) has an option to compress the .bit file. I use this when I'm loading over JTAG, to save time. I assume Xilinx has info on in-system programming with a compressed .bit file.
However, I've observed the same phenomenon as you: when I zip a .bit file it is usually less than 50% of the original size. My guess is that even a trivial run-length encoding would be helpful. There are plenty of resources on Lempel-Ziv compression on the web:
see
formatting link
If you get it working please post/send the result.
First, please be aware that the ASCII .rbt file is 8x the size of the simple .bin file. Check the bitgen options and you'll find the ability to generate a straight binary file: 1s and 0s at the bit level, not the ASCII-character level. Compression beyond that may be what you're looking for, but please start with the binary file.
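To illustrate the 8x point: the data lines of an .rbt file are runs of ASCII '0'/'1' characters, so packing 8 characters into one byte recovers the .bin size. A sketch, under the assumption that header lines contain characters other than '0'/'1' and can be skipped on that basis:

```python
def rbt_to_bin(rbt_text: str) -> bytes:
    """Pack the ASCII 0/1 body of an .rbt file into raw bytes, MSB first.

    Assumption: header lines contain characters other than '0'/'1',
    so any line made up purely of 0/1 is treated as bitstream data.
    """
    bits = "".join(line.strip() for line in rbt_text.splitlines()
                   if line.strip() and set(line.strip()) <= {"0", "1"})
    # 8 bits per output byte, zero-padded at the tail: 8x smaller than ASCII.
    return bytes(int(bits[i:i + 8].ljust(8, "0"), 2)
                 for i in range(0, len(bits), 8))
```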
Yes, but a compressor targeted at FPGA content should be more efficient, and use fewer resources, than a generic compressor.
One engineer's reasonable is another's excessive :)
There are two main classes of uC loader:
Ones that store the compressed stream on-chip, and so can expect to have good random access for things like decompression tables. Large-code uCs also tend to have larger RAM.
Ones that store the compressed stream in low-cost serial flash. In this class, table handling is not as easy. The uC used here can be as minuscule as a PIC10F in SOT-23; the PIC10F starts at 16 bytes of RAM and 256 words of code space. CPLDs are also used for loaders, and they can do simple decompression.
No links, but have you considered simple run-length limiting? I can think of at least one scheme that would be guaranteed sub-optimal from a compression standpoint but that wouldn't take much code: just encode any string of 0xff or 0x00 bytes as that byte followed by a count, so that 0x00 0x00 0x00 0x00 becomes 0x00 0x04, for instance. You have the overhead that a lone 0x00 becomes 0x00 0x01, and you also can't encode runs that span byte boundaries, but you may be happy with it nonetheless.
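That scheme, as I read it, in throwaway PC-side Python (the 255 cap on the count byte is my assumption, so one byte suffices for the count):

```python
RUN_BYTES = (0x00, 0xFF)

def rle_encode(data: bytes) -> bytes:
    """Runs of 0x00/0xFF become (byte, count); all other bytes pass through.

    Counts are capped at 255 (an assumption) so one count byte is enough;
    a lone 0x00 or 0xFF costs two bytes, as noted above.
    """
    out, i = bytearray(), 0
    while i < len(data):
        b = data[i]
        if b in RUN_BYTES:
            n = 1
            while i + n < len(data) and data[i + n] == b and n < 255:
                n += 1
            out += bytes((b, n))
            i += n
        else:
            out.append(b)
            i += 1
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Inverse of rle_encode."""
    out, i = bytearray(), 0
    while i < len(data):
        b = data[i]
        if b in RUN_BYTES:
            out += bytes([b]) * data[i + 1]
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

The decode side is the part that has to fit in the uP, and it is only a compare and a repeated store.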
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Of course. We have a little utility, vaguely like a linker, that gobbles up Motorola .s28 files and Xilinx .rbt files and builds a ROM image, all properly squashed into bits. It's cute... it even saves the beginning of the .rbt ASCII header in the ROM image for FPGA version verification. My observation was that the bits themselves include long runs of 1s or 0s.
I'd like to design a board using a 28-pin EPROM (space is at a premium here) but plan hooks for using a bigger Xilinx chip some day, and then I'd run out of ROM space to store the config bits. So a compression scheme would give us the margin to use the small EPROM.
Suppose the compressed data were an array of bytes. If the MS bit of a byte is 0, the remaining 7 bits are to be loaded verbatim; if the MS bit is 1, the other 7 bits specify a run of up to 63 1's or 0's (one bit to pick the value, six for the length).
Something like that; the exact numbers may need tuning. Very easy to unpack, not hard to encode. I'd have to test some actual config files to see how well something like this could compress.
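A model of that unpacker, under one reading of the format (bit 6 selecting the run value and bits 5..0 the length is my guess at the field split; as said above, the exact numbers may need tuning):

```python
def unpack(packed: bytes) -> str:
    """Expand the byte stream into a string of bits.

    MSB clear: the low 7 bits are literal bits, emitted verbatim.
    MSB set:   bit 6 selects the run value (0 or 1), bits 5..0 give
               the run length (1..63). Field widths are assumptions.
    """
    bits = []
    for b in packed:
        if b & 0x80:                       # run byte
            value = "1" if b & 0x40 else "0"
            bits.append(value * (b & 0x3F))
        else:                              # 7 literal bits
            bits.append(format(b, "07b"))
    return "".join(bits)
```

On the uP this is one test of the top bit and either a shift loop or a counted loop, so the decompressor stays tiny.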
It should be straightforward to generate some RLL compression and decompression code. You might want to test the algorithms on a PC to make sure that the decompressed output ends up the same as the uncompressed input. A garbled bitstream can have the same effect as the MC6800 HCF opcode...
================================
Greg Neff VP Engineering
*Microsym* Computers Inc. snipped-for-privacy@guesswhichwordgoeshere.com
Can't help you on the Xilinx front, but many of Altera's newest chips (Cyclone, Stratix II) support on-the-fly decompression of the bitstream. The Quartus software compresses the bitstream which is then programmed into the device using pretty much any of the many methods of programming available, and the chip's configuration controller will decompress the bitstream that it sees. This typically achieves a 1.9-2.3:1 compression ratio, depending on the device utilization, RAM contents and such.
Some of our programming devices also can decompress bitstreams on-the-fly, allowing bitstream compression for other chip families that do not support decompression internally.
See the Configuration Handbook Volume 2
formatting link
for a detailed description of device programming and compression options.
This 'compression' merely merges identical frames. The probability of getting identical frames in a well-utilised FPGA isn't very high, so this doesn't result in much reduction in file size.
Some experiments I did a few years ago (on Virtex-E and Virtex-2 files) indicated that this compression made subsequent compression by tools such as gzip *worse*. It is, however, the only way to speed up JTAG loading.
Interesting. A student of mine did the same as a semester project. The goal was to find a compression method simple enough that the logic to program an FPGA from a NOR flash would fit into an XC9536.
For XC4K FPGAs he was very successful. With run-length encoding of 1s only, he achieved compressed sizes in the range of 50% to 65% of the original, which is almost as good as zip.
The Virtex family seems to use its configuration bits a lot more efficiently (encoded switch configurations?). He could not find any simple solution for those.
Maybe you could have allowed him to also use a 9572 ( or XC2C64) ?
If there was a big change between families, it sounds like Xilinx followed the same path and did a simple reduction in bit encoding with some small RLC; after all, XC9536-level resources are minuscule in an FPGA.
While this doesn't exactly answer your question, the new Xilinx XCFP serial PROMs support storage of compressed bitstream data. The data is compressed when you translate to the PROM format and the PROM does the decompression before delivery to the FPGA.
formatting link
John Larkin wrote:
> Forgive me if this has been asked before, but does anybody have
My application uses a 128Kb-flash micro (M16C) to program a SpartanIIE-100 in slave parallel mode. Back "then", external serial config memory was outrageously expensive. Besides, by keeping everything inside the main micro, I can now remote-upgrade my FPGA code very easily.
To free some more Kb for a new requirement, I compressed the bit-stream with a simple LZW implementation, and decompressed on the fly before sending to FPGA.
Here are some results from a real design.
Starting from the uncompressed design BIN file (CAMERA.BIN: 107980 bytes), LZW compression with different "dictionary table" bit lengths:

BITS   SIZE
 10   73853
 11   63283
 12   61442
 13   58012
 14   56803
Same design, but the BIN file has already been compressed by ISE (CAMERA_COMP.BIN: 97944 bytes; this is what I was using before, and it left 30 KB for my app code). After LZW compression:

BITS   SIZE
 10   73475
 11   65442
 12   62865
 13   58795
 14   57687
As expected, you get slightly better results starting from uncompressed BIN stream, when using simple compression algorithms.
Even after adding the decompression code, I saved tens of KB of flash to implement more features at no cost. Initial configuration has slowed down, of course: before, I was pumping data out to the FPGA as fast as possible with string-move assembly instructions; now I have to decompress on the fly. Time stays in the hundreds-of-ms range, though.
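For anyone wanting to reproduce the numbers above on a PC first: a minimal LZW pair with the dictionary capped at 2^bits entries. This is a model of the idea, not the M16C code; real storage would additionally pack the codes into bits-wide fields rather than keep them as ints.

```python
def lzw_compress(data, bits=12):
    """Minimal LZW; dictionary capped at 2**bits entries, codes as ints."""
    max_entries = 2 ** bits
    table = {bytes([i]): i for i in range(256)}
    out, w = [], b""
    for c in data:
        wc = w + bytes([c])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            if len(table) < max_entries:
                table[wc] = len(table)
            w = bytes([c])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes, bits=12):
    """Inverse of lzw_compress, rebuilding the string table on the fly."""
    max_entries = 2 ** bits
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for k in codes[1:]:
        entry = table[k] if k in table else w + w[:1]  # KwKwK corner case
        out += entry
        if len(table) < max_entries:
            table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)
```

Checking that `lzw_decompress(lzw_compress(x)) == x` on the real .bin file, as suggested earlier in the thread, is cheap insurance against a garbled bitstream.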
RAM usage during decompression: from the bit width of the string table, you define the table size as the smallest prime number larger than 2^bits.
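That sizing rule is easy to compute up front; a quick helper (the function name is mine):

```python
def lzw_table_size(bits):
    """Smallest prime larger than 2**bits, used as the hash-table size."""
    def is_prime(n):
        if n % 2 == 0:
            return n == 2
        d = 3
        while d * d <= n:
            if n % d == 0:
                return False
            d += 2
        return True

    n = 2 ** bits + 1
    while not is_prime(n):
        n += 1
    return n
```

So a 12-bit table, for instance, needs slightly more than 4096 slots.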
Tried it, but found the files aren't reduced in size much and, more importantly, the software required to decompress the file eats away all the savings for a 400k device. In other words: unless you have more than around half a million gates of configuration data, it's not worth it.
--
Reply to nico@nctdevpuntnl (punt=.)
Businesses and shops can be found at www.adresboekje.nl
Where there are lots of 1's. Other hunks of this file are almost all 1's. So what we need is a not-very-general compression scheme, with the only "dictionary" entry being "the following is a hunk of 1's". So the decompressor could be very simple.