compressing Xilinx bitstreams

Forgive me if this has been asked before, but does anybody have comments or links to simple methods of compressing/decompressing Xilinx configuration bitstreams? I've been perusing a few of my .rbt files, and they have long runs of 1s and 0s (interestingly, some designs seem to have mostly 1s, others mostly 0s). I'd think that something very simple might achieve pretty serious (say, maybe 2:1-ish) compression without a lot of runtime complexity. We generally run a uP from EPROM, with the uP code and the packed Xilinx config stuff in the same EPROM, and with the uP bit-banging the Xilinx FPGA at powerup time. So a simple decompressor would be nice.

I did google for this... haven't found much.

Thanks,

John

John Larkin

There was discussion on this some months ago. Might show up here ?

formatting link

You can run standard ZIP tools on the files to get a quick 'practical limit' indication. We did some work with run-length compression, which is very simple (simple enough to code into a CPLD) but gives only medium compression gains. ISTR about half the gains of ZIP?
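A minimal sketch of that 'practical limit' check, assuming Python's standard zlib module (DEFLATE, the same family of compression ZIP uses) as a stand-in for a ZIP tool. The function name deflate_ratio and the synthetic sample are made up for illustration; a real test would read the binary configuration file instead.

```python
import zlib


def deflate_ratio(data: bytes, level: int = 9) -> float:
    """Return original_size / compressed_size using zlib's DEFLATE."""
    if not data:
        raise ValueError("empty input")
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    # Synthetic stand-in for a sparse bitstream: mostly 0xFF with a few
    # other bytes sprinkled in. Replace with the bytes of a real .bin file.
    sample = (b"\xff" * 200 + b"\x5a\x3c") * 50
    print(f"approx. {deflate_ratio(sample):.1f}:1")
```

Any simpler scheme can then be judged against this figure on the same file.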

It could be improved with a RLC-Compiler/optimiser that looked for the best pattern/lengths for that chip, or even bitstream, as you could store the RLC params as a 'header', but we did not go that far.

Be a good project for Xilinx to do as an app note :)

It would make sense to target to the pattern-repeat sizes on devices like FPGAs.

-jg

Jim Granville

The bit generation tool has an option to compress the .bit file. I use this when I'm loading over JTAG to save time. I assume Xilinx has info on in system programming with a compressed .bit file.

However, I've observed the same phenomenon as you: when I zip a .bit file it is usually less than 50% of the original size. My guess is even a trivial run length encoding compression would be helpful. There are plenty of resources for Lempel Ziv compression on the web:

see

formatting link

If you get it working please post/send the result.

Clark Pope

John,

I think that I had heard that zipping, and unzipping bit files led to the most compression (2:1 or better). (classic unix or windows zip/unzip)

I think that a zip/unzip routine would be a great example of something a uP could do without an unreasonable amount of memory (ROM+RAM) support.

Austin

Austin Lesea

First, please be aware that the ASCII .rbt file is 8x the size of the simple .bin file. Check the bitgen options and you'll find the ability to generate the straight binary file - 1s and 0s at the bit level, not the ASCII character level. Compression beyond that may be what you're looking for, but please - start with the binary file.
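To illustrate the 8x difference, here is a hedged sketch that packs the ASCII '0'/'1' lines of an .rbt-style text into raw bytes. It assumes that header lines can be recognised as "anything that is not purely 0s and 1s", that bits are packed MSB first, and that a short tail is padded with 1s; all three are assumptions to check against your device, and rbt_to_bytes is a hypothetical helper name. Using the bitgen binary output directly avoids the guesswork.

```python
def rbt_to_bytes(rbt_text: str) -> bytes:
    """Pack the '0'/'1' lines of an .rbt-style text into bytes, MSB first."""
    bits = "".join(
        line.strip()
        for line in rbt_text.splitlines()
        # Assumption: any line with other characters is header text.
        if line.strip() and set(line.strip()) <= {"0", "1"}
    )
    # Assumption: pad the tail with '1' bits up to a whole byte.
    bits += "1" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    text = "Some header line\n1111111100000001\n"
    packed = rbt_to_bytes(text)
    print(len(packed), "bytes from", len(text), "characters")
```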

John_H

Yes, but a compressor targeted at FPGA content should be more efficient, and use fewer resources, than a generic compressor.

One engineer's reasonable is another's excessive :)

There are two main classes of uC loader:

  • Ones that store the compressed stream on-chip, and so can expect to have good random access, for things like decompress tables. Large-code uCs also tend to have larger RAM.
  • Ones that store the compressed stream in low-cost serial flash. In this class, table handling is not as easy. The uC used here could be as minuscule as the PIC10F in SOT23; the PIC10F starts at 16 bytes of RAM and 256 words of code space. CPLDs are also used for loaders, and they can do simple decompression.

-jg

Jim Granville

VCC did a package called HOTMan that does compression. It takes the bit file and turns it into a compressed file that looks like...

int testArray[2669] = {
    0xddedda78, 0xe55c8c5f, 0xefe1c079, ....
};

We get at least 4 to 1, and small designs in a big chip can get 50 to 1. The above format allows you to compile the design into a C/C++ program.

Steve

Steve Casselman

No links, but have you considered simple run-length encoding? I can think of at least one scheme that would be guaranteed sub-optimal from a compression standpoint but that wouldn't take much code -- just encode any string of 0xff or 0x00 bytes as that byte followed by a count -- so that 0x00 0x00 0x00 0x00 becomes 0x00 0x04, for instance. You have the overhead that a lone 0x00 becomes 0x00 0x01, and you also can't encode bit runs that are not byte-aligned -- but you may be happy with it nonetheless.
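A rough sketch of that byte-level scheme, written in Python for testing on a PC rather than for the target uP. The single-byte count (so runs are split at 255) and the names rle_encode and rle_decode are assumptions; the post only specifies "byte followed by a count".

```python
RUN_BYTES = (0x00, 0xFF)  # only these two values are run-length coded


def rle_encode(data: bytes) -> bytes:
    """Encode runs of 0x00/0xFF as (byte, count); copy other bytes as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b in RUN_BYTES:
            run = 1
            # Assumption: the count is one byte, so a run is capped at 255.
            while run < 255 and i + run < len(data) and data[i + run] == b:
                run += 1
            out += bytes((b, run))
            i += run
        else:
            out.append(b)
            i += 1
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    i = 0
    while i < len(packed):
        b = packed[i]
        if b in RUN_BYTES:
            out += bytes([b]) * packed[i + 1]
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

The decoder needs no tables and only a byte or two of state, which is the appeal for a small loader.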

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Tim Wescott

Of course. We have a little utility, vaguely like a linker, that gobbles up Motorola .s28 files and Xilinx .rbt files and builds a rom image, all properly squashed into bits. It's cute... it even saves the beginning of the rbt ASCII header in the rom image for FPGA version verification. My observation was that the bits themselves include long runs of 1s or 0s.

I'd like to design a board using a 28-pin eprom (space is at a premium here) but plan hooks for using a bigger Xilinx chip some day, and then I'd run out of rom space to store the config bits. So having a compression scheme would give us the margin to use the small eprom.

Suppose the compressed data were an array of bytes. If the MS bit of a byte were 0, the remaining 7 bits are to be loaded verbatim; if the MS bit is a 1, the other 7 bits specify a run of up to 63 1's or 0's (one bit for the run value, six bits for the length).

Something like that; the exact numbers may need tuning. Very easy to unpack, not hard to encode. I'd have to test some actual config files to see how good something like this could compress.
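One way the byte format above could be tried out on a PC, as a sketch only. It works on strings of '0'/'1' characters for clarity, and it assumes a specific layout for the run byte that the post does not spell out: bit 7 set, bit 6 holding the run value, bits 5..0 holding a length of 1 to 63. The greedy choice between run and literal and the names pack and unpack are also illustrative.

```python
MAX_RUN = 63  # six length bits


def pack(bits: str) -> bytes:
    """Encode a '0'/'1' string as literal bytes (0xxxxxxx) and run bytes (1vnnnnnn)."""
    out = bytearray()
    i = 0
    while i < len(bits):
        run = 1
        while run < MAX_RUN and i + run < len(bits) and bits[i + run] == bits[i]:
            run += 1
        if run > 7 or len(bits) - i < 7:
            # Run byte: also used for a tail shorter than one 7-bit literal.
            out.append(0x80 | (int(bits[i]) << 6) | run)
            i += run
        else:
            # Literal byte: MS bit 0, then the next seven bits verbatim.
            out.append(int(bits[i:i + 7], 2))
            i += 7
    return bytes(out)


def unpack(packed: bytes) -> str:
    """Inverse of pack."""
    parts = []
    for b in packed:
        if b & 0x80:
            parts.append(("1" if b & 0x40 else "0") * (b & 0x3F))
        else:
            parts.append(format(b, "07b"))
    return "".join(parts)
```

In the worst case (no runs longer than 7) this costs 8 bits per 7, so it only pays off on files where long runs dominate.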

John

John Larkin

See:

formatting link
formatting link
formatting link

It should be straightforward to generate some RLE compression and decompression code. You might want to test the algorithms on a PC to make sure that the decompressed output ends up the same as the uncompressed input. A garbled bitstream can have the same effect as the MC6800 HCF opcode...
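A small PC-side check along those lines might look like the sketch below. The helper names first_mismatch and check_codec are made up, and zlib is used in the usage example purely as a stand-in codec; any compress/decompress pair of functions on bytes could be passed in.

```python
import zlib
from typing import Callable, Optional


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Same common prefix: a length difference is still a mismatch.
    return None if len(a) == len(b) else min(len(a), len(b))


def check_codec(
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
    original: bytes,
) -> Optional[int]:
    """Round-trip `original` and report where the result diverges, if at all."""
    return first_mismatch(original, decompress(compress(original)))


if __name__ == "__main__":
    data = b"\xff" * 1000 + b"\x00\x12\x34"
    print(check_codec(zlib.compress, zlib.decompress, data))
```

Reporting the offset rather than just pass/fail makes it easier to see which part of the stream the codec mishandles.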

================================

Greg Neff VP Engineering

*Microsym* Computers Inc. snipped-for-privacy@guesswhichwordgoeshere.com
Greg Neff

Hi John,

Can't help you on the Xilinx front, but many of Altera's newest chips (Cyclone, Stratix II) support on-the-fly decompression of the bitstream. The Quartus software compresses the bitstream which is then programmed into the device using pretty much any of the many methods of programming available, and the chip's configuration controller will decompress the bitstream that it sees. This typically achieves a 1.9-2.3:1 compression ratio, depending on the device utilization, RAM contents and such.

Some of our programming devices also can decompress bitstreams on-the-fly, allowing bitstream compression for other chip families that do not support decompression internally.

See the Configuration Handbook, Volume 2, for a detailed description of device programming and compression options.

Regards,

Paul Leventis Altera Corp.

Paul Leventis (at home)

Has anyone taken the simple step of running ZIP on some Altera compressed files, to see how much more compression is possible?

-jg

Jim Granville

This 'compression' merely merges identical frames. The probability of getting identical frames in a well utilised FPGA isn't very high, so this doesn't result in much reduction in file size.

Some experiments I did a few years ago (on Virtex-E and Virtex-2 files) indicated that this compression made subsequent compression by tools such as gzip *worse*. It is, however, the only way to speed up JTAG loading.

Regards, Allan.

Allan Herriman

Interesting. A student of mine did the same as a semester project. The goal was to find a compression method simple enough that the logic to program an FPGA from a NOR flash would fit into an XC9536.

For XC4K FPGAs he was very successful. He achieved compressed sizes in the range of 50% to 65% of the original with run-length encoding of 1s only. This is almost as good as zip.

The Virtex family seems to use its configuration bits a lot more efficiently (encoded switch configurations?). He could not find any simple solution for those.

Kolja Sulimma

Kolja Sulimma

Maybe you could have allowed him to also use a 9572 (or XC2C64)?

If there was a big change between families, it sounds like Xilinx followed the same path and did a simple reduction in bit encoding with some small RLC - after all, a 9536-level resource will be minuscule in an FPGA.

Jim Granville

While this doesn't exactly answer your question, the new Xilinx XCFP serial PROMs support storage of compressed bitstream data. The data is compressed when you translate to the PROM format and the PROM does the decompression before delivery to the FPGA.

formatting link

Neil Glenn Jacobson

My application uses a 128Kb flash micro (M16C) to program a SpartanIIE-100 in slave parallel mode. Back "then", the price of external serial config memory was outrageously high. Besides, by keeping everything inside the main micro, I can now remotely upgrade my FPGA code very easily.

To free some more Kb for a new requirement, I compressed the bit-stream with a simple LZW implementation, and decompressed on the fly before sending to FPGA.

Here are some results from a real design.

Starting from the uncompressed design BIN file, CAMERA.BIN (107980 bytes), LZW compression with different "dictionary table" bit lengths gives:

BITS  SIZE (bytes)
10    73853
11    63283
12    61442
13    58012
14    56803

Same design, but with the BIN file already compressed by ISE: CAMERA_COMP.BIN, 97944 bytes (what I was using before; it left 30 Kb for my application code). After LZW compression:

BITS  SIZE (bytes)
10    73475
11    65442
12    62865
13    58795
14    57687

As expected, with a simple compression algorithm you get slightly better results starting from the uncompressed BIN stream (at every width except 10 bits, where the two are nearly equal).

Even after adding the decompression code, I saved tens of Kb of flash to implement more features at no cost. Initial configuration has slowed down, of course: before, I was pumping data out to the FPGA as fast as possible with string-move assembly instructions; now I have to decompress on the fly. The time stays in the hundreds of ms range, though.

RAM USAGE during decompression:

From the bit width of the string table, you define the table size this way (a prime number somewhat larger than 2^bits):

BITS  TABLE SIZE
14    18041
13    9029
12    5021
11    2053
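For experimenting with the effect of the dictionary bit width on a PC, a bare-bones LZW sketch follows. It is not the implementation used for the numbers above: it assumes fixed-width codes, a dictionary that simply stops growing when it reaches 2^bits entries, and a Python dict in place of the prime-sized hash table an embedded version would use. The names lzw_compress, lzw_decompress and packed_size are illustrative.

```python
from typing import List


def lzw_compress(data: bytes, bits: int = 12) -> List[int]:
    """Return LZW codes for `data`; the dictionary freezes at 2**bits entries."""
    max_codes = 1 << bits
    table = {bytes([i]): i for i in range(256)}
    codes: List[int] = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            if len(table) < max_codes:
                table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: List[int], bits: int = 12) -> bytes:
    """Rebuild the dictionary on the fly and return the original bytes."""
    if not codes:
        return b""
    max_codes = 1 << bits
    table = [bytes([i]) for i in range(256)]
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("corrupt LZW stream")
        out += entry
        if len(table) < max_codes:
            table.append(previous + entry[:1])
        previous = entry
    return bytes(out)


def packed_size(codes: List[int], bits: int) -> int:
    """Bytes needed to store the codes at a fixed width of `bits` each."""
    return (len(codes) * bits + 7) // 8
```

Running packed_size(lzw_compress(data, b), b) for b from 10 to 14 on a real BIN file gives a table comparable in form to the ones above, though the exact sizes will differ with the implementation details.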
Antonio Pasini

"John Larkin" wrote in message news: snipped-for-privacy@4ax.com...

Try searching for RLE (run-length encoding); that's the encoding used for .PCX graphic files.

roller

Tried it, but found the files aren't reduced in size much and, more importantly, the software required to decompress the file eats away all the savings for a 400k device. In other words: unless you have more than around half a million gates of configuration data, it's not worth it.

--
Reply to nico@nctdevpuntnl (punt=.)
Find businesses and shops at www.adresboekje.nl
Nico Coesel

OK, bear with me on this. Here's a piece of a .rbt for a Spartan XL...

01111111111111111111111111111111111111111111111111111111111011111111111111111111111111110111111110111111011111111110111111110101011101111110111111011111111111111111110011111111111111111111111111111111111111111111111111111110101 01111111111111111111111111111111111111111111111111111111111111111111111111111101111111111111111111111111110111111101111111111111111110111111111111110111111111111011111101111111111111111111111111111111111111111111111111111110011 01111111111111111111111111111111111111111111111111111100011111111111111111101111111100111111110011111111111111011101111111111100111011110011111011111111111111111111111110110111001111111111111111111111110111111011111111111111011 01111111111111111111111111111111111111111111111111111111011111111111111111101111111101011111111110011111111111111100111111111111011111111101111111111111111111110111101111111111110111111111111111111111111111111111111111111111110 01111111111111111111111111111111111111111111110111111111111111111111111111111111111111111111111011111111111111111011010111111110011111111011111111111011111011111011110101111111000111111111011111111111101111111111111110101101111 00111111111111111111111111111111111111111111000111111111111111111111111111111111111111111111111111111111110111111111110110111111011111111111111111111101111111111111111101111111110111111100011111111111111111111111111101101100000 01111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111111111101110111101011111111111111111111111111111111111111101111111101111111111111111111111111111111111111110111111100 00011111111111111111111111111111111111111111111111111111111100111111011111111111001111110110101111001111111101111111111111001111111100111111111001111101101011110110011111101010111101111111111111111111111010111100111111111111000 
01101011111111111111111111111111111111111111111111111111111110101111111111111111101011111110011110111111110101001110111111101011011100111111111010010111001111110110101101111111111111111111111111111111110011111101111111010100111 01111011111111111111111111111111111111111111111111111111111111111111011111111111111111110111111111111111110110111111111111101011011111111111111111111101111111111111111101111111111101111111111111111111111111111111111111011111010 01101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110011111111111111111111111111111111111111111001111111111111111110111111111111111111111111111111111111111111111111100111110011000 01111111111111111111111111111111111111111111111111111111111011111111101111111110111111111011111111101111011110111111111110111101111001101111101111111110101011111011010111101111111110111111101111111111111111111110111111110100011 01101111111111111111111111111111111111111111111011111111111111111111111111111111111111111111111111111111111101011010111111111111110111101111111111111101011011111111111111011110111111111111111111111111111111111111111111111110101 00111100111111111111111111111111111111111111111011111111110100111111100011111101001111111000111111111111111110101011111101101011110010011111011011111111101011110110101111010001111110111111111111111111111111111101111100111110111

Where there are lots of 1's. Other hunks of this file are almost all 1's. So what we need is a not-very-general compression scheme, with the only "dictionary" entry being "the following is a hunk of 1's". So the decompressor could be very simple.

Interestingly, this is for a Spartan 2:

00000000000001001000000000000000 00000000000000000000000000000000 00000000000100100000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000001001000000000000000 00000000000000000000000000000000 00000000000100100100000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000001001100000000000000 00000000000000000000000000000000 11111111000100110000000100000100 00000000010001000000000000010000 00000000000001110100100000000000 11010100000000000011010000000000 00000001000000000000000000001000 00111111110001000000000000000000

Which has long runs of zeroes!

Just eyeballing these files, it looks like something very simple could get at least a 2:1 squash factor.
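Rather than eyeballing, the run structure can be measured. The sketch below, with made-up helper names run_lengths and long_run_fraction, reports what fraction of the bits sit inside long runs; the threshold of 8 is an arbitrary assumption, chosen only because a run shorter than that is hard to encode at a profit with byte-sized tokens.

```python
from itertools import groupby
from typing import Iterator, Tuple


def run_lengths(bits: str) -> Iterator[Tuple[str, int]]:
    """Yield (bit, length) for each maximal run in a string of '0'/'1'."""
    for bit, group in groupby(bits):
        yield bit, sum(1 for _ in group)


def long_run_fraction(bits: str, min_run: int = 8) -> float:
    """Fraction of all bits that lie inside runs of at least `min_run`."""
    if not bits:
        return 0.0
    covered = sum(n for _, n in run_lengths(bits) if n >= min_run)
    return covered / len(bits)


if __name__ == "__main__":
    # One 32-bit word taken from the Spartan 2 excerpt above.
    word = "00000000000001001000000000000000"
    print(list(run_lengths(word)))
    print(long_run_fraction(word))
```

A fraction close to 1 suggests that a runs-only scheme has room to work; a low fraction suggests that a dictionary method is the better bet.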

John

John Larkin
