Trion bitstream compression test

Suppose we read in an FPGA bitstream file as 16-bit words, and want to output a compressed file, to save room in a small flash chip.

If an input word is 0000, output a single 0 bit to the compressed file.

If a word is nonzero, output a 1 followed by that 16-bit word.

The decompressor will be very simple.
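Something like this in C, for the decode side (a sketch; the exact bit and byte ordering is whatever the encoder ends up using):

#include <stdint.h>
#include <stddef.h>

/* Bit reader over an in-memory compressed image, MSB first. */
typedef struct {
    const uint8_t *buf;
    size_t bitpos;
} bitreader_t;

static unsigned next_bit(bitreader_t *br)
{
    unsigned bit = (br->buf[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1u;
    br->bitpos++;
    return bit;
}

/* Expand nwords 16-bit words: flag 0 -> 0x0000, flag 1 -> next 16 bits literal. */
static void decompress(bitreader_t *br, uint16_t *out, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (next_bit(br) == 0) {
            out[i] = 0;
        } else {
            uint16_t w = 0;
            for (int b = 0; b < 16; b++)
                w = (uint16_t)((w << 1) | next_bit(br));
            out[i] = w;
        }
    }
}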

I wrote a little PowerBasic program to test the compression, the input being an actual Efinix T20 bit stream file.

formatting link

Reply to
John Larkin

I don't think it would take much to improve that significantly. I would think the emphasis would be on keeping the decoder as simple as practical. The above algorithm would require a lot of shifting, which is often done one bit at a time on MCUs, so it is a bit slow. Better to split the encoded stream in two: the "encode" bits, each indicating either a 16-bit zero word or a 16-bit literal, and the stream of 16-bit literal words. The first word in the file would be the length in bits of the "encode" bits, which, divided by 16, gives the offset to the start of the literal words.

The encoder would be run on the PC running the FPGA software, and the decoder is a simple algorithm reading one bit at a time from the "encode" bits and building the uncompressed stream one word at a time, which will likely be shipped to the FPGA as generated. Shifting all the data by arbitrary amounts would not be required. Lots of CPU cycles saved, and the configuration time is not impacted... as much.
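A sketch of what the decoder might look like in C. The layout below is just one way to arrange it; since a full T20 stream has more 16-bit words than a single 16-bit count can hold, I've used the first two words as a 32-bit flag count:

#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical layout:
 *   in[0], in[1] : 32-bit count of flag ("encode") bits, one flag per output word
 *   then         : the flag bits packed 16 per word, MSB first
 *   then         : the nonzero literal words, in order
 */
static size_t split_decompress(const uint16_t *in, uint16_t *out, size_t out_max)
{
    uint32_t nflags = ((uint32_t)in[0] << 16) | in[1];
    size_t flag_words = (nflags + 15u) / 16u;
    const uint16_t *flags    = &in[2];
    const uint16_t *literals = &in[2 + flag_words];
    size_t nout = 0;

    for (uint32_t i = 0; i < nflags && nout < out_max; i++) {
        unsigned bit = (flags[i >> 4] >> (15u - (i & 15u))) & 1u;
        out[nout++] = bit ? *literals++ : 0;   /* 1 = take next literal, 0 = zero word */
    }
    return nout;   /* number of 16-bit words produced */
}

The literal words come out whole; only the flag bits need per-bit extraction, which is the point of the split.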

Reply to
Ricky

If you get the results you need using it, why not. The more trivial RLE is not too resource-hungry and is easy to implement, IIRC (I last did it over a decade ago; our VNC (RFB) server does it all the time during framebuffer transfers).
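For reference, a PackBits-style byte RLE decoder is about this much code (this is the generic PackBits rule, not necessarily the variant RFB uses):

#include <stdint.h>
#include <stddef.h>

/* PackBits-style RLE decode. Control byte n, treated as signed:
 *   0..127   -> copy the next n+1 bytes literally
 *   -1..-127 -> repeat the next byte (1 - n) times
 *   -128     -> no-op
 */
static size_t rle_decode(const uint8_t *in, size_t in_len,
                         uint8_t *out, size_t out_max)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        int8_t n = (int8_t)in[i++];
        if (n >= 0) {                            /* literal run of n+1 bytes */
            size_t count = (size_t)n + 1;
            while (count-- && i < in_len && o < out_max)
                out[o++] = in[i++];
        } else if (n != -128 && i < in_len) {    /* repeated byte, 2..128 times */
            size_t count = (size_t)(1 - (int)n);
            uint8_t v = in[i++];
            while (count-- && o < out_max)
                out[o++] = v;
        }
    }
    return o;                                    /* decoded length in bytes */
}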

Reply to
Dimiter_Popoff

I don't think we've seen an indication that the test case is a design that uses a large percentage of the FPGA logic. The compression ratio of a sparse design is not important. What will need to be considered is the limit of compression for a worst case. That will determine the amount of storage space to be set aside if the design is to be updated, especially in the field.

The test case compressed to 40% of its original size. It is entirely possible that, with a less sparse design, the bit stream will have very few words with only zeros. For compression to be viable in a limited amount of compressed storage, the compression needs to be effective for most designs. There will be a significant fraction of zero bits even in a design that uses all the logic, but they may well not form many zero words. It could easily turn out that RLE compression would work much better than such a simplistic encoding as simply replacing zero words.

Trust, but verify. (test)

Reply to
Ricky

Well, you *could* do that, but if the sample file you posted a while back is any guide to their contents, you might want to investigate the run length N of consecutive zero bits that gives maximum compression.

My instinct is that it may be somewhere near 120000, since there are entire blocks of 15k zero bytes in the test file.

Zero-bit run lengths with N = 5, 6, 7, 8, 16, 32, 64 ... might also be worth investigating; stop when a larger N no longer yields additional compression.

Your sample file was 70% 00000000 and 20% with a single bit set. Its natural word length appears to be the 8 bit byte.

My preference would be to encode it as 8-bit words and use the symbols which never occur, or occur just once in the entire file, to represent the following (not necessarily in this order):

  a 15k run of zeroes
  00
  000
  0000
  00000000

and <token>,<token> for the token itself.

Reply to
Martin Brown

However, in this case the MCU is an RP2040, a dual-core ARM Cortex-M0+ with a 32-bit barrel shifter, so there are plenty of resources for more complex decompression.

Reply to
John Walliker

The code to get the next bit from a file is not complex.
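Something like this (untested sketch, reading MSB first from a stdio stream):

#include <stdio.h>

/* Fetch the next bit from a stream; returns -1 at EOF.
 * Static state means one stream at a time; fine for illustration. */
static int file_next_bit(FILE *f)
{
    static int byte = 0, bits_left = 0;
    if (bits_left == 0) {
        byte = fgetc(f);
        if (byte == EOF)
            return -1;
        bits_left = 8;
    }
    bits_left--;
    return (byte >> bits_left) & 1;
}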

Reply to
John Larkin

I did consider a chunk size of N = 32 bits.

If N = 16, the compression gain for all-zero words is 16:1, and the penalty on nonzero words is about 6% (17 bits instead of 16). That's not bad for a simple decoder.

A 0.4 file-size ratio is pretty good. I like simple and done. I can explain this to a smart intern in minutes.
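Back of the envelope, with z the fraction of all-zero 16-bit words:

  compressed/original ratio = (z*1 + (1-z)*17) / 16

so the 0.4 we measured corresponds to roughly two thirds of the words being zero, and the worst case (no zero words at all) is 17/16, about 6% bigger than the original.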

I'll run my little Basic program on some more complex FPGA bit streams, as soon as we make some. The current project has a serial flash chip per FPGA so doesn't need compression, but a new Pi-based product line might benefit from compression. I expect the FPGA designs to be fairly simple or absurdly simple. We will use the T20 chips because we could get them and bought a bucket full for $10 each.

Clocked slow and not working hard, no PLLs, the 1.25 volt core power is milliamps. A tiny linear regulator from +5 will work fine.

My guys say that compared to the big-guy FPGAs, the Efinix design suite looks like garage-level engineering. That's a feature, not a defect. A lot of the suite is in plain-sight Python and there are no FlexLM horrors to fight.

Reply to
John Larkin

I think you worry a bit too much about decompression complexity. Decompression is much simpler and less computationally intensive than compression, even for something like LZW. Probably you won't need anything like that, of course, but if you do it won't be a killer.

Reply to
Dimiter_Popoff

A Pi Pico has 2 Mbytes of flash. A non-compressed T20 bit stream (5.4 Mbit for ours) takes about a third of that. A 0.4 compression ratio gets that down to about 1/8.

That sounds worth doing if it's easy and doesn't become a big project, and doesn't need a big decoder.

Reply to
John Larkin

There's no point in discussing what is and what is not a good algorithm until an appropriate test file is obtained. Long runs of zeros are going to become much less common in a densely used device. Playing with a trivial test file is of no real value.

If the set of potential designs is small enough, they can simply be identified by a single integer. That's real compression. LOL

Reply to
Ricky

I think you can make a pretty good guess that there will be a lot of zero bits in most of the FPGA designs that actually do something useful.

The distribution of nonzero bytes in his sample file was consistent with the probability of any given bit being a 1 being very small, about 2%. So the expected proportions go roughly as:

  1 bit set:  x
  2 bits set: x^2
  3 bits set: x^3
  4 bits set: x^4
  etc.

with x ~ 1/40. By the time you get to 7 bits set, some possible states were not present at all.

Given that statistical dominance of zero bits, a simple byte-based rule:

  0+x (0 <= x <= 127) means a run of 1 to 128 zero bits
  1+y means the actual value 1+y (a literal)

should trounce the original suggestion hands down. For encoding, it probably makes sense to treat the file literally as a bitstream, since 1s are so rare!
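A rough C sketch of one way to decode such a stream (here I take the literal as a full 8 bits after the flag bit; other arrangements are possible):

#include <stdint.h>
#include <stddef.h>

/* Token stream, MSB first:
 *   flag 0 + 7-bit count x  ->  x+1 zero bits
 *   flag 1 + 8 literal bits ->  those bits copied verbatim
 * The output buffer is assumed pre-zeroed, so zero runs just skip ahead.
 */
typedef struct { const uint8_t *buf; size_t pos; } bits_t;

static unsigned get_bit(bits_t *s)
{
    unsigned b = (s->buf[s->pos >> 3] >> (7 - (s->pos & 7))) & 1u;
    s->pos++;
    return b;
}

static size_t run_literal_decode(bits_t *in, uint8_t *out, size_t out_bits)
{
    size_t opos = 0;
    while (opos < out_bits) {
        if (get_bit(in) == 0) {                 /* zero run of x+1 bits */
            unsigned x = 0;
            for (int i = 0; i < 7; i++)
                x = (x << 1) | get_bit(in);
            opos += x + 1;
        } else {                                /* 8 literal bits */
            for (int i = 0; i < 8; i++) {
                if (get_bit(in))
                    out[opos >> 3] |= (uint8_t)(0x80u >> (opos & 7));
                opos++;
            }
        }
    }
    return opos;    /* bits produced (may slightly overshoot on the last zero run) */
}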

Reply to
Martin Brown

Larkin's program doesn't compress zero bits. It only compresses zero words. That's my point. Until you know more about the bitstream for a densely used design, you can't gauge the compression scheme.

None of this means anything if you are working with a sparsely filled FPGA and you want to allow for a densely used FPGA. Bits are not set randomly in a configuration file, so a statistical approach is unlikely to provide much insight.

It *is* a bit stream. Organizing it into bytes or words has no meaning to the bitstream. Ones won't be so rare in a fully utilized part. I'm not saying there won't be useful compression when the design is close to full, but clearly, if the algorithm is only finding 60% of the words are compressible, that's going to get a *lot* worse as the design fills up. It's likely the compression curve vs. part utilization will be a gentle slope at low utilization, and drop off rapidly on reaching the higher numbers. It will also vary between designs, even at the same utilization.

There's only one way to find out. Pretending to know the properties of the distribution based on a single sample is not remotely realistic.

Reply to
Ricky

On Sunday, December 11, 2022 at 04:13:39 UTC+1, John Larkin wrote:

this RLE

formatting link
is slightly better. Checking Efinix.bin:

  uncompressed size: 5429200
  rle size:          2211264
  vpackbits size:    2183728

Reply to
Lasse Langwadt Christensen

There is a theoretical limit, useful to see what is possible and how much is being left on the table.

formatting link

If one dips below the minimum number of bits, lossless compression is impossible, because multiple data patterns will map onto a single compressed-data sequence (that is, collide), and so all those data sequences cannot be uniquely recovered in decompression.

formatting link

Joe Gwinn

Reply to
Joe Gwinn

And there has to be some framing; you cannot usefully treat the data stream as interrupted by a '0' in a compressed file unless you know that that '0' is positioned as a flag, not as a bit in the stream. Practically, the '1' or '0' signalling has to be stored in ROM in a known position, just like the bits you send, but it must NOT itself be one of the bits sent from the store... So, if there are long strings of zeros, maybe store 8-bit string lengths, with 255 meaning send 256 bits from the table, 0 meaning send 256 zeros, and 1 through 254 being available for justification up to the next string of zeros, or string of ones, or whatever.

Reply to
whit3rd

Lossless compression can gain nothing on some already-compressed formats. The "compressed" file can come out longer if a truly incompressible file meets a less capable compressor. Good archivers spot this and store a plain copy.

Bytewise entropy is a crude but effective way to estimate a file's approximate compressibility, and you can classify file types this way:

If the frequency of each pattern x is N_x then sum_x [ N_x ] = N

p_x = N_x/N

Entropy = -Sum_x [ p_x log(p_x) ]

You can choose the symbol length, although 8 and 16 bits are the favourites. It isn't helpful for a length of 1, since 0log0 = 1log1 = 0.

However, the proportion of 0s and 1s in the file gives you a very good clue as to how you might compress it most easily. Byte entropy also doesn't spot spatial correlations, merely the frequency of each symbol in the alphabet. So from a byte-entropy point of view these sequences have equal entropy:

ABCDABCDABCDABCD and ABDCADDBCCBACDAB

If there is a natural word length in the bitstream, then computing this byte-entropy test for each possible symbol length 2, 3, 4, ... will find it (or them; there may be more than one).
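A quick C version of the 8-bit case; N*H/8 is then the order-0 lower bound on compressed size, which a coder that exploits run structure can beat:

#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* 8-bit symbol entropy of a file, in bits per byte. */
int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    uint64_t count[256] = {0}, total = 0;
    int c;
    while ((c = fgetc(f)) != EOF) { count[c]++; total++; }
    fclose(f);

    double h = 0.0;
    for (int i = 0; i < 256; i++) {
        if (count[i]) {
            double p = (double)count[i] / (double)total;
            h -= p * log2(p);
        }
    }
    printf("%llu bytes, entropy %.3f bits/byte, order-0 limit ~%.0f bytes\n",
           (unsigned long long)total, h, (double)total * h / 8.0);
    return 0;
}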

Reply to
Martin Brown
