Efinix bit stream question

We use the Efinix Trion T20 FPGA.

Questions about the config bit streams:

Are they always the same size, or does it depend on how much logic is compiled? Would a simple application use less?

Are the streams very compressible? We have done some simple run-length coding to greatly reduce the storage requirement for other FPGAs. Configs tend to have long runs of 0's.

The T20/256 claims to need 5.4 megabits (about 675 KB). I'd like to store the FPGA config and application code in a Raspberry Pi Pico, which has 2 MB of onboard flash. Storing the full config would use about a third of that, so reducing it would be useful.

Reply to
John Larkin

On 27.11.22 at 05:34, John Larkin wrote:

With Xilinx it would, for sure. I've never used Efinix, but I would consider it broken if it didn't.

I would simply test example files with zip, zcat and similar. IIRC, there is even a flow-through decompressor.


cheers, Gerhard

Reply to
Gerhard Hoffmann

I'm at home and don't have access to a compiled bitstream, and this is a discussion group.

I'll get a T20 bit stream Monday or Tuesday and see what it looks like. If there are many runs of 0's, compression and decompression are very simple. Or maybe a typical stream is just shorter than the max.

I recall a Xilinx or maybe Altera stream that compressed about 3:1 with a very simple algorithm. I think I compressed runs of 0's and 1's on that one, with a PowerBasic program.

We considered fancier dictionary-based schemes, sort of like Zip, but they weren't worth the hassle.

Reply to
John Larkin

I recall the conclusion that the best dictionary entry for a random data block is itself. Zip doesn't compress random binary data files very well.

FPGA bit streams are nonrandom in having long runs of 0's.

Reply to
John Larkin

Here's a T20 bit stream. The length seems to be constant regardless of how much logic is coded, but there are enough runs of all 0's that it's probably worth compressing.

formatting link
The actual config file will be binary, not hex, of course.

Reply to
John Larkin

Gzip compresses your 2.0 MB down to 105 kB. The decompressor isn't tiny, but it's fairly small. The lz4 decompressor is tiny and still gets it to 221 kB, possibly less if you RLE first. bz2 gets it to 76 kB, and xz or lzma to 72 kB.

Compression is one area where it's best to rely on work done by people who understand the theory. Some of these algorithms have a tiny decompressor, the magic is in the compressor.
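
For scale, the decode side really can be tiny. A minimal sketch of the Pico end using the stock lz4 block API (LZ4_decompress_safe is the real lz4.h call; the flash-image symbol names and sizes here are invented, and with only 264 KB of RAM a real build would use lz4's streaming API to clock the config out in chunks):

/* Sketch only: unpack a stored bitstream with the lz4 block API.
 * The symbol names and sizes are made up for illustration. */
#include "lz4.h"

extern const char bitstream_lz4[];     /* hypothetical: compressed image in flash */
extern const int  bitstream_lz4_size;  /* hypothetical: its size in bytes */

#define BITSTREAM_RAW_SIZE 675000      /* ~5.4 Mbit uncompressed */
static char raw[BITSTREAM_RAW_SIZE];   /* too big for a real Pico: stream it instead */

int load_bitstream(void)
{
    int n = LZ4_decompress_safe(bitstream_lz4, raw,
                                bitstream_lz4_size, sizeof raw);
    if (n < 0)
        return -1;                     /* corrupt or truncated input */
    /* n bytes of raw config are now ready to shift into the FPGA */
    return n;
}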

CH

Reply to
Clifford Heath

Quick scan with one of my utilities gives:

Filename    : \users\martin\downloads\Efinix~1.hex
File size   = 4071902
Entropy     = 1.225  (max. 5.545 = ln 256, i.e. in nats)
States used = 3.40   (max. 256)

Zero frequency : 0-9, 11-47, 58-64, 71-255

Most frequent bytes (dec / hex / char / count):

  48  30  "0"   2198086
  10  0A  LF    1357302
  49  31  "1"     98740
  52  34  "4"     97072
  56  38  "8"     96870
  50  32  "2"     94906
  54  36  "6"     26994
  51  33  "3"     26880
  67  43  "C"     26478
  57  39  "9"     25500
  65  41  "A"      6820
  53  35  "5"      5944

The hex file consists mostly of character "0" bytes and linefeeds. Simple run length encoding would compact it a lot. It seems "7","B","D","E","F" are quite rare in these files.

The raw binary file obviously won't have the linefeeds, and will be only one byte for every three bytes in the ASCII .hex file (two hex digits plus a linefeed per byte), so about 1.3 MB.

Back-of-the-envelope, RLE might get you a ~20x decrease in size.

The right compressor and it could be made a lot smaller. If you put up the binary I'll scan that for byte entropy too.
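
For anyone who wants to run the same numbers: the figures above look like natural-log entropy (max 5.545 = ln 256), with "States used" as e^entropy (e^1.225 = 3.40). A minimal scan along those lines, not Martin's actual utility:

/* Byte histogram plus Shannon entropy, reported in nats to match
 * the figures quoted above. "States used" = e^entropy is my reading
 * of the 3.40 figure (the effective alphabet size). */
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    static long count[256];
    long total = 0;
    int c;

    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror(argv[1]); return 1; }
    while ((c = getc(f)) != EOF) { count[c]++; total++; }
    fclose(f);
    if (!total) return 0;

    double h = 0.0;                          /* Shannon entropy in nats */
    for (int i = 0; i < 256; i++) {
        if (!count[i]) continue;
        double p = (double)count[i] / total;
        h -= p * log(p);
    }
    printf("File size = %ld  Entropy = %.3f (max. %.3f)  States used = %.2f\n",
           total, h, log(256.0), exp(h));

    /* list bytes that make up more than 1% of the file */
    for (int i = 0; i < 256; i++)
        if (count[i] > total / 100)
            printf("%3d  %02X  %8ld\n", i, i, count[i]);
    return 0;
}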

Reply to
Martin Brown

Binary looks to have incredibly high redundancy and compressibility. One of the lowest byte entropy scores I have seen in a long time.

There appear to be strong correlations of identical blocks at strides of 9, 12, 24, and 36, as well as huge runs of nul bytes, with the odd 0x0A mixed in.

Also, a quick eyeball reveals walking ones (80, 40, 20, 10, 08, 04, 02, 01, 00) at around offset 107227 (stride 9).

There is an incredibly long run of 15372 nul bytes at offset 143811.

RLE on the nul bytes should get you most of the way there, plus maybe some code to RLE the most obvious repeated sequences if you need a bit more.
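
A minimal sketch of that kind of nul-run RLE; the escape scheme here (a 0x00 byte always followed by a count byte) is invented for illustration, nothing Efinix-specific:

/* Nul-run RLE sketch. Compressed format: a 0x00 byte is always
 * followed by a count byte (1..255) giving the number of nuls to
 * emit; all other bytes are literals. Assumes well-formed input. */
#include <stddef.h>
#include <stdint.h>

/* Encode: caller sizes dst pessimistically at 2x srclen. */
size_t rle0_encode(const uint8_t *src, size_t srclen, uint8_t *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < srclen; ) {
        if (src[i] == 0x00) {
            size_t run = 1;
            while (i + run < srclen && src[i + run] == 0x00 && run < 255)
                run++;
            dst[o++] = 0x00;
            dst[o++] = (uint8_t)run;
            i += run;
        } else {
            dst[o++] = src[i++];
        }
    }
    return o;
}

/* Decode: the end that runs on the Pico. */
size_t rle0_decode(const uint8_t *src, size_t srclen, uint8_t *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < srclen; ) {
        uint8_t b = src[i++];
        if (b == 0x00) {
            uint8_t run = src[i++];
            while (run--) dst[o++] = 0x00;
        } else {
            dst[o++] = b;
        }
    }
    return o;
}

Under this scheme the 15372-nul run above collapses to 61 two-byte tokens, about 122 bytes, and the decoder is a dozen lines.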

Reply to
Martin Brown

My comment was about really random data. An FPGA bit stream certainly has repeated patterns. One might build an N-bit structure, a multiplier or accumulator or filter or DDS, and bit-slice blocks are very likely repeated N times.

Maybe I can find some college kid who'd like to do a project or thesis to find or code a minimal decompression algorithm for Efinix + Raspberry Pi, in exchange for some pittance.

I can imagine some dictionary-based thing where a dictionary entry is its own first occurrence in the bit file. The decompressor is basically scissors and a pot of glue.
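
One possible shape for that scissors-and-glue decoder, with an invented three-byte token format (16-bit offset, 8-bit length; offset zero means a literal run). Back-references copy from output already written, so the dictionary really is the file itself:

/* LZ77-style "scissors and glue" decoder sketch. Token format is
 * invented: off (2 bytes, big-endian), len (1 byte).
 *   off == 0 -> the next len input bytes are literals
 *   off != 0 -> copy len bytes from (output - off)
 * Assumes a well-formed stream and an adequately sized dst. */
#include <stddef.h>
#include <stdint.h>

size_t glue_decode(const uint8_t *src, size_t srclen, uint8_t *dst)
{
    size_t i = 0, o = 0;
    while (i + 3 <= srclen) {
        uint16_t off = (uint16_t)(src[i] << 8 | src[i + 1]);
        uint8_t  len = src[i + 2];
        i += 3;
        if (off == 0) {
            /* literal run: copy len bytes straight from the input */
            for (uint8_t k = 0; k < len; k++) dst[o++] = src[i++];
        } else {
            /* back-reference: byte-by-byte so overlapping matches
             * (e.g. off = 1 repeating a single byte) work */
            for (uint8_t k = 0; k < len; k++, o++) dst[o] = dst[o - off];
        }
    }
    return o;
}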

I was thinking of just compressing runs of 0's, but there could be a few other smallish patterns that might not be horrible to stash in the decompressor dictionary. That raises the question: are there patterns that are common to *all* T20 bit streams?

I need a low-paid lackey.

Reply to
John Larkin

I don't think an FPGA bitstream is anything remotely like random data. The vast majority of the bytes are zeroes (70%), then bytes with 1 bit set at ~2% each, and bytes with 2 bits set at <0.7% each; bytes with more than 3 bits set are comparatively rare. How much you gain depends on how hard you are prepared to work.

In your example the bytes 8A, A7, BF, DB, ED all appeared just once and the token BE did not occur at all.
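
Reproducing that popcount tally is a trivial scan; a sketch (assumes GCC/Clang for __builtin_popcount):

/* Bucket every byte of a bitstream by how many bits are set. */
#include <stdio.h>

int main(int argc, char **argv)
{
    long bucket[9] = {0};            /* 0..8 bits set */
    long total = 0;
    int c;

    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror(argv[1]); return 1; }
    while ((c = getc(f)) != EOF) {
        bucket[__builtin_popcount((unsigned)c)]++;   /* GCC/Clang builtin */
        total++;
    }
    fclose(f);

    for (int i = 0; i <= 8; i++)
        printf("%d bits set: %8ld bytes (%5.2f%%)\n",
               i, bucket[i], total ? 100.0 * bucket[i] / total : 0.0);
    return 0;
}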

In principle for this application you can afford to use insane amounts of CPU power to encode if it makes the decoder simpler and faster. My instinct is that it is only worth compressing enough to make room for whatever code has to fit into the same space.

I recall jumping through endless hoops to fit slightly more firmware into 8k ROMs, back in the days when 64k was a lot of RAM.

I used to take a university sandwich student for a year, and sometimes a student over the long vacation, and give them projects that were interesting and otherwise wouldn't get done. The occasional one turned out to be exceptionally good; the rest did an OK job. It is only worth doing if they can finish a project that you don't have the time to do yourself.

Usually something that involves taking a lot of raw data and looking to see if there is anything interesting going on.

Judging by the way it looks to my correlator, I would expect LHA-type algorithms to do rather well on it. There is an inordinate amount of block duplication. A few simple subs will easily get you under 250k.

What stops you from having one? Though you will get more use out of one who is paid the going rate.

Reply to
Martin Brown

Just kidding. We pay very well.

If we do a product line around the Raspberry Pi, we could piggyback on its enormous hardware ecosystem and community. I've never seen anything like it.

formatting link
We might sponsor 5 or 10 smart but poor high school or college kids, steer their paths a bit, give them summer projects or jobs, and hire a couple of the best when they graduate.

Pi has enormous momentum, so it should be around for a while.

Reply to
John Larkin
