Efinix bit stream question

We use the Efinix Trion T20 FPGA.

Questions about the config bit streams:

Are they always the same size, or does it depend on how much logic is compiled? Would a simple application use less?

Are the streams very compressible? We have done some simple run-length coding to greatly reduce the storage requirement for other FPGAs. Configs tend to have long runs of 0's.

The T20/256 claims to need 5.4 megabits. I'd like to store the FPGA config and application code in a Raspberry Pi Pico, which has 2 MB of onboard flash. Storing the full config would use about a third of that, so reducing it would be useful.
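A zero-run RLE of the kind mentioned above can be tiny. Here is a minimal sketch in C; the encoding (a 0x00 marker byte followed by a count byte, everything else a literal) is an assumption for illustration, not Efinix's format:

```c
#include <stddef.h>
#include <stdint.h>

/* Zero-run RLE sketch (hypothetical format): literal bytes pass
 * through unchanged; each run of 0x00 bytes is emitted as the pair
 * {0x00, count}, count = 1..255, with longer runs split into
 * multiple pairs.  Returns bytes written to out.  Caller must size
 * out for the worst case (up to 2x the input for pathological
 * streams of isolated zeros).                                      */
size_t rle0_compress(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        if (in[i] != 0) {
            out[o++] = in[i++];           /* literal byte */
        } else {
            size_t run = 0;               /* count the zero run */
            while (i < n && in[i] == 0 && run < 255) { i++; run++; }
            out[o++] = 0x00;
            out[o++] = (uint8_t)run;
        }
    }
    return o;
}
```

On a bitstream that is mostly long runs of zeros, this alone gives a large reduction while keeping the decode side trivial.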

Reply to
John Larkin

I don't know about Efinix, but bitstream compression is not unusual in the industry. Bitstreams tend to have a lot of compressibility without using fancy algorithms.

It's easy enough to test. Construct a simple design and compile one instance, look at the size, then instantiate multiple instances and check the size again. There is likely a control somewhere to enable/disable compression if it's available.

So you compress for storage and expand prior to downloading? I assume this is done on the fly? RLE encoding?

2 MB is pretty small these days. The trouble with compression is it may not provide much reduction in size as the design fills up, but then again, maybe it still does.

Efinix is the one with little dedicated routing, instead using the logic elements for routing, right? I've not done enough research to tell how large a part is needed for a given size design. I recall a line of FPGAs from Atmel that was like that. It was not so good in the end. I think, before I used any of their parts, I would want to compile a design using tools from another, conventional FPGA maker and Efinix and see how they compare.

Gowin is a lot less expensive. So far, I like them. But they are Chinese, so I may not be able to use their parts.

Reply to
Ricky

On 27 Nov 2022 at 05:34, John Larkin wrote:

With Xilinx it would, for sure. I've never used Efinix, but I would consider it broken if it didn't.

I would simply test example files with zip, zcat and similar. IIRC, there is even a flow-through decompressor.

> We have done some simple run-length coding to greatly reduce the storage requirement for other FPGAs.

cheers, Gerhard

Reply to
Gerhard Hoffmann

On Sunday, 27 November 2022 at 10:46:54 UTC+1, Gerhard Hoffmann wrote:

Xilinx only if you use bit file compression and don't use encryption

Reply to
Lasse Langwadt Christensen

I'm at home and don't have access to a compiled bitstream, and this is a discussion group.

I'll get a T20 bit stream Monday or Tuesday and see what it looks like. If there are many runs of 0's, compression and decompression are very simple. Or maybe a typical stream is just shorter than the max.

I recall a Xilinx or maybe Altera stream that compressed about 3:1 with a very simple algorithm. I think I compressed runs of 0's and 1's on that one, with a PowerBasic program.

We considered fancier dictionary-based schemes, sort of like Zip, but they weren't worth the hassle.

Reply to
John Larkin

I recall the conclusion that the best dictionary entry for a random data block is itself. Zip doesn't compress random binary data files very well.

FPGA bit streams are nonrandom in having long runs of 0's.

Reply to
John Larkin

Here's a T20 bit stream. The length seems to be constant vs functions coded, but there are enough runs of all 0's that it's probably worth compressing.

[link]
The actual config file will be binary, not hex of course.

Reply to
John Larkin

Gzip compresses your 2.0MB down to 105kB. The decompressor isn't tiny, but it's fairly small. The lz4 decompressor is tiny and still gets to 221kB, possibly less if you RLE first. bz2 gets it to 76kB, and xz or lzma to 72kB.

Compression is one area where it's best to rely on work done by people who understand the theory. Some of these algorithms have a tiny decompressor, the magic is in the compressor.
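As an illustration of that asymmetry, the decode side of a simple zero-run RLE scheme is only a few lines, small enough for a boot path. The format here is assumed for the sketch (a 0x00 byte followed by a count byte expands to that many zeros; everything else is a literal), and the input is trusted to be well-formed:

```c
#include <stddef.h>
#include <stdint.h>

/* Tiny decompressor for a hypothetical zero-run RLE format.  A 0x00
 * marker byte is followed by a count byte and expands to that many
 * zeros; any other byte is copied through.  Returns bytes written.
 * Assumes a well-formed stream (every 0x00 has its count byte).    */
size_t rle0_expand(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t b = in[i++];
        if (b != 0) { out[o++] = b; continue; }   /* literal */
        uint8_t run = in[i++];                    /* zero-run count */
        for (uint8_t k = 0; k < run; k++) out[o++] = 0;
    }
    return o;
}
```

The compressor can be as elaborate as you like on the build machine; only this much has to fit on the target.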

CH

Reply to
Clifford Heath

Not really. If you need to send the data to Voyager 1, then every bit matters. For this work, there is no need for "best", or even "very good". Accommodation must be made for variety in the design, including growth. There is every reason to think that a design with more used elements would compress less. So if a simple compression method (such as RLE) gets the design down to 400 kB, there's not much need to get it down to 100 kB.

Compression works by finding repeatability in the data. In this case, the best compression would likely be an algorithm that is aware of the structure of the bit stream. In other words, a custom compressor. Since "best" is not needed, something simple and low-effort is probably best.

Reply to
Ricky

Quick scan with one of my utilities gives:

Filename : \users\martin\downloads\Efinix~1.hex
File size = 4071902
Entropy = 1.225 ( max. 5.545 )
States used = 3.40 ( max. 256 )

Zero frequency : 0-9 11-47 58-64 71-255

Most frequent bytes:

  dec  hex  char    count
   48   30  "0"   2198086
   10   0A  LF    1357302
   49   31  "1"     98740
   52   34  "4"     97072
   56   38  "8"     96870
   50   32  "2"     94906
   54   36  "6"     26994
   51   33  "3"     26880
   67   43  "C"     26478
   57   39  "9"     25500
   65   41  "A"      6820
   53   35  "5"      5944

The hex file consists mostly of character "0" bytes and linefeeds. Simple run length encoding would compact it a lot. It seems "7","B","D","E","F" are quite rare in these files.

The raw binary file obviously won't have the linefeeds and will be only one byte for every three in the ASCII .hex file, so about 1.3M.

A back-of-the-envelope estimate says RLE might get you a ~20x decrease in size.

The right compressor and it could be made a lot smaller. If you put up the binary I'll scan that for byte entropy too.

Reply to
Martin Brown

I've never heard of storing the bit stream in an ASCII file. FPGA bit streams are binary data. But maybe I'm just not remembering. It has been a while since I mucked with it at that level.

I had a design compiled for the 3.3V core voltage version of a chip. There was also a 1.2V core voltage version, which was the same chip with the LDOs turned into bypass. Seems they use a bond wire to flip a bit in a status register the JTAG reads, to distinguish the two. But the JTAG software checks, and you need a file that matches the ID value. I had to find this ID and then recompute the checksum (maybe a simple 8-bit add, rather than a CRC). I think that was an ASCII file, now that I think of it. I guess they use ASCII to make it easier to see what's what if you have to view it.

But the underlying data is the binary equivalent of the ASCII, so the 3:1 gain of turning the ASCII data into binary is not really relevant; that's more a matter of discarding the pretty-printing formatting. In fact, I'm pretty sure the Xilinx bit streams I've seen are binary. There was no translation in sending them to the chips. I expect this file is a .hex simply for purposes of sharing.

The sparsity of non-zero data in the file gives you an idea of the amount of unused resources in FPGAs. That's why they need the latest fab processes to be economical. They have much more silicon area than virtually any other device for the amount of resources actually used.

Reply to
Ricky

Not really what? Spouting words without meaning again, you are such an adversarial dope.

I was talking about the relative code complexity of a compressor compared to its matching decompressor. A decompressor can be tiny, which is a quality JL seemed to be concerned with.

Eggs, grandma's. And you're actually even wrong; it's predictability, not repeatability that matters. Compressors remove whatever is predictable (using whatever kind of prediction is appropriate - not just repetition).

Reply to
Clifford Heath

Why do you have to respond so adversarially? This is on you.

I know exactly what you said. That was in response to

You snipped the useful content. Ok, fine, but why argue if you don't want to actually discuss anything I've said?

I'm simply pointing out that there are many compression algorithms that are very simple and do not, in any way, require "people who understand the theory". At least, nothing beyond the basics is required.

Sure, if you say so.

Yes, technically that's correct. Give yourself a star, you bested me!

In relation to the FPGA bit stream, there are long sequences of zeros, as someone has analyzed, but also other repeating sequences, which would yield much less additional compression if recognized. Recognizing the long strings of zeros obtains the first-order compression, and likely the second-order compression as well.

Rather than finding more compression, a more important point is recognizing that this data is not static: it will be updated when the design changes. That can mean significant growth of the compressed stream, so extra space needs to be provisioned, which makes aggressive compression less advantageous. Interestingly, even on a part with a high logic utilization factor, there will still be lots of zero bit strings. Much of FPGA real estate is unused even in a dense design; more of the chip is routing than logic, and the routing almost never has high utilization.

Are you done with your tantrum yet? I'm happy to discuss this rationally, if you are.

Reply to
Ricky

Yes, we want a very small decompressor.

I'm thinking of designing some small boxes with a Raspberry Pi Pico as the computer. Here's the rough idea:

[link]
The Pi has only 2 MB of flash and 256 KB of SRAM. An uncompressed T20 binary (not hex) bit stream would use about 1/3 of the flash. Compressing that by even 2:1 would help. Looking at the hex, there are long runs of 0's, which are the obvious compression target.

There must be a zillion kids around who are already working with Pi's.

Of course we could use a separate serial flash chip connected directly to the FPGA to store the config, but that would be inelegant.

Reply to
John Larkin

On Friday, 2 December 2022 at 06:23:52 UTC+1, John Larkin wrote:

but what would you need much more flash for?

anyway,

[link]

Reply to
Lasse Langwadt Christensen

The binary looks to have incredibly high redundancy and compressibility; it's one of the lowest byte-entropy scores I have seen in a long time.

There appear to be strong correlations of identical blocks at strides of 9, 12, 24, 36, as well as huge runs of nul bytes. The odd one out is 0a.

Also a quick eyeball reveals walking ones 80,40,20,10,08,04,02,01,00 at around 107227 (stride 9).

There is an incredibly long run of 15372 nul bytes at offset 143811.

RLE on the nul bytes should get you most of the way there; add some code to RLE the most obvious repeated sequences if you need a bit more.

Reply to
Martin Brown

Possibly. There are a hell of a lot of nulls in the binary, ~70%. Some very long sequences of >15k nulls too. All the rest are a few hundred or less. My program sees 17 blocks of 15372 nulls in the file. (It is expecting a damaged JPEG.)

15372 = 2^2 · 3^2 · 7 · 61

Which seems to me a very odd random constant length for a block!

I suspect that, a bit like with EPROM programmers, there are development tools around which expect a .HEX file. The binary would be much more meaningful for working out a compression strategy. FPGA isn't my thing.

It is more easily readable by a human I suppose.

Very probably.

I'm curious about the obvious walking ones patterns in it.

The nulls I can account for as unused parts of the functionality. The length of them seems peculiar though (I expected 2^N).
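A scan like the one described above is only a few lines of C. This generic sketch (not Martin's utility) counts runs of 0x00 at or above a threshold length and tracks the longest run seen:

```c
#include <stddef.h>
#include <stdint.h>

/* Count runs of 0x00 of at least min_len bytes, and report the
 * longest run via *longest (may be NULL).  This is the kind of scan
 * that turns up the 15372-byte null blocks discussed above.        */
size_t count_null_runs(const uint8_t *buf, size_t n, size_t min_len,
                       size_t *longest)
{
    size_t runs = 0, run = 0, best = 0;
    for (size_t i = 0; i <= n; i++) {            /* i == n flushes the tail */
        if (i < n && buf[i] == 0) {
            run++;
        } else {
            if (run >= min_len) runs++;
            if (run > best) best = run;
            run = 0;
        }
    }
    if (longest) *longest = best;
    return runs;
}
```

Printing the offset where each qualifying run starts (an easy extension) would reproduce the "run at offset 143811" style of report.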

Reply to
Martin Brown

My comment was about truly random data. An FPGA bit stream certainly has repeated patterns. One might build an N-bit structure, a multiplier or accumulator or filter or DDS, and its bit-slice blocks are very likely repeated N times.

Maybe I can find some college kid who'd like to do a project or thesis to find or code a minimal decompression algorithm for Efinix + Raspberry Pi, in exchange for some pittance.

I can imagine some dictionary-based thing where a dictionary entry is its own first occurrence in the bit file. The decompressor is basically scissors and a pot of glue.
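A sketch of that idea, with a token format invented here for illustration (tag byte, then either a literal run or a back-offset/length pair): the decompressor really is just copy-from-input and copy-from-earlier-output.

```c
#include <stddef.h>
#include <stdint.h>

/* "Scissors and glue" decompressor for a hypothetical LZ77-style
 * format.  Tag 0x00: a literal count byte N follows, then N raw
 * bytes.  Tag 0x01: a two-byte little-endian back-offset and a
 * one-byte length follow; copy that many already-output bytes from
 * `off` bytes back.  Returns bytes written; trusts the stream.     */
size_t glue_expand(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        uint8_t tag = in[i++];
        if (tag == 0x00) {                       /* literal run */
            uint8_t len = in[i++];
            while (len--) out[o++] = in[i++];
        } else {                                 /* back-reference */
            size_t off = in[i] | ((size_t)in[i + 1] << 8);
            uint8_t len = in[i + 2];
            i += 3;
            for (uint8_t k = 0; k < len; k++) {  /* byte-by-byte copy
                                                    allows overlapping
                                                    (repeating) refs  */
                out[o] = out[o - off];
                o++;
            }
        }
    }
    return o;
}
```

Here the "dictionary" is the already-decompressed output itself, which is exactly the first-occurrence-is-the-entry idea: all the cleverness lives in the compressor that chooses the cuts.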

I was thinking of just compressing runs of 0's, but there could be a few other smallish patterns that might not be horrible to stash in the decompressor dictionary. That raises the question: are there patterns that are common to *all* T20 bit streams?

I need a low-paid lackey.

Reply to
John Larkin

That's interesting. Lots of flash, and it looks like a USB-C connector. We could buy a bunch and then design our own equivalent if it ever goes EOL.

We want an RTOS, USB, an Ethernet stack, JavaScript web pages, and a fair heap of application code. Maybe the standard Pi is underpowered; too much risk.

The RP2040 has a 16 KB execute-in-place cache, which lets it run code from the serial flash. That's not going to be super fast, but when we need number crunching we would do that in the FPGA. Many of the boxes that I'm considering will be slow and won't even need an FPGA.

Reply to
John Larkin

I don't program FPGAs any more; I have kids who learn and fight the tools. I prefer to architect products and draw schematics. I asked one of my engineers to give me an Efinix config file, and that hex thing was it.

Eventually we'd build the application code (compiled c) and the compressed FPGA config into a single file. The Pi has a boot mode that makes it look like a memory stick and we'd just drag the runtime file onto the Pi. That is just one very cool feature of the Pi. They seem to have done everything right.

Reply to
John Larkin
