Re: ESR Meter - Roll your own - ESRrev0.JPG

I'm looking for something that will be fast and easy to decode at fpga config time.

I just did some statistical analysis of a bunch of existing Spartan 3 bit streams. If you treat them as bytes, and histogram the byte codes, there's some impressive stats, with code 0x00 of course dominant, then

0xFF, and the next 16 most common codes huge, tapering off pretty hard. That makes sense, since LUT bits are usually zero, so simple codes like 0x01...0x80 and simple 2-bit combos are most common. Block ram is usually 0 in our designs, too. So a fixed dictionary should work pretty well.

So it looks like a byte stream would do, with byte codes that say stuff like...

00000000 end of file

001nnnnn make N zero bytes, N=1 to 31

010nnnnn make N 0xFF bytes, ditto

011nnnnn the following N bytes are raw, unencoded stuff

1nnnnnnn look up code N in dictionary, N = 1 to 127

where the dictionary is a list of the most common 96 single bytes followed by the most common 32 byte pairs. Net compression in this case is just about 1:1!

Something like that.

Now the question is, how much compression will this give? I suppose I'll have to code it and see. I need less than 2:1 now, and that shouldn't be hard.

John

Reply to
John Larkin
Loading thread data ...

Are you reinventing Huffman coding ?

formatting link

If you have a CPU somewhere to do the decompression, check miniLZO. =

Decompression is fast and the code is very small ! Plus, it's an open source library that works, and you don't have to cod= e =

it. be lazy...

formatting link

Reply to
PFC

Run-length is pretty simple. The reason this caught my eye was we were looking at doing some invention here. As it turned out there was no inventing to be done because the problem was so simple.

I think you'll find that it's not only a byte of '00s' that's dominant, but *many* bytes. IIRC a '0' turns off a passgate and '1' on. There are *MANY* more off passgates than on. This, and the fact that many cells aren't used at all (even columns), makes run-length work out very nicely.

That should be a piece of cake. Again, look at the Xilinx app notes. I remember there being more information in there. Watch bad encoding though. Bad things happen to FPGAs with random bits set.

--
  Keith
Reply to
krw

Depends on what you want to use to unpack this, and what from.

Are the cells 8 bit multiples in size ?

Run-Length coding can work well, and even an extremely simple form like (eg) a 12 bit RLC field of

nnnnVVVVVVVV

where nnnn loads a counter (can be non linear - ie mini-table) and repeats the VVVVVVVV pattern until the counter hits zero, then another frame is fetched from the SPI memory.

That sort of coding can fit into a CPLD/Serial FLASH combo, which can give fast loading, and usefull compression.

Dictionary based compression needs table storage, so is more suited to small uC. Config clock speed is likely to be lower. (if that matters?)

-jg

Reply to
Jim Granville

An PC will build a rom image from uP hex code (S28 files, from an assembler) and one to as many as 6 FPGA config files (.RBT files, from the Xilinx tools) We burn a rom and it gets plugged into the end-product. At startup time, the uP code runs and early-on it will unsquash the compressed FPGA configs and load the chips, usually by bit-banging a parallel port. This can take several seconds already, so I don't want to slow it down much more.

The bit patterns in a Xilinx config file sure look like they are.

That requires treating the squeezed data as a stream, which is a nuisance to unpack, and has a lot of overhead in the "random" regions of the config data. The data tends to be long runs of 0's and clumps of messy stuff. I'm thinking that a byte-oriented dictionary thing might be able to have roughly zero overhead in the data clusters, and be very efficient in coding the long runs of zeroes and the occasional

0xFFs. But if bytes do tend to repeat, there might be a way to express that and pack a little better.

This is a 68332 uP, and a fixed dictionary wouldn't be bad, just a couple of hundred bytes in rom maybe. My whole application program, tables and all, comes in under 10K bytes!

John

Reply to
John Larkin

If you have a 68332 'going spare' for this, then your approach is prety spot on. You could check the streams to see if repeat-pattern was common enough to encode.

- and you could define the count via a dictionary/table as well. [eg 5 bit index into 32 x 16 bit values ]

-jg

Reply to
Jim Granville

Here's a byte histogram of a typical Spartan 3 config file. Most of the config bits are zero. I read one Xilinx appnote about the rad hardness of sram-based FPGAs, and they got the upset rate calculation down by noting that most config bits actually don't matter!

80,413 NONZERO BYTES

============== SORTED =================

NNN Byte Hex Frequency

0 0 0 181,476 1 255 FF 5,809 2 1 1 5,390 3 128 80 3,991 4 4 4 3,867 5 8 8 3,815 6 16 10 3,785 7 32 20 3,653 8 2 2 3,364 9 48 30 3,282 10 64 40 3,139 11 12 C 1,941 12 3 3 1,587 13 192 C0 1,472 14 36 24 1,038 15 112 70 1,009 16 20 14 936 17 10 A 824 18 144 90 706 19 40 28 699 20 24 18 678 21 9 9 631 22 80 50 626 23 160 A0 602 24 5 5 573 25 96 60 571 26 6 6 558 27 14 E 558 28 56 38 503 29 18 12 498 30 176 B0 487 31 19 13 480 32 60 3C 461 33 28 1C 454 34 129 81 449 35 153 99 449 36 136 88 441 37 13 D 411 38 34 22 408 39 33 21 405 40 195 C3 399 41 68 44 389 42 224 E0 383 43 204 CC 381 44 44 2C 366 45 130 82 358 46 170 AA 354 47 72 48 351 48 7 7 345 49 132 84 345 50 17 11 324 51 52 34 312 52 140 8C 308 53 65 41 299 54 240 F0 294 55 200 C8 290 56 120 78 279 57 66 42 274 58 29 1D 271 59 131 83 267 60 139 8B 242 61 49 31 236 62 30 1E 218 63 254 FE 218 64 157 9D 217 65 58 3A 216 66 45 2D 212 67 196 C4 207 68 138 8A 194 69 199 C7 191 70 51 33 191 71 227 E3 188 72 74 4A 188 73 26 1A 182 74 168 A8 181 75 50 32 180 76 37 25 173 77 122 7A 173 78 152 98 173 79 187 BB 172 80 85 55 166 81 252 FC 164 82 184 B8 161 83 156 9C 152 84 165 A5 146 85 22 16 146 86 193 C1 144 87 238 EE 139 88 42 2A 139 89 148 94 139 90 104 68 137 91 76 4C 135 92 88 58 134 93 137 89 133 94 59 3B 132 95 92 5C 130 96 11 B 129 97 84 54 122 98 102 66 119 99 208 D0 118 100 124 7C 115 101 71 47 114 102 15 F 112 103 250 FA 108 104 35 23 106 105 62 3E 103 106 41 29 101 107 251 FB 98 108 25 19 96 109 188 BC 96 110 21 15 95 111 54 36 93 112 61 3D 92 113 207 CF 91 114 116 74 89 115 55 37 88 116 143 8F 88 117 57 39 86 118 126 7E 85 119 90 5A 83 120 236 EC 83 121 226 E2 80 122 100 64 80 123 211 D3 77 124 146 92 77 125 221 DD 76 126 241 F1 76 127 82 52 75 128 203 CB 72 129 145 91 72 130 243 F3 70 131 194 C2 69 132 81 51 66 133 142 8E 66 134 46 2E 63 135 180 B4 63 136 67 43 61 137 239 EF 60 138 27 1B 60 139 70 46 60 140 253 FD 59 141 97 61 59 142 154 9A 58 143 186 BA 57 144 197 C5 55 145 134 86 55 146 53 35 54 147 108 6C 54 148 69 45 53 149 191 BF 52 150 99 63 52 151 38 26 52 152 158 9E 52 153 98 62 52 154 172 AC 52 155 247 F7 50 156 179 B3 49 157 190 BE 48 158 113 71 47 159 248 F8 47 160 141 8D 46 161 175 AF 45 162 115 73 45 163 201 C9 45 164 225 E1 45 165 78 4E 45 166 114 72 44 167 162 A2 44 168 245 F5 43 169 189 BD 43 170 73 49 43 171 228 E4 43 172 135 87 42 173 133 85 42 174 216 D8 42 175 223 DF 41 176 202 CA 39 177 161 A1 38 178 87 57 37 179 185 B9 37 180 94 5E 36 181 23 17 34 182 31 1F 34 183 147 93 34 184 118 76 34 185 163 A3 33 186 77 4D 32 187 125 7D 32 188 174 AE 32 189 178 B2 32 190 86 56 31 191 106 6A 31 192 149 95 30 193 234 EA 30 194 198 C6 29 195 206 CE 29 196 242 F2 29 197 232 E8 28 198 164 A4 27 199 220 DC 27 200 244 F4 26 201 39 27 25 202 93 5D 24 203 177 B1 24 204 121 79 23 205 150 96 23 206 105 69 22 207 83 53 21 208 89 59 20 209 182 B6 20 210 110 6E 20 211 119 77 18 212 95 5F 18 213 63 3F 18 214 181 B5 17 215 117 75 16 216 212 D4 16 217 219 DB 15 218 101 65 15 219 230 E6 15 220 218 DA 15 221 43 2B 14 222 205 CD 14 223 209 D1 14 224 171 AB 13 225 75 4B 12 226 235 EB 11 227 169 A9 11 228 222 DE 11 229 79 4F 10 230 127 7F 10 231 123 7B 10 232 151 97 8 233 215 D7 8 234 47 2F 8 235 109 6D 8 236 173 AD 8 237 214 D6 8 238 166 A6 8 239 167 A7 7 240 213 D5 7 241 103 67 6 242 217 D9 6 243 233 E9 5 244 246 F6 5 245 111 6F 4 246 229 E5 4 247 210 D2 4 248 155 9B 3 249 91 5B 3 250 249 F9 3 251 231 E7 2 252 183 B7 2 253 159 9F 2 254 237 ED 1 255 107 6B 1
Reply to
John Larkin

On this one, that's appx 70% zeroes. Could be interesting to see the 'RunCount' values - maybe something like the 2 smallest, 2 highest, and 7 most common run lengths, in 11 more columns ? to help design the best algorithm. I also notice that a Single HI bit, is very common, so even if that does not repeat, a specific compression coverage for those would cover 38.5% of the non zero bytes ?

-jg

Reply to
Jim Granville

Still working on this after 3 years?

The following site looks promising if you can read German:

formatting link

--
Reply to nico@nctdevpuntnl (punt=.)
Bedrijven en winkels vindt U op www.adresboekje.nl
Reply to
Nico Coesel

Another optimisation may come from compressing the frame headers and deleting unused areas. Xilinx configuration data (which also includes commands BTW) is organised in frames with a CRC, preamble, location and postamble. There are appnotes on partial (re)configuration from Xilinx explaining these in more detail. The initialisation frames of blockrams may be omitted completely if they are not initialised which saves several kB.

--
Reply to nico@nctdevpuntnl (punt=.)
Bedrijven en winkels vindt U op www.adresboekje.nl
Reply to
Nico Coesel

Done something similar some time ago, to save some flash space for bitsream on a relatively small micro.

Check this thread on this newsgroup:

formatting link

or search with google groups for "compressing Xilinx bitstreams, some test data".

There are some measures from my real design, and a link to the source code.

Please share your results with the group!

"John Larkin" ha scritto nel messaggio news: snipped-for-privacy@4ax.com...

Reply to
Antonio Pasini

It was interesting three years ago. This month, it's mandatory.

John

Reply to
John Larkin

I wrote the following two small C programs in 2001 for an Altera FPGA. You can't get it much simpler, and from a whole bunch of config files it gets about a 50% compression factor (plus or minus 15%).

You may need to set the 'most-common value' from 0x00 to 0xff - I don't have a lot of Xilinx bitstreams here to check what's best.

You could actually check the 'golden' bitstream to count which byte value occurs the most...

Good luck!

Ben

Reply to
Ben Twijnstra

2/3 of the data is empty of bits

all bits set is next most common.

followed by one bit set.

and two bits set....

are there any longer repeating patterns or is the data pretty random byte by byte

Reply to
Jasen Betts

OK, we have a company-standard rom-builder program that gobbles up S28-format uP code and Xilinx .RBT files and builds binary rom images, sort of a barbaric linker. I modified it to accept a /RLL switch, to do essentially the same thing as your code (but recoded it in PowerBasic!) It's byte oriented, and whenever I see a 0 byte, I output it and count the total number of 0's in the run, then poke that count into the next output byte. A single input 0 then becomes two bytes (0,1) and a run of 100 zeroes becomes two bytes, (0,100). Nonzero bytes just get shipped out unchanged.

I added 32 extra zero bits at the end (0,4) because the chips seem to like that, and then (0,0) as the end-of-file marker, to tell the uP config code when to quit. We also added a /FILL switch to load unused image bytes with all 1's, to make eprom programming a bit faster.

A relatively simple VME interface chip, an XC3S200, crunches to 0.23 of the original size, over 4:1 compression.

A fairly hairy 8-channel massively pipelined DDS synthesizer in an XC3S1500, using lots of resources, compresses to 0.43. If we initialize the block rams, it's more like 0.46.

We could also RLL the 0xFF bytes, but that would help only marginally. We have come up with some trickier dictionary-based algorithms, but the RLL ratios are much better then we need at present, so we'll just go with this.

Runtime unpacking and config bit-banging should be fast. When we see a null byte, we can hop to a brute-force routine that reads the next byte and bangs out up to 2040 zero bits as fast as it can wigwag the CCLK port bit. It should be much faster than the old loop, which just shifted all the data bits out.

Thanks,

John

Reply to
John Larkin

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.