Open source Verilog BCH encoder/decoder

As part of my research, I needed a BCH encoder/decoder engine. Sadly, such a thing has no existed under a permissive license. Even more depressing is that many students seem to submit Verilog or VHDL engines as a project (or even research), but never release anything that is usable.

Anyway, I'm releasing a BSD licensed Verilog BCH encoder/decoder. It offers :

  • Parallel input/output
  • Modular components that can be shared across multiple decoders
  • Automatic selection of BCH parameters based on data size and errors to be corrected
  • Specialized error locators for 1 error and 2 error codes
  • Parallel or serial error polynomial generator for codes with 2 or more er rors

formatting link

I'm releasing this under BSD because I'd like to see the code used as widel y as possible, but I'd still like to get feedback and hopefully improvement s merged back in.

As an example, a decoder for a 512 byte data block that corrects up to 12 e rrors with an 8 bit wide input and an 8 bit wide output currently occupies

1635 slices and operates at up to 191 MHz on a Virtex-6 LX240T-3. Such a de coder would take input for 532 clock cycles (512 data bytes, 20 ecc bytes), calculate for about 28 clock cycles, and then produce output for 512 clock cycles.

The code currently compiles on Icarus Verilog (latest git) and Xilinx XST/I sim (tested with 14.5).

Reply to
Russell Dill
Loading thread data ...

the link is expired, Can you the share it. Even i doing research on BCH Encoder and Decoder.

Thank You.

Reply to
manojb.jgi

the link is expired, Can you the share it. Even i doing research on BCH Encoder and Decoder.

Thank You.

formatting link

Reply to
Andy Bennett

Which link is expired? The only link in the post is to github, which is fine.

Reply to
russ.dill

h a thing has no existed under a permissive license. Even more depressing i s that many students seem to submit Verilog or VHDL engines as a project (o r even research), but never release anything that is usable.

rs:

be corrected

errors

ely as possible, but I'd still like to get feedback and hopefully improveme nts merged back in.

errors with an 8 bit wide input and an 8 bit wide output currently occupie s 1635 slices and operates at up to 191 MHz on a Virtex-6 LX240T-3. Such a decoder would take input for 532 clock cycles (512 data bytes, 20 ecc bytes ), calculate for about 28 clock cycles, and then produce output for 512 clo ck cycles.

/Isim (tested with 14.5).

Awesome code, thanks for making it available !

Simulations run great on Icarus.

However, I'm having execution time trouble on XST.

On a Thinkpad T540P with 16 GB DDR3, a DATA_BITS=1024,T=8,BITS=8 is s till synthesizing after 12 hours (note that 1 of 4 CPUs is fully utilized s o it's not a case of continual memory swapping).

What compute capabilities did you use for DATA_BITS=4096,T=12,BITS=16 on XST ? How long did it take ?

I am using -loop_iteration_limit 2048 (for my case, 1024 barfs out) and

-opt_level 2

Thanks !

Reply to
paul.sc

uch a thing has no existed under a permissive license. Even more depressing is that many students seem to submit Verilog or VHDL engines as a project (or even research), but never release anything that is usable.

fers:

o be corrected

e errors

idely as possible, but I'd still like to get feedback and hopefully improve ments merged back in.

12 errors with an 8 bit wide input and an 8 bit wide output currently occup ies 1635 slices and operates at up to 191 MHz on a Virtex-6 LX240T-3. Such a decoder would take input for 532 clock cycles (512 data bytes, 20 ecc byt es), calculate for about 28 clock cycles, and then produce output for 512 c lock cycles.

ST/Isim (tested with 14.5).

still synthesizing after 12 hours (note that 1 of 4 CPUs is fully utilized so

16 on XST ?

I've pushed some updates related to corner cases and syndrome computation, go ahead and pull and give it another try.

The main thing that will make XST run "forever" is swapping due to lack of RAM. For synthesizing single channel decoders, I'd recommend at least 16GB. For multi-channel, 32GB.

Reply to
Russell Dill

such a thing has no existed under a permissive license. Even more depressi ng is that many students seem to submit Verilog or VHDL engines as a projec t (or even research), but never release anything that is usable.

offers:

to be corrected

ore errors

widely as possible, but I'd still like to get feedback and hopefully impro vements merged back in.

o 12 errors with an 8 bit wide input and an 8 bit wide output currently occ upies 1635 slices and operates at up to 191 MHz on a Virtex-6 LX240T-3. Suc h a decoder would take input for 532 clock cycles (512 data bytes, 20 ecc b ytes), calculate for about 28 clock cycles, and then produce output for 512 clock cycles.

XST/Isim (tested with 14.5).

is still synthesizing after 12 hours (note that 1 of 4 CPUs is fully utiliz ed so

=16 on XST ?

f RAM. For synthesizing single channel decoders, I'd recommend at least 16G B. For multi-channel, 32GB.

Thanks Russel.

I believe you've only changed the Makefile yes ?

What's your approximate synthesis time for sim.v with DATA_BITS=1024,T=

8,BITS=8 ?

I'm not sure what you mean by single / multiple channels.

Cheers,

-Paul

Reply to
paul.sc

y, such a thing has no existed under a permissive license. Even more depres sing is that many students seem to submit Verilog or VHDL engines as a proj ect (or even research), but never release anything that is usable.

t offers:

rs to be corrected

more errors

as widely as possible, but I'd still like to get feedback and hopefully imp rovements merged back in.

to 12 errors with an 8 bit wide input and an 8 bit wide output currently o ccupies 1635 slices and operates at up to 191 MHz on a Virtex-6 LX240T-3. S uch a decoder would take input for 532 clock cycles (512 data bytes, 20 ecc bytes), calculate for about 28 clock cycles, and then produce output for 5

12 clock cycles.

nx XST/Isim (tested with 14.5).

8 is still synthesizing after 12 hours (note that 1 of 4 CPUs is fully util ized so

S=16 on XST ?

nd

of RAM. For synthesizing single channel decoders, I'd recommend at least 1

6GB. For multi-channel, 32GB.

No, you can see the full list of changes here:

formatting link

=8,BITS=8 ?

tb_sim.v and sim.v were not intended to be synthesizable.

Running multiple decoders in parallel

Reply to
Russell Dill

Sorry for the imprecise question, my first pull was a week ago so I meant since then, I believe the Makefile only has changed but I may be wrong again :)

Sure, bad question again, I hacked your sim.v (and unfortunately kept the same name ...) to contain bch_syndrome, bch_errors_present, bch_sigma_bma_parallel and bch_error_tmec (+ hook-ups ... etc).

If I try to synthesize the whole thing with T=12, DATA_BITS=4096 I hit a wall (on a 16GB machine, 1 BCH channel only).

So I narrowed it down:

bch_syndrome, bch_errors_present, bch_sigma_bma_parallel all synthesize individually in 10 minutes or fewer and use < 2GB DDR even if I use T=64, DATA_BITS=8192 (>5x T and 2x DATA_BITS of above)

However, bch_error_tmec ALONE with T=12, DATA_BITS=4096 only, takes 1+1/2 hours and reaches 7 GB DDR utilization with a funny pattern of slow ramp-ups and sharp declines - it doesn't use up all the available DDR though, i.e., there are 5 or more GB available at all times.

T=64, DATA_BITS=8192 barfs.

I tried chien separately and got the same result as with bch_error_tmec.

Do you expect chien to be so much harder to synthesize than all the rest ? From my past experience with Reed-Solomon I sort of expected Berlekamp-Massey and Chien to be of somewhat comparable complexity.

Thanks !

-Paul

Reply to
paul.sc

since then, I believe the Makefile only has changed but I may be wrong aga in :)

You are looking at the dates of the commits. The commits happened a while a go, but were only recently pushed.

4,T=8,BITS=8 ?

same name ...) to contain bch_syndrome, bch_errors_present, bch_sigma_bma_ parallel and bch_error_tmec (+ hook-ups ... etc).

t a wall

Here's xilinx_error_tmec with PIPELINE_STAGES=2, DATA_BITS=4096, T=12 , BITS=8, REG_RATIO=8

xst:

524.43user 4.14system 8:42.02elapsed 101%CPU (0avgtext+0avgdata 13612104max resident)k 0inputs+26104outputs (0major+4229163minor)pagefaults 0swaps

map:

11.68user 0.12system 0:11.81elapsed 100%CPU (0avgtext+0avgdata 581040maxres ident)k 0inputs+184outputs (0major+155622minor)pagefaults 0swaps

So XST is using nearly 14GB. Depending on your machine's configuration, 16G B of physical memory would not have been enough. Incidentally, the decoder uses 612 slices, and runs at least 200MHz.

ndividually in 10 minutes or fewer and use < 2GB DDR even if I use

lization with a funny pattern of slow ramp-ups and sharp declines - it does n't use up all the available DDR though, i.e., there are 5 or more GB avail able at all times.

?

ssey

The way I'm dynamically compiling the chien modules is giving XST a hard ti me. Although going bit-parallel, you do need a lot of parallel multipliers for T=64 (some 512 of them). I've created quite a few variants to get the most out of XST, each variant with it's own strengths and weaknesses. If y ou change the multiplier in bch_chien_expand from a parallel_standard_multi plier_const1 to a parallel_standard_multilier, you can save some memory, bu t not the orders of magnitude required.

Here's an example at: PIPELINE_STAGES=1,DATA_BITS=4096,T=32,BITS=8, REG_RATIO=8

xst:

1925.94user 9.58system 31:49.83elapsed 101%CPU (0avgtext+0avgdata 28960864m axresident)k

map:

565608inputs+50552outputs (353major+8315051minor)pagefaults 0swaps 13.37user 0.15system 0:13.60elapsed 99%CPU (0avgtext+0avgdata 613264maxresi dent)k 2104inputs+208outputs (3major+163868minor)pagefaults 0swaps

So you can see that in this case xst is using about 29GB of memory. I origi nally targeted the code for around T=3 to T=16. If you want to get up t o T=64, you'll likely need to figure out why XST is taking so much memory synthesizing the multipliers, likely by paring this down bit by bit.

Additionally, the number of pipeline stages in error tmec is limited to 2 r ight now. You might need to go higher to get 64 13 bit terms summed togethe r and compared with zero.

Reply to
Russell Dill

Hi Russell! First of all, thank you very much for this amazing tool. It works wonders a nd is very well coded. I do have a question and hope you can help me. For my project, I need to ha ve a specific codeword length. My DATA_BITS size can change, as well as "T" , but I do need a specific N which is 256 (32 bytes). So far I cannot achie ve this. For different values of DATA_BITS and T either I get 31 or 33 byte s codeword length, never 32. I guess this has to do with this piece you wro te on the README: "Note that the number of errors correctable for a given p olynomial is sparse. The search function will choose the next highest numbe r of correctable errors rather than trying to move to the next polynomial."

Any advice on how can I work around this?

Thanks again. Gabriel

Reply to
gab.jimenez93

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.