How to design this datapath unit for DSP using VHDL/Verilog?

Dear all,

I want to design an arithmetic datapath unit for digital signal processing using VHDL and/or Verilog.

The input is 5 elements (either sequential or parallel), each 8 bits wide. Each of these 5 inputs needs to be multiplied by a predefined constant matrix (10x10, floating-point values scaled and rounded to integers). The output will be a 10x10 matrix that is the sum of these five matrices, with each element having 12 bits. So for each element of the matrix, I can have a MAC unit. The internal computation will be 16 bits.

Hence, for each group of 5 inputs x1, x2, x3, x4, x5, the output matrix is

Y = x1*C1 + x2*C2 + x3*C3 + x4*C4 + x5*C5, where Y, C1, C2, C3, C4, C5 are 10x10 matrices.
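
To make the formula concrete, here is a minimal software sketch of it in Python, the kind of model one might use as a golden reference when checking HDL simulation results later. The function name `weighted_matrix_sum` and the nested-list layout `C[k][row][col]` are assumptions made for illustration. The model uses plain unbounded integers, so it does not model the 16-bit internal width or the reduction to 12-bit outputs, which the description above leaves unspecified.

```python
ROWS, COLS, TAPS = 10, 10, 5  # matrix size and number of inputs from the post


def weighted_matrix_sum(x, C):
    """Return Y with Y[r][c] = sum over k of x[k] * C[k][r][c].

    x: list of TAPS integers (the five 8-bit inputs).
    C: list of TAPS constant matrices, each ROWS x COLS (hypothetical layout).
    Plain Python integers are used, so no overflow or truncation is modelled.
    """
    assert len(x) == TAPS and len(C) == TAPS
    return [
        [sum(x[k] * C[k][r][c] for k in range(TAPS)) for c in range(COLS)]
        for r in range(ROWS)
    ]
```

With real coefficients, the largest entry of Y from such a model gives a quick check of whether a 16-bit accumulator is wide enough.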

If I put a MAC on each element, I will have a purely parallel architecture, but I would need 100 16-bit MAC units, which would consume too many resources.

I am considering a parallel-serial architecture that outputs one row at a time, i.e. 10x12 bits, so the output will be produced row by row.
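
One possible reading of the row-by-row idea can be sketched as a cycle-level model in Python. It assumes 10 MAC units (one per column) and spends one cycle per input element, so each row completes after 5 cycles. This schedule, the name `row_serial_schedule`, and the `C[k][row][col]` layout are all assumptions for illustration, not a definitive architecture. Bit widths are not modelled.

```python
ROWS, COLS, TAPS = 10, 10, 5  # matrix size and number of inputs from the post


def row_serial_schedule(x, C):
    """Yield (cycle, row, values) each time an output row completes.

    Assumed schedule: one MAC per column (10 MACs in total); each row
    takes TAPS cycles, one multiply-accumulate per input element.
    """
    cycle = 0
    for row in range(ROWS):
        acc = [0] * COLS  # accumulators are cleared at the start of each row
        for k in range(TAPS):
            # In hardware these COLS multiply-accumulates happen in parallel.
            for col in range(COLS):
                acc[col] += x[k] * C[k][row][col]
            cycle += 1
        yield cycle, row, acc
```

Under this assumed schedule a full matrix takes 50 cycles, so a new group of five inputs can be accepted only once every 50 cycles. Producing one row per cycle instead would need 50 multipliers (10 columns times 5 inputs).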

I also need to streamline the datapath operation. Since groups of 5 input elements will arrive in a non-stop stream, the output will also be a non-stop stream. So after one row has been output, the storage for that row can be reused for computing/storing the results for the next 5 input elements.

I am OK with my thinking so far, but going further leaves me confused. How do I do the sequential timing control (i.e. decide what to do in which cycle)? Do I need pipelining? How should I design the architecture?

Finally, how do I program this? Are there any examples of this?

Please help me!

Thanks a lot,

-Walala

