posted on
August 30, 2003, 5:10 am
Dear all,
I want to design an arithmetic datapath unit for digital signal processing
using VHDL and/or Verilog.
The inputs are 5 elements (either sequential or parallel), each 8 bits wide.
Each of these 5 inputs needs to be multiplied by a predefined constant
matrix (10x10, floating-point values scaled and rounded to integers). The
output will be a 10x10 matrix that is the sum of these five matrices, each
element having 12 bits. So for each element of the matrix, I can have a MAC
unit. The internal computation will be 16 bits.
Hence for each set of 5 inputs x1, x2, x3, x4, x5, the output matrix is
Y=x1*C1+x2*C2+x3*C3+x4*C4+x5*C5, where Y, C1, C2, C3, C4, C5 are matrices.
If I put a MAC on each element, I will have a purely parallel
architecture, but I would need 100 16-bit MAC units, which would consume too
many resources.
I am considering a parallel-serial architecture that outputs one row at a
time, which will be 10x12 bits, so the output will be row by row.
I also need to streamline the datapath operation. Since there
will be a non-stop stream of 5-element inputs, the output will
also stream non-stop. So after one row is output, that row can be
used for computation/storage of the results for the next 5 input elements.
I am OK so far, but thinking further leaves me confused and
perplexed: how do I do sequential timing control (what to do at which
cycle)? Do I need pipelining? How do I design the architecture? I mean, I
know pipelining theoretically from a one-semester course, but now that I
have to implement one, I am totally lost...
Finally, how do I program this? Are there any examples for this?
Please help me!
Thanks a lot,
-Walala
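
For readers following the thread, the computation described in this post can be written down as a small bit-width-aware reference model before any HDL is attempted. The sketch below is in Python rather than VHDL/Verilog and is only a golden model to compare a simulation against. The function name `reference_output` is made up for illustration. The post does not say whether the data is signed, how overflow is handled, or how 16 bits are reduced to 12, so the signed saturating accumulator and the drop of the 4 low bits are assumptions.

```python
def reference_output(x, C, acc_bits=16, out_bits=12):
    """Compute Y = x[0]*C[0] + ... + x[4]*C[4] element by element.

    x is a list of 5 integers, C is a list of 5 integer matrices (10x10).
    Assumptions (not stated in the thread): the accumulator is signed and
    saturates at acc_bits, and the output keeps the top out_bits bits.
    """
    rows, cols = len(C[0]), len(C[0][0])
    shift = acc_bits - out_bits
    lo = -(1 << (acc_bits - 1))
    hi = (1 << (acc_bits - 1)) - 1
    Y = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(len(x)):
                acc += x[k] * C[k][i][j]      # one multiply-accumulate step
                acc = max(lo, min(hi, acc))   # saturate (assumed behaviour)
            Y[i][j] = acc >> shift            # keep the top 12 of 16 bits
    return Y
```

A model like this gives the expected 100 output values for any set of 5 inputs, which is useful later when checking a row-serial or pipelined implementation in simulation.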
Re: how to design this datapath unit for DSP using VHDL/Verilog?
Hi Kevin,
Thanks for your answer!
The required output throughput is 33-50 MHz, i.e., it should output 33
million to 50 million 12-bit elements per second,
and each set of 5 inputs corresponds to 10x10=100 such 12-bit output elements.
The technology I am going to use is 0.25u.
I think the inputs are naturally serial, but I can make them parallel,
since there are only 5 of them. But again, I am not sure how to do
the parallel-serial partition of the internal MACs, or how to pace the
outputs.
It seems the inputs are faster than the outputs, so maybe I should make the
input wait after it is fed into the unit?
Can you give some further advice on how to do this architecture and how to do
the timing? I think it is really difficult. Can you also point me to some
resources?
Thanks very much,
-Walala
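
The numbers in this post can be turned into a rough clock budget. The sketch below is a back-of-the-envelope calculation in Python, not a design. The function name `mac_clock_budget` is hypothetical, and it assumes each MAC unit performs one multiply-accumulate per clock cycle, which the thread does not state.

```python
def mac_clock_budget(out_rate_hz, n_macs, n_inputs=5, n_outputs=100):
    """Relate output rate, input rate and MAC clock for a shared-MAC design.

    Assumption: one multiply-accumulate per MAC per clock cycle.
    """
    sets_per_s = out_rate_hz / n_outputs      # 5-input sets consumed per second
    input_rate_hz = sets_per_s * n_inputs     # individual 8-bit inputs per second
    mac_ops_per_s = out_rate_hz * n_inputs    # each output needs 5 MAC steps
    mac_clock_hz = mac_ops_per_s / n_macs     # clock needed with n_macs units
    return {
        "sets_per_s": sets_per_s,
        "input_rate_hz": input_rate_hz,
        "mac_clock_hz": mac_clock_hz,
    }
```

Under these assumptions, `mac_clock_budget(50_000_000, 10)` gives 500,000 input sets per second, 2.5 million inputs per second, and a 25 MHz MAC clock for a 10-MAC row-serial design. Since each set of 5 inputs produces 100 outputs, the inputs have to arrive 20 times more slowly than the outputs leave, so pacing the input side is part of the timing design.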
Re: how to design this datapath unit for DSP using VHDL/Verilog?
To calculate one
row at a time, you just add a 10:1 mux before the MAC and select
inputs for the next element after the current element is done and the
accumulator is cleared. A counter or a simple state machine can be
used to control the mux select signal. This will take at least 10
times longer to get all 100 elements.
Jim Wu
snipped-for-privacy@yahoo.com
http://www.geocities.com/jimwu88/chips
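
The mux-plus-counter scheme in this reply can be illustrated with a cycle-level behavioural model. The sketch below is Python, not HDL, and it reflects one reading of the reply: 10 MAC units, one per column, with a row counter acting as the select of each 10:1 mux. The function name `row_serial_mac` and the one-MAC-step-per-cycle timing are assumptions made for illustration.

```python
def row_serial_mac(x, C):
    """Yield (cycle, row_index, row_values), one finished row at a time.

    x is a list of inputs, C is a list of coefficient matrices (one per input).
    Assumption: each of the per-column MACs does one step per clock cycle.
    """
    n_rows, n_cols = len(C[0]), len(C[0][0])
    cycle = 0
    for row_sel in range(n_rows):        # counter driving the mux select
        acc = [0] * n_cols               # accumulators cleared for the new row
        for k, xk in enumerate(x):       # 5 MAC steps per row
            for j in range(n_cols):      # in hardware these run in parallel
                acc[j] += xk * C[k][row_sel][j]
            cycle += 1                   # one clock cycle per MAC step
        yield cycle, row_sel, acc
```

With 5 inputs and a 10x10 matrix, the last row comes out at cycle 50, compared with 5 cycles for 100 parallel MACs, which matches the "at least 10 times longer" estimate above. In an HDL version, the outer loop would become the counter or state machine and the inner accumulation the MAC register.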