Dear all,
I want to design an arithmetic datapath unit for digital signal processing using VHDL and/or Verilog.
The input consists of 5 elements (arriving either sequentially or in parallel), each 12 bits wide.
Each of these 5 inputs must be multiplied by its own predefined constant 10x10 matrix (floating-point values scaled and rounded to integers). The output is the 10x10 matrix obtained by summing the five resulting matrices, with each element 8 bits wide. So for each element of the output matrix I can use one MAC unit. The internal computation will be 16 bits wide.
Hence, for each set of 5 inputs x1, x2, x3, x4, x5, the output matrix is
Y = x1*C1 + x2*C2 + x3*C3 + x4*C4 + x5*C5
where Y, C1, C2, C3, C4, C5 are 10x10 matrices and x1 to x5 are the 12-bit scalar inputs.
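
A "golden" reference model of this equation is a common first step: HDL simulation outputs can then be compared against it element by element. The sketch below is plain Python. The names reference_y and to_output and the OUT_SHIFT value are hypothetical, and the 16-bit to 8-bit reduction (arithmetic right shift, then saturation) is only one assumption about how the 8-bit output is formed.

```python
# Golden reference model for Y = x1*C1 + x2*C2 + x3*C3 + x4*C4 + x5*C5.
# Assumption (hypothetical): the internal result is reduced to 8 bits by an
# arithmetic right shift of OUT_SHIFT bits followed by signed saturation.
from typing import List

Matrix = List[List[int]]
N = 10          # matrix dimension from the post
OUT_SHIFT = 8   # hypothetical: keep the top 8 bits of a 16-bit result


def reference_y(xs: List[int], cs: List[Matrix]) -> Matrix:
    """Exact integer Y[r][j] = sum over k of xs[k] * cs[k][r][j]."""
    assert len(xs) == len(cs)
    return [[sum(x * cm[r][j] for x, cm in zip(xs, cs)) for j in range(N)]
            for r in range(N)]


def to_output(y: int, shift: int = OUT_SHIFT, bits: int = 8) -> int:
    """Reduce one element to a signed `bits`-wide output (shift, then saturate)."""
    v = y >> shift  # Python's >> on negative ints is an arithmetic shift
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))
```

For example, if every entry of C_k equals k (C1 all ones, ..., C5 all fives) and all x_k are 1, every element of Y is 1+2+3+4+5 = 15.
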
The throughput requirement is 33-50 MHz, i.e., the unit should output 33 to 50 million 8-bit elements per second. I will be using a 0.25 um process technology.
If I use one MAC per output element, I get a fully parallel architecture, but it needs 100 16-bit MAC units, which is too resource-consuming.
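
A quick sizing calculation helps choose between these options. Each output element needs 5 products (one per x_k), so the datapath must sustain 5 times the output rate in multiplications per second. The sketch below assumes every multiplier completes one product per clock cycle and ignores control overhead and pipeline fill/drain; the function names are hypothetical.

```python
# Back-of-envelope sizing: multipliers vs. clock frequency.
# Assumes one product per multiplier per cycle; ignores control overhead.

PRODUCTS_PER_ELEMENT = 5  # x1..x5


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def multipliers_needed(outputs_per_sec: int, f_clk_hz: int) -> int:
    """Minimum multiplier count for a given output rate and clock frequency."""
    return ceil_div(outputs_per_sec * PRODUCTS_PER_ELEMENT, f_clk_hz)


def clock_needed(outputs_per_sec: int, n_multipliers: int) -> int:
    """Minimum clock frequency (Hz) for a given output rate and multiplier count."""
    return ceil_div(outputs_per_sec * PRODUCTS_PER_ELEMENT, n_multipliers)
```

By this estimate, 50 million elements per second needs 250 million products per second: 5 multipliers at 50 MHz, 10 multipliers (one per column) at about 25 MHz, or 100 MACs at only 2.5 MHz. This suggests the fully parallel version would be mostly idle at these rates.
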
I am considering a parallel-serial architecture instead: at each step it outputs one row (10 elements x 8 bits), so the output is produced row by row.
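
One possible row-by-row schedule can be modelled cycle by cycle before any HDL is written. This sketch assumes 10 MACs, one per output column j. For each row r, the MACs spend 5 cycles accumulating x_k * C_k[r][j] for k = 0..4, clearing the accumulator on k = 0. A full row therefore emerges every 5 cycles, and one input set takes 50 cycles. The 10-MAC choice and the name simulate_rows are assumptions; a version with 50 multipliers and an adder tree could instead produce a row every cycle.

```python
# Cycle-level model of a parallel-serial schedule with 10 MACs (one per column).
from typing import List, Tuple

N, K = 10, 5  # 10x10 matrices, 5 inputs


def simulate_rows(xs: List[int], cs: List[List[List[int]]]) -> List[Tuple[int, List[int]]]:
    """Return (cycle, row) pairs in the order the hardware would emit the rows."""
    acc = [0] * N          # one accumulator register per MAC (column)
    emitted = []
    cycle = 0
    for r in range(N):             # row counter
        for k in range(K):         # input counter: which x_k / C_k this cycle
            for j in range(N):     # in hardware the 10 MACs update in parallel
                # On k == 0 the MAC loads the product instead of adding to the old sum.
                acc[j] = (0 if k == 0 else acc[j]) + xs[k] * cs[k][r][j]
            if k == K - 1:         # last term of the row: the row is complete
                emitted.append((cycle, acc.copy()))
            cycle += 1
    return emitted
```

With this schedule, row r is complete at cycle 5r + 4, so the last row of a set appears at cycle 49.
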
I also need to streamline the datapath operation. Since sets of 5 input elements will arrive in a non-stop stream, the output will also be a non-stop stream. So after one row has been output, its storage can be reused for computing/storing the results for the next 5 input elements.
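
For continuous streaming, a common technique is a double-buffered ("ping-pong") input register. The MACs read an "active" copy of x1..x5 for the whole 50-cycle frame while the next set is written into a "shadow" copy; the two swap at the frame boundary, so no cycles are lost between sets. The sketch assumes the next set arrives one element per cycle during the first 5 cycles of a frame; that timing and the name stream are hypothetical. The accumulators need no extra buffer in this schedule because each one is cleared at k = 0, right after its previous row was emitted.

```python
# Streaming model with a double-buffered (ping-pong) input register.
from typing import List

N, K = 10, 5
FRAME = N * K  # 50 cycles per input set


def stream(input_sets: List[List[int]], cs: List[List[List[int]]]) -> List[List[int]]:
    """Return every output row, frame after frame, with no idle cycles in between."""
    rows = []
    active = input_sets[0]   # register the MACs read during this frame
    acc = [0] * N
    for f in range(len(input_sets)):
        shadow = [0] * K     # register being filled with the next input set
        for t in range(FRAME):
            r, k = divmod(t, K)
            for j in range(N):
                acc[j] = (0 if k == 0 else acc[j]) + active[k] * cs[k][r][j]
            if k == K - 1:
                rows.append(acc.copy())
            # Meanwhile, the next set arrives serially into the shadow register.
            if t < K and f + 1 < len(input_sets):
                shadow[t] = input_sets[f + 1][t]
        if f + 1 < len(input_sets):
            active = shadow  # swap at the frame boundary
    return rows
```

Writing the next set directly into the active register instead would corrupt the rows still being computed from the current set.
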
So far so good... but thinking further, I get confused and perplexed. How do I do the sequential timing control (i.e., decide what to do in which cycle)? Do I need pipelining? How should I design the architecture? I mean, I know pipelining in theory from a one-semester course, but now that I am going to implement it, I am totally lost...
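
For a schedule like this, the timing control can be as simple as two counters (k: 0..4 and r: 0..9) that produce control bits each cycle: "first" clears the accumulator and "last" marks a finished row. When a pipeline register is inserted, for example between the multiplier and the adder, the control bits must be delayed by the same number of stages so they stay aligned with the data they describe. The sketch below models a hypothetical 2-stage MAC. In hardware both stages update on the same clock edge, which the model imitates by computing stage 2 from the previously registered value before overwriting it. The names controller and pipelined_frame are assumptions.

```python
# Counter-based controller plus a 2-stage MAC (multiply -> register -> accumulate).
N, K = 10, 5


def controller():
    """Yield (r, k, first, last) once per clock cycle for one 50-cycle frame."""
    for r in range(N):
        for k in range(K):
            yield r, k, k == 0, k == K - 1


def pipelined_frame(xs, cs):
    """Return (cycle, row_index, row) for each completed row."""
    prod_reg = None                      # stage-1 register: (products, first, last, row)
    acc = [0] * N
    out = []
    ctrl = list(controller()) + [None]   # one extra cycle to drain the pipeline
    for cycle, c in enumerate(ctrl):
        # Stage 2: accumulate the products registered in the previous cycle.
        if prod_reg is not None:
            prods, first, last, row = prod_reg
            acc = [p if first else a + p for a, p in zip(acc, prods)]
            if last:
                out.append((cycle, row, acc))
        # Stage 1: multiply for this cycle's (r, k); register the control bits too.
        if c is not None:
            r, k, first, last = c
            prod_reg = ([xs[k] * cs[k][r][j] for j in range(N)], first, last, r)
        else:
            prod_reg = None
    return out
```

Compared with an unpipelined MAC, every row appears one cycle later (row 0 at cycle 5 instead of 4), but throughput is unchanged and the multiplier and adder no longer have to fit into the same clock period.
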
Finally, how do I code this? Are there any examples of this? Could you give a detailed explanation or, preferably, some code?
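
One practical piece of "programming this" is a testbench that feeds stimulus and compares the HDL outputs with a software model. The sketch below only formats values as two's-complement hex words, one per line, a layout that Verilog's $readmemh can load. The function names, file names, and bit widths are assumptions, and the HDL and testbench themselves are not shown.

```python
# Write signed integers as fixed-width two's-complement hex, one word per line.
from typing import Iterable, List


def to_hex(value: int, bits: int) -> str:
    """Two's-complement hex string, zero-padded to the width of `bits`."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    return format(value & ((1 << bits) - 1), f"0{(bits + 3) // 4}x")


def memh_lines(values: Iterable[int], bits: int) -> List[str]:
    """Format a sequence of values as hex words."""
    return [to_hex(v, bits) for v in values]


def write_memh(path: str, values: Iterable[int], bits: int) -> None:
    """Write the hex words to `path`, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(memh_lines(values, bits)) + "\n")
```

For example, write_memh("x_in.hex", xs, 12) for the 12-bit inputs and write_memh("y_expected.hex", flattened_reference_rows, 16) for the expected internal results (both file names hypothetical). The testbench loads both files and flags any mismatch cycle by cycle.
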
Please help me!
Thanks a lot,
-Mizhael