How do I best implement routing boxes in RTL?

- news reader

Thu, Mar 8, 2007 2:54 PM

In the design I have 256 3-bit registers, and each time I need to read or write 16 of them (data_o0, data_o1, ..., data_o15). The read/write addresses are not completely random.

For example, if I arrange the registers as a 16x16 matrix, data_o0 only accesses registers in row 0 or column 0. data_o1 may access 20 of the registers rather than all 256, data_o2 may access 30, and so on.
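A minimal sketch of what a restricted port like data_o0 could look like. It assumes the 8-bit address is {row, col} (so register index = row*16 + col) and that the register file is passed in as one flattened 768-bit bus; both encodings and all names are assumptions, not taken from the post. Because the port can only reach row 0 or column 0, it needs two 16:1 muxes and a 2:1 select instead of a full 256:1 mux.

```verilog
// Hypothetical read port for data_o0: it may only address registers in
// row 0 or column 0 of a 16x16 arrangement (31 distinct registers).
// Assumption: addr = {row, col}, so register index = row*16 + col, and the
// register file is flattened with entry i at bits [3*i +: 3].
module read_port_row0_col0 (
    input  wire [767:0] regs_flat,  // 256 x 3-bit registers
    input  wire [7:0]   addr,       // {row[3:0], col[3:0]}
    output wire [2:0]   data_o
);
    wire [3:0] row = addr[7:4];
    wire [3:0] col = addr[3:0];

    // 16:1 mux over row 0: entries 0..15
    wire [2:0] row0_val = regs_flat[3*col +: 3];
    // 16:1 mux over column 0: entries 0, 16, 32, ..., 240 (bit offset 48*row)
    wire [2:0] col0_val = regs_flat[48*row +: 3];

    // Row 0 means the target is in row 0; otherwise it must be in column 0.
    // Addresses outside row 0 / column 0 are assumed never to occur.
    assign data_o = (row == 4'd0) ? row0_val : col0_val;
endmodule
```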

If I code it so that every output can read from all 256 registers, the resulting logic will be overkill and highly redundant.
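For comparison, a sketch of the straightforward version, where every one of the 16 outputs gets a full 256:1 mux. The flattened bus layout and the port names are hypothetical.

```verilog
// Naive version: each of the 16 read ports gets a full 256:1 mux.
// Names are illustrative; buses are flattened because Verilog-2001
// ports cannot be arrays.
module read_all_naive (
    input  wire [767:0] regs_flat,   // 256 x 3-bit registers, entry i at [3*i +: 3]
    input  wire [127:0] addr_flat,   // 16 x 8-bit addresses, port p at [8*p +: 8]
    output wire [47:0]  data_flat    // 16 x 3-bit outputs, port p at [3*p +: 3]
);
    genvar p;
    generate
        for (p = 0; p < 16; p = p + 1) begin : g_port
            wire [7:0] a = addr_flat[8*p +: 8];
            // Full 256:1 mux: synthesis cannot prune it, because nothing in
            // the RTL says this port only ever uses a few of the addresses.
            assign data_flat[3*p +: 3] = regs_flat[3*a +: 3];
        end
    endgenerate
endmodule
```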

If I use case statements to list each of the scenarios, the RTL code may end up around 500 KB. Will Design Compiler synthesize a 500 KB design efficiently? Will NCVerilog compile and simulate it efficiently?
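One of those case statements might look like the sketch below. The four addresses are illustrative placeholders, not the real access pattern, and the flattened register bus is an assumption. Assigning x in the default branch tells synthesis that the unlisted addresses never occur, so it is free to minimize around them.

```verilog
// Hypothetical restricted read port written as a case statement.
// Only the listed addresses are reachable; the values are placeholders.
module read_port_case (
    input  wire [767:0] regs_flat,  // 256 x 3-bit registers, entry i at [3*i +: 3]
    input  wire [7:0]   addr,
    output reg  [2:0]   data_o
);
    always @* begin
        case (addr)
            8'd3:    data_o = regs_flat[3*3   +: 3];
            8'd17:   data_o = regs_flat[3*17  +: 3];
            8'd42:   data_o = regs_flat[3*42  +: 3];
            8'd200:  data_o = regs_flat[3*200 +: 3];
            // Unreachable addresses: don't care, which lets synthesis minimize.
            default: data_o = 3'bxxx;
        endcase
    end
endmodule
```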

Are there any neater techniques to attack this problem?
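One possibly neater option, sketched under assumptions: describe each port's reachable registers as a 256-bit mask parameter (LEGAL is a hypothetical name) and write a single generic port module. For every register whose mask bit is 0 the select condition is constant false, so constant propagation should remove that input, and the 16 ports become 16 instantiations with different masks instead of hand-written case statements.

```verilog
// Sketch of a generic read port whose reachable set is a 256-bit mask.
// LEGAL is a hypothetical per-port parameter: bit i set => register i reachable.
module read_port_masked #(
    parameter [255:0] LEGAL = {256{1'b1}}
) (
    input  wire [767:0] regs_flat,  // 256 x 3-bit registers, entry i at [3*i +: 3]
    input  wire [7:0]   addr,
    output reg  [2:0]   data_o
);
    integer i;
    always @* begin
        data_o = 3'd0;                       // value returned for unreachable addresses
        // addr matches at most one i, so this is a plain selector. Entries
        // with LEGAL[i] == 0 make the condition constant false after the loop
        // is unrolled, so synthesis can drop them.
        for (i = 0; i < 256; i = i + 1)
            if (LEGAL[i] && addr == i)
                data_o = regs_flat[3*i +: 3];
    end
endmodule
```

Whether a given tool prunes this as thoroughly as an explicit case statement is worth checking in the synthesis area report.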

- Utku Özcan

Thu, Mar 8, 2007 8:14 PM

Hi "news reader", my humble pearls are inline below.

news reader wrote:

It seems that you have an algorithm that accesses the values in a deterministic pattern, so you think you can implement it with logic only.

I assume you are modeling an algorithm for a special matrix operation.

The numbers you gave do not tell us much. Which elements does each data_ox (x = 1, 2, ...) access, and in what pattern?

You think the access pattern can be handled with pure logic, so you have tried (or want) to model your logic to cover every case.

That is reasonable, then.

What does "efficiency" mean for you: speed or minimum logic? If minimum logic, please share the algorithm you are trying to implement.

NCVerilog does not care about the logic implementation. It simulates the behaviour of the system, no matter how the objects are connected.

Since you have not given much data, I think you could implement this with a RAM. Why don't you use a RAM? You can then assign RAM addresses to model your matrix, generating the addresses of the matrix positions in a way that mimics your algorithm.
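A minimal sketch of the RAM suggestion, assuming a 16x16 matrix stored row-major so that the address is simply {row, col}. The module and port names are hypothetical. Note that a single-port RAM like this serves only one address per clock.

```verilog
// Hypothetical 256 x 3-bit synchronous RAM addressed by matrix coordinates.
// Assumption: row-major storage, addr = row*16 + col = {row, col}.
module matrix_ram (
    input  wire       clk,
    input  wire       we,
    input  wire [3:0] row,      // matrix row, 0..15
    input  wire [3:0] col,      // matrix column, 0..15
    input  wire [2:0] wdata,
    output reg  [2:0] rdata     // registered read: valid one clock after the address
);
    reg  [2:0] mem [0:255];
    wire [7:0] addr = {row, col};

    always @(posedge clk) begin
        if (we)
            mem[addr] <= wdata;
        rdata <= mem[addr];     // read-before-write when reading and writing one address
    end
endmodule
```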

Utku.

- news reader

Fri, Mar 9, 2007 4:09 PM

It's not a matrix operation, but the memory access is intensive: every read/write must complete in a single clock cycle, so I use registers instead of memory.

In each clock cycle, 16 addresses are generated and 16 data values are read/written. However, each of the 16 data values is read from/written to only n of the 256 addresses (0 < n < 256).
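On the write side, a register only needs address comparators for the ports that can actually reach it. The sketch below assumes a hypothetical mask parameter LEGAL (bit p*NREGS + i set when port p can write register i) and a made-up priority rule (the higher-numbered port wins when two ports write the same register in the same cycle); the post does not say how such collisions are handled.

```verilog
// Sketch: NPORTS write ports into NREGS 3-bit registers. Comparators for
// port/register pairs whose (hypothetical) LEGAL bit is 0 become constant 0
// and can be pruned. Assumed priority: higher port index wins on collision.
module reg_file_masked_write #(
    parameter NREGS  = 256,
    parameter NPORTS = 16,
    parameter [NPORTS*NREGS-1:0] LEGAL = {NPORTS*NREGS{1'b1}}
) (
    input  wire                clk,
    input  wire [NPORTS-1:0]   we,
    input  wire [NPORTS*8-1:0] addr_flat,   // port p address at [8*p +: 8]
    input  wire [NPORTS*3-1:0] wdata_flat,  // port p data at [3*p +: 3]
    output wire [NREGS*3-1:0]  regs_flat    // register i at [3*i +: 3]
);
    reg [2:0] regs [0:NREGS-1];
    integer i, p;

    always @(posedge clk)
        for (i = 0; i < NREGS; i = i + 1)
            for (p = 0; p < NPORTS; p = p + 1)   // later (higher) ports override earlier ones
                if (LEGAL[p*NREGS + i] && we[p] && addr_flat[8*p +: 8] == i)
                    regs[i] <= wdata_flat[3*p +: 3];

    genvar g;
    generate
        for (g = 0; g < NREGS; g = g + 1) begin : g_out
            assign regs_flat[3*g +: 3] = regs[g];
        end
    endgenerate
endmodule
```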

With the case-statement approach I use 32 case statements, each with fewer than 256 choices. Some have only 20 or 30 choices.

For example, in the read operation:

--------------------- implementation A ------------------
input  [7:0] addr_r0, addr_r1, ...addr_r15;
output [2:0] dat_o0, dat_o1, ...dat_o15;

reg [2:0] mymemory[0:255]; // Main memory

dat_o0 <= ...

I used registers instead of RAM because a RAM cannot deliver the required throughput.

- jtw

Sun, Mar 11, 2007 5:44 AM

I have had similar requirements (updating state variables, or some such) where I used a dual-port RAM: one port for the read, and the other (delayed by one clock) for the modify-write.

The pipeline needs to be managed properly (for example, a read of an index that was written on the previous clock must see the new value), but it can save tremendously on registers, assuming that only one index needs to be updated at a time. If all entries need concurrent access, well, a memory won't cut it. For my application(s), typically TDM processing of multiple channels, it works well.
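A sketch of the dual-port scheme under stated assumptions: per-channel state lives in a dual-port RAM, port A reads it, and port B writes the modified value back one clock later. The modify step (adding in_inc) and all names are hypothetical. The bypass handles the hazard: when the same channel is requested on two consecutive clocks, the second read happens on the same edge as the first write-back and returns the old value, so the just-written value is forwarded instead.

```verilog
// Sketch of a read-modify-write pipeline on a dual-port RAM.
// Port A reads a channel's state; port B writes the update back one clock
// later. The "add in_inc" update and all names are hypothetical.
module rmw_dual_port #(
    parameter AW = 4,                 // log2(number of channels)
    parameter DW = 8                  // width of the per-channel state
) (
    input  wire          clk,
    input  wire          rst,         // synchronous reset of the valid bits
    input  wire          in_valid,
    input  wire [AW-1:0] in_ch,       // channel to update this cycle
    input  wire [DW-1:0] in_inc,      // amount added to that channel's state
    output wire          out_valid,
    output wire [DW-1:0] out_state    // updated state, one clock after the request
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    integer k;
    initial for (k = 0; k < (1<<AW); k = k + 1) mem[k] = {DW{1'b0}};  // simulation/FPGA init

    // Stage 1: the request delayed by one clock, plus the port-A read data.
    reg          s1_valid;
    reg [AW-1:0] s1_ch;
    reg [DW-1:0] s1_inc, s1_rdata;

    // Last write issued on port B, kept for the bypass.
    reg          wr_valid;
    reg [AW-1:0] wr_ch;
    reg [DW-1:0] wr_data;

    // Back-to-back requests to one channel: the port-A read happened on the
    // same edge as the previous write-back and returned the old value, so
    // forward the value that was just written instead.
    wire [DW-1:0] cur = (wr_valid && wr_ch == s1_ch) ? wr_data : s1_rdata;
    wire [DW-1:0] nxt = cur + s1_inc;

    assign out_valid = s1_valid;
    assign out_state = nxt;

    // Port A: synchronous read.
    always @(posedge clk) begin
        s1_rdata <= mem[in_ch];
        s1_ch    <= in_ch;
        s1_inc   <= in_inc;
        s1_valid <= rst ? 1'b0 : in_valid;
    end

    // Port B: write-back, one clock after the read.
    always @(posedge clk) begin
        if (s1_valid)
            mem[s1_ch] <= nxt;
        wr_ch    <= s1_ch;
        wr_data  <= nxt;
        wr_valid <= rst ? 1'b0 : s1_valid;
    end
endmodule
```

In TDM processing each channel usually comes around once per frame, so back-to-back hits on one channel may never occur; without that guarantee the bypass is needed.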

JTW