Problem Synthesizing and Simulating a Large Mux

- vssumesh

Fri, Sep 30, 2005 10:29 AM

Hi all, I am developing hardware that needs a large MUX: a 240-to-1 byte-wide multiplexer. I tried to code it but ran into the following problems.

First I tried the straightforward way, using AND and OR gates. This is simple, as I only have to use simple "generate" constructs in Verilog. But I could neither simulate nor synthesize the design. ModelSim (V 6.0a) just stops responding when I try to load the design, and Xilinx ISE does not work either. ISE first shows strong activity, loads the processor, and takes up a lot of memory. But after some time it stalls: ISE still shows activity, but processor usage is almost 0, and after some 4 hours it showed only 60% progress. If I reduce the number of inputs it works fine and gives output in a few minutes.
Then I tried a case statement and wrote 240 cases. Xilinx does not work in this case either. I am using Windows XP on an AMD machine, and the ISE version is 6.0. If I reduce the number of cases to 120 it gives proper output. I am confused by the low activity of the Xilinx tools. Why are they not loading the processor? Is it because of a problem in the OS? I had hoped method 2 would work with the synthesizer. Please advise me on this issue, and please let me know of any usual ways to generate this kind of huge MUX.

The output from Xilinx ISE is given below.

Started process "Synthesize".

=========================================================================
HDL Compilation
=========================================================================
Compiling source file "../test/test.v"
Module compiled
No errors in compilation
Analysis of file succeeded.

=========================================================================
HDL Analysis
=========================================================================
Analyzing top module .
WARNING:Xst:905 - ../test/test.v line 23: The signals are missing in the sensitivity list of always block.
Module is correct for synthesis.

Set property "resynthesize = true" for unit .

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit .
Related source file is ../test/test.v.
Unit synthesized.

=========================================================================
Advanced HDL Synthesis
=========================================================================
Advanced RAM inference ...
Advanced multiplier inference ...
Advanced Registered AddSub inference ...
Dynamic shift register inference ...

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit ...

***### The program stopped loading the processor here ###***

- Antti Lukats

Fri, Sep 30, 2005 10:33 AM

"vssumesh" schrieb im Newsbeitrag news: snipped-for-privacy@z14g2000cwz.googlegroups.com...

You had better think in terms of LUTs and code a hierarchical tree; that should synthesize without any problem. We have definitely synthesized much wider MUXes.

Antti

- vssumesh

Fri, Sep 30, 2005 11:40 AM

OK, but is it easy to simulate? And if we code it as a hierarchical tree, will it take more area than required? Please give a little more detail on this.

- Antti Lukats

Fri, Sep 30, 2005 11:49 AM

"vssumesh" schrieb im Newsbeitrag news: snipped-for-privacy@g47g2000cwa.googlegroups.com...

One slice (2 LUTs + MUX) can implement a 4:1 mux, so you mux down by 4, then again by 4, as many times as needed.

256 to 1 MUX:

If you take 256 signals, then the first LUT level reduces them to 64 (64 slices), the second to 16 (16 slices), the third to 4 (4 slices), and the last to 1 signal (1 slice) = 85 slices. This is the smallest LUT-based mux.

Whatever you write in HDL, the same number of LUTs is required.

Antti
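The 4:1 reduction described above could be written in Verilog roughly as follows. This is only a sketch in Verilog-2001 syntax: the module names `mux4` and `mux256_tree` are made up for illustration, the select is assumed to be binary encoded, and whether a given synthesizer actually packs each `mux4` into one slice is tool and device dependent.

```verilog
// 4:1 single-bit mux: the building block that is meant to fit one slice.
module mux4 (
    input  wire [3:0] d,
    input  wire [1:0] sel,
    output wire       y
);
    assign y = d[sel];
endmodule

// 256:1 single-bit mux as four levels of 4:1 muxes (64 + 16 + 4 + 1 = 85).
module mux256_tree (
    input  wire [255:0] d,
    input  wire [7:0]   sel,   // binary select (assumption)
    output wire         y
);
    wire [63:0] l1;   // after level 1
    wire [15:0] l2;   // after level 2
    wire [3:0]  l3;   // after level 3

    genvar i;
    generate
        // Level 1 uses the two least significant select bits.
        for (i = 0; i < 64; i = i + 1) begin : g_l1
            mux4 m (.d(d[4*i +: 4]),  .sel(sel[1:0]), .y(l1[i]));
        end
        for (i = 0; i < 16; i = i + 1) begin : g_l2
            mux4 m (.d(l1[4*i +: 4]), .sel(sel[3:2]), .y(l2[i]));
        end
        for (i = 0; i < 4; i = i + 1) begin : g_l3
            mux4 m (.d(l2[4*i +: 4]), .sel(sel[5:4]), .y(l3[i]));
        end
    endgenerate

    // Final level uses the two most significant select bits.
    mux4 m_top (.d(l3), .sel(sel[7:6]), .y(y));
endmodule
```

For the 240-input byte mux in this thread, one option would be to tie inputs 240 to 255 to zero and instantiate eight such trees, one per data bit.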

- Kolja Sulimma

Fri, Sep 30, 2005 2:43 PM

Also think about whether you really need a randomly accessible mux in your case. For example, if you always need the inputs in the same order, you can load all of them into a shift register and shift them out.

Kolja Sulimma
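A minimal sketch of the shift-register alternative, assuming a single clock and a hypothetical `load` strobe that captures all bytes at once (neither appears in the original posts):

```verilog
// Capture N bytes in parallel, then present them one per clock in fixed order.
module byte_shift_out #(
    parameter N = 240                 // number of bytes, taken from the thread
) (
    input  wire           clk,
    input  wire           load,       // hypothetical: capture all inputs
    input  wire [8*N-1:0] din,        // byte 0 in bits [7:0]
    output wire [7:0]     dout        // byte currently at the head
);
    reg [8*N-1:0] sr;

    always @(posedge clk) begin
        if (load)
            sr <= din;
        else
            sr <= {8'h00, sr[8*N-1:8]};   // shift down by one byte
    end

    assign dout = sr[7:0];
endmodule
```

After a `load`, `dout` shows byte 0, and each following clock brings the next byte to the output. This only helps when the read order is fixed, which is the condition stated above.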

- Ray Andraka

Sun, Oct 2, 2005 1:39 AM

You can get better pipelined performance by decoding the selects before the first level and then combining the first-level outputs in an OR tree. With this method you can also use the carry chains, or on Virtex-II the horizontal OR chains, to help reduce the size of the logic. This is for a random selection sequence. As Kolja said, a shift register might be a better choice if you can constrain the selection order. If the mux is there to read back registers that you've written into a design, you can use a block RAM as a shadow for the registers and read back the block RAM. Finally, if you can afford the latency, you can get better place-and-route results by going with a linear structure.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759
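One possible reading of the decode-then-OR suggestion, sketched in Verilog-2001. The module name, the two-stage split, and the clock are assumptions made for illustration; the carry-chain and OR-chain mapping mentioned above is left to the tools and is not shown.

```verilog
// Stage 1 registers a one-hot decode of the select; stage 2 gates every
// input byte with its enable and ORs the results together.
module mux_onehot_or #(
    parameter N = 240
) (
    input  wire           clk,
    input  wire [7:0]     sel,    // binary select, expected range 0..N-1
    input  wire [8*N-1:0] din,    // assumed stable across both stages
    output reg  [7:0]     dout
);
    reg [N-1:0] onehot;
    reg [7:0]   or_tree;
    integer     k, j;

    // Stage 1: decode the select into one enable per input.
    always @(posedge clk)
        for (k = 0; k < N; k = k + 1)
            onehot[k] <= (sel == k);

    // AND each byte with its enable, then OR everything together.
    always @* begin
        or_tree = 8'h00;
        for (j = 0; j < N; j = j + 1)
            or_tree = or_tree | (din[8*j +: 8] & {8{onehot[j]}});
    end

    // Stage 2: register the result, giving two clocks of latency in total.
    always @(posedge clk)
        dout <= or_tree;
endmodule
```

If the selects already arrive one-hot, the first stage can be dropped and the enables fed in directly.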

- vssumesh

Mon, Oct 3, 2005 5:06 AM

I don't fully understand what you are suggesting, but it seems to me that you are advising a pipelined operation. That is not possible in this design. It is a completely random MUX. The task is to take data from a 240-byte register and arrange it onto a 64-byte-wide data bus simultaneously (each output byte can take data from any of the 240 registers). The selection bits go directly to each mux, that is, 240 selection lines into each MUX. I tried to implement it with LUTs but it gave the same result. I am ready to wait for days, but ISE simply gives up. If I reduce the number of outputs by 32 it produces output.

Please give me a little more detail on this. I tried to implement it as plain ANDing followed by ORing all the bits together. Sumesh

- Simon Peacock

Mon, Oct 3, 2005 7:26 AM

So what you're really saying ...

is that you want to make a 64 x 8 x (240:1) mux, or 122,880 combinations... What was suggested isn't that hard to implement, and it isn't pipelined either. Pipelining assumes a clock and one level per stage, so a 4-stage pipeline creates a 4-clock delay... and there is no clock.

So that's 240x8 + 64x8 + (8x6) pins / signals... If you think about what you are asking for, you will see it's a mite ridiculous! Even assuming you have just the 240 bytes, that's 1,920 signals all on its own!

Please look at your design and maybe come up with something a mite more sensible.

Simon

P.S.

library ieee;
use ieee.std_logic_1164.all;

package mux240_1_pkg is
  type byte240_typ is array (0 to 239) of std_logic_vector(7 downto 0);
  type byte64_typ  is array (0 to 63)  of std_logic_vector(7 downto 0);
  type int240_typ  is array (0 to 63)  of integer range 0 to 239;
end mux240_1_pkg;

-- The package must be visible before the entity that uses its types.
library work;
use work.mux240_1_pkg.all;

entity mux240_1_byte is
  port (
    din  : in  byte240_typ;
    dout : out byte64_typ;
    sel  : in  int240_typ
  );
end mux240_1_byte;

architecture rtl of mux240_1_byte is
begin
  gen: for i in dout'range generate
    dout(i)

- vssumesh

Mon, Oct 3, 2005 8:29 AM

Hello Simon, yes, I am trying to implement 64 8-bit-wide (240:1) muxes. There are 240 * 64 = 15360 selection bits in total (240 bits to each mux), and 240 * 8 = 1920 data bits going to the whole block of 64 muxes (the same data goes to every MUX). Thus the mux array block will have 17280 input lines and 64 * 8 output lines. Why are you saying that it is not possible? All signals are generated internally from other parameters (I don't know how much routing effort this needs inside the FPGA). Please advise. The code you suggested is a single 240-to-1 byte mux, but I want 64 copies of it. Is that possible? I know that it is not possible to implement this in a single design if the selection signals come from external sources; is it because of this constraint that ISE stops working? I am able to get output if I reduce any of the parameters to half (number of outputs, number of registers, etc.).
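As a rough sanity check, the counts in this post and a slice estimate based on Antti's 4:1 model from earlier in the thread can be worked out as below. The slice figure assumes binary-encoded selects, one slice per 4:1 mux, 240 inputs rounded up to whole groups of four at each level (60, 15, 4, 1), and no logic shared between the 64 muxes. It is an estimate, not a synthesis result.

\[
\begin{aligned}
\text{select bits} &= 240 \times 64 = 15360 \\
\text{data bits} &= 240 \times 8 = 1920 \\
\text{total input lines} &= 15360 + 1920 = 17280 \\
\text{output bits} &= 64 \times 8 = 512 \\
\text{slices per output bit} &= 60 + 15 + 4 + 1 = 80 \\
\text{slices in total} &\approx 80 \times 512 = 40960
\end{aligned}
\]

With the one-hot selects described in this post the AND-OR structure would differ from the 4:1 tree, so the total is only meant to show the order of magnitude.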

- Simon Peacock

Mon, Oct 3, 2005 10:04 AM

That's where my snippet is different: the "for generate" will repeat that mux 64 times for you :-) Nice and simple, isn't it?

The problem is that you have to think of the resources. I don't know exactly, but the number of loads on any CLB is finite, and I doubt it is 64, so the whole thing gets replicated multiple times. As you are talking about 8x240 outputs, you will chew up resources horribly fast.

The next problem is that Xilinx parts, as with all FPGAs, are a compromise. The 1M gate quote is based upon designs which are synchronous, and yours isn't. That makes a huge mux very inefficient and not what the tools are designed to cope with.

The best bet would be to rethink: possibly use the idea of shifting the data into a dual-port RAM and using the second port of the RAM as the output of the mux. It does mean your design ends up pipelined, but you will struggle to do it some other way.

The other solution is to put down 4 FPGAs.

Simon
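The dual-port RAM idea might look something like the sketch below: data is written in through one port, and the read address on the other port plays the role of the mux select. The module and port names are hypothetical, and whether this maps to block RAM depends on the tool's inference rules.

```verilog
// Simple dual-port RAM: one write port, one synchronous read port.
// The read address acts as the mux select.
module dpram_mux #(
    parameter N = 240                 // number of bytes, taken from the thread
) (
    input  wire       clk,
    input  wire       we,
    input  wire [7:0] waddr,
    input  wire [7:0] wdata,
    input  wire [7:0] raddr,          // "select", expected range 0..N-1
    output reg  [7:0] rdata           // selected byte, one clock later
);
    reg [7:0] mem [0:N-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        rdata <= mem[raddr];
    end
endmodule
```

Each RAM gives one read per clock, so 64 simultaneous outputs would need 64 copies written in parallel, or fewer copies read over several clocks. That is the pipelining cost mentioned above.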

- vssumesh

Mon, Oct 3, 2005 12:55 PM

Everything you say is correct, but I can't simply change my design. What I am thinking now is to proceed with the 120-register version. If required I can switch over to the Virtex "XC2V8000". Will that help (with 8M gates)? But I am still wondering why Xilinx is not doing any synthesis work. And about your code: is there any way to implement the same thing in Verilog? I don't know VHDL.

- Simon Peacock

Tue, Oct 4, 2005 7:42 AM

Of course there is an identical way to implement the VHDL in Verilog... unfortunately I don't know Verilog either :-) An XC2V8000, wow, you must have a seriously huge budget: $8,300 each :-)

4 XC3S1000s would only cost you about $220. Only 4,000 LUTs, but I bet it would fit too. That's why I suggested using 4 FPGAs in the first place.

Simon
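For readers who want the Verilog side, here is a sketch with the same ports as the VHDL entity in the P.S. above. It is not a translation of the full VHDL, whose body is cut off in the post. It assumes each output byte simply takes the input byte indexed by its own select, and it uses binary-encoded selects rather than the 240 one-hot lines described by the original poster. The buses are flattened because Verilog-2001 does not allow array ports.

```verilog
// NOUT byte-wide NIN:1 muxes sharing the same NIN input bytes.
module mux240_1_byte #(
    parameter NIN  = 240,
    parameter NOUT = 64
) (
    input  wire [8*NIN-1:0]  din,    // NIN bytes, byte 0 in bits [7:0]
    input  wire [8*NOUT-1:0] sel,    // one 8-bit index per output byte
    output wire [8*NOUT-1:0] dout    // NOUT selected bytes
);
    genvar i;
    generate
        for (i = 0; i < NOUT; i = i + 1) begin : gen
            // Index for this output; values >= NIN are out of range
            // and read as X in simulation.
            wire [7:0] idx = sel[8*i +: 8];
            assign dout[8*i +: 8] = din[8*idx +: 8];
        end
    endgenerate
endmodule
```

The generate loop creates all 64 muxes from one description, which is the point of the "for generate" remark earlier in the thread.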

- vssumesh

Tue, Oct 4, 2005 11:56 AM

The problem is that I want it in a single chip. How can I route those huge control signals out of the FPGA? But I am still wondering why ISE is not working with my design. OK, anyway, I am proceeding with 120 registers and will let all of you know the results. Thanks for all the advice and suggestions.

- Andy Peters

Tue, Oct 4, 2005 5:26 PM

Probably because your expectations are not in line with the device capability?

You're trying to create a huge combinatorial mux. It's not at all a surprise that you're not meeting your timing requirements.

-a

- Simon Peacock

Wed, Oct 5, 2005 9:56 AM

you don't link.. you replicate!

Simon

- vssumesh

Wed, Oct 5, 2005 12:26 PM

I didn't understand that. How can I interconnect the huge number of routing signals between the FPGAs?

- Simon Peacock

Thu, Oct 6, 2005 7:37 AM

One way is to simply serialize the data using LVDS. The other is to simply replicate the input in each device; then each device handles only 1/4 of the selection.

Partly it depends on how/where your data is coming from. I can't tell you how to do it... you have to analyse and learn.

Simon