Simulation vs Synthesis

Simon · 2015-11-30T06:27:00+00:00

So I have a partly-complete design for a 6502 CPU. It's simulating just fine for the implemented opcodes, but when I run synthesis, I get a whole load of "Sequential element (newSPData_reg[23] ) is unused and will be removed from module execute." messages, one for each bit in the register, in fact.

I know the logic is *trying* to use this register: I can see the values in the register changing during simulation runs. But I can't for the life of me see why it would be removed - the 'execute' module is basically a case statement, with one of the cases explicitly setting the value of the 'newSPData' register. Again, in the simulation, I see the case being executed, and the values changing.

I guess what I'm looking for is any tips on how to tackle the problem ("The Knowledge", if you will). I've already tried the 'trace through the logic for the case that should trigger the case in question, and see if anything jumps out at me' approach. I remain un-jumped-out-at [sigh].

I'm happy to send the design if anyone wants to have a look, but it's a chunk of Verilog code, so I didn't want to paste it here...

Cheers
Simon.
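For readers hitting the same message: one common way to get "simulates fine, removed in synthesis" is a register that is written but whose value never reaches a module output. The sketch below is only an illustration of that general behavior, with hypothetical names (`trim_demo`, `scratch`); it is not taken from the design in question and may not be the cause here.

```verilog
// Hypothetical example: none of these names come from the original design.
module trim_demo (
    input  wire       clk,
    input  wire [7:0] d,
    output reg  [7:0] q
);
    reg [7:0] scratch;  // assigned every clock, but never read

    always @(posedge clk) begin
        scratch <= d + 8'd1;  // toggles happily in a simulation waveform
        q       <= d;         // the only value that reaches an output
    end
endmodule
```

In simulation `scratch` changes on every clock, but since nothing downstream depends on it, a synthesis tool is free to delete it. Tracing forward from the register to see what actually consumes it is often quicker than tracing backward through the logic that writes it.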

Simon 10 years ago

Maybe I'm reading/understanding it incorrectly - it looks to me that there's an always @ (posedge(clk)) dependency for writes - but I'm relatively fine with that - I won't need the data until the next clock anyway if I'm writing, because that's how the 6502 worked.

For reads, it looked to me as though it used always @ (*), and I (perhaps incorrectly) thought that would get me the results on the module's data bus as soon as the 'address' lines changed.
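A minimal sketch of the pattern being described (clocked write, combinational read), for reference. The module name, port names, and parameters are assumptions made for illustration; the actual code being discussed is not quoted in this thread.

```verilog
// Hypothetical module: clocked write, combinational (asynchronous) read.
module async_read_ram #(
    parameter WIDTH     = 8,
    parameter ADDR_BITS = 8
) (
    input  wire                 clk,
    input  wire                 we,
    input  wire [ADDR_BITS-1:0] addr,
    input  wire [WIDTH-1:0]     din,
    output reg  [WIDTH-1:0]     dout
);
    reg [WIDTH-1:0] mem [0:(1<<ADDR_BITS)-1];

    // Write: stored on the rising clock edge when we is high.
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
    end

    // Read: no clock involved, so dout follows addr combinationally.
    always @(*) begin
        dout = mem[addr];
    end
endmodule
```

With this shape, a read returns data in the same cycle the address is presented, while a write only becomes visible after the next clock edge.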

As for why to change it, I don't like it when I don't understand the error/info messages the tool is giving me. Given my (relatively limited) understanding of what the synthesis tool is actually *doing* under the hood, it probably means I'm not getting what I actually want, or if I am, it's in some highly-inefficient manner. Your comment about inferring extra adders unnecessarily is pretty relevant I feel :)

It does tie me to a single write/read per clock, whereas I could set N registers per clock (and thus "push" 3 elements onto the stack for the BRK instruction in a single clock for example), but I'm actually ok with that too, I think. The 6502 only had 1 databus, so *it* took multiple clocks to do multiple writes as well.

It's entirely possible my understanding of the module is flawed. I'm happy to be corrected :)

Cheers Simon

BobH 10 years ago

Thanks! I have always explicitly built a model RAM when I wanted RAM in an FPGA rather than inferring one. I just automatically include the reset when I do D flops because it makes the simulation cleaner. I think that the original poster wanted an array of flops thinking that they would be faster than block ram.

This looks worth messing with when I get some breathing space. I am a little curious about the synthesizability of it.

Regards, BobH

BobH 10 years ago

Block RAM tends to be smallish and often comes in weird sizes.

This will result in a huge barrel shifter which will likely get slow. I don't know what your clock speeds are relative to the FPGA capability, but I don't like the big barrel shifter implementations. If your clock speeds are a few MHz and you are using a modern FPGA, you probably can afford to implement it that way.

Try Mark Curry's suggested syntax. If it is synthesizable, it will be MUCH easier to implement! From Mark's comment, if you include the reset, it should prevent the replacement of FFs with block RAM.

Regards, BobH

Mark Curry 10 years ago

Bob - it's all synthesizable for FPGAs just fine. The only trick is when you definitely want to infer Block RAMs. In that case, it's best to check the Xilinx docs, and use their templates, with little modification.

You can modify the Xilinx template, for instance, to make the RAM width and depth parameters. But stray too far, and it may trip up. And when I say trip up - I mean it'll synthesize to something that matches your description - however it may mess up and build it out of FFs instead of Block RAMs. (You may also optionally attach a pragma to FORCE it to map to FFs - in the case you mentioned above where you may want the faster access. Just don't make it a very big array!)
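As a rough illustration of the pragma idea, the sketch below uses the `ram_style` attribute as documented for Xilinx Vivado synthesis, with the value `"registers"`. The module and signal names are hypothetical, and other tools or vendors use different attribute names, so treat this as an assumption to check against the documentation for your tool rather than a portable recipe.

```verilog
// Hypothetical module; ram_style is a Xilinx synthesis attribute.
module small_ff_array (
    input  wire       clk,
    input  wire       we,
    input  wire [3:0] addr,
    input  wire [7:0] din,
    output wire [7:0] dout
);
    // Ask the tool to build this array from flip-flops, not RAM primitives.
    (* ram_style = "registers" *) reg [7:0] regs [0:15];

    always @(posedge clk) begin
        if (we)
            regs[addr] <= din;
    end

    // Combinational read: a 16-to-1 mux on each data bit.
    assign dout = regs[addr];
endmodule
```

The array is kept to 16 entries on purpose: every entry costs flip-flops plus read-mux logic, which is why a large array is a poor fit for this mapping.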

Play with it when you have time. It's an excellent tool in your toolbox.

Regards,

Mark

rickman 10 years ago

I'm not sure the above is a correct model for block RAMs in many devices. The ones I have used have a register delay even in the read path. There can be separate interfaces (address, controls and data) for reading and writing, but in all cases the read data is registered.

What devices will this model work for? Or maybe I'm not so familiar with Verilog. The read path in the above description is async, no?

Rick

rickman 10 years ago

My understanding is that all block RAMs have a register in the read path. I've always considered there to be a register on the input side of address, data in and control rather than worrying about any internal details. It all works the same.

Looks like I had forgotten about the distributed RAM. It has async read and sync write. So your model will work just fine.

Now that my misunderstanding is straightened out I see what you are saying. I don't understand the error message either, but then I can't see the code.

Try isolating the error to a smaller section of code. Obviously there is something else going on if it thinks an 8-bit address RAM is being indexed by a 32-bit value. I expect it has something to do with the way you are using the array rather than the way you are declaring it.

Rick

rickman 10 years ago

I Googled and found that the distributed RAM in the Xilinx parts supports async reads. So I am clear on this now. I must have forgotten this.

Rick

glen herrmannsfeldt 10 years ago

(snip)

Yes that has async. read and sync. write, and that doesn't work with the usual block RAM.

I am not sure if it wants the register before, or after, or if it doesn't matter.

-- glen

glen herrmannsfeldt 10 years ago

(snip)

The distributed RAM is just the usual LUTs, so they support asynchronous reads the same way they do when they are used as gates. I think they also support asynchronous writes, but that is less obvious.

-- glen

rickman 10 years ago

No, they do not support async writes. I recall it was in the XC4000 series they got rid of async writes because they had so much trouble supporting it. Basically there were too many users who didn't know how to properly use async memory. There may have been some technical advantages to using a sync write for the FPGA designers, but I am pretty sure it was really an issue of complaints that it didn't work right which really meant they were not meeting the specs on the pulse width of the write strobe. Async RAM has a lot of timing details to meet compared to the sync version. With sync it is basically just setup and hold of the inputs.

Rick

rickman 10 years ago

I'm not sure what you mean. Before or after what exactly?

Rick

jt_eaton 10 years ago

Do you put a register before or after the RAM array?

You can register the addresses and then do an asynchronous read or you can do an asynchronous read and then register the data.

The difference is known as writethru. If you have a dual port sram and do both a read and write operation to the same address in the same cycle then do you read the old data or the new?

In the first case you will get the new data while the second case will give you the old data.

In the first case the write data is written through the sram to become the read data.

Selection depends on the circuit needs. If you are using sram in a fifo, then writing to a completely full fifo on exactly the same cycle that data is popped off will not work with writethru. You want to pop off the oldest and replace it with the newest.

If sram is a cpu register bank and you store in register X followed by an instruction that uses register X, then pipelining will read the new data on the same cycle that it writes it to ram. In that case you must have writethru.
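The two register placements can be sketched side by side as follows. All names here (`rw_collision_demo`, `dout_new`, `dout_old`, and so on) are hypothetical, and sharing one array between both read styles is done only so the two can be compared in simulation; a real design would pick one.

```verilog
// Hypothetical names throughout; a real design would pick one read style.
module rw_collision_demo (
    input  wire       clk,
    input  wire       we,
    input  wire [3:0] waddr,
    input  wire [3:0] raddr,
    input  wire [7:0] din,
    output wire [7:0] dout_new,  // case 1: register before the array
    output reg  [7:0] dout_old   // case 2: register after the array
);
    reg [7:0] mem [0:15];
    reg [3:0] raddr_q;

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= din;
        raddr_q  <= raddr;       // case 1: register the address...
        dout_old <= mem[raddr];  // case 2: async read, then register the data
    end

    // ...then read asynchronously. By the time raddr_q is valid the write
    // has landed, so a same-cycle write to the same address is visible here.
    assign dout_new = mem[raddr_q];
endmodule
```

When `we` is high and `waddr == raddr` on a given clock edge, `dout_new` shows the value just written (writethru) while `dout_old` shows the value that was stored before the write.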

John Eaton

rickman 10 years ago

Yes.

Different vendors give the modes different names, but essentially on block RAM writes the read data can be the old data, the new data or the read data port is held at the last value with no change. None of this is affected by where you put the registers in your HDL. This is typically controlled by attributes.

Rick

Mike Field 10 years ago

The problem might be how you are indexing that small block of registers - is the address being used to index zp_reg also 8 bits?

Also, why 255 elements and not 256?
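To make both questions concrete, here is a sketch of a 256-entry array indexed by an explicit 8-bit address sliced from a wider register. The declaration of zp_reg is not shown in this thread, so the declaration, the module name, and `wide_value` are all assumptions for illustration.

```verilog
// Hypothetical module; the real zp_reg declaration is not shown in the thread.
module zp_index_sketch (
    input  wire        clk,
    input  wire        we,
    input  wire [31:0] wide_value,  // assumed 32-bit source register
    input  wire [7:0]  din,
    output wire [7:0]  dout
);
    // 256 entries: indices 0 through 255 inclusive.
    reg [7:0] zp_reg [0:255];

    // Slice out exactly 8 bits so the index width matches the array depth.
    wire [7:0] zp_addr = wide_value[15:8];

    always @(posedge clk) begin
        if (we)
            zp_reg[zp_addr] <= din;
    end

    assign dout = zp_reg[zp_addr];
endmodule
```

Naming the 8-bit slice as its own wire keeps the index width visible at the point of use, which makes a stray 32-bit index easier to spot.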

Mike

glen herrmannsfeldt 10 years ago

rickman wrote: (snip)

(snip)

For the case of reading, so consider a ROM, do you put the register on the address inputs, or the data outputs?

Or, since the difference is only delay, can the synthesis tools move it from one to the other?

-- glen

glen herrmannsfeldt 10 years ago

jt_eaton wrote: (snip, I wrote)

(snip)

Yes that is what I meant.

If the FIFO has the same clock for both, then I suppose you can do that. With asynchronous read and write, you can't really do that, as you can't prevent the read from coming just slightly after the write.

Most FIFOs have an "almost full" flag that helps avoid that, and also allows for other delays in stopping data coming in.

Or you add extra logic to bypass the RAM in that case.

-- glen

Mark Curry 10 years ago

Rick,

My only real point in the above code was showing it was possible to index into a multi-dimensional array in Verilog in synthesizable code. One doesn't need to explicitly code out each index. Synthesis WILL build SOMETHING for all of these variations. It's all synthesizable.

Now, if you're intending to map specifically to Block or Distributed memories, then I strongly suggest checking the vendor documentation, and using their templates. It's easy to trip up the tools, and have them not build what you intended. Your example is a simple one. If you want to generate a BRAM, then you must register your read data (as well as your write). Missing this, you'll get Distributed (or FFs!).
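A minimal sketch of the registered-read shape, loosely in the style of common vendor inference templates. The module name, ports, and parameter defaults are hypothetical; the vendor's own template for your device family remains the thing to copy.

```verilog
// Hypothetical single-port RAM with synchronous write and synchronous read.
module sync_read_ram #(
    parameter WIDTH     = 8,
    parameter ADDR_BITS = 10
) (
    input  wire                 clk,
    input  wire                 we,
    input  wire [ADDR_BITS-1:0] addr,
    input  wire [WIDTH-1:0]     din,
    output reg  [WIDTH-1:0]     dout
);
    reg [WIDTH-1:0] mem [0:(1<<ADDR_BITS)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        // Registered read: dout updates one clock after addr is presented.
        // On a write, this returns the old contents of the location.
        dout <= mem[addr];
    end
endmodule
```

The key difference from an asynchronous-read description is that `dout` is assigned inside the clocked block, so there is no combinational path from `addr` to `dout`.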

Regards,

Mark

rickman 10 years ago

Rather than try to guess what is happening, just read the vendor's documentation and copy their examples for inferring RAM. I know Xilinx gives this info. I looked at an 8-year-old document from Lattice, and they say there are enough subtle differences between vendors that there is little point in inferring block RAM, so just instantiate it (a newer document may have different recommendations). I don't like that and have never had any trouble with inference. I always put the registers at the inputs to the RAM, as in some families there is an optional additional register on the data output. Otherwise I expect there is no difference based on where you put it...

Rick

Simon 10 years ago

Yep. The only part that references anything larger than 8 bits is where I select out 8 bits from a 32-bit register, using constant bit-ranges.

`define W 8
`define NW (`W-1)

reg [`NW:0] stack[0:255];   // Stack-page
reg [`NW:0] SP;             // Stack pointer

...

if ((action & `UPDATE_SP) == `UPDATE_SP) begin
    if (numSPBytes == 1) begin
        stack[SP]

rickman 10 years ago

Where is zp_reg[255] declared? That is the ROM being complained about. I assume it is considering it a ROM because it is not written to anywhere.

Rick
