6.1 vs. 6.2

(Mike, I generally respect your posts, this one bears comment)

If this were true, Chipscope would probably not exist.

Let's hypothesize some numbers (someone with better knowledge of simulator performance please correct me on the first point)

1) A simulator (even a cycle-based one) on a moderately complex design may run about 10^5 times slower than the actual circuit. E.g., a 100 MHz FPGA might simulate at about 1 kHz, i.e. 1 ms/cycle. This number is just a crude estimate; it assumes 10^4 nodes in a circuit, 10^2 cycles of simulator processing per node, and the simulator host running at 1 GHz. (I think this is optimistic; does anyone have a good number?)

2) State machines can be extremely complex, with "billions _of_ billions" of states. Remember, a processor executing code is really a state machine, where the state corresponds to [program counter, register contents, pipeline states, even memory contents]. For that matter, the whole FPGA can be viewed as a single state machine... but I won't get that outrageous.

Assume three state machines, A, B, and C (each of which has already been thoroughly proven), each with only 10,000 states (tiny pieces of code running on simple processors, with very few variables, perhaps), and a set of 10 external inputs. Further assume that an error condition corresponds to a small subset of the trillions of possible states (bit x of variable X on processor A is '1' while bit y of variable Y on processor B is '0' and bit z of variable Z on processor C is '0', causing the rocket engine to shut down, or whatever).

Now make the biggest, wildest assumptions of all:

3) A totally brilliant (omniscient?) test bench writer produces code that puts the system through every possible state (unlikely, but possible... there are some smart people out there), and produces this code in under a month.

4) Each of the 10^12 states of the hypothetical machine can be reached in a single simulator cycle (even more unlikely than 3).

So we have: 10^12 states at 10^-3 seconds per state, giving 10^9 seconds to perform the simulation... umm, that's 31.7 years.
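The arithmetic above is easy to check with a few lines of Python (every figure here is one of the hypothetical estimates from this post, not a measurement):

```python
# Back-of-envelope check of the simulation-time estimate.
# All numbers are the hypothetical estimates from the post above.

states_per_machine = 10_000                 # each of machines A, B, C
total_states = states_per_machine ** 3      # product state space of A x B x C

sim_rate_hz = 1e9 / (1e4 * 1e2)             # 1 GHz host / (10^4 nodes * 10^2 cycles/node)
seconds_per_state = 1 / sim_rate_hz         # one simulator cycle per state (optimistic)

total_seconds = total_states * seconds_per_state
years = total_seconds / (365.25 * 24 * 3600)

print(f"total states:    {total_states:.0e}")     # 1e+12
print(f"simulator rate:  {sim_rate_hz:.0f} Hz")   # 1000 Hz, i.e. 1 ms/cycle
print(f"simulation time: {years:.1f} years")      # 31.7 years
```

Even granting the wildly optimistic assumptions 3) and 4), the exhaustive simulation takes three decades.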

Meanwhile, a day of testing the actual circuit on the testbed turns up the failure.

OK, I hear sputtering on comp.arch.fpga...from both sides...

"Someone didn't do enough higher level system analysis, the system designer should be shot"

"It's the programmer's fault, shoot him"

"Only 10^12 states? I've got 50 different inputs to worry about, let alone internal states. Shoot me"

So shoot down my example; you can probably come up with a better one. The point is (echoing Jeff Goldblum; "Jurassic Park" ran last week) that any sufficiently complex system will have unforeseen failure modes. Simulation (even "formal verification") cannot catch them. When the failure occurs, it is nice to have visibility into the circuit, be it Chipscope or hand-rolled test circuitry.

Forget about Brian's BlockRAM example; Mike's arbitration answer takes care of that. Here where I work, programmers tend to use logic analysis as much as the H/W types do, to monitor bus activity (who said what to whom, and when?). My designs involve video processing, which can be quicker to implement and look at than to try to simulate. If I understand the O.P., he's in the same position.

To all of c.a.f., sorry for the B/W; I haven't posted anything lately. I just hate hearing "simulation is all you need". For small, simple circuits that is true; for larger systems, the more tools the better (till you end up spending more time learning the tools than actually producing). Apologies, Mike, if I took your statement out of context. You are 100% right about regression, but not every corner can be foreseen, and often the data outweighs the simulator. I don't use timing sim either (although the x-men posts leave me wondering if I'll have to)...

Regards, John


Chipscope exists because a significant percentage of fpga designers already understand and can make good use of logic analysis to debug designs.

No sputtering here. Simulation can't cover every fpga code defect. Certainly testing continues on the bench and at alpha and beta sites. And almost certainly, the fpga designer will be fingered for a defect or two.

When this happens, you either fire up a sim and try to duplicate the problem, or configure a logic analyzer and try to trigger on it. It's a matter of personal preference. When you're under the gun, you go with what you know best.

If that's how it sounded to you, I apologize. My intention was to voice the minority opinion that simulation can be just as useful as logic analysis.

Thanks for the posting.

-- Mike Treseler


Which means: If you write to Port A, at address Y, with data Z, the prior contents of location Y will be latched on the port A output. Location Y will be set to Z.

If Port B is looking at some address other than Y, the behavior is as expected. If port B is set to address Y, and the clocks occur at the appropriate time, and Port B is enabled, you might not get what you expect. I describe exactly what happens below.

Which means: If you write to Port A, at address Y, with data Z, Z will be latched on the port A output. Location Y will be set to Z.

If Port B is looking at some address other than Y, the behavior is as expected. If port B is set to address Y, and the clocks occur at the appropriate time, and Port B is enabled, you might not get what you expect. I describe exactly what happens below.

Which means: If you write to Port A, at address Y, with data Z, Location Y will be set to Z. Port A output latch will not be changed.

If Port B is looking at some address other than Y, the behavior is as expected. If port B is set to address Y, and the clocks occur at the appropriate time, and Port B is enabled, you might not get what you expect. I describe exactly what happens below.

This is called "you get what you deserve mode"

I believe this is not quite the full story. The write mode causes some changes in the internal timing sequence of a port, and in when exactly the RAM is updated. The write mode on the other port has no effect on the timing of a read on that port. The following describes how the internal timing sequence on the write port can affect the read on the other port, IF it is using exactly the same clock, enable, and address.

The problem is that this APP note does not express reality well.

I hope the following will un-confuse you, and give you understanding.

Here is the "truth"

1) The two ports are totally independent, except for the array of storage bits.

2) There is no contention detection logic, or address match logic. So there is no way for the block RAM to "know" that both ports are accessing the same location.

3) The 3 different write modes do not have identical internal timing, and this leads to behavior that may surprise you if you didn't know it.

4) If the two clocks are asynchronous, you cannot predict what will happen on the other port when writes and reads are close to each other and the addresses match. If you care about reliable systems, there are system-level things you should have done so that this can never occur.

So what follows is for the very special case of:

A) Both ports are using the same clock (if not, nothing can be predicted).

B) Both ports are enabled (if the non-writing port is not enabled, then nothing changes on its output regardless of address; remember, the ports are totally independent).

C) Port A is writing, port B is reading (or the other way round). (If neither is writing, then nothing special is going on. If both are writing, either it is to different addresses, and nothing special about that, or they are writing to the same address, and you need a new system designer.)

D) Both ports have the same address (if not the same address, nothing special to think about).

Block RAMs take time to write data. Block RAMs take time to read data. Reading and writing involve address decoding, data propagation, write select lines, sense amplifiers for reading, and timing control logic.

To write, latch the address, data, enable. Decode address, select the word, distribute the data on the bit lines, strobe the word write logic.

To read, latch address, enable. Decode address, select the word, enable contents to be driven from the RAM bits onto the bit lines, use the sense amps to read the bit lines, latch the sensed data into the output latch.

Between read and write, the address latch and decode logic is obviously shared, and that is not an issue because the reads and writes use them the same way. The bit lines are a different story.

Here is the timeline for Read and the 3 write modes:

READ: decode address, drive RAM onto bit lines, use sense amps to read bit lines, update output latch.

WRITE_FIRST: decode address, distribute write data, strobe data into RAM, drive RAM onto bit lines (no change), use sense amps to read bit lines, update output latch.

READ_FIRST: decode address, drive RAM onto bit lines, use sense amps to read bit lines, update output latch, distribute write data, strobe data into RAM.

NO_CHANGE: decode address, distribute write data, strobe data into RAM.

READ and READ_FIRST have the same read timing. The read timing for WRITE_FIRST is different.

WRITE_FIRST and NO_CHANGE have the same write timing. The write timing for READ_FIRST is different.
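As a sanity check, the same-port view of the three write modes (what lands in the writing port's output latch) can be modeled in a few lines of Python. This is a behavioral sketch only, not the hardware timing; the function and variable names are mine:

```python
# Behavioral sketch of one block RAM port's output latch for the three
# write modes, on a clock edge where the port writes `data` to `addr`.
# Models only the same-port view; names and structure are illustrative.

def port_output_after_write(mode, ram, addr, data, prev_output):
    """Return (RAM contents at addr, output latch value) after the edge."""
    old = ram.get(addr)              # contents before the write
    ram[addr] = data                 # in all three modes, the write lands
    if mode == "WRITE_FIRST":
        return ram[addr], data       # latch sees the newly written data
    if mode == "READ_FIRST":
        return ram[addr], old        # latch sees the prior contents
    if mode == "NO_CHANGE":
        return ram[addr], prev_output  # latch is left untouched
    raise ValueError(mode)

ram = {7: "old"}
print(port_output_after_write("WRITE_FIRST", dict(ram), 7, "new", "latch"))  # ('new', 'new')
print(port_output_after_write("READ_FIRST",  dict(ram), 7, "new", "latch"))  # ('new', 'old')
print(port_output_after_write("NO_CHANGE",   dict(ram), 7, "new", "latch"))  # ('new', 'latch')
```

In every mode the RAM ends up holding the new data; the modes differ only in what the writing port's own output latch shows.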

Now let's look at Table 8 from the app note, but first, let's look at the preceding paragraph:

"These different modes determine which data is available on the output latches after a valid write clock edge to the same port."

underline "same port"

"In READ_FIRST mode, a memory port supports simultaneous read and write operations to the same address on the same clock edge, free of any timing complications."

Place a little asterisk after "free of any timing complications".

At the bottom of the page, put another asterisk, and follow it with a link to this article.

"Table 8 outlines how the WRITE_MODE setting affects the output data latches on the same port, and how it affects the output latches on the opposite port during a simultaneous access to the same address."

underline "during a simultaneous access". This means: Exactly the same clock, and enable. Go back and read (4) above.

OK, here we go.

WRITE_FIRST: Port A writes, port B reads the same address, identical clocks. Port B is using READ timing (see above), which assumes that the RAM contents are stable. WRITE_FIRST is writing to the RAM at this time, so bits are in transition. Port B sees anything between the old contents, the new contents, or a mix of both. The read and write timing are "self timed", so no prediction can be made as to which happens first. Rather than "Invalidates", it would have been better to say "unpredictable".

READ_FIRST: Port A does the late write, so the port A read-first occurs at the same time as the port B read. The data has not yet changed at the time port B is reading, so it gets the old contents, same as port A.

NO_CHANGE: Port A writes, port B reads the same address, identical clocks. Port B is using READ timing (see above), which assumes that the RAM contents are stable. NO_CHANGE is writing to the RAM at this time, so bits are in transition. Port B sees anything between the old contents, the new contents, or a mix of both. The read and write timing are "self timed", so no prediction can be made as to which happens first. Rather than "Invalidates", it would have been better to say "unpredictable".
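Under conditions A-D above (same clock, both ports enabled, port A writing, same address), the opposite-port outcome therefore reduces to a two-way table. A sketch, with my own naming, where "UNPREDICTABLE" stands for the self-timed race just described:

```python
# What port B reads when port A writes the same address on the same clock
# edge, as a function of port A's write mode. "UNPREDICTABLE" models the
# self-timed read/write race described above; naming is illustrative.

def opposite_port_read(write_mode, old_contents):
    if write_mode == "READ_FIRST":
        return old_contents          # late write: B's read completes before the RAM changes
    if write_mode in ("WRITE_FIRST", "NO_CHANGE"):
        return "UNPREDICTABLE"       # early write races port B's read of the bit lines
    raise ValueError(write_mode)

print(opposite_port_read("READ_FIRST", "old"))   # old
print(opposite_port_read("WRITE_FIRST", "old"))  # UNPREDICTABLE
print(opposite_port_read("NO_CHANGE", "old"))    # UNPREDICTABLE
```

Which is why READ_FIRST is the only mode that gives the opposite port a guaranteed (old) value in this collision case.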

You can see all the gory details by going to

formatting link
and looking at US 6,373,779, "Block RAM having multiple configurable write modes for use in a field programmable gate array".

Figure 5 is READ
Figure 6 is WRITE_FIRST
Figure 7 is NO_CHANGE
Figure 8 is READ_FIRST

This will be on the final exam.

Philip Freidin

===================
Philip Freidin
snipped-for-privacy@fpga-faq.com
Host for
formatting link


Thanks Philip, most illuminating :-)

[grin] I did my finals more years ago than I care to remember, but I'm always keen to learn something new :-) [aside] Up until now I've always used single-clock designs, so there was an implicit assumption in my post that the clocks/enables were identical on both ports. It's the signs down the multiple-clock path that put me off, you know, the ones saying "Abandon hope, all ye who enter here", and "Here be Dragons". I think the blood dripping onto the path is a little melodramatic though :-) [/aside]

Simon.


Well, I answered the question myself on friday afternoon.

Yes, I did have timing constraints, and yes, the constraints were met. However, a typo in the period of the clock made the constraints a little bit too relaxed. I don't know what I was thinking when I calculated that number in the first place, but it made the clock period almost 4 times longer than the clock I'm utilizing... Completely my fault, and rather embarrassing.

Anyway, I appreciate the input from all of you, and I'll get back to you if I still get a huge difference in system performance when switching from 6.1 to 6.2 with working timing constraints.

Regards Johan

Brian Philofsky wrote:


-----------------------------------------------
Johan Bernspång
snipped-for-privacy@xfoix.se
Embedded systems designer
Swedish Defence Research Agency

Please remove the x's in the email address if replying to me personally.

