Can you turn off Pipeline in ARM Cortex M3

I am not sure that the pipeline depth is an architectural feature of the M3. I'd think it's an implementation detail that could change, e.g. when your current P/N is replaced. Plus, as others have said, you are at the mercy of other implementation details: caches, if any, memory bank effects, interrupt latencies, etc.

If you need cycle-accurate timing, you should use something that is guaranteed to be cycle accurate, for instance the PRU units in the TI AM335x (as used in e.g. the BeagleBone), or dedicated logic in an FPGA.

Reply to
Przemek Klosowski

I don't get it then - what is stopping him from having the two OCs always offset by the needed dead time?

Dimiter

Reply to
Dimiter_Popoff

I think you are making this too complicated. How much dead time do you need? What is the range of the times you need to set and what are the absolute limits?

--

Rick
Reply to
rickman

Why can't you just use two separate timer outputs with non-overlapping signals? At worst, you will need an inverter on the output if the timer block does not support inverting the output on the second block, but most microcontroller timers can do that themselves (I am not familiar with the exact device you are using).

Using DMA to transfer new values into the PWM unit might be convenient if you don't want to respond to an interrupt, but you could also manage it from an interrupt on the timer.

Please forget about the pipeline:

  1. You cannot disable the pipeline, making the discussion pointless.

  2. Disabling the pipeline would not affect the predictability and determinism of the code - the pipeline is deterministic.

  3. Flushing pipelines does not disable them.

  4. The determinism of the cpu is dominated by effects such as memory buses, caches, instruction pre-fetches, etc. The pipeline itself would be a minor issue even if it were non-deterministic.

  5. If your design depends on accuracy, predictability or jitter in the cpu execution speed, the design is broken. You are re-arranging the chairs on the deck of the Titanic.
Reply to
David Brown

Not significantly.

Even with i486s, with their tiny caches, the ratio between mean and worst case (IIRC) interrupt latencies could be 10:1 (from memory 70us vs 700us).

Modern processors have much larger caches, and the variations can be far larger.

The disparity between cache speeds and main memory speeds is far larger, so cache misses have a larger effect on the latency.

The i960 allowed its cache to be frozen, to avoid that kind of problem.

Quite frankly, if you are worried about any determinism effects due to pipelining, either you are misguided or your hardware/software architecture needs to be changed.

If /variations/ in instruction timing really are that critical, then you'll have to use something like the XMOS toolchain.

Reply to
Tom Gardner

Hmmm, these are huge figures, even the 70uS is too huge I suppose. A 1 MHz 6800 had IRQ latency in the range of 30uS or so.

Dimiter

Reply to
Dimiter_Popoff

The minimum deadtime is about 1us, maximum is 2-3us

Regards

Klaus

Reply to
Klaus Kragelund

I can generate standard 6 phase PWM that way, but I need center aligned PWM, in which the individual PWM signal goes both high and low with a wait state before the next cycle.

Regards

Klaus

Reply to
Klaus Kragelund

We are talking about a CM3, that has sub-microsecond latency (12 cycles from memory).

--

John Devereux
Reply to
John Devereux

The figures were from memory, but were definitely in units of time (s), not conductance (S).

I've disinterred the original, "Perils of the PC Cache" by Phillip J Koopman. In that, he read data from a port and put it in a simple circular queue. Naively looking at the data book indicated it would take 104 clocks; the measured mean over 1e6 iterations was 149.6 clocks, and the worst case was 272 clocks.

So clearly I had mis-remembered clocks as microseconds, and there was a factor of 2:1 (i.e. 100%) for the I/D caches, not 10:1.

With the I/D caches turned off, he measured a min-max of 484-508 clocks, i.e. 5% - a considerable improvement.

But then he also measured the effects when the TLB and cache could get in the way, and found that the mean was 300 clocks and the max was 900 clocks, which is the source of my 10:1.

Summary: even trivial I/D caches caused a 100% variation between mean and max. Switching the caches off reduced that to a 5% variation.

TL;DR: forget worst-case predictability if you have caches; loss of predictability is inherent with caches.

Reply to
Tom Gardner

Hah! I have thought for decades that second was supposed to be abbreviated to a capital S... maybe it was the case in the past, maybe I remembered wrongly. I have been pretty consistent in my documentation etc., so if I have used a lower case s it has been an error of mine... :-). Thanks for noting that, it may take me a while to get used to the lower case s but I'll work on it.

Ah, that sounds more sane. 700 is still huge, but comparable to, say, the 70 of a 68k mcu of that era (its division was in that ballpark).

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI


Reply to
Dimiter_Popoff

The Pentium IV has a _20_ stage pipeline: a mispredicted branch on that chip has a significant impact on performance even if both branch targets are in cache.

Of course, nobody in their right mind would use such a chip in a HRT system, and it is true that load stalls are far more worrisome than a pipeline flush ... but it is not "misguided" to be concerned about the pipeline length.

George

Reply to
George Neuner

I expected something like that - do the 12 cycles include the worst case opcode in execution (probably division)? (I guess it does, trapping must take just a cycle or two).

The 6800 needed 22 cycles to stack all its registers... or maybe to stack them and to fetch the vector and go there; actually I think it was the latter. The last time I needed that figure must have been over 30 years ago, strange that I remember it (unless I just think I remember something, which could well be the case).

Dimiter

Reply to
Dimiter_Popoff

It never was.

However, Tektronix used to get it wrong on their oscilloscopes - and as a metrology company they really ought to have known better. I suspect they finally realised the error of their ways when specifying their digitising scopes' sampling rate, e.g. 100MS/s. Not that S=samples is correct!

Hey! A convert! My life's work is complete :)

Reply to
Tom Gardner

Maybe I have taken it from them. I used to look at some of their service manuals when I was making my first steps in analog design (back then I wanted to build an oscilloscope... I never built it - not many projects I have left unfinished, but this is one of them. I learned a lot while trying, though. I may still build one; now at least I know how to :D ).

Dimiter

Reply to
Dimiter_Popoff

One of my projects, when I get A Round Tuit, is to make a 2GS/s 4GHz scope. The trick is to find a way to do it with only trivial analogue front end components. Hence no amplifiers, no ADCs, just a 50ohm input with very simple analogue components. Yes, there will be limitations, but that's half the fun :)

Yes, I know you can buy remarkably fast remarkably cheap ADCs nowadays. But I want to do it for tens of dollars, not thousands :)

Reply to
Tom Gardner

Instructions like divide are aborted to take the interrupt, so the 12 cycles (29 with floating point) from interrupt to first ISR instruction executed is the real maximum.

-Lasse

Reply to
lasselangwadtchristensen


Yes, and that takes 600ns for a 50MHz cpu with no optimization.

Is it correct that the pipeline also works in the ISR latency, so that it may shorten the 12 cycles to an effective 4? (for a 3-stage pipeline)

Cheers

Klaus

Reply to
Klaus Kragelund

A 3-stage pipeline does not mean that the cpu does 3 instructions per clock cycle! The 12 cycle maximum latency of interrupts on an M3/M4 takes the pipeline into account, but assumes there are no latencies on memory or buses (all instructions and vectors in cache or single-cycle memory).

Reply to
David Brown

No, the pipeline needs filling so it does not improve latency.

But this is not the point here. It looks as if you are trying to figure out a way to use a hammer on a screw; screws are meant to be used with screwdrivers, not with hammers.

If you do not have enough timers to program them such that you will get the dead time, you can still use those you have to initiate the opening of one of the transistors, and open the other after that from the core in an interrupt handler. You can only guarantee the minimum time that way, but it might be enough if the IRQ latency is in the ns range; perhaps you can afford a jitter of a microsecond [uS is what I would have written, how am I supposed to do that now? "us" is more ambiguous? Tom?] or so. If you cannot, there is nothing much better you can do with that mcu anyway.

Dimiter

Reply to
Dimiter_Popoff
