I am not sure that the pipeline depth is an architectural feature of the M3. I'd think it's an implementation detail that could change, e.g. when your current P/N is replaced. Plus, as others have said, you are at the mercy of other implementation details: caches, if any, memory bank effects, interrupt latencies, etc.
If you need cycle-accurate timing, you should use something that is guaranteed to be cycle accurate, for instance the PRU units in the TI AM335x, as used in e.g. the BeagleBone.
I think you are making this too complicated. How much dead time do you need? What is the range of the times you need to set and what are the absolute limits?
Why can't you just use two separate timer outputs with non-overlapping signals? At worst, you will need an inverter on the output if the timer block does not support inverting the output on the second block, but most microcontroller timers can do that themselves (I am not familiar with the exact device you are using).
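To make the two-output idea concrete, here is a minimal sketch (the function and names are mine, not any vendor's API) of how the compare points for two non-overlapping channels could be computed from the period, the duty of the first channel, and the required dead time:

```c
#include <stdint.h>

/* Hypothetical helper, not a vendor API: split one timer period into
 * two non-overlapping windows separated by 'dead' ticks at each
 * transition.  Channel A is active on [0, a_off), channel B on
 * [b_on, b_off); the gap from b_off to the period wrap provides the
 * second dead band.  Returns 0 on success, -1 if it doesn't fit. */
static int split_period(uint32_t period, uint32_t duty_a, uint32_t dead,
                        uint32_t *a_off, uint32_t *b_on, uint32_t *b_off)
{
    if (duty_a + 2u * dead > period)
        return -1;                /* no room for both dead bands   */
    *a_off = duty_a;              /* A switches off here           */
    *b_on  = duty_a + dead;       /* B switches on 'dead' later    */
    *b_off = period - dead;       /* B off 'dead' before the wrap  */
    return 0;
}
```

Those three values would then go into the compare registers of the two channels, with the second channel inverted if the hardware cannot generate that polarity directly.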
Using DMA to transfer new values into the PWM unit might be convenient if you don't want to respond to an interrupt, but you should also manage it from an interrupt on the timer.
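Stripped of the hardware, the update itself is trivial; the table and "shadow register" below are stand-ins of mine. Whether DMA or an ISR does the transfer, it is just the next precomputed value moved into the compare register on each timer update event:

```c
#include <stdint.h>

#define NSTEPS 4

/* Hypothetical double-buffered update: a table of duty values that a
 * DMA channel (or a timer-update ISR) would stream into the compare
 * register, one entry per PWM period. */
static const uint16_t duty_table[NSTEPS] = { 100, 200, 300, 200 };
static volatile uint16_t ccr_shadow;  /* stands in for the compare register */
static unsigned step;

/* What the timer-update handler would do: load the next value and
 * wrap the index so the pattern repeats. */
static void timer_update_handler(void)
{
    ccr_shadow = duty_table[step];
    step = (step + 1u) % NSTEPS;
}
```

The point of letting DMA do this is only to remove the per-period interrupt load; the values still have to be prepared somewhere, which is why managing the sequence from a timer interrupt remains sensible.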
Please forget about the pipeline:
You cannot disable the pipeline, making the discussion pointless.
Disabling the pipeline would not affect the predictability and determinism of the code - the pipeline is deterministic.
Flushing pipelines does not disable them.
The determinism of the cpu is dominated by effects such as memory buses, caches, instruction pre-fetches, etc. The pipeline itself would be a minor issue even if it were non-deterministic.
If your design depends on accuracy, predictability or jitter in the cpu execution speed, the design is broken. You are re-arranging the chairs on the deck of the Titanic.
Even with i486s, with their tiny caches, the ratio between mean and worst case (IIRC) interrupt latencies could be 10:1 (from memory 70us vs 700us).
Modern processors have much larger caches, and the variations can be far larger.
The disparity between cache speeds and main memory speeds is far larger, so cache misses have a larger effect on the latency.
The i960 enabled its cache to be frozen, to avoid that kind of problem.
Quite frankly, if you are worried about any determinism effects due to pipelining, either you are misguided or your hardware/software architecture needs to be changed.
If /variations/ in instruction timing really is that critical, then you'll have to use something like the XMOS toolchain.
I can generate standard 6 phase PWM that way, but I need center aligned PWM in which the individual PWM signal goes both high and low, with a dead time before the next cycle
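For what it's worth, here is a model of what center-aligned mode amounts to (my own sketch, tied to no particular device): the counter runs up to `top` and back down, the main output is active while the counter is below the compare value, and the complementary output gets its compare shifted by the dead time so every edge of the pair is separated:

```c
#include <stdint.h>

/* Model of a center-aligned (up-down counting) timer, a sketch only:
 * the counter runs 0..top..0, so one PWM period is 2*top ticks.
 * The main output is active while counter < ccr, giving a pulse
 * 2*ccr ticks wide, symmetric about the turnaround point. */
static uint32_t main_pulse(uint32_t ccr)
{
    return 2u * ccr;
}

/* The complementary output is active while counter >= ccr + dead,
 * so both of its edges sit 'dead' ticks away from the main output's. */
static uint32_t comp_pulse(uint32_t top, uint32_t ccr, uint32_t dead)
{
    return 2u * (top - (ccr + dead));
}
```

As a consistency check, the two pulses plus the two dead bands account for the whole period: main_pulse + comp_pulse + 2*dead == 2*top.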
The figures were from memory, but were definitely in units of time (s), not conductance (S).
I've disinterred the original, "Perils of the PC Cache" by Phillip J Koopman. In that he read data from a port and put it in a simple circular queue. Naively looking at the data book indicated it would take 104 clocks. Measuring the mean time of 1e6 iterations took 149.6 clocks. The worst case took 272 clocks.
So clearly I had mis-remembered clocks as microseconds, and there was a factor of 2:1 (i.e. 100%) for the I/D caches, not 10:1.
With the I/D caches turned off, he measured a min-max spread of 484-508 clocks, i.e. 5% - a considerable improvement.
But then he also measured the effects when the TLB and cache could get in the way, and found that the mean was 300 clocks and the max was 900 clocks, which is the source of my 10:1.
Summary: even trivial I/D caches caused a 100% variation between mean and max. variations. Switching caches off reduced that to a 5% variation.
TL;DR: forget worst-case predictability if you have caches; loss of predictability is inherent with caches.
Hah! I have thought for decades second was supposed to be abbreviated to a capital S.... may be it has been the case in the past, may be I remembered wrongly. I have been pretty consistent in my documentation etc., if I have used a lower case s it has been an error of mine... :-). Thanks for noting that, it may take me a while to get used to the lower case s but I'll work on it.
Ah, that sounds more sane. 700 is still huge, but comparable to, say, the 70 of a 68k MCU of that era (its division was in that ballpark).
The Pentium IV has a _20_ stage pipeline: a mispredicted branch on that chip has a significant impact on performance even if both branch targets are in cache.
Of course, nobody in their right mind would use such a chip in a HRT system, and it is true that load stalls are far more worrisome than a pipeline flush ... but it is not "misguided" to be concerned about the pipeline length.
I expected something like that - do the 12 cycles include the worst case opcode in execution (probably division)? (I guess it does, trapping must take just a cycle or two).
The 6800 needed 22 cycles to stack all its registers.... or may be to stack them and to fetch the vector and go there, actually I think it was the latter. Last time I have needed that figure must have been over 30 years ago, strange I remember it (unless I just think I remember something, could well be the case).
However, Tektronix used to get it wrong on their oscilloscopes - and as a metrology company they really ought to have known better. I suspect they finally realised the error of their ways when specifying their digitising scope's sampling rate, e.g. 100MS/s. Not that S=samples is correct!
May be I have taken it from them. I used to look at some of their service manuals when I was making my first steps in analog design (back then I wanted to build an oscilloscope... never built it, not many projects I have left unfinished but this is one of them. Learned a lot while trying though. I may still build one, now at least I know how to :D ).
One of my projects, when I get A Round Tuit, is to make a 2GS/s 4GHz scope. The trick is to find a way to do it with only trivial analogue front end components. Hence no amplifiers, no ADCs, just a 50ohm input with very simple analogue components. Yes, there will be limitations, but that's half the fun :)
Yes, I know you can buy remarkably fast remarkably cheap ADCs nowadays. But I want to do it for tens of dollars, not thousands :)
Instructions like divide are aborted to take the interrupt, so the 12 cycles (29 with floating point) from interrupt to first ISR instruction executed is the real maximum.
A 3-stage pipeline does not mean that the cpu does 3 instructions per clock cycle! The 12 cycle maximum latency of interrupts on an M3/M4 takes the pipeline into account, but assumes there are no latencies on memory or buses (all instructions and vectors in cache or single-cycle memory).
No, the pipeline needs filling so it does not improve latency.
But this is not the point here. It looks as if you are trying to figure out a way to use a hammer on a screw; screws are meant to be used with screwdrivers, not with hammers.
If you do not have enough timers to program them such that you will get the dead time, you can still use those you have to initiate the opening of one of the transistors and open the other from the core in an interrupt handler after the dead time. You can only guarantee the minimum time for that, but it might be enough if the IRQ latency is in the ns range; perhaps you can afford a jitter of a microsecond [uS is what I would have written, how am I supposed to do that now? "us" is more ambiguous? Tom?] or so. If you cannot, there is nothing much better you can do with that mcu anyway.
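As a sanity check on whether the ISR approach can meet a given jitter budget, the cycle counts quoted in this thread are easy to turn into time (the 72 MHz clock below is only an assumed example, not from any post here):

```c
#include <stdint.h>

/* Back-of-envelope conversion: a cycle count into nanoseconds at a
 * given core clock.  72 MHz in the note below is an assumed example. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / hz);
}
```

At an assumed 72 MHz, the 12-cycle M3 entry latency works out to cycles_to_ns(12, 72000000), about 166 ns; what matters for the dead time is the worst case on your particular bus/flash setup, which can be considerably larger.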