I am not sure that the pipeline depth is an architectural feature of the M3. I'd think it's an implementation detail that could change, e.g. when your current P/N is replaced. Plus, as others have said, you are at the mercy of other implementation details: caches, if any, memory bank effects, interrupt latencies, etc.
If you need cycle-accurate timing, you should use something that is guaranteed to be cycle accurate, for instance the PRU units in the TI AM335x, as used in e.g. the BeagleBone.
I think you are making this too complicated. How much dead time do you need? What is the range of the times you need to set and what are the absolute limits?
Why can't you just use two separate timer outputs with non-overlapping signals? At worst, you will need an inverter on the output if the timer block does not support inverting the output on the second block, but most microcontroller timers can do that themselves (I am not familiar with the exact device you are using).
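To make the two-output idea concrete, here is a minimal sketch (the function and names are mine, not any vendor's API) of how the compare points for two non-overlapping channels could be computed from the period, the duty of the first channel, and the required dead time:

```c
#include <stdint.h>

/* Hypothetical helper, not a vendor API: split one timer period into
 * two non-overlapping windows separated by 'dead' ticks at each
 * transition.  Channel A is active on [0, a_off), channel B on
 * [b_on, b_off); the gap from b_off to the period wrap provides the
 * second dead band.  Returns 0 on success, -1 if it doesn't fit. */
static int split_period(uint32_t period, uint32_t duty_a, uint32_t dead,
                        uint32_t *a_off, uint32_t *b_on, uint32_t *b_off)
{
    if (duty_a + 2u * dead > period)
        return -1;                /* no room for both dead bands   */
    *a_off = duty_a;              /* A switches off here           */
    *b_on  = duty_a + dead;       /* B switches on 'dead' later    */
    *b_off = period - dead;       /* B off 'dead' before the wrap  */
    return 0;
}
```

Those three values would then go into the compare registers of the two channels, with the second channel inverted if the hardware cannot generate that polarity directly.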
Using DMA to transfer new values into the PWM unit might be convenient if you don't want to respond to an interrupt, but you should also manage it from an interrupt on the timer.
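Stripped of the hardware, the update itself is trivial; the table and "shadow register" below are stand-ins of mine. Whether DMA or an ISR does the transfer, it is just the next precomputed value moved into the compare register on each timer update event:

```c
#include <stdint.h>

#define NSTEPS 4

/* Hypothetical double-buffered update: a table of duty values that a
 * DMA channel (or a timer-update ISR) would stream into the compare
 * register, one entry per PWM period. */
static const uint16_t duty_table[NSTEPS] = { 100, 200, 300, 200 };
static volatile uint16_t ccr_shadow;  /* stands in for the compare register */
static unsigned step;

/* What the timer-update handler would do: load the next value and
 * wrap the index so the pattern repeats. */
static void timer_update_handler(void)
{
    ccr_shadow = duty_table[step];
    step = (step + 1u) % NSTEPS;
}
```

The point of letting DMA do this is only to remove the per-period interrupt load; the values still have to be prepared somewhere, which is why managing the sequence from a timer interrupt remains sensible.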
Please forget about the pipeline:
You cannot disable the pipeline, making the discussion pointless.
Disabling the pipeline would not affect the predictability and determinism of the code - the pipeline is deterministic.
Flushing pipelines does not disable them.
The determinism of the cpu is dominated by effects such as memory buses, caches, instruction pre-fetches, etc. The pipeline itself would be a minor issue even if it were non-deterministic.
If your design depends on accuracy, predictability or jitter in the cpu execution speed, the design is broken. You are re-arranging the chairs on the deck of the Titanic.
Even with i486s, with their tiny caches, the ratio between mean and worst case (IIRC) interrupt latencies could be 10:1 (from memory 70us vs 700us).
Modern processors have much larger caches, and the variations can be far larger.
The disparity between cache speeds and main memory speeds is far larger, so cache misses have a larger effect on the latency.
The i960 enabled its cache to be frozen, to avoid that kind of problem.
Quite frankly, if you are worried about any determinism effects due to pipelining, either you are misguided or your hardware/software architecture needs to be changed.
If /variations/ in instruction timing really is that critical, then you'll have to use something like the XMOS toolchain.
I can generate standard 6 phase PWM that way, but I need center aligned PWM in which the individual PWM signal goes both high and low, with a dead time before the next cycle
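For what it's worth, here is a model of what center-aligned mode amounts to (my own sketch, tied to no particular device): the counter runs up to `top` and back down, the main output is active while the counter is below the compare value, and the complementary output gets its compare shifted by the dead time so every edge of the pair is separated:

```c
#include <stdint.h>

/* Model of a center-aligned (up-down counting) timer, a sketch only:
 * the counter runs 0..top..0, so one PWM period is 2*top ticks.
 * The main output is active while counter < ccr, giving a pulse
 * 2*ccr ticks wide, symmetric about the turnaround point. */
static uint32_t main_pulse(uint32_t ccr)
{
    return 2u * ccr;
}

/* The complementary output is active while counter >= ccr + dead,
 * so both of its edges sit 'dead' ticks away from the main output's. */
static uint32_t comp_pulse(uint32_t top, uint32_t ccr, uint32_t dead)
{
    return 2u * (top - (ccr + dead));
}
```

As a consistency check, the two pulses plus the two dead bands account for the whole period: main_pulse + comp_pulse + 2*dead == 2*top.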
The figures were from memory, but were definitely in units of time (s), not conductance (S).
I've disinterred the original, "Perils of the PC Cache" by Phillip J Koopman. In that he read data from a port and put it in a simple circular queue. Naively looking at the data book indicated it would take 104 clocks. Measuring the mean time of 1e6 iterations took 149.6 clocks. The worst case took 272 clocks.
So clearly I had mis-remembered clocks as microseconds, and there was a factor of 2:1 (i.e. 100%) for the I/D caches, not 10:1.
With the I/D caches turned off, he measured a min-max spread of 484-508 clocks, i.e. 5% - a considerable improvement.
But then he also measured the effects when the TLB and cache could get in the way, and found that the mean was 300 clocks and the max was 900 clocks, which is the source of my 10:1.
Summary: even trivial I/D caches caused a 100% variation between mean and max. variations. Switching caches off reduced that to a 5% variation.
TL;DR: forget worst-case predictability if you have caches; loss of predictability is inherent with caches.
Hah! I have thought for decades second was supposed to be abbreviated to a capital S.... may be it has been the case in the past, may be I remembered wrongly. I have been pretty consistent in my documentation etc., if I have used a lower case s it has been an error of mine... :-). Thanks for noting that, it may take me a while to get used to the lower case s but I'll work on it.
Ah, that sounds more sane. 700 is still huge, but comparable to, say, the 70 of a 68k MCU of that era (its division was in that ballpark).
The Pentium IV has a _20_ stage pipeline: a mispredicted branch on that chip has a significant impact on performance even if both branch targets are in cache.
Of course, nobody in their right mind would use such a chip in a HRT system, and it is true that load stalls are far more worrisome than a pipeline flush ... but it is not "misguided" to be concerned about the pipeline length.
I expected something like that - do the 12 cycles include the worst case opcode in execution (probably division)? (I guess it does, trapping must take just a cycle or two).
The 6800 needed 22 cycles to stack all its registers.... or may be to stack them and to fetch the vector and go there, actually I think it was the latter. Last time I have needed that figure must have been over 30 years ago, strange I remember it (unless I just think I remember something, could well be the case).
However, Tektronix used to get it wrong on their oscilloscopes - and as a metrology company they really ought to have known better. I suspect they finally realised the error of their ways when specifying their digitising scope's sampling rate, e.g. 100MS/s. Not that S=samples is correct!
May be I have taken it from them. I used to look at some of their service manuals when I was making my first steps in analog design (back then I wanted to build an oscilloscope... never built it, not many projects I have left unfinished but this is one of them. Learned a lot while trying though. I may still build one, now at least I know how to :D ).
One of my projects, when I get A Round Tuit, is to make a 2GS/s 4GHz scope. The trick is to find a way to do it with only trivial analogue front end components. Hence no amplifiers, no ADCs, just a 50ohm input with very simple analogue components. Yes, there will be limitations, but that's half the fun :)
Yes, I know you can buy remarkably fast remarkably cheap ADCs nowadays. But I want to do it for tens of dollars, not thousands :)
Instructions like divide are aborted to take the interrupt, so the 12 cycles (29 with floating point) from interrupt to first ISR instruction executed is the real maximum.
A 3-stage pipeline does not mean that the cpu does 3 instructions per clock cycle! The 12 cycle maximum latency of interrupts on an M3/M4 takes the pipeline into account, but assumes there are no latencies on memory or buses (all instructions and vectors in cache or single-cycle memory).
No, the pipeline needs filling so it does not improve latency.
But this is not the point here. It looks as if you are trying to figure out a way to use a hammer on a screw; screws are meant to be used with screwdrivers, not with hammers.
If you do not have enough timers to program them such that you will get the dead time, you can still use those you have to initiate the opening of one of the transistors and open the other from the core in an interrupt handler after the dead time. You can only guarantee the minimum time for that, but it might be enough if the IRQ latency is in the ns range; perhaps you can afford a jitter of a microsecond [uS is what I would have written, how am I supposed to do that now? "us" is more ambiguous? Tom?] or so. If you cannot, there is nothing much better you can do with that mcu anyway.
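As a sanity check on whether the ISR approach can meet a given jitter budget, the cycle counts quoted in this thread are easy to turn into time (the 72 MHz clock below is only an assumed example, not from any post here):

```c
#include <stdint.h>

/* Back-of-envelope conversion: a cycle count into nanoseconds at a
 * given core clock.  72 MHz in the note below is an assumed example. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / hz);
}
```

At an assumed 72 MHz, the 12-cycle M3 entry latency works out to cycles_to_ns(12, 72000000), about 166 ns; what matters for the dead time is the worst case on your particular bus/flash setup, which can be considerably larger.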