I think you've got things a bit mixed up here. No one is running Linux on an M0 (though people /have/ run Linux on M3 and M4 cores, albeit the nommu version - what used to be called uClinux before the nommu support was integrated into the mainline kernel).
The thread here has got somewhat confusing, because people tried to help Tim before he had given us his full requirements. When he wanted to do serial stuff on a Linux system, several responses suggested Python because it is much easier than doing it in C. When he said it had to be portable to embedded systems, Python was still a suggestion since it works fine in embedded Linux systems. But it turns out that he wants to code in C so that it can be easily used on small non-Linux embedded systems (so that he can test and debug the code on the PC, then run it on a Cortex M0 - not a bad plan).
And while it is /possible/ to run a cut-down and limited Python on an M0, I think it is unlikely to be a good choice in practice!
There is a lesson to be learned here in making your requirements and constraints explicit from the start - it's a lesson we have all "learned" many times, and all forgotten just as often :-)
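Tim's test-on-the-PC, run-on-the-M0 plan works best when the protocol logic has no OS dependencies at all. A minimal sketch, with a hypothetical framing format (0x7E start byte, then a length, then the payload - all invented for illustration): the parser consumes one byte at a time, so on Linux it can be fed from a termios read() loop and on the M0 straight from the UART interrupt, with only the byte source differing between the two builds.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical framing: 0x7E <len> <payload...>.  parser_feed()
 * returns 1 when a complete frame has been collected into buf,
 * 0 otherwise.  No OS calls, so the same file compiles for the
 * PC test harness and for the Cortex-M0 target. */
typedef struct {
    uint8_t buf[32];
    uint8_t len;     /* expected payload length */
    uint8_t pos;     /* bytes collected so far  */
    enum { WAIT_SOF, WAIT_LEN, IN_FRAME } state;
} frame_parser;

int parser_feed(frame_parser *p, uint8_t byte)
{
    switch (p->state) {
    case WAIT_SOF:
        if (byte == 0x7E)
            p->state = WAIT_LEN;
        return 0;
    case WAIT_LEN:
        if (byte == 0 || byte > sizeof p->buf) {
            p->state = WAIT_SOF;   /* bad length, resynchronise */
            return 0;
        }
        p->len = byte;
        p->pos = 0;
        p->state = IN_FRAME;
        return 0;
    case IN_FRAME:
        p->buf[p->pos++] = byte;
        if (p->pos == p->len) {
            p->state = WAIT_SOF;
            return 1;              /* frame complete */
        }
        return 0;
    }
    return 0;
}
```

On the PC the bytes come from read() on the serial fd (or from a canned test vector); on the target the ISR calls parser_feed() directly. Only that glue layer needs porting.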
If you are really keen, then I'd imagine this setup could be ported to an M0:
In most systems, various caches (data, instruction, MMU TLB) can be disabled or at least frequently invalidated, so you get the worst case performance.
Hard real-time specifies deadlines: the execution time _must_ be below a certain limit in 100 % of cases. As long as that requirement is met, the actual execution time could be 1 % or 99 % of that deadline time. The only benefit of a very low execution time is that you may save some power :-).
In a hard real-time environment, one would not use busy loops to create some specific amount of time delay, so it does not matter how many cycles the processor executes with or without cache.
The only interesting thing is that the worst case execution time is _below_ the deadline time.
You can usually get a pretty solid idea of the worst case cache timing, especially if it is write-through (with write-back, you could have many "dirty" lines that need to be written back before flushing). Reading via a cache may mean an extra couple of clock cycles to handle matching and flushing, before the actual memory read. And it will typically mean something like 4 times as much data is read to fill the cache line even though you just requested one read. Using such numbers, you can work out an absolute worst case cost for the cache if every single memory access is independent, scattered about memory, and causes a cache flush - say, all memory reads take four times as long as without cache.
Then you can do your deadline testing on that basis, perhaps by reducing the memory clock to 25% (or the whole system clock if the memory clock is not independent) when testing with caches disabled.
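The worst-case bookkeeping above can be made concrete; all the figures below are invented for illustration, not taken from any particular chip:

```c
/* Pessimistic per-access cost when every read misses: a couple of
 * cycles for the tag check, then a whole line fill from memory.
 * Figures are illustrative only - take yours from the datasheet. */
unsigned worst_case_read_cycles(unsigned mem_cycles_per_word,
                                unsigned words_per_line,
                                unsigned tag_check_cycles)
{
    /* assume every access is independent and fills a fresh line */
    return tag_check_cycles + words_per_line * mem_cycles_per_word;
}
```

With, say, 1-cycle memory, 4-word lines and a 2-cycle tag check, each read costs at most 6 cycles against 1 uncached - which tells you roughly how far to derate the memory (or system) clock for the caches-disabled deadline test.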
Another thing to remember in all this is that you do not have to prove that your deadlines will be reached in 100% of cases. Perhaps 99.999% is good enough, or perhaps you need 7 nines. But your task is never to aim for "perfect" - it is to be "good enough". If you can provide statistical evidence that it is more likely for the user to be killed by a meteorite than for a deadline to be missed, then that is often good enough for the job. Of course you must be careful doing this sort of thing - but there is always a balance to be struck between the reliability of a system and the cost.
And presuming the processor manufacturer doesn't "improve" the chip, and purchasing gets exactly the same one for the next batch, etc.
I strongly suspect that a factor of 4 is way too optimistic; even the i486 showed a factor of 10. /Demonstrating/ (i.e. not merely asserting) that a factor of 1000 or 10000 or 100000 is appropriate is extraordinarily difficult. Much easier to not have the issue in the first place.
With hardware synchronisers you really want to ensure metastability failure rates of 1 in 10^12 or better!
There is always a tension between "the best is the enemy of the good" and "having a Ford Pinto discussion".
I'm satisfied if people realise and understand the downsides to caches before they make the correct decision for their requirements.
... and don't use "i686 etc" in the same context as "hard realtime" :)
If "frequent" is defined as inserting an "invalidate" command between every pair of machine instructions, shouldn't that be enough?
If the processor is just running that HRT task and nothing else, that would be the case.
However, there are often other non-RT tasks, such as (l)user interfaces, that could use the processing power not needed by the HRT task.
Let me put some numeric values to clarify the thing.
Assume there are exactly 1000 evenly spaced interrupts each second, i.e. each interrupt must be served within 1000 us. As long as the worst case (cache disabled etc.) service time is less than 990 us, this system is OK. However, if the processing only takes 100 us on average, you do not get Brownie points for that, at least not from the HRT community.
However, if the interrupt service routine's average CPU usage is only 100 us (with caches enabled), the CPU usage is only 10 %, so 90 % of the CPU capacity is available for non-RT tasks, such as user interfaces.
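The two separate questions in this example - does the worst case fit, and how much is left over - in code form, using the figures quoted above:

```c
/* Hard real-time acceptance: only worst case vs. deadline matters. */
int deadline_met(unsigned worst_case_us, unsigned period_us)
{
    return worst_case_us < period_us;
}

/* Throughput: the *average* service time decides how much CPU is
 * left for non-RT work such as the user interface. */
unsigned cpu_load_pct(unsigned avg_service_us, unsigned period_us)
{
    return 100 * avg_service_us / period_us;
}
```

With the numbers above: deadline_met(990, 1000) holds, and cpu_load_pct(100, 1000) is 10, leaving 90 % for the (l)user interface.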
Are you saying that there are braindead processors that are slower when caches are enabled compared to situations in which all caches are disabled? I guess those must be quite pathological cases :-).
It all depends on whether you have some non-RT work that can be executed in the NULL task.
True - but that's the same regardless of the chip, the cache, and anything else involved.
Certainly the factor will be higher with desktop-oriented cpus. A factor of 4 is realistic for embedded microcontrollers with caches. So this method would be reasonable for a Cortex M4 + cache - but out of the question for trying to run hard real-time on a modern x86 cpu. Using a large "safety factor" does not relieve you from the task of picking an appropriate architecture for the job in hand.
And there are situations when even that is not good enough. This is one of the reasons why you pick the architecture that suits the job - for the highest determinism, you go closest to the hardware, or at least the simplest possible software.
By that definition, there is no such thing as "hard real time". If I ask you to build me a blinking LED with an absolute 100% guarantee that it will blink at least once every second, you cannot do so. You could give me MTBF estimates suggesting that the LED and the microcontroller /should/ have an expected lifetime of 10 years - but you can't guarantee it. And you can't guarantee that the device will not suffer from a single-event upset, or a hit from a cosmic ray, that will cause malfunction.
You might be able to prove that part of the system - the software code - is 100% good enough. But that only applies on the assumption that everything else, including the hardware and the development tools, is perfect.
Your job as a software engineer working on a "hard real time" system is to ensure that the contribution made to the expected failure rate as a result of the software is minor in comparison to other expected failure causes. When you reach the point that you can say "when the system fails to meet the deadlines, it is highly unlikely to be the fault of the software", then your software is "hard real time".
(Ideally, of course, the software for critical parts should be simple enough that you /are/ sure it will be within deadlines 100% of the time
- given that everything else works according to specification. But that's just the ideal case.)
I agree that this is a fundamental difference - and you clearly have to take the cache's behaviour into account when determining if you can be confident enough that your system will meet its deadlines.
In practice, people don't have trouble making real-time systems with microcontrollers with caches. It is an extra issue to consider and deal with, but it is perfectly possible. As with all real-time systems, you have to divide up tasks and figure out what is important, and how to be sure you can meet your deadlines, and perhaps move faster response tasks to hardware, dedicated microcontrollers, or whatever. Other possible solutions including putting critical interrupt routines into uncached static ram, or locking cache lines. Microcontrollers have such features precisely so that you can get the responses you need and still use cache. Desktop cpus don't have such features - that is one of the reasons why they are unsuitable for hard real-time tasks.
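Putting a critical routine into uncached static RAM usually comes down to a section attribute plus a matching linker-script entry. A sketch, assuming GCC/Clang and an invented section name - on a real MCU the startup code would also copy the section from flash into SRAM:

```c
/* ".ramfunc" is a hypothetical section name; the target's linker
 * script must place it in uncached SRAM (or in locked cache lines)
 * so instruction-fetch timing for this routine is deterministic.
 * The attribute syntax is GCC/Clang-specific. */
__attribute__((section(".ramfunc")))
int critical_isr_body(int sample)
{
    /* deterministic work: no flash or cache-miss stalls once the
     * code is resident in SRAM */
    return sample * 2 + 1;
}
```

Vendor headers often wrap this in a macro (names vary by vendor), but the mechanism is the same: the function's timing no longer depends on what the rest of the system has done to the cache.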
This is attempted by static WCET (Worst-Case Execution-Time) analysis tools such as aiT from AbsInt.
Works IMO pretty well for instruction caches, less so for data caches (that is, you get a considerable over-estimate in WCET), but much depends on the regularity and complexity of the program. Preemptive scheduling is also a bit of a problem.
There are certainly processors in which a cache miss at a certain point in a program leads to an overall faster execution of the program than if a cache miss occurs at that point. The reason is often that the cache hit lets the processor execute more things speculatively, and if the speculation turns out not to be needed (for example, a branch prediction was wrong) then the speculation, and its effects on the caches etc., may cause more delay than the cache miss would have caused.
In the WCET analysis community, such cases are known as "timing anomalies" and they are the bane of static WCET analysis, because their presence means that the analysis cannot make worst-case assumptions at each point in the program, but must analyse many, many possible cases and combinations.
There are also programs (at least constructed examples) which have almost no cache hits. For some processors, enabling the cache (or including a cache in the HDL model) makes cache misses more expensive than cache-less main memory accesses because one or a few cycles are used in the cache look-up before the miss is detected and a main memory access is started. Then, for programs which have few cache hits, execution with a cache can be slower than execution without a cache.
But that is of course not true for the "average program", whatever that means.
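A constructed example of the kind mentioned: walk an array with a stride of one (assumed) cache-line, so that almost every access touches a fresh line. The line size and array size here are arbitrary; the point is only the access pattern - on a core where a miss costs a lookup cycle on top of the memory access, such a traversal can run slower with the cache than without, while of course computing the same result either way.

```c
#include <stdlib.h>

#define LINE_WORDS 16              /* assumed words per cache line */
#define N (LINE_WORDS * 4096)      /* working set larger than the cache */

/* Visit every element exactly once, but line-stride first so that
 * consecutive accesses land on different cache lines - a miss-heavy
 * pattern constructed to defeat the cache. */
long strided_sum(const int *a)
{
    long s = 0;
    for (int off = 0; off < LINE_WORDS; off++)
        for (int i = off; i < N; i += LINE_WORDS)
            s += a[i];
    return s;
}

/* The cache-friendly traversal of the same data, for comparison. */
long sequential_sum(const int *a)
{
    long s = 0;
    for (int i = 0; i < N; i++)
        s += a[i];
    return s;
}
```

Timing the two (not asserted here, since it is machine-dependent) shows the gap; functionally they are identical.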
Winsock is a DLL available to any Microsoft application.
However, Winsock's select() only works with sockets and not also with files, serial ports, etc. as it does in Unix/Linux.
The Windows equivalent of Unix's select() is WaitForMultipleObjects(), and to use it you have to use the asynchronous event APIs for all the "objects" involved. It isn't difficult really, but it is quite different from Unix where much of the complexity is hidden.
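For comparison, here is the Unix side in miniature - a pipe stands in for the serial-port descriptor, since select() treats any file descriptor alike (on a real system the read end would be the fd from open("/dev/ttyS0", ...)):

```c
#include <string.h>
#include <unistd.h>
#include <sys/select.h>

/* Demonstration only: create a pipe, write to it, and use select()
 * to wait (with a timeout) until the read end is readable.
 * Returns 1 on success, 0 on any failure. */
int serial_select_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 0;
    if (write(fds[1], "hi", 2) != 2)
        return 0;

    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fds[0], &rset);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 }; /* don't block forever */

    /* first argument is highest fd + 1, a classic select() gotcha */
    if (select(fds[0] + 1, &rset, NULL, NULL, &tv) != 1)
        return 0;
    if (!FD_ISSET(fds[0], &rset))
        return 0;

    char buf[3] = {0};
    if (read(fds[0], buf, 2) != 2)
        return 0;
    close(fds[0]);
    close(fds[1]);
    return strcmp(buf, "hi") == 0;
}
```

The same loop with the pipe fd swapped for a serial fd (and perhaps a socket fd in the same fd_set) is exactly what Winsock's select() cannot do, and what WaitForMultipleObjects() plus the async APIs replaces on Windows.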
Ach. I was afraid of that. This being said, an enterprising person might write something to map a serial port to a socket using the programming language of their choice - sort of an internal terminal server.