Arm development systems


The STR7 is old-school ARM7TDMI, not Cortex (I know NXP also had flash acceleration on ARM7TDMI).

The STM32L1 doesn't have a flash accelerator, but it can read flash in 64-bit words; it's really meant for low power, not speed.

The STM32F2/F4 have flash accelerators like NXP's, and run full speed from flash.

-Lasse

Reply to
langwadt

We are talking about 32-bit platforms.

RTFM

There are tools for analysing stack depth before deploying code. It's quite bad if you don't think about that before starting on the software.
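For what it's worth, GCC has one such tool built in: compile with -fstack-usage and you get a per-function stack report (a .su file) that you can feed into a worst-case call-tree analysis. A minimal example (the exact report format varies between GCC versions):

    arm-none-eabi-gcc -c -fstack-usage main.c
    cat main.su
    main.c:42:6:process_sample      88      static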

Learn how to program :-) Often it takes a good review of your code to figure out why things don't work.

Usually the layer that actually does I/O is very small. Everything from there up to the algorithms is easily simulated on a PC. Time doesn't matter in the digital domain. Once you get the algorithms right you can see if it works in a controller and optimise if necessary.

A few years ago I developed an echo canceller for a client. I started with getting the algorithm to work properly on a PC. That gave me a known good implementation. Then I started to optimise so it would meet the timing requirement. The first floating point version was about 15 times too slow. The final (mostly fixed point) version was fast enough and still had the same quality as the initial version.
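(For the curious: that float-to-fixed conversion is mostly mechanical once you pick a format. A minimal sketch, assuming Q15 -- 1 sign bit, 15 fraction bits -- and omitting the saturation handling a real implementation needs:)

    #include <stdint.h>

    typedef int16_t q15_t;                      /* value = raw / 32768.0 */

    static inline q15_t q15_mul(q15_t a, q15_t b)
    {
        /* 32-bit intermediate product, rescaled back to Q15 */
        return (q15_t)(((int32_t)a * (int32_t)b) >> 15);
    }

    /* float version:  acc += coeff[i] * x[i];
       fixed version:  acc += q15_mul(coeff[i], x[i]);   -- no FPU needed */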

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

Nice part, thanks. Blows the budget, unfortunately, particularly when I can get most of that inside the uC with a bit of bandaging.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

When I was involved in ARM development at a big company, we used the CodeWarrior compiler and a Lauterbach TRACE32 JTAG debugger. It is expensive but very fast, and the user interface can do so many things and be programmed to do more than you can imagine.

We also had their tracer, which logs program flow so you can basically rewind and replay what happened.

But it is massive overkill unless you are doing something very complicated. I know of tons of code that has been programmed and debugged with nothing but a serial port and a software debugger.

With the current Cortexes that have ROM bootloaders for USB/RS232/CAN, etc., all you really need is a computer and GCC.

-Lasse

Reply to
langwadt

Signedness of chars is independent of machine width.

I guess you've never been tasked with writing software for a hardware platform that hasn't yet been defined? :>

Do you know what the actual distribution of IRQs will be on your platform? Especially when (as above) the hardware might not exist at the time you develop the code? Do you know what level of support for floats will be provided in the hardware vs. "helper routines"?

Moving to actual iron is rarely a case of just changing the name of the compiler invoked on the command line.

"Perfect code" won't help you if an output is directly/indirectly routed to an IRQ (either by mis-design or a layout problem). You can simulate it all day on a PC and never *expect* it.

Don't do much RT work, eh? :>

As I said previously, a lot depends on your coding style. Asked to write a task to copy a file, most folks would implement this in a single thread using pseudo-synchronous I/O:

    while (source not empty) {
        read(source) -> buffer
        buffer -> write(sink)
    }

*I* would write it as three threads:

thread1:
    parse source and sink specifiers, signal errors
    set up source and sink devices (files, pipes, etc.)
    allocate buffer sized per needs of device types
    create producer and consumer threads
    wait for them to finish
    interact with user to report status, as required
    handle signals to release resources, if needed

thread2 (producer):
    wait for space in buffer
    read(source) -> buffer
    lather, rinse, repeat

thread3 (consumer):
    wait for data in buffer
    buffer -> write(sink)
    lather, rinse, repeat

[note that the single threaded example omits many of these issues which would still need to be addressed -- in "spaghetti code"]
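A minimal sketch of that decomposition, using POSIX threads and a condition-variable-guarded ring buffer (stdin/stdout stand in for the source and sink devices; all the names here are illustrative, not from any particular library):

    #include <pthread.h>
    #include <stdio.h>
    #include <stddef.h>

    #define RING_SIZE 4096

    typedef struct {
        unsigned char   data[RING_SIZE];
        size_t          head, tail, count;    /* count = bytes in buffer    */
        int             eof;                  /* producer saw end of source */
        pthread_mutex_t lock;
        pthread_cond_t  not_full, not_empty;
    } ring_t;

    static void *producer(void *arg)          /* "thread 2": source -> buffer */
    {
        ring_t *r = arg;
        int c;
        while ((c = getchar()) != EOF) {      /* stand-in for read(source)  */
            pthread_mutex_lock(&r->lock);
            while (r->count == RING_SIZE)     /* wait for space in buffer   */
                pthread_cond_wait(&r->not_full, &r->lock);
            r->data[r->head] = (unsigned char)c;
            r->head = (r->head + 1) % RING_SIZE;
            r->count++;
            pthread_cond_signal(&r->not_empty);
            pthread_mutex_unlock(&r->lock);
        }
        pthread_mutex_lock(&r->lock);
        r->eof = 1;                           /* tell the consumer we're done */
        pthread_cond_signal(&r->not_empty);
        pthread_mutex_unlock(&r->lock);
        return NULL;
    }

    static void *consumer(void *arg)          /* "thread 3": buffer -> sink */
    {
        ring_t *r = arg;
        for (;;) {
            pthread_mutex_lock(&r->lock);
            while (r->count == 0 && !r->eof)  /* wait for data in buffer    */
                pthread_cond_wait(&r->not_empty, &r->lock);
            if (r->count == 0) {              /* source drained: all done   */
                pthread_mutex_unlock(&r->lock);
                return NULL;
            }
            int c = r->data[r->tail];
            r->tail = (r->tail + 1) % RING_SIZE;
            r->count--;
            pthread_cond_signal(&r->not_full);
            pthread_mutex_unlock(&r->lock);
            putchar(c);                       /* stand-in for write(sink)   */
        }
    }

    int main(void)    /* "thread 1": set up, spawn workers, wait, report */
    {
        ring_t r = { .lock      = PTHREAD_MUTEX_INITIALIZER,
                     .not_full  = PTHREAD_COND_INITIALIZER,
                     .not_empty = PTHREAD_COND_INITIALIZER };
        pthread_t p, c;
        pthread_create(&p, NULL, producer, &r);
        pthread_create(&c, NULL, consumer, &r);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }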

My solution is harder to emulate "well" on a PC. Do you have support for threading? Can the device interfaces be emulated accurately? How much time will it take for me to design the emulation environment? How many bugs will *it* have?

But, mine is a lot easier to get right "first time" (when you look at *all* of the "practical" issues that arise).

If you constrain yourself to working on the PC, you limit the types of solutions you can reasonably pursue.

Reply to
Don Y


If that is how you would solve a simple task in an embedded system, I'm beginning to understand why MCUs now come with megabytes of flash and run at hundreds of MHz...

-Lasse

Reply to
langwadt

It's a simple task only if you are willing to accept "simple solutions"! :>

What happens if the source or sink stalls "indefinitely"? I.e., a disk that won't spin up? A serial port that has been "paced off"? etc. How do you *kill* the task? Or, do you let it hold those resources indefinitely? (cycle power? reboot? gee, that's an elegant solution! :> )

What happens if/when the user REPEATS the request? Do you now have *two* instances running -- exhibiting the same problem? Can your design even *tolerate* two instances of the same task?? (i.e., have you statically allocated a *single* copy buffer in your design and now see that buffer as "in use") Or, does it manifest to the user as a different problem (e.g., you've run out of resources)

How do you tell the user what is happening -- since the consumer/producer (which, in the simple implementation, are folded into that single thread) can be blocked waiting for data/space? How do you *ask* the task what is happening?

Do you code the algorithm to handle large block devices and assign resources on that scale, regardless of the needs of the actual devices involved in the operation? Does that then limit the number of concurrent instances of this task that you can support -- even if the devices in those instances are serial ports (that don't need to buffer huge amounts of data)?

Do you write (and maintain) different versions of the same (conceptually) task to handle those different cases?

Decomposing tasks almost always makes things easier and more efficient -- unless you are willing to live with simple solutions and the problems that inevitably follow, "in practice".

Do you think your iPhone has one giant "loop" that runs continuously?

Embedded systems use more complex hardware because embedded systems now do a lot more than they used to. Often, more than *desktop* systems!
Reply to
Don Y

I've never come across unsigned chars in the past 15 years.

I port a lot. Does that count?

Perhaps, but since it behaves unlike the reference implementation, you'll quickly notice.

I do lots of it.

I would never ever go multi-threaded on a microcontroller. Too much obfuscation and overhead. I use consecutive non-blocking tasks, which are extremely predictable regarding performance and stack usage. Don't make things more complicated than they need to be.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

Finding memory corruption bugs is hard to do on an embedded platform. Better to be ahead of them. I add plenty of boundary checks and pointer validations. There are tools like Valgrind to find leaks, boundary errors, etc. on a PC platform.

A nice feature of ARM devices is that they have exceptions. So when a stray pointer tries to write to flash, or read where there is no memory, you'll get a notification. In debug builds I let the software print a stack dump and a small backtrace. All of this does take some inline assembler code. In release builds I just let an exception reset the device. The Cortex devices also offer a division-by-zero exception.
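A hedged sketch of that kind of fault handler, assuming a Cortex-M3/M4 with GCC (the stacked r0-r3, r12, lr, pc, xPSR frame is the standard Cortex-M exception-entry layout; NVIC_SystemReset() is the usual CMSIS call):

    #include <stdio.h>

    extern void NVIC_SystemReset(void);     /* provided by CMSIS */

    /* Called with a pointer to the exception stack frame:
       frame[0..7] = r0, r1, r2, r3, r12, lr, pc, xpsr */
    void hard_fault_report(unsigned long *frame)
    {
    #ifdef DEBUG
        printf("HardFault: pc=%08lx lr=%08lx xpsr=%08lx\r\n",
               frame[6], frame[5], frame[7]);
        for (;;) ;                          /* park so the debugger can look around */
    #else
        (void)frame;
        NVIC_SystemReset();                 /* release build: just reset */
    #endif
    }

    __attribute__((naked)) void HardFault_Handler(void)
    {
        __asm volatile(
            "tst   lr, #4            \n"    /* which stack was in use?  */
            "ite   eq                \n"
            "mrseq r0, msp           \n"    /* main stack...            */
            "mrsne r0, psp           \n"    /* ...or process stack      */
            "b     hard_fault_report \n");
    }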

Depends on the number of units you intend to ship. Successful niche-market products allow for customer-specific solutions, which may need more memory than expected.

In the STR700 everything is mediocre.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

For the same package size and MCU series, all the chips are pin-compatible, so I can just specify the next larger size for that customer. They'd be paying a premium anyway.

The things I'm looking at doing are cost-sensitive, moderately high volume things--maybe 100k units per year if all goes well. A buck extra on the processor gets to be important money at that point, so paying $10k for the fancy Keil system might make good sense. (Assuming I have some outside development money.)

The Code Red ones are only $1100 or so including a JTAG module, but they don't support the embedded trace module. They told me they were developing it, and that it was due to ship Real Soon Now ^H^H^H^H^H^H^H^H^H^H^H later this year. Assuming it doesn't go up by a factor of 10, Code Red plus one of those Segger J-Trace gizmos looks about right.

I might get an STM32 demo board and try the demo version of Code Red on it. I really don't want to be stuck with just one vendor's chips.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

The Code Red stuff looks good. The lite version has a limited code size. If that works for you then great.

Keil is a little pricey, but judging from their old 8051 compiler, you won't find many bugs.

As for open stuff, there's DS-5 (for Linux) from the ARM site. And the CooCox tools look promising:

formatting link

Cheers

Reply to
Martin Riddle

Um, *Arm* (isn't that what this thread was about? :> ). MIPS.

Then you know that time *does* matter in the digital domain!

Actually, well written multitasking programs are smaller, simpler and more robust than equivalent single-threaded programs. Otherwise, you end up having to do lots of spin-waiting and checking unrelated "things" while you are busy with something else (or, worse yet, push all of those things into ISR's, needlessly).

*Think* about how you would write that file copy task to handle the real-world issues that I mentioned. You'll end up with a real mess of spaghetti code -- assuming you don't ALSO have to do "other things" while copying! [I'm serious, here! Are you going to put timeouts on the non-blocking calls? What's keeping track of time for you? Can the output device work *while* the input device is working? Or, does one stall while you service the other? Do you embed code in your spin-wait loops to poll the user to see what he might want? Or, to tell him what's happening? etc.]

I wouldn't consider writing a NON-multitasking application (unless it was a trivial "high school project").

Reply to
Don Y

That's timing. What I mean is that it doesn't matter how fast (or slow) a PC processes input/output data for an algorithm. You don't need a realtime 'simulation' when developing algorithms.

I hate to burst your bubble, but whenever multithreading comes into play things get difficult. Even for very experienced programmers. In most cases multi-threading works on a single CPU because of luck. I've seen too many programs crash and burn when the multi-core PC CPUs came onto the market.

In your example you have 3 separate pieces of code that work closely together on one task. What if thread 3 gets a timeslice before thread 2? In that case you'll always have excess buffer space filled (and memory wasted). You more or less have to make sure thread 2 is always activated before thread 3, so the buffer space filled in thread 2 has a big chance of getting flushed in thread 3 immediately. And what if thread 3 doesn't get enough timeslices?

Not to mention chances of deadlocks.

Nope. I'd write it like this:

    run_copy {
        if (data in source and buffer is not full) -> copy from source
        if (output ready and buffer not empty)     -> write to output
    }

The function run_copy is called continuously from a main loop in a list of several run_do_something functions. Actually these functions can be considered threads. They don't run in parallel but in series which doesn't matter for execution time because you only have so much CPU time to work with.
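Fleshed out slightly (a sketch only -- source_ready(), read_source(), and friends are assumed helpers, not a real API, and the buffer here is a trivial ring):

    #define BUF_SIZE 256

    extern int  source_ready(void), sink_ready(void);
    extern unsigned char read_source(void);
    extern void write_sink(unsigned char c);
    extern void run_gui(void), run_pid(void);

    static unsigned char buf[BUF_SIZE];
    static unsigned int  head, tail;        /* ring indices */

    void run_copy(void)
    {
        if (source_ready() && (head + 1) % BUF_SIZE != tail) {
            buf[head] = read_source();      /* copy from source */
            head = (head + 1) % BUF_SIZE;
        }
        if (sink_ready() && head != tail) {
            write_sink(buf[tail]);          /* write to output  */
            tail = (tail + 1) % BUF_SIZE;
        }
        /* nothing to do right now?  just fall out -- never block */
    }

    int main(void)
    {
        for (;;) {                          /* tasks run in series, forever */
            run_copy();
            run_gui();
            run_pid();
        }
    }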

The biggest advantage is that the whole system is more predictable and you can use (almost) all of the stack for each task.

Of course. As soon as there is nothing more to be done for now, the non-blocking function (task) exits. That's the whole point of a non-blocking function!

No. If you write all the tasks as non-blocking functions you have no problems with spin locks. User interaction is just another task in the sequential list of tasks.

I've worked (with others) on huge projects (producing >700kB binaries) that operate this way for timing critical systems. Writing non-blocking tasks requires a certain state of mind though. Once you get used to it, it makes life a whole lot simpler.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

In RT designs, time *is* a factor in the correctness of the algorithm. That is the essence of RT.

MultiPROCESSING is different from multiTHREADING. It is relatively easy to write a safe multithreaded program by following simple rules.

People who can't just haven't learned *how*.

So what? If there is no data in the buffer, it yields the processor. If there *is* data, it processes it. That's what it is SUPPOSED to do.

If thread 2 gets a timeslice before thread 3, then it will read in more data IF THERE IS SPACE IN THE BUFFER. Otherwise, it will yield the processor (assuming the OS doesn't support event notifications).

Each of the two worker threads concentrates on just one aspect of the task.

No, the example you cited would cause the buffer to be "more empty" (the consumer is running).

Whether the buffer is 90% full or 9% full, the same amount of memory is used by the buffer.

In a single threaded design, you would completely *fill* the buffer (is that "wasted memory"?) and then completely EMPTY it (so the memory is sitting there, idle -- is THAT wasted memory?)

You don't understand how multitasking works. Thread 2 runs as often as there is space available in the buffer. Thread 3 runs as often as there is data in the buffer.

[Note that I did not say *when* there is/isn't data -- just "averages"]

Thread 3 can be activated 10,000 times for every *one* time thread 2 is activated. Doesn't affect the algorithm's correctness.

Imagine thread 3 services a 110 baud UART and thread 2 services a modern disk drive. Disk fills the buffer via thread 2 in *one* invocation of that thread (for modest size buffers). Thread 2 can't do anything until there is space in the buffer. That relies on thread 3 pulling data OUT of the buffer.

But, at 110 baud, it takes 100ms just to move *one* byte out of the buffer. To move a full sector out of the buffer (assuming you are using the block device for the disk) will take 50 seconds. *Then*, there is enough room for thread 2 to put another sector worth of data into the buffer. And, immediately start waiting -- for another 50 seconds.

Thread 2 won't *do* anything more unless thread 3 empties out space in the buffer. The multithreaded approach allows for more parallelism. I.e., device I/O associated with thread 2 can happen while thread 3 is running (and vice versa)

What if your single threaded implementation doesn't get enough timeslices (time)?

Where is the chance for deadlock? There is a single shared resource, the buffer. Since only one owner can hold that resource, it can't be deadlocked by a dependence on another resource -- no "deadly EMBRACE" (since there is no other "party" holding a resource).

With two or more resources, you just make sure you take all of the resources that you need in a fixed order -- so that anyone else taking that same set of resources takes them in the same order (which means, if you managed to acquire resource X before the other task, then he will have to wait for X and won't have a chance of taking *Y* -- which you also need -- before you get it).

It's just a matter of programming discipline. Like making sure you POP as many things as you PUSH.
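In code, the fixed-order rule is about as short as discipline gets (pthread mutexes here purely for illustration):

    #include <pthread.h>

    static pthread_mutex_t X = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t Y = PTHREAD_MUTEX_INITIALIZER;

    /* Every task that needs both takes X *then* Y.  If task_a holds X,
       task_b parks at its first lock and cannot be holding Y against
       task_a -- no cyclic wait, hence no deadlock. */
    void task_a(void)
    {
        pthread_mutex_lock(&X);
        pthread_mutex_lock(&Y);
        /* ... use both resources ... */
        pthread_mutex_unlock(&Y);
        pthread_mutex_unlock(&X);
    }

    void task_b(void)
    {
        pthread_mutex_lock(&X);             /* same order, always */
        pthread_mutex_lock(&Y);
        /* ... */
        pthread_mutex_unlock(&Y);
        pthread_mutex_unlock(&X);
    }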

*One* byte at a time? What if the next byte isn't ready yet? (e.g., imagine pulling bytes out of a UART receiver or pushing them into a Tx register)

The multitasking executive formalizes your "big loop".

You have:

    while (FOREVER) {
        if (enable_run_copy) { run_copy() };
        if (enable_run_GUI)  { run_GUI()  };
        if (enable_run_PID)  { run_PID()  };
        ...
    }

(since you don't want to run_whatever if "whatever" doesn't need to be done!)

And, when you want to have a second instance of run_copy?

    while (FOREVER) {
        if (enable_run_copy1) { run_copy1() };
        if (enable_run_copy2) { run_copy2() };
        if (enable_run_GUI)   { run_GUI()   };
        if (enable_run_PID)   { run_PID()   };
        ...
    }

Of course, making sure that run_copy2 uses a different buffer than run_copy1, different I/O descriptors, etc.

You also have to keep track of each run_whatever's state *in* that routine. E.g., if you can't finish *all* of run_whatever in a single invocation, you have to store information in the routine (static variables) that allow the routine to remember where it was and restart from that point.

E.g., if you are printing 30 address labels, you don't want to tie up the whole CPU waiting for all 30 to be printed. So, you have to remember which label you printed last and how *much* of that label you managed to spit out to the printer before the printer signalled "busy" and you decided not to wait (because that would tie up the CPU at the expense of other tasks)
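A sketch of that bookkeeping (printer_ready(), printer_putc(), and label_text() are assumed helpers): the static variables are exactly the "saved state" that a blocking multitasking design would get for free.

    extern int  printer_ready(void);
    extern void printer_putc(char c);
    extern const char *label_text(int n);   /* text of label n, NUL-terminated */

    void run_printer(void)
    {
        static int label  = 0;              /* which of the 30 labels we're on */
        static int offset = 0;              /* how far into that label we got  */

        while (label < 30) {
            const char *text = label_text(label);
            while (text[offset] != '\0') {
                if (!printer_ready())
                    return;                 /* printer busy: give up the CPU,
                                               resume exactly here next call  */
                printer_putc(text[offset++]);
            }
            label++;                        /* finished one label, on to the next */
            offset = 0;
        }
    }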

I worked for a company that had a very efficient multitasking executive that worked exactly that way. Task switch was 7us on a Z80 (i.e., a handful of opcodes!). It is a LOT harder to code in that environment. Tuning the system is hit or miss as you add "tasks":

    while (FOREVER) {
        if (enable_run_copy1)  { run_copy1()   };
        if (enable_run_copy2)  { run_copy2()   };
        if (enable_run_GUI)    { run_GUI()     };
        if (enable_run_PID)    { run_PID()     };
        if (enable_run_motor1) { run_motor1()  };
        if (enable_run_motor2) { run_motor2()  };
        if (enable_run_barcode){ run_barcode() };
        if (enable_printer)    { run_printer() };
        if (enable_run_scanner){ run_scanner() };
        if (enable_run_comm1)  { run_comm1()   };
        if (enable_run_comm2)  { run_comm2()   };
        if (enable_run_power)  { run_power()   };
        ...
    }

Ooops! We're missing some barcode scans. Need to cut the time between run_barcode() invocations:

    while (FOREVER) {
        if (enable_run_barcode){ run_barcode() };
        if (enable_run_copy1)  { run_copy1()   };
        if (enable_run_copy2)  { run_copy2()   };
        if (enable_run_GUI)    { run_GUI()     };
        if (enable_run_PID)    { run_PID()     };
        if (enable_run_motor1) { run_motor1()  };
        if (enable_run_motor2) { run_motor2()  };
        if (enable_run_barcode){ run_barcode() };
        if (enable_printer)    { run_printer() };
        if (enable_run_scanner){ run_scanner() };
        if (enable_run_comm1)  { run_comm1()   };
        if (enable_run_comm2)  { run_comm2()   };
        if (enable_run_power)  { run_power()   };
        ...
    }

Ooops! Labels are taking forever to come off the printer. Need to give run_printer more CPU time:

    while (FOREVER) {
        if (enable_run_barcode){ run_barcode() };
        if (enable_run_copy1)  { run_copy1()   };
        if (enable_run_copy2)  { run_copy2()   };
        if (enable_run_GUI)    { run_GUI()     };
        if (enable_run_PID)    { run_PID()     };
        if (enable_run_motor1) { run_motor1()  };
        if (enable_run_motor2) { run_motor2()  };
        if (enable_run_barcode){ run_barcode() };
        if (enable_printer)    { run_printer() };
        if (enable_printer)    { run_printer() };
        if (enable_printer)    { run_printer() };
        if (enable_printer)    { run_printer() };
        if (enable_run_scanner){ run_scanner() };
        if (enable_run_comm1)  { run_comm1()   };
        if (enable_run_comm2)  { run_comm2()   };
        if (enable_run_power)  { run_power()   };
        ...
    }

Etc.

I've got a paper I wrote on the technique. I've used it on some small PICs, but it's ancient technology from a time when processors were considerably more crippled (like PICs!) (though I know many products that were designed with this system under the hood that are still in use, today!)

You're spinning around the main loop. Same difference.

User is prompted for an ID number. You don't want to block waiting for the slow human. So, run_GUI has to *remember* that it is "waiting for an ID number" -- so that the next time around the main loop, you don't start the "run_GUI" task at the beginning but, instead, resume *in* the "get_ID_number" portion of the task.

[Is this a function? If so, it has stuff piled on the stack so you can't *just* get back to this point. Instead, that "function" has to return -- so run_GUI() can return -- but remember that *it* should receive control of the processor the next time run_GUI() is invoked.]

Once a digit has been typed in, you will eventually process it. Then, "remember" that you have accepted one digit -- and are still waiting for more (or ENTER).

Once the ID is complete AND entered, get_ID_number is complete and run_GUI can move on to the next step -- looking up the name associated with that ID number -- which it will then print on the printer, etc.

In the multitasking approach, you just write the code that you want to run without worrying about keeping track of where you have to "get back to". You have a virtual processor at your disposal.

You can't provide tight controls on timing because timing changes each time you change the contents of your main loop -- or the code in one of those functions. (What happens in run_compute_pi? How many digits do you compute before you arbitrarily decide to let some other task have a chance at the CPU? Is the time required for each digit constant? If it varies, then the time spent in any particular instance of run_compute_pi can vary.) This affects *when* the other run_whatever routines are invoked, etc.

So, time critical things move into ISRs -- which makes the application brittle. Or, you get stuck in this juggling of "if () {run()}" invocations.

With a multitasking design, a task *can* block taking no temporal resources. It need not do anything out-of-the-ordinary to remember what it was doing when most recently active. It just sits *in* the statement that it was executing at the time of the task switch.

It concentrates on the task at hand instead of accommodating other tasks via a particular set of switching mechanisms.

Reply to
Don Y

I *think* you can find this at: My first time using that service so I welcome feedback if you encounter any problems fetching it (I can't make it available from any of my servers, directly)

Note the example is in ASM but the concepts remain the same (i.e., no saved state, etc.). Coding in an HLL means that you have to exit ALL functions before moving to the next task (as anything else would violate stack protocol).

Reply to
Don Y

On a sunny day (Wed, 08 Feb 2012 23:28:38 GMT) it happened snipped-for-privacy@puntnl.niks (Nico Coesel) wrote in :

Ever use Linux? Most video buffers are unsigned char*, and compiling other people's code generates trillions of gcc warnings about assigning a signed char to an unsigned... Did you write that? :-) ?? LOL

I freaking had to fix every one of them to get clean compiles.

Reply to
Jan Panteltje

You're missing the point. We are talking about systems where the char type is unsigned. Not an explicit 'unsigned char', but plain 'char' being unsigned by default.
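A two-line demonstration of the difference (the platform notes are the usual defaults -- plain char is unsigned in the ARM EABI and signed on x86 -- and GCC can force either way with -fsigned-char / -funsigned-char):

    #include <stdio.h>

    int main(void)
    {
        char c = (char)0x80;    /* 128 doesn't fit in a signed char */
        if (c < 0)
            puts("plain char is signed here (typical on x86)");
        else
            puts("plain char is unsigned here (typical on ARM EABI)");
        return 0;
    }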

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

Never used dynamic buffers I assume? Been there done that.

When systems get more complex.

Where do I say one byte at a time?

The test on what needs to be run is inside the run_do... functions.

You'll need to retain that information anyway. You're overcomplicating things.

Every GUI I've worked with is event driven. So you just poll for a flag 'input done in ID field', read the contents and carry on. An RTOS hides that polling for you but the net result is the same.

I agree there are limits to the serial task system. At one point I was asked to try and implement PolarSSL into an ARM controller. SSL does some hefty encryption that took a long time. I abandoned that project but if I had continued I would have used an RTOS.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

You are missing the point. What I'm saying is that there are many debugging, code quality and profiling tools available for a PC which make software development on a PC very easy. Besides that there is no hassle with JTAG dongles, limited breakpoint capabilities, etc. IMHO it is a good idea to develop and test complex chunks of code on a PC before running them in a microcontroller. A lot of my embedded software was born on a PC.

I never typed anything about writing bloated software.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

On a sunny day (Thu, 09 Feb 2012 10:47:16 GMT) it happened snipped-for-privacy@puntnl.niks (Nico Coesel) wrote in :

OK, I have no experience with ARM and gcc, so are you saying that gcc ARM has only unsigned char?

In fact I do not like ARM, I believe in more done in hardware. I have noticed an AMD announcement that they will also do ARM systems on a chip. What has the world come to? :-)

More bloat software expected.

Reply to
Jan Panteltje
