microZed adventures

John Larkin 12 years ago

We're into this signal processing project, using a microZed/ZYNQ thing as the compute engine.

After a week or so of work by an FPGA guy and a programmer, we can now actually read and write an FPGA register from a C program, and wiggle a bit on a connector pin. Amazingly, the uZed eval kit does not include a demo of this, and the default boot image does not configure the FPGA!
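For readers attempting the same thing, one common way to reach an FPGA register from a Linux user-space C program is to map the physical address through /dev/mem. The sketch below is not the code used in this project. The base address is an assumption (0x40000000 is the start of the Zynq-7000 M_AXI_GP0 window, but the real register address depends on the FPGA design), and `map_regs` and `wiggle_bit` are hypothetical helper names. It also assumes the kernel allows /dev/mem access and that the program runs as root.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed base address: start of the Zynq-7000 M_AXI_GP0 window.
 * The actual register address is set by the FPGA design. */
#define FPGA_REG_BASE 0x40000000u

/* Map `len` bytes at page-aligned offset `base` of `path` (normally
 * "/dev/mem") and return a pointer to 32-bit registers, or NULL on failure. */
static volatile uint32_t *map_regs(const char *path, off_t base, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Toggle bit 0 of register `index` `count` times, as fast as the bus allows. */
static void wiggle_bit(volatile uint32_t *regs, size_t index, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        regs[index] ^= 1u;
}
```

On the target this would be called as `map_regs("/dev/mem", FPGA_REG_BASE, 4096)` followed by `wiggle_bit(regs, 0, 1000000)`. The `volatile` qualifier keeps the compiler from optimizing away the repeated accesses.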

We're using their build tools to embed the FPGA config into the boot image. We'd really like to be able to have a C program read a bitstream file and reconfigure the FPGA, but we haven't been able to figure that out.
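One possible route, offered as an assumption and not something confirmed in this thread: Xilinx's Linux kernels of that period shipped a device-configuration driver that exposed a node, commonly /dev/xdevcfg, and writing a bitstream to it reconfigured the PL. Whether the uZed image includes that driver, and which bitstream format it accepts, would need checking. If it does, the C side reduces to streaming a file into the device node. `copy_to_device` is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Stream the file at `src` into the existing file or device node `dst`.
 * Returns the number of bytes copied, or -1 on any error. */
static long copy_to_device(const char *src, const char *dst)
{
    char buf[4096];
    long total = 0;
    ssize_t n = 0;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) { /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w <= 0) {
                n = -1;
                break;
            }
            off += w;
        }
        if (n < 0)
            break;
        total += n;
    }
    close(in);
    close(out);
    return n < 0 ? -1 : total;
}
```

Under the assumption above, the call would look like `copy_to_device("design.bit", "/dev/xdevcfg")`, where the file name is a placeholder.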

If we run a C program that wiggles a pin as fast as it can, we can do a write to the FPGA register about every 170 ns. Without any attempts at optimization (like dedicating the second ARM core to the loop) we see stutters (OS stealing our CPU) that last tens or hundreds of microseconds, occasionally a full millisecond. That might get worse if we run TCP/IP sessions or host web pages or something, so dedicating the second ARM to realtime stuff would be good.

John Larkin Highland Technology Inc www.highlandtechnology.com jlarkin at highlandtechnology dot com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom timing and laser controllers Photonics and fiberoptic TTL data links VME analog, thermocouple, LVDT, synchro, tachometer Multichannel arbitrary waveform generators


Jan Panteltje 12 years ago

On a sunny day (Fri, 22 Nov 2013 08:57:12 -0800) it happened John Larkin wrote:

In my view an FPGA should be used for, or as, a hardware solution. Putting a processor in an FPGA will work, but a multitasker will constantly see its I/O interrupted. Use the processor for what the processor is good for, and do the rest in hardware, if you want I/O speed... or any speed. Else you are just building another _slow_ mobo, and then may as well use this:

formatting link

Without any defined spec for the application, who knows? As Jim says: you are about as vague as it gets on that.


Tim Wescott 12 years ago

There's not nearly enough information there, but if you're serious about real time you don't just throw a bag of unknown software at something and expect it to work. Operating systems don't steal CPU time -- programmers steal CPU time, sometimes by choosing the wrong OS.

Tim Wescott Wescott Design Services http://www.wescottdesign.com


Phil Hobbs 12 years ago

That is amazing, especially since so many manufacturers (*cough*ST*cough*) respond to any query by saying "Hack up the sample code."

Cheers

Phil Hobbs

Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC Optics, Electro-optics, Photonics, Analog Electronics 160 North State Road #203 Briarcliff Manor NY 10510 hobbs at electrooptical dot net http://electrooptical.net


Spehro Pefhany 12 years ago

Hi, John:-

What OS is it running? Can you set the granularity of the task switching to something reasonable like low double-digit microseconds?

This is a very interesting chip, especially with a bit bigger FPGA than the 7010.


Tom Gardner 12 years ago

That's very useful information since I'm seriously considering using a MicroZed (or a Zybo - both have advantages and disadvantages for me).

Please continue to update us on your progress, and maybe write a blog pointing us to how you overcome "issues" similar to the ones you mention below.

Have you asked on any of the ZedBoard/MicroZed forums?

That's a useful figure to have in mind.

Personally I'm surprised that it is only a millisecond, but then I'm a pessimist :)

I'm sure I'm teaching you to suck eggs, but you may like to consider these points:

- contention at the hardware level, particularly w.r.t. DRAM shared between the two cores and any FPGA logic

- cache effects. Even a 486 with its minimal cache showed interrupt latencies that were sometimes ten times the typical latency, all due to pessimal caching. Larger caches would probably exhibit poorer worst-case performance

- hard realtime systems are often best designed by determining the worst-case software main-loop time then once per main-loop configuring the hardware, and then letting the hardware deal with all actions for the next main-loop

For serious HRT work, personally I'd consider the XMOS processors -- no caches nor interrupts, so the dev system can specify the worst-case performance. I don't know about the Propeller chips.


Tom Gardner 12 years ago

I don't think there's any argument about that.

But it isn't the only consideration and doesn't invalidate the concepts behind the Zynq chips.

I/O speed is only one aspect. In most cases:

- predictable worst-case latency is a more significant parameter

- precision relative timing is a more significant parameter

- any processor with a cache /will/ cause problems w.r.t. worst-case software guarantees

If you know of any boards that can be added to an RPi or something similar and that contain an FPGA which can capture three digital inputs at >=1 GSa/s each, please let me know.

Sure, but the poster didn't want advice about his application!


Tom Gardner 12 years ago

What's the peak rate at which the RPi could read data from the FPGA and copy it to DRAM?

I haven't had time to understand the RPi's I/O yet. My concern is that while there are several I/O bits available on the GPIO connector, they can't all be read simultaneously. If true, then GPIO I/O would be reduced to the level of a bit-banged interface!


John Larkin 12 years ago

There is a perfectly good, detailed spec. You can't see it.

I'm building a box that accepts an analog input, crunches it in complex ways, and makes an analog output. ZED is an ideal platform. All that nasty DRAM and flash and power supply stuff is done, and it boots Linux out of the box. Having the dual-core ARM and the FPGA on the same chip is cool, because it avoids a lot of interconnect between two chips on a board. Unfortunately, ARM-FPGA transactions still cross a clock boundary, so aren't blindingly fast.

John Larkin Highland Technology, Inc jlarkin at highlandtechnology dot com http://www.highlandtechnology.com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom laser drivers and controllers Photonics and fiberoptic TTL data links VME thermocouple, LVDT, synchro acquisition and simulation


Tim Wescott 12 years ago

What programming model are you assuming?

The RTOS's that I'm used to using make it very easy to make RTOS tasks event-driven off of interrupts, so the critical timing parameter for the OS is how rapidly it can reschedule a task after such an interrupt.

Note that for every RTOS I've ever worked with, a programmer can cut the RTOS off at the knees by turning off interrupts and doing some lengthy process, or by doing lengthy processing in an ISR on a processor that automatically turns off interrupts on interrupt. No interrupts means the RTOS has no way of getting its hands on the processor.

This "interrupt off" time can be essential if you have something that absolutely, positively must be done without interrupts, and it can be a valuable way to make an operation atomic if it happens to create less latency than using the OS's mutex scheme. But in the hands of a lazy programmer, it can absolutely kill real-time capability (as, for that matter, can misuse of mutexes or any number of other OS features).

Tim Wescott Wescott Design Services http://www.wescottdesign.com


Tim Wescott 12 years ago

Probably more egg sucking instruction, but if the instruction cache is big enough and allows it, you can lock down the lines that contain the critical ISRs and OS bits. You can only take this so far: at some point you need to either throw your hands up in despair, or find a way to fork that job into the FPGA.

At the price of making your software inscrutable, and slowing down everything else, etc., etc., etc.

Tim Wescott Wescott Design Services http://www.wescottdesign.com


John Larkin 12 years ago

We're running the Linux that comes installed on the uZed.

There's probably some way to make Linux run the realtime app on the second ARM core. I guess we could even force that to run bare-metal code if we had to. The current application will probably run fine if we do nothing special. We'll have some FIFOs in the FPGA to buffer stuff, so the occasional millisecond or so timeout won't break anything.
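The buffering argument comes down to simple arithmetic: the FIFO must hold at least as many samples as arrive during the longest stall. The sketch below works that out. The roughly 1 ms stall is the figure reported earlier in the thread; the sample rate is not stated anywhere, so any rate passed in is purely hypothetical, and `fifo_depth_needed` is an illustrative name.

```c
#include <stdint.h>

/* Smallest FIFO depth, in samples, that rides out a stall of `stall_us`
 * microseconds at `rate_hz` samples per second (rounded up). */
static uint64_t fifo_depth_needed(uint64_t rate_hz, uint64_t stall_us)
{
    return (rate_hz * stall_us + 999999u) / 1000000u;
}
```

For example, at a hypothetical 1 MSa/s a 1000 us stall needs `fifo_depth_needed(1000000, 1000)`, which is 1000 samples, before adding any safety margin.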

I have been pressured for some time now to run C apps inside an FPGA, in a soft core processor, microBlaze or whatever. That never made sense to me, eating all those FPGA resources to get a pokey CPU with a tiny memory space. The new FPGA chips with hard ARM cores, cache, ram, and DRAM controllers mean it's time to do it.

John Larkin Highland Technology, Inc jlarkin at highlandtechnology dot com http://www.highlandtechnology.com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom laser drivers and controllers Photonics and fiberoptic TTL data links VME thermocouple, LVDT, synchro acquisition and simulation


John Larkin 12 years ago

It's looking like the Linux that comes with the uZed will work fine for the current application. Is Linux "a bag of unknown software"?

We need to do TCP/IP stuff, and manage waveform data files, run BIST, things like that, and using Linux sure makes that part easy. A lot of the functionality could be in ARM code or might be in the FPGA, but I'm moving as much as possible into the ARM. C is a lot easier to code and compile than VHDL.

John Larkin Highland Technology, Inc jlarkin at highlandtechnology dot com http://www.highlandtechnology.com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom laser drivers and controllers Photonics and fiberoptic TTL data links VME thermocouple, LVDT, synchro acquisition and simulation


Tom Gardner 12 years ago

I know the i960 allowed you to do that. Which more modern processors do? (No, I'm not after an exhaustive list!)


Jan Panteltje 12 years ago

On a sunny day (Fri, 22 Nov 2013 10:58:58 -0800) it happened John Larkin wrote:

Well, you state 120 ns, I dunno, but I can flip 10 MHz, that makes 50 nS on, 50 nS off, on the Raspi, without any FPGA, using GPIO. I posted about those test results here; maybe it was actually 12 MHz. That allows me to send real-time DVB-S signals from GPIO. Because of the kernel task switching that you noticed, I added an 8kx9 FIFO to get a constant output stream.

formatting link

That should not be that hard for a hardware designer. Considering the insane price of that Xilinx thing, the choice should be even easier. Now that free Mathematica runs on it, what cannot be done... Mathematica is of course very slow on it, but there are no math limits. When it MUST be FPGA for some part of the project, bugblat can for example do your vector math.

formatting link

It has also enough on board RAM to make a FIFO IIRC. What your programmers took a WEEK?? to do is already working in several languages there.

I know the value of saying 'Look We Have The Latest ZZZZZZZ' and that _could_ impress a customer, IF he knows something, but usually they do not care how you do it if it contained horse s*it and beats their expectations fine, I think that those were Kennedy's ('s OK now? hehe) words,

Nothing to argue, It is your show. There are many channels. I have heard Monty Python wants to do a come back, I have never even watched the stuff ever.

So, I will spend some time playing with Wolfram Language (never used that) and Mathematica on my Raspi. Very nice of him to make it available for free.


Tom Gardner 12 years ago

Small point: you're comparing frequency and conductance :)

More interesting question, assuming you were actually talking about times, what's the typical and guaranteed tolerance on those times?


Joerg 12 years ago

Or, as in my case with an ARM controller and a few instances before, ST did not answer at all. So they flew out of the design.

Regards, Joerg http://www.analogconsultants.com/


John Larkin 12 years ago

I haven't tried it, but I'm sure the 600 MHz ARM dio pins are faster. The ARM transactions into the FPGA pass through the AMBA bus thing into the FPGA clock domain, and that slows things down. It's probably one of those modern situations that trades DMA throughput against single-transaction latency.

The uZed is only about $200, a pretty good deal for all the stuff that it does. It's something like a 12 or 14 layer board.

They were fabulous. Dead parrots. Gas cookers. Exploding version of the 1812 Overture. Lumberjack Song. Twit Races. Cartoons. Get some DVDs.

John Larkin Highland Technology, Inc jlarkin at highlandtechnology dot com http://www.highlandtechnology.com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom laser drivers and controllers Photonics and fiberoptic TTL data links VME thermocouple, LVDT, synchro acquisition and simulation


mroberds 12 years ago

This is more true for desktop-y Linux than it is for embedded, but sometimes it pays to recompile the kernel and customize it to your needs. There are some choices in the scheduler that can be tweaked this way. This can also be a good way to get a kernel that just doesn't work at all, so make sure you have a way to go back to stock.

The kernel configuration for plain old x86 is pretty good about dependencies: if you turn on kernel feature X, it will tell you that you need to turn on Y and Z as well to get X to work, and maybe even do it for you. Linux for embedded CPUs varies - sometimes it's like x86, and sometimes there is precisely one (undocumented) combination of kernel build options that actually works. Sometimes the documentation exists, but it's on the wrong side of an E[STF]L barrier.

Back to your original point, sometimes the lack of sample code is down to the vendor being clueless. Sometimes there is another reason - if you get the hard sell for using their approved development partners, that's a hint.

taskset(1) from the command line if they gave you that, sched_setaffinity(2) if they didn't.
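As a minimal sketch of the sched_setaffinity(2) route, the helper below pins the calling thread to a single core. `pin_to_cpu` is a hypothetical name, and the choice of core 1 on the dual-core Zynq is an assumption about how one would split the work, not something tested here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* must precede all includes; exposes CPU_SET and sched_setaffinity */
#endif
#include <sched.h>

/* Pin the calling thread to one CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set); /* pid 0 = calling thread */
}
```

Calling `pin_to_cpu(1)` at the top of the realtime loop is the programmatic equivalent of launching the program with `taskset -c 1`. Pinning only restricts where this thread runs; it does not stop the kernel from scheduling other work on that core, which is what the `isolcpus=` boot parameter is for.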

I am pretty sure that you can tell Linux at boot time to not use some of the cores, but I'm not sure how you would then get something else to run on the cores that Linux isn't using.

I heard this too, and my objection (from the software side) was that in my experience, the hardware guys I worked with tended to be pretty bad at revision control for FPGAs. Part of this was down to the FPGA tools, and part of it was down to an "it's hardware and we'll never have to change it" attitude, so they wouldn't leave a way to update the soft core over the phone. I think I have ranted about this before. :)

I like the hard core idea better, as long as the stuff that runs on the hard core loads up like normal, and is not embedded in some huge binary blob puked out by the FPGA tools.

Matt Roberds


Tim Wescott 12 years ago

From the perspective of someone who needs an RTOS, Linux is a bag of very dubious software, indeed.

The fact that you were trying to work around Linux certainly explains the large and variable delays.

I agree that Linux makes the "complicated" stuff easier. If you really have a need for real-time, and you don't really need that other processor for your "big box" stuff, then put an RTOS on it and run your timing- critical stuff there.

Or, research the state of real time Linux -- it kind of dropped off my radar screen about five years ago, so either it's a done deal and no one's excited about it any more, or it never quite worked right and everyone who was excited about it is now too embarrassed to speak up.
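Short of a full real-time kernel, stock Linux does offer a fixed-priority scheduling class and memory locking, which together tend to reduce (but do not bound) the kind of stutter described earlier. The sketch below shows the usual pair of calls. It is an illustration under assumptions: `make_realtime` is a hypothetical name, the priority value is arbitrary, and both calls normally need root or the matching privileges, so failure must be expected.

```c
#include <sched.h>
#include <sys/mman.h>

/* Lock all current and future pages into RAM, then move the calling
 * process to the SCHED_FIFO class at `priority` (1..99 on Linux).
 * Returns 0 on success, -1 with errno set if either step fails. */
static int make_realtime(int priority)
{
    struct sched_param sp = { .sched_priority = priority };

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1; /* page faults could still stall the loop */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1; /* typically EPERM without privileges */
    return 0;
}
```

A SCHED_FIFO task that never blocks can starve everything else on its core, so this is best combined with pinning the task to a core the rest of the system does not need.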

Tim Wescott Wescott Design Services http://www.wescottdesign.com
