Linux question

John Larkin 11 years ago

gen is a setup-time command. The user can send us a pile of commands, and wait for each one to finish. He can open waveform files, set gains, summing, modulation, playback frequency, filtering, all sorts of stuff. Then he says GO and the show begins, with the selected channels beginning to play simultaneously, all time-synced. After GO I want everything to run as fast as possible.

I had the problem of gear-ratio channels, like one channel having 5 times the clock rate of another. I'm using a DDS clocked at 64 MHz to generate the playback clock in each wave generator channel. The fix was to use a 64-bit DDS and quantize the customer frequency requests to 1 mHz. We only load the DDS registers with integer multiples of 288,230,376, which is the tuning-word step for 1 mHz (2^64 / 64e9, truncated), so every channel ratio is exact. FPGAs are cool.
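The quantization is easy to check with integer arithmetic. This is a minimal Python sketch, assuming the usual DDS relation f_out = word * f_clk / 2^64; the function names are made up for illustration and are not taken from the actual firmware.

```python
F_CLK_HZ = 64_000_000      # DDS clock, from the post
ACC_BITS = 64              # accumulator width, from the post

# Tuning-word step for 1 mHz, truncated to an integer:
# 2**64 / (64e6 Hz * 1000 mHz per Hz) = 288,230,376.15...
WORD_PER_MILLIHERTZ = (1 << ACC_BITS) // (F_CLK_HZ * 1000)


def tuning_word(freq_hz):
    """Quantize a requested frequency to 1 mHz and return the DDS word."""
    millihertz = round(freq_hz * 1000)      # whole number of millihertz
    return millihertz * WORD_PER_MILLIHERTZ


def actual_freq_hz(word):
    """Frequency the DDS produces for a given tuning word."""
    return word * F_CLK_HZ / (1 << ACC_BITS)


slow = tuning_word(100_000)    # a 100 kHz channel
fast = tuning_word(500_000)    # a 500 kHz channel
print(WORD_PER_MILLIHERTZ)     # 288230376
print(fast == 5 * slow)        # True
```

Because the step is truncated, every channel runs low by the same tiny fraction (roughly 0.5 parts per billion), so the ratios between channels stay exact and gear-ratio channels cannot drift against each other.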

John Larkin Highland Technology, Inc picosecond timing precision measurement jlarkin att highlandtechnology dott com http://www.highlandtechnology.com

Maynard A. Philbrook Jr. 11 years ago

Most OSes I know of will allow you to open a file in read-only mode while another process has it open in write mode.

Basically the logging app can just append data to the end. You can also lock individual regions of the file, etc.

Jamie

Dave Platt 11 years ago

What you could do is have "gen" do the following:

- Get the parameters needed to open the file

- Open the file (to a numeric fd).

- dup2(fd, 0).

- close(fd)

- exec("/path/to/daemon")

What happens here is that you get the file opened, to a file descriptor (whose number you can't actually predict in advance).

You simultaneously close whatever gen's standard-input descriptor (0) was open to, and then "duplicate" the descriptor you just opened into descriptor zero. At this point, the file you opened is accessible through two fd's (0, and another one).

You then close the descriptor you originally opened. The file is now open only through descriptor 0.

You then use "exec" to terminate "gen", and run "daemon"... within the same process. This is different than firing off "daemon" via a separate command. "daemon" will inherit all of the descriptors that "gen" had open...

... and so its standard input descriptor will now be hooked to the file that "gen" had opened.

I *think* that "daemon" can simply start reading from STDIN as usual.

This process is very similar to how any of the Linux shells would implement a program launch with a file being specified via redirection, e.g.

/path/to/daemon < /my/input/file.dat
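The open / dup2 / close / exec sequence above can be sketched as follows. This is an illustration in Python rather than C (the os module calls map one-to-one onto the system calls); the file contents and the tiny stand-in for "daemon" are invented for the demo, and the fork is only there so the demo can capture what the exec'd program reads.

```python
import os
import sys
import tempfile


def exec_with_stdin_from(path, argv):
    """Replace this process with argv, with `path` open on descriptor 0."""
    fd = os.open(path, os.O_RDONLY)   # fd number is not predictable in advance
    if fd != 0:
        os.dup2(fd, 0)                # closes the old stdin, duplicates fd onto 0
        os.close(fd)                  # file is now open only through descriptor 0
    os.execv(argv[0], argv)           # same process, new program; never returns


def demo():
    """Fork a child that plays "gen"; return what the exec'd program read."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"waveform bytes\n")
        path = f.name
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: stands in for "gen"
        try:
            os.close(r)
            if w != 1:
                os.dup2(w, 1)         # capture the exec'd program's stdout
                os.close(w)
            # Stand-in for "daemon": copy stdin to stdout.
            exec_with_stdin_from(path, [
                sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read())"])
        finally:
            os._exit(127)             # only reached if the exec failed
    os.close(w)
    with os.fdopen(r, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    os.unlink(path)
    return data


print(demo())
```

The stand-in program reads the file through its standard input without ever knowing the file name, which is the point of the technique.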

Maynard A. Philbrook Jr. 11 years ago

Linux can do "memory mapped files".

Jamie

Habib Bouaziz-Viallet 11 years ago

Two shell scripts can do the job as long as they open the file as r (not rw); the trick is the mechanism for mutual identification between the processes, and also the communication between them.

Never had to do this stuff under GNU/Linux.

Habib.

Don Y 11 years ago

Lots missing here (though if it was posted elsewhere and I missed it, my apologies).

I assume the FIFOs are in *hardware*. I.e., somewhere not present in a typical PC (on a card that you've built, etc.). Are the FIFO's always clocked at the same rate? I.e., how much *time* does 32KS translate into? Can any number of channels be operative? I.e., what's the worst case bandwidth that you need to keep *each* FIFO from starving -- given the maximum sample rate and number of *other* FIFO's that need to be serviced (because that determines how quickly you can get around to REservicing *this* one, etc.)?

Any length limitations on the file?

Any special storage media that will *source* those files? Or, just a generic COTS magnetic disk?

What happens *when* a FIFO runs dry and there is more data still pending in the file?

You could run "gen" for each of the channels that are currently active. Perhaps give it a command line argument telling it how much to prefill each FIFO (esp if different FIFOs can be clocked at different rates).

When *all* of these have been prefilled, start as many instances of the program needed to keep *filling* the FIFOs (why does this need to be a second program?). The first thing that *it* would do is fseek() to the offset specified in the command line (the same value that gen used to prefill the FIFO). Then, *block* waiting for space to become available in *its* FIFO.

When all of these Program2's have blocked, you can then START the machine. As each FIFO empties, the corresponding Program2 will sense that and push more data into that FIFO.

Of course, Linux makes no RT guarantees (though I think there have been efforts to add *some* RT capabilities -- but, I'm pretty sure those are not guarantees that the kernel makes itself. I.e., it wasn't designed from the ground up to think about determinism in *every* action it undertakes).

So, your only hope at this is overprovisioning and hoping the user doesn't ask the machine to do something that causes it to block for "too long" (e.g., encountering a bad block on *a* disk -- even if it is not in the file YOU are using! Or, noticing a USB insert/remove event, etc.).

If forced to use a non-realtime OS for something like this, I'd opt to allocate large buffers in each Program2's process space so they can "read ahead" as much as possible. Have Program2 consist of a thread that fills the FIFO (from this memory buffer -- a "FIFO extension", so to speak) while another thread watches the memory buffer and tries to top it off from the "mass storage" medium.
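The two-thread "FIFO extension" can be sketched roughly like this. It is a Python illustration only: `write_fifo` is a hypothetical callback standing in for whatever pushes data into the hardware FIFO, and the chunk size and buffer depth are arbitrary.

```python
import io
import queue
import threading

CHUNK = 4096   # bytes per transfer; an arbitrary illustrative size


def reader(src, buf):
    """Top off the memory buffer from mass storage; blocks when it is full."""
    while True:
        chunk = src.read(CHUNK)
        buf.put(chunk)            # blocks while the read-ahead buffer is full
        if not chunk:             # an empty chunk marks end of file
            return


def feeder(buf, write_fifo):
    """Move data from the memory buffer into the (hypothetical) FIFO writer."""
    while True:
        chunk = buf.get()         # blocks while the buffer is empty
        if not chunk:
            return
        write_fifo(chunk)


def play(src, write_fifo, depth=64):
    """Run both threads until the source is exhausted."""
    buf = queue.Queue(maxsize=depth)      # the in-memory "FIFO extension"
    threads = [
        threading.Thread(target=reader, args=(src, buf)),
        threading.Thread(target=feeder, args=(buf, write_fifo)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


received = bytearray()
play(io.BytesIO(bytes(range(256)) * 100), received.extend)
print(len(received))
```

A stall in the storage read only drains the queue; the feeder keeps running until the queue is actually empty, which is what buys the extra margin.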

Ideally, wire down these buffers so the OS won't swap them out (which loses any advantage that you would have had from them!). Then, renice your apps to run at elevated priority.

Or, use a *real* RTOS and do the math up front to ensure it *never* "runs dry" (by *design*, instead of "wishful thinking").

David or George may be able to comment on Linux *specifics* (I don't run Linux)

If that daemon does blocking reads (for the 8 files that it is accessing), then if one read stalls, all other reads that it was GOING to do will also be postponed. I.e., imagine files located on different media or scheduled differently by the disk subsystem.

Single best piece of advice I can give is: implement some mechanism that lets your code know when it has missed a deadline (i.e., when a FIFO has "run dry"). How you handle this event (a glitch in the data? abort the process? etc.) is a secondary issue. But, if a customer complains that the data coming out "looked all wrong", that flag can be a lifesaver in determining where your problem lies!

It also is the reason I suggested you provide command line arguments telling each "Program" how much to preload, how many channels to run, etc. If the flag ever sets, you can tell the customer to retry the experiment with different preload values, different number of channels, etc.

[Of course, you can also tell him to stop any *other* activities -- including anything happening on the network or with other daemons/services running on the machine -- that may be stealing real-time from your application!]

You might also try comp.arch.embedded for any folks over there who are doing similar things.

You should probably characterize the SD cards you expect to be used. Note that performance can change from "access to access" as the internal controller encounters bad blocks and has to attempt recovery (an access occurring at this time can be delayed, potentially causing you trouble if you had *expected* an access to take a particular amount of time).

Don Y 11 years ago

You do understand that you could have "gen" do all of the work for *one* channel. Then, start *8* copies of gen -- each with a different fileN.dat argument (and channel number). A single supervisory program (that starts them -- perhaps just forking itself -- and then WAITING for ALL of them to signal their readiness to continue) then tells them *all* when to "begin".
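The "wait until everyone is ready, then release them all" handshake might look like the sketch below. To keep it short it uses Python threads rather than separate processes, and the prefill step is only a placeholder; all names are illustrative.

```python
import threading


def channel(name, ready, go, log):
    """One per-channel worker: prefill, report ready, wait for "begin"."""
    log.append(("prefilled", name))   # placeholder for loading this channel's FIFO
    ready.wait()                      # signal readiness and wait for the others
    go.wait()                         # block until the supervisor says "begin"
    log.append(("running", name))


def supervise(names):
    """Start one worker per channel and release them all together."""
    log = []
    ready = threading.Barrier(len(names) + 1)   # every channel plus the supervisor
    go = threading.Event()
    workers = [threading.Thread(target=channel, args=(n, ready, go, log))
               for n in names]
    for w in workers:
        w.start()
    ready.wait()                      # returns only once ALL channels have prefilled
    log.append(("go", None))
    go.set()                          # tell them all to begin
    for w in workers:
        w.join()
    return log


events = supervise("ABCDEFGH")
print(events.index(("go", None)))     # 8
```

With real processes the same shape could be built from pipes or signals; the barrier just makes the "all ready before any start" rule explicit.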

Might also be prudent to ensure the blocks in each file.dat are physically contiguous so the OS isn't thrashing around on the medium to find the "next" block in each file...

Lasse Langwadt Christensen 11 years ago

I'd just make one program with a main task that implements a command interface and otherwise sleeps, and that forks a second task to keep all the buffers filled once the main task tells it where to get data from and when to start. Use an interrupt on buffers half empty to drive it.

The standard uio driver does the heavy lifting of getting interrupts to userland via a read() that blocks until an interrupt occurs.
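A blocking-read interrupt loop in that style could be sketched as below (Python for brevity). The descriptor would come from opening a UIO device node such as /dev/uio0, which is an assumption about the setup; `refill` is a hypothetical callback. Some UIO drivers also expect a write to the device to re-arm the interrupt, and that step is omitted here.

```python
import os
import struct


def wait_for_interrupt(fd):
    """Block until the next interrupt; return the driver's interrupt count."""
    data = os.read(fd, 4)             # UIO hands back one 32-bit count per read
    if len(data) != 4:
        raise OSError("short read from UIO device")
    return struct.unpack("=i", data)[0]


def service_loop(fd, refill, max_events):
    """Call refill() once per interrupt, e.g. on each "FIFO half empty" event."""
    for _ in range(max_events):
        wait_for_interrupt(fd)
        refill()
```

The refill task spends its idle time asleep inside read(), so it costs nothing until the hardware asks for data.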

-Lasse

Don Y 11 years ago

If I was building a *device* to do this, there would be N instances of the "per channel" task/thread/process under the control of some "coordinator" (ensures all are "ready" before allowing any of them to "start", knows how to pause/resume them, ensure they shut down properly, etc.). I.e., there doesn't appear to be any difference between channels so why not have N instances of the *same* algorithm running? Also makes scaling easier.

OTOH, none of those instances should be "more equal" than the others. Hence the need for something to coordinate among them.

UI is a separate issue entirely -- how to implement that would depend on what the UX is supposed to be like...

John Larkin 11 years ago

Yes. Both ARM cores and the FPGA are all in the same chip, a ZYNQ SoC thing. We are designing the FIFOs and the signal processing ourselves. An ARM sees the head of the FIFO as a 64 Kbyte dummy address space, so it can request Linux file reads directly into the FIFO.

No, each has a DDS clock generator to set the FIFO unload rate. 500 KHz max, 1 mHz resolution.

At 500K, it would take 65 millisec to empty one full FIFO. So occasional Linux overhead timeouts, a couple millisecs now and then, might not be fatal.

All 8 running flat out, we'd have an aggregate rate of 8 Mbytes/sec. The Class 10 SD card, which stores the files, seems to peak at 10 or 12 Mbytes/sec on file reads, so it's dicey. Maybe we can use a U3 class card; we're not sure it will run any faster, but we'll try that soon.
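The numbers are quick to reproduce. A small Python check, assuming the 32K-sample, 16-bit FIFOs and the 500 kHz maximum unload rate mentioned in this thread; the 10 Mbyte/sec card figure is the low end of the range quoted above.

```python
FIFO_SAMPLES = 32 * 1024      # 32K samples per channel FIFO
BYTES_PER_SAMPLE = 2          # 16-bit samples
MAX_RATE_SPS = 500_000        # 500 kHz maximum unload rate
CHANNELS = 8

# Time for one full FIFO to run dry at the maximum rate.
drain_time_ms = 1000 * FIFO_SAMPLES / MAX_RATE_SPS

# Sustained read rate needed with every channel flat out.
aggregate_bytes_per_s = CHANNELS * MAX_RATE_SPS * BYTES_PER_SAMPLE


def margin(card_bytes_per_s):
    """Ratio of card read rate to the required aggregate rate."""
    return card_bytes_per_s / aggregate_bytes_per_s


print(drain_time_ms)            # 65.536
print(aggregate_bytes_per_s)    # 8000000
print(margin(10e6))             # 1.25
```

A margin of 1.25 leaves little room for stalls, which is consistent with calling it dicey.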

Well, we have a 128G SD card. Some waveforms might run for days.

Class 10 micro SD card. It plugs into our microZed SBC board, which plugs into our application board. It will look a lot like this:

formatting link

but more parts.

That's bad. The FIFO hardware will freeze the waveform until more file data arrives, and set an error flag/red LED for the user. Any channel-channel time correlation is clearly lunched. If we can't guarantee enough bandwidth, we resort to user policy: don't exceed some total sample rate on all 8 channels.

gen is a command-line program. Users run it and expect a prompt. There are lots of other command-line programs he can run, too.

gen sets up playback sessions; the file demon runs them.

Sort of. Actually, gen blocks until the demon pre-fills the requested FIFO. Eventually, the user says GO and we turn things loose.

That's the bummer. There's really no data available on the realtime performance of this hardware+Linux, or how fast a C program can bang FPGA registers, or any numbers about file transfer rates. We have to instrument and measure all that ourselves.

With the ZYNQ 7020 chip, I can allocate 32K 16-bit ram to each of the 8 wavegen channels. That leaves a little leftover RAM for various DSP functions.

Swap? We don't swap!

Yes. We have an MT flag in the FIFO hardware, plus we can snoop the FREE registers, to see how close to dry we are running. We can oscilloscope probe interesting things, too, like FIFO accesses.

Yup, big red LED, too.

John Larkin Highland Technology, Inc picosecond timing precision measurement jlarkin att highlandtechnology dott com http://www.highlandtechnology.com

John Larkin 11 years ago

No. It's a user command-line program that has to terminate with a prompt or an error message.

gen -c b -mode play                      # set up wavegen B
gen -c b -file waveform3.dat
gen -c b -freq 500e3 -filt spl -mul x
gen -c d ....                            # set up wavegen D...
...                                      # more stuff

gen -c bd -go                            # enable B and D
strobe                                   # start outputs

Can Linux create contiguous files? My guys are not sure.

John Larkin Highland Technology, Inc picosecond timing precision measurement jlarkin att highlandtechnology dott com http://www.highlandtechnology.com

Lasse Langwadt Christensen 11 years ago

gen could still just send messages to a program that handles all the setup and forks a child for the fast fifo stuffing

it doesn't get slower or faster but it means it is a single program and everything can be shared

don't think it matters much, flash is random access

-Lasse

mroberds 11 years ago

Aha. From your other examples, the stuff I thought would be in a header is just command-line arguments to gen. There isn't really any structure in the data file to parse, so forget about my idea of putting a parser in both gen and the daemon.

Pass the file name and the offset from gen to the daemon, then. It just isn't that expensive for the daemon to do an open()/lseek(). If you're paranoid, maybe pass the modification time of the file from gen to the daemon as well; the daemon can do a stat() to see if the file has changed.
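On the daemon side that could be as small as the following Python sketch. `open_at` and its arguments are illustrative names; it uses fstat() on the already-open descriptor instead of a separate stat() call so the check and the read refer to the same file.

```python
import os
import tempfile


def open_at(path, offset, expected_mtime_ns=None):
    """Open path read-only, positioned at offset; optionally verify the mtime."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if expected_mtime_ns is not None:
            if os.fstat(fd).st_mtime_ns != expected_mtime_ns:
                raise RuntimeError("file changed since gen prefilled the FIFO")
        os.lseek(fd, offset, os.SEEK_SET)
    except Exception:
        os.close(fd)
        raise
    return fd


# Demo: gen would pass (name, offset, mtime); the daemon resumes from there.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789")
stamp = os.stat(f.name).st_mtime_ns
fd = open_at(f.name, 4, stamp)
print(os.read(fd, 3))       # b'456'
os.close(fd)
os.unlink(f.name)
```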

Also, as I understand it, gen is going to initially fill the FIFO, and then the daemon somehow gets a "start from here in the file" marker, but the daemon doesn't have to start filling the FIFO right then - that happens later when the user calls "strobe". In that case, the daemon has all the time in the world to seek to the right offset in the file, *before* the tight timing requirements start.

If I understand it right, the "seek to offset X" only has to happen on the very first pass through the data file. Once the daemon hits the end of the data file, it starts reading again from offset 0 in the file. In other words, that seek to a weird offset only happens at setup time.
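If that understanding is right, the daemon's read loop only needs one rewind rule. A Python sketch under that assumption (the function name and chunk size are made up):

```python
import itertools
import os
import tempfile


def looped_chunks(fd, chunk_size):
    """Yield data from fd forever, rewinding to offset 0 at end of file."""
    while True:
        data = os.read(fd, chunk_size)
        if not data:                          # hit end of file
            os.lseek(fd, 0, os.SEEK_SET)      # start again from offset 0
            data = os.read(fd, chunk_size)
            if not data:
                raise ValueError("waveform file is empty")
        yield data


# Demo: start at offset 7 (where the prefill stopped), then wrap around.
with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    os.lseek(f.fileno(), 7, os.SEEK_SET)
    print(list(itertools.islice(looped_chunks(f.fileno(), 4), 5)))
    # [b'789', b'0123', b'4567', b'89', b'0123']
```

The odd starting offset shows up only in the first chunk; every later pass begins at 0.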

Fair enough.

If you have the disk space, you might want to ship some sinewave or squarewave data files, in some read-only directory. Maybe also a test script that uses those files to light up all the outputs at 1 MHz, or 1, 2, 3, 4, 5 MHz, etc.

Matt Roberds

Mark Zenier 11 years ago

Check out "Named Pipes" for the input to the daemon.

man 4 fifo
mkfifo Name
mknod Name p
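A named pipe carrying one request from gen to the daemon might look like this Python sketch. The pipe name and the request text are hypothetical, and the two ends run as threads of one process only so the demo is self-contained.

```python
import os
import tempfile
import threading


def send_request(fifo_path, message):
    """What gen might do: open the pipe for writing and send one line."""
    with open(fifo_path, "w") as pipe:    # blocks until a reader has it open
        pipe.write(message + "\n")


def receive_request(fifo_path):
    """What the daemon might do: block until a request line arrives."""
    with open(fifo_path, "r") as pipe:    # blocks until a writer opens it
        return pipe.readline().rstrip("\n")


def demo():
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "gen_requests")   # hypothetical name
    os.mkfifo(fifo_path)                                # same as `mkfifo Name`
    sender = threading.Thread(
        target=send_request,
        args=(fifo_path, "channel=b file=waveform3.dat offset=65536"))
    sender.start()
    request = receive_request(fifo_path)
    sender.join()
    os.unlink(fifo_path)
    os.rmdir(workdir)
    return request


print(demo())
```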

Mark Zenier snipped-for-privacy@eskimo.com Googleproofaddress(account:mzenier provider:eskimo domain:com)

Ralph Barone 11 years ago

Each invocation of gen -c starts up a copy of your daemon, which fills its allocated buffer from the start of the file, and then blocks, because the buffer is full.

Your Go command then starts the hardware sucking away at the buffers, which get refilled in the background by your multiplicitous daemons. Where's the problem?

Martin Riddle 11 years ago

Here's some info on SD card write times... You could extrapolate the read times from it. The Extreme cards are well over 10 MB/s.

Cheers

Les Cargill 11 years ago

Have two programs - call them a and b.

a reads your little bit of the file, then dumps the rest to stdout.

Link them together with

% ./a | ./b

It's not a problem, but it's a bit complicated for user space programs. See also "System V shared memory".

Les Cargill

Don Y 11 years ago

# gen -verbose -c b -mode play
Gen v1.0
Service not found. Installing daemon (pid = 12312)
Forking a copy for Channel B
Channel B initialized.

# gen -verbose -c b -file waveform3.dat
Gen v1.0
Service running (pid = 12312)
Channel B configured to read waveform3.dat
FIFO B loaded.

# gen -verbose -c b -freq 500e3 -filt spl -mul x
Gen v1.0
Service running (pid = 12312)
Channel B configured for 500KHz sample rate, yadayadayada

# gen -verbose -c d ....     # set up wavegen D...
Gen v1.0
Service running (pid = 12312)
Forking a copy for Channel D
Channel D ...

# ....     # more stuff
Gen v1.0
Service running (pid = 12312)

# strobe
Gen v1.0
Service running (pid = 12312)
Error: no channels enabled.

# gen -verbose -c bd -go     # enable B and D
Gen v1.0
Service running (pid = 12312)
Channels B and D enabled

# strobe     # start outputs
Gen v1.0
Service running (pid = 12312)
Machine started (channels BD active)

# kill -HUP 12312 # shut down the service

(or, use whatever special command you want to bring the machine to an ORDERLY shutdown)

Elsewhere, you comment that these are on an SD card. All bets are off. Note that the "data" can actually MOVE AROUND (physically) on the card even if you are not writing to it. I.e., the equivalent of reading a block (hundreds of bytes) of data from a file ON DISK and the disk drive deciding that the portion of the medium that held that piece of the file was flakey -- it had to employ "too much" error correction for "nominal performance" and has elected to take it upon itself to MOVE the data to another portion of the FLASH array.

"Have no fear! I will do this *for* you and, in theory, you will not know it is happening or *has* happened -- except for some occasional DELAYS in accessing the device..."

Depending on your (data) usage patterns, you might elect to interleave the data from every channel so *a* read gives you samples for ALL channels:

sampleA[1] sampleB[1] sampleC[1] sampleD[1] sampleE[1] sampleF[1] sampleG[1] sampleH[1]
sampleA[2] sampleB[2] sampleC[2] sampleD[2] sampleE[2] sampleF[2] sampleG[2] sampleH[2]

This ensures that data for all channels is available at the same rate: any successful read operation yields samples for all instead of JUST the channel that was present in the file.

If, however, each channel operates at a different rate, then this approach would SUCK (because you would be reading data from the card at a constant rate based on the needs of the highest bandwidth channel and having to "do something" with all those "extra" samples that you've been accumulating as a consequence of this interleave).
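For the equal-rate case, the interleaved layout is simple to build and to split again. A Python sketch, assuming 16-bit samples and 8 channels as elsewhere in this thread; the function names are illustrative.

```python
from array import array

CHANNELS = 8


def interleave(per_channel):
    """per_channel: equal-length sample lists, one per channel -> one frame stream."""
    frames = array("h")                  # 16-bit signed samples
    for frame in zip(*per_channel):      # one sample from every channel
        frames.extend(frame)
    return frames


def deinterleave(frames, channels=CHANNELS):
    """Split a frame stream back into one array per channel."""
    return [frames[ch::channels] for ch in range(channels)]


# Channel k holds samples k*100, k*100+1, ...
chans = [[ch * 100 + i for i in range(4)] for ch in range(CHANNELS)]
stream = interleave(chans)
print(list(stream[:CHANNELS]))   # [0, 100, 200, 300, 400, 500, 600, 700]
```

Any single read of a whole number of frames then feeds all eight FIFOs, which is the property described above.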

Why are *you* specifying the design of the code? What do "your guys" have to say about how it should be *structured* (then, YOU can add whatever "spin" you want on the design based on how you want it to *look* to the user)?

upsidedown 11 years ago

How big are your files? Megabytes, Gigabytes or Terabytes?

Do you have a 64 bit processor?

With some combinations of those parameters, you could use memory mapped files.

Just allocate big 16 bit integer tables the size of the files. Map the files into virtual memory. Feeding the DAC is just copying an element from the table at a time i.e. an assignment statement DAC1 = Table1 [ i++ ] . No need for fread etc. The operating system virtual memory management will load the data from the file to the table automatically.

Loading data from disk may take a variable amount of time, so some form of "FIFO" would be required. In a memory mapped file, just make a memory read some hundred kilobytes ahead of the actual data use; this prefetches the data into physical memory so it is ready when the DAC needs it. You only need to reference a single byte in a virtual memory page and the whole virtual memory page is loaded into physical memory.

If the multiple files are on the same physical drive, in order to optimize disk seek times, prefetch a few megabytes sequentially before going to the next file.
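The mapping and the "touch one byte per page" prefetch can be sketched as follows. This is a Python illustration (the C equivalent would use mmap() directly), it assumes native-endian 16-bit samples, and the helper names are made up.

```python
import mmap
import os
import tempfile
from array import array


def map_samples(path):
    """Map a file of 16-bit samples; return (mapping, indexable sample view)."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # 0 = whole file
    return mm, memoryview(mm).cast("h")


def touch_ahead(mm, start_byte, span_bytes):
    """Read one byte per page in the span so those pages get faulted in."""
    end = min(start_byte + span_bytes, len(mm))
    pages = 0
    for offset in range(start_byte, end, mmap.PAGESIZE):
        mm[offset]                  # a single-byte read pulls in the whole page
        pages += 1
    return pages


# Demo: a file of 20000 samples, read back through the mapping.
with tempfile.NamedTemporaryFile(delete=False) as f:
    array("h", range(20000)).tofile(f)
mm, table = map_samples(f.name)
touch_ahead(mm, 0, 200 * 1024)      # prefetch roughly 200 KB ahead
print(table[12345])                 # 12345
table.release()
mm.close()
os.unlink(f.name)
```

Indexing `table[i]` is the Python counterpart of the `Table1 [ i++ ]` assignment: there is no explicit read call, the virtual memory system loads pages on demand.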

upsidedown 11 years ago

Apparently this is a 32 bit processor and the files are large, so you can only allocate a window (hundreds of megabytes) into each file and move around that window along the file. Either use a single window and a FIFO or two windows to feed each DAC. In the two window case, the prefetch can be done into one window and the actual DAC reads from the other window. As soon as the DAC finishes with a window, it is moved ahead in the file and prefetched.

In a previous post, I assumed that a rotating disk is used and hence the need for seek time optimization, but since this is an SD card, that wouldn't be needed. Of course, a good match between processor virtual memory page and disk block size is an advantage.
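Mapping one window at a time could be done along these lines. A Python sketch: `map_window` is an illustrative name, and the window size shown is only a placeholder (it has to be a multiple of the mapping granularity, and on the real system would be far larger).

```python
import mmap

WINDOW = 64 * mmap.ALLOCATIONGRANULARITY   # placeholder size, not a recommendation


def map_window(fileno, file_size, index, window_bytes=WINDOW):
    """Map window number `index` of an open file, read-only."""
    if window_bytes % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("window size must be a multiple of the mapping granularity")
    offset = index * window_bytes
    if offset >= file_size:
        raise ValueError("window starts past end of file")
    length = min(window_bytes, file_size - offset)   # last window may be short
    return mmap.mmap(fileno, length, access=mmap.ACCESS_READ, offset=offset)
```

In the two-window scheme, the player would read from window n while window n+1 is mapped and prefetched, then close window n and move it ahead to n+2.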
