Linux question

But, you have *8* of these running. So, you're really dealing with 8ms, effectively. Are you going to insist the user *dedicate* the machine to your application? Never try to access the SD card while it is running (i.e., delete or preprocess *any* of the files there)? Never start a new desktop session... or serve up web pages from the same box? Insert a "new" USB peripheral? etc.

Big, wired down buffers will allow you to *approach* the card's maximum performance. You should tailor your accesses to the block size of the device so you can get the most out of the controller *in* the card. Perhaps even deferring accesses until you *know* you can get a complete block in the "next access".

If the OS is performing the read/write, remember that it is copying into its local buffers before *you* see the actual data. E.g., if you request 37 bytes, chances are it will read some default amount (a "block", which will probably differ from the "block size" in the NAND FLASH on the card) and give you the first 37 bytes while holding onto the balance in its buffer. Your *next* request will first be satisfied from that buffer before the OS again turns to the physical device for any additional data required to satisfy your request (again buffering any "leftovers").
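To make that concrete, here's a toy sketch (not the OS's actual code; the block size and the fill callback are assumptions for illustration) of how a block-sized cache satisfies small requests while fetching whole blocks from the device:

```c
#include <stddef.h>
#include <string.h>

/* Toy model of the OS's read-side buffering: small requests are served
 * from a block-sized cache, and the "device" is only touched (a whole
 * block at a time) when the cache runs dry.  CACHE_BLOCK and the fill
 * callback are assumptions for illustration. */
#define CACHE_BLOCK 4096u

struct read_cache {
    unsigned char buf[CACHE_BLOCK];
    size_t avail;   /* bytes still held in buf */
    size_t pos;     /* next byte of buf to hand out */
    size_t (*fill)(unsigned char *dst, size_t max, void *ctx);
    void *ctx;
};

size_t cached_read(struct read_cache *c, unsigned char *dst, size_t want)
{
    size_t got = 0;
    while (got < want) {
        if (c->avail == 0) {               /* cache dry: hit the device */
            c->avail = c->fill(c->buf, CACHE_BLOCK, c->ctx);
            c->pos = 0;
            if (c->avail == 0)
                break;                     /* end of data */
        }
        size_t n = want - got;
        if (n > c->avail)
            n = c->avail;
        memcpy(dst + got, c->buf + c->pos, n);
        c->pos += n;
        c->avail -= n;
        got += n;
    }
    return got;                            /* may be short at EOF */
}

/* A toy "device" that serves bytes from a memory image. */
struct mem_dev { const unsigned char *data; size_t len, off; };

size_t mem_fill(unsigned char *dst, size_t max, void *ctx)
{
    struct mem_dev *d = ctx;
    size_t n = d->len - d->off;
    if (n > max)
        n = max;
    memcpy(dst, d->data + d->off, n);
    d->off += n;
    return n;
}
```

The point being: the first 37-byte request costs a full block transfer; the follow-up requests are nearly free until the cache runs dry.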

If it runs for days, you don't have a problem. The problem comes when it wants to run "much quicker" (i.e., higher sample rate)

Ever *expect* to "enhance" the product by allowing files to be sourced from "magnetic (or SS) disk"? I.e., if so, you should reflect those future changes in your initial design.

Well, a freeze on *one* channel could cause the others to intentionally similarly freeze. Depends on how you *want* it to perform (i.e., maintain lock sync between all channels REGARDLESS... *or*, let each channel run however it can!)

Note that the possibility of different channels (and combinations of channels) "starving" can occur.

I'm not convinced that you can even make that guarantee. Your environment (OS) gives you none.

Think of it this way: you've got a BJT that can handle Icc of 20A. *YOU* only need to accommodate a 15A load. But, someone *else* is pulling 0-20A *while* you are trying to control *your* load.

Granted, a protection circuit can keep the Q from cooking itself when you *two* guys aren't cooperating with each other. But, just because you were able to draw X amps *now* (before the clamp kicked in), doesn't mean you *will* be able to draw those X amps in the future! That "someone else" could elect to pull 19.8A at that unfortunate time... [The "someone else" may not even have direct control over how much he is pulling; his *load* may be dynamic and "decide for itself" that it needs to do something different, *now*, that results in greater power consumption]

An RTOS makes guarantees to its "users": "This operation will happen in *this* manner. COUNT on it!" An MTOS (Windows, Linux, etc.) just says: "I'll try my best to make this happen as 'good' as possible" (for *some* definition of "good")

My "network speakers" are essentially doing the same thing that you are -- except over much larger physical distances. Pulling "samples" off the network and reproducing them (audio) in sychronization with each other -- despite the large distances involved. (imagine running your hardware on ten different PC's and expecting them all to remain in lock sync with each other in different offices!)

I can do this because my RTOS lets me *know*, a priori, what level of performance to *expect* from it. It *will* deliver data at the required rate -- unless there is a failure in the network fabric, "noise" on the line (corrupting packets such that the retries don't complete in the required amount of time), etc.

[i.e., your SD card is my "file server PLUS network transport"]

As I have local intelligence on the receiving end of the link, when any of those clients (speakers) see that data is just not arriving "in time", they can shut down the audio cleanly (you definitely don't want the audio to sputter and pop as data trickles in sporadically while the system recovers).

[In my case, when the system recovers, you probably want the "playback" to resume. If "recorded content", then you can pick up from where you left off -- or, rewind a bit for some sense of continuity (remember it may not be *music* content so resuming in the middle of a spoken *word* is probably not as desirable as rewinding several seconds so the listener can recall what *was* being said leading up to the interruption. In *live* content, what's past is probably "past" (though even that can be negotiated)]

Gen can *be* the "file daemon" as well.

I.e., if you were building a single channel device, you could envision something like:

# gen file.dat 500KHz gobbledygook
Gen v1.0
Done!

to configure that one channel and start it.

Or, one command to configure and another to start. Or:

# gen file.dat 500KHz gobbledygook
Gen v1.0
Configured for 500KHz. Loading file.dat; please wait...
(Jeopardy theme song plays)
File found. Press ENTER to begin synthesis...

Whether your configuration stuff results in actual tweaks to the hardware *as* each command is executed (i.e., specifying the sample rate causes gen to *immediately* tweak a PLL/divider) *or* whether it causes those requirements to be *noted* somewhere (e.g., in a temp file) and *imposed* on the hardware the instant you type "go" is an implementation issue. The UX/UI doesn't really change. [E.g., a user might specify a file name and defer specification of the sample rate until a later time. Perhaps you only preload the FIFO with 10K samples if the sample rate is 50KHz instead of 500K? So, that action can be deferred until it absolutely *must* happen.]
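A minimal sketch of that "note the requirements now, impose them at go" approach. The register addresses, divider math, and register-writer callback here are invented for illustration, not the real device:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of deferred configuration: each command records intent in a
 * pending-config struct, and only cmd_go() touches the hardware.  The
 * register addresses and divider math are invented for illustration. */
#define REG_PLL_DIV 0x10u   /* hypothetical divider register */
#define REG_GO      0x1cu   /* hypothetical start bit */

typedef void (*reg_writer)(uint32_t reg, uint32_t val);

struct pending_cfg {
    uint32_t sample_rate_hz;
    bool rate_set;
};

/* commands only record what was asked for... */
void cmd_set_rate(struct pending_cfg *c, uint32_t hz)
{
    c->sample_rate_hz = hz;
    c->rate_set = true;
}

/* ...and "go" imposes the whole configuration at once */
int cmd_go(struct pending_cfg *c, reg_writer wr, uint32_t master_clk_hz)
{
    if (!c->rate_set)
        return -1;   /* configuration incomplete; refuse to start */
    wr(REG_PLL_DIV, master_clk_hz / c->sample_rate_hz);
    wr(REG_GO, 1);
    return 0;
}

/* a stand-in writer that just logs, instead of touching hardware */
#define MAX_WRITES 8
static uint32_t wlog_reg[MAX_WRITES], wlog_val[MAX_WRITES];
static int wlog_n;

static void log_write(uint32_t reg, uint32_t val)
{
    wlog_reg[wlog_n] = reg;
    wlog_val[wlog_n] = val;
    wlog_n++;
}
```

Either way the user types the same commands; only *when* the PLL/divider actually gets written changes.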

But gen can sit there AFTER it has filled the FIFO and wait for the "command" to "go".

[Note that gen can detach itself from the controlling terminal and run in the background *as* a daemon. So, the user types "gen" and gets a command prompt *immediately* -- or, after gen has preloaded the FIFO. You have to get used to thinking about more than "sequential commands" as the machine can "keep something running" even while it is allowing you to specify *other* actions.]

Welcome to *my* world! :>

Yup. But, you have to instrument in all of the cases that you expect the system to operate!

E.g., how does performance change if the SD card encounters an error and has to remap a bad block? Or, if the user elects to defragment his hard disk while running your board? Or, starts an OpenOffice session so he can type up his observations on the "experiment" that your board is running? Or, something starts hammering on his network interface while your "app" is trying to keep up with the FIFOs? Or...

I.e., deciding to pull a bit of Icc while *you* are expecting to have a certain GUARANTEED level of collector current available for your load...

My point is to "preread" data from the SD card to further decouple the SD card's performance -- and the OS which acts as your intermediary -- from the REQUIREMENTS of your hardware (FIFO size + sample rate).

I.e., imagine you could read that entire 128GB card into "RAM" in the PC *before* the user types "GO". Now, the speed of the SD card is not material to the REAL-TIME performance of your device. (it may ANNOY the user if he has to wait an hour for all that data to transfer but that's a separate issue).

By buffering data *in* the PC's memory, you enhance your operating margin wrt the SD card + OS performance. You've taken one piece of variability out of the equation (i.e., you've already got the data *off* the card... or, at least, have EFFECTIVELY enhanced the size of that 32KS FIFO by another 10K, 100K, 1MB, etc. -- whatever you can afford to set aside *in* the PC's memory space).
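One way to "effectively enhance" the FIFO, sketched as a simple ring buffer in host RAM that a prefetch thread would keep full from the SD card while the playback path drains it toward the hardware. The size is illustrative (it must be a power of two for the index arithmetic below), and a real single-producer/single-consumer ring would use atomics for head and tail:

```c
#include <stddef.h>
#include <string.h>

/* A ring buffer in host RAM acting as extra FIFO depth.  Power-of-two
 * size makes the unsigned modulo arithmetic on the indices valid. */
#define RING_SIZE (1u << 20)   /* 1 MiB of extra buffering */

struct ring {
    unsigned char buf[RING_SIZE];
    size_t head;   /* next position to write (producer) */
    size_t tail;   /* next position to read (consumer) */
};

size_t ring_used(const struct ring *r)
{
    return (r->head - r->tail) % RING_SIZE;
}

size_t ring_free(const struct ring *r)
{
    return RING_SIZE - 1 - ring_used(r);   /* one slot kept empty */
}

/* producer side: called with data pre-read from the SD card */
size_t ring_put(struct ring *r, const unsigned char *src, size_t n)
{
    if (n > ring_free(r))
        n = ring_free(r);
    for (size_t i = 0; i < n; i++) {       /* a real ring would memcpy */
        r->buf[r->head] = src[i];          /* in up-to-two chunks      */
        r->head = (r->head + 1) % RING_SIZE;
    }
    return n;
}

/* consumer side: drained toward the hardware FIFO */
size_t ring_get(struct ring *r, unsigned char *dst, size_t n)
{
    if (n > ring_used(r))
        n = ring_used(r);
    for (size_t i = 0; i < n; i++) {
        dst[i] = r->buf[r->tail];
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    return n;
}
```

Wire the buffer down (mlockall() or equivalent) and the OS's swapping decisions stop being part of your timing budget, too.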

*You* don't have an option. The OS decides when/if to swap (unless you literally eliminate the swap partition so there is no place *to* swap).

You want this to be VERY VISIBLE to the user! I.e., if he complains that the output waveform had "severe distortion... as if it had STALLED at points during playback", *he* wants to have SEEN a message on the console saying "buffer overrun on channel X" (whether he sees that while playback is occurring *or* after it has completed). I.e., you want his call to be "why did I get this message?" instead of "why did the output have so much distortion?". This saves you valuable steps in sorting out the problem.

"Is the big red LED blinking?" "Um, I didn't notice." "Can you rerun the experiment?" "Yeah, I just did. Now it SEEMS to be working..."

(because you have no performance guarantees, you also have no NONperformance guarantees! I.e., it can fail now... and run correctly for the next 100 invocations. Then, suddenly start choking, again. Users can't reliably tell you what is happening *inside* the OS, applications, SD card's controller, etc. You don't want to give them an excuse to blame your hardware or *their* choice of OS!)

A friend has offered me an older Tek Logic Analyzer. Windows 98 based (!). You can bet the design either prevents the user from installing "foreign" (yet 100% valid) W98 applications, as they could interfere with the performance of the instrument. *Or*, the design takes into consideration the lack of guarantees from the OS and operates *independent* of it!

Reply to
Don Y

Excellent idea! I would further add samples (or a way to specify skews from the command line) that allow the user to verify the timing relationships *between* channels. I.e.,

"type the following commands. display and sync scope on the channel A output and examine channel B. You should see a square wave offset by 180 degrees from that of channel A (or whatever)"

Reply to
Don Y

We call the FIFO mode "Playback." We do have another operating mode, "Wavetable." In Wavetable mode, the channel RAM is used as a real RAM, not a FIFO, and the wave gen makes repetitive waveforms by looping through an arbitrary region of RAM. In that mode, the user can specify a phase shift and change it on the fly.

That is a separately enumerated bag-o-worms. It's my job to define them both, invent the command line interfaces, and explain them both.

--

John Larkin         Highland Technology, Inc 
picosecond timing   laser drivers and controllers 

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com
Reply to
John Larkin

Yes. But Matt's suggestion above allows you to provide the user with something akin to a scope's "calibrator" (check probe compensation) to quickly/easily verify the operation of your software *and* hardware by providing a known (set of) signals to the output(s) -- in "playback" mode. Something similar could also be pushed into the RAM to test "wavetable" mode.

I use a similar approach to verify the synchronization of the local oscillators between my network speakers. E.g., turn off the amplifiers in the nodes to be tested (so you don't *hear* the test signals and/or risk damaging the attached speakers) then push the test signals to each node and verify the proper phase (time) relationship between them.

[Of course, this requires the nodes to be physically proximate for the test (unless you've got 100' scope leads! :> )]
Reply to
Don Y

Last box like this that we did, we had some demo waveform files and scripts that ran demos. The user just has to type demo3

That's easy. We'll do that, and put some scope pics in the manual to show them what to expect.

--

John Larkin         Highland Technology, Inc 
picosecond timing   precision measurement  

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com
Reply to
John Larkin

My prime user is happy to have a slowish setup time; then he wants a playback session to start and run right. 500 Ksamples/sec is unusual; he generally acquires data slower. I shouldn't have trouble explaining the limitations.

We have done some experimenting with file transfer rate vs block size, and there seemed to be no speed advantage above 1KB per read. We will do more optimization.

We'll always transfer some fixed block size, 4K maybe, into the FIFO. It won't pass through DRAM; we'll request the file read directly into the FPGA. I don't know if Linux will do any caching or stuff that we can't see; I hope not.

Sure, but we can test it and see the real-world limits. And give the user some policies, like don't do file activity and expect max waveform output.

No. It's a command-line program that terminates.

We'll double-buffer most things in the FPGA, and actually install things like frequencies and "go" bits when the user says STROBE. That lets us start and change multi-channel things coherently.

File opens are asynchronous; they're just setup-time stuff.

No, gen does its thing and terminates. The user will invoke it again soon, or will invoke other commands. He'll wait for a command prompt (or an error response) before sending another command.

We have 512M of DRAM on the microZed.

Reply to
John Larkin

Nice, if confusing. We don't know if the controller in the ZYNQ can talk fast to the I and U type cards; we'll find out.

Reply to
John Larkin

I'm leaning towards flag words in shared memory. Zero prop delay.
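A minimal sketch of such flag words, assuming C11 atomics; the flag names are invented, and in the real system the word would live in memory visible to both the ARM side and the FPGA fabric:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-channel flag words polled in shared memory: no syscall, no IPC,
 * just an atomic read-modify-write.  Flag names are illustrative. */
enum {
    CH_FLAG_GO       = 1u << 0,
    CH_FLAG_UNDERRUN = 1u << 1
};

typedef _Atomic uint32_t chan_flags;

void flag_set(chan_flags *f, uint32_t bits)
{
    atomic_fetch_or_explicit(f, bits, memory_order_release);
}

/* read-and-clear in one atomic op, so an event can't be lost between
 * testing the flag and clearing it */
bool flag_test_clear(chan_flags *f, uint32_t bits)
{
    uint32_t old = atomic_fetch_and_explicit(f, ~bits, memory_order_acquire);
    return (old & bits) != 0;
}
```

The fetch-and-clear keeps a "happened once, briefly" event (like an underrun) from vanishing if the poller is slow.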

Reply to
John Larkin

Yeah, that's part of what I was thinking - they can get some known outputs fast. The other part was related to the user having to convert the waveforms they want to use to the binary format, at least for Playback mode. If you ship a test file that is a 1 MHz sine wave, the user can look at that file and instantly tell whether you want signed/unsigned, twos-complement/sign-magnitude, big-endian/little-endian, etc. Yes, all that stuff will be in the documentation, but nobody ever reads that. They'll just try to create a file and then get mad when it doesn't work; having a working example can help fix that situation.

Wavetable mode probably needs something similar. Maybe it can read from the same test file, just reading the file into RAM once, and then letting the wave generator loop over that part of RAM.

Matt Roberds

Reply to
mroberds

On Friday, 23 January 2015 at 04:53:35 UTC+1, John Larkin wrote:

snip

A MicroZed has 1GB RAM

-Lasse

Reply to
Lasse Langwadt Christensen

Oh well, it's still not enough to run the OS and store the waveforms.

Software caching the waveforms through DRAM would probably slow things down.

Reply to
John Larkin

Use a low latency kernel.

I would use a pared down minimalist version of RedHawk.

And did you look over that YouTube link I provided some days ago (nobody ever responded)? That guy and a few others, back in the days that keyboard was popular, were finding ways of PIPING data to it all the time. Hell, you could use such a keyboard as part of your gearset for this application and have incremental status updates pasted onto its display, and even use it to start and stop the entire process. Or use the G19 version for a full VGA resolution color screen version. Maybe harder to hack though.

Reply to
DecadentLinuxUserNumeroUno

Running 0.5 Msample/s with 2 bytes/sample and 8 channels is 8 MB/s, so even with 512 MiB that would allow storage for about a minute, which should be more than adequate as a prefetch buffer.
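Spelling out that arithmetic (numbers from the post):

```c
#include <stdint.h>

/* How many whole seconds of playback a given amount of host RAM can
 * prefetch: 500 ksample/s x 2 bytes x 8 channels = 8,000,000 bytes/s,
 * so 512 MiB buys about 67 seconds. */
uint64_t prefetch_seconds(uint64_t ram_bytes,
                          uint64_t samples_per_sec,
                          uint64_t bytes_per_sample,
                          uint64_t channels)
{
    uint64_t bytes_per_sec = samples_per_sec * bytes_per_sample * channels;
    return ram_bytes / bytes_per_sec;
}
```

With the numbers above, prefetch_seconds(512ULL << 20, 500000, 2, 8) comes out to 67 seconds, i.e. roughly the minute quoted.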

Are you using a single SD for both OS bootstrap and data files, or do you have a separate fixed SD for the OS or a replaceable SD for data?

Reply to
upsidedown

If we need to run very long files, and we can't keep up with the throughput requirements, adding more buffering just postpones the inevitable FIFO error. The 32K sample FIFOs in the FPGA are probably big enough.

Everything is in the SD card. When we're playing waveforms at high rates, we'll have to discourage the user from causing much extra file activity.

We just figured that we can shift the FIFO output forward and backwards in time, to tweak channel-channel timing alignment, without causing the universe to implode.

Reply to
John Larkin

If not, allocating a few megabytes of memory and accessing that from the FPGA isn't that complicated.

I use it to stream a framebuffer to a SPI lcd display,

formatting link

5 pins and a bit of RTL to get a display with capacitive touch that works with standard Linux graphics.

You could try to use a USB stick for the waveforms; USB sticks work out of the box on the Zynq and it might even be faster.

-Lasse

Reply to
Lasse Langwadt Christensen

Maybe you could arrange it so that /home (or whichever directory the user usually dumps files in) gets remounted read-only when high-rate playback is happening. This doesn't help if the user causes a *read* of a bunch of data, but maybe it will slow them down a little.

Matt Roberds

Reply to
mroberds


I just tried it; seems SD/USB is about the same read speed ...

root@linaro-alip:~# hdparm -t /dev/mmcblk0p2 /dev/sda

/dev/mmcblk0p2:
 Timing buffered disk reads: 56 MB in 3.10 seconds = 18.04 MB/sec

/dev/sda:
 Timing buffered disk reads: 56 MB in 3.00 seconds = 18.65 MB/sec
root@linaro-alip:~#

... the interesting part is that it can run the test on both devices at the same time and get the same performance

-Lasse

Reply to
Lasse Langwadt Christensen

The standard port from the ARM to the ZYNQ FPGA fabric is relatively slow. Using a "high performance" port would require us to build a DMA controller into the FPGA and write a corresponding driver for the Linux side. There are advocates for that, but I sure prefer to not go there.

If we use DMA, the file daemon would have to read waveform data into DRAM, and then the FPGA DMA things would have to hoover it up. It might be faster to just do the file transfers directly into the FPGA, which is the current plan.

Interesting. It would stick out of the box.

Reply to
John Larkin

Cool. What type of SD card was that? We got 10-12 MB/sec with a SanDisk class 10 card. I don't know if the ZYNQ hardware can talk to the U3 sorts of cards in their fast modes.

Reply to
John Larkin


SanDisk Ultra, class 10, rated for 48MB/sec; I believe the Zynq is limited to 25MB/sec.

-Lasse

Reply to
Lasse Langwadt Christensen
