High USB throughput requirement

Yes. It takes some work to get the Dxxx drivers working. In the long run it can be worth the effort. I've used them for a number of apps -- but usually at rates well under 1 MB/sec, primarily due to limitations on the embedded systems end.

True Dat! I brought up that option without enough thought. If you don't have a big budget and a relaxed timeline, stay away from writing specialized Windows drivers.

Agreed. At the embedded systems end, a few protos and tests are common, but not terribly expensive. At the Windows end, a few prototype host programs are common and not that expensive. Writing custom drivers does get you into another, and more expensive, arena.

Mark Borgerson

Reply to
Mark Borgerson

100 MiB is about the largest contiguous buffer you can expect to allocate from the virtual address space on 32-bit Windows. The 2 GiB user address space is fragmented by various DLLs loaded at "nice" address boundaries, and in some cases the largest contiguous virtual memory areas are on the order of 100 MiB.

If you need more buffer space, you would have to allocate several separate buffers by several malloc() calls, each being less than 100 MiB.
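For illustration, a minimal C sketch of that approach; the chunk size and count here are arbitrary examples, not tuned values:

/* Hedged sketch: split one large logical buffer into several smaller
 * malloc() allocations so no single request needs a huge contiguous
 * stretch of 32-bit virtual address space. */
#include <stdlib.h>
#include <stddef.h>

#define CHUNK_BYTES (64u * 1024u * 1024u)   /* 64 MiB per piece (example) */
#define NUM_CHUNKS  8                       /* ~512 MiB total (example)   */

static char *chunks[NUM_CHUNKS];

static int alloc_chunks(void)
{
    for (size_t i = 0; i < NUM_CHUNKS; i++) {
        chunks[i] = malloc(CHUNK_BYTES);
        if (chunks[i] == NULL) {            /* back out on failure */
            while (i-- > 0)
                free(chunks[i]);
            return -1;
        }
    }
    return 0;
}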

Reply to
upsidedown

If by 10M you are referring to 10base2/5/T, then that cannot handle the OP's requirement of 4 MB/s = 32 Mbit/s (net) transfer.

A dedicated 100baseT cable with a dedicated Ethernet card on the PC, using raw Ethernet frames (or UDP), should be able to do it, since the line loading is about 40 % with large frames. This requires buffering at least 500 samples (1000 bytes) at the source.
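A minimal sketch of the UDP variant, assuming POSIX sockets; the address, port and the fill_samples() helper are hypothetical placeholders, not anything from the OP's setup:

/* Batch 500 two-byte samples (1000 bytes) into one datagram and push
 * it over a dedicated 100baseT link. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PKT_BYTES 1000   /* 500 samples * 2 bytes, per the post */

extern void fill_samples(unsigned char *buf, size_t len);  /* hypothetical */

int send_loop(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5000);                      /* example port */
    inet_pton(AF_INET, "192.168.1.2", &dst.sin_addr);  /* example host */

    unsigned char pkt[PKT_BYTES];
    for (;;) {
        fill_samples(pkt, sizeof pkt);
        if (sendto(s, pkt, sizeof pkt, 0,
                   (struct sockaddr *)&dst, sizeof dst) < 0)
            break;                                     /* handle error upstream */
    }
    close(s);
    return -1;
}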
Reply to
upsidedown

I know Windows memory management has never been considered top of the class, but I really hope that this is completely wrong. /Physical/ memory will be badly fragmented, but /virtual/ memory space should not have the same problem.

Reply to
David Brown

I found the Dxxx drivers and DLL themselves to be fine, and the documentation is not bad. But when I first used it, I tried using their Delphi interface wrapper (since that application was in Delphi). The wrapper code was terrible - full of global variables that limited you to a single connection (and this was for an FT2232C - with two ports!).

Another choice for the FTDI chips is to use libftdi - it is built on top of libusb, but made to be convenient to use with the FTDI devices. It is cross-platform, which is always nice.
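A minimal libftdi read sketch in C (the libftdi1 API; the VID/PID shown are the common FTDI defaults and may not match your part):

#include <stdio.h>
#include <ftdi.h>

int main(void)
{
    struct ftdi_context *ftdi = ftdi_new();
    if (ftdi == NULL)
        return 1;

    if (ftdi_usb_open(ftdi, 0x0403, 0x6010) < 0) {   /* typical FT2232 IDs */
        fprintf(stderr, "open failed: %s\n", ftdi_get_error_string(ftdi));
        ftdi_free(ftdi);
        return 1;
    }

    unsigned char buf[4096];
    int n = ftdi_read_data(ftdi, buf, sizeof buf);   /* returns bytes read */
    printf("read %d bytes\n", n);

    ftdi_usb_close(ftdi);
    ftdi_free(ftdi);
    return 0;
}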

No matter what your budget, stay away from writing Windows drivers. Your budget would be better spent elsewhere.

Another handy thing about using libusb is that you can use pyusb with it, and program in Python.

So your choices are using MS's device driver kits, learning about the low-level nasties of Windows, dealing with a dozen different versions of Windows, getting your drivers signed, going through hoops to get your drivers installed or uninstalled, etc, etc,

or...

Install libusb-win32, Python and pyusb. Connect to your USB device using a high-level interpreted and interactive language.

USB development is still easier on Linux, but libusb-win32 brings at least some of that ease to Windows.
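For reference, roughly the same bulk-read idea from C using the libusb-1.0 API (libusb-win32 exposes an older 0.1-style API, so treat this as a sketch only; the VID/PID and endpoint address are placeholders):

#include <stdio.h>
#include <libusb-1.0/libusb.h>

int main(void)
{
    libusb_context *ctx = NULL;
    if (libusb_init(&ctx) != 0)
        return 1;

    /* placeholder IDs -- substitute your device's */
    libusb_device_handle *h =
        libusb_open_device_with_vid_pid(ctx, 0x1234, 0x5678);
    if (h == NULL || libusb_claim_interface(h, 0) != 0)
        goto out;

    unsigned char buf[512];
    int got = 0;
    /* endpoint 0x81 = IN endpoint 1, 1000 ms timeout */
    if (libusb_bulk_transfer(h, 0x81, buf, sizeof buf, &got, 1000) == 0)
        printf("read %d bytes\n", got);

    libusb_release_interface(h, 0);
out:
    if (h)
        libusb_close(h);
    libusb_exit(ctx);
    return 0;
}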

Reply to
David Brown

Unfortunately, _virtual_ memory fragmentation can also be a problem, if different vendors link their DLLs to different virtual addresses and leave lots of address space between their libraries.

If two DLLs are linked for the same virtual address, one has to be relocated to another address each time a program using that DLL is started. This means that fix-ups have to be written into many memory locations in the DLL code, so the fixed-up DLL becomes a private copy at run time, and a private copy must be saved into the page file for each program using the DLL.

If the DLL is linked to a virtual address that does not cause address conflicts, the actual DLL file can be loaded directly into memory without fix-ups (and hence can be shared in memory by different programs), and there is no need to store the code pages in the page file, since they are all available (and intact) in the original DLL file -- which was, of course, the original idea behind shared loadable libraries.

Reply to
upsidedown

Sorry, typo: 10M should be 100M (as the GB reference following should suggest). The point being that you can deploy in a wider variety of ways than you could with USB (e.g., what happens when/if your USB cable needs to be 10 ft longer? Or when the host is across a continent?)

Reply to
D Yuniskis

Great! So, if the OP's problem can use *your* sample code on *your* PC with whatever other applications are coexisting at that time, then his problem is solved! :> (i.e., your test tells him nothing about what he can expect in *his* application on *his* hardware, etc.)

I can pull ~91MB/s on my SB2000. Would the OP be better advised to use *my* hardware, instead? :-/

You don't know that. You don't know what PC hardware he is intending to use. You don't know what sort of medium the "file" he expects to write "continuously" resides on. You don't know what else is going on in the PC at the same time, etc. And you don't know what the end user is likely to have happening when *he*/she runs the product!

Huge buffers give you elasticity to ride out periods of momentary overload. At the opposite end of the spectrum, imagine the OP having a *one* byte buffer -- from the USB device *and* to the filesystem. Consider how brittle this sort of environment would be.

Putting a PC (running *any* sort of "desktop OS") in this sort of real-time environment boils down to a crap-shoot; what are the consequences of dropping data? How often can you tolerate this sort of loss? Will you even be able to *detect* that you've lost data??

Desktop OS's make no guarantees on the services they provide. If that's a *Windows* PC, the problem is only worse!

Use a desktop machine (and OS) when you have *no* real-time constraints to deal with. Use it just as a pretty display. Otherwise, sooner or later, you are *going* to get bit when those non-guarantees aren't met. And, chances are, you'll just see some anomalous results and wonder what happened. You probably won't have a smoking gun to explain that data was dropped, etc. So, after a while of head-scratching, you'll write it off as a "fluke" (secretly hoping/praying that it never comes back again since you are clueless as to its cause).

I always find this sort of "analysis behavior" amusing... you've got a deterministic system (though you may not be aware of all of the issues that affect that determinism!) that has done something "wrong" (unexpected). How can you assume/hope it was "just a fluke" and that the circumstances that caused it won't reliably manifest as soon as you've *sold* it? :-/

Reply to
D Yuniskis

Of course the OP needs to verify it works on their PC. My test is just an indication that there's plenty of bandwidth available on a typical PC. So, instead of wasting time on buffering schemes, it's worth trying it in a very straightforward way: open a file, just write 512-byte blocks to it, measure the worst-case latency for a day or so, and see if there's reason for concern.
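Something along these lines, as a sketch (POSIX clock_gettime() for timing; on Windows you'd substitute QueryPerformanceCounter, and possibly WriteFile):

#include <stdio.h>
#include <string.h>
#include <time.h>

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    FILE *f = fopen("test.dat", "wb");
    if (f == NULL)
        return 1;

    char block[512];
    memset(block, 0xAA, sizeof block);

    double worst = 0.0;
    for (long i = 0; i < 1000000L; i++) {        /* ~512 MB of writes */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fwrite(block, 1, sizeof block, f);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = elapsed_ms(t0, t1);
        if (ms > worst)
            worst = ms;                          /* track worst-case latency */
    }
    fclose(f);
    printf("worst-case write latency: %.3f ms\n", worst);
    return 0;
}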

All true, but we also have no reason to assume a 100MB buffer is going to fix any of that. If your average throughput is less than 4MB/sec, a large buffer will overflow just the same.

Huge buffers also mean leaving less memory available to the system, degrading performance for other users/processes. And what if these huge buffers end up being paged out to disk? Then you'll be transferring the same data to/from the disk several times, and smaller buffers may be better.

Obviously, a 1-byte USB transfer is going to be terribly inefficient. However, 512 bytes is the maximum bulk packet size on high-speed USB. Maybe you'll need a few of those to guarantee back-to-back transfers, but beyond that there's not much to be gained.

Detecting overflows is easy on the embedded device. If the PC doesn't get the data quickly enough, its internal buffers will overflow.

Agreed, but the OP wanted to use a PC.

Reply to
Arlet Ottens

Yes. Too many small buffers are a bad idea.

Throughput is increased but so is latency. The OS typically has a maximum burst transfer size and will move an overly large buffer in chunks. Since the OS typically is doing other things, any of these chunk transfers may block for longer than the actual transfer time.

The best you can hope for is to size your buffer to match the OS's burst transfer size for the target storage device. Likewise, here, the larger the chunks, the more efficiently the transfers go.

Single and double buffering both are degenerate cases of FIFO.

Yes, bigger (or more) buffers build in elasticity to accommodate hiccups in the system.
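A minimal single-producer/single-consumer ring buffer, as a sketch of that FIFO idea (C11 atomics, power-of-two size; the numbers are illustrative only):

#include <stdatomic.h>
#include <stdbool.h>

#define RB_SIZE 4096u                   /* must be a power of two */

static unsigned char rb_data[RB_SIZE];
static atomic_uint rb_head;             /* written only by producer */
static atomic_uint rb_tail;             /* written only by consumer */

bool rb_put(unsigned char byte)
{
    unsigned h = atomic_load(&rb_head);
    unsigned t = atomic_load(&rb_tail);
    if (h - t == RB_SIZE)               /* full: elasticity exhausted */
        return false;
    rb_data[h & (RB_SIZE - 1)] = byte;
    atomic_store(&rb_head, h + 1);
    return true;
}

bool rb_get(unsigned char *byte)
{
    unsigned h = atomic_load(&rb_head);
    unsigned t = atomic_load(&rb_tail);
    if (h == t)                         /* empty */
        return false;
    *byte = rb_data[t & (RB_SIZE - 1)];
    atomic_store(&rb_tail, t + 1);
    return true;
}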

George

Reply to
George Neuner

Actually it's worse than that: a DLL may conflict with any allocated address space within the process - not just with another DLL. If the automatic (compiled in) base address is in use, the DLL will be rebased to an open address.

If multiple different applications load the same user DLL, each may have its own copy depending on its own local DLL link/load order.

Similarly, if a program loads user DLLs dynamically using LoadLibrary, then multiple instances of the program each may have a private copy of the loaded DLLs depending on the order of loading.

The heavily used system DLLs which are referenced by nearly every program have been carefully and separately based near the top of the 2GB user space, so that they normally appear in the same location in every program. Lesser-used system DLLs and user DLLs are far more likely to be rebased.

64-bit Windows shares this insanity, but tries much harder to load DLLs only once. It randomly picks a free (high heap) address when loading a 64-bit user DLL. The vastly larger address space makes it much less likely that DLL loads will conflict.

George

Reply to
George Neuner

Of course. OTOH, the OP hasn't qualified anything other than average data rate...

And you don't know what else is going on in the system -- either in userland *or* in the OS.

Sure. My point was you can trivialize a two-buffer scheme:

while (FOREVER) {
    while (!buffer1_full) { fill(buffer1); }
    while (!buffer2_full) { fill(buffer2); }
}

with the obvious counterpart for the consumer.
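Spelled out, that counterpart might look like this; buffer1/buffer2, the _full flags and drain() are the hypothetical placeholders implied by the producer loop above:

#define FOREVER 1

extern volatile int buffer1_full, buffer2_full;
extern unsigned char buffer1[], buffer2[];
extern void drain(unsigned char *buf);      /* write to file, etc. */

void consumer(void)
{
    while (FOREVER) {
        while (!buffer1_full) { /* spin or block until producer fills it */ }
        drain(buffer1);
        buffer1_full = 0;
        while (!buffer2_full) { /* ditto */ }
        drain(buffer2);
        buffer2_full = 0;
    }
}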

The problem still remains, though: you have no way of knowing when your solution is "good enough". You can only identify instances where it will *fail*. And, the lack of a failure doesn't mean that it "works" (reliably).

Separate the real-time constraints from the non-real-time. Trying to deal with RT in a nonRT framework is a recipe for "Gee, I wonder what just happened there?" type bugs (with no "resolution") that lead you to throwing more and more horsepower at what *should* otherwise be a trivial task!

If you are forced to work in a desktop OS, then moving the data acquisition out of user land usually gives you a bit more control over how and when it all happens. You can wire down the buffers (instead of being vulnerable to page faults), have *some* control over "scheduling" (even if it is at the ISR level), etc.

Reply to
D Yuniskis

That's true, but it helps to know whether the task is soft or hard RT. Plenty of OSes can handle soft RT. Using this thread as an example, even if the data acquisition itself must be HRT, adequate buffering can turn transmitting the data and spooling it to storage into SRT tasks.

IMO it's all about controlling the environment.

Back in the 90's I was building HRT (as in drop-dead-if-you-miss) machine vision on Windows NT whilst simultaneously running an operator GUI. Running an RT priority thread, *desktop* NT could do

Reply to
George Neuner

I remember trying to get a high-speed timer event on a Win3.11 system long ago. Standard timers would not work faster than about 50 ms, IIRC, no matter how low a millisecond timeout value you used. The easiest solution was to play a small video with Windows Media Player at the same time as the program was running - WMP used the hidden API to increase the timer resolution! But a slightly more reliable method was to open a comms port at 9600 without connecting it to anything, and write out data regularly, capturing the "transmitted" notify message...

Reply to
David Brown

Indeed

True.

If the goal is just to decrease the chance of data loss, this strategy is fine. However, if no data is to be lost under any circumstance, this strategy only relaxes the timing requirements to a certain degree; there is still a hard deadline that must be met to ensure the buffer never overflows.

Questionable, considering that the cycle time of a 100 MHz Pentium is already 10 ns and quite a few instructions take more than a cycle (and that ignores the latency of a cache miss or a mispredicted branch). My personal experience with Windows NT on a 180 MHz Pentium Pro was that the interrupt latency of a kernel-mode driver was typically about 25 us, but under extreme load it could exceed 1 ms. At the time we also saw that things like video cards could do really nasty things on the PCI bus that badly screwed up timing.

If embedded NT is anything like Windows XP Embedded, I doubt that that is true. It is exactly the same OS, but it allows you to configure more precisely which parts get included in the image and which do not. If it appears to be more predictable, it can only be because fewer processes/services are competing for CPU time.

Real-time extensions for Windows exist for a reason, and even with those extensions it is still hard to guarantee that you will always meet your deadlines. There are just too many things (not just software!) in a PC-based platform that can bite you and that you do not really control.

In Windows you can create a kernel-mode driver that can do just about anything it likes. Of course, the API it has to deal with is quite different from the API for applications.

With increasing system complexity it becomes increasingly difficult to give hard guarantees. Many systems are simply too complex and have too many unknowns to calculate the worst-case scenario. Often the best you can do is take many measurements under worst-case conditions and make sure you have plenty of margin. As long as no humans are at risk, that might be just good enough.

Reply to
Dombo

Those were the multimedia timers, which still exist in current versions of Windows. The good thing was that they were very accurate; the bad thing was that you were allowed to make only a very few OS calls in the handler, as it was essentially an ISR on Windows 3.1. On later versions of Windows this changed, and the accuracy became significantly worse as well.
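For reference, a sketch of that multimedia-timer API as it still exists today (mmsystem.h, link against winmm.lib; the 1 ms period is illustrative, and the callback runs on a system thread, so keep it short):

#include <windows.h>
#include <mmsystem.h>
#include <stdio.h>

static volatile LONG ticks;

static void CALLBACK tick_cb(UINT id, UINT msg, DWORD_PTR user,
                             DWORD_PTR r1, DWORD_PTR r2)
{
    InterlockedIncrement(&ticks);       /* do as little as possible here */
}

int main(void)
{
    timeBeginPeriod(1);                               /* request 1 ms resolution */
    MMRESULT t = timeSetEvent(1, 0, tick_cb, 0, TIME_PERIODIC);

    Sleep(1000);                                      /* let it run for a second */
    printf("callbacks in 1 s: %ld\n", (long)ticks);

    timeKillEvent(t);
    timeEndPeriod(1);
    return 0;
}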

The standard timer just posts a message to the window's message queue; on Windows 3.1, with its cooperative multitasking, that message had to wait for any message posted before it for any other window to be handled. The timer message could sit a really long time in the queue before it was handled; the frequency you set was a best case (idle system) only. It was certainly not good enough for, for example, playing a MIDI song -- that is why multimedia timers were invented.
Reply to
Dombo

Sure. And, the consequences of missed deadlines. E.g., can you "re-request missed data" or is that data *gone* forever.

Hence the idea to implement "big enough" (whatever that means) buffers in a driver, where you have better guarantees on system behavior, etc. It still doesn't give you the tools you need to decide when to *abandon* a deadline, etc. (e.g., when the "value" has decreased to the point where it becomes an increasing *cost*!)

Yes, but you don't usually have much control over that in a desktop world. Can you prevent the user from launching another process/application? *Installing* a (rogue) application? etc.

Thankfully, most of the devices I've designed were "closed" systems. You (I) know what the hardware *can* do (when it is functioning properly) and what your software's demands are -- along with the external environment. Makes coming up with a "closed form" (pun intended) solution much easier.

One of the systems I am designing currently is *open* (third-party extensible). The added effort to design the RTOS to support (potentially malignant) application extensions is on a par with the design of the entire system itself *without* those extensions! I've had to implement ledgers on damn near all resources -- time, space, bandwidth, power, etc. -- just to guarantee that The System works regardless of the malevolence (or IGNORANCE) of any add-on applications.

[i.e., it is very clear to the user that the add-on application is misbehaving and not The System -- because The System keeps operating properly... the "add-on" shows all the signs of stress!]

I still think trying to "back port" an RTOS onto a system that was inherently designed to be NON-RT is a disaster waiting to happen. You just do things differently when you want to be deterministic/predictable -- instead of just "typical".

Reply to
D Yuniskis

Ha, yes! I used this same approach on an event driven MTOS (where the jiffy didn't drive the scheduler) for similar results. In my case, there was real traffic present so it was a "free" parasitic process.

Reply to
D Yuniskis

David, that's because the 8253 (8254 on the PC/AT) timer that was used took the standard 4.772728 MHz chip clock, divided by 4 into a 1.193182 MHz input, and divided that by 65536, providing the (roughly) 18.2 Hz interrupt that Windows used.

There was nothing much you could do, using system calls, to improve that because all they supported was counting interrupt events.

I did this kind of stuff all the time under both DOS and Windows, though. Either latch the counter values to get better precision but be forced into polling, or else, not infrequently, simply reprogram the timer chip to provide faster interrupts and just update the usual Windows/DOS software so that it knew things were still happening and could still let _your_ programs hooked to the traditional calls get some time on the silly schedule that the standard calls allowed for you.
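A DOS-era sketch of that reprogramming trick (outp()/inp() as in the old DOS compilers; you cannot do this from user mode on a modern protected-mode Windows):

#include <conio.h>

#define PIT_CTRL  0x43
#define PIT_CH0   0x40
#define PIT_CLOCK 1193182UL     /* 4.77 MHz / 4 */

static void pit_set_rate(unsigned long hz)
{
    unsigned divisor = (unsigned)(PIT_CLOCK / hz);

    outp(PIT_CTRL, 0x36);                    /* channel 0, LSB+MSB, mode 3 */
    outp(PIT_CH0, divisor & 0xFF);           /* low byte first */
    outp(PIT_CH0, (divisor >> 8) & 0xFF);    /* then high byte */
}

/* e.g. pit_set_rate(1000) gives ~1 kHz ticks instead of 18.2 Hz; your
 * interrupt 8 handler must then chain to the original BIOS/DOS handler
 * at the old 18.2 Hz rate so system timekeeping stays sane. */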

You just needed to dig deeper. You could have written your own VxD and gotten there just fine like the rest of us had to do.

Jon

Reply to
Jon Kirwan

The floppy driver was problematic, as were some Ethernet chips. Some video cards hijacked the PCI bus against the PCI specifications. By specifying which VGA and Ethernet cards are supported, the timing could be made quite predictable.

Reply to
upsidedown
