embedded gigabit ethernet

I have a new project coming up that calls for a data acquisition board (wot I have to design) to deliver said data over Ethernet at a minimum of 6.25 Mbytes/second (not including packetisation/checksum/TCP/protocol overheads). I confess to some trepidation.

Now, I'm an old hand at comms and microprocessory. However this is a couple of orders of bignitude larger, bandwidth-wise, than I'm used to. I've dealt with various CPU families, but am currently most at home with the H8 and H8S. Which may or may not be relevant - I'm expecting to have to switch to a different architecture. No worries.

I'm also expecting to use a gigabit Ethernet device with, as far as possible, zero-copy DMA capabilities. Raw data will be acquired over a bus (possibly the raw CPU bus, but non-DMA) from several hardware sources with relatively little overhead.

I have two questions:

- Which CPU family should I be looking at? Maybe ARM, or PowerPC?
- Which TCP/IP stack should I be looking at?

All suggestions (preferably hygienic) gratefully considered.

Steve

formatting link

Reply to
Steve at fivetrees

Bandwidth woes noted. Too much of just about anything good is impossible.

I use the Axis ETRAX100LX, which is getting a little musty. Getting about 7 Mbyte/s with full TCP (point-to-point) is about the max for this thing; maybe a little more, approaching 8 Mbyte/s.

It has 100 Mbit Ethernet with DMA, and a 100 MHz core. I say it's getting a little musty because I think they should have boosted the core speed by now, but other than that, it has been a win.

It runs Linux, on their own CPU architecture.

Reply to
Bryan Hackney

Look at Intel XScale chips. Some of them include gigabit Ethernet on chip (the PHY is not included). They have "microengines" capable of offloading the TCP/IP overhead from the main CPU.

--
  @jhol

KK (Boogiteorian alkeet / Juice Leskinen Grand Slam)
Reply to
Jouko Holopainen

Ouch. 50 Mbit/s? Sounds like some tight code. Plausible with 100Base-T, but Gig-E will afford a lot more headroom. I'd suggest documenting a lot of assumptions (caveats) in your scope of work.

Foremost, getting this level of performance out of TCP will prove to be non-trivial in terms of protocol tuning and buffer RAM. (You'll have to buffer for TCP retransmission, which will affect the efficiency of your copying to the Ethernet NIC.)

The buffer RAM and protocol tuning required increase with the round-trip time between the two network devices, and throughput is also heavily limited by the receiving device's TCP tuning. Seriously consider whether the application can tolerate any data loss, and whether UDP might fit the bill better. It could have a significant impact on your specs.
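As a rough illustration of the buffer arithmetic (a sketch only; the 50 Mbit/s rate follows from the original 6.25 Mbyte/s requirement, the RTT values are assumptions), the sender must hold at least one bandwidth-delay product of unacknowledged data:

/* Back-of-envelope TCP retransmit-buffer sizing: rate * RTT.
 * The RTT values below are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const double rate_bps = 50e6;  /* 6.25 Mbyte/s = 50 Mbit/s */
    const double rtt_s[]  = { 0.0005, 0.005, 0.05 };  /* LAN .. WAN */

    for (int i = 0; i < 3; i++)
        printf("RTT %5.1f ms -> >= %6.0f bytes buffered\n",
               rtt_s[i] * 1e3, rate_bps * rtt_s[i] / 8.0);
    return 0;
}

Even half a millisecond of RTT implies a few KB in flight; tens of milliseconds push it into the hundreds of KB.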

Cheers, Richard

Reply to
Richard H.

For raw speed, a DSP might be best. We're currently doing a streaming application for a customer using a TI TMS320C6204 and 100baseT, and it is really quick.

We didn't use gigabit Ethernet as we thought it'd be a major increase in complication for a negligible increase in the overall throughput. Last time I did any tests, the main bottleneck was actually in the data receiver; most PC-type systems are designed around peak-and-trough Internet traffic, not continuous data capture.

The DSP should be able to transmit over 50 Mbit/s continuously, but to handle this we're having to use UDP with streaming acknowledgments - essentially a souped-up TFTP, with backward compatibility for the occasional slow transfer.
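For flavour, a windowed TFTP-like scheme might carry headers along these lines (purely a sketch; this is not Iosoft's actual wire format, and the field sizes are assumptions):

/* Hypothetical packet layout for a "souped-up TFTP": the sender
 * streams numbered DATA blocks and the receiver acks every Nth one,
 * so the sender only stalls if the ack stream falls too far behind. */
#include <stdint.h>

#define STREAM_OP_DATA 1
#define STREAM_OP_ACK  2

struct stream_hdr {
    uint16_t opcode;  /* STREAM_OP_DATA or STREAM_OP_ACK */
    uint32_t block;   /* block number, wraps modulo 2^32 */
    uint16_t len;     /* payload bytes following this header */
};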

We'll be benchmarking in a few weeks; I can let you know the results.

Jeremy Bentham Iosoft Ltd.

Reply to
Jeremy Bentham

[...]

Please let me (us) know what you decide on. This is on the edge of experience for many if not all small systems developers. Thanks.

Reply to
Bryan Hackney

I, too, have been tasked with a similar project for work recently. I need to be able to stream data that will be produced at a bandwidth in excess of 200 Mbit/s (from an A/D converter). This task is primarily a transmission task (minimal reception required; certainly nothing high-speed on the reception side). Essentially, just grab the data, packetize it into TCP packets, and dump it out ASAP. The other end of the pipe can essentially be thought of as a top-of-the-line PC with oodles of RAM/disk storage to support the data coming in.

While still evaluating possibilities, my current plan is to use a Virtex4 FX12 FPGA which has an embedded PowerPC core already in the chip and an integrated tri-mode ethernet MAC hard core also integrated into the chip. Xilinx also has their Gigabit System Reference Design (can be found at:

formatting link

but it is currently set up to support only a few eval boards (and is a bit much for what I'm looking to do). However, Xilinx just released something much closer to my ideal solution in the form of an app note called "Minimal Footprint Tri-Mode Ethernet MAC Processing Engine", which can be found here:

formatting link

This setup uses uIP, a scaled-down TCP/IP stack by Adam Dunkels, ported to run on an UltraController 2 (essentially the PowerPC core in the Virtex4 used with no external memory; 16K of program memory and 16K of data memory, implemented as BRAM right in the FPGA fabric, hold the code/data). Anyway, the app note shows an example webserver running on the ML403 eval board. I'm hoping to play with this over the next few weeks to see what works and what doesn't with this setup. I'll report back new stuff as I find it.
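For anyone curious what the application side of uIP looks like, a minimal UDP sender might be shaped like this (a sketch from memory of the uIP 1.x API - check the names against the actual uip.h; the address, port, and acq_next_block() are hypothetical):

/* Minimal uIP UDP sender sketch. acq_next_block() is a hypothetical
 * hook that copies the next acquisition block into the packet buffer. */
#include "uip.h"

extern int acq_next_block(void *buf, int maxlen);

static struct uip_udp_conn *conn;  /* kept for reference/teardown */

void sender_init(void)
{
    uip_ipaddr_t pc;
    uip_ipaddr(&pc, 192, 168, 0, 1);       /* assumed PC address */
    conn = uip_udp_new(&pc, HTONS(5000));  /* assumed port */
}

/* Wired in as the UDP application callback (UIP_UDP_APPCALL). */
void sender_appcall(void)
{
    if (uip_poll()) {
        int len = acq_next_block(uip_appdata, UIP_APPDATA_SIZE);
        if (len > 0)
            uip_udp_send(len);
    }
}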

Yeah... there definitely isn't a ton of stuff out there right now for references on GigE in embedded systems. Hopefully this will change as time passes (and requirements grow).

Regards, John Orlando

formatting link

Reply to
jro

You should not compare a webserver serving mostly static data to your data acquisition task with 200 Mbit/s of bandwidth.

Under TCP, you must be able to resend segments up to the so-called flight size. That is, every segment sent out but not yet acked by the peer can be lost on its way, and hence must be re-sent if the need arises - that's why TCP is reliable at all. With 200 Mbit/s of bandwidth, the amount of not-yet-acked data will be impressive. uIP is a fine piece of code, but it was designed with restricted embedded resources in mind, and obviously that is at odds with the very high bandwidth your application requires. So, unless you can "reproduce" your acquired data, there is no way around local buffering. How do you want to do that with only 16KB?
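To put numbers on that (the 200 Mbit/s is John's figure; the 1 ms round trip for a local crossover link is an assumption):

/* Flight-size arithmetic: even a short LAN round trip implies more
 * unacked data than 16KB of BRAM can hold. */
#include <stdio.h>

int main(void)
{
    const double rate_bps = 200e6;  /* A/D output rate */
    const double rtt_s    = 0.001;  /* assumed local round trip */

    printf("bytes in flight: %.0f\n", rate_bps * rtt_s / 8.0);  /* 25000 */
    return 0;
}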

If you can afford to lose data, use UDP instead. If not, you will have to solve this problem, and I'm honestly interested to hear about your approach.

Markus

Reply to
Markus Zingg

Markus Zingg wrote: (and omitted attribution)

I have no idea what throughput he needs, but a burst speed of 200 Mbit/s will already be a load. It might be possible to use error-correcting coding with the packets and UDP, assuming he has a dedicated link. He should also carefully characterize the burst length needed and the overall throughput rate. It may be that the raw error rate on a dedicated link is small enough to allow simplification.
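The simplest form of that error-correcting idea is XOR parity over a group of packets, which lets the receiver rebuild any single lost packet per group (a sketch; the group size and packet length are assumptions):

/* XOR-parity FEC over UDP: one parity packet per GROUP data packets. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GROUP   8     /* data packets per parity packet (assumed) */
#define PKT_LEN 1024  /* fixed payload size (assumed) */

/* XOR all payloads in the group together to form the parity packet. */
void make_parity(const uint8_t data[GROUP][PKT_LEN], uint8_t parity[PKT_LEN])
{
    memset(parity, 0, PKT_LEN);
    for (size_t i = 0; i < GROUP; i++)
        for (size_t j = 0; j < PKT_LEN; j++)
            parity[j] ^= data[i][j];
}

/* Rebuild one missing packet by XORing the parity with the survivors. */
void rebuild(uint8_t missing[PKT_LEN], const uint8_t parity[PKT_LEN],
             const uint8_t (*survivors)[PKT_LEN], size_t nsurvivors)
{
    memcpy(missing, parity, PKT_LEN);
    for (size_t i = 0; i < nsurvivors; i++)
        for (size_t j = 0; j < PKT_LEN; j++)
            missing[j] ^= survivors[i][j];
}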
--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson
Reply to
CBFalconer

Certainly.

Dame fortune has bought me some time - client needs a lower-bandwidth ethernet board first. I shall bide my time and research.

Steve

formatting link

Reply to
Steve at fivetrees

Not a million miles away from what I'm doing.

Very useful data, John. Thanks.

Steve

formatting link

Reply to
Steve at fivetrees

Agreed. I said much the same to the customer - who fortunately has vast experience of doing this kind of thing (albeit not from an embedded POV). TCP is possible (and preferable) so long as there is a failure mechanism (think "degree of confidence in the link") which ensures that a failure does *not* result in a retransmission queue beyond available memory. Makes sense, especially with a strictly local link.

Steve

formatting link

Reply to
Steve at fivetrees

Yes, UDP is certainly a candidate solution to the aforementioned problem. Since the embedded side will be communicating with a local PC (could literally be a crossover Ethernet cable), I'm guessing that the few dropped packets that could potentially occur with UDP won't be too much of an issue (and the "end" application can deal with this in my case). I would argue that, though simple, the uIP implementation of UDP would certainly suffice for what I need (i.e., a 200 Megabit/sec data pipe). The way folks usually get large amounts of data through TCP is by playing with the Ethernet frame size, making the frames larger than 1500 bytes; I've seen frame sizes up to 9000 bytes. Obviously with only 16K there isn't a ton of room to buffer up packets for TCP. I may still give it a go to see what the limits are in practice.
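On the PC side, jumbo frames are an interface setting rather than a socket option; on Linux it is just an MTU change (a sketch; "eth0" and the 9000-byte MTU are assumptions, and the NIC, driver, and any switch in between must all support it):

/* Enable jumbo frames on a Linux interface via the SIOCSIFMTU ioctl. */
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>

int set_jumbo_mtu(int sock)  /* any open socket fd will do */
{
    struct ifreq ifr;

    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
    ifr.ifr_mtu = 9000;
    return ioctl(sock, SIOCSIFMTU, &ifr);
}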

As always, I appreciate the input.

Regards, John Orlando

formatting link

Reply to
jro

TCP/IP serves two purposes here: handling corrupted frames, and handling flow control (congestion). With a dedicated point-to-point link, the frame corruption rate should be quite low. On a point-to-point link, the flow-control problem is limited to the receiving PC and its ability to process the data at the incoming rate _at_all_times_.

If the PC can be made to reliably operate at the incoming data rate, I would suggest dropping TCP and using UDP instead: just a simple UDP protocol with a serial number, blindly transmitted from the embedded device.
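Such a protocol is barely a protocol at all. As a sketch in BSD-socket terms (the address, port, payload size, and acquire_block() hook are all assumptions; an embedded stack would differ in detail but not in shape):

/* "Serial number blindly transmitted": UDP datagrams tagged with a
 * monotonically increasing sequence number. */
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define PAYLOAD 1024  /* assumed samples per datagram */

extern void acquire_block(uint8_t *buf, size_t len);  /* hypothetical hook */

struct sample_pkt {
    uint32_t seq;            /* sequence number, network byte order */
    uint8_t  data[PAYLOAD];  /* raw A/D samples */
};

int stream_samples(const char *dest_ip, uint16_t port)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    struct sockaddr_in to;
    memset(&to, 0, sizeof to);
    to.sin_family = AF_INET;
    to.sin_port   = htons(port);
    inet_pton(AF_INET, dest_ip, &to.sin_addr);

    struct sample_pkt pkt;
    for (uint32_t seq = 0; ; seq++) {  /* runs for the whole capture */
        pkt.seq = htonl(seq);
        acquire_block(pkt.data, PAYLOAD);
        sendto(s, &pkt, sizeof pkt, 0, (struct sockaddr *)&to, sizeof to);
    }
}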

However, first you should check that the receiving PC and all the application software is really capable of processing the data at that rate, even when the user is doing some kind of activities on the PC (if allowed during the actual capture). Using an other PC, generate the same rate of frames as the embedded system and check how the receiving PC performs.

In many real-time applications you do not even expect to always get 100 % of the data, but it is extremely important to decide from the beginning (and not at a later stage, when you discover that some data is dropped) what to do when a sample is missing (e.g. some sample-specific signal-quality indicator).

For statistical analysis, you could then discard those samples that are not OK.

In a polled system, if a single sample cannot be read due to an _intermittent_ communication failure, there is not much point in requesting a re-read of the value, since the next poll will eventually bring a fresh value anyway. The only information you need is that a sample is missing, so that you can do something sensible (e.g. hold the previous value instead of inserting 0).
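At the receiver, that amounts to watching the serial numbers for gaps (a sketch; process_block() and hold_block() are hypothetical application hooks):

/* Detect missing sequence numbers and hold the previous value
 * rather than inserting zeros. */
#include <stdint.h>

extern void process_block(const uint8_t *data, int len);
extern void hold_block(void);  /* repeat previous block, flag as suspect */

static uint32_t expected = 0;

void on_packet(uint32_t seq, const uint8_t *data, int len)
{
    while (expected != seq) {  /* one hold per missing packet */
        hold_block();
        expected++;
    }
    process_block(data, len);
    expected = seq + 1;
}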

In an audio system, if you get a missing sample, it might be a good idea to take the next few good samples and use some higher-order (>1) interpolation to calculate the missing one. In a video system, it might be acceptable to repeat an old frame.

In a system aiming for 100 % reliability (such as TCP), the receiver will have to handle the normal average traffic and, in addition, any backlog due to communication failures; thus the physical throughput must be well above the average required throughput to clear the backlog.

Anyway, the first thing to settle is whether the customer needs 100 % reliability or 100 % real-time performance. Usually the customer says both are needed, but in that case the bill will double or triple due to the extra hardware required (e.g. multiple PCs), which usually returns the customer to square one.

Only after the customer really knows what he/she wants (i.e. is ready to pay for) can sensible decisions about the protocol be made.

Paul

Reply to
Paul Keinanen

Paul,

Agreed, the combination of real-time streams and TCP is sketchy, especially when the data rate is very high. UDP is by far the simplest approach, but requires a tolerance for loss due to receiver congestion or damaged packets.

If lossless delivery is absolutely necessary, TCP is better than rolling your own protocol. But it requires a lot more resources on both ends, especially in the sender - several hundred kB of buffer for the sliding window, several MB to queue the input stream during congestion periods (low throughput), and enough CPU to manage all the buffering and drain the queue after a congestion backlog.

The performance of the receiver (or the latency between the devices) will also hugely affect the sender's buffer requirements. The TCP Window Scaling option will probably be mandatory, and jumbo Ethernet frames will be a great help if supported by all involved.
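On the receiving PC, that mostly means forcing the socket buffers well above the defaults, which in turn gets the stack to negotiate window scaling (a sketch; the 4 MB figure is an assumption sized from rate x RTT with margin, and on Linux the kernel cap, e.g. net.core.rmem_max, must also allow it):

/* Enlarge the TCP receive buffer so the advertised window does not
 * throttle the sender. */
#include <sys/socket.h>

int tune_receiver(int sock)
{
    int rcvbuf = 4 * 1024 * 1024;  /* assumed; >= bandwidth-delay product */
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);
}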

Richard

Reply to
Richard H.
