I'd like to share with you my latest development: an L3 protocol for transmitting data between a low-resource FPGA and an embedded system.
It may allow you to use low-cost, low-resource FPGAs together with cheap Ethernet switches and simple Linux-based embedded systems (or just reflashed Linux-based Ethernet routers working as both a switch and an embedded system) to create data acquisition systems that concentrate data from multiple channels.
Because an Ethernet link does not guarantee reliable packet delivery, we need an acknowledge mechanism (as in TCP). However, if the transfer rate is R bytes/second and the maximum acknowledge latency is T, then we need to buffer at least R*T bytes. The problem is that at transfer rates of 100Mb/s or 1Gb/s the latency must be really small if we want to use only the internal RAM of the FPGA for buffering. E.g. for 100Mb/s (roughly 10MB/s) and 40kB of internal RAM, we can accept a latency of only 4ms. For 1Gb/s it would be only 400µs.
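As a quick check of these numbers, here is a minimal Python sketch of the R*T rule. Python is used only for illustration (the actual sources are FPGA code and a Linux driver), and the function name is made up for this example. The rates are the rounded figures from the text (roughly 10MB/s for 100Mb/s, and ten times that for 1Gb/s).

```python
def max_ack_latency(buffer_bytes, rate_bytes_per_s):
    """Largest acknowledge latency T (in seconds) for which R*T still fits in the buffer."""
    return buffer_bytes / rate_bytes_per_s


BUFFER_BYTES = 40_000  # 40kB of internal FPGA RAM, as in the example above

# Rounded byte rates: 100Mb/s is roughly 10MB/s, 1Gb/s roughly 100MB/s.
for label, rate in (("100Mb/s", 10_000_000), ("1Gb/s", 100_000_000)):
    latency = max_ack_latency(BUFFER_BYTES, rate)
    print(f"{label}: {latency * 1e6:.0f} us")
# prints:
# 100Mb/s: 4000 us
# 1Gb/s: 400 us
```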
To keep the latency as low as possible, I've decided to implement my transmission as an L3 protocol. As it is supposed to be used only in private networks, I have decided to use the unregistered EtherType 0xfade.
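To illustrate what using a private Ethernet type means on the wire, here is a minimal Python sketch that builds and parses a raw Ethernet frame carrying the 0xfade type. The helper names are hypothetical and the payload is treated as opaque bytes. The actual packet layout of the protocol is defined in the published sources and is not reproduced here.

```python
import struct

ETHERTYPE_L3_FPGA = 0xFADE  # the unregistered Ethernet type named in the text
MIN_PAYLOAD = 46            # minimum Ethernet payload length (untagged frame)


def build_frame(dst_mac: bytes, src_mac: bytes, payload: bytes) -> bytes:
    """Build an Ethernet frame (without FCS): dst MAC, src MAC, type, payload."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be exactly 6 bytes")
    header = struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_L3_FPGA)
    # Short payloads are zero-padded to the Ethernet minimum.
    return header + payload.ljust(MIN_PAYLOAD, b"\x00")


def parse_frame(frame: bytes):
    """Split a frame into (dst_mac, src_mac, ethertype, payload)."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst_mac, src_mac, ethertype, frame[14:]


frame = build_frame(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01", b"hello")
print(hex(parse_frame(frame)[2]))  # the receiver would filter on this value
```

A receiver that only accepts frames with this type value can ignore all ordinary IP and ARP traffic on the same segment.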
The sources of both the FPGA part (designed for the Xilinx SP601 board, but written with portability in mind) and the Linux driver (compiled with kernel 3.3.3, but it should also work with other 3.x kernels) are published on the alt.usenet group. See the post: "L3 protocol for data transmission from low resource FPGA to Linux embedded system". The link to the Google archive is:
I'll also publish newer versions of my sources on my website (almost empty so far):
Generally the system is very simple. The kernel module can receive data from a few FPGA-based "slave" devices (the limit is set by the "max_slaves" parameter when loading the module). Data received from each slave are written to a kernel buffer, which should be mmapped into the user application. The application is supposed to preprocess the data and possibly send the result further through standard TCP/IP connections.
The system of course doesn't handle routing (due to the low latency requirements). The FPGA doesn't even need to handle the ARP protocol. In the user application we simply need to register a slave with a particular MAC address connected to a particular network interface, and this slave will be visible as /dev/l3_fpga0, /dev/l3_fpga1 and so on. My sources include a very simple application which connects to the slave and verifies that it sends consecutive integers.
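The consecutive-integer check can be sketched in a few lines of Python. This is not the test application from my sources, only an illustration. It assumes, purely for the example, that the data are 32-bit big-endian unsigned integers that wrap around at 2^32, and it works on a plain bytes object rather than on the mmapped buffer. The function name is hypothetical.

```python
import struct


def first_gap(data: bytes):
    """Return the index of the first word that is not predecessor+1, or None.

    Assumes 32-bit big-endian unsigned words (an assumption made for this
    sketch); trailing bytes that do not fill a whole word are ignored.
    """
    count = len(data) // 4
    words = struct.unpack(f"!{count}I", data[:count * 4])
    for i in range(1, count):
        if words[i] != (words[i - 1] + 1) & 0xFFFFFFFF:
            return i
    return None


good = struct.pack("!4I", 7, 8, 9, 10)
bad = struct.pack("!4I", 7, 8, 10, 11)  # word 9 is missing
print(first_gap(good), first_gap(bad))
```

A gap found this way would mean that a packet was lost or delivered out of order in spite of the acknowledge mechanism.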
In a system like this, a nontrivial problem is setting the appropriate rate of packets sent from the slave to the embedded system. In my system this is achieved by introducing a certain delay between the data packets. This delay is adaptively adjusted during operation. The appropriate delay is found by analysing the ratio of retransmitted packets to all sent packets: Nretr/Nall. If the data packets are sent too quickly, the acknowledge packets from the embedded system are received too late, and the packet is retransmitted before the acknowledge arrives. The same may occur if the embedded system is overloaded with packets from different slaves and drops some packets. Therefore, if Nretr/Nall is too high, we have to increase the delay. If Nretr/Nall is near 0, we may reduce the delay. This simple algorithm works quite satisfactorily.
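The adjustment rule can be sketched as follows. This is only an illustration in Python, not the code running in the FPGA. The thresholds, the multiplicative step and the delay limits are hypothetical values chosen for the example. The published sources may use different constants and a different update rule.

```python
def adjust_delay(delay, n_retr, n_all,
                 high=0.05, low=0.01, step=1.25,
                 min_delay=1, max_delay=10_000):
    """Return the new inter-packet delay based on the ratio Nretr/Nall.

    All default parameters are illustrative assumptions, not values
    taken from the published sources.
    """
    if n_all == 0:
        return delay              # nothing sent yet, keep the current delay
    ratio = n_retr / n_all
    if ratio > high:
        delay = delay * step      # too many retransmissions: slow down
    elif ratio < low:
        delay = delay / step      # almost no retransmissions: speed up
    # Keep the delay inside sane bounds.
    return min(max(delay, min_delay), max_delay)


print(adjust_delay(100, 10, 100))  # ratio 0.10 -> delay increased
print(adjust_delay(100, 0, 100))   # ratio 0.00 -> delay reduced
print(adjust_delay(100, 3, 100))   # ratio 0.03 -> delay unchanged
```

Leaving a dead band between the two thresholds keeps the delay from oscillating when the ratio sits close to the acceptable level.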
The sources are published partly under the GPL license (the Linux driver and user application) and partly under the BSD license (my FPGA sources). The sources of the FPGA part also contain a very slightly modified Ethernet MAC
Please note that my sources are a first iteration. They use some "quick and dirty" solutions. I hope to prepare a more mature version and describe it in a normal publication (of course I'll send information about it, as I've done with my sorter:
I hope that my solution will be useful for somebody.