400 Mb/s ADC

We are building a new radio telescope called PAST

formatting link
which we will install at the South Pole or in Western China.

To make this work, we will need to sample (6 to 8 bit precision) dozens of analog voltages at 400 Msample/sec and feed these data streams into PCs. One PC per sampler.

The flash ADCs we need are available (Maxim), but we are finding it difficult to get the data into the PC.

One simple way would be to use SCSI Ultra640, but so far I have not found any 640 adapters on the market. Is any 640 adapter available? Anything coming soon?

Or we could go right into a PCI-X bus. Has anyone out there done this at 400 Mb/s? Is this hard to do? An FPGA core license for this seems expensive ($9K), with no guarantee of 400 MByte/s rates.

Is there a better way?

thanks

-Jeff Peterson

Reply to
Jeff Peterson

How big is a sample?

Not clear from this whether you mean Mbit/s or MBytes/sec. If you mean Mbit/s then obviously that's not a hard problem to solve. If, as I suspect, you do mean MBytes/sec, then a PC (by the conventional definition) isn't going to cut it, because typical PC motherboards don't support PCI-X at any frequency; they are still limited to 33MHz/32-bit PCI, which just isn't good enough.

So the first step will be identifying a motherboard (probably with a workstation or server classification) that supports PCI-X at 100MHz or better, which gives a peak theoretical throughput of 800MB/s, with a sustained rate probably closer to 400MB/s (rough figures for the various bus options follow the list below). Then you need to define what you are doing with the data; for example, you could be:

  1. Just capturing the data, performing some operation on it, storing the results and throwing away the samples.
  2. Actually planning to capture 400MB/s to disk for a sustained period, which has some pretty hairy implications for storage capacity.
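
To put rough numbers on the bus options (peak figures only; sustained throughput is typically half or less):

    /* Rough peak-bandwidth figures for common PCI flavours (C sketch).
       Peak only; sustained rates are usually 50% or less of these. */
    #include <stdio.h>

    int main(void)
    {
        struct { const char *name; double mhz; int bits; } bus[] = {
            { "PCI 32/33",     33.0, 32 },
            { "PCI 64/66",     66.0, 64 },
            { "PCI-X 64/100", 100.0, 64 },
            { "PCI-X 64/133", 133.0, 64 },
        };
        int i;
        for (i = 0; i < 4; i++)
            printf("%-13s peak %4.0f MB/s\n",
                   bus[i].name, bus[i].mhz * bus[i].bits / 8.0);
        return 0;
    }

This prints 132, 528, 800 and 1064 MB/s respectively, which is why 64/66 PCI is marginal for 400MB/s and PCI-X starts to look comfortable.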

I do know of one site doing something on a similar scale, and that's a US Air Force project called Starfire Optical Range

formatting link
at Kirtland AFB. I don't believe this project is heavily classified (I certainly didn't have to sign anything before helping them on the storage subsystem in 2000), so it might be worth contacting them to see if they can help you spec out a system.

--
Nik Simpson
Reply to
Nik Simpson

Do you really need to transfer raw data? Can you do some front-end processing to bring the speed down? If the answers are 'yes' and 'no' respectively, think of repacking. You can pack 8 bytes into one 64-bit word; this brings the rate down to 50 Mwords/s, which should fit into regular 64/66 PCI.
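
A minimal sketch of that repacking in C (purely illustrative; byte order and naming are arbitrary): eight 8-bit samples per 64-bit word, so 400 Msample/s becomes 50 Mword/s on the bus.

    /* Pack eight 8-bit samples into one 64-bit bus word.
       400 Msample/s of bytes -> 50 Mword/s of 64-bit transfers. */
    #include <stdint.h>

    uint64_t pack8(const uint8_t s[8])
    {
        uint64_t w = 0;
        int i;
        for (i = 0; i < 8; i++)
            w |= (uint64_t)s[i] << (8 * i);   /* sample 0 in the low byte */
        return w;
    }

In practice this packing would happen in the capture hardware, not on the host.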

/Mikhail

Reply to
MM

The first thing I would think of would be to buffer it and then read it later. You don't say how long this data stream will be, or if this is a peak rate with a much lower average rate. Does it have to go to disk at that rate?

You could collect the samples into 64-bit words and write them into SDRAM at 40 or 50 MHz.

If it has to go to disk at that rate, I would work on the hardware to get it onto the disk without a processor in between.

-- glen

Reply to
glen herrmannsfeldt

8 bits.

I do mean 400 MBytes/sec, and yeah, PCI 33/32 won't cut it.

We accumulate averages (of cross products of Fourier transforms).

We won't store the raw data, just a very much reduced set.

Reply to
Jeff Peterson

What are you going to be using for calculations? If you are planning on using DSP cards, then you don't need to go through PCI. You could transfer data directly from your A/D card to a DSP card using, for example, an FPDP interface.

There are a number of companies that do similar things for radar and sonar. Look at the products from Pentek, ICS, Gage, etc...

BTW, I think this discussion drifted away from the FPGA topic, so it probably doesn't belong here...

/Mikhail

Reply to
MM

So the basic problem is getting 400MB/s of data into memory and processing it, but are you reading 400MB every second, or sampling, say, once every ten seconds? If it's every second, then you've got a bigger problem, because I'd be surprised if you can process it fast enough to get the job done before the next sample comes along.

So disk output bandwidth is not going to be a problem; what you are looking for is a way of getting 400MB/s of data into memory for post-processing, correct? Is it possible to break up the input stream, so that, for example, instead of reading a single stream of 400MB/s you have five devices reading 80MB/s in parallel? Is the design of the device capturing the data set in stone, or can it be parallelized? If so, it would make the problem much simpler and any solution more scalable and less expensive.

--
Nik Simpson
Reply to
Nik Simpson

We will take about 64K samples, then we can pause while processing... however, all the time we are pausing we are losing data, so we do want to keep the duty cycle up. 50% duty cycle is not a problem; 5% would be.
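
For reference, the burst timing works out roughly as follows (assuming 8-bit samples at 400 MS/s; just a sanity check):

    /* Burst timing for 64K 8-bit samples at 400 MS/s and 50% duty cycle. */
    #include <stdio.h>

    int main(void)
    {
        double fs      = 400e6;        /* sample rate            */
        double n       = 65536.0;      /* samples per burst      */
        double duty    = 0.5;          /* target duty cycle      */
        double t_burst = n / fs;       /* capture time per burst */
        double t_cycle = t_burst / duty;

        printf("capture time : %.2f us\n", t_burst * 1e6);      /* 163.84 us */
        printf("cycle time   : %.2f us\n", t_cycle * 1e6);      /* 327.68 us */
        printf("average rate : %.0f MB/s\n", n / t_cycle / 1e6); /* 200 MB/s  */
        return 0;
    }

So at a 50% duty cycle the sustained rate into the PC is about 200 MB/s, with 400 MB/s bursts while capturing.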

This could work. For example, we have considered using 2 x SCSI 320 interfaces. That might work, but it's a bit of a kludge, and if we got the two interfaces out of sync we would have a real mess.

Reply to
Jeff Peterson

I don't believe we can reduce the data rate by pre-processing.

Yes, repacking might allow a 64/66 PCI bus to accept the data. I worry that we will spend lots of time and money, but the margin will be insufficient for it to actually work. I have heard that some PCI cores are not too efficient.

-Jeff

Reply to
Jeff Peterson

Jeff, I think you need to do the 'we accumulate averages (of cross products of Fourier transforms)' in your FPGA. That way you dramatically reduce your data rate down to something sensible. It sounds tricky, but not as tricky as getting 400M x 8 bits x 50% duty cycle = 1.6 Gbps into a PC and processing it! Cheers, Syms.
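
A toy illustration (in C, names made up) of why that accumulation helps: cross-multiply the two channels' spectra and add into an accumulator, reading out only once every N blocks, so the output rate is the input rate divided by N.

    /* Toy correlator accumulation: acc[k] += a[k] * conj(b[k]).
       Read acc out only every N blocks; output rate = input rate / N. */
    #define NFFT 1024

    typedef struct { float re, im; } cpx;

    void accumulate(const cpx a[NFFT], const cpx b[NFFT], cpx acc[NFFT])
    {
        int k;
        for (k = 0; k < NFFT; k++) {
            acc[k].re += a[k].re * b[k].re + a[k].im * b[k].im;
            acc[k].im += a[k].im * b[k].re - a[k].re * b[k].im;
        }
    }

In the real design this loop would be fixed-point logic in the FPGA rather than C on the host.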

Reply to
Symon

From my limited understanding of FFTs, the actual processing should be something that could easily be multi-threaded and would see pretty close to linear scalability with additional CPUs, so at the very least an SMP system with 2-4 CPUs would help, and if it's 64-bit floating point then a 64-bit CPU like Opteron or Itanium might come in handy. If multiple data streams are possible (and the synchronization problem can be overcome), and the workload does scale well with CPUs, then a cluster of low-cost single-CPU systems each processing part of the data stream would be worth looking at, as it could be easily scaled; i.e. five systems each handling an 80MB/s stream might be cheaper and faster than one big system trying to crunch 400MB/s. Additionally, if designed this way, you could add systems to increase the duty cycle; i.e. 10 systems handling 40MB/s each could be relatively cheap and would have roughly 2x the duty cycle of the original 5 systems.

Is there any way to insert synch markers in the data stream so that the problem of data streams getting out of sync can be handled?

--
Nik Simpson
Reply to
Nik Simpson

Spend money and time on what? With regard to PCI, I am pretty sure it will work. You can ask the PCI crowd on the PCI mailing list

formatting link
they will tell you for sure. And it doesn't have to be a core; you could use industry-proven silicon, e.g. from PLX. I would be more worried about processing all this data in your PC. I don't think any PC can do FFTs while keeping up with such a data flow. Let's say you want to do a 1024-point FFT. At 400 MSPS it takes only 2.56 us to accumulate a new block of data. The latest and greatest ADI ADSP-TS201S can do a 1024-point complex FFT in 16.8 microseconds. I doubt any of the Intel chips can do it faster; AFAIK, TI DSPs aren't faster either. So, in my opinion, you will either need an array of fast DSPs or some sort of FPGA-based processing. Trying to do this kind of processing in the host doesn't sound feasible to me.
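
Those figures translate directly into a parallelism requirement (rough arithmetic only):

    /* How many TS201-class DSPs does 400 MSPS of 1024-point FFTs need? */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double block_us = 1024.0 / 400e6 * 1e6;  /* 2.56 us per block     */
        double fft_us   = 16.8;                  /* quoted 1024-pt CFFT   */
        printf("blocks arrive every %.2f us\n", block_us);
        printf("DSPs needed (ignoring I/O): %.0f\n", ceil(fft_us / block_us));
        return 0;
    }

That is roughly 7 DSPs just for the FFTs, before any cross products, accumulation or I/O overhead.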

/Mikhail

Reply to
MM

As stated elsethread, if you give up trying to get this throughput on a conventional PC platform, you can probably do this on a "big enough" FPGA. From your memory needs alone (64K x 6 x some overhead in which to do your FFT) you're probably looking north of an XC2V2000, and the single-chip price is measured in the thousands of US$. For the c.a.f group to estimate with any precision the smallest practical part, you need to do things like compute the number of bits of precision you need for your butterflies. The 96 18x18 multipliers on an XC2V3000 would come in real handy, especially if they didn't need to be cascaded for more precision. If you can make your design work at 200 MS/s (DDR), even 32 multipliers would let you run the FFT as fast as data points stream in -- although that would also require 16 x 64K x 18 bits of storage, out of reach for the current Xilinx offerings at least.
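
(For what it's worth, the arithmetic on that last figure: 16 x 65536 x 18 bits is about 18.9 Mbit, i.e. roughly 2.3 MB, well beyond the block RAM in any Virtex-II part.)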

I know who I'd ask first for help (ahem-ray-cough).

- Larry

Reply to
Larry Doolittle

You should definitely talk to high energy physics people, like the STAR experiment at BNL or ALICE at CERN. Talk to the data acquisition and Level 3 trigger people there. You can probably just buy boards with fast links and DSPs from them.

If you want to design it yourself, here are some comments:

  1. If you use a busmaster device and you want to read data with a 50% duty cycle, you can buffer the events in your readout board and reduce the data rate to 200MByte/s. You add one event of latency.
  2. The fastest slots on a PC mainboard are the memory expansion slots. It's an easy-to-design hardware interface, and if you use a server mainboard with multiple memory channels you get a hell of a lot of bandwidth. I remember seeing a crypto accelerator on a DIMM somewhere, and Sun used to place graphics boards in memory slots.
  3. If your political environment is similar to high energy physics, then if you can reduce the duty cycle it does not really matter how expensive the readout boards are. With a large FPGA on a PCI board you can try to perform all computations on the board and achieve a 100% duty cycle.

Kolja Sulimma

Reply to
Kolja Sulimma

...and forget Windows support. Only specially hacked Linux will be your friend.

Sorry? Sun used S-Bus for them, which is not a memory slot.

-- Maxim Shatskih, Windows DDK MVP StorageCraft Corporation snipped-for-privacy@storagecraft.com

formatting link

Reply to
Maxim S. Shatskih

???? They need to write their own driver anyway.

I do not know much about Windows driver programming, but it should be possible for a driver developer to map arbitrary physical address ranges to user space. You need chipset-specific code to enable access to the DIMM after boot, because it must start out disabled to prevent Windows from using the memory. But as they use the board only in a single setup, this is no problem at all. Anyway, an experiment of that type is likely to use a real-time OS, neither Windows nor plain vanilla Linux. Maybe OS-9 or VxWorks.
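
For the plain-Linux case, the quick-and-dirty way to reach a fixed physical window from user space is mmap() on /dev/mem; the sketch below uses a made-up base address and would really belong in a proper driver.

    /* Map a hypothetical 64 MB physical window into user space via
       /dev/mem. Sketch only: the base address is invented, and a real
       system would reserve the range at boot and use a kernel driver. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        off_t  phys_base = 0x10000000;     /* hypothetical capture window */
        size_t len       = 64UL << 20;     /* 64 MB                       */

        int fd = open("/dev/mem", O_RDONLY);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, phys_base);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        /* ... read captured samples out of the window here ... */

        munmap(p, len);
        close(fd);
        return 0;
    }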

They did, but they also had UMA architectures based on DIMMs.

Kolja Sulimma

Reply to
Kolja Sulimma

Why don't you get an AGP graphics processor and try to connect your ADCs to the GPU memory bus? Run a PCI card for graphics on the PC.

The GPUs are programmable, so you might even be able to do some processing inside...

Since you only need 400 MSamples/s, you could live with the Maxims.

If you want to get some real speed, then maybe something like the Atmel TS8308500 (500 Mspl/s), TS8388B (1 Gspl/s) or TS83102G0B (Gspl/s) could be of interest. Going up to gigasamples per second would make your problem worse, though :-)

formatting link

--
Best Regards
Ulf at atmel dot com
These comments are intended to be my own opinion and they
may, or may not be shared by my employer, Atmel Sweden.
Reply to
Ulf Samuelsson

Easier! Just add /MAXMEM to Windows' BOOT.INI, and it will skip some of the BIOS-reported memory. So, at second sight, the thing looks easier.

Surely.

-- Maxim Shatskih, Windows DDK MVP StorageCraft Corporation snipped-for-privacy@storagecraft.com

formatting link

Reply to
Maxim S. Shatskih

The trick is knowing which physical memory slots are affected by the BOOT.INI statement. An alternative is simply to grab physical memory address space for a device driver during the boot sequence and lock Windows out of it; DataCore uses that approach for its cache in SANsymphony.

--
Nik Simpson
Reply to
Nik Simpson

On a sunny day (Thu, 20 Nov 2003 00:06:40 -0500) it happened "MM" wrote in :

A little while ago in sci.crypt there was some talk about the first optical processor. Basically this is an LED array with multipliers that can do 125 million complex 128-point FFTs, or 500,000 16K-size DFTs, per second.
formatting link
formatting link
The thing itself is a normal DSP with the optical array (you can buy that separately too). Normal logic; if you interfaced an FPGA you could perhaps go faster, since those gallium arsenide LEDs switch at 20 GHz... No idea what it costs, perhaps less than you think. Download the datasheet .pdf; maybe it is of use... JP
Reply to
Jan Panteltje
