Which PCI core for Cyclone II board?

Does anyone have experience with more than one of the PCI cores out there? I'm working on a PCI card that's still in the early stages of design. I'm hoping to be able to do pretty much everything on a single Altera FPGA - most likely a Cyclone II device.

I've looked at the PCI Compiler from Altera. It seems very poorly documented, which I think will make the backend complicated and difficult to develop. The one from opencores.org looks easier to use and better documented, and has the advantage of being LGPL. I haven't looked much at the one from Eureka.

Any comments regarding PCI compliance, ease of use, compatibility with Altera parts, etc.?

Reply to
Brian McFarland

"Brian McFarland" wrote in message news: snipped-for-privacy@p79g2000cwp.googlegroups.com...

The easiest is the free PCI target from Lattice - a free download.

I have used it on more than 5 different boards, Altera and Xilinx based. It has almost always worked first time - just set the pin constraints and that's it.

there are other free alternatives as well, but the lattice pci target is the simplest

Antti

Reply to
Antti Lukats

I've used the opencores PCI core on an Altera Cyclone II (EP2C35) dev board.

No implementation/compatibility issues at all really. Nice if you have wishbone peripherals you want to hook up on the back end - very easy to get going. Had opencores DMA mastering over PCI too.

Performance is another issue. If you're looking to push PCI burst transfers to the limit, then it's not so straightforward. Not saying it can't be done, but it will take some effort.

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, 
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply to
Mark McDougall

The Altera PCI Compiler has an option to generate it as an Avalon peripheral, which then enables SOPC Builder to do the back-end work for you. This is as easy as it gets.

SOPC Builder Ready:

formatting link

SOPC Builder:

formatting link

Look in the PCI Compiler installation folder for the 'sopc_flow' and you will find a ready-to-use example.

Kind regards, Karl.

Reply to
Karl

Sorry Antti, but do you mean that Lattice has a downloadable PCI IP retargetable to any FPGA (i.e. RTL code)? If yes, could you post the link here?

Antti Lukats wrote:

Reply to
antonio bergnoli

antonio bergnoli wrote:

Sure, get it here:

formatting link

100% verilog source included!

Antti

formatting link

Reply to
Antti

Thanks!! And could you suggest a good starting point to learn the PCI standard? Better if it's free... :-)

Antti wrote:

formatting link

Reply to
antonio bergnoli

That's another thing I meant to ask about in my original post. Is there a noticeable performance difference between the different available cores?

This is my first experience with anything PCI-related, so I'm still not very clear on whether I can get away with a target, or if I'll need bus-mastering/DMA capabilities. I know that theoretically the max transfer rate of the bus is 133 MB/s with 33 MHz systems. The intended customer of this thing is going for low cost, so I'm not going to assume that it will be used with a computer that supports 66 MHz or 64-bit transfers. Ideally, I would like to be able to guarantee 54 MB/s with pretty much equal I/O rates (27 MB/s into and out of the device). Most of the time rates should be lower than that, but that's just about the max I could ever need. This card will probably be the only thing on the PCI bus of the PC it's plugged into, and the primary purpose of the PC will be to process the data to & from this device, so having moderately high CPU usage requirements is not much of a concern.
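
As a sanity check, the bandwidth budget above is easy to work out. This is just back-of-the-envelope arithmetic: the bus figures (32-bit bus, nominal 33.33 MHz clock) come from the PCI spec, and the 27 MB/s-per-direction target is the number from this post.

```python
# Back-of-the-envelope PCI bandwidth check.
# Bus parameters are PCI32/33 spec values; the 27 MB/s per-direction
# requirement is the figure from the post above.

PCI_CLOCK_HZ = 33.33e6       # nominal PCI32/33 clock
BYTES_PER_DATA_PHASE = 4     # 32-bit bus

peak_bw = PCI_CLOCK_HZ * BYTES_PER_DATA_PHASE    # ~133 MB/s raw ceiling
required = 2 * 27e6                              # 27 MB/s each direction

utilization = required / peak_bw
print(f"peak {peak_bw/1e6:.0f} MB/s, need {required/1e6:.0f} MB/s "
      f"({utilization:.0%} of the raw bus)")
```

So the target is roughly 40% of the raw bus capacity, before any protocol overhead is counted - which is why the bus-mastering question matters.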

Reply to
Brian McFarland

Hi Brian,

An important consideration regarding the PCI bus (say, in a desktop PC) is that you generally cannot make any guarantees about bandwidth. There are too many things outside of your control -- the O/S, other devices using the bus, the performance capabilities of the host bridge device...

That "133 MBytes/sec for PCI32/33" is the physical bus bandwidth -- how much data could possibly fly over the bus. It is not an indicator of what your device may realize in a typical system.

If you need performance, and your host is going to be x86 based with a commodity chipset (certainly not "perfect"), you absolutely must include some kind of bus mastering capability. For all practical purposes, it will be very difficult to get any kind of data burst initiated by the CPU to your device target. For that reason, a target will not suffice unless you are building something that only needs a dribble of data -- a standard serial port, or an LED flasher. :)

Eric

Reply to
Eric Crabill

I am no lawyer, but it is my understanding that the Lattice reference design is intended for use on Lattice devices. The license agreement says something like: "for the sole purpose of programming Lattice programmable logic devices."

If you use the Lattice PCI reference design, it sounds like you should use the LatticeECP2 device or another Lattice FPGA, a list of which you can find here:

formatting link

> but do you mean that at lattice they have a downloadable pci ip

formatting link

Reply to
bart

"bart" wrote in message news: snipped-for-privacy@p79g2000cwp.googlegroups.com...

Hi Bart,

well yes -- everyone should read the license of course.

I would gladly use that Lattice PCI core on Lattice boards, and hopefully one day I will - but as I don't have any Lattice PCI FPGA boards, I have evaluated the Lattice PCI core on different PCI boards.

[snip - self censoring, the deleted text goes to lattice in private]

All the use of it I have ever done is initial FPGA board testing - nothing more.

I have never considered using it in any products based on non-Lattice silicon (because of the license)

But the fact that this core is available brings some attention to Lattice when someone mentions it. I should have mentioned the license clause, of course - but the link I provided did land on the license agreement and not a direct download.

Antti

Reply to
Antti Lukats

The key idea is that most CPUs (or host bridges) only transfer one word (32 bits) per transaction when reading/writing to a PCI target.

Look at the timing diagrams for simple target transfers. How many cycles do they take?

54 is 40% of 133, so you have to do a whole transaction in 2 cycles. It just isn't going to happen.
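
The cycle budget behind that argument can be made explicit. This is a sketch using the numbers already in the thread (54 MB/s target, 32-bit/33 MHz bus): sustaining the target leaves only ~2.5 clocks per 4-byte word on average, while even an ideal single-word target transaction needs an address phase, a data phase, and a turnaround cycle - more than that, before any wait states.

```python
# Clocks-per-word budget for sustaining 54 MB/s on a 32-bit/33 MHz
# PCI bus. Numbers are the ones quoted in the thread.

PCI_CLOCK_HZ = 33.33e6
target_bytes_per_s = 54e6

words_per_s = target_bytes_per_s / 4       # 13.5 M words/s needed
budget = PCI_CLOCK_HZ / words_per_s        # average clocks per word
print(f"average budget: {budget:.2f} clocks per 32-bit word")
```

With single-word transactions taking several clocks each, only burst transfers (i.e. bus mastering) can fit inside that budget.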

My only one-word-per-transaction observation is several/many years old. Things might be better now. I wouldn't bet on it without seeing a nice picture on a scope.

Reply to
Hal Murray

That's the raw rate based purely on the signaling. Once you add the PCI protocol on top of that, which includes arbitration phases etc, IIRC the maximum *theoretical* data transfer rate is more like 120MB/s.

Having said that, I've worked on a design which included an Altera PCI core and a DMA bus master transferring large chunks. In a *desktop* PC running Win2K, the *sustained* throughput was around 100MB/s. FWIW IIRC the same hardware under Linux didn't get much over 80MB/s, but that's another story.

What type of transfers are you looking at? Will it be PIO (single byte/word/dword) transfers? Initiated by the PC? Or DMA, initiated by the card? Is the data isochronous 27MB/s? Or can it be buffered and transferred periodically in large chunks?

Your answers to the above questions will determine the suitability or otherwise of any potential solution.

As Eric pointed out elsewhere in this thread, typically host PCI chipsets won't burst more than a single cache line. And even that requires attention to how you configure your PCI memory space.

Single host reads can be very inefficient, as often the target must disconnect whilst the data is fetched. In the meantime, the bus is free for other peripherals to grab. For example, with the opencores PCI core (for which *all* reads are posted), we're seeing reads disconnected *twice* before the third succeeds, albeit on a non-Intel platform where the host can be quite slow to retry.

In a nutshell, if you're bus-mastering DMA in reasonable chunks, then 54MB/s should be easily achievable. If not, then you need to characterise your transfer profile before I could comment any further.

Regards,

Reply to
Mark McDougall

I just found out how much the license for the Altera core costs. Considering that production quantity will be relatively low, I would like to avoid using it if this is possible/practical. Does the opencores one do bus mastering well enough to achieve the kind of transfer rates I'm hoping for?

Mark McDougall wrote:

It's an I/O interface that will be constantly receiving and transmitting at 270 Mbps in both directions using 8B/10B encoding, which means that potentially the card could transmit and receive 27 MB/s. However, in this particular application, the rate of the real data will be closer to just 2 MB/s. If there's latency due to buffering & block transfers, it's probably not a concern as long as I can have large enough FIFOs on the FPGA that they never become empty while I'm filling the PC-side buffer. The whole reason for this interface is to modify the input data stream and send it back out, and the delay caused by CPU time is probably going to be considerably more - although I'm not sure how much processing it will take, because our customers are writing the software that does it and I have no direct way to contact their developers.
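
The line-rate arithmetic above checks out: 8B/10B carries 8 payload bits in every 10 line bits, so a 270 Mbps line rate works out to exactly the 27 MB/s per direction quoted.

```python
# 8B/10B payload arithmetic for the 270 Mbps link described above.

line_rate_bps = 270e6
payload_bits_per_s = line_rate_bps * 8 / 10     # 216 Mbit/s of payload
payload_bytes_per_s = payload_bits_per_s / 8    # 27 MB/s per direction
print(f"{payload_bytes_per_s/1e6:.0f} MB/s per direction")
```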

Reply to
Brian McFarland

It's a bit difficult to give an accurate answer with the above-mentioned "requirements specification". ;)

It's going to depend on how much latency you can tolerate. If you were able to wait for a few KB to be accumulated on each side before transferring, you'd have absolutely no problem achieving your 27MB/s in each direction. Of course, that introduces large delays in your stream.

OTOH if the application isn't tolerant to large latencies and, for example, you needed to do single 32-bit PIO transactions, then we've seen fetches from *memory* on the back-end of the PCI core take up to 20 PCI clocks to complete on the host (shave a few clocks off if your data is in a register or read-ahead FIFO, for example). That brings your throughput down to around 5MB/s - total!
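
The throughput collapse in the PIO case follows directly from that 20-clock figure. This sketch uses only the numbers quoted above (20 PCI clocks per 32-bit read, 33 MHz bus) to get an ideal upper bound; retries and bus sharing pull a real system further down, toward the ~5MB/s observed.

```python
# Upper bound on single-dword PIO throughput, using the ~20 PCI clocks
# per 32-bit read quoted in the post above.

PCI_CLOCK_HZ = 33.33e6
CLOCKS_PER_READ = 20        # figure from the post
BYTES_PER_READ = 4

upper_bound = PCI_CLOCK_HZ / CLOCKS_PER_READ * BYTES_PER_READ
print(f"PIO upper bound: {upper_bound/1e6:.1f} MB/s")
```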

There's a lot of latency introduced when pushing data through the PCI core FIFOs in each direction. Obviously if you can stream large chunks that latency becomes insignificant w.r.t. throughput. PCI retries on posted reads also add to the equation.

Nutshell - you need to work out *exactly* what latencies you can tolerate.

Regards,

Reply to
Mark McDougall

In that case my response would be that I don't have sufficient detail in the requirements to propose a solution.

Regards,

Reply to
Mark McDougall

Well, I gave up on trying to find free (and legal :-/) info about PCI online and ordered the Mindshare PCI book. It hasn't arrived yet, but I began just writing my own PCI module. I was kinda hoping to be able to do this project w/o getting too deep into the specs of PCI, but I don't think that's going to happen.

Mark McDougall wrote:

Reply to
Brian McFarland

I'd suggest hooking up the opencores PCI core if you have available hardware just to get a feel for what's involved. Once you get the gist of how it hangs together it's really quite simple to hook up something to the back end. The DMA controller shouldn't be that difficult either (although I realise I'm speaking with the benefit of hindsight).

BTW I'd suggest you look into CDBG from probo.com when bringing up a PCI core.

From there, you could invest a little time in benchmarking your application. Even if you end up deciding that the opencores PCI core is not the way to go, you've no doubt (a) learned something about PCI and (b) established a performance testbench for your final solution.

BTW the Mindshare book is certainly going to be a big help in ramping up on PCI.

IMHO, you're going to need bus-mastering DMA to get 54MB/s out of PCI, and that's a *lot* of effort to do from scratch! Just verifying the design is going to be a mammoth effort - take a look at the size of the testbench module in the opencores PCI design to get an idea!!!

Regards,

Reply to
Mark McDougall

If you decide to use CDBG from probo.com to play with your PCI design, you should note that the version on the web only works with Win98 and earlier.

I haven't released a Win2K++ version for free usage yet, but we do have one (and a Linux version as well) that are used internally.

CDBG has two modes of operation:
- one mode has the traditional peek/poke commands
- one mode is a C interpreter that lets you write C code without screwing around with DPMI, etc.

ALSO - I suspect it will be next to impossible to get 54MB/sec transfer rates without bus mastering.

John Providenza

Mark McDougall wrote:

Reply to
johnp

Yes, but if you're constantly re-configuring your FPGA w/PCI core, then you'll be re-booting constantly as well. The most time-efficient way I've found to do bring-up of a PCI core, or even back-end peripherals, is to have *DOS* booting off your HDD and run CDBG from there.

Yes, the C interpreter is quite nice for 'scripting' tests. I brought up the opencores IDE controller with opencores DMA and opencores PCI using C code to do both PIO and DMA IDE accesses.

The nice part was the fact that I could transcribe the C code almost line-for-line into Verilog for the equivalent HDL testbench routines.

And for further re-usability, I ended up using the OCIDE core in a NIOS-based design (no PCI), which allowed me then to use the original CDBG C code almost unchanged as test routines for that project too!

So thanks John, you've saved a lot of time for me at least!

Regards,

Reply to
Mark McDougall
