bare-metal ZYNQ

- John Larkin

Wed, Jun 12, 2019 11:32 PM

Assume I'm a pointy-haired boss trying to help one of my guys.

I think that...

The Xilinx ZYNQ (FPGA+ARM on a chip) has a hard boot loader. It figures out what the boot device is (serial flash, SD card, whatever) and reads in a secondary boot program, which the Xilinx tools provide as part of a build. That loader then reads the entire FPGA config bitstream into DRAM, and sets up a giant DMA transfer to configure the FPGA. That's all standard in the tools.

But what if there's no DRAM? My guy thinks he will have to write his own ARM application, which is booted at load time, and inside that would be a routine to read from the boot media and configure the FPGA in chunks, using a small uP RAM buffer, maybe DMA or maybe not. He figures he could do that in a few days.
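The chunked approach described above can be sketched as a plain loop. This is a minimal illustration in Python, not real Zynq code: `configure_in_chunks`, `fake_config_port`, and the 4096-byte buffer size are hypothetical names and values chosen for the example, and an in-memory stream stands in for the boot media. On real hardware the write callback would be a small DMA or programmed-I/O transfer into the configuration interface.

```python
import io


def configure_in_chunks(boot_media, write_to_config_port, buffer_size=4096):
    """Stream a bitstream from boot media to a configuration port.

    Only `buffer_size` bytes are held in RAM at any time.
    Returns the total number of bytes pushed.
    """
    buffer = bytearray(buffer_size)      # stands in for a small on-chip RAM buffer
    total = 0
    while True:
        n = boot_media.readinto(buffer)  # read the next chunk from flash/SD
        if not n:                        # 0 bytes read means end of bitstream
            break
        # One small transfer per chunk (DMA or programmed I/O on real hardware).
        write_to_config_port(memoryview(buffer)[:n])
        total += n
    return total


# Hypothetical stand-ins for the hardware: a fake bitstream and a fake port.
fake_bitstream = bytes(range(256)) * 40 + b"tail"  # length is not a multiple of 4096
received = bytearray()
chunk_sizes = []


def fake_config_port(chunk):
    chunk_sizes.append(len(chunk))
    received.extend(chunk)


pushed = configure_in_chunks(io.BytesIO(fake_bitstream), fake_config_port)
```

The final chunk is shorter than the buffer, so the loop has to pass the actual byte count along with each transfer instead of assuming full buffers.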

Seems to me that Xilinx should support booting up a ZYNQ without DRAM. Does the tool chain support that (people here think not) or is there some loader already coded somewhere?

(Our support, through a distributor, isn't very good.)

Thanks

--

John Larkin         Highland Technology, Inc 
picosecond timing   precision measurement  

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com

- Gerhard Hoffmann

Thu, Jun 13, 2019 12:07 AM

On 13.06.19 at 01:32, John Larkin wrote:

A Zynq without RAM is like a car without tyres.

Maybe you can peek in the sources for the Red Pitaya; it is also based on a Zynq and there are a number of Linuxes available for it. I have one, but have avoided digging that deep.

If the loading interface is even remotely similar to that of other Xilinx FPGAs, it should be easy to replace the one large DMA transfer with many small ones.

regards, Gerhard

- John Larkin

Thu, Jun 13, 2019 12:35 AM

There's enough SRAM internal to the chip for many applications.

Yes, but I was thinking that someone has already done the work.


- Jan Panteltje

Thu, Jun 13, 2019 6:14 AM

On a sunny day (Wed, 12 Jun 2019 16:32:35 -0700) it happened John Larkin wrote:

That thing runs Linux? Doesn't Linux use the DRAM?

If you're not using Linux and DRAM, then why not a simpler, cheaper FPGA board?

- Allan Herriman

Thu, Jun 13, 2019 12:45 PM

Our Zynq 7 boards have DRAM (natch) but we choose to load the PL (FPGA fabric) after boot rather than as part of the FSBL.

What you want to do is possible, but you won't get much support from Xilinx. You will also lose some of the built-in security features. Considering the costs of DRAM (plus the extra PCB layers you will need) you might be better off putting it on the board. I'm assuming you don't have really large volumes, of course.

Allan

- Lasse Langwadt Christensen

Thu, Jun 13, 2019 1:02 PM

On Thursday, 13 June 2019 at 01:34:11 UTC+2, John Larkin wrote:

formatting link

- John Larkin

Thu, Jun 13, 2019 3:09 PM

I said "bare metal."

Separate FPGA and CPU chips are an option that we use a lot already, but that needs a chip-to-chip parallel interface that uses a lot of balls, or a slow SPI link.

The NXP uP that we usually use for this combo, LPC3250, looks to be EOL, so we're looking for a next-generation product platform.

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics

- John Larkin

Thu, Jun 13, 2019 3:12 PM

Cool. I'll pass that on to the guys.


- David Brown

Thu, Jun 13, 2019 3:31 PM

Consider NXP's i.MX RT family. They have Cortex-M7 CPUs at about 600 MHz, and have quad SPI or octal SPI links. These are typically for flash for booting, but you could use one link for the flash and one for a high-speed interface to the FPGA.

- Michael Kellett

Thu, Jun 13, 2019 3:42 PM

May not be relevant right now, but you could look at the ST M7s as a step beyond the NXP part: STM32H7, 400 MHz, lots of RAM and flash on chip, 64-bit FPU, and multi-bit SPI ("dual quad SPI" in ST speak), which would ease that uP-to-FPGA bottleneck a bit. They claim a max 133 MHz clock on it, so with 8-bit data that's quite quick. The downside is that the FPGA will need to pretend it's a flash memory, but that may not be too hard.
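As a rough check on "quite quick", the raw peak rate of such a link is just clock times bus width. The sketch below assumes single-data-rate transfers and ignores command, address, and dummy-cycle overhead, so real throughput would be lower; the function name and its parameters are made up for illustration.

```python
def bus_throughput_bytes_per_s(clock_hz, data_bits, transfers_per_clock=1):
    """Raw peak throughput of a parallel or multi-bit serial bus, in bytes/s."""
    return clock_hz * data_bits * transfers_per_clock / 8


# 133 MHz clock, 8 data lines (two quad-SPI banks in parallel), one transfer per clock.
peak = bus_throughput_bytes_per_s(133_000_000, 8)

# The same clock on a single quad-SPI bank (4 data lines), for comparison.
quad = bus_throughput_bytes_per_s(133_000_000, 4)

print(f"dual quad: {peak / 1e6:.1f} MB/s, single quad: {quad / 1e6:.1f} MB/s")
```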

MK

--
This email has been checked for viruses by AVG. 
https://www.avg.com

- Jan Panteltje

Thu, Jun 13, 2019 5:35 PM

On a sunny day (Thu, 13 Jun 2019 08:09:16 -0700) it happened John Larkin wrote:

OK, I just read through the 80-page datasheet of the LPC3250. While reading, I was thinking about the chip in the Raspberry Pi (Broadcom BCM2835 to BCM2837), but that has no ADC, though it does have HDMI out. There is an FPGA plug-in board for the Raspberry Pi.

It is a pity that so many things go EOL in a short time, OTOH it is a throw away society. And very strong competition does kill some products.

It all depends on what you want to do.

A Raspberry Pi plus some external ADC: $35 + ??. A VERY powerful platform, really: GCC compiler, Linux, lots of I/O, USB, Ethernet, HDMI, analog video out, analog audio out, GPIO for extra boards, SD card, camera interface, logic-level serial, PWM, PLL frequency generators. And although there is a new model every year, the basics stay more or less the same: quad-core now, lots of DRAM, availability...

Depends on what you call 'bare metal' these days.

formatting link

I have several in use...

It is sort of moving to an ever higher level of integration.

- Dimitrij Klingbeil

Thu, Jun 13, 2019 10:05 PM

The chip-chip parallel interface is quickly becoming a chip-chip serial interface, now that most higher-end embedded CPUs have PCIe.

NXP i.MX series has many variants with PCIe. So do many DSPs from TI.

It looks like nowadays PCIe gets to be the go-to interface both between CPU and DSP and between CPU (or DSP) and FPGA. Few balls and high speed.

For CPU-DSP, the application CPU is typically the root complex and the DSP(s) is (are) typically the endpoint(s). The endpoint side can send interrupt packets when it has data (or otherwise requires attention).

Regards Dimitrij

- Richard Damon

Fri, Jun 14, 2019 2:52 AM

It has been a while since I used that chip, but my recollection is that what you are describing is the two-stage boot loading process. There is a First Level Boot Loader put into the internal flash of the device that loads a program into the internal SRAM of the part from a limited selection of sources (mostly limited to what you could load from with a simple boot loader). This program is often just a Second Level Bootloader, but could also be a simple 'bare metal' program. The Second Level Bootloader generally had the ability to configure DRAM and load the program it was loading into it, but it did not need to.

The other task normally done by the Boot Loader was to load the configuration data into the FPGA, but that could also be put off till later.

When booting to Linux, the Second Level Boot Loader actually just loaded GRUB, and then GRUB loaded Linux and started it. GRUB and Linux required DRAM, and much of the documentation assumes going to Linux, but the tools did support other configurations.

- Michael Kellett

Fri, Jun 14, 2019 8:11 AM

Jan, do you know of a good, simple and fast way to get the Pi to exchange data with an adjacent chip (uP or FPGA)? Using USB or Ethernet doesn't count as simple (or very fast for small data packets).

MK


- Theo

Fri, Jun 14, 2019 9:14 AM

Hmm... it's not the same, but on the Intel Cyclone V parts (and others I think) there's just a FIFO. You can push in bitstream words, and configuration only happens when the full bitstream is provided and it meets some kinds of checks.

The Zynq appears to drive such a process via DMA - the PCAP in chapter 6 here

formatting link

It doesn't say as much, but I wonder if it's possible to transfer in chunked DMA. The Linux driver probably has to chunk anyway, given the RAM buffer you want to transfer may not be in contiguous physical memory.
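To illustrate the non-contiguous case: a buffer scattered over physical pages can be reduced to a short list of (address, length) transfers by merging pages that happen to be adjacent. The sketch below is only a model of that idea and is not taken from the Linux driver; `coalesce_pages`, the 4096-byte page size, and the example addresses are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def coalesce_pages(page_addrs, total_len, page_size=PAGE_SIZE, max_chunk=None):
    """Turn a list of physical page addresses into (addr, length) DMA chunks.

    Physically adjacent pages are merged into one chunk, optionally capped
    at `max_chunk` bytes. The last page may be only partially used.
    """
    chunks = []
    remaining = total_len
    for addr in page_addrs:
        if remaining <= 0:
            break
        length = min(page_size, remaining)
        remaining -= length
        if chunks:
            last_addr, last_len = chunks[-1]
            fits = max_chunk is None or last_len + length <= max_chunk
            if last_addr + last_len == addr and fits:
                chunks[-1] = (last_addr, last_len + length)  # extend previous chunk
                continue
        chunks.append((addr, length))  # start a new chunk
    return chunks


# Example: six pages in three physically contiguous runs; the last page holds 100 bytes.
pages = [0x10000, 0x11000, 0x12000, 0x40000, 0x41000, 0x90000]
chunks = coalesce_pages(pages, total_len=5 * PAGE_SIZE + 100)
for addr, length in chunks:
    print(f"DMA from {addr:#x}, {length} bytes")
```

Each resulting chunk would then be issued as its own transfer, which is the same pattern a DRAM-less loader would follow with a single small buffer.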

As to support for this in the tools, without DRAM you're probably running a custom OS, so there's a limit to what they can do.

On the Arria 10 one 'normal' boot process is: ROM bootloader reads SD card, starts u-boot, which writes FPGA bitstream then boots Linux. Now you mention it, I think u-boot must be running without DRAM because the DRAM pins are only configured by the bitstream. So it could be worth looking to see if a similar process works on Zynq.

(instead of SD card, QSPI and other storage is also selectable)

Theo

- Jan Panteltje

Fri, Jun 14, 2019 9:44 AM

On a sunny day (Fri, 14 Jun 2019 09:11:36 +0100) it happened Michael Kellett wrote:

Sure, first for FPGA there is this:

formatting link

this connects via GPIO.

I notice a lot more big names now have FPGA stuff for the Raspberry Pi. Just google 'raspberry FPGA board'.

Depending on your definition of 'fast' with a micro, the Pi has logic-level RS232 via /dev/ttyAMA0, also hardware SPI (or software SPI of course), and I2C the same.

Here it is used as a large LED matrix display driver:

formatting link

You can also use 8 bits from GPIO and do byte level transfers, a typical example of 'fast' is this:

formatting link

that also uses a FIFO hardware buffer to get a smooth timed data stream even during OS task switching.

8 bits (or more) transfer with handshake will work with most micros.

Here the Pi as JTAG programmer:

formatting link

Stepper motor driver, lots of other i2c chips..

formatting link

USB is slow on my older Raspberries at least, ethernet is OK. I would prefer ethernet in some applications because of the galvanic isolation.

What is simple? Everything is simple once you have dunnit.

- Rick C

Fri, Jun 14, 2019 4:54 PM

I looked at this some time ago with the original rPi and found there was something like a parallel, DMA-driven port on the header. I don't recall if it was 4 bits or 8 bits, but it had potential for speed. I seem to recall many discouraged me from trying to use it, so I didn't pursue the effort. It may have been a lack of support for the DMA control in the software. I would have been using it to do timer-based accesses.

Looking at the software available for the rPi at that time, I was not encouraged to dig into the low-level software. The BeagleBone with the two dedicated I/O processors would have been a better choice for the work I was considering. Probably still is.

I think the rPi is used for a lot of things where other single-board computers would be better, simply because that is the first box of salt on the shelf.

--

  Rick C. 

  - Get 1,000 miles of free Supercharging 
  - Tesla referral code - https://ts.la/richard11209

- Rick C

Fri, Jun 14, 2019 5:15 PM

I worked at a company where I designed a board with a Xilinx FPGA on it. This company had a separate FPGA group under software. The software and FPGA people were trying to boot the FPGA and couldn't make it work. My boss told me to go down and help them.

When I got there, there was an FPGA guy who had been a disti FAE for Xilinx before this job, his newbie FPGA person, a much more senior FPGA consultant, a manager and a software guy writing the code to download the chip. I had to rather butt my way in to get them to talk to me rationally, since there was a certain level of frustration at this point.

I told them there was a short list of things you had to do right to get the chip configured and there would be no symptoms to tell you what was wrong. So do those few things right and it will give you a DONE flag and work. We went through the list and they said they had tried all of those things... I asked if they had done them all at one time. They hadn't... so they did them all together and it worked. I asked if that was everything they needed, they said yes, so I left. The manager was flat out floored that I had fixed it so quickly. But anyone who had actually worked at the board level to configure a Xilinx part would have known this.

So the whole group, including the former Xilinx FAE, had never actually brought up a board before.

Later the software guy made a huge stink because of the 100 kHz noise on the output of the class-D amplifier. In reality his voice compression wasn't working and he never even tried running a simple audio output to test that everything other than the vocoder was working.

People who haven't done much find very simple things complicated.


- Phil Hobbs

Fri, Jun 14, 2019 5:49 PM

I tend to regard things I learned long ago as being simple, even when they aren't. I have to watch that tendency when explaining things.

Cheers

Phil Hobbs

(Who is working on how to explain things to a jury in 10 days or so.)

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC / Hobbs ElectroOptics 
Optics, Electro-optics, Photonics, Analog Electronics 
Briarcliff Manor NY 10510 

http://electrooptical.net 
http://hobbs-eo.com

- Michael Kellett

Sat, Jun 15, 2019 12:14 PM

Thanks for the stuff Jan, I don't think I explained quite what I meant by fast (although I did say that Ethernet wasn't fast enough).

So fast, for me and for the applications I have in mind, is:

Round trip < 1 us (less than 50 ns preferred). This is easy to do with FPGA memory mapped to the uP and pretending to be a RAM, but I don't see how to do it on a Pi.

Sustained data transfer rate > 100 MiB per second in both directions simultaneously.
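To put numbers on those two targets, the short calculation below works out how many transfers per second an 8-bit bus would need for 100 MiB/s in one direction, and how many bytes have to be in flight per round trip to sustain that rate (rate times latency). The function names are made up for illustration, and the figures are raw arithmetic with no protocol overhead included.

```python
MIB = 2**20


def min_transfer_rate(bytes_per_s, bus_width_bits):
    """Bus transfers per second needed to carry `bytes_per_s` on a bus of this width."""
    return bytes_per_s * 8 / bus_width_bits


def bytes_in_flight(bytes_per_s, round_trip_s):
    """Bytes that must be outstanding per round trip to sustain the rate."""
    return bytes_per_s * round_trip_s


target = 100 * MIB                           # 100 MiB/s in one direction
rate_8bit = min_transfer_rate(target, 8)     # transfers/s on an 8-bit bus
at_1us = bytes_in_flight(target, 1e-6)       # with a 1 us round trip
at_50ns = bytes_in_flight(target, 50e-9)     # with a 50 ns round trip

print(f"{rate_8bit / 1e6:.1f} M transfers/s on 8 bits")
print(f"{at_1us:.1f} bytes per 1 us round trip, {at_50ns:.2f} bytes per 50 ns round trip")
```

With a 1 us round trip, one handshake per byte cannot reach the target rate; the interface would need bursts on the order of a hundred bytes per transaction, or a much shorter round trip.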

You can do this kind of stuff with the PRUs on the BeagleBoards, but it would be nice if it were possible on a Pi.

Simple means (in this context) not using lots of other fancy chips over and above the FPGA and not needing to use a GHz serial interface (although if the Pi had one spare that I don't know about, I might have a go).

I had wondered if the camera or audio interfaces might be repurposed.

MK
