So, does anyone here have what it takes to improve the SPI driver on the
rPi? I'm not certain that it will ever suit the needs of control
systems, but certainly there is a lot of room for improvement over the
existing driver. I would be interested in designing a board to go with
the rPi, but it would need a better driver.
Well, I'm bit-banging SPI from userland, as the kernel driver
didn't exist when I was designing my app, and I haven't updated
it for a long time. Bit-banging SPI from userland works fine.
Out of curiosity, what's wrong with it that you want to fix?
[email address is not usable -- followup in the newsgroup]
The only issue with it is the transfer start-up latency.
Data transfer is fine - once it's going. The latency affects lots of
little transfers - such as reading an ADC at more than 10K samples/sec.
This table has the details:
however, don't trust the latency columns - I'm not 100% sure they're anywhere
near accurate - the code is at
I do know that I wrote a userland program to transfer audio samples
straight from microphone to headphones on an AMD64 Linux station. It coped
pretty well apart from when the program started up - then the program
spewed out warnings about underruns and overruns. I even measured the
time taken to transfer the data. Utterly trivial. It's latency that gets
you - sometimes it can be several hundred microseconds before you get
your audio samples. And that's with a decent onboard sound chip, which
is a lot more than an A2D.
So something in the 'load and run a program' path is occupying high-level
interrupts for a long old time.
Everything you read in newspapers is absolutely true, except for the
rare story of which you happen to have first-hand knowledge. -- Erwin Knoll
The timing issues. First, I've been told the existing examples only run
at a few kB/s transfer rate. More importantly, when you are using an ADC
for many apps, it is important to trigger the conversion on a regular
period. Someone was looking for a 48 kHz sample rate which would have
needed to be triggered at a regular time to within better than a
microsecond. With any MCU not running an OS this would be child's play.
If the hardware supported it, even under Linux it shouldn't be hard.
I can't yet say for sure if the hardware supports this or not.
I was reading more in the peripherals handbook and SPI can use DMA. I
haven't found a link to a timer quite yet. I also haven't found a way
to trigger the ADC from a clock and then have the SPI transaction also
triggered by the clock... unless the clock creates an interrupt and does
the SPI transfers using PIO. That could work at 48 kHz I expect. It
would also give the best timing accuracy.
The DMA engines in the SoC can be linked to many peripherals in the chip,
but the current Linux SPI kernel driver does not use the DMA engine.
(I've just checked the source).
It works like a UART with a FIFO - it copies data from memory into the FIFO,
waits for the nearly-empty interrupt, lather rinse repeat.
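That fill/wait/refill shape can be sketched as a userland simulation. This is a toy model, not the BCM register interface: the 16-byte depth matches the real SPI FIFO, but fifo_push/fifo_drain are made-up stand-ins for the hardware, and the "nearly-empty interrupt" is just the drain call.

```c
#include <string.h>

#define FIFO_DEPTH 16   /* the BCM2835 SPI FIFO is 16 bytes deep */

/* Simulated FIFO state -- stand-ins for the real TX FIFO registers. */
static unsigned char fifo[FIFO_DEPTH];
static int fifo_len = 0;

static int fifo_push(unsigned char b) {
    if (fifo_len == FIFO_DEPTH) return 0;   /* full */
    fifo[fifo_len++] = b;
    return 1;
}

/* Stand-in for the hardware draining the FIFO onto the wire. */
static int fifo_drain(unsigned char *out, int max) {
    int n = fifo_len < max ? fifo_len : max;
    memcpy(out, fifo, n);
    memmove(fifo, fifo + n, fifo_len - n);
    fifo_len -= n;
    return n;
}

/* The UART-like transfer loop: copy bytes into the FIFO, let the
 * "hardware" drain it, refill when the nearly-empty point is reached. */
int spi_transfer(const unsigned char *buf, int len, unsigned char *rx) {
    int sent = 0, received = 0;
    while (received < len) {
        while (sent < len && fifo_push(buf[sent]))
            sent++;                        /* fill until FIFO full or done */
        /* In the driver this wait is the nearly-empty interrupt; here the
         * "hardware" drains up to 12 bytes per pass. */
        received += fifo_drain(rx + received, 12);
    }
    return received;
}
```

The real driver does this refill from interrupt context, of course; the point is only that bytes move a FIFO's worth at a time via PIO rather than via the DMA engine.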
I do not believe this is the cause of latency as I've verified that once
a transfer has started then it can continue at full SPI clock speed for
the duration of the transfer. This non-DMA method will consume a few
more CPU cycles, though, but I've not noticed it on the longer transfers
I've been doing (to an LCD display).
Looking at the code more, it seems to me that the way the transfer gets
kicked off is to enable the transmitter with no data, then use the
end-of-transfer interrupt to start the very first transfer. This may
well be a contributing factor to the latency when doing small transfers
- there is a comment in the code to the effect that trying to fill the
FIFO first before starting the transfer doesn't work.
It would not surprise me if this was some sort of limitation in the SoC -
it would not be the first one I've encountered...
Link to the source:
Do it in a kernel module and you have the best chance of succeeding. My
experiments getting an interrupt into Linux userland work well, but are
limited to about 66KHz at 100% CPU usage (so 48KHz isn't going to leave
much headroom).
Also the trigger can be the CE line - it will go from high to low when
a transfer starts - that's supposed to be used by SPI devices to reset
themselves and synchronise their clock, and to enable the MISO pin on
the peripheral (usually wire-or'd with other SPI devices).
I've now forgotten what the original poster was trying to
achieve. Also had some email in the past few days from someone trying to
synchronise 6 ADCs to do some direction finding...
This is all stuff the Pi's SoC was probably never really designed to do
- I suspect that half the peripherals are on the SoC because they're
left-over from the last one, or some special feature that some other
customer wanted, so may have been implemented to "just work" for some
other application rather than being more general purpose for what the
Pi world is looking for...
But once upon a time they said a Pi would never drive "neopixels" and
now it does...
In this context I'm not sure what latency is and I don't think it
actually matters. What matters would be that the transfers happen with
a defined period. It doesn't matter so much when the first one happens
as long as the process continues with the specified interval between
transfers.
I tried looking at this source, but why do people have a love affair
with dim text? Comments can be colored differently without making them
hard to read... lol
More important is the low to high transition (end of CS or CE) which is
often used to clock the data into the "active" register to do something
in the peripheral. But it completely depends on the device. ADCs used
for signal processing (like the faster SAR devices) have a separate
convert signal which can be driven by a clock. Someone who was making
the rPi into an oscope did that at 10 MHz. But then he didn't have a
way to sync the data reads to the clock, lol.
Same task. His posts in another forum were the impetus for me to look
into this. To be honest, the more I dig the more I think I need to add
an interface device that lets the ADCs look like a FIFO. It would handle
all the detailed timing. The rPi would just need to keep up with the data.
But at this point I am curious about the combination of timer, DMA and
SPI. Or maybe SPI isn't the right interface and a parallel bus is the
way to go.
"Designed to do" often has little meaning in MCUs. They toss a bunch of
generic stuff together to address a market position and let the
developers figure out how best to use it.
It's the time from when your program says: transfer these 3 bytes over
the SPI bus (which at the same time reads 3 bytes in) and the call
returning to your program. If this time is the same as sending those 3
bytes over the bus then your effective rate is halved. Or worse if the
latency is higher.
If copying a million bytes in one operation then it's not an issue -
only when sending a very small number of bytes.
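To put rough numbers on it (assumed figures, purely illustrative): at a 1MHz SPI clock a 3-byte transfer spends 24 microseconds on the wire, so a 24 microsecond start-up latency exactly halves the effective rate:

```c
/* Effective bytes/sec for an n-byte transfer with a fixed start-up
 * latency, at a given SPI clock (8 clocks per byte). */
double effective_rate(int n_bytes, double latency_s, double spi_hz) {
    double wire_time = n_bytes * 8.0 / spi_hz;  /* time actually on the wire */
    return n_bytes / (wire_time + latency_s);
}
```

With no latency, 3 bytes at 1MHz gives 125000 bytes/sec; with a 24us start-up cost it drops to 62500. At a million bytes per transfer the same latency is lost in the noise.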
Here's a thought:
"Manually" bit-bang SPI. Not hard.
Have 6 SPI ADCs. Connect their MOSI pins together to one gpio pin on
the Pi.
Connect their clock pins together to one pin on the Pi.
Connect their CE lines together to one pin on the Pi.
Connect their MISO pins to 6 separate pins on the Pi.
So now you have 3 output pins, 6 input pins.
Make sure the clock & CE lines are the right polarity to start with.
(clock low, ce high I think)
Assert the CE line (take it low) then clock out the command that tells
the ADCs to start the sample. They will all get this at the same time
(to within the length of the wires!)
At the same time (as your program is wiggling the clock), shift in the
6 inputs into 6 variables. One bit out, one bit in - that's how SPI works.
And hey presto you've just done 6 concurrent readings.
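A sketch of that wiggle loop in C. The gpio_write/gpio_read names and pin numbers are placeholders, not a real API -- swap in whatever GPIO layer you use (mmap'd /dev/gpiomem, libgpiod, etc.). Shown here against externs so the shifting logic stands alone; it assumes an 8-bit command and 16-bit samples, MSB first, clock idle low, data valid on the rising edge.

```c
#include <stdint.h>

/* Placeholder pin numbers -- assumptions, not real BCM GPIO numbers.
 * MISO0..MISO5 are taken to be contiguous. */
enum { PIN_MOSI, PIN_SCLK, PIN_CE, PIN_MISO0 };

/* Placeholder GPIO accessors -- supply these from your GPIO layer. */
extern void gpio_write(int pin, int level);
extern int  gpio_read(int pin);

/* Clock one 8-bit command out on the shared MOSI while shifting one
 * 16-bit sample in from each of the 6 MISO lines. One bit out, six
 * bits in, per clock -- the concurrent readings come for free. */
void spi6_sample(uint8_t cmd, uint16_t sample[6]) {
    for (int i = 0; i < 6; i++) sample[i] = 0;

    gpio_write(PIN_CE, 0);                      /* assert CE (active low) */
    for (int bit = 15; bit >= 0; bit--) {
        /* Top 8 clocks carry the command; the rest clock out zeros. */
        gpio_write(PIN_MOSI, bit >= 8 ? (cmd >> (bit - 8)) & 1 : 0);
        gpio_write(PIN_SCLK, 1);                /* rising edge */
        for (int i = 0; i < 6; i++)             /* one bit in from each ADC */
            sample[i] = (uint16_t)((sample[i] << 1) |
                                   (gpio_read(PIN_MISO0 + i) & 1));
        gpio_write(PIN_SCLK, 0);                /* falling edge */
    }
    gpio_write(PIN_CE, 1);                      /* deassert CE */
}
```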
The clock speed will be limited by the basic GPIO software pin wiggle
speed - you're probably not going to clock it much faster than 5MHz.
Then you just need to accurately time the above operation for the
duration, storing values at a fixed period (e.g. 125 microsecs for 8KHz
sampling). You could crudely sit in a loop, work out the next time using
gettimeofday() and timeradd() on 125uS, then do the sample, then spin on
gettimeofday() until it's >= the nextTime (use timercmp() I think), and
repeat for the number of samples. It won't be as accurate as a hardware
timer but it might be good enough.
Then process the 6 arrays of sampled data and point an arrow in the
right direction.
The sampling jitter caused by scheduling nondeterminacy in Linux is the
problem, and it is all but unavoidable.
It's why I earlier suggested that the simplest way to get deterministic
timing is to do the sampling with a "bare metal" microcontroller, then read
the samples with the Pi user process.
The cost of this solution is negligibly more, and the improvement in
accuracy and simplicity is priceless. ;-)
Processors are cheap--use the appropriate one for each job.
-michael - NadaNet 3.1 and AppleCrate II: http://home.comcast.net/~mjmahon
Yes, but why add hardware if it is not needed. I'm not yet convinced
that the hardware in the BCM chip won't do the job. It has all the
components. It could just be a matter of figuring out how to use them.
Even if I add an MCU or better an FPGA, I still have to figure out how
to make it all work. Even on an MCU you can't do it all in software,
the timing just won't work well enough.
Are you saying the text is not a grey color in your browser? I suppose
different screens show it differently, but this just looks absurd to me.
Heck, I once refused to sign a contract because they wrote the
"small print" in, well... small, grey print. I don't get the rationale.
I didn't see the RAW button. I'll have to download the file and open it
in my text editor.