SPI Interface

It entirely depends what you mean by user level.

I was thinking 'at user privilege and part of a process whose parent is a user process'

Because that is what affects the real time response, or not.

Where the code actually resides is irrelevant.

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. – Erwin Knoll
Reply to
The Natural Philosopher

You wrote "in TCP IP user level libraries, not hard linked into the kernel."

That is plain WRONG.

Reply to
Rob

ISTR there was an OR in there, but since you seem sufficiently insecure as to want to be seen to win a USENET argument, I'm not contesting it further

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. – Erwin Knoll
Reply to
The Natural Philosopher

I don't understand you. Faster things will use hardware (DMA) because it is hard to do fast things in software, and doing so detracts from the "essential" resource, the CPU. It is the slow things that are usually done in software, because it often isn't worth the trouble of providing a hardware interface. Graphics in particular has an entire processing chip to handle it, which runs much faster than the CPU would at the same task.

Yeah, so?

--

Rick
Reply to
rickman

I agree with most of what you wrote. The "relatively little interference with bus throughput" is weasel-worded so no one can dispute it. Ignoring the "relatively little" part, I do disagree. Any transaction on the buses of a modern CPU will degrade performance. More transactions degrade performance more. That is why they still have dedicated graphics memory on higher-end PCs. The graphics unit can access that memory all it wants but, more importantly, it no longer interferes with the CPU's access to memory.

--

Rick
Reply to
rickman

I agree totally.

--

Rick
Reply to
rickman

Obviously you have not heard of the memory wall...


--

Rick
Reply to
rickman

I missed the split: burst-mode DMA is used for very fast transfers when it's beneficial to stop the CPU clock and slam the data across faster than PIO can do it.

OTOH, when each data item in the block only becomes available at intervals of several CPU clock cycles, you use DMA in cycle-stealing or transparent mode, because both allow the block to be transferred without the CPU and the tasks it is running needing to be aware of the transfer until it has ended. Which of the two modes is used depends on the capabilities of the hardware.

Nope: doing it in software is always more effort than using available DMA hardware. You have only two choices:

(1) no interrupts. This prevents the CPU doing anything else because the required logic looks like this:

while transfer incomplete
    if device status register = data available
        copy data item from data register to memory[address++]
end-while

This was formerly quite a common way to move data between RAM and floppy in 8-bit systems.
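
In C, against a hypothetical memory-mapped controller (the STATUS_REG/DATA_REG addresses and flag bits below are invented for illustration), the same loop might look like:

#include <stdint.h>

#define STATUS_REG (*(volatile uint8_t *)0x4000)  /* hypothetical status port */
#define DATA_REG   (*(volatile uint8_t *)0x4001)  /* hypothetical data port */
#define DATA_AVAIL 0x01u   /* status bit: a byte is waiting */
#define DONE       0x80u   /* status bit: transfer complete */

/* Polled (PIO) block read: spin until each byte is ready, then store it. */
static void pio_read_block(uint8_t *dst)
{
    while (!(STATUS_REG & DONE)) {        /* while transfer incomplete */
        if (STATUS_REG & DATA_AVAIL)      /* if data available */
            *dst++ = DATA_REG;            /* copy to memory[address++] */
    }
}

While this runs the CPU can do nothing else, which is the point being made.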

(2) use interrupts. This requires the hardware to be capable of raising an interrupt as each data item arrives, and for the CPU etc. to be fast enough to allow the interrupt handler to wake up, move the data item to RAM and exit before the next data item arrives at the device and raises the next interrupt. If the interrupt is missed the data item is lost, corrupting the data block being written to RAM.

Note that this will take MUCH longer to deal with each data item than sitting in a tight loop waiting for data to arrive. Apart from needing more instructions, the interrupt will always push some registers onto the system stack, and exiting from the interrupt handler will pop them off the stack and back into the registers.

If you don't use DMA there are no other options available for servicing a block transfer device.

Yes: a burst-mode DMA transfer.

To bring matters back on topic, the RPi hardware uses DMA to handle almost all multi-byte transfers. Its DMA controller can operate up to eight DMA channels in both burst and cycle-stealing modes, so it is unlikely that an RPi OS would do programmatic block i/o: why program it that way when it can use DMA burst mode to move data blocks to fast devices such as the GPU, and DMA cycle-stealing mode to talk to slower devices such as USB-attached disks or the network interface (which attaches via a USB hub)?

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |
Reply to
Martin Gregorie

Uh, DMA is not the only way to transfer data. DMA requires hardware. If it is available it can be used. If it is not you do PIO.

I don't get what your point is.

You are waxing long over a very simple matter. DMA is important when moving larger blocks of data at high rates because it is often not practical to do so with the CPU. Too much horsepower gets used moving the data.

PIO has minimal impact on the CPU when it involves low data rates. The spin loop you describe can take up very little CPU resource if done in a low-priority task which is interrupted to run other tasks. A good example is a UART: at 9600 bps the bytes come in at about 1 kHz (9600 bps ÷ 10 bits per character ≈ 960 bytes/s). No reason whatsoever to tie up a DMA channel for this.
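
As a sketch of that low-priority polling idea (the uart_data_ready()/uart_read_byte() helpers are hypothetical stand-ins for the actual device access):

#include <sched.h>
#include <stdint.h>
#include <sys/resource.h>

extern int     uart_data_ready(void);   /* hypothetical: RX byte waiting? */
extern uint8_t uart_read_byte(void);    /* hypothetical: fetch it */

int main(void)
{
    static uint8_t buf[4096];
    unsigned head = 0;

    /* Lowest nice level: the spin loop only gets CPU time
       that no other task wants. */
    setpriority(PRIO_PROCESS, 0, 19);

    for (;;) {
        while (!uart_data_ready())
            sched_yield();               /* hand the CPU back while idle */
        buf[head++ % sizeof buf] = uart_read_byte();
    }
}

The catch, of course, is that the loop must be scheduled at least once per byte time, which at ~1 kHz is easy on an idle machine and riskier on a loaded one.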

Don't make this more complicated than it is please.

No, it's not on topic yet. You are still talking about the general and the topic is SPI. Do you have any knowledge about how to make the rPi SPI interface more useful? Right now it seems like it is useful for taking temperatures or something similar a few times a second and not a lot more.

The SPI driver seems to fall far short of what is needed for real time ADC collection.
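
For concreteness, user-space SPI on the RPi normally goes through the Linux spidev interface; a minimal transfer looks roughly like this (assuming the spidev driver is loaded and the device appears as /dev/spidev0.0; the two command bytes are a made-up ADC request):

#include <fcntl.h>
#include <linux/spi/spidev.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/spidev0.0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    uint8_t  mode  = SPI_MODE_0;
    uint32_t speed = 1000000;             /* 1 MHz SPI clock */
    ioctl(fd, SPI_IOC_WR_MODE, &mode);
    ioctl(fd, SPI_IOC_WR_MAX_SPEED_HZ, &speed);

    uint8_t tx[2] = { 0x01, 0x00 };       /* made-up ADC command */
    uint8_t rx[2] = { 0 };
    struct spi_ioc_transfer tr;
    memset(&tr, 0, sizeof tr);
    tr.tx_buf = (unsigned long)tx;        /* spidev takes buffer addresses */
    tr.rx_buf = (unsigned long)rx;
    tr.len    = sizeof tx;

    if (ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 0)
        perror("SPI_IOC_MESSAGE");
    else
        printf("reply: %02x %02x\n", rx[0], rx[1]);

    close(fd);
    return 0;
}

Every transfer is an ioctl round trip into the kernel, which goes some way to explaining why the latency and jitter are poor for fast periodic sampling.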

--

Rick
Reply to
rickman

Not necessarily that bad.

The minimum is: push the instruction pointer and jump to the service routine.

All the service routine has to do is push the accumulator or whatever, and possibly a memory pointer; load the accumulator from the port; put the value where the memory pointer is pointing and increment it, checking for wraparound; then set a flag for a user-level process to pull the data out of the buffer.

Then pop ACC, pop pointer, IRET.

as it were.

10-12 instructions, with 3 heap accesses and 6 stack accesses.
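
The same handler sketched in C (DATA_REG is a hypothetical port address; a power-of-two buffer turns the wraparound check into a mask):

#include <stdint.h>

#define DATA_REG (*(volatile uint8_t *)0x4000)  /* hypothetical data port */
#define BUF_SIZE 256u                           /* power of two */

static volatile uint8_t  buf[BUF_SIZE];
static volatile uint16_t head;
static volatile uint8_t  data_pending;  /* flag for the user-level process */

void port_isr(void)   /* entry/exit register save and restore happen around this */
{
    buf[head & (BUF_SIZE - 1u)] = DATA_REG;  /* load from port, store via pointer */
    head++;                                  /* increment; the mask handles wraparound */
    data_pending = 1;                        /* tell the consumer there is data */
}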

Sitting in a loop polling a port is very wasteful

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. – Erwin Knoll
Reply to
The Natural Philosopher

My point is that if DMA is available you use it for block transfers regardless of the interval between data items becoming available for transfer. You use DMA because it's easier and usually uses fewer CPU cycles than a programmatic transfer.

I'm going on about it because you clearly don't understand what's involved in programmatic transfers, and especially don't understand the overheads of the interrupt-driven approach.

Total bollocks. A wait loop occupies 100% of the CPU because it can't be interrupted. It can't be interrupted because of the risk that a data item will appear and control won't be given back to the loop until after the item has been lost, overwritten by the next item. Once such a loop has been interrupted there's no way you can know how long it will be before it gets control back, particularly if you're mad enough to put it in a low-priority task.

Think what happens if your low-priority loop gets preempted by a higher-priority task which has one or two higher-priority worker tasks: none of them may run particularly long, but between themselves they're quite capable of locking your low-priority loop out for long enough to make it miss one or more data items. Now imagine that while this is going on an online user or two start typing....

Your data transfer is toast.

UARTs running at that speed are usually handled via interrupts, which are quite fast enough for a single-threaded interrupt handler to service several UARTs.

However, it's also quite normal to run them at up to 115200 baud, at which point DMA may start to look a lot better than an interrupt-driven setup, particularly if the line is transferring exclusively packet data.

I'm not. Reality is often more complex than you appear to realise, particularly if you're trying to do all this in a multi-threaded process that runs in userspace.

If you don't understand the general principles, then you haven't a hope in hell of understanding a specific case.

Nope, but at present I don't have any need to do anything with it.

You've already been told what's needed. It's called a low-latency kernel driver with a message-level API. How's your C?

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |
Reply to
Martin Gregorie

Yes, I know, but I've also been in places where that approach was needed.

A good example was when I was writing the customisation code needed to install the Flex09 OS on a 2 MHz 6809. The issue was the sector read/write subroutine for the double-density floppy drives. A 300 RPM DD drive has an 80 Kbytes/sec transfer rate, which amounts to 25 CPU clock cycles per byte transferred. Off the top of my head that's enough time to execute between 5 and 8 instructions, so loop polling with a really tight loop worked well in this case, and left enough headroom that the potential timing mismatch and resulting jitter between the actual time the program loop checked the controller for a new byte and the rate at which the controller received them didn't cause problems.

However, I think it would have been a very narrow squeak with interrupts.

The FIRQ interrupt's stack push and pop operations only use 6 cycles (push/pop PC and CC) plus another to read the RTI instruction, leaving 18 cycles for the interrupt handler. That should allow time for 4 to 6 instructions to be executed. Tight but, I think, possible.

It couldn't be done with an IRQ interrupt, which stacks/unstacks A,B,DP,X,Y,U,PC and CC registers. Just pushing/popping that lot and reading the RTI instruction has eaten all 25 cycles!
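
Spelling the budget out with the figures quoted above (the cycle and instruction estimates are from the post, not measured):

/* 6809 floppy-transfer cycle budget, per the numbers above. */
enum {
    CPU_HZ          = 2000000,                    /* 2 MHz 6809 */
    BYTES_PER_SEC   = 80000,                      /* 80 Kbytes/sec DD floppy */
    CYCLES_PER_BYTE = CPU_HZ / BYTES_PER_SEC,     /* = 25 cycles per byte */
    FIRQ_OVERHEAD   = 6 + 1,                      /* push/pop PC+CC, plus the RTI fetch */
    FIRQ_BUDGET     = CYCLES_PER_BYTE - FIRQ_OVERHEAD  /* = 18 cycles: 4-6 instructions */
};
/* The full IRQ state save/restore alone eats the whole 25-cycle budget. */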

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |
Reply to
Martin Gregorie

DMA takes *zero* CPU cycles. Of course you will use DMA even for single byte transfers if it is available. But it is a lot more programming work to set up and for low data rate apps it is usually not worth the small benefit. So PIO is still used frequently. I've even given you an example, the UART. They mostly have FIFO buffers and so blocks can be transferred. But the low data rate doesn't justify tying up a DMA channel.

Ok, if you are going to persist in the notion that I don't know what I'm talking about I'll just quit discussing this with you.

Of course a wait loop can be interrupted. It is done all the time in multitasking apps. Yes, there is a requirement for the app to run again before the maximum interval between transfers, but that is no biggie.

Also, PIO can be done via interrupts. Again very common.

Unless there is a handshake.... You are all wrapped around the axle thinking there is "one right way". Computers have done many things in many ways over the years and they all work given the right conditions.

Are you saying they change the UART driver when the bit rate is changed? Hardly.

You don't need to be offensive. I'm not trying to insult you am I?

Ok, so why are you in this conversation?

You are a trip...

--

Rick
Reply to
rickman

Yes, that is correct. In those days I had a TRS-80 (Z-80 processor) and the floppy sector read was a similar tight CPU loop. It could just be done with interrupts disabled. The Z-80 required even more cycles per instruction.

Later I wired up a hard disk with a SASI controller, and the transfer rate was about the same (80 KB/s), because the CPU could not transfer much faster. (The disk itself could do maybe 150 KB/s.)

The first PC disk controllers used the same method, but by then the CPU was faster and the disk controllers (loudly) boasted "1:1 interleave", so the disk transfer rate was 600 KB/s by then. This was still transferred using programmed I/O with interrupts disabled. Only later, when the disk controller had been integrated with the drive (IDE), did the first DMA modes become a reality.

Reply to
Rob

In comp.sys.raspberry-pi message , Sun, 14 Sep 2014 23:01:22, Martin Gregorie posted:

MicroSiemens!

-- (c) John Stockton, near London. Mail snipped-for-privacy@physics.org Web - FAQish topics, acronyms, and links.

Reply to
Dr J R Stockton

True only when the transfer is in progress. Code must be executed to start a transfer and (probably) to tidy up after it.

Or maybe not. It's probably easier and faster to use an interrupt service routine for single byte transfers - provided that the data arrival rate (NOT the same thing as the character read/write time calculated from the baud rate) isn't high enough to stomp all over the interrupt handler.

Indeed.

And nobody has said otherwise, in the right circumstances.

I've yet to see a UART with more than a single character buffer. That's not to say they don't exist, but they are certainly not common and probably not cheap. OTOH are you quite sure that a 'line buffering UART' isn't in reality an i/o card containing a standard single character UART, some RAM and a slow 8-bit CPU? I've certainly seen quite large systems with lots of these i/o cards fitted - one per terminal.

It seems quite obvious that you don't - on this subject at least.

...but it won't be a wait loop that can miss data if it's interrupted for too long: some real-time OSes have process schedulers that can obey this type of constraint, and deterministic enough process execution to make that possible, but that just ain't the case for most general-purpose multitasking systems, a category that includes Linux and all the other common desktop and server OSes.

... but only if the data arrival rate is low enough for the interrupt handler to keep up with it. See the following messages, where both Rob and I describe cases where interrupts were too slow and DMA didn't exist.

If you're using a spin loop to drive a block transfer it's because you don't have time for handshaking, interrupts or anything else. Stop introducing red herrings.

Don't be daft. You look at the requirements and pick a suitable way to code the driver. A line that's used for very high-speed synchronous block transfers is never going to suddenly switch over to working with async characters at low baud rates.

The same hardware could well be used in both cases but the software would be configured to use the appropriate driver for the line's intended use.

Usually not, but you *are* giving a remarkably good imitation of a deliberately obtuse individual.

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |
Reply to
Martin Gregorie

Obviously you haven't done any ahredaqre design since 1982

The reason we all went to the 16550 from the 8250 is that the damn thing had several bytes of buffer, and could communicate faster than 9600 baud

Sheesh.

I was writing a device driver for that back in the early 80s..

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. – Erwin Knoll
Reply to
The Natural Philosopher

You haven't got a clue. The most common UART in the world is the 16550 family, which features a 16-byte FIFO. You'll find one in every PC made after about 1986. So that'll be billions of the blighters everywhere. They were not expensive when new. And you've never come across one, which puts all your anecdotes into perspective, methinks.

Have you actually programmed anything modern? You know, designed this century?

Reply to
mm0fmf

I guess I wasn't clear. I mean single byte at a time transfers like you might get from an I/O device. The issue there is whether it is worth it to tie up a DMA channel or not. In a PC, most likely it is not worth it unless there are other reasons than performance for using the DMA. An example is the case of SPI where DMA might be used to make the transfer happen on a more regular basis than an interrupt could.

I think you have not looked at UARTs lately. The ones used in PCs started growing FIFOs maybe 10 years ago, starting with just 4 or 8 chars. Now they are up to 64 and 128 I believe. If you are running Windows, plug in nearly any serial interface dongle, go to the advanced setup, and you will find a setting for "Use FIFO buffers (requires 16550 compatible UART)".

Ok, this is my last reply to you on this. I'm showing above proof that I am right. Believe it or not.

Yes, data can be missed if the process is interrupted for too long. That all depends on the length of the interruption and the time the I/O can wait. Interrupts are no different. Even on embedded systems where you control all the details you have to work very hard to know for sure an interrupt won't be missed because of a higher priority interrupt.

So?

Enjoy your week. :)

--

Rick
Reply to
rickman

ahredaqre?

The 8250 was the single-byte-buffered UART used by the IBM PC and XT models; it was then replaced in 1984 with the 16-bit 16450, backwards compatible with the old 8250, for the IBM PC/AT. This was eventually usurped in 1987 by the 16550A chip, which had a 16-byte character buffer designed to ease the problems of serial communication over the COM ports of a PC running a multitasking OS.

The functions of the 16550A were eventually subsumed into the ASICs that formed the "Chipsets" used in place of the TTL logic glue that made up a typical MotherBoard.

At least for the 16-bit version of the 8250, the 16450, that is. It would only have been in the late 80s that you'd have been developing drivers for the 16550A chip (or an embedded equivalent).

This issue of interrupts per byte or word of data transferred by a UART was resolved over a quarter of a century ago according to my old friend, the Winn L. Rosch Hardware Bible (2nd edition, 1992), which described it so much better than a wiki article ever could. :-)

--
J B Good
Reply to
Johny B Good
