SPI Interface

On 13 Sep 2014 10:46:26 GMT, Rob declaimed the following:

Unless you have multiport memory, the main processor may be stalled waiting for memory access while the DMA processor is running. At best, you may alternate memory access between DMA and main processor.

{Strangely, the mainframe I got my BS in CS on, and my first multi-tasking desktop machine, both had multiport memory: The SDS/XDS Sigma series had a four-bank/four-port memory system, allowing the CPU and up to three IO processors to chase each other through memory; and the Amiga "CHIP" memory could be given fully to the IO processes while the main processor ran computations in extended memory}

--
	Wulfraed                 Dennis Lee Bieber         AF6VN 
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

My first personal system did the latter; it was 6809-based: that system used a two-phase clock and a memory-mapped display based on a 6845 CRTC. The CRTC was run out of phase with the main clock, so both the 6809 and the 6845 had full-speed RAM access with no danger of conflicts.

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |

Not sure that has been carried through to modern chips though.

What I do know is that when trying to do real-time audio sample processing under Linux, heavy disk access is the one thing that causes packet drops and buffer overruns and underruns.

OK, that's on x64, which is different hardware. But it ought to have DMA in use - more so than a Pi - and it's a faster processor.

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. -- Erwin Knoll
The Natural Philosopher

Most fast processors have at least one level of caching. This mitigates the effects of memory interference from concurrent I/O, which usually accesses memory directly.

A bigger cause of larger delays is queuing delays for "disk" access, particularly when the OS does not implement an I/O priority queuing mechanism.
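For illustration, a toy sketch in C of the I/O priority queuing idea: requests are picked by priority rather than arrival order, so bulk disk traffic can't starve latency-sensitive I/O. The structure and field names are made up, not any real OS's scheduler.

#include <stddef.h>

struct io_request {
    int priority;   /* lower value = more urgent (e.g. audio before bulk copy) */
    /* ... target block, buffer pointer, etc. ... */
};

/* Return the index of the most urgent pending request. */
static size_t next_request(const struct io_request *queue, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (queue[i].priority < queue[best].priority)
            best = i;
    return best;
}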

--
-michael - NadaNet 3.1 and AppleCrate II: http://home.comcast.net/~mjmahon

Of course not! It was essentially a waste of 50% of the memory bandwidth, not something you would want to do at today's system performance.

Yes, on a good system there is no such problem. Either you have some misconfiguration or you are confusing "heavy disk access" with "too little memory".

--
Rob

I don't know if it carried over or not. The approach was of relatively limited use, apart from fairly autonomous devices that could more or less be left to get on with the job once they had been configured and set running.

The 6845 had two ports that could be used to configure or interrogate the chip, which held 18 single-byte parameters. These set things like where the display memory was, the number of lines and characters per line, and where on the display the cursor currently was, but essentially all it did was step through the address range used by the display memory, scanning it line by line and injecting pauses for row and page flyback times. The addresses it generated were used to pull the data out of each display byte, combine the data with a character scan-line number and feed that into an EEPROM acting as a character generator. The output from that went into a shift register and was clocked out to feed a modulator. IIRC, if the character was under the cursor the 6845 could invert the scan line(s), so you could show the cursor as anything from a blinking or steady inverted block to a steady or blinking underline.
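For illustration, a minimal C sketch of that two-port programming model. The port addresses here are made up; on a real board they'd be fixed by the address decoding.

#include <stdint.h>

/* Hypothetical port addresses; a real board fixes these in hardware. */
#define CRTC_ADDR ((volatile uint8_t *)0xE880)  /* register-select port */
#define CRTC_DATA ((volatile uint8_t *)0xE881)  /* register-data port   */

/* Write one of the 6845's 18 single-byte parameter registers (R0-R17). */
static void crtc_write(uint8_t reg, uint8_t value)
{
    *CRTC_ADDR = reg;    /* select the register via the address port... */
    *CRTC_DATA = value;  /* ...then write its value via the data port   */
}

/* Example: point the display memory start address at 0x0400
   (R12 = high byte, R13 = low byte on the 6845). */
static void crtc_set_start(uint16_t addr)
{
    crtc_write(12, (uint8_t)(addr >> 8));
    crtc_write(13, (uint8_t)(addr & 0xFF));
}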

DMA support is, IIRC, a property of the MPU or its support chips. If they won't do DMA you can't use it, because it's effectively a hardwired interrupt handler that's just bright enough to react to an interrupt by grabbing a register's content, slapping it in the indicated byte, incrementing the address and going back to sleep until the next interrupt.

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |

Not my understanding of how DMA works at all.

IIRC from many years ago, you load up a source address, a target address (or I/O port) and a count, and it steals RAM access in between CPU cycles. Of course, if your RAM is being accessed at full speed anyway, it's going to slow the CPU down.
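Something like this, in C terms - a hedged sketch of that "load source, destination, count, then go" model. The register layout is invented, not any particular controller's.

#include <stdint.h>

/* Invented register layout for a generic single-channel DMA controller. */
struct dma_channel {
    volatile uint32_t src;    /* source address (or I/O port)  */
    volatile uint32_t dst;    /* destination address           */
    volatile uint32_t count;  /* number of bytes to transfer   */
    volatile uint32_t ctrl;   /* control/status register       */
};

#define DMA_CTRL_START (1u << 0)
#define DMA_CTRL_BUSY  (1u << 1)

static void dma_copy(struct dma_channel *ch,
                     uint32_t src, uint32_t dst, uint32_t count)
{
    ch->src   = src;
    ch->dst   = dst;
    ch->count = count;
    ch->ctrl  = DMA_CTRL_START;      /* kick off the transfer          */
    while (ch->ctrl & DMA_CTRL_BUSY) /* controller steals bus cycles;  */
        ;                            /* the CPU here just polls 'done' */
}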

I think. Modern chipsets are - weird.

But two things can't be accessing the same RAM bus simultaneously. If the processor can hit the RAM bus flat out, there are no gaps to steal.

Therefore it's only faster if, during the time it's happening, the CPU is - for example - operating out of onboard cache.

You need an ARM hardware expert to give chapter and verse.

I found some references to issues between cache RAM and real RAM when using DMA...

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. -- Erwin Knoll
The Natural Philosopher

Jeez, in the early 1990s, on 386 machines with even smaller drives and RAM running Windows for Playgroups, I was running 250 kHz 16-bit sampling under DMA to disk, with graphical output. You can do many things to help the OS, not just leave it all to the OS. Often we would have a separate partition or drive for saving to or reading from disk, and would start any major sampling run with an empty partition, so only one programme was actually accessing that partition, cutting down OS queuing and management of the drive and allocation tables.

We only really worried about dedicated hardware processing above 500 kHz, as the ISA bus was only 6 MHz - and that's having played with cards doing many MHz of data sampling, including video.

Some software would pre-allocate a large file and then overwrite its blocks as contiguous blocks to cut down hassles. That's not so easy on SD cards or USB flash drives; use a proper disk, even via USB, if you want high throughput without waiting for block erases and other directory/storage management.
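A rough sketch of that pre-allocation trick using POSIX calls; the file name and size are just for illustration.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const off_t size = 256L * 1024 * 1024;  /* e.g. a 256 MB capture file */
    int fd = open("capture.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Reserve the blocks up front, so the write path never has to
       extend the file (and touch the allocation tables) while
       samples are streaming in. */
    int err = posix_fallocate(fd, 0, size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate failed: %d\n", err);
        return 1;
    }

    /* ... the sampler then overwrites the reserved blocks in order ... */
    close(fd);
    return 0;
}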

Multiple processes writing to the same drive cause queuing and next-free-block search delays, amongst other things.

Please don't say "lack of memory" and other such things, as we were using 16MB-and-less systems in those days.

Give people more memory and they only fill it up with unnecessary tosh; just look at their pictures, tunes and huge browser histories :-)

Big data? More like mainly big garbage heaps...

--
Paul Carpenter          | paul@pcserviceselectronics.co.uk 
    PC Services 
  Raspberry Pi Add-ons 
 Timing Diagram Font 
 For those web sites you hate

....

DMA was about DIRECT Memory Access: peripheral transfers and, on some micros, memory-to-memory block transfers, especially for memory-mapped I/O. A DMA controller would use bus-grant (cycle-stealing), block or burst, or hog modes depending on the system architecture. The HARDWARE controller then accessed memory directly (RAM, for those who know of nothing other than semiconductor memory). The DMA controller could be in the peripheral (as in PCI devices and many older systems and mainframes), a dedicated controller chip as with 80x86 and the ISA bus, or part of the microcontroller, as in many 8/16/32-bit devices like the Broadcom chip.

The processor, or more correctly the bus arbitration logic, even on 8-bit or 16-bit micros, determined that it was not using the bus (the CPU being busy doing things internally) and would allow the DMA controller to take the bus, and would if necessary have to wait for the DMA controller to relinquish it. Few systems had the ability to force the DMA controller to hand the bus back: with many DMA controllers, or buses spread over cabinets, there were not enough wires to do so. Also, most peripherals only required either one or two accesses (an ADC) or small-block accesses (a disk drive or network card), and these can be sorted out by the application and sometimes the OS.

Multiport memory was not required for DMA support. It was usually found inside special controllers, like graphics controllers and high-speed comms devices, which may or may not have used DMA. Often this was so that data could be streamed in or out of the peripheral at high speed, without breaks, while the processor was also accessing it via the bus. In the vast MAJORITY of systems, multiport memory was not part of the main system RAM.

Often, within an instruction on MOST computers/micros, there are several clock cycles or phases of the clock where the CPU is internally bound, doing calculation or decoding the instruction, and the bus is available.

These days with multi-level caches and pipelines the bus is less busy than people think it is.

--
Paul Carpenter          | paul@pcserviceselectronics.co.uk 
    PC Services 
  Raspberry Pi Add-ons 
 Timing Diagram Font 
 For those web sites you hate

+1
--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |

Think about it: if you ignore activating a DMA transfer (telling it where to put the next block from the selected I/O device and how long it is) and ending it (telling it the transfer from this device is finished), it can only do more or less what I described. Anything more is a waste of effort and silicon.

Exactly so, but there still has to be some silicon in the MPU and/or its support chips that can be tasked with watching for low level i/o interrupts from the controller and either storing each byte it returns (read) or passing each byte to the controller to be output (write).

Exactly, but spinning rust is MUCH slower to respond than anything inside the MPU or i/o controller which is why DMA is interrupt driven and has little impact on the other silicon-silicon activity that the MPU is carrying out.

In silicon terms quite a lot of time passes between the DMA interrupts implementing data transfers between disk controller and RAM. At 100 Mbit/s a byte arrives roughly every 0.1 µs, but an RPi is executing an instruction in something like 10 ns, or one every 0.01 µs (assuming 10 clock cycles per instruction [1]). IOW, during a DMA transfer, other memory accesses only get pre-empted about 10% of the time.

[1] This is a semi-educated guess - real numbers welcomed.
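As a sanity check, a few lines of C reproducing that arithmetic; the figures are the guesses above, not measurements.

#include <stdio.h>

int main(void)
{
    double byte_period_us  = 0.1;   /* one byte per 0.1 us (~10 MB/s)     */
    double instr_period_us = 0.01;  /* ~10 ns per instruction (the guess) */

    /* Fraction of memory cycles a DMA byte steals from the CPU. */
    printf("pre-empted fraction: %.0f%%\n",
           100.0 * instr_period_us / byte_period_us);
    return 0;
}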

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |

... and in the burst rate and buffering ability of the source peripheral?

OK: I didn't know it could be in the peripheral or i/o control chip but had thought it would be part of the processor. Thanks.

That's clear.

Yes, that much has always been clear.

Indeed.

Good point.

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |

I don't have numbers, but a CPU has to fetch instructions in addition to data. DMA doesn't. So DMA should never be slower.

--

Rick

Really? So memory speeds don't need to continue to get faster?

--

Rick

More that we are pushing the limits of the silicon.

CPU clock speeds haven't changed much in the last few years, and memory is pretty static speed-wise too.

The drive is towards lower power, smarter pipelining, more on-chip cache and better utilisation of it.

This particular processor, for example, has more cache than my last one, but it isn't substantially faster. That has odd effects: it seems to run the JavaScript interpreter faster, but other things are broadly similar.

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. -- Erwin Knoll
The Natural Philosopher

Yes, I understand that, but DMA can't transfer a byte until the peripheral has read or written it and that completion signal has been presented to the controller chip. This overall peripheral transfer does not happen at anywhere near CPU clock speed. Even if it is a PCI-attached SSD or a packet-buffering network device, it will take considerably longer than one CPU clock cycle to have more data ready to be copied into memory by DMA. So I have to disagree: DMA will *always* be slower than the CPU clock.

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |

What about if the peripheral is a memory-mapped graphics card?

Pretty sure DMA is used to push data across to that too.

But it's been a long time...

--
Everything you read in newspapers is absolutely true, except for the  
rare story of which you happen to have first-hand knowledge. -- Erwin Knoll
The Natural Philosopher

That's been the case for quite a while, depending on what level of memory you're looking at.

With a carefully designed set of memory caches it's quite possible to get good performance while using different-speed caches, by balancing the speed of access with the access rate for each cache.

Take the ICL 2900 mainframe as an example: the Order Code Processor, which executed OS and application binaries, had a 2 KB memory (the page size), and each OCP was single-threaded. Each time an OCP executed an I/O request or got to the end of the current page, it generated a page fault. This caused the next scheduled page in the page store to replace the current page and start to execute. The page store was directly connected to the OCPs, so this level of page swap was pretty fast. However, if the page store got full it could kick less frequently used pages out to a disk-based page store. It used a Least Recently Used algorithm to select pages for eviction.
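For illustration, a minimal C sketch of Least-Recently-Used victim selection of that kind - a linear scan over per-page timestamps; the real machine's microcode would of course have differed.

#include <stddef.h>
#include <stdint.h>

struct page {
    uint32_t id;
    uint64_t last_used;  /* timestamp, updated on every access */
};

/* Pick the page touched longest ago as the eviction victim. */
static size_t lru_victim(const struct page *pages, size_t n)
{
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (pages[i].last_used < pages[victim].last_used)
            victim = i;
    return victim;
}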

I used these machines for several years and don't recall them having slow responses due to paging problems despite running 10-12 unrelated, major online green-screen applications with a total of 300-400 users. These were all multiuser database applications written in COBOL.

The page store on our machines was a whole 16MB.

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
org       |

I'm not sure where you are getting this. A bus transfer takes the same amount of time no matter who the master is. The real point is that the DMA acts in parallel with the CPU. It is a bit like having another CPU dedicated to doing data moves, but with no instruction fetches. In no way will DMA be slower than PIO.

--

Rick

You are mistaken on many counts here. CPU clock speeds haven't increased for quite some time. The Pentium 4 was clocking at over 3 GHz, but it had gone to a very deep pipeline, which undercut the faster clock by taking longer to refill on exceptions. Then they backed off from such a long pipe, which would have slowed the clock, but speed was maintained by the inherent speed increases from finer-pitch process geometries. This still performed faster and required faster memories, in fact necessitating the migration of the memory interface from the north bridge to the CPU. Most recently the push has been for lower power at the expense of potential speed increases, so we have traded speed for low power while keeping the clock speed the same.

Memory speed has been a limiting factor since before CPUs had caches - hence the need for cache. Today the memory bus is filled with cache fills, whether from a cache miss or because the cache is filling speculatively. The advent of multi-core processors brought us the "memory wall", where adding processors gains little performance because of memory speed limitations. Each generation of memory ups the interface speed and fetches wider words to supply that speed. The point is there is still demand for faster memory, and memory designers are filling it.

--

Rick
