Linux: DMA transfer to memory mmapped into user application

wzab 14 years ago

Hi,

I need to write a driver in which the DMA engine (namely the PDC in the AT91SAM9260) transfers data directly into a buffer mmapped into the user application. The buffer is either allocated by vmalloc_32_user or consists of single pages allocated by get_free_page; each page is in the DMA-accessible zone (so no bounce buffer is needed) and is locked in memory. The buffer may be about 1 or 2 MB long. The DMA engine supports chained DMA, so in the interrupt routine I can keep track of the last received data, program the DMA registers for the next transfer, and update the buffer head value visible to the user application. However, to make the last received data visible to the user application I also need to synchronize the DMA buffer (in this case that is probably equivalent to simply invalidating the CPU cache).
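The head bookkeeping described above can be sketched in plain user-space C. This is only an illustration under stated assumptions: the struct rx_ring type and both functions are hypothetical names, the buffer is assumed to be split into equal chunks, and the real driver would write the returned offset (converted to a bus address) into the PDC "next pointer" register instead of just returning it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a circular receive buffer filled in chunks. */
struct rx_ring {
    size_t size;   /* total buffer length in bytes */
    size_t chunk;  /* bytes per DMA transfer; assumed to divide size */
    size_t head;   /* offset just past the newest completed data */
    size_t next;   /* offset queued as the "next" chained transfer */
};

static void rx_ring_init(struct rx_ring *r, size_t size, size_t chunk)
{
    assert(chunk > 0 && size % chunk == 0 && size / chunk >= 2);
    r->size = size;
    r->chunk = chunk;
    r->head = 0;
    /* The chunk at offset 0 is "current"; the following chunk is queued. */
    r->next = chunk;
}

/*
 * Called once per "transfer complete" interrupt.
 * Stores in *done_off the start of the chunk that has just completed
 * (the region that must be synchronized for the CPU), advances the head
 * seen by user space, and returns the offset to queue as the new "next"
 * transfer.
 */
static size_t rx_ring_on_irq(struct rx_ring *r, size_t *done_off)
{
    *done_off = r->head;
    r->head = (r->head + r->chunk) % r->size;
    r->next = (r->next + r->chunk) % r->size;
    return r->next;
}
```

With a 16 KiB buffer and 4 KiB chunks, the first interrupt reports offset 0 as completed, moves the head to 4096, and queues offset 8192; after four interrupts the head is back at 0.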

If I use the standard DMA mapping, I also get functions like dma_sync_single_for_cpu.

However, in my case I'd like to avoid the overhead of maintaining multiple mappings (one for each page) and of calling dma_sync_single_for_cpu on each of them.

What I'm looking for is a kind of "sync_single_page_for_cpu". Analyzing the implementation of dma_sync_single_for_cpu I have found the function ___dma_single_dev_to_cpu which seems to do the necessary operations - but it is not recommended for direct use.

Maybe another approach would be better and cleaner: I could create an sg mapping, building an sg_list with one entry for each individual page of my buffer. However, I would then need a function like dma_sync_sg_range_for_cpu (just as we have dma_sync_single_range_for_cpu) to synchronize only the most recently updated part of the buffer. I don't want to synchronize all the entries in my sg list.

Maybe I should simply imitate the behaviour of dma_sync_sg_range_for_cpu using the following part of code:

formatting link

In fact, since I know that no bounce buffer is used, a call to __dma_page_dev_to_cpu(page, offset, length, dir); should be enough.
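The "sync only the updated range" idea can be sketched as a loop that splits a byte range into per-page pieces. This is a user-space illustration, not kernel code: sync_range_for_cpu, page_sync_fn, struct sync_log and record_sync are hypothetical names, the 4096-byte page size is an assumption, and the callback merely stands in for whatever per-page cache maintenance call the driver would really use.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Stand-in for the per-page cache maintenance call (hypothetical signature). */
typedef void (*page_sync_fn)(size_t page_index, size_t offset, size_t len,
                             void *ctx);

/*
 * Walk the byte range [start, start + len) of a buffer built from
 * individually allocated pages and invoke sync() once per touched page,
 * passing the offset and length inside that page.
 * Returns the number of pages touched.
 */
static size_t sync_range_for_cpu(size_t start, size_t len,
                                 page_sync_fn sync, void *ctx)
{
    size_t calls = 0;

    while (len > 0) {
        size_t page = start / SKETCH_PAGE_SIZE;
        size_t off = start % SKETCH_PAGE_SIZE;
        size_t n = SKETCH_PAGE_SIZE - off;  /* bytes left in this page */

        if (n > len)
            n = len;
        sync(page, off, n, ctx);
        start += n;
        len -= n;
        calls++;
    }
    return calls;
}

/* Recording callback used only to demonstrate the split. */
struct sync_log {
    size_t n;
    size_t page[8], off[8], len[8];
};

static void record_sync(size_t page, size_t off, size_t len, void *ctx)
{
    struct sync_log *log = ctx;

    assert(log->n < 8);
    log->page[log->n] = page;
    log->off[log->n] = off;
    log->len[log->n] = len;
    log->n++;
}
```

For example, a 5000-byte update starting at offset 4000 touches three pages: 96 bytes at the end of page 0, all of page 1, and the first 808 bytes of page 2. A range that wraps around the end of a circular buffer would need two calls, one per contiguous part.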

Currently I use a less efficient solution with a temporary buffer located in internal SRAM: the PDC stores the received data there, and in the bottom half of the interrupt routine the data are copied from it to the vmalloced buffer.
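The copy step of this workaround amounts to a wrap-aware copy from the staging buffer into the circular destination buffer. Below is a minimal user-space sketch; ring_copy_in is a hypothetical name, and the real bottom half would read from the SRAM mapping and write to the vmalloced buffer rather than to plain arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes from a staging buffer into a circular destination buffer,
 * starting at offset head and wrapping at ring_size.
 * Returns the new head offset.
 */
static size_t ring_copy_in(unsigned char *ring, size_t ring_size, size_t head,
                           const unsigned char *src, size_t len)
{
    size_t first;

    assert(head < ring_size && len <= ring_size);

    /* Part that fits before the end of the ring. */
    first = ring_size - head;
    if (first > len)
        first = len;
    memcpy(ring + head, src, first);

    /* Remainder, if any, continues at the start of the ring. */
    memcpy(ring, src + first, len - first);

    return (head + len) % ring_size;
}
```

Every received byte crosses the bus twice in this scheme (PDC to SRAM, then CPU to SDRAM), which is the cost that direct DMA into the mmapped buffer would remove.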

-- I'll appreciate any suggestions, TIA & BR, WZab

wzab 14 years ago

I have published the sources of the driver mentioned above. They are available in the alt.sources Usenet group. See the post "Lite high speed synchronous mode driver for USART in Atmel AT91SAM9260" or simply follow the link to the Google archive:

formatting link

62a9b

I hope that the above driver may be useful to someone else...

-- Regards, WZab

Ulf Samuelsson 14 years ago

2011-11-19 11:39, wzab wrote:

The advantage of DMAing to SRAM is that you avoid problems associated with different peripherals competing for the SRAM. Then again, the amount of SRAM in the SAM9260 is fairly limited.

Best Regards Ulf Samuelsson

wzab 14 years ago

Yes, I know. That's why I try to reserve the buffer from both SRAM areas.

However, DMA from the PDC to SRAM makes it necessary to copy the data from SRAM to the final buffer using the CPU.

Maybe DMA directly to SDRAM could provide sufficient bandwidth.

I can boost the priority of the USART in the bus matrix, as was discussed (also with you ;-) ) here:

formatting link

457

Of course, in this case I should increase the priority of the access from the PDC to the EBI, writing at91_sys_write(AT91_MATRIX_SCFG3, 0x010a0010); and then at91_sys_write(AT91_MATRIX_PRAS3, 0x00200300);
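To see what these two register values request, they can be split into bit fields. The field positions below are an assumption based on my reading of the AT91SAM9260 bus matrix description (SLOT_CYCLE in bits 0-7, DEFMSTR_TYPE in bits 16-17, FIXED_DEFMSTR in bits 18-20, ARBT in bits 24-25, and one 2-bit priority per master at 4-bit intervals in PRAS); they should be checked against the datasheet before relying on them. The function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MATRIX_SCFGx field layout; verify against the datasheet. */
struct scfg_fields {
    unsigned slot_cycle;    /* bits 0-7 */
    unsigned defmstr_type;  /* bits 16-17 */
    unsigned fixed_defmstr; /* bits 18-20 */
    unsigned arbt;          /* bits 24-25 */
};

static struct scfg_fields decode_scfg(uint32_t v)
{
    struct scfg_fields f;

    f.slot_cycle = v & 0xffu;
    f.defmstr_type = (v >> 16) & 0x3u;
    f.fixed_defmstr = (v >> 18) & 0x7u;
    f.arbt = (v >> 24) & 0x3u;
    return f;
}

/* Assumed MATRIX_PRASx layout: 2-bit priority for master m at bit 4*m. */
static unsigned pras_priority(uint32_t v, unsigned master)
{
    assert(master < 6);
    return (v >> (4 * master)) & 0x3u;
}
```

Under these assumptions, 0x010a0010 decodes to a slot cycle of 16, a fixed default master number 2 and arbitration type 1, and 0x00200300 gives priority 3 to master 2 and priority 2 to master 5. Which peripheral each master number corresponds to is also something to confirm in the datasheet.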

So my main problem now is how to "synchronize" the buffer between PDC and CPU with maximum achievable efficiency...

-- Thanks & Regards, Wojtek Zabolotny
