atmel_serial driver modifications for 6Mb/s throughput in synchronous mode

Hi,

I need to transmit serial data at 6 Mb/s through the AT91SAM9260 USART. I have successfully modified the driver to work in synchronous mode (see thread: [link]). However, I then ran into strange problems at high data rates. I use the USART with DMA, and the application sets the tty into "raw" mode. At 6 Mb/s I found that some data are lost, yet no errors are reported and nothing appears in the system log.

I supposed that this was caused by a DMA buffer overrun (the 512-byte buffer holds only ca. 580 µs of transmission at 6 Mb/s, so the user-space application may have trouble reading the data in time). Therefore I changed the length of this buffer (the PDC_BUFFER_SIZE constant) from the original 512 to 65536. However, after this modification transmission was even worse. Maybe some parts of the code rely on the DMA buffer being, e.g., no bigger than one memory page? Maybe some pointers used to communicate with this buffer use only a limited number of bits? Has anyone faced similar problems with the atmel_serial driver?
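For readers unfamiliar with the "raw" tty mode mentioned in this post, here is a minimal sketch of how a user-space application typically requests it through the standard termios interface. The helper name make_raw is hypothetical and this is not the poster's actual code; the VMIN/VTIME values are one common choice, not a requirement.

```c
#define _DEFAULT_SOURCE   /* exposes cfmakeraw() on glibc */
#include <termios.h>

/*
 * Turn an existing termios configuration into a raw one:
 * no line editing, no echo, no CR/NL translation, 8-bit characters.
 * With VMIN = 1 and VTIME = 0, read() blocks until at least one byte
 * is available.
 */
void make_raw(struct termios *t)
{
    cfmakeraw(t);
    t->c_cflag |= CLOCAL | CREAD;  /* ignore modem lines, enable receiver */
    t->c_cc[VMIN]  = 1;            /* return as soon as one byte arrives */
    t->c_cc[VTIME] = 0;            /* no inter-byte timeout */
}
```

Typical use would be tcgetattr(fd, &t), then make_raw(&t), then tcsetattr(fd, TCSANOW, &t) on the opened serial device. Raw mode removes line-discipline processing of the data, but the bytes still travel through the tty layer's buffers.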

Looking at the amount of code associated with the TTY layer surrounding the serial driver, I'm considering getting rid of the atmel_serial driver for this particular USART port and writing my own lightweight, optimized DMA-based driver for high-speed transmission of raw data. Maybe such a solution already exists?

-- TIA & Regards, Wojtek

Reply to
wzab

I don't know about the driver you are using, but we've had severe problems with Atmel chips and SPI running at high speed. Atmel is proud to have DMA for various peripherals, but they are too stupid to implement any kind of FIFO buffers. If the MCU bus gets too busy, the DMA system can steal no bus cycles and it simply drops the received data without telling anyone. I guess the same problem affects data transmission.

And yes, this really happens. We had to change our hardware and stop using SPI altogether. The good old Motorola MCUs (the 68k series, e.g.) have both DMA _and_ FIFO, so they work reliably. I wish Atmel borrowed a brain and fixed their chips.

You may try to place the serial buffer in the internal MCU RAM - it has larger bandwidth and on the newer chips, it can be accessed even while the external memory bus is busy.

Good luck.

-jm

Reply to
Jukka Marin

Thanks a lot, it seems that you are right. So I'll need to move my USART DMA buffer to the internal SRAM. As far as I can see, there are two 4 KB areas at physical addresses 0x200000-0x200fff and 0x300000-0x300fff. I have found the structure at91sam9260_sram_desc, which describes how this area should be mapped by the Linux system: [link]

However, I can't see any support for allocating DMA buffers from this area :-(. There is no special zone (well, for memory consisting of one page it makes no sense to define such a zone ;-) ). Does it mean that I should allocate the buffer myself with something like request_mem_region(AT91_IO_VIRT_BASE - AT91SAM9260_SRAM0_SIZE, PDC_BUFFER_SIZE, "usart")? I wouldn't like to tie my buffer to the first PDC_BUFFER_SIZE bytes of this area (there may be other drivers using SRAM for their high-speed buffers, e.g. the MAC). Is there any memory management provided for allocating a small part of the SRAM? I've found this: [link], but it didn't give me all the needed info...

In fact, allocating DMA buffers in SRAM for all USARTs would be overkill. I have only one USART running at 6 Mb/s, so it seems that I'll need to customize the platform code for USART allocation anyway... I'll appreciate any hints...

-- TIA & Regards, Wojtek

Reply to
wzab

OK. I have found a patch with a full implementation of the Ethernet TX buffer in SRAM: ftp:// [link]
It makes it a little clearer how to allocate the buffer:

+#if defined(CONFIG_ARCH_AT91) && defined(CONFIG_MACB_TX_SRAM)
+#if defined(CONFIG_ARCH_AT91SAM9260)
+	if (request_mem_region(AT91SAM9260_SRAM0_BASE, TX_DMA_SIZE, "macb")) {
+		bp->tx_ring_dma = AT91SAM9260_SRAM0_BASE;
+	} else {
+		if (request_mem_region(AT91SAM9260_SRAM1_BASE, TX_DMA_SIZE, "macb")) {
+			bp->tx_ring_dma = AT91SAM9260_SRAM1_BASE;
+		} else {
+			printk(KERN_WARNING "Cannot request SRAM memory for TX ring, already used\n");
+			return -EBUSY;
+		}
+	}
+

However it doesn't explain how to avoid collisions between different drivers needing to use buffers in SRAM. E.g. in my case (Ethernet connected data acquisition using USART via optoisolation to receive data from AD converters) I need both Ethernet and USART to have buffers in internal SRAM...
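One way to picture a collision-free scheme is a small allocator that owns the whole SRAM bank and hands out non-overlapping pieces to each driver. The following is only a user-space sketch of that bookkeeping, with hypothetical names (struct sram_pool, sram_alloc); it is not taken from any patch mentioned in this thread. In the kernel, lib/genalloc (gen_pool_create(), gen_pool_add(), gen_pool_alloc()) is a general-purpose allocator of this kind.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; base and size correspond to SRAM0 on the AT91SAM9260. */
#define SRAM_BASE 0x00200000u
#define SRAM_SIZE 4096u

struct sram_pool {
    uint32_t base;   /* physical address of the start of the region */
    uint32_t size;   /* total bytes in the region */
    uint32_t used;   /* bytes handed out so far */
};

/*
 * Reserve `len` bytes aligned to `align` (which must be a power of two).
 * Returns the physical address of the block, or 0 if the pool cannot
 * satisfy the request. There is no free(): buffers are expected to live
 * as long as the drivers that own them.
 */
uint32_t sram_alloc(struct sram_pool *p, uint32_t len, uint32_t align)
{
    uint32_t start = (p->used + align - 1) & ~(align - 1);

    if (len == 0 || start > p->size || len > p->size - start)
        return 0;
    p->used = start + len;
    return p->base + start;
}
```

With a pool initialised as { SRAM_BASE, SRAM_SIZE, 0 }, a 512-byte USART RX buffer and an Ethernet TX ring can each call sram_alloc() and are guaranteed distinct ranges, instead of both hard-coding the start of SRAM0.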

-- Regards, WZab

Reply to
wzab

I have found a patch ([link]) which makes use of [link]. Maybe this is the right way to go. I'll share my results if I find a satisfactory solution...

-- Regards, Wojtek

Reply to
wzab

The new AT91SAM9Gx5 chips have got rid of the PDC in favour of a real DMA controller with a built-in FIFO. This will allow burst access to the SDRAM. With the price of SDRAM going up, the DDR2 interface of these chips will make them more cost effective.

BR Ulf Samuelsson

Reply to
Ulf Samuelsson

Yes, I know, but unfortunately my hardware platform is fixed, and I really have to force THIS hardware (AT91SAM9260) to provide required performance...

BR Wojtek

Reply to
wzab

wzab wrote 2011-10-16 23:31:

Anything running at high speed with the PDC needs to take into account that the PDC needs access to the bus *often*. At 6 Mbps and 10 bits per character (start, 8 data bits, stop), it takes 1 2/3 µs to handle one character. The PDC requests the bus when there is a byte in the holding register. This byte needs to be moved to memory before the next byte is received. If any group of peripherals occupies the bus for more than 1.67 µs, you lose characters on reception. If you put the receive buffer in SRAM, the bus matrix will be of great help, as long as nothing else is eating up ALL the bandwidth to the bus. A PDC transfer should then take only 10 ns.
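The arithmetic behind the 1.67 µs figure can be written out as a short sketch. The function names are hypothetical; the only inputs are the 6 Mbps bit rate and the 10-bit character framing given above.

```c
#include <stdint.h>

/* Time needed to receive one character, in nanoseconds (rounded down). */
uint64_t char_time_ns(uint64_t bit_rate_hz, unsigned bits_per_char)
{
    return (uint64_t)bits_per_char * 1000000000ull / bit_rate_hz;
}

/*
 * Nonzero if a bus stall lasting `stall_ns` is long enough for the next
 * character to arrive before the previous one has been moved to memory.
 */
int stall_loses_data(uint64_t bit_rate_hz, unsigned bits_per_char,
                     uint64_t stall_ns)
{
    return stall_ns >= char_time_ns(bit_rate_hz, bits_per_char);
}
```

For 6,000,000 bit/s and 10 bits per character, char_time_ns() gives 1666 ns, so a 10 ns transfer leaves a wide margin while a 2 µs bus stall does not.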

It should be OK to have the transmit buffer in SDRAM, saving the precious 4 kB of SRAM that normally is used for data. You might also consider setting up the bus matrix to prioritize the PDC. This is not done by default.

If you have to use Ethernet, you might be screwed unless you put the receive buffer in SRAM.

BR Ulf Samuelsson

Reply to
Ulf Samuelsson

Well, I have managed to allocate the DMA RX buffer for this particular USART in SRAM. It required a small modification in atmel_serial.c. However, now I of course get a kernel panic when atmel_tasklet_func calls atmel_rx_from_dma, which in turn calls dma_sync_single_for_cpu. Since no DMA mapping exists for the ioremapped memory, this leads to the panic.

Well, I can check whether the currently serviced USART uses a buffer in SRAM and bypass the calls to the synchronization functions, but I'm not sure whether that is safe on this platform. Are there any cache mechanisms active for SRAM memory? When the PDC stores something in the SRAM buffer, is it immediately visible to the CPU, or should I invalidate the associated cache?

-- BR & TIA, Wojtek

Reply to
wzab

Maybe for the buffer ioremapped from SRAM I should simply call consistent_sync directly (as is in fact done in the dma_sync_single_for... routines: [link])?

-- BR & TIA, Wojtek

Reply to
wzab

Oops, sorry, I did not notice that consistent_sync does not exist for the ARM architecture. The implementation of dma_sync_single... for ARM is different: [link]

-- Wojtek

Reply to
wzab

I have changed the implementation of atmel_rx_from_dma to use __dma_single_cpu_to_dev and __dma_single_dev_to_cpu for the USART which uses an RX buffer in SRAM. However, I then get a kernel BUG at: [link]

Investigating the problem more thoroughly, I've found that my buffer is allocated at 0x200000 (SRAM0) and ioremapped to 0xc4890000, while high_memory is at 0xc4000000. Therefore my buffer address fails the address validity test at [link].

I've checked if location of my buffer above the high_memory location makes synchronisation unnecessary (in fact I thought, that virtual memory returned by ioremap is accessed bypassing the cache mechanisms), but when I simply skipped the synchronization routines, I still got my data stream corrupted.

So either my buffer in SRAM0 still requires DMA synchronization (but how to provide it?), or, even though it is located in SRAM, I still lose characters due to "traffic jams" in the Bus Matrix (as Ulf suggested a few posts above). If the latter is the case, I should probably boost the priority of the USART in the Bus Matrix, but this topic seems not to be well documented :-(. I have found [link], but the only places where the matrix seems to be used are [link] and [link].

Does it mean, that Linux on AT91SAM9260 uses the Bus Matrix in its default state?

-- TIA & Regards, Wojtek

Reply to
wzab

wzab wrote 2011-10-23 21:48:

Linux does not set up the Matrix. This is (or should be) done in at91bootstrap. The Atmel at91bootstrap hardly touches the matrix, the argument being that only the customer knows what the priorities should be. I checked at91bootstrap v2.13 into OpenEmbedded ([link]), where I set up the matrix for the SAM9263. It was done as an example, but I have not verified the throughput details.

BR Ulf Samuelsson

Reply to
Ulf Samuelsson

I have put the matrix setup code into the board initialization code (board-mmnet1000c, probably derived from board-sam9260ek.c):

    static void __init ek_board_init(void)
    {
        /* Serial */
        at91_add_device_serial();
        /* USB Host */
        at91_add_device_usbh(&ek_usbh_data);
        /* USB Device */
        at91_add_device_udc(&ek_udc_data);
        /* SPI */
        at91_add_device_spi(ek_spi_devices, ARRAY_SIZE(ek_spi_devices));
        /* I2S */
        at91_add_device_ssc(AT91SAM9260_ID_SSC, ATMEL_SSC_RX);
        /* NAND */
        at91_add_device_nand(&ek_nand_data);
        /* Ethernet */
        at91_add_device_eth(&ek_macb_data);
        /* MMC */
        at91_add_device_mmc(0, &ek_mmc_data);
        /* I2C */
        at91_add_device_i2c(NULL, 0);
        /* LEDs */
        at91_gpio_leds(ek_leds, ARRAY_SIZE(ek_leds));
        /* Push Buttons */
        ek_add_device_buttons();
        /* Modify the Bus Matrix settings */
        printk("AT91_MATRIX_SCFG0 was %x \n", at91_sys_read(AT91_MATRIX_SCFG0));
        at91_sys_write(AT91_MATRIX_SCFG0, 0x010a0010);
        printk("AT91_MATRIX_SCFG1 was %x \n", at91_sys_read(AT91_MATRIX_SCFG1));
        at91_sys_write(AT91_MATRIX_SCFG1, 0x010a0010);
        printk("AT91_MATRIX_PRAS0 was %x \n", at91_sys_read(AT91_MATRIX_PRAS0));
        at91_sys_write(AT91_MATRIX_PRAS0, 0x00200300);
        printk("AT91_MATRIX_PRAS1 was %x \n", at91_sys_read(AT91_MATRIX_PRAS1));
        at91_sys_write(AT91_MATRIX_PRAS1, 0x00200300);
    }
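The magic numbers written to the matrix registers are easier to read when split into fields. The sketch below decodes them under an assumed field layout (slot cycle in bits 7:0, default master type in bits 17:16, fixed default master in bits 20:18, arbitration type in bits 25:24 for SCFG; a 2-bit priority per master at bit 4*m for PRAS). That layout and the helper names are assumptions to be checked against the SAM9260 datasheet, not something stated in this thread.

```c
#include <stdint.h>

/* Assumed MATRIX_SCFGx field layout; verify against the datasheet. */
struct scfg_fields {
    unsigned slot_cycle;     /* bits 7:0,   maximum cycles per slot */
    unsigned defmstr_type;   /* bits 17:16, 0 = none, 1 = last, 2 = fixed */
    unsigned fixed_defmstr;  /* bits 20:18, master used when type is fixed */
    unsigned arbt;           /* bits 25:24, 0 = round-robin, 1 = fixed priority */
};

struct scfg_fields decode_scfg(uint32_t v)
{
    struct scfg_fields f;

    f.slot_cycle    = v & 0xff;
    f.defmstr_type  = (v >> 16) & 0x3;
    f.fixed_defmstr = (v >> 18) & 0x7;
    f.arbt          = (v >> 24) & 0x3;
    return f;
}

/* Assumed MATRIX_PRASx layout: priority of master m in bits 4*m+1:4*m. */
unsigned pras_priority(uint32_t v, unsigned master)
{
    return (v >> (4 * master)) & 0x3;
}
```

Under these assumptions, 0x010a0010 selects fixed-priority arbitration with master 2 as the fixed default master and a slot cycle of 16, and 0x00200300 gives master 2 priority 3 and master 5 priority 2, leaving the others at 0. If master 2 is the PDC and master 5 the Ethernet MAC (again, to be confirmed in the datasheet), this matches the stated intent of favouring PDC traffic.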

I have also checked, that my ioremapped buffers do not require DMA synchronization.

Anyway, the problem has not been successfully solved, as the overhead introduced by the whole tty-related layer of the atmel_serial driver is simply too big.

When I receive data at up to ca. 170kB/s everything works fine. When the data stream reaches the desired 340kB/s, some data are lost, probably not between the USART and SDC but due to software buffer overruns.

Probably I should get rid of the whole atmel_serial.c driver, which is very good but too sophisticated for my purposes, and instead write a minimalistic DMA-based driver passing data directly from the DMA buffer to mmapped memory in user space...

Thanks for the help in resolving the bus matrix related problems.

-- BR Wojtek

Reply to
wzab

Oops, of course I was mistaken. It should be a memory buffer allocated in the kernel driver and mmapped by this driver for the user-mode application.

To move data to the user space memory I should use get_user_pages, but this again is too complex for this purpose ;-).
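A buffer shared this way is usually organised as a ring with one writer and one reader. The following user-space sketch shows only the index bookkeeping; all names (struct ring, ring_put, ring_get, RING_SIZE) are hypothetical, and a real driver would additionally need an mmap implementation and proper memory barriers between the kernel producer and the user-space consumer.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u            /* must be a power of two */

struct ring {
    volatile uint32_t head;        /* total bytes written by the producer */
    volatile uint32_t tail;        /* total bytes consumed by the reader */
    uint8_t data[RING_SIZE];
};

/* Producer side: store as much as fits, return the number of bytes stored. */
size_t ring_put(struct ring *r, const uint8_t *src, size_t len)
{
    uint32_t head = r->head;
    size_t space = RING_SIZE - (head - r->tail);
    size_t i;

    if (len > space)
        len = space;               /* the caller sees the shortfall (overrun) */
    for (i = 0; i < len; i++)
        r->data[(head + i) & (RING_SIZE - 1)] = src[i];
    r->head = head + (uint32_t)len;
    return len;
}

/* Consumer side: copy out up to `len` bytes, return the number copied. */
size_t ring_get(struct ring *r, uint8_t *dst, size_t len)
{
    uint32_t tail = r->tail;
    size_t avail = r->head - tail;
    size_t i;

    if (len > avail)
        len = avail;
    for (i = 0; i < len; i++)
        dst[i] = r->data[(tail + i) & (RING_SIZE - 1)];
    r->tail = tail + (uint32_t)len;
    return len;
}
```

Because head and tail are free-running counters, their difference is the fill level even after they wrap, and a short return from ring_put() makes an overrun visible instead of silently dropping data.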

-- BR Wojtek

Reply to
wzab

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.