Typical flash-based storage features and performance

Hello,

We use IDE flash-based storage in our embedded systems. For several years, our supplier has provided us with PQI DiskOnModules.

I find their performance rather lacking.

Media transfer rate:
    write 1.2 MB/sec (typ.)
    read  4.1 MB/sec (typ.)

Interface burst transfer rate:
    PIO mode 2 - 8.3 MB/sec (max)
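For reference, the 8.3 MB/sec burst figure follows directly from the PIO mode 2 timing. The snippet below is a minimal sketch, assuming the minimum cycle times from the ATA specification (600, 383, 240, 180 and 120 ns for PIO modes 0 to 4) and one 16-bit word per cycle. The function name is made up for illustration.

```python
# Minimum PIO cycle times in nanoseconds (assumed from the ATA specification;
# check them against the standard revision your device claims to follow).
PIO_CYCLE_NS = {0: 600, 1: 383, 2: 240, 3: 180, 4: 120}


def pio_burst_rate_mb_s(mode: int, bytes_per_cycle: int = 2) -> float:
    """Peak interface rate in MB/s (1 MB = 10**6 bytes) for a PIO mode."""
    cycle_s = PIO_CYCLE_NS[mode] * 1e-9
    return bytes_per_cycle / cycle_s / 1e6


for mode in sorted(PIO_CYCLE_NS):
    print(f"PIO mode {mode}: {pio_burst_rate_mb_s(mode):.1f} MB/s")
```

This is only the ceiling imposed by the interface; the media transfer rates quoted above are well below it.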

The data sheet also mentions a pair of DMA signals, but I can't figure out how to enable DMA.

Do modern flash-based IDE-compatible storage solutions offer more features, like DMA bus mastering, multi-word DMA, Ultra DMA, higher PIO modes and better throughput (both interface and sustained)?

Or are the DOMs I have typical of what is available today?

# hdparm -v /dev/hda

/dev/hda:
 multcount    =  0 (off)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 500/8/32, sectors = 128000, start = 0

# hdparm -I /dev/hda

/dev/hda:

ATA device, with non-removable media
    Model Number:       PQI IDE DiskOnModule
    Serial Number:      DOM6B00011677
    Firmware Revision:  ra03.00e
Standards:
    Likely used: 1
Configuration:
    hard sectored
    not MFM encoded
    head switch time > 15us
    fixed drive
    disk xfer rate > 5Mbs
    Logical         max     current
    cylinders       500     500
    heads           8       8
    sectors/track   32      32
    --
    bytes/track: 0  bytes/sector: 528
    CHS current addressable sectors:     128000
    LBA    user addressable sectors:     128000
    device size with M = 1024*1024:          62 MBytes
    device size with M = 1000*1000:          65 MBytes
Capabilities:
    LBA, IORDY not likely
    Buffer type: 0002: dual port, multi-sector
    Buffer size: 1.0kB  bytes avail on r/w long: 4
    Cannot perform double-word IO
    R/W multiple sector transfer: Max = 1  Current = 0
    DMA: not supported
    PIO: pio0 pio1 pio2
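The sector count and the two device sizes in this listing can be cross-checked from the reported geometry. A small sketch, assuming 512 bytes of user data per sector: the sizes hdparm prints match 512 rather than the 528 shown under bytes/sector. The function name is illustrative only.

```python
SECTOR_BYTES = 512  # assumed user-data bytes per sector (not the 528 reported)


def capacity(cylinders: int, heads: int, sectors_per_track: int):
    """Return (sector count, size in 2**20-byte units, size in 10**6-byte units)."""
    sectors = cylinders * heads * sectors_per_track
    size_bytes = sectors * SECTOR_BYTES
    return sectors, size_bytes / 2**20, size_bytes / 10**6


sectors, mib, mb = capacity(500, 8, 32)
print(sectors)  # 128000, matching the CHS and LBA sector counts
print(mib, mb)  # 62.5 and 65.536, which hdparm shows as 62 and 65 MBytes
```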

# hdparm -i /dev/hda

/dev/hda:

 Model=PQI IDE DiskOnModule, FwRev=ra03.00e, SerialNo=DOM6B00011677
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=500/8/32, TrkSize=0, SectSize=528, ECCbytes=4
 BuffType=DualPort, BuffSize=1kB, MaxMultSect=1, MultSect=off
 CurCHS=500/8/32, CurSects=128000, LBA=yes, LBAsects=128000
 IORDY=no
 PIO modes:  pio0 pio1 pio2
 AdvancedPM=no

 * signifies the current active mode
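When comparing several modules, it can help to pull the advertised transfer modes out of this kind of output automatically. The following is a rough sketch, assuming the usual `hdparm -i` layout in which devices with DMA support also print "DMA modes:" and "UDMA modes:" lines. The helper name and the second sample string are made up for illustration.

```python
import re

# Matches "PIO modes:", "DMA modes:" or "UDMA modes:" followed by mode names,
# where the active mode may carry a leading "*".
MODE_LINE = re.compile(r"\b(PIO|UDMA|DMA) modes:\s*((?:\*?[a-z]+\d+\s*)+)")


def transfer_modes(identify_text: str) -> dict:
    """Collect PIO/DMA/UDMA mode names from `hdparm -i` style text."""
    modes = {"PIO": [], "DMA": [], "UDMA": []}
    for kind, names in MODE_LINE.findall(identify_text):
        modes[kind] = [name.lstrip("*") for name in names.split()]
    return modes


# Excerpt of the output posted above.
dom = "IORDY=no PIO modes: pio0 pio1 pio2 AdvancedPM=no"
# Hypothetical output from a DMA-capable device, for comparison.
other = "PIO modes: pio0 pio1 pio2 pio3 pio4\n DMA modes: mdma0 mdma1 *mdma2\n"

print(transfer_modes(dom))
print(transfer_modes(other))
```

For the DOM above, the DMA and UDMA lists come back empty, which agrees with the "DMA: not supported" line from `hdparm -I`.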

# hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  18 MB in  3.00 seconds =   5.99 MB/sec

# hdparm -t --direct /dev/hda

/dev/hda:
 Timing O_DIRECT disk reads:  20 MB in  3.30 seconds =   6.06 MB/sec

Regards.

Spoon

Spoon wrote: ...

All the DMA stuff you can forget about - no serious modern device would use a parallel IDE interface (more properly called P-ATA). All the more modern interfaces don't care about DMA. Furthermore, at the speeds you quoted, the problem is not DMA or PIO; it is the internal transfer speed of the flash chips.

There are much higher-performing solid-state disk devices. I just tested two of them (manufacturers' names withheld); their performance ranged from 105 MB/s writing all the way up to 390 MB/s sustained read or write (the slower one was flash based, the faster one RAM based with battery backup and a firehose dump to persistent storage). Naturally, neither of them had an "IDE" interface (nobody who cares about actually storing data would use an IDE interface); these are twin-tailed 2 Gbit Fibre Channel.

These devices are available in capacities ranging from about 16 GB to 1 TB. Some of them have the form factor of a disk drive (1.8", 2.5" or 3.5"), while others are rack-mounted boxes. Many have Fibre Channel interfaces, but similar models are available with InfiniBand, SCSI (soon to be SAS), and SATA. Performance goes from good (dozens of MB/s sustained) to superb (as measured above).

Do a web search for "flash disk" and "solid state disk". Contact the usual vendors. And brace yourself for the prices - it is not unheard of to spend anywhere between $10K and $100K for a multi-GB to TB drive. You get what you pay for.

If you are asking for a free lunch (namely, a device that is for example 16 GB, 50 MB/sec, and only costs $200): I don't think free lunches exist, but I would be delighted if someone found one.

-- Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca_dot_us

_firstname_

One fine day Spoon typed:

DMARQ and DMACK; they are missing on some CF adapters too. Make sure they are connected to the IDE controller.

You should. In my experience with CompactFlash cards used in PC/104 systems, enabling DMA mode had a tremendous impact on performance (both transfer rate and CPU load).

Almost every CompactFlash card supports DMA (some UDMA, some multiword DMA). I don't know whether this is also true of DOMs.

--
emboliaschizoide.splinder.com
dalai lamah

I have used this particular DOM in the past. Not only does the transfer rate leave something to be desired; since it only supports PIO transfers, it also hogs the CPU, blocking the entire system while reading from or writing to the disk.

There are also DOMs from the same manufacturer that support higher transfer rates (quite a bit faster than yours, but still a lot slower than a real hard disk) and UDMA transfer modes, which reduce the CPU load quite a bit. These modules are not much more expensive than the one you mentioned.

Patrick de Zeester

The sustained transfer rate of CF cards is limited by the flash itself, not the interface. The speed can be VERY different depending on the particular make and model of the card: anywhere from 1 to 6 Mb/s for reads and from 0.5 to 4 Mb/s for writes. Unfortunately, manufacturers rarely specify those parameters; they provide only the interface data rates. Those numbers are fairly useless, but they look impressive.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Vladimir Vassilevsky

One fine day Vladimir Vassilevsky typed:

True, but with modern (and fast) CF cards the interface becomes the bottleneck much sooner. For example, a SanDisk Extreme III has a read transfer rate of about 2 MB/s in PIO mode and 8-10 MB/s in DMA mode.

--
emboliaschizoide.splinder.com
dalai lamah

That looks strange. Why would the CF be slower in PIO mode? Couldn't this be a limitation of your host controller?

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Vladimir Vassilevsky

One fine day Vladimir Vassilevsky typed:

It could be, even though it happened with two different PC/104 boards. My conclusion was that PIO mode was kept deliberately slow for compatibility with legacy devices. Anyway, I don't know for sure; I was quite satisfied with the performance in DMA mode and didn't investigate further. :)

--
emboliaschizoide.splinder.com
dalai lamah

We are using CF on our own board with our own drivers. Although I can use DMA mode, I am avoiding it for maximum compatibility with CF cards of different makes. I was surprised to find out that many CF features are actually optional and not guaranteed to be supported. As for CPU usage, there is no advantage in our case, since the transfer is done by the bus-master DMA regardless of the mode. As long as the PIO mode is set sufficiently fast, I can't see any dependence of the transfer rate on further increases in bus speed.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Vladimir Vassilevsky

I don't know that much of the CF particulars, myself. But I am somewhat familiar with ISA bus traffic and the various chipsets.

Programmed I/O over the ISA bus uses IN and OUT instructions to perform its operations. The ISA bus rate has long been decoupled from the processor rate and is usually set up by the BIOS for something around 8-9 MHz, or about 120 ns/clock. An 8-bit I/O or memory read/write requires 6 of these bus clocks. A 16-bit I/O or memory read/write requires 3 or 6, depending on how the hardware handles an available signal wire on the "newer" AT 16-bit bus signalling.
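To put numbers on those clock counts, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above (120 ns per bus clock, 3 or 6 clocks per cycle) and gives upper bounds, ignoring instruction overhead and any wait states. The function name is illustrative.

```python
ISA_CLOCK_NS = 120  # about 8.33 MHz, as quoted above


def isa_rate_mb_s(width_bits: int, clocks_per_cycle: int) -> float:
    """Upper bound on ISA throughput in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_cycle = width_bits // 8
    cycle_ns = clocks_per_cycle * ISA_CLOCK_NS
    return bytes_per_cycle / cycle_ns * 1000  # bytes per ns -> MB/s


for width, clocks in [(8, 6), (16, 6), (16, 3)]:
    print(f"{width}-bit, {clocks} clocks: {isa_rate_mb_s(width, clocks):.2f} MB/s")
```

Under these assumptions the bus itself tops out at roughly 1.4, 2.8 and 5.6 MB/s respectively.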

On first blush, one might argue this way: The DMA controller logic (whether an older explicit chip or part of a now-complex chipset) is permitted to operate independent of the instructions running on the CPU (and will operate faster if the CPU avoids placing pressure on the bus) and doesn't require any instruction fetch and decode while it is ticking away. However, on the modern cpus (PPro and later, let's say) all that won't get much in the way, since the processors are so fast and have plenty of internal cache, not to mention chipset queues and buffering and so on, anyway. So the limitation would seem to simply be the bus clock and similar performance either way.

A little closer thinking would say, 'wait a minute.' In the one case, the DMA controller hardware issues the I/Os, whereas in the other the processor must issue them. And processor I/O is actually a front side bus transaction (4-7 overlappable front side bus clocks) that must be passed along by the chipset to the south bridge (or equivalent) to be performed. It must go through queues, following certain rules about that, etc. So perhaps these cause delays that aren't present in the DMA controller case, which operates in the periphery of the south bridge much more directly.

Go still deeper and you realize that these CPU-initiated I/Os must travel outbound via the PCI and sideband channels to initiate an ISA bus cycle emulation on the south bridge. The resulting read data (assume a read for now) must then travel back via the south bridge onto the PCI and through the chipset back into the CPU. Only then can another I/O be issued. It's a long process.

On the other hand, since even the PCI bus's burst transactions cannot follow the ISA bus DMA rules without extra, out-of-PCI-band signalling available and since ISA DMA for legacy reasons must be supported, the entire chipset includes specialized out-of-band signaling for DMA transactions. And these cause the PCI bus to hold off allowing inserted transactions while a DMA burst is in progress (to support the fixed requirements of ISA DMA.) A DMA transaction then, in effect, has the entire attention of the chipsets and no two-way requirement through queues and so forth from the CPU while all this is going on.

It's been years since I've thought about any of this, but I could easily imagine at least a 2X and probably a 4X improvement over PIO.

Jon

Jonathan Kirwan

You are an idiot. Literally hundreds of different flash modules are available with IDE interface and the features the original poster inquired about.

He is more likely to confront the annoying problem that many embedded boards don't run the signal lines necessary for DMA from the host chipset out to the flash socket or IDE connector.

--
  Thor Lancelot Simon	                                     tls@rek.tjls.com

  "The inconsistency is startling, though admittedly, if consistency is to
   be abandoned or transcended, there is no problem."	      - Noam Chomsky
Thor Lancelot Simon

I suppose I should have stated that my message was cross-posted to comp.arch.embedded and comp.arch.storage. Given your answer, I assume you read my message in comp.arch.storage :-)

I don't need 16 GB or 50 MB/s. What I have today (128 MB) is enough.

I would be happy with 10 MB/s sustained reads and writes, an interface to match, and DMA support, to let the CPU carry on while data is moved to or from the drive.
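As a rough guide to which interface modes could carry 10 MB/s, here is a small sketch. The peak rates are the nominal figures for each ATA transfer mode as I recall them from the ATA specifications, so treat them as assumptions. The table and function names are made up.

```python
# Nominal peak interface rates in MB/s for ATA transfer modes (assumed values).
ATA_MODE_MB_S = {
    "pio0": 3.3, "pio1": 5.2, "pio2": 8.3, "pio3": 11.1, "pio4": 16.7,
    "mdma0": 4.2, "mdma1": 13.3, "mdma2": 16.7,
    "udma0": 16.7, "udma1": 25.0, "udma2": 33.3,
}


def modes_meeting(target_mb_s: float) -> list:
    """Return the modes whose peak interface rate reaches the target."""
    return [mode for mode, rate in ATA_MODE_MB_S.items() if rate >= target_mb_s]


print(modes_meeting(10.0))
```

A fast enough interface mode is necessary but not sufficient: as others in the thread point out, the flash media must sustain the rate too.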

Regards.

Spoon

The motherboard is an EBC-2000 by Adlink with "standard" x86 parts. I'm not quite sure why Adlink calls it "embedded" :-)

Spoon

This is my biggest fear.

Suppose the OS decides to write 4 KB at 1.2 MB/s (that's 3.4 ms).

The nightmare scenario is if the CPU can't do anything between two 16-bit transfers, with interrupts disabled.

Everything is put on hold for 3.4 ms... My real-time processes would be quite unhappy.
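The 3.4 ms figure can be reproduced as follows. A minimal sketch, assuming 4 KB means 4096 bytes, 1 MB means 10**6 bytes, the media write rate is the bottleneck, and nothing can preempt the transfer. The function name is illustrative.

```python
def transfer_time_ms(n_bytes: int, rate_mb_s: float) -> float:
    """Time in milliseconds to move n_bytes at rate_mb_s (1 MB = 10**6 bytes)."""
    return n_bytes / (rate_mb_s * 1e6) * 1e3


block = 4096                              # one 4 KB write
stall_ms = transfer_time_ms(block, 1.2)   # datasheet write rate, 1.2 MB/s
words = block // 2                        # number of 16-bit PIO transfers

print(f"{stall_ms:.1f} ms for {words} 16-bit transfers")
print(f"{stall_ms * 1000 / words:.2f} us per transfer on average")
```

That works out to 2048 back-to-back 16-bit transfers, averaging a little under 2 microseconds each.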

I will ask our supplier to provide one such DOM for testing purposes.

Regards.

Spoon

I'm confused by the sustained rates you mention. Do CF cards use a different type of Flash than that used in solid-state disks?

Because SSDs are competitive with HDDs these days. (SSDs are indeed flash-based, right?)

e.g. MTRON MSD-S25032 32GB 2.5"
    sustained read  = 95.1 MB/s
    sustained write = 74.7 MB/s

Regards.

Spoon

Oops, sorry, I should have read your question more carefully, and not assumed something that isn't there. Storing 128 MB and reading/writing it at 10 MB/s is not my area of expertise, nor is realtime behavior of IDE interfaces and disabling interrupts for a few ms.

Apologies for injecting nonsensical chatter into an embedded systems discussion.

Now, if you wanted to store lots of GB or TB and read them at speeds on the order of GB/s, preferably from many hosts, I might get interested.

-- Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca_dot_us

_firstname_

I don't know. The CF read/write speed is indeed very different depending on the particular make of card and the mode of operation; and this is *not* because of the limitation of the bus speed of the host interface.

I spent two days trying to achieve the ultimate performance by tweaking the CF driver. It appears that what is good for one card is actually bad for another, and the speed difference can be as much as a factor of two. I tried Viking, Lexar and SanDisk.

I can't see much logic in it, and there are no detailed datasheets for those cards either. I will gladly accept any advice from someone proficient with CF at a low level.

Mbytes or Mbits?

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Vladimir Vassilevsky

PQI's Industrial DOM web page is rather terse, but they do offer the aforementioned datasheet.

Could you provide a link to one of the faster PQI DOMs?

Regards.

Spoon

Apacer offers a detailed (??) datasheet.

Megabytes per second of course, i.e. 63% of the SATA/150 interface.

Therefore not all flash-based storage is slow, it seems.
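The 63% figure is simply the MTRON sustained read rate divided by the interface rate. A tiny sketch, assuming SATA/150 means 150 MB/s of usable bandwidth; the function name is illustrative.

```python
SATA_150_MB_S = 150.0  # assumed usable interface bandwidth of SATA/150


def interface_utilisation(sustained_mb_s: float, interface_mb_s: float = SATA_150_MB_S) -> float:
    """Sustained rate as a percentage of the interface rate."""
    return sustained_mb_s / interface_mb_s * 100


print(f"read:  {interface_utilisation(95.1):.0f}%")
print(f"write: {interface_utilisation(74.7):.0f}%")
```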

Spoon

formatting link

Patrick de Zeester
