CompactFlash + microcontroller weirdness


H. Peter Anvin 21 years ago

Hello,

I wanted to inquire if someone else has seen this kind of phenomenon before...

I have a design (currently implemented in an FPGA) which drives a CompactFlash from a microcontroller, specifically a T80 (Z80 clone) from opencores.org, with my own firmware. The clock frequency is 25 MHz.

The weird part is that it reads all cards fine, but when it comes to writing I get some extremely odd behaviour. I have two cards, a 64 MB card from IOData and a 128 MB card from SanDisk, both used but both writing fine in a PC, which very quickly give me an unrecoverable error on write, with the error register set to either 81h or 40h. The failing sector is filled with garbage which doesn't look random; it has a lot of 55h bytes in particular. Furthermore, the sectors *after* the failing sector, up to the next 4K boundary, are filled with a 16-bit pattern, usually, but not always, 0Fh 80h.
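For readers who want to interpret the two error register values mentioned above, here is a minimal sketch of a decoder. The bit names are an assumption taken from the usual CompactFlash/ATA error register layout, not from this thread, and `decode_error` is a hypothetical helper name; check the specification revision that applies to your card.

```python
# Bit names as commonly documented for the CompactFlash/ATA error register.
# Assumption: this layout comes from the CF specification, not from the post.
ERROR_BITS = {
    0x80: "BBK",   # bad block detected
    0x40: "UNC",   # uncorrectable data error
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
    0x01: "AMNF",  # address mark not found (often used as a general error flag)
}

def decode_error(value):
    """Return the names of the bits set in an 8-bit error register value."""
    names = [name for mask, name in ERROR_BITS.items() if value & mask]
    # Any remaining set bits are ones this table does not name.
    reserved = value & 0xFF & ~sum(ERROR_BITS)
    if reserved:
        names.append("reserved:%02Xh" % reserved)
    return names

print(decode_error(0x81))
print(decode_error(0x40))
```

Under that assumed layout, 81h reads as BBK plus AMNF and 40h as UNC, which says the card itself considered the write failed; it does not say why.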

Occasionally, too, I see entire 512-byte sectors filled with zero without an error being reported. After those, normal operation resumes fine with the next sector.

However, a brand new card, 256 MB from POI, works like a charm.

It just seems very odd to me. I'm mostly a software guy, so the hardware aspects of this project are largely new to me. However, I have tried to eliminate glitches or asynchronicities. The board is the Altera NIOS development kit, so I have no reason to believe the electricals are marginal as I might have if it had been a custom board. However, part of the reason I'd like to understand the phenomenon is that I might want to use this design as a prototype for a "real" hardware project in the future.

Detailed info (read only if bored):

It uses the 8-bit common memory mode of CompactFlash, reading/writing single sectors at a time, in LBA mode. The timing of the signals is as follows (1 cycle = 40 ns):

- A/Dout latched at the same time CE1# is asserted, so they should be stable for the duration of the transfer
- 1 cycle - CE1# asserted, WE/OE# deasserted
- 3 cycles - CE1# asserted, WE/OE# asserted
  -> WAIT# sampled during this time; if WAIT# is sampled low within 2 cycles of WE/OE# assertion, the access will be held until at least 1 cycle after WAIT# deassertion
- Din latched at the same time OE# is deasserted
- 2 cycles - CE1# asserted, WE/OE# deasserted
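The cycle counts above translate into absolute times as in the following sketch. The 25 MHz clock and the 1/3/2 cycle phases come from the post; the function name `access_timing`, the dictionary keys, and the assumption that WAIT# is never asserted are illustrative only.

```python
CLOCK_HZ = 25_000_000                   # clock frequency given in the post
CYCLE_NS = 1_000_000_000 // CLOCK_HZ    # 40 ns per cycle

SETUP_CYCLES = 1    # CE1# asserted, WE/OE# deasserted
STROBE_CYCLES = 3   # CE1# asserted, WE/OE# asserted
HOLD_CYCLES = 2     # CE1# asserted, WE/OE# deasserted

def access_timing():
    """Phase durations in ns for one byte access, assuming WAIT# stays high."""
    setup = SETUP_CYCLES * CYCLE_NS
    strobe = STROBE_CYCLES * CYCLE_NS
    hold = HOLD_CYCLES * CYCLE_NS
    return {
        "setup_ns": setup,
        "strobe_ns": strobe,
        "hold_ns": hold,
        "total_ns": setup + strobe + hold,
    }

timing = access_timing()
print(timing)
# Time spent on the bus for one 512-byte sector in 8-bit mode,
# ignoring any firmware overhead between accesses.
print(512 * timing["total_ns"], "ns per sector")
```

These figures can then be compared against the timing tables of the CompactFlash specification for the access mode in use.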

After sending a command, the firmware will wait for BSY# assert and deassert; the pulse is latched in hardware and then polled by firmware.
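As a rough illustration of what "sending a command" involves for a single-sector LBA write, the sketch below builds the register writes that precede the busy wait. The register offsets, the 30h command code and the E0h drive/head base value are assumptions taken from the standard ATA/CF task-file layout rather than from the post, and `write_sector_setup` is a hypothetical helper, not the author's firmware.

```python
# Task-file register offsets (assumed standard ATA/CF layout).
REG_SECTOR_COUNT = 2
REG_LBA_0_7 = 3
REG_LBA_8_15 = 4
REG_LBA_16_23 = 5
REG_DRIVE_HEAD = 6   # LBA bits 24..27 plus the LBA-mode flag
REG_COMMAND = 7      # reads back as the status register

CMD_WRITE_SECTORS = 0x30   # assumed ATA WRITE SECTOR(S) command code
LBA_MODE = 0xE0            # bit 6 selects LBA; bits 7 and 5 set by convention

def write_sector_setup(lba):
    """Return (register, value) pairs that start a one-sector LBA write."""
    if not 0 <= lba < (1 << 28):
        raise ValueError("LBA must fit in 28 bits")
    return [
        (REG_SECTOR_COUNT, 1),
        (REG_LBA_0_7, lba & 0xFF),
        (REG_LBA_8_15, (lba >> 8) & 0xFF),
        (REG_LBA_16_23, (lba >> 16) & 0xFF),
        (REG_DRIVE_HEAD, LBA_MODE | ((lba >> 24) & 0x0F)),
        (REG_COMMAND, CMD_WRITE_SECTORS),   # written last; the card goes busy
    ]

for reg, value in write_sector_setup(0x01234567):
    print("reg %d <- %02Xh" % (reg, value))
```

After the last write the card signals busy, which is the pulse the hardware latch described above is meant to catch; the 512 data bytes follow once the card is ready for them.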

I have not actually seen WAIT# being asserted by any card that I have tried.

The source code (Verilog) is part of the project at:

ftp://ftp.zytor.com/pub/fpga/abc80/abc80-10.zip

Thanks!

-hpa


Lewin A.R.W. Edwards 21 years ago

Is there any power-cycling going on during this test? I have encountered a similar, but not identical problem, with certain specific cards. But I tied our problem down to power-cycling after a write operation.


H. Peter Anvin 21 years ago

Followup to: By author: snipped-for-privacy@larwe.com (Lewin A.R.W. Edwards) In newsgroup: comp.arch.embedded

Generally, no, although on several occasions the system hung and I had to remove the CF card from its slot. However, at that point the system was already hung.

Did you find a workaround for power cycle after write? Intervening read, or go into sleep mode?

-hpa


Markus Zingg 21 years ago

[snip]

Whenever I had weird problems like this, they boiled down to having wires that were too long between the microcontroller and the CF.

HTH

Markus


Lewin A.R.W. Edwards 21 years ago

No, the problem was much more evil than that. It turned out that the problem (zeroing of sectors) happened randomly, but ONLY if a write operation had occurred followed - at any time - by a power-cycle. Moreover, the problem was happening on the power-up half of the cycle. In other words, the problem appeared if we did this:

1 write something
2 maybe read something
3 switch off
4 switch on

If we pulled the card out at step 3, and put it in a card-reader, we observed that the data was ALWAYS OK. Furthermore, if we then put the card back in our device, it would never have a problem. We only saw the problem if we left the card in the device. I'm going to guess that if we left it for several hours between steps 3-4, it probably wouldn't have problems either, but that would have been too time-consuming to test.

I "solved" the problem by changing the PCB layout a little, adding a large tantalum cap on the CF power rails, and adding weak pull[up,down as appropriate] resistors on the control lines. I don't know which of those were important; we were under time pressure and had no time for more than one additional board spin.


Linnix 21 years ago

Both are necessary, as well as a properly shielded PCB. The CF signal lines don't have enough grounding paths. We had noise problems unless there were power and ground planes on the PCB. However, that means 4 layers and double the PCB cost. After much experimentation, we ended up with aluminum shielding covers on both sides of the CF and PCB. That way, we can stay with 2 layers.


H. Peter Anvin 21 years ago

Followup to: By author: snipped-for-privacy@larwe.com (Lewin A.R.W. Edwards) In newsgroup: comp.arch.embedded

Yipes. OK, I don't have control over the board layout, this being an FPGA development board, but I tried turning on the FPGA "slow slew rate" I/O option in order to reduce noise. It doesn't seem to have had any immediate effect, but I will continue to investigate.

One more data item that I forgot to mention: at least one of the CF cards did once work in the design, but unfortunately on a version of the project whose sources had never made it into CVS and were lost in a file server disk crash... oops. Since this was quite a while ago, I was starting to wonder if there was an issue with the CF card not performing releveling properly. I'll definitely hunt for power sequencing and other such issues, though.

If you don't mind telling, on your board, do you assert CEx# for each transfer, or do you leave it (one or both) tied to ground?

-hpa


H. Peter Anvin 21 years ago

After the rather unanimous comments in this group that the kind of problems I'd been seeing are probably noise-related, I went back and looked at the schematic for the FPGA development board (Altera Nios Cyclone edition) I'm using.

It turns out that the power supply for the CF card is gated, but the pin list I'd used didn't include the cf_power control pin - thus it got left tristated and floating. Apparently the unreliable VCC was good enough for reading, but not writing.

Explicitly connecting cf_power to VCC solved the problem.

Many thanks!

-hpa


Jeroen 21 years ago

I guess the card could draw enough power via the protection diodes from the lines that are high most of the time.

I have that board too and used the CF card in one project. Some cards didn't work; it turned out that one signal, called PACK or something like that, is connected to the LCD port on J12. I had to cut the wire on the CF card connector to make it work properly.

You know you can also attach a harddisk to it at J11? The 40 pin connectors are IDE compatible and from a software point of view, it doesn't make a big difference.

Jeroen


H. Peter Anvin 21 years ago

Followup to: By author: "Jeroen" In newsgroup: comp.arch.embedded

Either that or the floating input provided some power, enough that the caps could keep Vcc up for reading.

Yeah; I'm not using the LCD port for this project, for this very reason.

Interesting; I guess that explains the pin sharing, and running the CF card at 5 V, which I otherwise found hard to understand.

In my application it wouldn't work, though, since I'm using the common memory mode as opposed to the IDE mode of the CF interface, and 8 bits to boot. Not a big deal; my application is recreating a 1970's computer; the OS can't even access more than 64 MB.

-hpa
