Robust configuration memory

- Piotr Wyderski

Sun, Feb 18, 2018 10:20 AM

Hello,

I need a moderate amount of non-volatile memory (for FPGA configuration purposes and the like), but can't tolerate configuration errors due to charge leaks, cosmic radiation or plain malfunction of the chip. I thought that something like RAID5 imposed on memory chips/SD cards would be fine. It would be extremely simple in an FPGA, but it creates a chicken-and-egg problem: how can you read the controller's configuration from flash if the flash itself might be corrupt?

It could also be done easily in hardware, but that would require parallel-output chips, which are not particularly trendy nowadays, and SPI decoding circuitry built from simple gates would be insanely complex.

I presume I am not the first person to have such a need, so what would you recommend? It has to be autonomous only for (early) reads; the write-mode part can materialize later within the FPGA. Any ideas?

Best regards, Piotr

- whit3rd

Sun, Feb 18, 2018 11:27 AM

Flash is cheap; either redundancy (three copies and a vote) or checksums (two copies with two checksums) will warn of an error, and you can mark a block 'bad' and use another. Reading isn't stressful on flash, it's only the erase/rewrite cycles that you gotta worry about, eventually. So, minimize those.

Some implementations are available with internal ECC-checking, too.
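
A minimal sketch of the "three copies and a vote" idea, done in software on whole byte buffers rather than on a serial line. The function name `vote3` and the sample data are illustrative assumptions, not anything from the thread.

```python
def vote3(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 majority over three equally sized copies."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies must have equal length")
    # A result bit is 1 exactly when at least two of the three input bits are 1.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


# Hypothetical data: the middle copy has one byte that drifted to all ones.
good = bytes([0x12, 0x34, 0x56])
leaky = bytes([0x12, 0xFF, 0x56])
recovered = vote3(good, leaky, good)
```

The vote works per bit, so it still recovers the data when different copies are damaged in different bit positions; it fails only where two copies are wrong in the same bit.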

- Piotr Wyderski

Sun, Feb 18, 2018 12:24 PM

You're right, a single voting engine attached to three copies of the DOUT SPI line will do the job without the need to analyze the protocol. Problem solved, dead simple, thanks!

There will not be many write cycles, but the storage is leaky, so it will turn into all ones eventually. With the voting mechanism I can detect it early and re-program the chips with the correct(ed) content.
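
The "detect it early and re-program" step could look roughly like this once the copies can be read individually. It is a sketch under assumptions: `scrub` is a made-up name, and the actual re-programming of a chip is left out.

```python
def scrub(copies):
    """Return (voted_image, indices_of_copies_that_need_reprogramming)."""
    a, b, c = copies
    voted = bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
    # Any copy that differs from the voted image has drifted somewhere.
    stale = [i for i, copy in enumerate(copies) if copy != voted]
    return voted, stale


# Hypothetical example: copy 1 has a byte that leaked towards all ones.
voted, stale = scrub([b"\x12\x34", b"\x12\xff", b"\x12\x34"])
```

Here `stale` names the chips to rewrite with `voted`, one chip at a time.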

The interesting discovery now is the apparent lack of majority gates, and the circuit is complex enough not to fit within a simple logic function in a single package. It seems the best thing to do is to use an 8:1 MUX (say, 74AC151) and etch the LUT on the PCB. A 6 ns delay doesn't sound bad. Am I missing an even simpler implementation?

Best regards, Piotr
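
For the 8:1 MUX approach, the constants to strap on the data inputs are just the truth table of the majority function. The sketch below derives them, assuming the three flash outputs drive the select lines and that data input Dn is the one selected by the binary value n; check the pin naming of the actual part.

```python
def majority_lut():
    """Constants for data inputs D0..D7 of an 8:1 mux used as a 2-of-3 voter."""
    # Input n is selected when the three select bits spell n in binary,
    # so it must be tied high exactly when n has two or more bits set.
    return [1 if bin(sel).count("1") >= 2 else 0 for sel in range(8)]


lut = majority_lut()
# Pack the table into one byte with D7 as the most significant bit.
lut_byte = sum(bit << i for i, bit in enumerate(lut))
```

With that ordering, D3, D5, D6 and D7 go to Vcc and the other four inputs go to ground.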

- Chris Jones

Sun, Feb 18, 2018 12:57 PM

Simpler but not better: Tie them all together and let them fight it out. If you are somewhat merciful you could give each output a series resistor.

- pcdhobbs

Sun, Feb 18, 2018 1:29 PM

There are some "configurable 2-input" gates that can be used for that, I think, if you play around with their truth tables a bit. (They actually have 3 or 4 inputs.)

Cheers

Phil Hobbs

- krw

Sun, Feb 18, 2018 3:36 PM

It should be pretty simple with a CPLD. You could also compare on a bit-by-bit basis, which shouldn't be too difficult, even limited to unit logic.

Put the redundancy in the FPGA. If a bit of the configuration memory is in error, it's unlikely to be wrong in each of the redundant parts.

You could use the FPGA to verify the contents of the flash, after configuration but before (application) enable.

There are all sorts of possibilities but a lot depends on unspecified requirements.

Not enough information.
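
One way to picture the "verify the contents of the flash" step is a checksum stored next to each image. This is a sketch only: it uses CRC-32 from Python's `zlib` as a stand-in, and the record layout (payload followed by a 4-byte big-endian CRC) and the names `seal`, `is_intact` and `first_good` are assumptions, not a format any FPGA vendor uses.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (assumed layout: payload + 4 bytes)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_intact(record: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored one."""
    payload, stored = record[:-4], record[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


def first_good(records):
    """Return the payload of the first record that passes, else None."""
    for record in records:
        if is_intact(record):
            return record[:-4]
    return None


# Hypothetical data: the first copy has a single flipped bit.
rec = seal(b"bitstream")
bad = bytes([rec[0] ^ 0x01]) + rec[1:]
chosen = first_good([bad, rec])
```

Unlike voting, this needs only two copies, but it can only select a good copy; it cannot repair a copy that fails the check.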

- Piotr Wyderski

Sun, Feb 18, 2018 3:46 PM

Which, again, is nowadays based on flash. Vicious circle. I don't want its configuration to evaporate either.

And this is the correct approach: bit-by-bit voting on a serial stream. A single majority gate of still unknown implementation will suffice.

What do you mean? An FPGA won't boot from a corrupt bitstream. Do you mean many FPGAs?

But the flash can die just before/during configuration.

Best regards, Piotr

- krw

Sun, Feb 18, 2018 3:58 PM

OK, what error in the "checker" is going to give you an almost-good flash image for the target device? What I'm suggesting is an "equivalence" gate in the configuration stream.

Not majority. Just equal, if all you care about is "valid"/"not valid" configuration. If you want TMR, then you need voting, which is a little more difficult. But in this case, you probably need TMR in the application, too.

I thought this was your problem. If it doesn't boot, problem solved.

Sorry, I thought this was the desired outcome.
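
The "equivalence" check on the configuration stream, as opposed to a vote, can be sketched as a bit-by-bit comparison of two serial streams. The function names are illustrative, and the streams are assumed to be equally long sequences of 0/1 values.

```python
def first_mismatch(stream_a, stream_b):
    """Index of the first bit where the two streams differ, else None."""
    for i, (bit_a, bit_b) in enumerate(zip(stream_a, stream_b)):
        if bit_a ^ bit_b:  # XOR is 1 exactly when the two bits differ
            return i
    return None


def configuration_valid(stream_a, stream_b):
    """True when the two copies agree on every bit."""
    return first_mismatch(stream_a, stream_b) is None
```

This tells you that something is wrong and where, but not which copy is right; picking a winner takes a third copy or a checksum.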

- Piotr Wyderski

Sun, Feb 18, 2018 4:50 PM

OK, but no, the desired effect is to work correctly for as long as possible, even knowing about memory errors. Hence the RAID5 reference. I assume a configured FPGA works correctly; the problem is to provide it with a good bitstream, as I expect the flash chips to be the least durable. Putting in 3 instead of 1 doesn't increase complexity that much, but can greatly increase reliability.

In that case, don't most FPGAs already have all the needed checksum circuitry inside?

Best regards, Piotr

- Tauno Voipio

Sun, Feb 18, 2018 6:19 PM

Using an FPGA leads to a circular definition: the FPGA has some kind of semi-permanent memory for the configuration.

I see two possibilities:

- Create the voter from discrete gates,
- Use a processor to handle the redundancy, and get a problem of the processor code.

I once had a customer with an instrument with calibration data inside an EEPROM. It had the habit of losing the calibration occasionally, so in the next generation, a triple redundancy calibration store was made in the software. During years of use of hundreds of instruments, there was not a single case of redundancy correction. It was proven that the problems with the old generation were from flaky software.

--

-TV

- Jasen Betts

Sun, Feb 18, 2018 9:51 PM

Slightly simpler.

     ............
     . 4:1 mux  .
     .          .
     .       X0 .------ 0
A ---. A0    X1 .------ C
B ---. A1    X2 .------ C
     .       X3 .------ 1
     .          .
     .        Y .----------
     ............

--
This email has not been checked by half-arsed antivirus software
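
A quick way to convince yourself that the 4:1 mux wiring above is a majority gate is to enumerate all eight input combinations. The model below assumes A drives select bit A0, B drives A1, and input Xn is selected by the binary value A1A0 = n.

```python
from itertools import product


def mux4(a0, a1, x):
    """Ideal 4:1 mux: returns the data input selected by (a1, a0)."""
    return x[(a1 << 1) | a0]


def voter(a, b, c):
    """Wiring from the diagram: X0 = 0, X1 = X2 = C, X3 = 1."""
    return mux4(a, b, (0, c, c, 1))


def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)


# Compare the mux wiring against the reference for every input combination.
matches = all(
    voter(a, b, c) == majority(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
```

The intuition: when A and B agree, their common value is the answer; when they disagree, C breaks the tie.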

- krw

Sun, Feb 18, 2018 11:15 PM

In that case, a CPLD should work. If the CPLD doesn't configure properly, there is little chance that it'll pass a configuration stream with a proper checksum.

OK, then use that fact.

- Dimitrij Klingbeil

Sun, Feb 18, 2018 11:23 PM

Make sure however that you have a way to disable the voting and access the memory chips individually for writing. For one thing, writing to all of them at the same time might be risky (in case a glitch happens), but also SPI EEPROMs tend to have a status register bit that is polled during writes to check for completion. With a voter still active, the write would be indicated as finished as soon as 2 out of 3 chips finish writing, with the third one at risk of getting into an undefined state.

Dimitrij
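
The status-polling hazard can be shown with a toy timing model. Everything here is hypothetical: the completion ticks are made-up numbers, and each chip is reduced to a single busy bit that reads 1 until its write finishes.

```python
def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)


def busy_bit(t, done_at):
    """Busy flag of one chip: 1 until its internal write completes."""
    return 1 if t < done_at else 0


def first_idle_poll(done_times, through_voter):
    """First poll tick at which the host sees 'not busy'."""
    t = 0
    while True:
        bits = [busy_bit(t, d) for d in done_times]
        # Through the voter the host sees the majority of the busy bits;
        # otherwise it is assumed to wait until every chip reports idle.
        seen = majority(*bits) if through_voter else max(bits)
        if seen == 0:
            return t
        t += 1


DONE_AT = (3, 4, 7)  # hypothetical completion ticks of the three chips
voted_idle = first_idle_poll(DONE_AT, through_voter=True)
all_idle = first_idle_poll(DONE_AT, through_voter=False)
```

With these numbers the voted status clears at tick 4, while the slowest chip is still busy until tick 7, which is exactly the window in which it could be disturbed.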

- John Larkin

Mon, Feb 19, 2018 2:36 AM

The pulldown will usually win.

Assuming the FPGA logic threshold is Vcc/2, and the FPGA and the flash chips use the same Vcc.

Sounds risky.

--

John Larkin   Highland Technology, Inc   trk 

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com

- krw

Mon, Feb 19, 2018 2:55 AM

Sum with resistors.

Comparator. LVDS comparators should be quick enough.

What, you don't like analog solutions to digital problems? ;-)

- bitrex

Mon, Feb 19, 2018 3:48 AM

Murphy's law dictates that you'll never get an isolated failure that can be majority-ruled two to one; you'll either get full agreement or a Mexican standoff.

The Space Shuttle used four active GPCs with a fifth one kept on warm standby designed only to run the de-orbit/descent program; if one was in consistent disagreement with the other three it was taken off-line and the mission continued, if another then went off the reservation then the mission would be aborted and the fifth computer brought up to run the descent program ASAP.

- John Larkin

Mon, Feb 19, 2018 6:09 AM

It could be done with three resistors and a comparator. LVDS receivers typically have a ±100 mV offset spec, which would be good enough for a 3.3 volt majority decision. The resistors need to be big enough to swamp the flash chip output impedance variation, so a little speed might be lost.

Good place for a digital solution.

--

John Larkin   Highland Technology, Inc   trk 

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com
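
The arithmetic behind the three-resistor voter, as a rough sketch. It assumes ideal rail-to-rail outputs, three equal resistors, no load on the summing node and a threshold at Vcc/2; real output impedance and resistor tolerance eat into the margin.

```python
from itertools import product

VCC = 3.3     # supply voltage mentioned in the post
OFFSET = 0.1  # the quoted receiver offset, in volts


def node_voltage(outputs, vcc=VCC):
    """Voltage at the junction of equal resistors driven by ideal outputs."""
    return vcc * sum(outputs) / len(outputs)


def comparator(v, threshold=VCC / 2):
    return 1 if v > threshold else 0


def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)


# Smallest distance between any node voltage and the threshold.
worst_margin = min(
    abs(node_voltage(bits) - VCC / 2) for bits in product((0, 1), repeat=3)
)
```

The node sits at 0, Vcc/3, 2Vcc/3 or Vcc, so the closest it gets to the threshold is Vcc/6, about 0.55 V at 3.3 V, comfortably above a 100 mV offset under these idealized assumptions.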

- Piotr Wyderski

Mon, Feb 19, 2018 8:45 AM

I like it, thanks! Unfortunately, it seems that there is no single 4:1 MUX in the TinyLogic series, so there is no real estate win, but OTOH I can have two such voters in the same package or use the second MUX for something else.

Best regards, Piotr

- Piotr Wyderski

Mon, Feb 19, 2018 9:12 AM

Sure thing, Dimitrij. After the configuration I'd even gladly jump to x4 mode, as the FPGA can easily handle multichannel majority voting on its own. Just wanted to close the loop during the most sensitive stage. I didn't expect it to be *that* simple.

The updates will be rare, so even software-controlled bit-banging, one chip at a time, will be fully sufficient.

Best regards, Piotr

- Piotr Wyderski

Mon, Feb 19, 2018 9:25 AM

Majority voting in a system composed of an even number of participants, jazzy!

Best regards, Piotr