Serial EEPROM or Serial Flash?

I need to save some data in non-volatile memory. These are parameters that the user changes infrequently, at most about 10 times per day. The total size is in the range of 100 bytes to 2 kB.

As usually occurs, the parameters that change more frequently (10 times per day) are fewer than the parameters that change very rarely (10-100 times in the device lifetime).

How to save those data? After discarding the internal MCU Flash (because interrupts are blocked during programming), I'm deciding whether a serial EEPROM or a serial Flash is better.

First of all, I think SPI is better than I2C. SPI seems much faster: 10-20MHz against 400kHz-1MHz, at least for reading. Erasing/writing time is identical between I2C and SPI.

EEPROM or Flash? I know EEPROM can be written one byte at a time without erasing an entire block, whereas Flash needs a sector erase before writing even a single byte.

The firmware would be simpler with EEPROMs, because I don't need to save the entire sector before erasing and restoring it during programming, when writing a single byte. With EEPROMs I can write a byte. Stop.

However I don't think this simple approach can be used in real production. Suppose I have 10 bytes to write. What happens if the writing process is stopped in the middle, maybe after 5 bytes? How to protect the system against those events? I think one solution is to have at least two copies of the data in the memory and switch to the other bank after all data is completely written, with an "atomic" write operation. This means I need to copy&paste an entire block every time, even for a single byte change. And this is similar to the Flash approach, where I *need* a sector erase before changing a single byte.

What about the time? The EEPROM write cycle is about 5ms for a 32-byte page. For 128 bytes/4 pages, 20 ms. Flash sector erase time is 18 ms, plus 14us for each byte. The overall write cycle time is similar between EEPROM and Flash.

If the data are bigger, for example 1kB, the Flash technology wins. The sector size in Flash memories is usually bigger than 1kB, so I need to erase only once (18ms + 14us*1024 = about 32ms). In EEPROM I have 32 32-byte pages, so 5ms*32=160ms. 5 times more than Flash.

I'm not considering endurance. EEPROMs are better (1000k write cycles) than Flash (100k write cycles), but I don't need so many write cycles in the entire device lifetime.
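The arithmetic above can be checked with a few lines of C. This is only a sketch: the timing figures are the ones quoted in the post (real parts differ, so check the datasheet), the function names are made up for illustration, and the Flash case assumes the data fits in a single sector.

```c
#include <stdint.h>

/* Figures quoted in the post; they are not taken from a specific datasheet. */
#define EEPROM_PAGE_BYTES     32u
#define EEPROM_PAGE_WRITE_US  5000u   /* 5 ms write cycle per page */
#define FLASH_SECTOR_ERASE_US 18000u  /* 18 ms per sector erase */
#define FLASH_BYTE_PROG_US    14u     /* 14 us per programmed byte */

/* Time to write `bytes` to the EEPROM: one write cycle per started page. */
uint32_t eeprom_write_time_us(uint32_t bytes)
{
    uint32_t pages = (bytes + EEPROM_PAGE_BYTES - 1u) / EEPROM_PAGE_BYTES;
    return pages * EEPROM_PAGE_WRITE_US;
}

/* Time to erase one sector and program `bytes` into it
 * (assumes the data fits in a single sector). */
uint32_t flash_write_time_us(uint32_t bytes)
{
    return FLASH_SECTOR_ERASE_US + bytes * FLASH_BYTE_PROG_US;
}
```

For 1024 bytes this gives 160000 us for the EEPROM and 32336 us for the Flash, matching the 160 ms and 32 ms figures; for 128 bytes the two are nearly equal (20000 us against 19792 us).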

After those considerations, I think I will choose a serial SPI Flash, maybe 512kbit. The price is similar to serial EEPROM of only 64kbit. I see only one difficulty: how to write a good device driver to manage serial Flash: copy the content of a sector in another sector, making the changes on-the-fly, and switch from one sector to the other in one "atomic" operation. Could you suggest some code to study?

Did I miss anything else? Please share your considerations.

Reply to
pozz

Read time for this sort of thing is usually irrelevant. Assuming you are not badly constrained for ram, you read the NVM structure into memory at startup and use it from there. This makes it far easier to use, and to handle things like duplicate structures in the NVM with timestamps, checksums, etc.

Roughly correct. It is not an exact difference - there are "flash" devices where you can write a single byte, and the device handles reading the old page into a buffer, updating the byte, erasing the page and re-writing.

Multiple copies (at least 2) are key, along with timestamps or counters and checksums.

Yes.

Of course, it is possible to have multiple smaller blocks rather than just one big one. And you don't have to erase the old blocks until you need to re-claim the space.

Reply to
David Brown

In my actual project I have 2kB of non volatile data and a MCU with 32kB of RAM. I think I can load all in RAM at startup... even if it's silly, because there's an array of 100 users that I don't usually need. I only need to search for a valid user when one login attempt happens.

Why? I think it is easier because, if you have all data in RAM, you need to only write during normal execution (the reading is performed only at startup). In this case, you don't mix reading/writing during execution.

Anyway you can change only one byte of a 32-byte page.

Hmm... strange chip.

Two should be sufficient to prevent corruption caused by interruption during the writing of a block of data. Maybe three (or more) are needed to face memory *physical* corruption (maybe for many writings). I think I can ignore this event in my actual project.

This is interesting. Why do I need a checksum? My approach is to use only a magic number plus a counter... and two memory areas. At first startup the magic number isn't found in either area, so the device starts with defaults and writes the data to Area1 (magic/0, where 0 is the counter).

When the configuration is changed, Area2 is written with magic/1, being careful to save magic/1 only at the end of the area writing.

At startup the magics and counters from both areas are loaded, and one area is chosen (the magic should be valid and the counter should be the maximum).

I think this approach works, even when the area writing is interrupted at the middle.
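A minimal sketch of that startup selection, assuming a 32-bit magic and a 32-bit counter at the head of each area. The names `NVM_MAGIC`, `area_header_t` and `choose_area` are hypothetical, and counter wraparound is ignored.

```c
#include <stdint.h>

#define NVM_MAGIC 0x4E564D31u  /* hypothetical marker value */

/* Header of one area as read back at startup (hypothetical layout). */
typedef struct {
    uint32_t magic;    /* NVM_MAGIC once the area has been completely written */
    uint32_t counter;  /* incremented at every save, higher means newer */
} area_header_t;

/* Returns 0 or 1 for the area to load, or -1 if neither area is valid
 * (first startup: the caller falls back to defaults). */
int choose_area(const area_header_t *a0, const area_header_t *a1)
{
    int valid0 = (a0->magic == NVM_MAGIC);
    int valid1 = (a1->magic == NVM_MAGIC);

    if (!valid0 && !valid1)
        return -1;
    if (valid0 && !valid1)
        return 0;
    if (!valid0 && valid1)
        return 1;

    /* Both areas are valid: the one with the higher counter wins. */
    return (a1->counter > a0->counter) ? 1 : 0;
}
```

Note that this only looks at the magic and the counter. It says nothing about whether the rest of the area is intact.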

Why do I need checksum? The only thing that comes in mind is to prevent writing errors: for example, I want to write 0x00 but the value really written is 0x01, maybe for a noise on the serial bus.

To solve this situation, I need checksum... but also I need to re-read and re-calculate the checksum at *every* area writing... and start a new writing if something was wrong.

Do you have a better strategy?

This is good to reduce the writing time of a block of data. In my case I have 2kB of data, around 62 32-byte EEPROM pages, that is about 5*62=310ms. I don't like to block a task (in a cooperative OS) for such a long time.

So I could think of splitting the data into smaller blocks. Maybe I can accept a task that blocks for 50ms maximum. This means 10 pages, 320 bytes. So my data should be split into 6 areas.

I think they are too much.

Surely... however at steady state every area has already been used, so one has to be erased at every write.

Reply to
pozz

At power up, when you read Area1 (for example), how do you know that the data are not corrupted (even if magic/counter are correct)?

Bye Jack

Reply to
Jack

How can he know whether the write was successful and not aborted midway because of a power interruption? With those flash devices that have their "own life" you don't even have a guarantee that things get written in the same sequence as the software thinks. So writing the "magic" after the data to signal updated data does not guarantee that it physically gets written after the data.

--
Reinhardt
Reply to
Reinhardt Behm

I suppose the timing of the write is known (and/or there is a pin/status register that tells you when the write has ended), so just wait a little bit and read back the data.

Bye Jack

Reply to
Jack

On 15/06/2018 09:54, Jack wrote:

In my application I think I can ignore two problematic events:

- the value actually written is different because of noise on the bus signals;

- the value actually written is different because of physical damage to the memory.

If I ignore these events/situations, I think checksums aren't needed and I can consider only magic/counter.

Reply to
pozz

Because of the counter, which is written as the last byte. If the write is interrupted, the counter of the new area still contains an old value, so the old area is loaded at startup.

I can't follow your arguments... if I send write commands in a sequence, how could the data be written in a different sequence?

Reply to
pozz

If this is a flash device with an internal controller that does the blocking/unblocking for you - meaning to change a single byte it reads a block, changes the byte, erases the block and writes the block - it has its own life. You must be sure that it does not cache things or "optimize" writes by reordering them. Some flash drives do just this. If it is a dumb device then you are on the safe side. But you must make sure. With the usual EEPROMs (like the small 8-pin I2C devices) you are on the safe side. They are dumb. But I would never trust these flash chips without having a clear specification from the manufacturer about how they work. Some manufacturers do strange things to get nice "performance" numbers for marketing.

--
Reinhardt
Reply to
Reinhardt Behm

You can ignore these events only if you don't care about the data. Otherwise you need a strategy to manage the errors (in general this is easy: just read back the data after you're sure that the write has ended).

The issue is not after the write, the issue is after power-up: when you read the data from flash, you don't know whether the data is correct or corrupted, even if the magic number/counter are consistent. A checksum is a way to be sure of this (but you can't recover if something is corrupted). Another is to write the same data multiple times and then use a majority vote to decide whether they are correct (and recover if something is corrupted).
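As an illustration of the majority-vote idea, here is a minimal sketch for three copies, voting bit by bit. The function name and the choice of exactly three copies are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise 2-out-of-3 vote over three copies of the same data.
 * Writes the voted result to `out` and returns the number of bytes
 * in which the three copies were not all identical. */
size_t majority_vote3(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                      uint8_t *out, size_t len)
{
    size_t disagreements = 0;

    for (size_t i = 0; i < len; i++) {
        /* A result bit is 1 when at least two of the copies have it set. */
        out[i] = (uint8_t)((a[i] & b[i]) | (a[i] & c[i]) | (b[i] & c[i]));
        if (a[i] != b[i] || b[i] != c[i])
            disagreements++;
    }
    return disagreements;
}
```

A non-zero return value tells the caller that at least one copy should be rewritten from the voted result.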

Bye Jack

Reply to
Jack

It makes the code much easier. It is particularly helpful if you have some sort of log structure in your flash for wear levelling and erase block handling. Just read through the flash at startup, reading each log block. If the checksum matches and the counter is higher than the current block in ram, overwrite the ram block with the new data. It is a lot easier to spend a few seconds doing that at startup, than to spend a couple of seconds scanning the flash when someone wants to log in.
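A rough sketch of that startup scan, under several assumptions: the log blocks have already been read into an array, the record layout is invented for the example, and a simple seeded byte sum stands in for a real CRC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_SIZE 8u

/* One log block as stored in the NVM (hypothetical layout). */
typedef struct {
    uint32_t counter;                /* generation number, higher is newer */
    uint8_t  payload[PAYLOAD_SIZE];
    uint16_t checksum;               /* covers counter and payload */
} slot_t;

/* Placeholder checksum: a seeded byte sum, NOT a real CRC. */
static uint16_t slot_checksum(const slot_t *s)
{
    uint16_t sum = 0x5A5Au;

    for (unsigned shift = 0; shift < 32u; shift += 8u)
        sum = (uint16_t)(sum + ((s->counter >> shift) & 0xFFu));
    for (size_t i = 0; i < PAYLOAD_SIZE; i++)
        sum = (uint16_t)(sum + s->payload[i]);
    return sum;
}

/* Walks through all slots and copies the newest valid one into *ram.
 * Returns the index of that slot, or -1 if no slot has a good checksum. */
int load_newest(const slot_t *slots, size_t count, slot_t *ram)
{
    int best = -1;

    for (size_t i = 0; i < count; i++) {
        if (slots[i].checksum != slot_checksum(&slots[i]))
            continue;  /* erased, half-written or worn-out block */
        if (best < 0 || slots[i].counter > ram->counter) {
            memcpy(ram, &slots[i], sizeof *ram);
            best = (int)i;
        }
    }
    return best;
}
```

After the scan the RAM copy is the only thing the rest of the firmware reads, so a login check never has to touch the flash.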

Yes.

Writing - in particular, erasing - takes time. But it is not an action that people need to wait for, and it can be run in the background. What you need to avoid is that foreground tasks - user intervention or timing critical jobs - have to wait to read data from the flash because you happen to be in the middle of an erase cycle.

There are many variants around. Anyway, the point is that the distinction is fuzzy.

You use more for wear levelling. Often your flash chip is /way/ bigger than you need - perhaps by a factor of 1000 simply because that's what you have in stock, or that's the cheapest device in the package you want. Spread your writes over 1000 blocks instead of 2, and you have 500 times the endurance. Use a good checksum and accept that sometimes a block will be worn out and move onto the next one, and you have perhaps a million times the endurance (because most blocks last a lot longer than the guaranteed minimum).

You use checksums to ensure that you haven't had a power-out or reset in the middle of writing, and that the flash has not worn out.

You calculate the checksum for a block before writing it, and you check it when reading it. Simple.
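For illustration, a sketch of "calculate before writing, check when reading". It assumes a 32-byte block whose last two bytes hold the checksum and uses CRC-16/CCITT-FALSE as one possible choice; the block size and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

#define BLOCK_SIZE 32u  /* hypothetical: the last two bytes hold the CRC */

/* Call before writing: stores the CRC of the data bytes at the end of the block. */
void block_seal(uint8_t *block)
{
    uint16_t crc = crc16_ccitt(block, BLOCK_SIZE - 2u);

    block[BLOCK_SIZE - 2u] = (uint8_t)(crc >> 8);
    block[BLOCK_SIZE - 1u] = (uint8_t)(crc & 0xFFu);
}

/* Call after reading: returns 1 if the stored CRC matches the data, else 0. */
int block_is_valid(const uint8_t *block)
{
    uint16_t stored = (uint16_t)(((uint16_t)block[BLOCK_SIZE - 2u] << 8) |
                                 block[BLOCK_SIZE - 1u]);

    return stored == crc16_ccitt(block, BLOCK_SIZE - 2u);
}
```

On a part with hardware CRC support the bit loop would normally be replaced by the peripheral.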

Your writing should be done in the background. The background writer task is blocked - the system is not.

Reply to
David Brown

On 2018-06-15 Jack wrote in comp.arch.embedded:

Yes, write the data multiple times in blocks, including a checksum. At power up start reading from the first block and continue until you get a block with a valid checksum. I don't think a majority vote is required: if you read in the same order you write, the first valid block has the most recent data if power failed some time during the block writing sequence. But beware of issues with 'too smart' flash controllers as others mentioned.

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

Psychiatry is the care of the id by the odd.
Reply to
Stef

Ah ok, this is not my case. I'm going to use a simple serial Flash/EEPROM without an internal smart controller. Most probably I will use EEPROM.

Reply to
pozz

On 15/06/2018 11:20, Jack wrote:

is easy: just read back the data after you're sure that the write ended).

I know it would be better... however I think they are such rare events that I can ignore them in my application (which isn't so critical for humans).

Reply to
pozz

So the writing process should be:

  1. write the data at block i+1 (where i is the block of the current data in RAM)
  2. read back the block i+1 and check if checksum is ok
  3. if ok, writing process is finished
  4. if not, go to block i+2 and start again from 1.
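The four steps could look roughly like this. Everything here is an assumption made for the sketch: the block geometry, the driver functions `nvm_write_block` and `nvm_read_block` (backed by a RAM array so it runs on a PC, with an optional simulated bad block), and the wraparound to block 0. The verification compares the read-back bytes with the RAM copy, which also covers the checksum stored inside the block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  16u
#define BLOCK_COUNT 4u

/* RAM-backed stand-in for the real device (hypothetical driver). */
static uint8_t sim_mem[BLOCK_COUNT][BLOCK_SIZE];
static int sim_bad_block = -1;  /* index of a block whose first byte sticks at 0 */

static void nvm_write_block(size_t index, const uint8_t *data)
{
    memcpy(sim_mem[index], data, BLOCK_SIZE);
    if ((int)index == sim_bad_block)
        sim_mem[index][0] = 0x00u;  /* simulate a failed cell */
}

static void nvm_read_block(size_t index, uint8_t *data)
{
    memcpy(data, sim_mem[index], BLOCK_SIZE);
}

/* Writes `data` to the block after `current` and verifies it by read-back.
 * On a mismatch it moves on to the following block. Returns the index of
 * the block that verified, or -1 if all other blocks failed. */
int save_with_verify(size_t current, const uint8_t *data)
{
    uint8_t readback[BLOCK_SIZE];

    for (size_t attempt = 1; attempt < BLOCK_COUNT; attempt++) {
        size_t target = (current + attempt) % BLOCK_COUNT;

        nvm_write_block(target, data);        /* step 1 */
        nvm_read_block(target, readback);     /* step 2 */
        if (memcmp(readback, data, BLOCK_SIZE) == 0)
            return (int)target;               /* step 3 */
        /* step 4: fall through and try the next block */
    }
    return -1;
}
```

The block that currently holds the valid data is never a target, so the old copy survives until a new one has verified.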

Only for this thing, you can write the counter as the last byte. If the writing is interrupted in the middle, the counter hasn't been written yet, so the block is not valid (because it is considered too old or empty).

Do you calc the checksum of all the data block in RAM, including padding bytes?

Are you thinking of a preemption OS? I'm using a cooperative OS.

Reply to
pozz

You might find it is simpler to use an MRAM or FeRAM device. They are a little more costly than EEPROM, but far from expensive, and you can read and write them randomly very quickly, without worrying about erasing or blocks.

Reply to
David Brown

No, you read back block i+1 and check _the whole block_ against what you have in RAM; if it's the same, ok, else try again.

Checksum is useful at powerup to verify that what you just read is not corrupted (because you don't already have a copy in RAM).

Bye Jack

Reply to
Jack

Yes.

Then comes step 5 (for flash with separate erasing):

  5. If you have written block x, check if block x+1 (modulo the size of the device) is erased. If not, then erase it ready for the next write.

Note that it does not matter if the erase block size is bigger than the program block size - if it is, then your "erase block x+1" command will cover the next few program blocks.
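A small sketch of this pre-erase step, again on a RAM-backed stand-in for the device. The geometry, the helper names and the assumption that an erased byte reads as 0xFF are all illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  16u
#define BLOCK_COUNT 4u
#define ERASED_BYTE 0xFFu  /* erased cells read as ones on most devices */

/* RAM-backed stand-in for the real device (hypothetical). */
static uint8_t sim_mem[BLOCK_COUNT][BLOCK_SIZE];

static bool block_is_erased(size_t index)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        if (sim_mem[index][i] != ERASED_BYTE)
            return false;
    }
    return true;
}

static void erase_block(size_t index)
{
    memset(sim_mem[index], ERASED_BYTE, BLOCK_SIZE);
}

/* After block `written` has been written, make sure the following block
 * (wrapping at the end of the device) is erased and ready for the next
 * write. Returns true if an erase was actually needed. */
bool prepare_next_block(size_t written)
{
    size_t next = (written + 1u) % BLOCK_COUNT;

    if (block_is_erased(next))
        return false;
    erase_block(next);
    return true;
}
```

Doing the erase right after a successful write keeps the slow operation off the path of the next save.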

Nope. You can't rely on that, unless you are absolutely sure that you

that, even if they provide an interface that matches it logically.

A common structure for a modern device is to have 32-byte pages as a compromise between density, cost, and flexibility. (Bigger pages are more efficient in device area and cost.) When you send a command to write a byte, the device reads the old 32-byte page into ram, erases the old page, updates the ram with the new data, then writes the whole 32-byte page back in.

The write process is done by a loop that writes all the data, reads it back at a low voltage to see if it has stuck, and writes again as needed until the data looks good. Then it writes again a few times for safety - either a fixed number, or a percentage of the number of writes taken.

So it is /entirely/ possible for an interrupted write to give you a valid counter, but invalid data. It is also entirely possible to get some bits of the counter as valid while others are still erased (giving ones on most devices).

And that is just for simple devices that don't do any fancy wear levelling, packing, garbage collection, etc.

Yes, of course. The trick is not to have unknown padding bytes. I make a point of /never/ having compiler-generated padding in my structs.

So you have something like this:

#include <assert.h>
#include <stdint.h>

#define sizeOfRawBlock 32
#define noOfRawBlocks 4
#define magicNumber 0x9185be91
#define dataStructVersionExpected 0x0001

typedef union {
    uint8_t raw8[sizeOfRawBlock * noOfRawBlocks];
    uint16_t raw16[sizeOfRawBlock * noOfRawBlocks / 2];
    struct {
        uint32_t magic;
        uint16_t dataStructVersion;
        uint16_t crc;
        uint32_t count;
        // real data
    };
} nvmData_t;

static_assert(sizeof(nvmData_t) == (sizeOfRawBlock * noOfRawBlocks),
              "Check size of nvmData!");

When you want to store the data, run your crc calculation over the raw16[] from index 4 upwards, and set it in the crc field. Set the magic field with a different fixed value per program. Use the dataStructVersion to track versions of the structure of the data - so that updated firmware can recognise old data and quietly update it. Set the count to a new number each time.

Leave plenty of room to add new fields to your nvm structure.

It doesn't matter. In a pre-emptive OS, your background writer is:

static enum { idle, writing, checking, erasing } writerState;

void doWriter(void)
{
    // Declared once here: a declaration cannot directly follow a case label.
    bool stillWorking;

    switch (writerState) {
    case idle:
        if (writeTriggered) {
            writeTriggered = false;
            startWriting();
            writerState = writing;
        }
        break;
    case writing:
        stillWorking = pollNVMdevice();
        if (stillWorking) return;
        startChecking();
        writerState = checking;
        break;
    case checking:
        stillWorking = pollNVMdevice();
        if (stillWorking) return;
        startErasingNextBlock();
        writerState = erasing;
        break;
    case erasing:
        stillWorking = pollNVMdevice();
        if (stillWorking) return;
        writerState = idle;
        break;
    }
}

There - half your program is done :-)

Cooperative multitasking does not mean busy waiting for long-running

polling, etc., so that you can mix fast and slow tasks.

Reply to
David Brown

Why raw8[]?

I think you can avoid raw16[] too. If you have the function:

uint16_t calc_crc(const void *data, size_t size);

you can simply call:

nvmData.crc = calc_crc( ((unsigned char *)&nvmData) + 8, sizeof(nvmData) - 8 );

This is interesting. Suppose you have a new struct version and you are reading an old version from the memory. How do you read it? Do you put it completely in RAM and then copy it into the final struct, making the changes from one version to the other? In this case you need double the space in RAM, for the old and the new version.

Or do you declare a big "unsigned char reserved_for_future[1024]" array in the struct, so the CRC is calculated over the full area?

^^^^^^^^^^^ You mean cooperative.

Yes, ok.

Reply to
pozz

I typically also do something like this. The data structure is a union of the basic data structure with a preamble that includes (in very fixed locations) a data structure version, a checksum/crc, and, if it is a versioning store, a timestamp/data generation number. A 'Magic Number' isn't often needed unless it is removable media, as the contents will either be the expected data or not; nothing else could be there. (If the unit might have different sorts of programs, then a piece of the data version would be a program ID.)

Often I will have TWO copies of the data packet. Often organized something like this:

struct SystemParameters {
    // List of data parameters that need to be saved
} parms;

union {
    uint8_t raw_data[DATA_SIZE];
    struct {
        struct StandardHeader header;
        struct SystemParameters parms;
    };
} flash_parms;

/* Check that sizeof(flash_parms) == DATA_SIZE, and that DATA_SIZE is a multiple of the flash sector size */

And a request to save the data will fill flash_parms with a 'NULL' fill (0 or all ones, or whatever I want unused flash locations to be), then copy the parms into flash_parms.parms, then fill in the header, and then program the flash. Only the part that copies parms to flash_parms.parms needs to be protected from other operations in the system updating the main parms structure.

The flash_parms structure can be further modified to be a union of the current data structure of the SystemParameters structure and any previous 'incompatible' version of it. (A version from just prior to adding elements at the end doesn't need to be listed.) This way at startup I can read the values into that same structure, and if it was old, parse the old version in flash_parms to get the new version in parms, defaulting new parameters as needed.
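A sketch of how such a versioned union might be used at startup. The header layout, the two parameter versions and all field names are invented for the example; CRC checking is left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_SIZE 64u

/* Hypothetical fixed-position header shared by every version. */
typedef struct {
    uint16_t version;
    uint16_t crc;
} header_t;

/* Old, incompatible layout and the current one (both hypothetical). */
typedef struct { uint8_t brightness; uint8_t volume; } params_v1_t;
typedef struct { uint16_t brightness; uint8_t volume; uint8_t language; } params_v2_t;

typedef union {
    uint8_t raw_data[DATA_SIZE];
    struct { header_t header; params_v1_t parms; } v1;
    struct { header_t header; params_v2_t parms; } v2;
} flash_image_t;

static_assert(sizeof(flash_image_t) == DATA_SIZE, "Check size of flash_image_t");

/* Converts an image read from the NVM into the current parameter layout.
 * Returns 0 on success, -1 for an unknown version (caller uses defaults). */
int load_params(const flash_image_t *img, params_v2_t *out)
{
    /* The header sits at the same offset in every version. */
    switch (img->v1.header.version) {
    case 1:
        out->brightness = img->v1.parms.brightness;
        out->volume = img->v1.parms.volume;
        out->language = 0;  /* new parameter: take the default */
        return 0;
    case 2:
        *out = img->v2.parms;
        return 0;
    default:
        return -1;
    }
}
```

Because the old and new layouts overlay the same buffer, no second full-size RAM copy is needed for the migration.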

Checksums / CRC are always over the whole data array (skipping over, or using a fixed value, the value of it stored in the Header).

(see above)

You can use cooperative or pre-emptive. If cooperative then your flash write routine should yield (like the example seems to show) while it is waiting for the flash to do the actual operations. (If pre-emptive you just need to block for a bit and let the pre-emption do its thing, or make the flash routine a background level operation so a spin wait doesn't impact other operations)

Reply to
Richard Damon
