Confused about Flash

One of my engineers left to go to Indonesia and teach, or something, and I have inherited 17,000 lines of really ghastly, buggy, ugly assembly code for an embedded product. It looks easier to rewrite it from scratch than to try to fix it, so that keeps me off the streets for the rest of the month.

This thing has an ST flash chip, M29W400BB, which is 4M bits, used in 256kx16 mode. The datasheet is typically confusing. So please check me on this:

If I write a secret combination of words to a secret list of addresses in the chip, six writes total, I can tell it to erase one of its 11 sub-blocks of memory. Apparently I can't do normal reads during erase, so I can't run the code out of the same flash I'm erasing. I have to erase a block (to all 1's, like an eprom) before I can program it. A block erase can take up to 6 seconds, but I can poll it to see when it's done. Apparently I select which block is to be erased by writing 0x30 into any address of that block, as the last operation of the erase command.

(The datasheet is cute. It's not obvious whether writing to address "BA" means "write to address 0xBA" or "write to an address in the block". Seems like the latter makes sense.)

Write
    0xAA to 0x555
    0x55 to 0x2AA
    0x80 to 0x555
    0xAA to 0x555
    0x55 to 0x2AA
    0x30 to any address in block to be erased

wait 6 secs or poll for erase done
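
A minimal C sketch of the erase sequence listed above, assuming the flash is memory-mapped so that word offset N from its base reaches chip address N. The function names, the `base` pointer, and the poll limit are made up for illustration. The completion check assumes that status bit 6 toggles on successive reads while the chip is busy, as it does on AMD-style parts; check that against the datasheet. The error bit is not examined here, and on the real board this would have to run from RAM.

```c
#include <stdint.h>

/* Command addresses from the post (x16 mode, word offsets from the chip base). */
#define FLASH_ADDR1 0x555u
#define FLASH_ADDR2 0x2AAu

/*
 * Issue the six-write block-erase sequence.
 * base       : start of the flash as seen by the CPU (assumed mapping)
 * block_word : any word offset inside the block to be erased
 */
static void flash_block_erase_start(volatile uint16_t *base, uint32_t block_word)
{
    base[FLASH_ADDR1] = 0x00AA;  /* unlock write 1 */
    base[FLASH_ADDR2] = 0x0055;  /* unlock write 2 */
    base[FLASH_ADDR1] = 0x0080;  /* erase setup */
    base[FLASH_ADDR1] = 0x00AA;  /* unlock write 1, repeated */
    base[FLASH_ADDR2] = 0x0055;  /* unlock write 2, repeated */
    base[block_word]  = 0x0030;  /* confirm; the address selects the block */
}

/*
 * Poll until two consecutive reads agree in bit 6 (assumed toggle bit),
 * or give up after max_polls reads.
 * Returns 1 when the toggling has stopped, 0 on timeout.
 */
static int flash_wait_done(volatile uint16_t *base, uint32_t word, uint32_t max_polls)
{
    uint16_t prev = base[word];
    while (max_polls--) {
        uint16_t cur = base[word];
        if (((prev ^ cur) & 0x0040u) == 0)
            return 1;            /* bit 6 stopped toggling */
        prev = cur;
    }
    return 0;                    /* still busy after max_polls reads */
}
```

After a successful erase every word in the block should read back as 0xFFFF, which is a cheap extra check.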

Programming flash is less clear. Apparently I execute a chunk of secret writes, one for each word I want to load, each with three command code writes followed by an address+data word write. "The final write operation... starts the write state machine." I assume from this that the actual burn of a single word begins after each poke-a-write-word command sequence, and it seems to take 10 us typ, 200 us max, and is again pollable for done.

Write
    0xAA to 0x555
    0x55 to 0x2AA
    0xA0 to 0x555
    data to target address

wait 200 usec or poll for write done
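
The word-program sequence above could be sketched in C along these lines. As before, the memory-mapped `base` pointer, the function name, and the poll limit are assumptions for illustration, and the busy check assumes a toggling status bit 6. The final read-back compare is an extra verification step, not something the sequence itself requires.

```c
#include <stdint.h>

/*
 * Program one 16-bit word and wait for completion.
 * base      : start of the flash as seen by the CPU (assumed mapping)
 * word      : word offset of the target location (must already be erased)
 * data      : value to program
 * max_polls : upper bound on status reads before giving up
 * Returns 1 if the location reads back as 'data', 0 on timeout or mismatch.
 */
static int flash_program_word(volatile uint16_t *base, uint32_t word,
                              uint16_t data, uint32_t max_polls)
{
    uint16_t prev;

    base[0x555] = 0x00AA;        /* unlock write 1 */
    base[0x2AA] = 0x0055;        /* unlock write 2 */
    base[0x555] = 0x00A0;        /* program command */
    base[word]  = data;          /* this write starts the internal programming */

    prev = base[word];
    while (max_polls--) {
        uint16_t cur = base[word];
        if (((prev ^ cur) & 0x0040u) == 0)
            return base[word] == data;   /* toggling stopped: verify */
        prev = cur;
    }
    return 0;                    /* still busy after max_polls reads */
}
```

To load a buffer, call this once per word; nothing in the sequence ties one word to the next.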

It sounds like here, once I erase a whole block, I can program any addresses within that block, as many or as few as I like, at any desired addresses, at any time. There seem to be no time constraints on how long it takes me to do this.

During erase or program, I again can't execute code out of flash, so I'll have to relocate the flash erase and write routines into CPU ram and run them from there.

Of course the datasheet has no straightforward "to write a block, do this..." stuff, or any examples.

Oh well, even if nobody answers this post, just typing it has helped me figure out what's probably going on.

John

Reply to
John Larkin

I can sympathize. I wrote an assembly procedure for an AT161B Atmel data flash ... same idea. Lots of command codes, etc. I'll send it to you if you wish. It's commented ... sorta.

Frank

Reply to
Frank Raffaeli

Along the same idea, the datasheets and appnotes for the Atmel dataflash are pretty clear, and give good details and even flow diagrams. While the actual commands are probably different from the ST one, the other info might give you some clues.

formatting link

--
Regards,

Adrian Jansen           adrianjansen at internode dot on dot net
Reply to
Adrian Jansen

Right- well it is clear you lack any capability for hierarchical partitioning of information, as I always suspected, and this will make the job doubly difficult for you. However, I applaud your openness and can't help but acknowledge your status as a symbol of hope for other untalented overachievers...

Reply to
Fred Bloggs

The Atmel Dataflash has RAM buffers on chip. Meaning you first fill the RAM then tell the chip to copy the RAM to the flash. Plus it is much, much faster.

Rene

--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net
Reply to
Rene Tschaggelar

John:

Many of the x8 and x16 FLASH devices on the market use very similar programming algorithms. As such, a search for app notes at other vendors' sites can give you good insights for your ST part. For example, look at this web link for some AMD algorithm flow charts that may help you:

formatting link

It is also possible to find some C and assembly language code samples that perform the standardized programming algorithms. Please let me know if you cannot find any and I can send you some samples to look at. My code is in x86 assembly language so should not be that hard to translate to other microcontroller platforms.

- mkaras

Reply to
mkaras

John, yes, the hard part is to have the code in RAM while doing the flash, and to hope the power stays on. Copy the code from flash to RAM, jump to it, and when it is done, whatever was done, do a clean reset. Since the code copied from flash to RAM sits at a different address, it shouldn't contain any absolute jumps. Hmm, yes, the interrupts... This procedure used to be done 20 years ago.

Perhaps there is a data line that reflects the (busy-) state.

Nowadays the controllers have boot code sections in the flash from where the application code flash can be handled. I know you already checked whether a redesign with modern hardware would make sense. Apparently it didn't.

Rene

--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net
Reply to
Rene Tschaggelar

You may also want to reference this link:

formatting link

Intel has been a champion of the CFI (_C_ommon _F_lash _I_nterface) for a long time now.

- mkaras

Reply to
mkaras

formatting link

I messed up the link.

Here is the link to the Intel CFI App Note:

formatting link

- mkaras

Reply to
mkaras

This is easily done by adding an appropriate section in the linker script file. You'll have to set the address to a RAM address, and the storage to somewhere in the flash. During code init, you initialize the RAM by copying the code, using symbolic labels that the linker provides, or that you add to the linker script yourself. The exact mechanism depends on the tool chain. It is very similar to the section that holds the initialized data in RAM, so it could be just a matter of copying and pasting a few lines in the existing linker script and making a few modifications.

This way, you can use absolute addresses, and just call the functions as normal code. The compiler/linker will take care of the details.
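
The init-time copy described above might look roughly like this in C. The function name and the linker symbols shown in the comment (`__ramfunc_load`, `__ramfunc_start`, `__ramfunc_end`) are hypothetical; the real names depend on what the linker script defines, and the sketch assumes the section is word-aligned.

```c
#include <stdint.h>

/*
 * Copy a section from its load address (in flash) to its run address (in RAM).
 * load      : where the linker stored the section contents
 * run_start : first word of the section's run-time location
 * run_end   : one past the last word of the run-time location
 */
static void copy_section(const uint32_t *load, uint32_t *run_start,
                         const uint32_t *run_end)
{
    while (run_start < run_end)
        *run_start++ = *load++;
}

/*
 * Typical call during startup, with symbols exported by the linker script
 * (hypothetical names, shown for illustration only):
 *
 *   extern uint32_t __ramfunc_load[], __ramfunc_start[], __ramfunc_end[];
 *   copy_section(__ramfunc_load, __ramfunc_start, __ramfunc_end);
 */
```

Once the copy has run, functions placed in that section are called like any other function, since the linker resolved their addresses to the RAM location.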

Vital interrupt handlers can be programmed in RAM, or another available memory area in the same way. Less critical interrupts can be disabled during flash programming/erasing.

If the CPU has cache support, you may be able to run the code from flash, as long as you can guarantee all the code is actually in the cache (some CPUs allow code to be locked in the cache).

Reply to
Arlet

I'm going to completely rewrite and test 17,000 lines of realtime assembly code (figure it'll be about 6-8 Kloc when I'm done) in the remaining days of November, and integrate it into an FPGA-managed, Ethernet equipped, DDS-clocked, picosecond-resolution timing box, and I'm going to get it right and elegant besides. One of my guys is concurrently redesigning the FPGA mess, in tight sync with the new uP code. Today's task is to redo the main program loop, the serial interrupt handler, and the command parser, taking time out only for beers and burgers with a couple of guys at the Beach Chalet. I've got a number of OEM customers waiting to design this box into their systems, and we have great hopes for this one.

What are you up to lately?

John

Reply to
John Larkin

Sheeesh! I doubt that Fred can tie his own shoes. He's just a blow-hard. Really good at criticizing others, but anything over 5 transistors is well beyond his skill set ;-)

...Jim Thompson

--
|  James E.Thompson, P.E.                           |    mens     |
|  Analog Innovations, Inc.                         |     et      |
Reply to
Jim Thompson

Arlet, that sounds somewhat familiar. Did you have a look at this stuff in the past 15 years? And are the manuals still around?

Rene

Reply to
Rene Tschaggelar

I'm programming in absolute assembly, no linkers or anything like that. The CPU is a 68K, so writing position-independent routines and relocating/running them dynamically is easy. The amount of code that has to be run in ram is actually tiny. If I intend to reflash, I'll do a hard reset and kill everything first.

We bought a clamshell adapter for our programmer so production can program the entire flash chip and then solder it to the board. If it's OK, we ship it. The first flash block will be a boot manager, so if we ever need to change the app code, we can connect it serially to a laptop, start up a ping program, power cycle the box, and the pc can seize control of the boot-block program and potentially reflash the application code. If the boot program doesn't get pinged, it starts up the application. The intent is that the boot block itself never change.
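
The boot-block decision described above could be sketched in C as below. Everything here is illustrative: the names, the callback shape, and the poll count are assumptions, and the countdown stub merely stands in for whatever the real serial-port check would be.

```c
#include <stdint.h>

/* Callback that reports whether a ping has arrived (hypothetical interface). */
typedef int (*ping_check_fn)(void *ctx);

enum boot_choice { BOOT_APPLICATION = 0, BOOT_LOADER = 1 };

/*
 * Poll for a ping up to max_polls times after power-up.
 * Returns BOOT_LOADER if a ping was seen, otherwise BOOT_APPLICATION.
 */
static enum boot_choice boot_decide(ping_check_fn ping_seen, void *ctx,
                                    uint32_t max_polls)
{
    while (max_polls--) {
        if (ping_seen(ctx))
            return BOOT_LOADER;      /* the PC seized control */
    }
    return BOOT_APPLICATION;         /* nobody pinged: start the app */
}

/* Stand-in check for illustration: reports a ping once *ctx has counted down to zero. */
static int ping_after_countdown(void *ctx)
{
    uint32_t *remaining = ctx;
    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

Keeping the decision in one small function with no dependence on the application code fits the intent that the boot block itself never changes.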

The ultimate fallback is to connect the bdm pod, which would let us reflash everything, boot block too, and make a clean start.

All this makes me nostalgic for plugging eproms into sockets. But, as Fred says, I am incapable of hierarchical thinking.

John

Reply to
John Larkin

Back when RTL gates were just starting to be available (rotten crap they were, too) my boss told me that on IC's, someday transistors would cost less than a penny each. I thought he was nuts. The 4 mbit flash chip is fairly expensive by current standards, maybe 1e7 transistors for $2.85.

John

Reply to
John Larkin

Rene,

Actually, yes. Even for a current project, using GCC, I needed to modify the linker scripts. For the GCC toolchain, there's the binutils documentation:

formatting link

Like I said, it's easiest to grab one of the supplied linker scripts that come with the tool installation, make a local copy, and modify that for a particular project.

I've also done this for the ARM ADS suite a couple of years ago. The syntax is a bit different, but the concepts are the same.

Unfortunately, there's no standard linker script definition, so you'll have to consult the documentation of your particular tools.

Reply to
Arlet

Hmm, sounds like a walk in the park. Ever tried to modify a similar sized program written by a lumber-jack?

I think you have already figured the whole thing out by yourself. The whole way of erasing / programming is kind of standard anyway. You can try to look into the AMD or Intel datasheets for similar devices to see if they make more sense to you. I recall the AMD datasheets have some examples.

--
Reply to nico@nctdevpuntnl (punt=.)
Bedrijven en winkels vindt U op www.adresboekje.nl
Reply to
Nico Coesel

You mention FPGA later, so if you are hooking up pairs of these to get x32, keep in mind you get to program both halves in parallel.

Also, beware of the various block- and chip-level write protects. The commands to turn these on are typically short, so it is easy for runaway code (17,000 lines of asm!) to lock a few blocks and leave you pulling your hair out when erase/write cycles don't verify.

Well, like most flash, you can only program 1 bits to 0 bits. 0 bits can only be "erased" back to 1s. So you can always go back and clear bits with subsequent writes.
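
The 1-to-0 rule above amounts to a bitwise AND, which a short C sketch can make concrete. The function names are invented for illustration, and the model only covers the bit arithmetic; how a particular chip reports an attempt to set a 0 bit back to 1 is not modeled.

```c
#include <stdint.h>

/* Value a word ends up holding if 'wanted' is programmed over 'current':
 * programming can only turn 1 bits into 0 bits. */
static uint16_t flash_result_after_program(uint16_t current, uint16_t wanted)
{
    return (uint16_t)(current & wanted);
}

/* Nonzero if 'wanted' can be reached from 'current' without erasing first,
 * i.e. 'wanted' has no 1 bit where 'current' already has a 0. */
static int flash_needs_no_erase(uint16_t current, uint16_t wanted)
{
    return (current & wanted) == wanted;
}
```

For example, an erased word (0xFFFF) can take any value, and a word holding 0xFF00 can later be rewritten to 0x0F00, but not to 0x00FF.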

I've never seen one take remotely that long, but maybe this one is special.

It's right below the table.

These sequences are common to many flash chips, and you can google yourself a raft of examples by searching for 0xAA 0x555 (for C, try hAA and h555 for asm ;-)

There's a "fast" mode where you can enable a shorter write sequence. That's what most bulk programming routines use.

--
Ben Jackson AD7GD

http://www.ben.com/
Reply to
Ben Jackson

At the other end is excessively hierarchical partitioning of information. For example, John's application would be, in most of the industrial-military complex, "solved" with a cluster of 40 Windows servers each running their own special version of some relational database and developed by independent teams of foreign contractors :-).

In other words, if his worst problem is stupid FLASH, he's a winner in my book!

Tim.

Reply to
Tim Shoppa

I have deliberately chosen to do deep-embedded products that have no user interface, limited connectivity, and bog-simple microprocessor code. Given a choice of selling...

A benchtop instrument with front panel, display, user interface, serial and network connections, power supply, enclosure, fan, six PC boards, Windows drivers, LabView drivers, lead-free, UL/FCC/CE stickers, five man-years of engineering, and that sells for $900,

or

A VME module: one PC board, four LEDs, user interface = dipswitch, one manual summarizing register functions, over in three months, and that sells for $5200,

we have chosen to go the "stupid" route.

I wonder what sort of sophisticated stuff Fred designs.

John

Reply to
John Larkin
