MCU mimicking a SPI flash slave

John Speth · 2017-06-14T17:44:00+00:00

Hi folks- I'm hoping to tap into your various experiences to see if what I'm thinking is practical.

Our customer has a device into which a Datakey is plugged to extract data stored in the device. The Datakey is really just a trade name for a SPI flash with ergonomic features that resemble a real key. Our customer would like to replace the Datakey with a new design that will transmit the data wirelessly instead of storing it to the Datakey SPI flash. We're proposing a design based on an MCU which will appear as a SPI slave flash device, store the data pumped to it from the customer's device, and forward it wirelessly.

I'm wondering if it's practical to accomplish it. The firmware would have to be totally responsive, both temporally and in data content. That sounds like a tough hill to climb that requires getting the interface just right, nearly perfect. I see a lot of pitfalls that could make the effort a dead end.

Does anybody have any success or failure stories to relate that would help us gauge the feasibility of the proposed design?

Thanks - John Speth
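
To make the proposal concrete, here is a minimal Python sketch of the byte-level behaviour such an MCU would have to reproduce. It is an illustration under assumptions, not a description of the Datakey part: the opcodes (0x03 read, 0x02 page program) and the 3-byte address follow common SPI NOR flash conventions, and `FlashSlaveModel`, `select` and `transfer` are hypothetical names. The model ignores write-enable, status and erase commands, and it says nothing about bit-level timing, which is the hard part on real hardware.

```python
READ = 0x03          # assumed opcode: read data (common SPI NOR convention)
PAGE_PROGRAM = 0x02  # assumed opcode: program data (common SPI NOR convention)


class FlashSlaveModel:
    """Hypothetical byte-level model of an MCU posing as a SPI flash slave."""

    def __init__(self, size=256):
        self.mem = bytearray([0xFF] * size)  # erased flash reads back as 0xFF
        self.captured = []                   # (address, byte) pairs to forward wirelessly
        self.select()

    def select(self):
        """Chip select asserted: start a new transaction."""
        self.opcode = None
        self.addr_bytes_seen = 0
        self.addr = 0

    def transfer(self, mosi):
        """Exchange one byte: take the host's MOSI byte, return the MISO byte."""
        if self.opcode is None:              # first byte is the opcode
            self.opcode = mosi
            return 0xFF
        if self.addr_bytes_seen < 3:         # next three bytes are the address
            self.addr = (self.addr << 8) | mosi
            self.addr_bytes_seen += 1
            return 0xFF
        index = self.addr % len(self.mem)    # data phase, address auto-increments
        self.addr += 1
        if self.opcode == READ:
            return self.mem[index]
        if self.opcode == PAGE_PROGRAM:
            self.mem[index] &= mosi          # programming can only clear bits
            self.captured.append((index, mosi))
        return 0xFF


flash = FlashSlaveModel()
flash.select()
for byte in (PAGE_PROGRAM, 0x00, 0x00, 0x10, 0xAA, 0xBB):
    flash.transfer(byte)

flash.select()
reply = [flash.transfer(byte) for byte in (READ, 0x00, 0x00, 0x10, 0x00, 0x00)]
```

In the data phase of a read, the reply byte has to be ready in the same byte slot as the host's dummy byte, with no gap after the last address bit. That is where the responsiveness concern in the question comes from.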

rickman 9 years ago

Maybe I don't understand the speed of the XMOS. Nothing you write above says to me it is any faster than the 700 MIPS processor I considered. Nothing written above would allow it to perform the task any differently than the bit banging approach I initially considered.

So what are you confused about?

Maybe you don't understand the problem. A bit serial data port (two bits wide actually, but that is not relevant) provides data, address and a two bit command word serially. The last bit of the command word is indicated by a "command" signal going high during that bit. Data is clocked in on the rising edge of the clock. When the command word is a read, the read data is provided serially starting on the subsequent falling edge. This provides 15 ns from rising edge to falling edge. It is possible to prefetch the read data based on the shifted address on each rising edge until the command signal is asserted. This helps with timing of the memory fetch, but still, that data has to be presented to the serial port output in 15 ns and updated every 30 ns.

I'd like to see code that will make this work. Or maybe you weren't addressing my previous application? For the OP, can the XMOS emulate a 30 Mbps SPI port?

I'm referring to bit I/O using the same 50 MHz clock.

That's a lot more than a "short" gap in the context of a fast clock.

I don't, but not everyone has *your* constraints. If a processor can't be sold in a low cost product, it cuts off the largest volume parts of the market including the app I am looking at presently.

Rick C

John Speth 9 years ago

Thanks all for the suggestions and stories. We decided to take the safe route and design in a SPI flash with an MCU-controlled switch that will switch between external and internal access. We figured the expense and certainty of the HW design is less than the time and uncertainty of the SW design.

We figured that with enough time and effort plus a worthy DMA engine, we could make an MCU SPI slave look like SPI flash, but with some yet-to-be-learned challenges. We didn't have the time.

JJS

David Brown 9 years ago

The XMOS devices clock at 500 MHz, with a dual-issue CPU for 1000 MIPS. And it really is 1000 MIPS - all code and data are in single-cycle SRAM. There is no cache, no prefetches, no branch prediction - everything is as fully deterministic as on a simple 8-bit microcontroller. But that does require at least 5 hardware threads - the maximum for a single thread is 100 MHz, 200 MIPS. (Some devices run at 400 MHz, and therefore fewer MIPS. And older devices have only single-issue CPUs.)
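
The arithmetic behind those figures can be sketched in a few lines of Python. This assumes the threads are scheduled round-robin and that one thread can issue at most once every 5 core cycles, which is consistent with the 100 MHz single-thread ceiling quoted above. The function names and the `pipeline_slots` parameter are made up for illustration, and the 8-thread case is an extrapolation from that assumption rather than a figure from this post.

```python
def per_thread_mhz(core_mhz, active_threads, pipeline_slots=5):
    """Issue rate of one hardware thread.

    Assumes round-robin scheduling in which a thread can issue at most
    once every `pipeline_slots` core cycles (an assumption, see lead-in).
    """
    return core_mhz / max(pipeline_slots, active_threads)


def total_mips(core_mhz, active_threads, issue_width=2, pipeline_slots=5):
    """Aggregate instruction rate across all active threads."""
    rate = per_thread_mhz(core_mhz, active_threads, pipeline_slots)
    return rate * active_threads * issue_width


for threads in (1, 5, 8):
    print(threads, per_thread_mhz(500, threads), total_mips(500, threads))
```

Under these assumptions a lone thread gets 100 MHz (200 MIPS dual-issue), five threads reach the full 1000 MIPS, and adding more threads divides the same total among them.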

The hardware timers, input/outputs, and serial-to-parallel and parallel-to-serial converters attached to the IO pins can handle 10 ns timing.

Of course, no one had mentioned that so far in this thread - so unless you were familiar with the devices, you would not know that.

The XMOS has parallel/serial converters for every GPIO pin. For an application like this, you would use an 8-bit SERDES on the MISO and MOSI pins, and use the clock pin to trigger the transfers.

You get a lot for your money with an XMOS - but not every application needs that power, and then your money is wasted. They are certainly not as cheap as a small microcontroller, but if the alternative is an FPGA, they suddenly look much better value.

rickman 9 years ago

I think that depends on the design. Perhaps you are familiar with the GA144 with 144 processors at a cost of around $0.10 per CPU. They took a similar route with NO dedicated I/O hardware other than a SERDES receiver and transmitter pair.

It is expected that all I/O would be through software. Along that line the chip boots through one of three I/O ports (async serial, SPI serial, or a 2 wire interface), and I believe there is a 1 wire interface as well, but I'm not certain. Three of the CPUs can be ganged to form a parallel port/memory interface, two CPUs with 18 bits of I/O and the third with 4 bits. All of this is controlled by software.

I find the device has significant limitations overall, but certainly with a peak execution rate of 700 MIPS there is a lot of potential. Much like no one focuses on the idea that using the 4 input LUT of an FPGA as an inverter is excessively wasteful, the GA144 with its 10 cent CPUs gets us out of the thinking that using a CPU as a UART is wasteful.

Not trying to be negative, but I see the $2 cost of an XMOS CPU as being excessive and wasteful. Heck, you can buy complete MCU devices for a fraction of that price.

The question is whether it is worth investing the time and energy into learning the chip if you don't focus your work in this realm. Personally I find the FPGA approach covers a *lot* of ground that others don't see and the region left that is not so easy to address with either FPGAs or more conventional CPUs is very limited. If the XMOS isn't a good price fit, I most likely would just go with a small FPGA. I saw the XMOS has a 12 bit, 1 MSPS ADC which is nice. But again, this only makes its range of good fit slightly larger.

I prefer Forth for embedded work. I believe there is a Forth available. I don't know if it captures any of the flavor of CSP, mostly because I know little about CSP. I did use Occam some time ago. Mostly I recall it had a lot of constraints on what the programmer was allowed to do. The project we were using the Transputer on programmed it all in C.

Rick C

rickman 9 years ago

I think you are saying the CPU can do things with a 10 ns time resolution, no? That is the relevant number for this if bit banging the I/O port. I assume the "dual issue" can't simultaneously execute instructions where one depends on the result from the other?

Serial to parallel converters may not help with this design. Data is 8 bits, address is 8 bits, command is 2 bits, the incoming data path is 2 bits wide, outgoing data path is 1 bit wide. The design was made to be tolerant of extraneous clock edges between words. The end of the serial transfer was flagged by a CMD signal going high on the 2 bit command word transfer. I don't see how an 8 bit serial shift register would help receiving the input data even if it were only 1 bit wide (or you use two CPUs) since you can't rely on the clock count to always be right, *plus* you get a single clock with 2 input bits and a flag to indicate the end of the input transfer. Then you have 15 ns (minus setup and hold time on the output pin) to fetch the data and start shifting it out.
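
A behavioural model in Python may make the framing easier to follow. It is only a sketch under stated assumptions: the field order (8 data bits, then 8 address bits, then the 2-bit command), MSB-first shifting, and the command encodings `READ_CMD` and `WRITE_CMD` are guesses for illustration, and `TwoBitPort` and `send` are hypothetical names. What it does take from the description is the 2-bit input path, the CMD flag marking the final transfer, tolerance of extraneous clocks, and the speculative fetch on every rising edge.

```python
READ_CMD, WRITE_CMD = 0b01, 0b10   # hypothetical command encodings


class TwoBitPort:
    """Model of the receiver: no clock counting, only the CMD flag ends a word."""

    def __init__(self):
        self.mem = [0] * 256
        self.shift = 0      # most recent 16 bits seen (data byte, then address byte)
        self.prefetch = 0   # speculative read for the address now in the shifter

    def rising_edge(self, bits, cmd):
        """Sample 2 input bits and the CMD flag; return read data or None."""
        if cmd:
            data = (self.shift >> 8) & 0xFF
            addr = self.shift & 0xFF
            self.shift = 0
            if bits == READ_CMD:
                return self.prefetch        # fetched on the previous edge
            if bits == WRITE_CMD:
                self.mem[addr] = data
            return None
        # Older bits fall off the top, so stray clocks before a word are harmless.
        self.shift = ((self.shift << 2) | (bits & 0b11)) & 0xFFFF
        self.prefetch = self.mem[self.shift & 0xFF]
        return None


def send(port, data, addr, command, junk_clocks=0):
    """Drive one word into the port, optionally preceded by extraneous clocks."""
    for _ in range(junk_clocks):
        port.rising_edge(0b11, cmd=False)
    word = (data << 8) | addr
    for position in range(14, -2, -2):      # 8 clocks, 2 bits each, MSB first
        port.rising_edge((word >> position) & 0b11, cmd=False)
    return port.rising_edge(command, cmd=True)


port = TwoBitPort()
send(port, 0x5A, 0x21, WRITE_CMD, junk_clocks=3)
value = send(port, 0x00, 0x21, READ_CMD, junk_clocks=1)
```

The model shows why a fixed-length deserialiser does not fit: the word boundary is only known when CMD arrives, and by then the read data must already be on hand. In software that lookup happens on every edge, inside the same timing budget.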

I recall even in the FPGA I was reading the output of a register mux which then had to feed a shift register. I used another 1 bit mux to select the output of the mux for the first bit and the output of the shift register for the remaining bits *and* the timing was tight.

The data ports of the design had serial interfaces up to 50 MHz, time correlated to a CODEC. The CODEC received a time code which was transmitted along with the digital data in packets over IP. The same board on the other end received the packets and reconstructed the data and time stamp.

Once an FPGA was on the board there was no reason to use a CPU, although I would have liked to have a hybrid chip with about 1000 4-input LUTs and a moderate CPU or DSP even. Add a 16 bit stereo CODEC and it would be perfect!

I wonder why they can't make lower cost versions? The GA144 has 144 processors, $15 @ qty 1. It's not even a modern process node, 150 or 180 nm I think, 100 times more area than what they are using today.

Rick C

rickman 9 years ago

I will say that while you will have to meet the specs of the data flash as presented, I am pretty sure the data flash is meeting the same specs with a processor. But they tailor the CPU to work optimally and tailor the specs to match the limitations of the CPU while you don't have any of that flexibility.

Rick C

Tom Gardner 9 years ago

A 30,000 ft overview:

formatting link

What are the *guaranteed* timings and latencies? Include all possible disturbances due to cache/TLB misses, branch predictions and interrupts. Include simultaneous USB comms to a host PC.

N.B. "guaranteed" /precludes/ measurements to see what is happening. "Guaranteed" requires accurate /prediction/ *before* the code executes.

Each i/o pin can be clocked at up to 250Mb/s. That clock can come from an internal clock (up to 500MHz) or an external pin. The latter sounds relevant for your case. (Strobed and master/slave interfaces are also directly supported in hardware and software).

Each I/O pin has SERDES registers, so the data rate processed by a core can be reduced by a factor of 32.

So yes, it does look like a /small/ fraction of an xCORE device could very comfortably support that speed.
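
As a rough check on that claim, the following Python sketch works out how much time a core has per deserialised word. The 30 Mbps rate and the factor of 32 come from this thread; the 100 MHz per-thread instruction rate is the single-thread ceiling quoted in another post, and `word_budget` is a hypothetical helper. Note that this is a throughput estimate only and says nothing about response latency.

```python
def word_budget(bit_rate_hz, word_bits, thread_mhz):
    """Return (words per second, ns per word, instruction slots per word)."""
    words_per_s = bit_rate_hz / word_bits
    ns_per_word = 1e9 / words_per_s
    instr_per_word = ns_per_word * thread_mhz / 1000.0
    return words_per_s, ns_per_word, instr_per_word


# 30 Mbps stream, deserialised 32 bits at a time, thread assumed to run at 100 MHz
words_32, ns_32, instr_32 = word_budget(30e6, 32, 100)

# Same stream with an 8-bit deserialiser instead
words_8, ns_8, instr_8 = word_budget(30e6, 8, 100)

print(f"32-bit words: {ns_32:.0f} ns each, about {instr_32:.0f} instruction slots")
print(f" 8-bit words: {ns_8:.0f} ns each, about {instr_8:.0f} instruction slots")
```

With 32-bit words the core sees a new word roughly once a microsecond, which leaves on the order of a hundred instruction slots per word under the assumed thread rate.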

Sigh.

My application only stops for a few microseconds when it is convenient for my application, i.e. once every 1 or 10 seconds at the end of a measurement cycle.

At other times it chunters away continuously at full rate without interruption - as guaranteed by design.

I don't think there are any surprises there.

But, again, your (current) requirement is only one perspective.

rickman 9 years ago

I know this is clear to you, but I'm not sure what processor you are talking about.

You still are not addressing the issue at hand. You are talking about raw I/O speeds and the problem is not about raw I/O speeds. The issue is interactivity. Stimulus followed by response in very short order. I have seen nothing to indicate the XMOS will work for this problem. That's why I asked for a code snippet.

Fine, but the fact that it will work for your needs does not mean it will work for mine. Again, the requirement is to read the inputs looking for a command strobe and, on finding it, retrieve the appropriate word from memory and output it. The clock cycle is 30 ns, and the total I/O time from the command strobe being read on the positive edge of the clock to the input of the device monitoring the output pin (with a 5 ns setup time) is 15 ns. You haven't even addressed the output delays in the I/O pins.
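
Put as arithmetic, the budget looks like this. The sketch below is only an illustration: the 30 ns period, 15 ns edge-to-edge window and 5 ns setup time are the figures given here, the 10 ns instruction time corresponds to the 100 MHz single-thread rate quoted elsewhere in the thread, and the 3 ns pin delay in the second call is an invented placeholder, since no pin delay figure has been given. `response_budget` is a hypothetical helper.

```python
def response_budget(clock_period_ns, setup_ns, instr_ns, pin_delay_ns=0.0):
    """Time left to sample, fetch and drive the first output bit.

    Returns (available time in ns, whole instruction slots that fit).
    Assumes the response window is half a clock period (rising to falling edge).
    """
    half_period = clock_period_ns / 2
    available = half_period - setup_ns - pin_delay_ns
    return available, int(available // instr_ns)


# Figures from the thread, ignoring pin delays entirely
ideal = response_budget(clock_period_ns=30, setup_ns=5, instr_ns=10)

# Same, with a made-up 3 ns of combined input and output pin delay
with_pins = response_budget(clock_period_ns=30, setup_ns=5, instr_ns=10,
                            pin_delay_ns=3)

print(ideal, with_pins)
```

Even before pin delays, the window is about one instruction slot at that thread rate, which is why raw I/O bandwidth does not settle the question.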

I can't argue with that. But that is the need I have and low cost is not a very minor requirement. Processors are sold at much higher volumes for the low cost products. The higher cost, lower volume products often can be built with a wide range of solutions, again the selected device is often chosen as the one that meets the requirements at the lowest price. So shrugging off the cost issue of *my* current need is rather disingenuous.

Rick C

Tom Gardner 9 years ago

There are many interesting snippets, plus a description of how they fit together in the /very/ lucid xC tutorial.

formatting link

You might care to look for the "pinseq/pinsneq" attribute of i/o ports, in conjunction with the ways of combining i/o pins, and port timers.

For more details of some i/o structures, see

formatting link

Of course I will shrug them off; they are of no direct interest to me.

You have a choice...

You could look at the XMOS devices and see where they might complement the existing design options (principally FPGAs and conventional MCUs), and blur the boundaries between them.

Or you could find ways in which the devices couldn't /possibly/ do what existing options can do (cf the way h/w designers reacted to early microprocessors).

I can imagine that FPGA designers might see them as a /bit/ of a threat, and react accordingly.

rickman 9 years ago

A threat??? That makes no sense.

Rick C

David Brown 9 years ago

Yes, that sounds right.

It sounds like a particularly challenging design you have here. The XMOS will let you do many things an ordinary microcontroller cannot, but it can't do /everything/ ! There are some timing requirements that are impossible without programmable logic.

Often a soft processor can be enough for housekeeping, but an FPGA with a fast ARM core would be nice. Expensive, but nice.

I think Atmel/Microchip now have a microcontroller with a bit of programmable logic - I don't know how useful that might be. (Not for your application here, of course - neither the cpu nor the PLD part are powerful enough.)

I guess it is the usual matter - NRE and support costs have to be amortized. When the chip is not a big seller (and I don't imagine the GA144 is that popular), they have to make back their investment somehow.

Have you used the GA144? It sounds interesting, but I haven't thought of any applications for it.

David Brown 9 years ago

I had a little look at the GreenArrays website for the GA144. It appears to be pretty much a dead company. There is almost nothing happening - no new products, no new roadmaps, no new software, no new application notes. It is a highly specialised device, with a very unusual development process (Forth is 50 years old, and it looks it - and these devices use a weird variant of Forth). I think it would be a very risky gamble to use these devices for a real project, even though there is a lot of interesting stuff in the technology.

Certainly - working with XMOS means thinking a little differently. But it is not /nearly/ as different as the GA144.

If you have lots of experience with FPGA development, it is natural to look to FPGAs for solutions to design problems - and that is absolutely fine. It is impossible to jump on every different development method and be an expert at them all - and most problems can be solved in a variety of ways.

Available for what? The XMOS? I'd be surprised.

Occam has its own advantages and disadvantages independent of the use of CSP-style synchronisation and message passing.

I have looked at Forth a few times over the years, but I have yet to see a version that has changed for the better since I played with it as a teenager some 30 years ago. The stuff on the GA144 website is absurd - their "innovation" is that their IDE has colour highlighting despite looking like a reject from the days of MSDOS 3.3. Words are apparently only sensitive to the first "5 to 7 characters" - so "block" and "blocks" are the same identifier. (You would think they would /know/ if the limit were 5, 6 or 7 characters.) Everything is still designed around punch card formats of 8 by 64 characters.

I can appreciate that a stack machine design is ideal for a small processor, and can give very tight code. I can appreciate that this means a sort of Forth is the natural assembly language for the system. I can even appreciate that the RPN syntax, the interactivity, and the close-to-the-metal programming appeals to some people. But I cannot understand why this cannot be done with a modern language with decent typing, static checking, optimised compilation, structured syntax, etc.

Stephen Pelc 9 years ago

The versions shipped by the professional companies have changed a lot. The major commercial suppliers are Forth Inc and MicroProcessor Engineering. See

formatting link

Forth does not suit everyone, but it has changed a lot since you were a teenager.

Stephen

Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

David Brown 9 years ago

I had a look. No, FORTH has not progressed that I can see (unless you think adding colour to the editor is a revolution). Some FORTH compilers might be good at producing optimised code on microcontrollers, but that is an improvement in the implementations, not the language.

Stephen Pelc 9 years ago

We use standard editors with syntax colouring files as we have done for decades. The professional Forth compilers, whether for microcontrollers or for the desktop, produce optimised native code.

The current Forth standard is Forth-2012. See:

formatting link

What you used 30 years ago did not include target code for a USB stack, a FAT file system, a TCP/IP stack with HTTP, FTP and Telnet servers, an embedded GUI, and so on and so on.

Having been in the Forth compiler business for a very long time, I can assure you that the tools and libraries supplied in this decade are vastly superior to those of 30 years ago.

Stephen

rickman 9 years ago

I think it would have been possible with the GA144 and the 700 MIPS peak rate, but it depended on the I/O timing which I couldn't get them to provide. They suggested I build one and test it! lol

Not sure what you mean by a "fast" ARM core, but ARMs combined with FPGAs are sold by three of the four FPGA companies.

You might be thinking of the PSOC devices from Cypress. They have either an 8051 type processor or an ARM CM3 with various programmable logic and analog. Not really an FPGA in any sense. They can be programmed in Verilog, but they are not terribly capable. Think of them as having highly flexible peripherals.

If you really mean the Atmel FPGAs, they are very old, very slow and very expensive. I don't consider them to be in the FPGA business, they are more in the obsolete device business like Rochester Electronics. The device line they had that included a small 8 bit processor is long gone and was never cost competitive.

I'm talking about the XMOS device. The GA144 could easily be sold cheaply if they use a more modern process and sold them in high volumes. But what is holding back XMOS from selling a $1 chip? My understanding is the CPU is normally a pretty small part of an MCU with memory being the lion's share of the real estate. Even having 8 CPUs shouldn't run the area and cost up since the chip is really all about the RAM. Is the RAM special in some way? I thought it was just fast and shared through multiplexing.

There are a number of issues with using the GA144 in a production design. Not the least is the lack of reliable supply. The company runs on a shoestring with minimal offices, encouraging free help from anyone interested in writing an app note. lol When they kicked off the GA144 there was a lot of interest from fairly fringe groups of designers (the assembly language is pretty much Forth) but I have yet to hear of any designs reaching production which is not the same thing as there being none. The production runs appear to be the minimum size test runs from the foundry. The chip is pretty small, so they get a *lot* from a wafer.

Rick C

rickman 9 years ago

Yeah, I don't know of any product using the GA144. I looked hard at using it in a production design where I needed to replace an EOL FPGA. Ignoring all the other issues, I wasn't sure it would meet the timing I've outlined in other posts in this thread. I needed info on the I/O timing and GA wouldn't provide it. They seemed to think I wanted to reverse engineer the transistor level design. Silly gooses.

I don't know why the age of a computer language is even a factor. I don't think C is a newcomer and is the most widely used programming language in embedded devices, no?

The GA144 is a stack processor and so the assembly language looks a lot like Forth which is based on a stack processor virtual machine. I'm not sure what is "weird" about it other than the fact that most programmers aren't familiar with stack programming other than Open Boot, Postscript, RPL and BibTeX.

"A little" is a *lot* larger learning curve than any other MCU I am aware of (the GA144 aside). My point is that can be worth it only if you do a lot of designs that would make use of its unique features. I'm not sure there really is a very large sweet spot given that the chips are not sold at the low end and other devices will do the same job using existing techniques and tools.

Yes, but you don't need to know a "variety" of ways of solving problems. You only need to know ways that are highly effective for most design problems.

The many misconceptions about FPGAs relegate them to situations where CPUs just can't cut the mustard. In reality they are very flexible and only limited by the lack of on chip peripherals. Microsemi adds more peripherals to their devices, but they still don't compete directly with lower cost MCUs.

You seem obsessed with your perceptions of the UI rather than utility. I don't have a problem with large fonts. Most of the designers of the system are older and have poorer eyesight (a feature I share with them). The use of color to indicate aspects of the language is pretty much the same as the color highlighting I see in nearly every modern editor. The difference is that in ColorForth the highlighting is *part* of the language as it distinguishes when commands are executed. Some commands in Forth are executed at compile time rather than being compiled. This is one of the many powerful features of Forth. ColorForth pushes further to allow some commands to be executed at edit time. I have not studied it in detail, so I can't give you details on this.
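
For readers who have not met the compile-time versus run-time distinction, here is a toy model in Python. It is not Forth or colorForth and not how any real compiler is built; `compile_body` is a hypothetical function that mimics only the standard Forth idiom `[ 2 3 + ] literal`, where the arithmetic runs while the definition is being compiled and only the result ends up in the compiled code.

```python
def is_number(token):
    """True for decimal integers such as 42 or -7."""
    return token.lstrip("-").isdigit()


def compile_body(tokens):
    """Compile the body of a definition into a list of (operation, argument) pairs."""
    stack = []        # compile-time data stack
    code = []         # what would actually run later
    compiling = True
    for token in tokens:
        if token == "[":            # immediate word: start executing right away
            compiling = False
        elif token == "]":          # resume compiling
            compiling = True
        elif token == "literal":    # immediate word: compile the value computed so far
            code.append(("push", stack.pop()))
        elif compiling:             # ordinary word or number: compile it for later
            code.append(("push", int(token)) if is_number(token) else ("call", token))
        elif is_number(token):      # executing now: number goes on the stack
            stack.append(int(token))
        elif token == "+":          # executing now: only + is modelled here
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"word not modelled at compile time: {token}")
    return code


compiled = compile_body("[ 2 3 + ] literal dup *".split())
print(compiled)
```

The addition never appears in the compiled list; only its result does, while `dup` and `*` are deferred to run time.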

I just want to explain how you are using very simplistic perceptions and expectations to "color" your impression of ColorForth without learning anything important about it.

You can't understand because you have not tried to learn about Forth. I can assure you there are a number of optimizing compilers for Forth. I don't know what you are seeing that you think Forth doesn't have "structured syntax". Is this different from the control flow structures?

I see Stephen Pelc responded to your posts. He is the primary author of VFX from MPE. Instead of throwing a tantrum about Forth "looking" like it is 30 years old, why not engage him and learn something?

Rick C

David Brown 9 years ago

By "fast", I mean "so much faster than strictly needed for the job in hand that you don't need to worry about speed" :-)

Alternatively, a Cortex-A core with Neon is "fast".

No, I mean the new Atmel XMega E series. They have a "custom logic module" with a couple of timers and some lookup tables. It is not an FPGA - it's just a bit of programmable logic, more akin to a built-in PLD.

No, I know about that line too (never used it, but I know of it).

It is a fast RAM - the one RAM block runs at 500 MHz single-cycle, and may be dual-ported. There is only one cpu on the smaller XMOS devices, with 8 hardware threads - larger devices have up to 4 cpus (and thus 32 threads). The IO pins have a fair amount of fast logic attached too.

But I don't know where the cost comes in. The XMOS devices are, I guess, a good deal more popular than the GA144 - but they are not mass market compared to popular Cortex-M microcontrollers.

David Brown 9 years ago

The reference to colour was for the "new" colorForth used by the GA144. I am not surprised you have syntax highlighting in your editors - /Forth/ may not have moved on, but your implementations of Forth toolchains seem top class and with modern features.

Those are all libraries provided by your implementation. That makes your implementation good and useful to users - it does not make the /language/ any better.

Again, you are missing my point entirely.

The /language/ has not changed. You are still stuck with a typeless system relying on programmers writing comments to describe a function's inputs and outputs. You are still stuck on doing everything with "cells", that are usually 16-bit or 32-bit, or double-cells - no standardised way of working with data of specific sizes. You are still stuck on a single word list, with 31 characters significance (the GA144 Forth is limited to "5 to 7" significant characters) - no modules, namespaces or other local naming. You are still tied to blocks of 16x64 characters.

Some details have changed, but the language has not.

David Brown 9 years ago

The age of a language itself is not important, of course. The type of features it has /are/ important.

Many things in computing have changed in the last 4 decades. Features of a language that were simply impossible at that time due to limited host computing power are entirely possible today. So modern languages do far more compile-time checking and optimisation now than was possible at that time. Good languages evolve to take advantage of newer possibilities - the C of today is not the same as the pre-K&R C of that period. The Forth of today appears to me to be pretty much the same - albeit with more optimising compilers and additional libraries.

Even for a stack machine, it is very limited. In some ways, the 4-bit MARC4 architecture was more powerful (it certainly had more program space).

But this is all first impressions from me - I have not used the devices, and I am /very/ out of practice with Forth.

For a programmer who is completely unfamiliar with FPGA and programmable logic, the XMOS is likely to be less of a leap than moving to an FPGA.

But I agree it is hard to find an application area for these devices - I have only had a couple of uses for them, and they probably were not ideal for those cases.

True.

FPGAs have their strengths and their weaknesses. There are lots of situations where they are far from ideal - but I agree that there are many misconceptions about them that might discourage people from starting to use them.

I am merely highlighting what the GA144 website seems to view as being modern innovations in the Forth toolchains.

If your eyesight is poor, use a bigger screen or a bigger or more legible font - that's fine. But it does not make sense to use that as the basis for designing your toolchain - just make the IDE configurable.

It is syntax highlighting.

You get that in other languages too. True, it is not always easy to determine what is done at compile time and what is done at run time, and the distinction may depend on optimisation flags.

But really, what you are describing here is like C++ with constexpr code shown in a different colour.

I've read the colorForth FAQ, such as it is. I also note that the website is dead.

I fully understand that there are good optimising Forth compilers and cross-compilers. But those are good /implementations/ - I am talking about the /language/.

I am not "throwing a tantrum" - I /am/ engaging in discussion (including with Stephen).

I am talking about how Forth appears to me. I have worked with a wide range of languages, including various functional programming languages, parallel programming languages, assembly languages, hardware design languages, high level languages, low level languages, and a little Forth long ago. I have worked through tutorials in APL and Prolog. I am not put off by strange syntaxes or having to think in a different manner. (It might put me off /using/ such languages for real work, however.) When I talk about how Forth appears to me, it is quite clear that the language has limited practicality for modern programming. And if that is /not/ the case, then it certainly has an image problem.
