ARM's v7 MMU

Hi,

Any pointers as to idiosyncrasies in ARM's v7 MMU? (no, I'm not looking for information as to how to *use* it; rather, pointers regarding any "unexpected behaviors" that I might encounter -- especially when mixing page sizes, etc.)

Also, any pointers to particular silicon to avoid/favor in terms of potential problems in the MMU implementation?

Thx!

--don

There are 4 indirect things I've come across:

1) As you will be aware, the whole caching/buffering subsystem was totally reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7. I've found the configuration of memory/device regions is much more sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.

A specific example: if you are experiencing device lockups when enabling the MMU, try changing the device type attributes in the paging table for the peripheral region.

2) If you are using the MMU on a device with Security Extensions enabled, don't forget that some register bits which are otherwise R/W become R/O in Non-Secure mode.

3) Don't forget that on ARMv7-class devices, some register writes may be posted across a bus, meaning they do not take effect immediately. When you turn on instruction and data caching, interrupt handling code can run fast enough to race with the interrupt hardware, which then fires the interrupt a second time unless you use the usual DSB instructions.

I've seen this happen on the AM3359.

4) There's no longer any way to invalidate the whole data cache in one go. You now have to do it by MVA or set/way.
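To make item 4 concrete, here is a minimal C sketch of invalidating one cache level by set/way on ARMv7-A. It assumes you have already read that level's CCSIDR (after selecting the level through CSSELR); the field layout (LineSize, Associativity, NumSets) follows the ARMv7-A architecture manual, while the function names are hypothetical. The CP15 operation (DCISW) is only emitted when building for ARMv7-A; on a host build the loop just counts operations so the arithmetic can be checked. A complete routine would also walk CLIDR to visit every data/unified level, and would use clean-and-invalidate (DCCISW) instead of plain invalidate once the cache may hold dirty data.

```c
#include <assert.h>
#include <stdint.h>

/* CCSIDR fields (ARMv7-A): LineSize [2:0], Associativity [12:3], NumSets [27:13]. */
static uint32_t ccsidr_line_shift(uint32_t ccsidr) { return (ccsidr & 0x7u) + 4u; } /* log2(line bytes) */
static uint32_t ccsidr_ways(uint32_t ccsidr)       { return ((ccsidr >> 3) & 0x3FFu) + 1u; }
static uint32_t ccsidr_sets(uint32_t ccsidr)       { return ((ccsidr >> 13) & 0x7FFFu) + 1u; }

/* Smallest s with (1 << s) >= n: the number of bits needed for the way index. */
static uint32_t ceil_log2(uint32_t n)
{
    uint32_t s = 0;
    while ((1u << s) < n)
        s++;
    return s;
}

/* Operand for DCISW/DCCISW: way in the top bits, set above the line offset, level in [3:1]. */
uint32_t setway_operand(uint32_t ccsidr, uint32_t level, uint32_t set, uint32_t way)
{
    uint32_t way_bits = ceil_log2(ccsidr_ways(ccsidr));
    uint32_t op = (set << ccsidr_line_shift(ccsidr)) | ((level & 0x7u) << 1);
    if (way_bits != 0)   /* direct-mapped cache has no way field; avoid a 32-bit shift */
        op |= way << (32u - way_bits);
    return op;
}

/* Invalidate every line of one cache level (level 0 = L1); returns the operation count. */
uint32_t dcache_invalidate_level(uint32_t ccsidr, uint32_t level)
{
    uint32_t count = 0;
    assert(level < 7u);
    for (uint32_t way = 0; way < ccsidr_ways(ccsidr); way++) {
        for (uint32_t set = 0; set < ccsidr_sets(ccsidr); set++) {
            uint32_t op = setway_operand(ccsidr, level, set, way);
#if defined(__ARM_ARCH_7A__)
            __asm__ volatile("mcr p15, 0, %0, c7, c6, 2" :: "r"(op) : "memory"); /* DCISW */
#else
            (void)op;    /* host build: only count the operations */
#endif
            count++;
        }
    }
#if defined(__ARM_ARCH_7A__)
    __asm__ volatile("dsb" ::: "memory");   /* wait for the maintenance to complete */
#endif
    return count;
}
```

For a 32 KB, 4-way cache with 64-byte lines (CCSIDR LineSize = 2, Associativity = 3, NumSets = 127), this issues 4 x 128 = 512 set/way operations.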

The AM3359 in the Beaglebone Black caused me way more trouble than the Allwinner A10s did. However, the AM3359 is heavily documented (unlike the Chinese jobs... :-()

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world

Hi Don,

if on that version they still have that ridiculous MMU tagging pages by logical address - so you have to flush all caches etc. on task switch - may be your best chance is to simply disable it, if you have the option (I don't know ARM). Or switch to a power architecture processor, their MMUs work OK.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI
------------------------------------------------------

The problem is that the data cache on ARMv7 doesn't work unless the MMU is enabled.
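For reference, enabling the MMU so that the caches actually work typically looks something like the following on ARMv7-A, using a single TTBR0 with TTBCR = 0. This is a hedged sketch: the CP15 encodings are from the ARMv7-A architecture manual, but the function names and the choice of domain 0 as a Client are assumptions. It presumes the data cache has already been invalidated and the first-level table is 16 KB aligned. On a non-ARM host only the pure bit-manipulation helper is compiled.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data/unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Pure helper: SCTLR value with MMU, caches and branch prediction turned on. */
uint32_t sctlr_with_mmu_and_caches(uint32_t sctlr)
{
    return sctlr | SCTLR_M | SCTLR_C | SCTLR_Z | SCTLR_I;
}

#if defined(__ARM_ARCH_7A__)
/* Hypothetical bring-up routine for a flat, TTBR0-only translation table. */
void mmu_enable(const uint32_t *l1_table)
{
    uint32_t r = 0;
    assert(((uintptr_t)l1_table & 0x3FFFu) == 0);                     /* 16 KB aligned */
    __asm__ volatile("mcr p15, 0, %0, c2, c0, 2" :: "r"(r));           /* TTBCR = 0 */
    __asm__ volatile("mcr p15, 0, %0, c2, c0, 0"
                     :: "r"((uint32_t)(uintptr_t)l1_table));           /* TTBR0 */
    __asm__ volatile("mcr p15, 0, %0, c3, c0, 0" :: "r"(1u));          /* DACR: domain 0 = Client */
    __asm__ volatile("mcr p15, 0, %0, c8, c7, 0" :: "r"(r));           /* TLBIALL */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 0" :: "r"(r));           /* ICIALLU */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" :: "r"(r));           /* BPIALL */
    __asm__ volatile("dsb\n\tisb" ::: "memory");
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(r));           /* read SCTLR */
    r = sctlr_with_mmu_and_caches(r);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0\n\tisb" :: "r"(r) : "memory");
}
#endif
```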

Can you pick up capable battery operated small Power boards for about 20-30 British pounds?

You can for ARM but you can't for Power (at least the last time I checked) which is why there's a hobbyist and experimenting ecosystem around ARM but there isn't around Power.

Simon.

I know, for whatever reason Power is kept out of reach for the hobbyist market. There are (very) powerful chips which allow sub-$100 boards but that's about all.

However, I don't think that would stop Don, my guess is he is just looking for the cheapest hardware which will do the job for him - which may well be ARM based but then may be not.

Dimiter

Yeah, I really would have liked the "tiny" page size (actually, even tiny *quarterpages*!) Aside from the (slight) performance gain, the sections/supersections I'd gladly trade away in that case!

Is this just a case of "doing extra homework" (i.e., making sure you understand the repercussions of each flag setting)? Or, do certain targets behave differently (thus *requiring* different settings)?

I assume you mean beyond the obvious "make sure the page is wired down", cacheability setting, etc.?

Said another way (for all of the above), when you discover(ed) the source of the problem, did you slap your head and utter "D'oh!" (i.e., "damn, I should have known better!") *or* did you find yourself uncomfortably wondering why *that* fixed the problem?

[The former I can deal with; the latter would leave me anxious!]

OK

I'm not sure I understand your point. Can you embellish an example?

Hmmm.... this could be annoying. OTOH, there are few cases where I would need to invalidate more than just a cache line, "typically". So, the extra cost/complexity may disappear in practice.
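The "usually only a line or two" case can be sketched as a by-MVA loop. The minimal C below assumes the caller passes the Cache Type Register (CTR, read with `mrc p15, 0, r, c0, c0, 1` on the target) so the loop can step by the smallest D-cache line size (DminLine, bits [19:16], log2 of 4-byte words); the function name is hypothetical. Plain invalidate (DCIMVAC) discards dirty data in any partially covered line at either end of the range, so buffers handled this way are best kept cache-line aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR.DminLine (bits [19:16]) is log2 of the smallest D-cache line, in 4-byte words. */
static uintptr_t dcache_min_line_bytes(uint32_t ctr)
{
    return (uintptr_t)4u << ((ctr >> 16) & 0xFu);
}

/* Invalidate [start, start + len) line by line by MVA; returns the number of lines touched. */
size_t dcache_invalidate_range(uintptr_t start, size_t len, uint32_t ctr)
{
    uintptr_t line = dcache_min_line_bytes(ctr);
    uintptr_t end = start + len;
    size_t lines = 0;

    assert(end >= start);   /* range must not wrap around the address space */
    for (uintptr_t addr = start & ~(line - 1u); addr < end; addr += line) {
#if defined(__ARM_ARCH_7A__)
        __asm__ volatile("mcr p15, 0, %0, c7, c6, 1" :: "r"(addr) : "memory"); /* DCIMVAC */
#endif
        lines++;
    }
#if defined(__ARM_ARCH_7A__)
    __asm__ volatile("dsb" ::: "memory");   /* wait for the maintenance to complete */
#endif
    return lines;
}
```

With 64-byte lines, a 128-byte buffer that starts 16 bytes into a line touches three lines, which is exactly the situation where the partial-line caveat above matters.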

Was the "trouble" attributable to "learning curve"? I.e., did the A10 benefit from "previous experience" on the BB?

Thanks!

--don

I'm not really interested in (ready-made) "boards" but the point is the same -- I want inexpensive and low power (that also tends to suggest a high level of integration).

My power budget per node (including all "I/O loads") is ~10W. In some cases, much of that 10W is I/O so the processor needs to be in the 1-2W ballpark.

There's (I think) also a bigger selection (vendors, configurations) with ARM.

--don

Yes, I'd like to keep cost and power requirements down. OTOH, I am (now) trying to cut some (development) corners.

My original design would have required me to create *three* different RTOS's with compatible features/capabilities as they would execute on targets at different price/complexity points (e.g., "Intel", Cortex-A and Cortex-M). Hard to get such a heterogeneous system to "play nice together" :<

OTOH, if I can 86 (ha!) the Intel targets, that gives me another degree of freedom in the design (and, *forces* me to steer clear of that ever-changing platform!).

Now, I'm trying to rationalize replacing the Cortex-M devices with (more expensive) Cortex-A's... just to eliminate yet another variation and have a truly homogeneous system! Size may prove to be a problem...

--don

The latter.

When I took the perfectly working settings from the A10s for the peripheral memory region to the BBB, the BBB locked up solid every time the MMU was enabled.

Turns out that on the AM3359, the peripheral memory region must be marked as shareable device or it simply will not work. Marking the region as non-shareable device caused a solid lockup every time. This was not an issue on the A10s.
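For readers following along, here is what that difference looks like in an ARMv7-A short-descriptor section entry. A minimal sketch, assuming 1 MB sections, TEX remap disabled (SCTLR.TRE = 0), domain 0 and full read/write access; the enum and function names are hypothetical. With TRE = 0, Shareable Device is TEX=000, C=0, B=1 and Non-shareable Device is TEX=010, C=0, B=0 (the descriptor's S bit is ignored for Device memory, so it is not what selects this). The example address 0x44E00000 is the 1 MB region that contains the AM335x UART0 registers at 0x44E09000.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor 1 MB section entry, TEX remap disabled. */
#define SECT_TYPE      0x2u                   /* bits [1:0] = 0b10: section */
#define SECT_B         (1u << 2)
#define SECT_C         (1u << 3)
#define SECT_XN        (1u << 4)              /* execute never */
#define SECT_DOMAIN(d) (((d) & 0xFu) << 5)
#define SECT_AP_RW     (0x3u << 10)           /* AP[1:0] = 0b11, AP[2] = 0: full access */
#define SECT_TEX(t)    (((t) & 0x7u) << 12)

typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_SHAREABLE_DEVICE,      /* what the AM3359 peripheral region needed */
    MEM_NONSHAREABLE_DEVICE,   /* worked on the A10s, locked up the AM3359 */
    MEM_NORMAL_WBWA            /* normal memory, write-back write-allocate */
} mem_type;

static uint32_t mem_type_bits(mem_type t)
{
    switch (t) {
    case MEM_STRONGLY_ORDERED:    return SECT_TEX(0);                    /* TEX=000 C=0 B=0 */
    case MEM_SHAREABLE_DEVICE:    return SECT_TEX(0) | SECT_B;           /* TEX=000 C=0 B=1 */
    case MEM_NONSHAREABLE_DEVICE: return SECT_TEX(2);                    /* TEX=010 C=0 B=0 */
    case MEM_NORMAL_WBWA:         return SECT_TEX(1) | SECT_C | SECT_B;  /* TEX=001 C=1 B=1 */
    }
    return 0;
}

uint32_t section_entry(uint32_t phys, mem_type t)
{
    assert((phys & 0xFFFFFu) == 0);   /* sections must be 1 MB aligned */
    uint32_t e = phys | SECT_TYPE | SECT_DOMAIN(0) | SECT_AP_RW | mem_type_bits(t);
    if (t != MEM_NORMAL_WBWA)
        e |= SECT_XN;                 /* never fetch instructions from device space */
    return e;
}
```

Only the memory-type bits change between the two device variants, which is why the fix is a one-constant change in the peripheral mapping.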

Oh, yes. I went through all those (and more) before discovering the solution. I still cannot find anything which explains why the above is required on the AM3359 but not on the A10s.

The latter. I could not find anything in the ARM architecture manuals, the AM3359 TRM or other documents about why two Cortex-A8 MCUs behave so differently. That makes me nervous.

This is on the AM3359 with my own interrupt wrapper written in ARM assembly which is executed when the IRQ exception vector is triggered.

The IRQ interrupt wrapper determines which interrupt handler to call (UART, timer, etc) and calls it.

In the interrupt handler you write to a peripheral (say timer) register to say you have handled the interrupt and then return back to the IRQ interrupt wrapper.

The IRQ interrupt wrapper writes to the AM3359 interrupt registers telling it the interrupt controller can search for a new interrupt.

When both instruction and data caching are turned on, my code runs fast enough that the write to the timer interrupt acknowledge register is still making its way across the bus, and the interrupt controller thinks the interrupt is still pending because there's no longer a coherent view of resources.

The solution is to use a Data Synchronisation Barrier (DSB) instruction sometime between writing the timer interrupt acknowledge register and telling the interrupt controller it can look for a new interrupt.

If you read the AM3359 Technical Reference Manual, you will see the use of a DSB is discussed in relation to writing to the above mentioned interrupt controller register and the same reasoning can apply to the peripheral interrupt acknowledge registers as well.
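The ordering described above can be sketched in C as follows. This is a hedged illustration, not Simon's wrapper (which is ARM assembly): the function name and register pointers are hypothetical parameters, and the NEWIRQAGR value (bit 0 of INTC_CONTROL, offset 0x48 from the AM335x INTC base 0x48200000) is assumed from the AM335x TRM and should be checked against your copy. On a host build the DSB is replaced by GCC's `__sync_synchronize()` full barrier so the sequence can be compiled and tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_ARCH_7A__)
#define DSB() __asm__ volatile("dsb" ::: "memory")
#else
#define DSB() __sync_synchronize()   /* host build: full barrier stand-in */
#endif

/* Assumed AM335x INTC_CONTROL bit: let the INTC sort for a new IRQ. */
#define INTC_CONTROL_NEWIRQAGR 0x1u

/* Hypothetical tail of an IRQ dispatch: ack the peripheral, then re-arm the INTC. */
void irq_ack_and_rearm(volatile uint32_t *periph_irq_ack, uint32_t ack_value,
                       volatile uint32_t *intc_control)
{
    assert(periph_irq_ack != NULL && intc_control != NULL);
    *periph_irq_ack = ack_value;            /* handler: clear the interrupt at its source */
    DSB();                                  /* make sure that posted write has completed... */
    *intc_control = INTC_CONTROL_NEWIRQAGR; /* ...before the INTC looks for a new IRQ */
}
```

Placing the barrier in the dispatcher, just before the INTC write, covers every handler with one DSB instead of one per handler.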

The A10s, with its poor documentation, came first for me.

I managed to figure things out on the A10s even with this poor documentation and I still got tripped up on the BBB when I later started playing with that.

Older ARM MCUs used to have such nice predictable behaviour...

Simon.

Sorry, my bad. :< By "targets" I meant "regions of memory" (i.e., different I/O devices in the same system). It *appears* that the settings you eventually came up with work *universally* for all "(I/O) devices" within a given "MCU target" -- but, that the settings for MCU target #1 differ from those for MCU target #2.

Is this a correct assessment?

Do all of the "(I/O) devices" on that part fit in a single page/map? I.e., do you *replicate* the settings for the devices that reside at one part of the address space to devices that reside at other parts of the address space? (or, do you throw them all in a "section")

And, not likely you are going to have N other MCUs to compare against (to determine *which* of these is the "exception"). :<

No help from manufacturer? Forums?

Will the A10 "behave" if configured as the AM3359? Or, does your code make assumptions that require it to be configured thusly?

What are the design consequences of each configuration?

Agreed. At the very least, have it documented as a "bug"/anomaly so you can at least know that "they" are aware of it -- and, will either act to preserve this behavior *or* alert folks to any *changes* to it.

OK.

Ah... also makes sense.

Logical choice (all else being equal) is to do so in the dispatcher (as it allows the most time for any previous code to "complete")

Yes. I think the Cortex-A's are suffering from a desire to follow the "path" of other "big" (complex) processors (e.g., x86) along with all their cruft.

One other question: is your use of the MMU largely "static" (i.e., set it and forget it); somewhat dynamic (using it to create individual protection domains for different processes); or even more "esoteric"? The intent of this question being to see how likely other "races" and anomalies are likely to have been stumbled upon in your codebase.

Thanks!

--don

Yes, it is.

The major surprise was finding two single CPU Cortex-A8 MCUs having different requirements. There was nothing in the AM3359 material I have read which indicated that only one of the ARMv7 architecture level options for mapping peripheral address space was available on the AM3359 or that this was even a potential issue.

Yes. All the peripheral address space mappings have the same attributes.

I have not tried that, but may in the future.

Although I am a programmer/sys admin by day, that is on commercial systems, with typical commercial type programming and tasks.

My embedded work is purely a hobby and right now I am deeply into other hobbyist interests. :-)

BTW, the help from the manufacturer is in the form of their StarterWare example code library. Unfortunately, while _every_ other manufacturer of ARM MCUs I have come across gladly places their example code on their website for free download, TI have placed their _example_ code under bl**dy export control!!! :-(

I registered to download it and was denied access. TI support would not talk to me about granting access to the StarterWare kit unless I provided them with a range of personal information to establish my identity. (And BTW, this British guy living in the UK is _still_ annoyed about that.)

I recently discovered the StarterWare kit has been uploaded to GitHub and I cannot see _anything_ in there which has any restricted, NDA or security issues at all. :-(

Basically none I was aware of in single MCU systems. When adding MMU support to an existing A10s bare metal project, I basically read the ARMv7 architecture manual section in question before writing a single line of code, chose what looked like valid options for the MMU tables, and then wrote the code and mapping tables.

After a few silly issues, the code pretty much worked the first time. Based on what I read in the architecture manuals and, later, the AM3359 TRM, I had no reason to believe the same attributes would not work as-is on another Cortex-A8 MCU for the corresponding regions on that other MCU.

That's exactly where I placed it. :-)

I think ARM are starting to lose the clean-and-elegant approach which has served them so well up to now.

Largely static, with virtual to physical address mapping equivalence.
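A "largely static, identity-mapped" setup like this usually amounts to one first-level table of 4096 one-megabyte sections where virtual equals physical. A minimal sketch, assuming the attribute bits (the low 20 bits of a section descriptor) are computed elsewhere and that everything outside RAM gets the peripheral attributes; a stricter table would leave unused holes as fault entries (0). The names are hypothetical, and 0x80000000 with 512 MB is used only as an example matching the BeagleBone Black's DDR.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES 4096u   /* 4096 x 1 MB sections cover the 4 GB address space */

/* TTBR0 with TTBCR.N = 0 requires a 16 KB aligned first-level table. */
static uint32_t l1_table[L1_ENTRIES] __attribute__((aligned(16384)));

/* Identity map: RAM sections get ram_attrs, all other sections get dev_attrs. */
uint32_t *build_identity_map(uint32_t ram_first_mb, uint32_t ram_mb,
                             uint32_t ram_attrs, uint32_t dev_attrs)
{
    assert(ram_attrs < (1u << 20) && dev_attrs < (1u << 20));  /* attribute bits only */
    for (uint32_t mb = 0; mb < L1_ENTRIES; mb++) {
        int is_ram = mb >= ram_first_mb && mb - ram_first_mb < ram_mb;
        l1_table[mb] = (mb << 20) | (is_ram ? ram_attrs : dev_attrs);  /* VA == PA */
    }
    return l1_table;
}
```

Because every peripheral section receives the same attributes, a target-specific requirement like "shareable device" is a single value passed in at build time.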

I have quite a bit of experience with bare metal code on earlier ARM MCUs, but I wanted to explore the Cortex-A8 at bare metal level, basically just to learn about it (and for fun :-)).

Simon.

OK. Then, the next question (sorry, I don't necessarily expect you to have explored all of these options -- I'm just thinking aloud and wondering how you might opine regarding them) is: could the problem, perhaps, have been associated with a *particular* I/O device? Or, did you observe the problem plaguing *all* I/O's?

Hmmm... I've been approaching this from the other end: starting with the generic ARM documents before selecting a particular device (and then chasing down the manufacturer's docs *for* that device).

There are quite a few areas in the memory management description where they hand-wave and resort to "implementation defined" (i.e., Your Manufacturer May Vary) as a catchall. This was, in part, the reason prompting my initial query (i.e., how have folks been surprised by these "undocumented areas").

On my next re-read of those sections, I will keep your comments handy and see if I can "rationalize" them in the context of ARM's caveats...

Which returns to the question above (re: different I/O devices giving you problems vs. *all* of them)

[I assume each of the devices are multi-core?]

Understood. There is, also, always the incentive to "just get it to work" (and not really worry about the "why")

My opinion of TI has steadily declined over the last ~30+ years. But, that is true, perhaps, of most of the "legacy" semi houses. They seem oblivious to the issues that have allowed all these "upstarts" to nibble away at a market within which they used to be Leviathan! :<

(sigh) I expended a fair bit of effort to rid myself of their mailing and emailing lists. Very uncooperative. "Screw the customer" attitude.

Thankfully, email addresses can be discarded easily and you can always rent a *different* POBox (notifying ONLY those mail and email contacts with which you want to remain in contact)

OTOH, I can recall having a fair number of export issues in the past with projects and products that one would *think* would be unencumbered. :(

Perhaps they were just wanting to harvest "marketing" information under the guise of "security". Or, just some mindless dweeb who has been *told* to ask those questions

("Um, why do you need to know my name and address before you will *show* me some shoes? You know, this OTHER vendor didn't annoy me with these questions; perhaps I should just shop *there*?")

Have you revisited the ARM docs to see if they shed extra light on your observations?

I suspect that's partially related to the RISC-CISC issue -- as you start demanding more performance, RISC starts acquiring more complex mechanisms instead of strictly adhering to the "RISC mantra".

:>

--don

I have not seen any indications that some devices in the peripheral address space need different MMU attributes from other devices within that same address space.

I was doing something really simple - outputting characters on a UART in polling mode - when I got the lockup.

I did exactly the same. I started with the virtual memory sections of the ARMv7 architecture manual, then the same sections in the Cortex-A8 architecture manual and then finally the specific information in the AM3359 TRM.

No. Both the A10s and AM3359 are single core devices.

Several times. I am yet to locate the tangential reference in the second sentence of the third paragraph on page 1234 which explains the issue. :-)

Simon.

Fair enough.

By "lockup", do you mean the UART stopped behaving properly (i.e., not like your code *expected* it to behave)? Or, did the processor actually stop fetching opcodes... ?

I.e., could the problem be explained/duplicated by the UART "disappearing" from the memory map?

OK.

Ah, OK.

Page 1234 in *which* document? I.e., the cited page in the Arch Ref Man (DDI 0406C.b) deals with floating point support... :-/

I'll try to look at this more later this week. I have some commitments to attend to over the next few days...

--don

One test involved a blinking LED on a GPIO line at the same time.

That LED stopped blinking as soon as the MMU was enabled, so this appears to have been a general lockup.

I guess my British humour was too subtle. :-)

The page number was not intended to be a literal page number.

I was making a comment/joke on the tendency of manufacturers to bury a critical insight into a device right in the middle of a huge document...

Simon.

Or, equally so, the example too *probable*! :-/

Having written thousands of pages of documentation, I can attest that making information easily available to the folks likely to go looking for it is very difficult! (and rarely "rewarded")

Some folks need a document that you can slog through sequentially: do this, then do that. Other folks want a document organized as a "reference" of sorts -- that they can readily *remind* themselves of some particular fact. Still others are unconcerned with the "obvious" aspects of the object under discussion and, instead, want all the idiosyncrasies brought to the fore.

Perhaps we need entries in the index labeled "implementation defined"?

[There are sure a lot of them in just the memory management chapter of the arch ref man!]

At the end of the day, your point is well taken: just because something *seems* like it SHOULD work, I should be prepared for it NOT to work and ready to tweak my expectations accordingly. :<

Thanks!

--don