Integrated TFT controller in PIC MCUs

Certainly it is possible to pick examples where 32 registers are more effective than 16 - but equally we can pick examples where 16 registers are more efficient (such as context switching, or code with a lot of small functions). Examples are illustrative, but not proof of a general rule.

(Regarding more than 32 general-purpose registers, the Itanium is the only cpu I know of with 128 integer registers and 128 floating-point registers. It needs more registers for its EPIC architecture, but I don't think anyone would consider it an "optimum" design!)

Many load/store machines have some sort of compressed or limited instruction format for greater efficiency (especially of instruction cache). ARM has a couple of "thumb" modes, MIPS has an equivalent, and on the PPC I have seen various schemes.

Even in the full 32-bit ARM set, the bits "saved" by having fewer registers are used for the conditional execution bits and the barrel shifter.
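
A hypothetical C fragment that shows what the barrel shifter buys - the name is made up and the exact code generated depends on the compiler:

/* The scaled add below typically compiles to a single 32-bit ARM
 * data-processing instruction, e.g. "ADD r0, r0, r1, LSL #2", because
 * the second operand can be shifted for free by the barrel shifter. */
unsigned scaled_add(unsigned base, unsigned index)
{
    return base + (index << 2);
}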

Every bit of space in the instruction set is important. Using them to support extra registers may be the best overall tradeoff in some cases, but it is always a tradeoff.

It's a nice architecture (apart from the backwards bit numbering!). But it is a very complex architecture - for smaller devices (say 200 MHz, single core - microcontroller class cpus) a PPC core will be much bigger, more difficult to design and work with, and take more power than an ARM (or MIPS) core. On bigger devices, caches, interconnects, buses and the like dominate device size.

That makes no sense to me.

Many cpu implementations have some sort of bank switching or dedicated registers reserved for interrupt usage - it is seldom a full set of registers, but can be enough for very short interrupt routines. Beyond that, you must save other registers before using them - that applies regardless of the number of registers you have. And if your interrupt function calls external code unknown to the compiler (or programmer), you must save /all/ the registers. Twice as many registers means twice the time, space and bandwidth.
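
A rough C sketch of what I mean - the attribute syntax and exact behaviour vary by compiler and target, and the names here are just placeholders:

/* Sketch only - interrupt attribute semantics differ between targets;
 * tick_count and process_event() are hypothetical. */
extern volatile unsigned tick_count;
extern void process_event(void);

/* Leaf handler: the compiler only has to save the registers it actually
 * clobbers here, however many registers the architecture provides. */
void __attribute__((interrupt)) timer_isr(void)
{
    tick_count++;
}

/* Non-leaf handler: process_event() is external code with unknown
 * register usage, so the handler must preserve every register the ABI
 * allows called code to clobber - and with twice as many registers
 * that is roughly twice the saving and restoring. */
void __attribute__((interrupt)) uart_isr(void)
{
    process_event();
}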

Don't forget that the way /you/ program using VPA sort-of-assembly is very different from the way the great majority of embedded developers write code. A lot of what you see as "fact" here is the result of your subjective experiences - there is no doubt that 32 registers suits your way of programming far better than 16 registers would. You have manual control of all your registers, and track what needs to be saved in different places, avoiding the need to save as many registers as a compiler would.

The time needed to save and restore all registers may not be relevant in a given application, but it is not irrelevant or negligible in all cases.

Reply to
David Brown

In Thumb2 you work directly with 8 GP registers, indirectly with a few others like the PC and SP, and accessing the rest of the GPRs is different and/or has penalties.

Just trying to say that it is a moot point. And personally, I never understood the existence of Cortex-M - why cripple the ability to switch to native 32-bit mode, if most or all of the underlying logic is there?

Reply to
Vladimir Ivanov

Of course different people have different needs. All I have been saying is that a person might have good reasons for not wanting to use a PIC32 - I am certainly not saying there are no good reasons why some people might prefer them over alternatives.

I don't think I am /confusing/ the families - I have merely been ignoring the details! Certainly complaints about the bugs in 480 Mbps USB only apply to the devices with that hardware - complaints about the development culture at Microchip are extrapolated from that.

(Based on a single post from a single user on another newsgroup, I gather that there are still problems with 480 Mbps USB. It would be fair to interpret such third-hand anecdotes as "I don't know", however.)

It's been a while since I looked at the details of MIPS - and I haven't used their newer architectures at all. So I can't help you on this one.

My understanding - based on a few articles and some MIPS white papers - is that it is at least competitive, and often better. Google will give you more information - but all I can say is that if the likes of NXP, TI or Freescale put MIPS cores in some of their microcontrollers, they should be in the same class as their ARM devices for price, performance, features and power. And then the competition would drive improvements for both core suppliers.

I don't think the PIC32 uses the most appropriate MIPS core - but I don't have the details in my head. Maybe I'll read up about MIPS some more over the weekend.

Reply to
David Brown

Nope - I haven't needed a TFT controller, and can't say off-hand what devices would do the job for you. I can point you towards the Freescale Kinetis K70 family as good Cortex M4 devices with support for LCD displays, but that's the best I can do.

Reply to
David Brown

Vladimir Ivanov wrote on 09-Jan-15 at 10:54 AM:

To make small, cheap, low-power chips.

(to force all those obsolete 8-bit and 16-bit designs out of the market).

Wouter

Reply to
Wouter van Ooijen

As far as I understand it, accessing the other registers means 32-bit instructions rather than the short 16-bit instructions. So accessing them has penalties compared to accessing the low registers, but not compared to normal ARM 32-bit instructions.

My knowledge of the details is weak, but AFAIK the only thing you really lose with Thumb2 compared to ARM instruction sets is the conditional execution flags - with ARM, you can use the flags with most instructions, while with Thumb2 you have the if-then-else construction. (You also lose the barrel shifter on some instructions, but that is not going to affect much code.)
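
A small example of the difference (the assembly in the comments is approximate, and register allocation is up to the compiler):

/* ARM (approx.):     CMP r0, r1 ; MOVLT r0, r1
 * Thumb-2 (approx.): CMP r0, r1 ; IT LT ; MOVLT r0, r1
 * Same result either way; Thumb-2 spends one extra 16-bit IT
 * instruction instead of predicating the MOV directly. */
int max_of(int a, int b)
{
    return (a > b) ? a : b;
}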

With the original Thumb, ARM kept the normal 32-bit ARM ISA as well because for some types of code it could be significantly faster. But with Thumb2, there is almost no code for which the full 32-bit ARM instructions would beat the Thumb2, taking into account the memory bandwidth benefits of Thumb2.

Reply to
David Brown

STM32F429/439 has a TFT controller too, as do many of the NXP parts.

AFAICT that one is also the only available part that can cache the SDRAM. Useful for more bloated systems (Qt or uClinux, say). All the others seem to be ~5x slower executing from SDRAM than from their internal flash.

The upcoming cortex M7 should be good in this respect.

--

John Devereux
Reply to
John Devereux

The high-speed USB interface of the PIC32MZ family is (AFAIK) fully functional, and doesn't have particularly significant errata. What happened was that when the chip was first released, Microchip completely botched the validation, and in the first couple of revisions of the errata sheet the peripheral was marked as not functional. The chip has significant other problems (esp. the ADC has been completely neutered) and I wouldn't particularly recommend it for anyone. The only thing it has going for it is that you can get 2MB flash and 512K RAM in a 64-pin package. Word is that Microchip are working on a redesign (PIC32MZA) due to be released some time this year.

For reference, here's the current PIC32MZ errata sheet:

My personal opinion of the PIC32 chips is that they feel clumsy, outdated and bogged down by historical MIPS baggage. The MIPS M4K core used in the PIC32MX series looks like a response to the ARM7TDMI, and the M14K core used in the PIC32MZ has some features bolted on as a reaction to the Cortex-M3. ImgTec only recently released a microcontroller core with an optional floating-point unit, no doubt a reaction to the Cortex-M4F.

-a

Reply to
Anders.Montonen

That's missing the point completely.

Reply to
Vladimir Ivanov

Yes, longer code sequences, and most likely very limited instruction forms. The latter leads to shuffling of data between the regular 8 GPRs and the other, "irregular" GPRs.

I am not a Thumb2 expert, either. As a very strong personal (biased) opinion, I don't find it elegant at all. MIPS16e impressed me a bit more with its EXTEND instruction.

What I am trying to communicate is that the CPU core with all the blocks is there. Thumb2 is more or less a decoder, just like the ARM mode is. Same with MIPS32 and MIPS16e. Why would one cripple something by removing one of the decoders? The power savings are negligible.

ARM7TDMI was more balanced in that regard.

Any pointers to data showing this? Never heard of it so far, and does not reflect my experience.

Why'd they include ARM mode at all in the Cortex-A series? :-)

Reply to
Vladimir Ivanov

I have used both; the Cortex M3/M4 is just much nicer to program. The code is compact, and faster clock-for-clock than even 32-bit ARM7 code. No more convoluted assembly language wrappers everywhere, no "thumb interworking", "GLUE7" segments, no half a dozen system modes+stacks to worry about.

--

John Devereux
Reply to
John Devereux

The tagline for Thumb-2 is the performance of ARM with the code size of Thumb. It would be interesting to see a comprehensive benchmark comparing the two. The best I've found so far is an ARM presentation with some numbers for the EEMBC benchmarks[1], which shows Thumb-2 having 98% of the performance.

-a

[1]
Reply to
Anders.Montonen

This applies partially to old Thumb. Thumb2 is still shorter than 32-bit ARM code for the same task. Using r8-r15 costs two extra bytes in most instructions, but even these more expensive forms are only as long as regular 32-bit ARM code.

--

-Tauno Voipio
Reply to
Tauno Voipio

microMIPS has instructions for pushing and popping the callee-saved registers onto the stack (LWM32/LWM16/SWM32/SWM16). This is notable because MIPS has traditionally avoided baking an ABI into the architecture.

MIPS16e is much closer to Thumb. You only have a subset of the registers available, and no system control instructions. microMIPS is comparable to Thumb-2, and the idea is the same: shrink the code size while retaining performance. MIPS32 support is optional for cores that support microMIPS. In fact, the latest version of Microchip's XC32 compiler includes support for an unreleased PIC32MM family which only supports microMIPS.

-a

Reply to
Anders.Montonen

You have yet to prove this point. Context switching is not a valid example, as I explained in my previous post, which you must have read before replying (for those who have not: context switching accounts for a fraction of a percent of CPU time, so halving or even completely eliminating it brings a fraction of a percent of improvement, i.e. it is negligible).

I would certainly be interested in the example you claim to be able to produce demonstrating how 16 registers can be more effective. My example covers a typical, widespread application - that of a FIR filter. Let us see yours.
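
For those following along, the kernel in question looks roughly like this in plain C (a generic sketch, not the actual VPA implementation; the point is the register pressure in the inner loop):

#include <stdint.h>

/* Generic FIR kernel.  With 32 registers, a compiler (or a hand coder)
 * can keep coefficients, pointers and partial sums in registers across
 * an unrolled inner loop; with 16, some of that state spills to the
 * stack, adding load/store traffic to every output sample. */
int32_t fir(const int16_t *samples, const int16_t *coeffs, int taps)
{
    int32_t acc = 0;
    for (int i = 0; i < taps; i++)
        acc += (int32_t)samples[i] * coeffs[i];
    return acc;
}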

I'd be interested to see those sub-32-bit opcode schemes you talk about on power; I have yet to encounter one of them. Certainly none of them is present on the cores I use or have investigated. (I am just curious, not in need of something like that.)

Can you please elaborate on that? What can ARM do using the barrel shifter which you cannot do using the likes of rlwinm, rlwnm or rlwimi on power?
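
For those not familiar with the mnemonics, they are power's rotate-and-mask instructions; in C, the kind of operation in question looks roughly like this (the function is just an illustration):

/* Extract a 6-bit field starting 5 bits up from the LSB.  PowerPC can
 * do this with a single rlwinm (rotate left word immediate then AND
 * with mask); 32-bit ARM would typically use a shift-plus-AND pair,
 * or UBFX where available. */
unsigned extract_field(unsigned x)
{
    return (x >> 5) & 0x3Fu;
}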

This kind of general talk leads nowhere here, as you may have found out in previous discussions. If you cannot support a claim with a valid particular example, you are saying nothing.

I explained that 16 registers are too few for a load/store machine and gave an example to make it easier to understand why (the need to compensate for the pipeline delay).

You are just repeating your opinion basing it on nothing.

The bit numbering is simply big endian. I have had my trouble with it too, of course, but it is easy to get used to. In VPA I use both - crazy as it may seem, once you get used to thinking in it, it is no longer an issue. But overall, having big-endian bit numbering on a big-endian machine is the correct thing to do. If one chooses to use a power core in little-endian mode much of the time, one will be somewhat screwed, I suppose.
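
For those who have not met it: in the PowerPC books bit 0 is the most significant bit of the register, so the mapping to LSB-based numbering is just 31 - n. The macro below is a hypothetical C helper, not anything from VPA:

/* PowerPC-style bit numbering for a 32-bit register: bit 0 is the MSB,
 * bit 31 the LSB, so "bit n" is the conventional bit (31 - n). */
#define PPC_BIT32(n)  (1u << (31 - (n)))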

Oh I agree power does not make sense on the smallest of MCUs, of course. In fact I don't think it makes much sense below a megabyte of RAM or so (but then that's me and is based on my needs so far, I am not claiming this to be some general rule).

Well, I really cannot simplify the concept of saving, say, 4 out of 32 registers, using only those in an IRQ handler, then restoring only them and returning from the exception.

So show us one such case.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI
------------------------------------------------------

Reply to
Dimiter_Popoff

IBM has a scheme called CodePack, where compressed code is unpacked into L1 cache on misses. It's an interesting (and very different) approach, but it loses out on the improved cache utilization of schemes like Thumb-2 and microMIPS.

-a

Reply to
Anders.Montonen

Thanks, I had never seen that. Looks very different indeed - and interesting but probably impractical.

The comparisons are in practice comparisons of the efficiency of the respective compilers and compiler libraries rather than of the machines.

The fact that a 16k instruction cache can be too small for some code demonstrates only how poor the code is - which is the norm nowadays, of course (and the reason why they try to compensate for messy programming by building hardware on top of it, which can only work short-term, but here we are).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI
------------------------------------------------------

Reply to
Dimiter_Popoff

In the general case, you have to push and pop all the registers every time you take an interrupt.

In your world you may not always have to do that, but for the general-purpose case, when you don't have absolute control of the code being called from the handler, you do.

Even when you control the code being called from the handler, you still have to push all the registers the code could potentially use if it's written in a high level language.

Or to put this another way, your usage model when it comes to interrupt handlers is not the general usage model that most other people have to work with. :-)

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Reply to
Simon Clubley

!?

In the general case you do not, if you are the programmer.

Of course you can program any machine to a complete halt. Or just use a hammer to smash it; that will perhaps be an easier way.

Well, if programming has deteriorated to *such* a degree, I really do not have many people to converse with about programming - this much is obvious :-).

But this does not change the validity of the concept "save/restore only what you have to" when applied in the core to core comparison context.

My God, I really did not think things had gone *that* bad.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI
------------------------------------------------------

Reply to
Dimiter_Popoff

Some processors have multiple register sets that might avoid that. SPARC has register windows, such that they don't have to save to memory until all the windows are in use. I don't know if that is for interrupts, too.

-- glen

Reply to
glen herrmannsfeldt
