Integrated TFT controller in PIC MCUs

pozz · 2015-01-07T11:28:57+00:00

Microchip PIC32 MCUs can be used for graphics applications (as I read on their website), but I couldn't work out whether they have a real integrated TFT controller (as in the LPC MCUs by NXP) or something different. I would prefer NXP's solution over Microchip's for several reasons:

- ubiquitous ARM core (instead of MIPS)
- true integrated TFT controller
- lower cost (I'm not sure about this)

Both solutions offer a free-to-use graphics library: NXP delivers SEGGER (precompiled), Microchip a proprietary library (with source code). What do you think? The project will use a typical 4.3" 480x272 TFT display with RGB interface, and I'm interested in developing a good HMI.
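For a rough feel of what a 480x272 panel asks of the controller and the memory system, here is a minimal sketch of the arithmetic. Only the resolution comes from the post; the 16 bits per pixel (RGB565) and the 60 Hz refresh rate are assumptions, and blanking intervals are ignored.

```c
#include <stdint.h>

/* 480x272 is the panel from the post; 2 bytes per pixel (RGB565)
   and a 60 Hz refresh are assumptions made for this estimate. */
enum { LCD_WIDTH = 480, LCD_HEIGHT = 272, BYTES_PER_PIXEL = 2, REFRESH_HZ = 60 };

/* Bytes needed to hold one full frame in memory. */
static uint32_t framebuffer_bytes(void)
{
    return (uint32_t)LCD_WIDTH * LCD_HEIGHT * BYTES_PER_PIXEL;
}

/* Bytes per second that must be fetched to keep the panel refreshed
   (active pixels only, blanking ignored). */
static uint32_t refresh_bandwidth(void)
{
    return framebuffer_bytes() * REFRESH_HZ;
}
```

Under these assumptions one frame takes 261,120 bytes, which is more than the internal RAM of many small MCUs, so the location of the frame buffer (internal RAM, external RAM, or inside the display module) matters as much as the controller itself.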

Simon Clubley 11 years ago

In that case, is there any information yet about whether the PIC32MM will still have PDIP variants and what the expected capabilities are for the PIC32MM ?

(Simon may have just found himself a new toy to look forward to. :-))

Thanks,

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Simon Clubley 11 years ago

^^^

[snip]

Re-reading the above shows that this example is 68k, not the PowerPC architecture we were talking about. However, the same comments still apply.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Dimiter_Popoff 11 years ago

I suppose so. That explains much of why I am that much more efficient than the rest of the world, too, I suppose.

Writing an IRQ handler in C is outright poor programming but OK, let us assume someone is just not qualified to use assembly to do it properly (though I can hardly see how someone qualified to write a whole project in C will find it difficult to write a few - or a few tens of - lines in assembly).

This just means poor compiler work. Since the compiler knows which registers it will use, there is no problem communicating that up and down the line so that only the registers actually used are saved. If all compilers are that stupid, well, their users get what they deserve.

OK, I have yet to see that really work for someone without major rework but let us assume the cliche is correct.

Not at all, VPA stands for "virtual processor assembly". The sources can be compiled for any architecture (using a compiler for it, obviously). I have used two so far - 68k and power (much of the DPS code prior to 2000 or so is written for 68k (CPU32) and compiled for power nowadays). I also did something close to it for a TI DSP, the 5420, but it was not completely VPA, would have been impractical, many of the 54xx registers are too specific and are used all the time - yet bears a huge resemblance.

You simply don't know what you are talking about here.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

Simon Clubley 11 years ago

We are obviously not going to agree on many things, so just an observation:

Have a look at the Linux source code and see how much common code there is between various devices (such as USB host controllers) which are used on a _wide_ range of architectures.

The Linux kernel is very nicely modular and reusable in that regard.

Now imagine having to write assembly language modules for each architecture those drivers are going to be used on instead of just writing it once in a higher level language and letting the toolchain generate the architecture specific code for each target for those drivers.

It sounds like your VPA is some kind of assembly language templating infrastructure.

What does the syntax look like and do you have a language reference manual which can be downloaded ?

What level of abstraction is possible ?

For example, can you implement high level language control structures or is it just some pseudo assembly language ?

Can you implement abstract data types in VPA ?

How would something like the following (untested!) look in VPA ?

=========================================================================
struct sample_t {
    unsigned long int var1;
    unsigned char char1;
};

struct sample_t search;

#define SAMPLE_SIZE 100
struct sample_t samples[SAMPLE_SIZE];

int main(int argc, char *argv[])
{
    unsigned long int i;

    {code to populate samples and search would be placed here}

    for(i=0; i

David Brown 11 years ago

I think there is a serious misunderstanding going on here.

A compiler with good support for interrupts on the given target will /not/ save or restore more registers than it has to, when it knows all about the code in use. When it does not know everything about the code, it must save and restore registers according to assumptions about their usage - possibly saving and restoring /everything/, if it cannot make reasonable assumptions. And this save and restore /will/ take more time and space if you have more registers - but the relevance and significance of that will depend on the circumstances.

/Exactly the same applies in C and assembly./

If you are writing an interrupt function that calls other code, you will have to assume that it may change any "volatile" registers - and thus you will have to save and restore them around the function call. That applies in C and assembly - but in assembly you might have your own non-standard ABI that affects which registers are "volatile". In assembly, you may also know exactly which registers the called function uses, and use that in optimisation. In C, the compiler may also have this knowledge (if the definition of the called function is in the same module, or you are using link-time optimisation), and it can take advantage of it.

If you are writing an interrupt function that is self-contained, then either an assembly programmer or a C compiler will only save and restore registers that are needed by the function.

It is perfectly reasonable to expect a C compiler to generate interrupt code of the type posted by Dimiter. Equally, it is perfectly reasonable to expect an assembly programmer to write interrupt code that stacks a range of registers if it has to call arbitrary external code.
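The two situations described above can be sketched in plain C. This is only an illustration: the functions below are ordinary C functions, not real interrupt handlers (those need a compiler-specific attribute or pragma), and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical tick counter updated by a timer interrupt. */
static volatile uint32_t tick_count;

/* Self-contained handler: the compiler can see every register the body
   touches, so an interrupt prologue only has to preserve those few. */
void timer_isr_self_contained(void)
{
    tick_count++;
}

/* Hook installed elsewhere; the compiler cannot see what it will do. */
static void (*timer_hook)(void);

void timer_set_hook(void (*hook)(void)) { timer_hook = hook; }

/* Handler that calls unknown code: every caller-saved ("volatile")
   register has to be assumed clobbered across the call. */
void timer_isr_with_hook(void)
{
    tick_count++;
    if (timer_hook)
        timer_hook();
}

uint32_t timer_ticks(void) { return tick_count; }

/* Example hook, present only so the sketch can be exercised. */
static unsigned hook_calls;
void example_hook(void) { hook_calls++; }
```

With link-time optimisation, or with the hook defined in the same module and called directly, a compiler may be able to treat the second handler like the first.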

Anders.Montonen 11 years ago

I haven't seen any info that would reveal the packaging, but going by the compiler header files, the smallest chip (PIC32MM0016GPL020) has only 16 GPIOs, up to 28 in the largest (PIC32MM0064GPL036). Flash size ranges from 16 to 64K, RAM from 4 to 8K. Contrary to what I wrote above, in other support files two register sets are mentioned. No hints on performance, but there doesn't seem to be a prefetch cache so I would guess it will run at similar speeds to the 1xx/2xx.

All the SFR definitions are included in the processor header files in XC32 1.34, if you want to look at what's there.

-a

David Brown 11 years ago

We are having a discussion and exchange of ideas - there is no need to prove anything unless someone feels there must be a "winner" here.

Your numbers apply to your example - different numbers apply to different situations.

The system I had with that rate of timer interrupt had to do quite a lot of work in each interrupt - most registers were saved, because most were used.

I am well aware of the difference between task switching and interrupts. But when you have a larger interrupt function (i.e., not a small one that can use a minimal number of registers), it is perfectly reasonable to call it a context switch - you switch from the current task context to the interrupt context, and then (depending on the type of system and the interrupt function) you switch back to the same task, or a different one.

Perhaps you think that you only need to save lots of registers during a task context switch? Usually an interrupt (without a task switch) does not involve saving /all/ the registers, but it can still involve saving many of them.

Large processors are optimised for steady throughput with few unexpected changes of flow - small processors in microcontrollers are optimised for faster and more consistent timings on changes. There is a reason why some processors are made with long pipelines and multi-layer caches, while others are made with very short pipelines and no caches - or why there are devices made that combine a Cortex M3/M4 core alongside one or more Cortex A cores.

I have written many hundreds of systems - I have no doubt you have also written vast numbers, as have others here. I have used something like a dozen different processor architectures in embedded systems. There is no way to give details of everything - so some generalisations are unavoidable. But it is certainly the case that details vary wildly between systems - thus particular examples can be interesting, but do not necessarily show common behaviour.

(I've snipped the rest - not because it was not relevant or interesting, but because I don't have the time to give a full response, and I think we are going around in circles. I also feel this discussion looks like we are arguing opposites, when in fact I agree with a fair number of your points - I am almost certainly expressing myself rather poorly, and don't want to continue doing so.)

Dimiter_Popoff 11 years ago

Thanks for clarifying that, David. Seemed like the obvious thing to expect to me but being told the opposite I had begun to question my state - am I in some dream or what.... :-)

Dimiter

Simon Clubley 11 years ago

You need to read the rest of David's comment as well. David's talking above about the specific case when a compiler knows what registers it uses _and_ can generate _all_ the interrupt handling code as well.

I've been talking about the general case when you don't have that knowledge or your initial IRQ interrupt handling code is part of a general framework.

For example, you could have a hand-written assembly language IRQ infrastructure which does things like priority nesting and then dispatches to a device specific C language handler, the address of which has been entered into a generic interrupt dispatch table during startup.

There's no way that generic assembly language wrapper is going to know what registers to save so it saves all the ones the C language handler could potentially use and which can't be otherwise preserved by (say) switching to some other execution mode to execute the C language handler.
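A minimal sketch of the C side of such a framework follows. The table size and all names are hypothetical, and the assembly wrapper itself is not shown: `irq_dispatch` stands in for the point where the wrapper, having saved its registers, calls into C.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_IRQS 8   /* hypothetical number of interrupt sources */

typedef void (*irq_handler_t)(void);

static irq_handler_t irq_table[NUM_IRQS];
static unsigned spurious_count;

/* Called by each driver during startup to enter its handler. */
void irq_register(unsigned irq, irq_handler_t handler)
{
    assert(irq < NUM_IRQS);
    irq_table[irq] = handler;
}

/* Reached from the assembly wrapper. The wrapper cannot know which
   handler is in the table, so it has already saved every register
   a C function is allowed to clobber. */
void irq_dispatch(unsigned irq)
{
    if (irq < NUM_IRQS && irq_table[irq] != NULL)
        irq_table[irq]();
    else
        spurious_count++;
}

/* Example device handler, present only so the sketch can be exercised. */
static unsigned uart_events;
static void uart_handler(void) { uart_events++; }
```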

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Simon Clubley 11 years ago

Thanks.

I was hoping they were going to go in the other direction for the PIC32MM PDIP packages with more onboard memory resources and hence continue the trend they have started with the recent PIC32MX PDIP packages.

OTOH, if the above MCUs are priced at the (say) Cortex-M0/8-bit MCU price points, along with (say) a PDIP 20 pin package, then I can see myself using them in situations I wouldn't normally consider using the PIC32MX for.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Dimiter_Popoff 11 years ago

There is no such general case. Not calling unknown code from an interrupt handling routine is basic programming literacy, whoever does it produces just garbage.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

David Brown 11 years ago

I think this is another case of looking at different situations - and of you, Dimiter, generalising from the way /you/ structure your systems. There are also several related but distinct meanings of "unknown" here, which should be distinguished.

Some embedded programs are entirely self-contained, and written, documented and maintained by one person (or at least a very small, tight-knit team). In such cases, you know everything about the system - you know which interrupts can happen at different times, you know where their priorities lie, you know exactly what tasks you have, and you know /all/ the details of them. Perhaps you know that only one particular task uses floating point, and therefore you can avoid any stacking or restoring of floating point registers on interrupts or task switches.

Other systems are written in separate pieces, perhaps by different people or at different times. To be flexible, you make far fewer assumptions about what is running and what code you are using. You might write your UART receive interrupt routine to grab the next character and pass it on to a handler - without knowing if the handler just implements a buffer or if it handles message decoding or leads to task switches. You always preserve the volatile floating point registers, because you don't know whether tasks or interrupt functions use them or not.
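The UART case might look something like this sketch. All names are hypothetical, and the hardware receive register is replaced by a function argument so that the code can run anywhere. The interrupt routine only knows that it passes the character on; the ring buffer is just one possible consumer.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BUF_SIZE 16u   /* power of two, so indices can be masked */

typedef void (*rx_sink_t)(uint8_t ch);

static rx_sink_t rx_sink;   /* whoever wants the received characters */

void uart_set_sink(rx_sink_t sink) { rx_sink = sink; }

/* Body of the receive interrupt; 'data_reg' stands in for reading
   the hardware receive register. */
void uart_rx_isr(uint8_t data_reg)
{
    if (rx_sink)
        rx_sink(data_reg);
}

/* One possible sink: a simple ring buffer. */
static uint8_t rx_buf[RX_BUF_SIZE];
static volatile unsigned rx_head, rx_tail;

void ring_put(uint8_t ch)
{
    unsigned next = (rx_head + 1u) & (RX_BUF_SIZE - 1u);
    if (next != rx_tail) {        /* drop the byte if the buffer is full */
        rx_buf[rx_head] = ch;
        rx_head = next;
    }
}

bool ring_get(uint8_t *out)
{
    if (rx_tail == rx_head)
        return false;             /* buffer is empty */
    *out = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1u) & (RX_BUF_SIZE - 1u);
    return true;
}
```

Swapping `ring_put` for a message decoder changes nothing in `uart_rx_isr`, which is exactly why the routine cannot assume much about the registers its callee will use.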

Yes, Dimiter, such systems are used - they are used a great deal, and they are neither garbage nor programming illiteracy. They give a different balance between maximising program speed and efficiency, and maximising re-use, testability and modularity in development.

Regarding "unknown", I said there were many meanings. At the highest level, the programmer (or at least some programmers in the team) should know what the system is doing, and how things fit together - even though the person writing the interrupt wrapper may not know what the interrupt handler does.

In order to minimise the register stacking on an interrupt, you need to know what registers are "destroyed" by the code run on the interrupt. If you are writing in assembly, and have clear cooperation between the interrupt wrapper and the interrupt handler (perhaps they are integrated as one function), you know exactly which registers are destroyed. If you are writing in C, it is important that the /compiler/ has this knowledge - either it assumes that all "volatile" registers may be destroyed, or it can see the full source code (by link-time optimisation, or by having the wrapper and handler code in the same file) and use that to minimise the register stores. In extreme cases, handler code may be written by other people or with different tools, giving you very little knowledge of what is safe - regardless of the programming language.

Hopefully that makes it a little clearer that it can be true that you should not call "unknown code", while it is also true that the code being called might be "unknown"!

Dimiter_Popoff 11 years ago

Believe me I am not doing this. I have a broad enough view of what is done, how it is done and why. After all I have done myself most of what the others - MS, linux - do.

Of course there can be. What I said was obviously tied to the context (the nonsense we, amazingly, keep on talking about: the "necessity" to save/restore all registers when there is no need to do so).

If you need a mechanism where the interrupts going through a certain vector can be branched somehow and passed to code not yet written at the moment of writing the exception handler, nothing is stopping you from doing it the way I have done it in DPS: the common IRQ code (where the processor goes straight from the exception vector) saves a few registers, say 4, branches through a lookup table and lands on the respective code, some of which may be written years later. The called code just saves all registers *it* will use, and perhaps does not save those it knows are already saved by the caller (I think there were 4 of them, maybe 5). When a new IRQ handler is introduced to the system it just sets its entry in the lookup table (which by default goes to the "spurious interrupt" handler).
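The bookkeeping behind this scheme can be modelled with register bitmasks. This is a sketch of the accounting only, not of DPS or VPA: the choice of r0-r3 as the four commonly saved registers and all names are assumptions.

```c
#include <stdint.h>

/* One bit per general-purpose register, r0 = bit 0. */
typedef uint32_t regmask_t;

/* Registers the common IRQ entry code always saves: four of them, as in
   the description. Picking r0-r3 is an assumption of this sketch. */
#define COMMON_SAVED ((regmask_t)0x0000000Fu)

/* Count the registers named in a mask. */
static unsigned count_regs(regmask_t m)
{
    unsigned n = 0;
    while (m) {
        m &= m - 1u;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Registers a handler has to save itself: those it uses that the
   common entry code has not already saved. */
regmask_t handler_must_save(regmask_t used)
{
    return used & ~COMMON_SAVED;
}

/* Total registers saved for one interrupt under this scheme. */
unsigned total_saved(regmask_t used)
{
    return count_regs(COMMON_SAVED) + count_regs(handler_must_save(used));
}
```

In this model a handler using r0-r5 costs six saves in total, whether the core has 16 or 32 registers.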

Where does the need to save *all* registers by the caller come from? Doing so is outright stupid and demonstrates lack of basic programming literacy, if this is how they have done it in linux then well, they have either been sloppy or have to educate themselves and give it another try. We are not talking complex issues here.

Under the score I want to reiterate that all the lengthy comments about what is saved where and how *DO NOT* demonstrate any necessity whatsoever to save all registers in an interrupt handler, quite the opposite. So a core with 16 registers has NO advantage in terms of IRQ latency compared to a core with 32 registers, not in any case of proper use - which is what we were discussing.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

Simon Clubley 11 years ago

I think this is what you are reacting to from me:

|Once you do that, the IRQ wrapper needs to save all the registers the |C compiler could potentially use, including all the temporary registers, |before the wrapper calls the device specific handler.

I'm not really seeing how I could have rewritten that to be much better than it is above, other than maybe to add a qualifier that says "all registers, apart from any callee saves registers as defined by the ABI", but I assume that's taken for granted and not required by anyone reading this here who will have read multiple ABI specifications.

This includes potentially having to save the floating point registers if you have any floating point code at all in your system and can't be sure your driver code won't use it.

I called out the temporary registers because they are caller saves in the ABIs I am familiar with (such as ARM EABI) and in the general case you have no way of knowing in your wrapper if a routine just uses (say) R0 or uses R0 through R3 so you have to save the lot.

I think David is right and this is people seeing different things in light of their own experiences. :-)

Well all registers apart from callee saves as defined by the ABI. It's because you generally have to conform to a published ABI.

You were talking earlier about the use of more registers. Here's one of the many ABIs for MIPS:

formatting link

Look at the number of caller saves versus callee saves registers.

If you want a _real_ heart attack :-), there's a more tangible example in the section of the PIC32MX Family Reference Manual that deals with interrupts (chapter 8). Look at the example prologue/epilogue code provided by Microchip for the single vector interrupt mode in examples 8-6 and 8-7 (at least in my local copy of the FRM dated from 2-3 years ago).

I'm not sure if I am allowed to post it here, so I won't.

It's done for interoperability purposes between code located in multiple modules written by different people.

BTW, the only mention of Linux I have made is to comment on its modular design approach (even though it is still a monolithic kernel underneath. :-))

To followup from an earlier question, do you have a LRM for VPA available for download somewhere ? It would be interesting to read it.

Thanks,

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Anders.Montonen 11 years ago

Going back to the Cortex-M, one of the (many) beautiful things about it is that it automatically saves all caller-save registers on the stack on interrupt entry. It even supports lazy stacking of floating-point registers.

This is the ISR wrapper I wrote for an RTOS I ported to the PIC32MX: You can compare it to the shadow register version directly below, and compare it to the Cortex-M, which does all the same, in hardware, in 12 clock cycles.
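For reference, the frame that a Cortex-M3/M4 core pushes on exception entry (without floating-point context) can be written down as a C struct, lowest address first. The type and function names below are hypothetical; the register list follows the ARMv7-M exception entry behaviour.

```c
#include <stdint.h>

/* Registers stacked by hardware on exception entry, in memory order
   starting at the stack pointer seen by the handler. These are the
   caller-saved registers of the ARM procedure call standard plus the
   return state. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;     /* link register of the interrupted code */
    uint32_t pc;     /* address to return to */
    uint32_t xpsr;   /* program status register */
} cortexm_basic_frame_t;

/* Given a pointer to the stacked frame, report where the core was
   interrupted (useful in fault handlers, for example). */
uint32_t interrupted_pc(const cortexm_basic_frame_t *frame)
{
    return frame->pc;
}
```

Because the hardware has already saved everything a C function may clobber, an ordinary C function can serve directly as the handler.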

-a

Dimiter_Popoff 11 years ago

So eventually - a few tons of ink later - you also accept that it is not necessary to save all 32 registers of a 32 register core in an IRQ handler, thus there is no advantage whatsoever in having only 16 registers - which was the whole point of the discussion. Well, better late than never :-). Why 32 registers are a must on a load/store machine with a reasonably deep pipeline I already explained; thus my point that ARM with its 15 GPR-s is a crippled load/store architecture stays valid.

I never wanted to go into deeper detail on what this or that particular core does right or wrong, the whole point was the basic 32 vs. 16 (15 on ARM really) GPR-s.

No, there is no manual available for VPA at the moment, as there are no machines on the market running DPS other than our spectrometry devices. Once I decide to make DPS, VPA and the whole thing separately marketable to compete with MS, linux and the like, I will announce it loudly enough, I suppose.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

Dimiter_Popoff 11 years ago

Back to 6800 (and perhaps 8080, not sure), eh? :-) Saving by hardware more than is absolutely necessary is a disadvantage, the first core of those I used to drop that was the 6809 (on its FIRQ vector, the rest still used to save everything).

I am not sure how they have implemented this but it can be double-edged. Perhaps not if there is a separate stack for FPU save/restore only with entries tagged as valid when registers have been saved on it so they can be just skipped when restoring (and maintain the SP). But if this is tied to the interrupt stack where also GPR-s are saved it can quickly become a mess I suppose (just musing, never explored in depth the idea really).

Here is how I deal with the FP registers in DPS (no hardware tricks). Tasks have a bit in their descriptors, "FPU on/off" (on power it really duplicates the respective bit in the MSR). When a task switch is initiated, if this bit is 0 the FPU registers are not saved/restored at all; if it is 1, they are all saved/restored (32 of them, 64 bits each). Since many applications (tasks) which use the FPU do so only occasionally and in short bursts, the FPU on/off switch is left accessible to them. On entry, a function/subroutine that will use the FPU calls "fpuon$" (which returns the state of the on/off bit prior to the call), does its job (including save/restore of the FPU registers which it uses) and upon return restores the fpu on/off bit to its former state (by either calling "fpuoff$" or not). Actually I can think of only one app which does not make use of this and uses the FPU all of its lifetime - it really does use it all the time though, and can take up well over half of the CPU time alone.
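The calling pattern described here can be modelled in C as follows. This is not DPS code: the task descriptor, the function names and the single global "current task" are all assumptions made to show the save-previous-state, restore-on-exit idea.

```c
#include <stdbool.h>

/* Hypothetical task descriptor; only the FPU flag matters here. */
typedef struct {
    bool fpu_enabled;   /* when false, a task switch skips the FPU registers */
} task_t;

static task_t current_task;

/* Modelled on the described "fpuon$": switch the flag on and
   return the state it had before the call. */
bool fpu_on(void)
{
    bool prev = current_task.fpu_enabled;
    current_task.fpu_enabled = true;
    return prev;
}

/* Put the flag back to the state returned by fpu_on(). */
void fpu_restore(bool prev)
{
    current_task.fpu_enabled = prev;
}

/* What a scheduler would consult when deciding whether to
   save and restore the FPU registers on a task switch. */
bool switch_saves_fpu(void)
{
    return current_task.fpu_enabled;
}

/* A routine that needs the FPU only for a short burst. */
double scaled(double x)
{
    bool prev = fpu_on();
    double y = x * 0.5;   /* the floating-point work */
    fpu_restore(prev);
    return y;
}
```

Returning the previous state is what makes the calls nest safely: an inner routine never switches the FPU off underneath an outer one that still needs it.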

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

Anders.Montonen 11 years ago

The CPU reserves space in the stack frame for the caller-saved FPU registers, but does not push them unless FPU instructions are used.
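As a sketch of what that means for the stack, the extended frame on a Cortex-M4F can be laid out as a struct, with a deliberately simplified model of when the floating-point state is actually written. The names are hypothetical and the model ignores details such as stack alignment padding.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Extended exception frame: the basic frame followed by space for the
   caller-saved FPU registers S0-S15, FPSCR and one reserved word. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;   /* always pushed */
    uint32_t s[16];                               /* S0-S15 */
    uint32_t fpscr;
    uint32_t reserved;
} cortexm_ext_frame_t;

/* Stack space taken on exception entry. */
size_t frame_bytes(bool fp_context_active)
{
    return fp_context_active ? sizeof(cortexm_ext_frame_t)
                             : 8u * sizeof(uint32_t);
}

/* Simplified model: is the floating-point state actually written to
   the reserved space? With lazy stacking, only if the handler goes on
   to execute a floating-point instruction. */
bool fp_state_pushed(bool fp_context_active, bool lazy, bool handler_uses_fpu)
{
    if (!fp_context_active)
        return false;
    return lazy ? handler_uses_fpu : true;
}
```

So the space cost (104 bytes instead of 32) is paid whenever the interrupted code had floating-point context, while the time cost of writing the FPU registers is paid only when the handler uses the FPU.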

-a

Dimiter_Popoff 11 years ago

Is this for all FPU registers? I.e. if you use this facility (probably you can switch it off?), once you touch a single FPU register the "push all FPU registers" begins?

Dimiter

David Brown 11 years ago

If I can summarise the arguments here, everyone accepts that you don't have to save more registers than you need, and (barring unusual cases) you only have to save /all/ registers during a task context switch. But it is common to have to save all "volatile" registers, of which there are more in PPC and MIPS than ARM - when you have more registers in the cpu, you /will/ do more unnecessary register saves and restores. Opinions differ wildly on the significance or importance of this.

For interrupts, function calls and context switches, it seems clear that 32 registers involve more saves and restores than 16 registers, but there is no convincing argument that this difference is relevant. This round is a tie in the great 16-reg vs. 32-reg battle, and we should move on to more interesting points.

This is, I think, the more interesting point, which I do not believe has been covered properly. It is clear that for any given function, having more registers does not give slower code than having fewer registers, all other things being equal. But will more registers give /significantly/ faster code? If so, under what circumstances is that the case? And how does it compare to using the same hardware space and/or opcode instruction space for other features?

When you have a deep pipeline and superscalar execution (which is not the case for most microcontroller cpus), you have to have a lot of data passing through the core to make full use of it, and lots of data "in flight" at a time. And since data has to pass through registers, that means lots of registers. But does that mean needing lots of /visible/ registers in the ISA? Deep pipeline superscalar processors invariably have register renaming, which removes the need for many visible registers.

Without register renaming, you need to "manually" (i.e., either the assembly programmer or the compiler, rather than the cpu itself) assign registers in order to schedule and interleave reading new data in, doing calculations, and writing out the results to maximise the throughput - your aim is to avoid the key execution units having to wait for incoming data. But with register renaming, you can use the same register names all the way - the cpu handles the renaming and scheduling. The result is that the code is smaller, simpler, clearer, and more efficient for caching (especially if the cpu has a super-fast cache for small loops).

So if you have a PPC core such as the e200z7, with a 10-stage pipeline and dual issue execution unit, but no register renaming, you need more than 16 named registers to keep the execution units busy in hard calculations. But on a small ARM (Cortex-M3/M4) with a single-issue cpu and a three stage pipeline, 16 registers is sufficient. And on a large ARM (Cortex-A) with a multiple issue, deep pipeline core, 16 /named/ registers is /still/ sufficient because there are a large number of /unnamed/ registers for remapping.
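The "manual" scheduling mentioned above shows up even at the C level. In this sketch (names hypothetical) the second function splits one dependency chain into four independent ones, which is the kind of code that wants more named registers on a core without renaming. Whether it runs faster depends entirely on the core and the compiler, and a good compiler may perform this transformation by itself.

```c
#include <stddef.h>

/* One accumulator: every multiply-accumulate waits for the previous one. */
long long dot_single(const int *a, const int *b, size_t n)
{
    long long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long long)a[i] * b[i];
    return acc;
}

/* Four independent accumulators: up to four multiply-accumulates can be
   in flight at once, at the price of more live registers. */
long long dot_quad(const int *a, const int *b, size_t n)
{
    long long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += (long long)a[i]     * b[i];
        acc1 += (long long)a[i + 1] * b[i + 1];
        acc2 += (long long)a[i + 2] * b[i + 2];
        acc3 += (long long)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)              /* leftover elements */
        acc0 += (long long)a[i] * b[i];

    return acc0 + acc1 + acc2 + acc3;
}
```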

Thus I think you are a very long way from being justified in claiming that the 16 registers in ARM make the architecture "crippled". There is certainly code for which 32 registers works better than 16 even when you have renaming, especially on larger processors, because you want to refer to more data at a time without having to reference the stack or other memory data. But that doesn't make the 16-register ARM "crippled".
