Low Dhrystone score on Tricore TC1796

Hello,

I am running the Dhrystone benchmark on a TC1796A microcontroller from Infineon with different compilers (TASKING, Hightec-RT GNU GCC). Both compilers are set to produce the most speed-optimized code, yet I only get about 15 DMIPS with GNU GCC and about 20 DMIPS with TASKING. The CPU runs at 150 MHz with a 75 MHz system clock, and the programs run from external SRAM; there is no difference (as it should be) with or without the Lauterbach debugger connected. TASKING probably performs better because of its optimized libc. Still, the difference is big, and both results seem very low in DMIPS per MHz (about 0.1) compared to other microcontrollers, e.g. Freescale's MPC555, where DMIPS per MHz is at least over 1.

In fact, my readings are so low that I think I have either badly misconfigured something or my board design is broken.

Has anyone done similar tests or has any comments?

Thanks, Bernhard

Reply to
bfroemel
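For reference, the DMIPS-per-MHz arithmetic used in the post above can be sketched in a few lines of C. This is only an illustration: the function and macro names are made up for this example, and the 1757 Dhrystones/s divisor is the commonly used VAX 11/780 reference figure, not something taken from the benchmark setup described here.

```c
#include <assert.h>

/* Dhrystones per second of the VAX 11/780 reference machine (= 1 DMIPS).
 * Commonly quoted figure; not taken from the original post. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystone rate (iterations per second) into DMIPS. */
double dmips_from_dhrystones(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Normalize a DMIPS figure by the CPU clock in MHz. */
double dmips_per_mhz(double dmips, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return dmips / cpu_mhz;
}
```

With the figures quoted above, `dmips_per_mhz(15.0, 150.0)` gives 0.1 for GNU GCC and `dmips_per_mhz(20.0, 150.0)` gives roughly 0.13 for TASKING.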

How fast is this external SRAM and how have you set-up your access to it? Does this cpu have caches (I don't know this chip) and are they enabled? If so, does your loop fit in the cache?

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)
Reply to
Stef

It's a CY7C1041BV33, a 12 ns asynchronous 16-bit SRAM. Two chips form 1 Mbyte of memory on the 32-bit bus running at 75 MHz. I played around with wait states and other access parameters, but it seems they were well chosen by the manufacturer (TTTech.com).

There is an instruction cache of 16Kbyte, no data caches.

I checked the loops: they should fit, but the difference with or without the instruction cache is marginal (maybe the impact is greater when running from flash).

Reply to
bfroemel

Can you do a comparison with the code (at least the core loops) running from internal RAM? As you know, there are several distinct banks of internal RAM, having different properties. Some execute faster than others.

Reply to
David R Brooks

Okay, I did that, not expecting much of a difference, and suddenly my DMIPS jump to 45.5 when I place all of .text into on-chip RAM. Stack and heap are still in external SRAM, so there could be room for improvement. Thanks a lot! :) So, is something wrong with my external SRAM configuration, or is this "normal" for this kind of microcontroller? So far I have more experience with ARM, LEON3 and NIOS2, which all had very small on-chip but large off-chip RAM, where SRAM was the fastest memory.

Reply to
bfroemel
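If moving all of .text is not an option, individual hot functions can be steered into on-chip RAM instead. The following is a minimal sketch assuming a GCC-style toolchain: the section name `.text.fast`, the macro `FAST_CODE` and the function `checksum` are hypothetical, and the linker script (not shown) would have to map that section to the scratchpad RAM. If the image boots from flash rather than being loaded by a debugger, the startup code also has to copy the section into RAM.

```c
#include <stdint.h>

/* Hypothetical section name: the linker script must place ".text.fast"
 * in on-chip RAM. The attribute is only applied on GCC-style ELF
 * toolchains; elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#define FAST_CODE __attribute__((section(".text.fast"), noinline))
#else
#define FAST_CODE
#endif

/* Example hot function: simple byte-wise sum over a buffer. */
FAST_CODE uint32_t checksum(const uint8_t *data, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```

The map file produced by the linker shows whether the function really ended up at an on-chip address.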

Don't know the chip; but are you *sure* you have enabled the instruction cache? On e.g. ARM720, it is not easy and not done by default.

--

John Devereux
Reply to
John Devereux

It's *very* normal for the TC1796! Some of those internal banks have paths 8 words (256 bits) wide. See the context-management instructions for an example of this (you have, of course, put the context list in the proper RAM bank: there's just one intended for contexts.) You may get further speed-ups if you relocate code to the internal flash, & turn on code caching (if it isn't on already). Afaik, TC1796 is really intended to run all its code either in the internal flash, or in the so-called "scratchpad" RAM area. The real purpose of RAM-resident code (in this chip) is to enable you to self-reprogram the code flash: like most flash, it is inaccessible while a program/erase cycle is running.

Reply to
David R Brooks

Thanks for clearing this up! The guy who wrote those default linker scripts either had no worries about execution speed or expected, like me, more performance from the external bus. CSA lists must be put into on-chip LDRAM on the TC1796; I didn't try otherwise. I'll certainly heed your advice to use on-chip memories.

By the way, David, through my searches (there are still not many posts about Tricore), I've seen that I missed a request from you about the sources of the GNU Hightec-RT toolchain back in December 2006. Now, they are finally offered on:

formatting link
Back then, I was given only a password-protected FTP URL, which stopped working a few days later.

Greetings, Bernhard

Reply to
bfroemel

Many thanks, Bernhard! I've hunted high & low for those, & not (till now) been able to get them. Thanks again.

Reply to
David R Brooks

Back to the subject of making sure the code cache is on... where are you linking your code to? If you link to 0xA....... (e.g. 0xA2000000 for external SRAM, say) then this isn't cached. If you link to 0x8....... (e.g. 0x82000000) then this accesses it with the cache enabled. Check out page 9-6 of the user manual. There's no need to do this for internal SPRAM (0xd4000000).

Cheers Richard

Reply to
Richard (Hitex)

Arrgh, so that's the difference between segments 8 and 10... and I had thought that this was controlled entirely by the bypass bit CCBYP in PMI_CON0. Speed has a little more than doubled (32 DMIPS instead of 15 without cache). Still nowhere near 1 DMIPS per MHz, but at least it is a significant improvement. Thanks a lot!

Reply to
bfroemel

You need both: clearing the bypass bit CCBYP turns on caching, and access from 0x8... uses it. The non-cached address space is mainly used when programming the flash: then you must access it directly.

Reply to
David R Brooks
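The two conditions discussed in the last few posts can be put into a small host-side sketch. It only models the address arithmetic: segment 0x8 is the cached view and segment 0xA the non-cached view of the same memory, as described above. The function names are invented for this example, and the CCBYP state is passed in as a plain flag (assumed to mean "bypass the cache" when set) rather than read from PMI_CON0, so no register layout is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_CACHED    0x8u  /* 0x8....... : cached view     */
#define SEG_NONCACHED 0xAu  /* 0xA....... : non-cached view */

/* The segment is the top nibble of a 32-bit address. */
static inline uint32_t segment_of(uint32_t addr)
{
    return addr >> 28;
}

/* Map a non-cached (0xA...) address to its cached (0x8...) alias.
 * Addresses in other segments, e.g. SPRAM at 0xD4000000, are left alone. */
static inline uint32_t to_cached_alias(uint32_t addr)
{
    if (segment_of(addr) != SEG_NONCACHED) {
        return addr;
    }
    return (addr & 0x0FFFFFFFu) | (SEG_CACHED << 28);
}

/* Code fetches go through the instruction cache only if the code is
 * linked into the cached segment AND the bypass bit is not set. */
static inline bool fetch_uses_icache(uint32_t code_addr, bool ccbyp_set)
{
    return segment_of(code_addr) == SEG_CACHED && !ccbyp_set;
}
```

For example, `to_cached_alias(0xA2000000u)` yields `0x82000000u`, the link address suggested earlier for external SRAM.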
