Friday afternoon C trivia:

Ok, I give up. Besides one accessing the array sequentially, and the other hopping around, what is the difference?

BTW, I am a 'desktop' type. :-)

And in years past, a minicomputer programmer. But I bet they are mostly gone, now. :-)

--
ArarghMail911 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html

To reply by email, remove the extra stuff from the reply address.
Reply to
ArarghMail911NOSPAM

The 2nd one, which is hopping around, may have really bad cache behavior

Reply to
Arlet

That was pretty obvious, and mostly true for any random access.

Does that mean to imply that embedded processors don't have any cache? :-)

Reply to
ArarghMail911NOSPAM

Yes, though that's not the real issue.

Most desktop environments (with non-toy OS's) use paged memory under a VM. As you hop around, you reference one page, then another, then the first page again, etc. (assuming MAX_I/MAX_J are big and/or the object (X) straddles a page boundary). As such, the system can thrash in low memory conditions -- swap out the page containing one x[][] and fault in the page containing the next referenced x[][]; then swap *that* page out and swap back in the previous page, etc.

The performance hit can be *huge*.

This behavior isn't common in most embedded systems because all memory is physical memory (note that this is becoming less true as embedded systems creep into bigger applications).

So, ALL ELSE BEING EQUAL (i.e., assume foo... has no side effects), why write code that could end up exhibiting this sort of behavior?

Reply to
D Yuniskis

Hopefully the development system is not resource constrained, just the target. I know that a half-baked implementation can do stupid things like produce two instances of a template even though the resulting code is identical, but this is a quality of implementation issue, not something intrinsic to templates. Templating is a purely compile time phenomenon, so there is no reason why it should affect binary size at all.

In fact, I would argue that compilers should be able to exploit templating to reduce binary size. It is not unheard of to have functions with identical algorithmic behavior but different signatures. There may be compilers out there that aggressively merge such functions if they result in the same machine code, but clearly it would be more efficient to examine known instances of a template rather than to examine all pairs of functions.

--
Pertti
Reply to
Pertti Kellomaki

However even gcc is able to optimize this away AIUI (gcc 4.4+, see -floop-*, e.g. -floop-interchange).

--

John Devereux
Reply to
John Devereux

You'll note that there are many embedded system targets that GCC doesn't support :> And, you have to understand the nature of the problem in order to know/suspect that you would want/need to deal with it (if so, why not just code it "right" in the first place?)

But, the point being made was consistency in coding (style, practice, etc.). If "those who follow" can see this consistency throughout your "product" (your product is, after all, the code that you write!), then they can more easily pick up on what you are trying to do instead of getting distracted by syntax, etc. And, conversely, when they see you doing something *different*, they are alerted to the fact that something is "special", here. E.g., "Hmmm... why did he reverse his normal order of processing subscripts?"

Reply to
D Yuniskis

Surely the first version allows the pointer x[i] to be calculated once, perhaps held in a register (even DPTR on an 8051) and the inner loop then simply increments that pointer. A good compiler would also then move j to a register (if it fits) and decrement it to zero - an idiom much used by old C hands.

Embedded with cache? That's really luxurious!

Reply to
Bill Davy

In some embedded systems you may need to do this for operations on a captured image (or part of it), so even in physical memory the cost can be cumulative. If the object 'x' is small (or a standard type -- char/short/int...), the processor most likely has simple INC instructions, which require fewer fetches than an ADD of a larger constant.

This may not seem like a lot, but if you are scanning a block of image data you can easily end up doing this thousands, if not hundreds of thousands, of times.

Yes, it is easier to do some of these things in hardware, but if the function is only used a few times it is sometimes easier and cheaper to do this in software.

Many software image-compression routines use methods similar to the first, and the outer loop(s) may have large offsets between inner loop runs.

--
Paul Carpenter          | paul@pcserviceselectronics.co.uk
    PC Services
 Timing Diagram Font
  GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
 For those web sites you hate
Reply to
Paul Carpenter

I don't really disagree. But note that the ones GCC does not support - i.e. some crufty old 8 & 16 bit parts - are precisely those that would see no benefit anyway :)

Right now there is still a good argument for this type of manual optimisation, but this seems likely to disappear shortly IMO.

--

John Devereux
Reply to
John Devereux

I'm not looking at it as an optimization. That implies an after-the-fact activity. Rather, this is just a *coding style*. Something DECIDED UPON ahead of time (for one reason or another -- I've just illustrated a particular reason that influences that choice) and then applied consistently UNLESS IT SHOULDN'T BE.

E.g., do you use:

for (i = foo; i < bar; i++)

or:

for (i = foo; i < bar; ++i)

as both are *functionally* equivalent. I suspect you CONSISTENTLY use one form, right? :>

Reply to
D Yuniskis

So far, all schemes of template management I've encountered had their drawbacks in practice.

The "Borland method" - placing all template instances in all object files, and sorting out duplicates at link time - so far has been the friendliest one, although it blows up object file sizes and link times, and in practice linkers don't manage to eliminate all dupes. Some methods appear a few hundred times in our final binary.

The "repository" method - putting template instances into separate object files at magic places -, and the "association" method - picking an object file using the template at random and placing the template instance there - have bad interaction with static libraries. We're using the same set of object files to build multiple binaries. When building a.exe, the compiler decides std::string goes into obj1.o, when building b.exe, it decides std::string goes into obj2.o - boom.

All these methods have the disadvantage that you'll never know where your code will end up. We're in comp.arch.embedded? Unlike on desktop environments, where all memory is equal, I occasionally need to assign specific code into specific sections, using a linker file. When I want to put a particular template instance into L1 memory, I don't know which object file to use. And when I want a specific object file, I don't know whether an unexpected template instance will fill up my precious L1.

Templates are instantiated to generate code. Unless you're using templates that are completely inlined and optimized away (which is indeed useful, and which I use all the time), of course it affects the binary size.

Do these compilers/linkers actually exist? I doubt it, and just hoping that they might exist someday doesn't help me today.

Such a feature would be pretty hard to do standards-conformant anyway. In particular, the compiler may *not* merge functions of identical signature, because programmers can expect this to work:

   void (*foo)() = func1;
   assert(foo != func2);

even if func1 and func2 have the same body / machine code. I'm not entirely sure whether it would be permitted to merge functions with different signatures; at least those cannot be compared without a cast.

Stefan

Reply to
Stefan Reuther

I've seen C compilers that detect that the code at the end of one function is identical to the code at the end of another, and avoid this code duplication by making a cross-jump between the functions so that they share their common ending code. This is a bit on the way towards Pertti's suggestion. (However, I'm not sure if this was limited only to shared compiler-generated function epilogues.)

There is also some work on "clone detection" from source code, which IIRC at present aims mainly to find stolen code, or sloppy code that has lots of copy-paste parts, but could eventually show up in compilers as a code-size optimization.

Moving away from C/C++, some Ada compilers implement the Ada "generic" feature, which roughly corresponds to C++ templates, by creating a single copy of the generic code, parametrized with the instantiation stuff (constants, variables, types, operations). Each instance of the generic has its own parameter structure, but shares the code.

Merged functions could have different entry points, followed by a jump to the shared code. Still a significant code reduction in most cases.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

OK, I see. Actually I can't recall using a 2 dimensional numeric array on an embedded system. For some reason it has never arisen, so I have not needed to think about it.

I often use arrays of strings, but these are explicitly arrays of pointers to arrays of characters so the question does not arise.

Sure. The former, for no good reason I can recall -- I expect it was in K&R!

--

John Devereux
Reply to
John Devereux
[attributions elided]

In many "desktop" applications, I have. So, instinctively apply what I learned there to my embedded work. Easier (IMO) to just "get into the habit" than to have to remember some little thing lke this *later*. (brain fade :> )

I use the latter. In this context, damn near any compiler should be able to generate the same code. But the prefix operator makes it easier for (dumber) compilers to generate "better" code in *other* contexts (e.g., x = foo[++i] vs. x = foo[i++]). Again, it's just me forcing myself to use a style that will, ALL ELSE BEING EQUAL, err in my favor.

Many of these style issues really "rub me the wrong way" though I try hard to adhere to them. E.g., I *want* to use the postfix operator but force the use of the infix operator, instead.

Similarly, I break expressions so that operators *start* the continuation line(s) rather than *end* the preceding line despite the fact that it "feels" wrong. E.g.,

foo = bar
      + baz;

vs.

foo = bar +
      baz;

To each his own. I just think that consistency is **really** important -- for yourself and those who follow.
Reply to
D Yuniskis

Grrrr... s/infix/prefix/ -- though that hopefully would have been obvious. :<

Reply to
D Yuniskis

Seconded. I also like and use the former style: the operator goes on the same line as the right-hand operand. Especially nice for sums where some terms are added, other subtracted.

Luckily, this also "feels" right to me.

Reply to
Niklas Holsti

You're kidding? Nearly all embedded PowerPCs have cache (counting types, not quantity sold -- here the 565 and 555 likely outnumber all other PowerPCs). Also most higher ARM CPUs, ColdFire, SH....

(Hey, people use a controller with cache and want a deterministic RTOS ;-P )

--
42Bastian
Do not email to bastian42@yahoo.com, it's a spam-only account :-)
Use @monlynx.de instead !
Reply to
42Bastian Schick

I don't doubt that, and I sympathize with your descriptions of the interaction between templates and the compiler.

Sure, but if you don't get the code from template instantiation, then you need to get it from somewhere else, e.g. write it by hand. If the template instantiation and the hand-written code are equivalent, then I would hope that the resulting binaries are equivalent. Otherwise it would be time to change the compiler vendor ;-)

Some people (not you obviously) seem to think that there is some intrinsic run time overhead using templates. But templates -- and metaprogramming in a more general sense -- are really compile time phenomena. Hence my comments.

--
Pertti
Reply to
Pertti Kellomaki

Upthread, Vladimir proposed to replace void pointer plus size by templates, for type safety. That is, essentially, replacing

   memcpy(void* dest, const void* src, size_t size);

by, say,

   template <typename T> void copy(T dest[], const T src[], size_t nitems);

Normally, this will generate duplicate code.

Of course we could define the latter just as type-safe syntactic sugar for memcpy:

   template <typename T>
   inline void copy(T dest[], const T src[], size_t nitems)
   {
       memcpy(dest, src, nitems * sizeof(T));
   }

but then it would qualify as a template which is optimized away, which I excluded :-)

The problem is just that in C++, with templates in particular, it takes work to avoid bloat. With C, it takes work to generate bloat, because all code has to be written explicitly (unless you're using preprocessor tricks outlawed by every sane coding style).

(Footnote: I've been playing around with toy compilers since I was 15, so I think I know how a compiler works and thinks. Pretty useful knowledge that should be more widespread among developers.)

But isn't "templates have no runtime costs" one of the the first lessons taught? You're trading type-safety and number-of-instructions-per-call for code size. At least, that was my impression when learning C++. Probably to counter the old "hey, what do I need templates, I have a TArray containing TObject* which works fine" and "my hand-written vector of integers is faster than std::vector" arguments.

Stefan

Reply to
Stefan Reuther
