Hiring Advice

+1

Spend your "optimization time" deciding on the best *algorithm* and let the compiler sweat the gory "little details".

In addition to not being an "architecture-independent" optimization, it can also be *less* efficient in terms of memory requirements (i.e., what is the cost of your optimization, and how does that cost "change"/vary?)

If the data *belong* together as part of a struct, then *put* them together -- this lines up with our expectations on the data (e.g., "{x, y}" instead of "x, y"). But, *forcing* data together can be counterproductive -- like storing people's salaries in your "address book"
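A minimal sketch of the idea (names are made up for illustration):

struct point {
    int x;     /* these two belong together...                    */
    int y;
};

struct point cursor = { 10, 20 };   /* ...so they travel as one "{x, y}" unit */

int salary;    /* unrelated data -- keep it out of the struct     */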

Reply to
D Yuniskis

I don't "C" why using pointer automatically make codes unclear, just because "->" is an unusual symbol? I agree, it would have been better to allow:

T *p, b;  p.a = b.a;

You know p is a pointer. The compiler knows that you know it is a pointer. Why disallow it?
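For the record, what C actually requires is the explicit dereference; a minimal sketch with a hypothetical struct T:

typedef struct { int a; } T;

void copy_member(void)
{
    T b = { 3 };
    T *p = &b;

    p->a = b.a;     /* what the language requires                       */
    (*p).a = b.a;   /* equivalent, with the dereference spelled out     */
    /* p.a = b.a;      rejected: '.' wants a struct, not a pointer      */
}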


Then let the compiler decide. C is a pointer-efficient language, and the ARM is an efficient architecture for C.

If you dislike pointers, then go Java. Java is strongly typed and very restrictive. In fact, we have had to rule out projects because of its heavy restrictions. Take Android, for example: without the native C interface and the C kernel, it would not have been possible.

I don't advocate global statics. But if you have to use them, there is no harm in grouping them in structs.

Reply to
linnix

The best way to let the compiler decide is to write it as simply as possible.

Writing 'a = b' makes it easy for the compiler to introduce a pointer, if that's more efficient. I've seen compilers do that.

Writing 'p->a = p->b' is a bit trickier to optimize back to 'a = b', and only possible if the compiler can analyze 'p' to be a constant.

Sure, on ARM, using pointers is usually better. On the x86 or the Atmel AVR, absolute addresses may be better.
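A minimal sketch of the two styles being compared (names are hypothetical). The first leaves the compiler free to choose direct addressing or a pointer; the second only folds back to a plain move if the compiler can prove the pointer is constant:

static int a, b;
static struct { int a, b; } s, *p = &s;

void plain(void)     { a = b; }        /* compiler picks the addressing mode          */
void pointered(void) { p->a = p->b; }  /* the source has already committed to a pointer */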

Reply to
Arlet Ottens

The difficulty of understanding generated assembly depends on various things. The biggest issue is probably the target architecture - if you are looking at something like modern x86 code, then it is really hideous. That's where you see things like apparently meaningless loads as a quick way to do some arithmetic, and the lack of registers makes it hard to follow.

Some other processors have very complicated instructions - if you compile some bitfield code on a PPC then you'll have a serious challenge to interpret the bitfield masking, extraction and rotation instructions. And for some processors there are many ways to write the same instruction - if you are looking at disassembled code, it can be hard to figure out what is going on. For example, on the msp430, a "ret" instruction is actually "mov @sp+, pc" - or alternatively "mov @r1+, r0", which is not at all obvious.

Then there is the type of code you are working with. If you want to understand the generated code, you should try to have a minimal C source that illustrates what you are interested in (be it optimising speed or size, checking correctness of volatile accesses, checking for compiler bugs, or whatever). Don't get drowned in too much information.

You also have to choose your compiler flags appropriately. Too little optimisation can make it hard to understand - on some compilers, with no optimisation all local variables get put on the stack, and you can't see the real code for all the moves on and off the stack. Too much optimisation also makes it difficult, as everything gets re-arranged.

Some compilers can intermingle the source code with the generated object code in a listing file. This is a big help, but can limit the compiler's freedom in optimising.

Example code from hardware suppliers is often among the worst examples of coding practice you can find.

Reply to
David Brown

These sorts of optimisations can also limit the compiler in ways you haven't thought of. For example, many people like to replace array accesses in loops with pointers, in the belief that it makes their code faster. For older compilers that was often true - but better tools will have more information when using arrays, and therefore generate better code.
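A minimal sketch of the two loop styles (hypothetical function); a decent modern compiler generates code for the array form that is at least as good:

#include <stddef.h>

void scale_array(int v[], size_t n)    /* the compiler sees the whole access pattern      */
{
    for (size_t i = 0; i < n; i++)
        v[i] *= 2;
}

void scale_pointer(int *v, size_t n)   /* hand-rolled pointers: no clearer, rarely faster */
{
    for (int *end = v + n; v != end; v++)
        *v *= 2;
}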

Yes, that's all about modularising and structuring your code. It is done more often with C++ programming, since it is natural to wrap related data into a class, but you can do it with a struct in C just as well.

But as you say, don't force illogical connections.

Reply to
David Brown

I would start with disallowing "T *p,b" - no serious coding standard or code review would let that pass in the first place.

Use pointers where pointers are appropriate. Use structs where structs are appropriate - based on the logic of the program. Adding layers of pointers and structs just because you think it makes the code smaller or faster is bad programming.

I am a great fan of writing efficient code. But code efficiency is of secondary importance to writing code that is /correct/, and that is clear enough that it can be /seen/ to be correct.

Write your code in the natural, logical way - and /then/ let the compiler pick the best implementation. Your attempts at forcing the compiler's hand give you poor quality source code, and the "optimisation" effects are not portable across different compilers for the same target, never mind across different targets.

As I say, there are times when this sort of thing is necessary to get the speed or size you want. And it's important that programmers know how to do it. But it is not something you do unless you have no choice.

I am not interested in Java, nor am I advocating avoiding pointers. I am advocating avoiding /unnecessary/ pointers that only serve to obfuscate the source code, and are as likely to hinder the compiler as to help it.

Pointers have their place, but there is a good reason to consider them "risky" if they are not handled correctly. And there is good reason for preferring references in C++ where possible.

I /do/ advocate global statics - and global non-static data - if it is the clearest way to implement a particular feature.

And I don't object to grouping them in structs, if that is the /logical/ thing to do. But I object to making an illogical mix in a struct just to generate smaller code (unless it is very necessary), and I certainly object to calling that mix "clear" programming. Often, it is better to put your file statics close to the functions that actually need them - preferably local to those functions.

Even if you decide to put your data together in a struct, you should not take a pointer to that struct - access it directly, and let the compiler optimise accesses. That gives you the clearest source code, and as a side effect will let the compiler generate slightly better target code.
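A minimal sketch (hypothetical names) of the direct style:

static struct {
    unsigned ticks;
    unsigned overruns;
} timer_state;              /* grouped because the data belongs together            */

void timer_tick(void)
{
    timer_state.ticks++;    /* direct access -- the compiler decides whether to form
                               a base pointer or use absolute addressing            */
}

/* Avoid adding "static ... *ts = &timer_state;" and writing ts->ticks++;
   the extra pointer buys nothing but indirection. */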

Reply to
David Brown

Exactly. Their goal is to "show it (the device) works".

E.g., if I'm bringing up a printhead controller, I sure as hell don't invest lots of time designing an abstract, versatile interface to a "marking engine". Instead, I create some image on graph paper, figure out which "squares" get packed into which "bytes", build a table of these byte values -- and then write a little routine that crams them into the printhead and tugs on the "PRINT" line.
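A hedged sketch of that sort of bring-up routine (the names, bit order and table are made up; the real details depend entirely on the printhead):

#include <stdint.h>

static const uint8_t test_pattern[] = {     /* graph-paper squares packed MSB-first */
    0xAA, 0x55, 0xAA, 0x55,                 /* checkerboard                         */
    0xFF, 0x00, 0xFF, 0x00,                 /* solid / blank bars                   */
};

extern void printhead_write(uint8_t byte);  /* hypothetical driver hooks            */
extern void printhead_fire(void);

void print_test_row(void)
{
    for (unsigned i = 0; i < sizeof test_pattern; i++)
        printhead_write(test_pattern[i]);
    printhead_fire();                       /* tug on the "PRINT" line              */
}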

If I see the appropriate "pattern" appear on the paper, I have a high degree of confidence that the hardware interface is operational. If, instead, I see an exquisite rendering of the La Gioconda, then I know that something is *horribly* wrong (but, I rip off the paper and send it off to Ripley's Believe It Or Not -- to be immortalized next to the potato chip that looks like Elvis Presley's left ear...)

Reply to
D Yuniskis

Having programmed a lot of resource-constrained systems, I always tried to represent data using the smallest possible type. I used to do it automatically... until I noticed that 16-bit and 8-bit math appears to be about 3 times slower than the equivalent 32-bit math on a modern x86 CPU.

Another common assumption is that integer math is faster than floating point, and that single-precision float is faster than double. In fact, it could go either way, as it depends on the particular system and the particular task.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant


Reply to
Vladimir Vassilevsky

As simple as possible... I used to know a software engineer who implemented SPI by bit-banging without using a loop, i.e. straight-line code for 32 iterations.
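A sketch of the idea with hypothetical pin functions; the looped version is shown, and the "no loop" version simply repeats the body once per bit, trading code size for the decrement-and-branch on every iteration:

#include <stdint.h>

extern void spi_mosi(int level);   /* hypothetical pin wigglers */
extern void spi_clk(int level);

void spi_send_byte(uint8_t byte)
{
    for (int i = 7; i >= 0; i--) {
        spi_mosi((byte >> i) & 1);
        spi_clk(1);
        spi_clk(0);
    }
}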

"There is only one method for code reuse - Copy/Paste"

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant


Reply to
Vladimir Vassilevsky

I remember years ago working on a processor that only had a double-precision floating point unit. Single precision was slower because it took a few cycles to convert to double and back again to use the double float unit.

Reply to
Dennis

Same thing happens on an ARM. Integer math on "unsigned" (32-bit values) almost always generates faster code than operations on unsigned 8- or 16-bit values. And unlike some processors, the ARM can't use the other half (or three-quarters) of a register occupied by a shorter value for anything else.
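A minimal sketch of why that is (hypothetical functions): the narrow result has to be truncated back to 8 bits (a UXTB or an AND) to preserve C's wrap-around semantics, while the 32-bit version is just the add:

#include <stdint.h>

uint8_t  add_narrow(uint8_t a, uint8_t b) { return a + b; }  /* typically add + UXTB */
unsigned add_wide(unsigned a, unsigned b) { return a + b; }  /* just the add         */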

Yup. I remember one project involving a lot of floating point (fire control stuff) where we got a _drastic_ speedup by changing everything from float to double.

I presume the initial assumption was that single-precision math would be faster than double-precision. That might have been true if 1) it were done in software (it wasn't), and 2) if the C compiler _did_ single-precision FP operations (it didn't).

--
Grant Edwards               grant.b.edwards        Yow! I wonder if I should
                                  at               put myself in ESCROW!!
                              gmail.com
Reply to
Grant Edwards

In one of my recent experiences I found a lot of problems in a datasheet and noted them to the manufacturer; the next release was 12 pages longer.

One of the best examples of bad practice I have come across was a peripheral library that

a) relied on the compiler to remove unused sections of the 'library', even for modules covering peripherals that were not on that device!

b) was so poorly encapsulated that, instead of passing which unit of the I/O you were talking to, each higher-level function had to pass the full 32-bit address specific to that device.

If you passed a unit reference, e.g. 0 or 1 for UART 0 or 1, to a UART function, you could also have the 'library' double-check it, returning how many units of that I/O function the device supports, or an out-of-range error. As it was, it had NO checks for an invalid address being passed in.
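A hedged sketch of that alternative interface (register layout and base addresses are made up): callers pass a small unit number, and the library can range-check it instead of blindly dereferencing whatever address it was handed:

#include <stdint.h>

#define NUM_UARTS 2

typedef struct { volatile uint32_t DATA, STATUS; } uart_regs_t;  /* hypothetical */

static uart_regs_t *const uart_base[NUM_UARTS] = {
    (uart_regs_t *)0x40004000u,
    (uart_regs_t *)0x40005000u,
};

int uart_putc(unsigned unit, uint8_t c)
{
    if (unit >= NUM_UARTS)
        return -1;                 /* out-of-range unit rejected up front */
    uart_base[unit]->DATA = c;
    return 0;
}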

--
Paul Carpenter          | paul@pcserviceselectronics.co.uk
    PC Services
 Timing Diagram Font
  GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
 For those web sites you hate
Reply to
Paul

Actually, that depends on the version of the core. ARMv6 and up add halfword and byte arithmetic instructions that let you do some poor man's SIMD. Though I'd be shocked if any C compiler actually generates them.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
Reply to
Rob Gaddi

It doesn't make much difference on an x86, I believe, but smaller-than-int maths can be a big cost on many RISC CPUs.

I have done exactly the same as you, although I usually automatically picked bigger types when I knew I was targeting a bigger cpu.

Now we just have to change our habits from int8_t to int_fast8_t :-)
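A minimal sketch of the difference:

#include <stdint.h>

int_fast8_t counter;    /* "at least 8 bits, whatever is fastest here" --
                           may well be wider than 8 bits on a 32-bit target  */
uint8_t     wire_field; /* exact width, for registers, protocols, file formats */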

Yes, that can be worth remembering too - however, these cases are rarer.

I have nothing against writing efficient code - like you, I do it automatically. I just dislike sacrificing clarity of code for the sake of efficiency unless there is a great deal to gain (and that you are sure that you will get those gains).

Reply to
David Brown

Speech, on some 80's vintage arcade pieces, was generated by shifting bits into a CVSD "as fast as conceivably possible" (since this maximized the abysmal signal bandwidth). This required unrolling all loops and ensuring that the number of CPU clocks between "shifts" remained constant, regardless of what the code had to do (e.g., consider how conditionals vary in execution time).

Don't reuse code -- reuse *designs*!

Reply to
D Yuniskis

I've actually done that to get maximum throughput, avoiding loop overhead (only 8 iterations though). Think it was on an AVR.

I have seen someone suggest that copy/paste should be disabled in the ideal programmer's editor. The idea being that every time you do it you are missing a chance for refactoring.

--

John Devereux
Reply to
John Devereux

Sometimes it makes sense to do that, and makes it easier to get consistent timings as well as fast timings.

I once saw an assembly language delay function consisting of hundreds of NOP's...

Reply to
David Brown

On the other hand, copy/paste can sometimes be simpler than refactoring. Often I will write an RS-232 routine as well as an RS-485 routine, about half of which is copy/pasted and the other half rewritten/tuned for the specific purpose.

A nicely restructured piece of code that can handle both RS-232 and RS-485 will likely be bigger and slower than both of my pieces combined.

Reply to
Arlet Ottens

I guess the temptation to copy and paste is so strong that it should be actively discouraged. If it is so important to duplicate code, then you should *type it out* to make sure you are not just being lazy :)

--

John Devereux
Reply to
John Devereux

That's a reasonable compromise :)

Reply to
Arlet Ottens
