Slightly OT: speed of operation on a PC


Tim Wescott 10 years ago

Which do you think is quicker on a PC, with the latest gnu compiler:

double a = something; double b = something else;

if (a >= 0.0 && b < 0.0)

> or


rickman 10 years ago

My guess would be the latter if any. Why not write the code and look at what is generated?

Rick


Lanarcam 10 years ago

if (a + b < a) would be quicker, I believe.


Don Y 10 years ago

Do you really need to micro-optimize like this? Why not just have the code *say* what you are trying to *do*?


Paul Rubin 10 years ago


Bill Davy 10 years ago

If a==1 and b==0, the first expression is false and the second expression is true, so they are not equivalent. Border conditions are such a bother. I could provide a similar expression which is even faster but also not right (0, for example). Sometimes the use of the sgn() function is even quicker. Rules of optimisation:

1) Never 2) Later 3) Measure


Tim Wescott 10 years ago

I don't think that tests if a and b are of opposite sign.

Tim Wescott Wescott Design Services http://www.wescottdesign.com


Tim Wescott 10 years ago

Uh -- fun?

Where's the job security in that?

Tim Wescott Wescott Design Services http://www.wescottdesign.com


Lanarcam 10 years ago

That's right, my error.

You could try the macro signbit

if (signbit(a) != signbit(b))
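
A minimal sketch of the signbit idea, assuming C99 `<math.h>`. The function name `signs_differ` is made up for illustration. Note that C only promises a nonzero result from `signbit` when the sign is set, not necessarily 1, so the sketch normalises both sides with `!` before comparing.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true when exactly one of a, b has its sign bit set.
 * signbit() is only guaranteed to return "nonzero" for a set sign bit,
 * so each result is reduced to 0 or 1 with '!' before the comparison. */
bool signs_differ(double a, double b)
{
    return !signbit(a) != !signbit(b);
}
```

This looks only at the sign bit, so it treats 0.0 and -0.0 as having different signs, which may or may not be what the application wants.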


Don Y 10 years ago

If you want to head down that path, there are lots of processor/compiler specific "optimizations" that you can explore! The problem then becomes making sure you don't let some "clever trick" creep into your code at a later date.

Given a year expressed in packed BCD, indicate whether the year is a leap year.

Obviously, there is some merit in knowing this. OTOH, the code only has to execute VERY rarely!

[It's an interesting problem, though. How many clock cycles will it take to resolve? :> ]
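
One possible take on the packed BCD leap-year puzzle, as a sketch only. It assumes a four-digit Gregorian year stored in 16 bits with valid BCD digits (for example 0x2024), and the names `bcd_pair_div4` and `bcd_is_leap_year` are invented here. It relies on 10*T + U being congruent to 2*T + U modulo 4, so no conversion to binary is needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a two-digit packed BCD value (tens in the high nibble,
 * ones in the low nibble) is divisible by 4.
 * Uses 10*tens + ones == 2*tens + ones (mod 4). */
static bool bcd_pair_div4(uint8_t bcd)
{
    unsigned tens = bcd >> 4;
    unsigned ones = bcd & 0x0Fu;
    return ((2u * tens + ones) & 3u) == 0;
}

/* Assumes year_bcd holds four valid BCD digits, e.g. 0x2024. */
bool bcd_is_leap_year(uint16_t year_bcd)
{
    uint8_t century = (uint8_t)(year_bcd >> 8);
    uint8_t yy = (uint8_t)(year_bcd & 0xFFu);

    if (yy == 0x00)                  /* century year: need divisible by 400 */
        return bcd_pair_div4(century);
    return bcd_pair_div4(yy);        /* otherwise: divisible by 4 */
}
```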

Do you *really* want to be tethered to "old projects"? Or, would you rather be free to explore *new* projects?? ;-)


rickman 10 years ago

Neither does your original code.

The first case only checks for 'a' positive and 'b' negative. "Opposite sign" would include the case of 'a' negative and 'b' positive. Your multiplication includes either one being zero the same as them being opposite sign.
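
To make the difference concrete, here is a small sketch. The first function is the test as posted. The second is only an assumed form of the multiplication alternative (`a * b <= 0.0`), since that expression does not appear in this copy of the thread. Both function names are hypothetical.

```c
#include <stdbool.h>

/* The test as posted: true only for non-negative 'a' and negative 'b'. */
bool posted_test(double a, double b)
{
    return a >= 0.0 && b < 0.0;
}

/* ASSUMPTION: a guess at the multiplication variant. It is true for
 * either ordering of opposite signs, and also when either value is zero. */
bool product_test(double a, double b)
{
    return a * b <= 0.0;
}
```

With a = -1, b = 1 the posted test is false while the product test is true, and the same holds for a = 1, b = 0.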

Maybe you could explain in a bit more detail what the goal is. Is the first code example sufficient for your needs, and does the domain exclude the problem areas where these two give different results?

In an HDL this may or may not produce comparators or multiplies. I would have to try it to see, but I have seen the compiler be smart enough to realize that much of the logic was not important to the result and simplify it.

Rick


Tom Gardner 10 years ago

With many of these questions, the answer may depend on the surrounding context and the compiler optimisation level, and what is/isn't in the cache.

And may change with the next compiler release. Or with a "trivial" change to the context.

Unless the answer is *critical*, it is usually better to write clear simple code using the "best" algorithm, and let the compiler (hopefully correctly) optimise it. Since you ask the question, clearly you haven't determined that this is the most important optimisation.


Tom Gardner 10 years ago

Precisely.


Hans-Bernhard Bröker 10 years ago

On 04.08.2015 at 19:59, Tim Wescott wrote:


glen herrmannsfeldt 10 years ago

(snip of other non-optimizations)

Not so long ago, I had to test an odd integer for being 1 modulo 4. Slightly different, but I did it with the S/360 MVO and TM instructions.

    MVO   ODD(4),J+6(2)
    TM    ODD+2,X'12'
    BM    LOOP5B

But only after I put in the usual (binary case) test for one bit, and then realized that it didn't work. The first time I remember using BM (branch mixed) for a TM (test under mask) instruction.

-- glen


Don Y 10 years ago

"Too clever for your own good!" :-)

I am retroactively embarrassed for many of the *little* clever tricks I employed in designs early in my career. In hindsight, they're nothing more than ego-strokes -- the "savings" rarely justified the time spent *thinking* about them! (i.e., micro optimizations)

E.g., I designed a "front panel replacement" for a minicomputer in the 70's. This allowed me to discard the little "bit switches" that one would use to "toggle" a program into core (*real* core!), examine memory locations, step the processor, etc.

Part of the design required some conditional logic to encode certain "levels" on the O.C. bus. So, conceptually, you had a bit of "junk logic" feeding OC NAND's that, in turn, drove the bus.

Not content to live with this "obvious" implementation, I realized that some of the "logic functions" coincided with some of the "segment decoders" in common BCD->7segment display decoders. So, by twiddling the "BCD" inputs to the decoder, I could get the logic function of the "junk logic" *and* the bus driving capability of the NAND drivers -- all in one package!

Over the lifetime of the product, it *might* save a dozen DIPs (total!!) :-/

[But, solving the problem was truly *fun*!]


Tim Wescott 10 years ago

I ended up finding a different solution to the top level problem, but yes, the curiosity was whether, with floating point hardware, it was quicker to do a bunch of compares (four I guess and not two) or one multiply and one compare.
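
For anyone who wants to compare the two approaches, a sketch of a symmetric "opposite sign" test written both ways. The function names are hypothetical, and zero is treated as having no sign in both versions. Compiling with something like `gcc -O2 -S` shows what the compiler actually generates for each.

```c
#include <stdbool.h>

/* Four compares: true when a and b are nonzero with opposite signs. */
bool opposite_by_compare(double a, double b)
{
    return (a > 0.0 && b < 0.0) || (a < 0.0 && b > 0.0);
}

/* One multiply and one compare. */
bool opposite_by_multiply(double a, double b)
{
    return a * b < 0.0;
}
```

The two are not quite interchangeable: where the product is evaluated in double precision (a typical x86-64 build, for instance), 1e-200 * -1e-200 underflows to -0.0, so the multiply version reports false while the compare version reports true.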

I suppose that a good optimizing compiler that understands floating point sign bits would figure this out quickly.

Tim Wescott Wescott Design Services http://www.wescottdesign.com


glen herrmannsfeldt 10 years ago

(snip on different ways to do the same thing) (then I wrote)

Most of the others that I thought of were considerably more complicated. The result is the exclusive OR of two different bits in two different bytes. The less tricky way might be four bit tests and four branch instructions. Many ways to get it wrong.

For those who don't follow it, MVO is a memory to memory move that shifts four bits. The high half of one byte, and the low half of the one before it get written to one byte. (For between 1 and 16 bytes.)

TM allows testing for some combination of bits in one byte being all zero (BZ) all ones (BO) or mixed zeros and ones (BM).

Even more tricky by the designers of S/360, BO is the Branch on Overflow test after arithmetic instructions, and BM is the Branch if Minus test after arithmetic instructions! (snip)
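
For readers without S/360 experience, a rough C rendering of the idea, offered as a sketch rather than a translation of the code above. It assumes that after the MVO the tested byte holds the tens digit in the high nibble and the ones digit in the low nibble, and that the value is already known to be odd. The names `tm_mixed` and `odd_bcd_is_1_mod_4` are invented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mimics the "mixed" outcome of TM: some, but not all,
 * of the bits selected by the mask are ones. */
static bool tm_mixed(uint8_t byte, uint8_t mask)
{
    uint8_t selected = byte & mask;
    return selected != 0 && selected != mask;
}

/* ASSUMPTION: 'digits' is tens digit (high nibble), ones digit (low nibble),
 * and the number is odd. Since 10*T + U == 2*T + U (mod 4), an odd value is
 * 1 mod 4 exactly when bit 0 of T equals bit 1 of U, i.e. when the two bits
 * picked out by mask 0x12 are not "mixed". */
bool odd_bcd_is_1_mod_4(uint8_t digits)
{
    return !tm_mixed(digits, 0x12);
}
```

So 0x13 (13) and 0x21 (21) come out as 1 modulo 4, while 0x11 (11) and 0x07 (7) do not.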

The 7447 has nice high current 15 volt OC drivers. Usual OC gates have 5V transistors.

-- glen


glen herrmannsfeldt 10 years ago

(snip)

Some time ago, I was doing a test to see if two points are on the same or different sides of a line. Some similar tests ended up in the result, and I believe I went for the a*b case.

For one, I wasn't doing it enough to really matter how long it took. (It might have been thousands of points, maybe even millions, but that doesn't matter today.)

I believe the equal case occurs if one or both are on the line, and in my case it didn't matter which way it went. (And for random (x,y) pairs was pretty unlikely.) (snip)
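
A sketch of how such a same-side test could look with the a*b approach. This is an illustration under assumptions, not the code described above: the `Point` type and both function names are made up, and the line is taken to pass through two given points.

```c
#include <stdbool.h>

typedef struct { double x, y; } Point;

/* Signed quantity (2D cross product): positive on one side of the
 * line through a and b, negative on the other, zero on the line. */
static double side_of_line(Point a, Point b, Point p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* True when p and q lie strictly on the same side of line a-b.
 * A point on the line gives a zero product and is reported as false here. */
bool same_side(Point a, Point b, Point p, Point q)
{
    return side_of_line(a, b, p) * side_of_line(a, b, q) > 0.0;
}
```

Which way the on-the-line case goes is decided by the choice of `>` versus `>=`, matching the remark that it may not matter for random points.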

-- glen


Don Y 10 years ago

I was looking for a quick/efficient test to ensure C2 continuity (reason should be obvious -- if you think on it!) between "successive" (concatenated) Bezier curves in my gesture recognizer.

I got fixated on the: dy/dx ?= dY/dX issue -- how to avoid the (high precision) divide! And, the inevitable "comparing floats for equality" dilemma!

"Ah! dy*dX ?= dx*dY does the trick!"

No, imagine dy == 0 or dx == 0, etc.

After fretting about this (for TOO long!), I slapped myself upside-da-head: "You're solving the *wrong* problem! E.g., C1 instead of C2!" D'oh!

Hence the admonition not to get caught up in silly optimizations at the expense of sorting out what you *really* are trying to do!
