Better C-Language Implementation of Multi-Word Addition

D

David T. Ashley 13 years ago

I have created this monster:

formatting link

I believe it is logically correct, but there must be a better way (other than assembly-language) to process the carries.

The target is the ARM.

Any ideas?

Thanks, Dave A.

--------

P.S.--The .PDF above may be easiest to read, but just in case, here is the code as text:

//---------------------------------------------------------------------------------------------------- //DESCRIPTION // Adds two signed 128-bit integers to produce a signed 128-bit result. No limiting or "railing" // is implemented, and overflow will occur as it would with native arithmetic. // //INPUTS // *in_term_to_add, *out_result: // Integers to add. These pointers may be identical or point to overlapping areas of memory. // //OUTPUTS // *out_result // The sum of *out_result and *in_term_to_add. // // uw[0] + in_term_to_add->uw[0]; //For statement coverage, Statement //Group 1

//Perform the [1] addition, possibly including carry from the [0] addition. if ( (local_temp.uw[0] < out_result->uw[0]) //For MCDC testing, Clause 1 || (local_temp.uw[0] < in_term_to_add->uw[0]) //For MCDC testing, Clause 2 ) { carry_out_of_0 = TRUE; local_temp.uw[1] = out_result->uw[1] + in_term_to_add->uw[1] +

1; //Add with carry. //For statement coverage, Statement //Group 3 } else { carry_out_of_0 = FALSE; local_temp.uw[1] = out_result->uw[1] + in_term_to_add->uw[1]; //Add without carry. //For statement coverage, Statement //Group 4 }

//Perform the [2] addition, possibly including carry from the [1] addition. if ( ( (! carry_out_of_0) //For MCDC testing, Clause 3 && ( (local_temp.uw[1] < out_result->uw[1]) //For MCDC testing, Clause 4 || (local_temp.uw[1] < in_term_to_add->uw[1]) //For MCDC testing, Clause 5 ) ) || ( (carry_out_of_0) //For MCDC testing, Clause 6 && ( (local_temp.uw[1] uw[1]) //For MCDC testing, Clause 7 || (local_temp.uw[1] uw[1]) //For MCDC testing, Clause 8 ) ) ) { carry_out_of_1 = TRUE; local_temp.uw[2] = out_result->uw[2] + in_term_to_add->uw[2] +

1; //Add with carry. //For statement coverage, Statement //Group 5 } else { carry_out_of_1 = FALSE; local_temp.uw[2] = out_result->uw[2] + in_term_to_add->uw[2]; //Add without carry. //For statement coverage, Statement //Group 6 }

//Perform the [3] addition, possibly including carry from the [2] addition. if ( ( (! carry_out_of_1) //For MCDC testing, Clause 9 && ( (local_temp.uw[2] < out_result->uw[2]) //For MCDC testing, Clause 10 || (local_temp.uw[2] < in_term_to_add->uw[2]) //For MCDC testing, Clause 11 ) ) || ( (carry_out_of_1) //For MCDC testing, Clause 12 && ( (local_temp.uw[2] uw[2]) //For MCDC testing, Clause 13 || (local_temp.uw[2] uw[2]) //For MCDC testing, Clause 14 ) ) ) { carry_out_of_2 = TRUE; local_temp.uw[3] = out_result->uw[3] + in_term_to_add->uw[3] +

1; //Add with carry. //For statement coverage, Statement //Group 7 } else { carry_out_of_2 = FALSE; local_temp.uw[3] = out_result->uw[3] + in_term_to_add->uw[3]; //Add without carry. //For statement coverage, Statement //Group 8 }

//Decide whether a carry out of the [3] addition occurred. if ( ( (! carry_out_of_2) //For MCDC testing, Clause 15 && ( (local_temp.uw[3] < out_result->uw[3]) //For MCDC testing, Clause 16 || (local_temp.uw[3] < in_term_to_add->uw[3]) //For MCDC testing, Clause 17 ) ) || ( (carry_out_of_2) //For MCDC testing, Clause 18 && ( (local_temp.uw[3] uw[3]) //For MCDC testing, Clause 19 || (local_temp.uw[3] uw[3]) //For MCDC testing, Clause 20 ) ) ) { carry_out_of_3 = TRUE; //For statement coverage, Statement //Group 9 } else { carry_out_of_3 = FALSE; //For statement coverage, Statement //Group 10 }

//Assign the addition result back to the intended storage location. out_result->uw[3] = local_temp.uw[3]; //For statement coverage, Statement //Group 11 out_result->uw[2] = local_temp.uw[2]; out_result->uw[1] = local_temp.uw[1]; out_result->uw[0] = local_temp.uw[0];

//Return the carry information to caller. return(carry_out_of_3); }

Vote

R

Robert Wessel 13 years ago

If you want to do this with whole words, try something like this as a building block:

unsigned AddWithCarry(unsigned a, unsigned b, unsigned carryin, unsigned *sum) //returns carryout { unsigned sp, cp;

sp = a+b; cp = (sp < a) || (sp < b); *sum = sp + carryin; return cp || (*sum < sp) || (*sum < carryin); }

Then your add 128 routine becomes something like:

c = AddWithCarry(a.w[0], b.w[0], 0, &sum.w[0]); c = AddWithCarry(a.w[1], b.w[1], c, &sum.w[1]); c = AddWithCarry(a.w[2], b.w[2], c, &sum.w[2]); AddWithCarry(a.w[3], b.w[3], c, &sum.w[3]);

Alternatively, consider working with half words, so that the carries are easy to shift out of the top half of the result.

Vote

D

David T. Ashley 13 years ago

Thanks for the help.

I discovered something else sinister about this problem.

You wrote

sp = a + b; cp = (sp < a) || (sp < b);

A similar construct occurs many places in my code as well.

The test:

cp = (sp < a);

would actually be adequate.

Assume that the addition has "rolled over".

Since "a" is constrained to be in [0, 2^N-1],

sp = a + b;

has to produce a result < b if the calculation has rolled over.

Using similar logic, the result must also be < a if the calculation has rolled over.

Phrased differently,

sp = a + b;

cannot reach the region "between" a and b.

So, half the tests in my code automatically go away.

Please beat on me if I'm wrong. But I will require a numerical counterexample to accept the beating ... thanks.

Dave A.

Vote

R

Robert Wessel 13 years ago

No, you're quite correct. It's well know that you only have to test if the sum (assuming a finite field) is less than either operand. In fact it will be less than *both* operands, which is why you only have to check one. In fact it's the standard way to produce a carry on processors that don't have a carry flag.

It was a complete brainfart on my part putting both tests in there. So:

unsigned AddWithCarry(unsigned a, unsigned b, unsigned carryin, unsigned *sum) //returns carryout { unsigned sp, cp;

sp = a+b; cp = (sp < a); *sum = sp + carryin; return cp || (*sum < sp); }

Vote

B

Boudewijn Dijkstra 13 years ago

Op Tue, 21 Aug 2012 19:06:20 +0200 schreef Robert Wessel :

If the carry-shift-out-ease is a concern, then surely 31-bit quantities are better.

Gemaakt met Opera's revolutionaire e-mailprogramma: http://www.opera.com/mail/

Vote

R

Robert Wessel 13 years ago

Except that then storing the numbers becomes problematic. Using (say)

16-bit words, storing the number densely (IOW, no wasted bits) is easy, and you can process those words fairly directly in your bignum library. With 31 bit words, you either have to store it like that, leading to wasted space, or you have to repack the number before working on it.

The other advantage that the half-word format buys you is that multiplication works - the product of two half-sized words will fit in a full-sized word.

Obviously there are ways to deal with all of those issues, but a highly tuned bignum library is a pretty fair undertaking. Particularly if you get fancy, testing is a very large challenge. If it meets the application's performance requirements, the simple approach is *much* easier.

Vote

D

David T. Ashley 13 years ago

There are three approaches being discussed with the understanding that I'm programming in C:

a)32-bit limbs, with the recognition that carries are awkward.

b)31-bit limbs, using B31 as the carry that would otherwise be lost because I'm programming in C.

c)16-bit limbs.

For addition, it does not follow automatically that (c) is faster than (b). It is true with (c) that the repacks are possibly simpler, but you would then be doing TWICE AS MANY additions.

Also, the ARM has the combinational logic built in so that any shift is a single-clock instruction.

For relatively large integers (say, 256 bits), I'd bet that (b) is the fastest approach on the ARM if one is programming in C and doesn't have the carry from an instruction available.

DTA

Vote

L

langwadt 13 years ago

cmsis defines a function to get the APSR wonder if that would work

-Lasse

Vote

R

Robert Wessel 13 years ago

Packing and unpacking a 32 bit representation into a 31-bit one is going to be fairly expensive, almost certainly more so that a series of 16 bit additions. So if you need to do the repacking with reasonable frequency, that'll kill you performance-wise. If you cared only about addition, do it 24 bits at a time might be a good compromise (you can still treat it as 16 bit words when you multiply).

If your compiler supports long longs (or __int64s) add the 32 bit words with that - it'll probably generate pretty good code, and the carry will be neatly sitting in the high word of the result.

But you'll have to measure it to see - it will depend highly on your compiler and the actual CPU you're running on.

If performance is an issue, a few lines of assembler (using the carry flag) will blow all of the pure C approaches out of the water. If speed isn't an issue, I'd suggest going for simplicity.

Vote

Better C-Language Implementation of Multi-Word Addition

Join the Discussion

Didn't find your answer?