I needed to multiply U128 = U96 * U32 on an ARM the other day, and did a hack job in C. It is logically correct, but it probably takes 3-10 times as long as assembly-language.
Are there any large integer and similar libraries out there for the ARM?
The GMP is a great resource, of course, but it isn't very applicable to embedded systems because of:
(a)Dynamic allocation.
(b)Variable-size operands, which mean looping and branching, which causes pipelined RISC machines not to perform as well as they would if the code were written in a linear way.
I've been working on untangling the OpenSSL bignum library free from OpenSSL. I've found it confusing because of a lot of upward dependencies, and the fact that the ./configure intermingles option setting for the whole package. One target that may be nearby is to group the options that only bignum needs into a single .h file.
If there is enough synergy where you feel we might be able to work together or exchange technical information, please write me directly.
My feeling based on the embedded work I've done is that there need to be two variants of the library--classic ARM and Thumb. That assessment may need to be adjusted downward or upward.
I'm still working to get my head around the physical requirements for packaging a release. It seems as though a source->.a process would suit my needs best. That way, ARM/THUMB can be decided project-by-project.
I've just started some due diligence and registered with github.
Somebody there has under a BSD license which might be what we're both looking for.
Otherwise, it might be a plan to put up a BSD licensed project based on the BSD licensed OpenSSL code.
I have the same thoughts. Assuming that ".a" is the standard suffix for an ARM library, a library would be built from the smallest compileable modules (typically a single function each), and the linker will automatically bring in only what it needs to satisfy references.
With few exceptions, the source code would consist of source files that each contain a single function or a single constant or variable data structure.
There would have to be two libraries--one for ARM classic and one for THUMB.
Nearly all of the assembly-language modules would be different for the ARM vs. the THUMB libraries, but nearly all of the C functions would be the same for both libraries, with the development tool settings compiling for ARM or THUMB.
Thanks. I will look at that.
Thanks. I'm not sure about the BSD license. I'll look at that.
I believe it was Steve Ballmer of Microsoft who once wrote that "open-source licenses attach themselves like a cancer to anything they touch".
I disagree with that point of view (I fully support open-source development), but they just aren't practical for embedded systems.
Someone who buys a washing machine doesn't want to see information about software licenses anywhere, including the manufacturer's web page. It doesn't help them to use or understand the product.
Open-source is great. Open-source licenses applied to small embedded systems is not.
A library (.a) with a header file is a convenient format to use - and as you say, each source file should only contain a few functions so that only those needed are linked in.
However, this will lead to sub-optimal code because it does not allow cross-function and cross-file optimisations. If you are happy giving out the source code, then people can use that to get the smallest and fastest code.
Where possible, it is best to write such a library in C, using inline assembly when it makes a big difference. If you can make your functions as inlineable C code, then it will often result in smaller and faster object code, as the compiler can do things like re-arrange registers, re-use results, avoid function call overheads, etc. When you make the routines in fixed assembly, the compiler is forced to move data into and out of registers to suit the assembly code - this overhead can quickly outweigh the benefits of the tuned assembly code.
Where you do need assembly, it is often worth writing it as inline assembly using advanced syntaxes supported by the compiler. Then at least the compiler has the freedom to pick registers and include the code in its optimisations. gcc syntax is supported by several compilers, including (obviously) all gcc-based tools and CodeWarrior. I don't know if other tools like IAR and Keil support them, or if they have their own syntax.
Don't take advice about open-source licenses from Steve Ballmer - the only thing he knows about them is that they might slightly reduce his gazillion dollar income. Anything he says is pure FUD.
What you are thinking about here is pure GPL - if you make your library GPL, then the user (i.e., another programmer) has to give the end user the right to all the source code, and the person buying the washing machine gets confusing information about the software license.
With BSD-style licenses, no restrictions are imposed on your users (other developers), and the end user knows nothing about it. The only requirement is that no one can remove the copyright notices you put on the files and claim they wrote them. It's a perfectly good license for embedded libraries. (And Microsoft likes it too - they took the BSD-licensed network stack from FreeBSD to form the first version of the Windows TCP/IP stack. Apple likes it even more - they took the entire BSD-licensed FreeBSD OS, put a nice face on it, and called it "OS X".)
An alternative is to use a modified GPL which forces developers to publish enhancements and modifications to the library code, without affecting anything in their own code. See FreeRTOS for an example.
That's an exaggeration. It's true that OS X contains some FreeBSD code, but Apple certainly did not "take FreeBSD and put a nice face on it".
OS X is based on NeXTSTEP, which itself was based on CMU's Mach 2.0 operating system. NeXTSTEP took Mach and replaced the existing Unix compatibility layer with new code that included ports of both FreeBSD and NetBSD modules. Apple chose NeXTSTEP to be the basis for the new Mac OS and purchased NeXT to get it.
[And yes, I know Steve Jobs founded NeXT - after he was ousted from Apple. Jobs came back to Apple after a 10 year absence as part of the deal to purchase NeXT. Jobs did not retake control of Apple until the company narrowly avoided bankruptcy 2 years later.]
//**************************************************************************************************** //Add together the terms. The best way to envision this is the way that children are sometimes //taught to multiply with a rectangle and then adding diagonally. // //Word [0] of the result. No addition is required, because this term comes from the least significant //term alone. out_u128->w[0] = term_96w0_32w0.w[0]; // //Word [1] of the result. temp_u64 = (APP_BIGINT_U64_NATIVE_T)term_96w1_32w0.w[0] + (APP_BIGINT_U64_NATIVE_T)term_96w0_32w0.w[1]; carry = temp_u64 >> 32; out_u128->w[1] = temp_u64;
//Word [2] of the result. temp_u64 = (APP_BIGINT_U64_NATIVE_T)carry + (APP_BIGINT_U64_NATIVE_T)term_96w2_32w0.w[0] + (APP_BIGINT_U64_NATIVE_T)term_96w1_32w0.w[1]; carry = temp_u64 >> 32; out_u128->w[2] = temp_u64;
//Word [3] of the result. temp_u64 = (APP_BIGINT_U64_NATIVE_T)carry + (APP_BIGINT_U64_NATIVE_T)term_96w2_32w0.w[1]; out_u128->w[3] = temp_u64; }
IMHO, the biggest economy, in terms of code density, comes from the umlal instruction (32x32->64_mul followed by add_64). (On SH-4, it's all done "by hand" using 4-6 instructions.)
FWIW, here's the assembly code gcc 4.6.1 generates for Cortex-M3. What CPU are you targeting? ARM or Thumb mode?
The possibility of a carry makes me nervous. I'd first like to verify that the sum
can't exceed 64 bits and generate a carry.
It doesn't need to be verfied that
can't generate a carry, because there is a simple proof of that relying on the operand sizes and the fact that this is the most significant product.
Squaring the maximum 32-bit integer yields:
(2^32-1) * (2^32-1) = 2^64 - 2^33 + 1.
If the second term is also a maximum square, adjusting for the placement, its maximum value is:
2^96 - 2^65 + 2^32
The question of whether there can be a carry is equivalent to the question of whether.
2^96 - 2^65 + 2^32 + 2^64 - 2^33 + 1 < 2^96
Looking at the terms, this is definitely the case.
OK, I believe that
cannot generate a carry. This seems plausible because one of the arguments is 32-bits ... I'm just not sure about how larger multiplications (more terms to add) would work out.
I believe your implementation is correct.
Below is the source code and assembly-language for my original plus your improved version.
//**************************************************************************************************** //Add together the terms. The best way to envision this is the way that children are sometimes //taught to multiply with a rectangle and then adding diagonally. // //Word [0] of the result. No addition is required, because this term comes from the least significant //term alone. out_u128->w[0] = term_96w0_32w0.w[0]; // //Word [1] of the result. temp_u64 = (APP_BIGINT_U64_NATIVE_T)term_96w1_32w0.w[0] + (APP_BIGINT_U64_NATIVE_T)term_96w0_32w0.w[1]; carry = temp_u64 >> 32; out_u128->w[1] = temp_u64;
//Word [2] of the result. temp_u64 = (APP_BIGINT_U64_NATIVE_T)carry + (APP_BIGINT_U64_NATIVE_T)term_96w2_32w0.w[0] + (APP_BIGINT_U64_NATIVE_T)term_96w1_32w0.w[1]; carry = temp_u64 >> 32; out_u128->w[2] = temp_u64;
//Word [3] of the result. temp_u64 = (APP_BIGINT_U64_NATIVE_T)carry + (APP_BIGINT_U64_NATIVE_T)term_96w2_32w0.w[1]; out_u128->w[3] = temp_u64; }
By changing to one temporary variable (rather than 3) the compiler seems to have shaved out two instructions (one at the start to allocate stack space and one at the end to deallocate it).
I believe that the definition of C guarantees that
temp = expression + temp;
will use the "old" value of temp before updating the new one, so that should be OK.
Here is the new code and assembly-language (with two fewer instructions).
I really appreciate the help. The C version you provided is SUBSTANTIALLY more efficient.
ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here.
All logos and trade names are the property of their respective owners.