ugly math

OK, I've got a 64-bit unsigned binary number, in a pair of 32-bit registers, in an MC68332 CPU. Its maximum value is 1e13, which is 10 seconds measured in picoseconds, so the MS longword can only get up to about 2328 or something like that. I'm looking for an efficient way to turn this into a decimal ASCII string, so basically I need to convert it into BCD. The BCD output will need to have 14 digits. "Numerical Recipes" doesn't address mundane problems like this.

One way is to create a BCD buffer, zero it, map each nibble of the input through a 16-entry BCD lookup table, and sum all the BCD terms. The 68332 has a fairly efficient BCD add instruction, so adding a BCD table entry to the running sum isn't ghastly. The lookup table needs 12 entries (one per active nibble), each holding 16 BCD numbers, each being a 14-digit BCD number, namely 8 bytes, grand total 1536 bytes. Creating this table would be a minor nuisance; I'd need a jillion-digit calculator. Well, PowerBasic does have a 64-bit integer type, so maybe I can write a little Basic program to gen the tables. There's even a BCD data type, up to 18 digits.
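A little C program would do for the table generation, too, instead of PowerBasic. A rough sketch (names hypothetical): each entry [n][v] is the packed-BCD value of nibble value v in nibble position n, computed the slow way by division, which is fine at build time.

```c
#include <stdint.h>

/* Hypothetical table generator: entry [n][v] is the packed-BCD value
   of v << (4*n), i.e. nibble value v in nibble position n.
   12 positions x 16 values x 8 bytes = 1536 bytes, as estimated above. */
static uint64_t bcd_table[12][16];

/* Convert a binary value to packed BCD by repeated division --
   fine at table-generation time, where speed doesn't matter. */
static uint64_t to_bcd(uint64_t x)
{
    uint64_t bcd = 0;
    int shift = 0;
    while (x != 0) {
        bcd |= (x % 10) << shift;   /* post one decimal digit per nibble */
        x /= 10;
        shift += 4;
    }
    return bcd;
}

static void gen_table(void)
{
    for (int n = 0; n < 12; n++)
        for (int v = 0; v < 16; v++)
            bcd_table[n][v] = to_bcd((uint64_t)v << (4 * n));
}
```

At run time the conversion is then 12 table lookups plus 12 BCD adds of the selected entries, using the 68332's ABCD-style add on the 8-byte entries.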

One nice algorithm for binary-to-BCD is to just keep dividing by 10 and posting the remainders as the BCD digits, in reverse order of course. The 68332 can divide a 64-bit value by 10, with a remainder, but the quotient is limited to 32 bits, not enough. I could use this mode to convert the low 32-bit part to BCD, then use four nibble table lookups on the high half to finish things off. Since I also need a 32-bit-to-BCD thing somewhere else, maybe that isn't such a gross idea.

If the raw input could be scaled to 64-bit fractional format, successive multiplies by 10 will pump out digits, MS digit first. Gotta think about that one... it may have embarrassing rounding problems.
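The fractional idea can be sketched in C using a compiler's 128-bit extension (`unsigned __int128`, a GCC/Clang extension, standing in for the 64x64 multiply). The rounding problem is real: truncating the scaled fraction turns 0.01 into 0.00999..., so the sketch rounds the fraction *up* by one ulp instead, which keeps every digit correct for inputs below 1e14.

```c
#include <stdint.h>

/* Sketch of the fractional method: scale x (< 1e14) to a 64-bit binary
   fraction of 1e14, then each multiply by 10 pushes one decimal digit
   into the integer part, MS digit first.  The "+ 1" rounds the fraction
   up so truncation error never makes a digit come out one too low. */
void frac_to_ascii14(uint64_t x, char out[15])
{
    uint64_t f = (uint64_t)((((unsigned __int128)x << 64)
                            / 100000000000000ULL) + 1);
    for (int i = 0; i < 14; i++) {
        unsigned __int128 p = (unsigned __int128)f * 10;
        out[i] = (char)('0' + (unsigned)(p >> 64));  /* integer part */
        f = (uint64_t)p;                             /* keep fraction */
    }
    out[14] = '\0';
}
```

The error after k digits is at most 10^k * 2^-64, which stays comfortably below the 10^(k-14) spacing of the true fraction, so 14 digits come out exactly.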

Or maybe I could program our Xilinx FPGA to do a hardware divide, divide a 64 bit integer by 10 and feed me the remainders. Hell, maybe it could do the entire ascii conversion in hardware.

Oh, I'm programming in bare-metal assembly.

Anybody have any tricks here?

(You comp.arch guys take it easy on me, please. I'm just a simple engineer.)

John

Reply to
John Larkin

Something like this:-

Divide the 64-bit value by 1 million, then convert the quotient and remainder to ASCII strings.

(Divide by 10, push remainder, divide quotient by 10, push remainder, ..., store quotient, pop remainder, store, pop remainder, store ...)
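In rough C (untested sketch of this recipe, function name hypothetical): the single 64-bit division by a million splits the value into two halves that each fit in 32 bits (the quotient is at most 1e14/1e6, eight digits), and each half then needs only cheap 32-bit divides by 10.

```c
#include <stdint.h>

/* richard's recipe: one 64/32 division by a million, then ordinary
   32-bit divide-by-10 steps on each half.  Produces a fixed 14-digit,
   zero-padded string. */
void split_to_ascii14(uint64_t ps, char out[15])
{
    uint32_t quot = (uint32_t)(ps / 1000000u);   /* high 8 digits */
    uint32_t rem  = (uint32_t)(ps % 1000000u);   /* low 6 digits  */
    int i;

    out[14] = '\0';
    for (i = 13; i >= 8; i--) {      /* low half, LS digit first */
        out[i] = (char)('0' + rem % 10);
        rem /= 10;
    }
    for (i = 7; i >= 0; i--) {       /* high half */
        out[i] = (char)('0' + quot % 10);
        quot /= 10;
    }
}
```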

Reply to
richard mullens

I think you can use the same trick which the FORTH word M/MOD uses. This divides a 32-bit value by a 16-bit value, giving a 32-bit quotient and 16-bit remainder; but it does it using the U/ primitive, which is 32/16 = 16q + 16r.

Say you want to work out (a + 2^32*b) / n using your 64/32 --> 32q, 32r instruction.

Divide your high word first (set the top 32 bits to zero and move b into the lower 32): b/n = q1 + r1/n

Now do the low word, adding in the remainder from the above: (a + 2^32*r1) / n = q2 + r2/n

You can now form a 64-bit quotient and 32-bit remainder from the two results: (a + 2^32*b) / n = 2^32*q1 + q2 + r2/n
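The identity above, sketched in C with the host's 64/32 division standing in for the hardware 64/32 -> 32q, 32r instruction (function name hypothetical). Dividing the high word first guarantees both partial quotients fit in 32 bits: q1 = b/n < 2^32, and since r1 < n, (a + 2^32*r1)/n < 2^32 as well.

```c
#include <stdint.h>

/* Divide the 64-bit value (b:a) by 32-bit n using two 64/32 steps,
   per the M/MOD trick: high word first, then fold the remainder into
   the low word. */
void div64by32(uint32_t b, uint32_t a, uint32_t n,
               uint32_t *q_hi, uint32_t *q_lo, uint32_t *r)
{
    uint32_t q1 = b / n;                       /* b/n = q1 + r1/n       */
    uint32_t r1 = b % n;
    uint64_t d  = ((uint64_t)r1 << 32) | a;    /* a + 2^32*r1           */
    *q_hi = q1;                                /* quotient = 2^32*q1+q2 */
    *q_lo = (uint32_t)(d / n);
    *r    = (uint32_t)(d % n);
}
```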

Here's the code in FORTH:

$Colon MDivMod, 'M/MOD'    ; Unsigned 32/16 -> 32q 16r
; >R 0 R U/ ROT SWAP R> U/ ROT SWAP   ( l h d -- l h r )
;
; >R    l h           d
; 0     l h 0         d
; R     l h 0 d       d
; U/    l q r         d
; ROT   q r l         d
; SWAP  q l r         d
; R>    q l r d
; U/    q1H q2L r2
; ROT   q2L r2 q1H
; SWAP  q2L q1H r2

Reply to
Andrew Holme

store quotient, pop remainder, store, pop remainder,

Gadzooks, my 68332 can do that beautifully. That splits the number into two chunks I can process with the available 64/32 divide opcodes.

Don't need no stinkin' pushes and pops... I've got 16 registers!

Thanks!

John

Reply to
John Larkin

Thanks, interesting. Maybe I'll write the (32-bit) code in Macro-11 for reference later on. Should be useful for microcontrollers with h/w integer divide.

Richard

Reply to
richard mullens

store quotient, pop remainder, store, pop remainder,

Given the 64/32 bit divide, this is the best solution. Instead of storing all the BCD nibbles on the stack, or in registers, just write them to memory as you produce them. The trick is to write them 'backwards'. Set up an address register with the end of your ASCII buffer, write a zero byte with autodecrement, and write each nibble with autodecrement. When you're done, the register holds the start of the string in the correct order, terminated by zero.
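The backward-store scheme in rough C (pointer predecrement playing the role of -(An); function name hypothetical). You pass a pointer to the byte *past* the end of the buffer, and the return value is the start of the finished, zero-terminated string.

```c
#include <stdint.h>

/* Write decimal digits backwards from the end of the buffer, terminator
   first, exactly as described: when done, the returned pointer is the
   start of the string in the correct order. */
char *dec_backwards(uint64_t v, char *end)
{
    char *p = end;
    *--p = '\0';                      /* zero byte first */
    do {
        *--p = (char)('0' + (unsigned)(v % 10));
        v /= 10;
    } while (v != 0);
    return p;                         /* start of string */
}
```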

Reply to
Arlet

Attached is a FORTRAN program that converts internal binary back to ASCII decimal, for an FHT (Hartley) multi-million-digit multiply routine. As such, the input (large) number was "sampled" into sequences of 7 digits or less, converted to binary, then FHT, convolve, inverse FHT, and this used to convert back to ASCII in the same-sized sequences. The secret to this speedy code (attached) is a 579-Kbyte look-up table (actually two tables, ASK4 and ASK5) placed in (labelled) common by an ASM "program" which I could send to you if you wish.

Reply to
Robert Baer

"John Larkin" wrote in message news: snipped-for-privacy@4ax.com...

Why does it need to be efficient? Code-space efficient or execution-time efficient? For display purposes, a refresh rate of 10 times per second is good enough. A lookup table with 1, 10, 100, 1000 up to 1000000000000 converted to 64-bit values, and then subtract those values until you hit 0. Or a larger table, with 1,2,3,4,5,6,7,8,9,10, 20,30,40,50 and so on, with compares and subs. Such a table would hold 9x16x8 bytes (1152) of data.
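The larger-table idea in rough C (sized 14 positions x 9 multiples x 8 bytes = 1008 bytes here; names hypothetical): for each digit position, find the largest precomputed multiple d*10^k that still fits, subtract it once, and d is the digit. No divides or multiplies in the conversion loop itself.

```c
#include <stdint.h>

static uint64_t sub_tab[14][9];      /* [position][digit-1] = d * 10^k */

static void init_sub_tab(void)
{
    uint64_t p = 1;
    for (int k = 0; k < 14; k++, p *= 10)
        for (int d = 1; d <= 9; d++)
            sub_tab[13 - k][d - 1] = (uint64_t)d * p;
}

void subtab_to_ascii14(uint64_t x, char out[15])
{
    for (int k = 0; k < 14; k++) {
        int d = 0;
        while (d < 9 && x >= sub_tab[k][d])
            d++;                     /* compares only; one subtract below */
        if (d > 0)
            x -= sub_tab[k][d - 1];
        out[k] = (char)('0' + d);
    }
    out[14] = '\0';
}
```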

Or use a different hardware counter, to count the picoseconds, lsb part of it in binary, msb part of it in decimal.

--
Thanks, Frank.
(remove 'q' and '.invalid' when replying by email)
Reply to
Frank Bemelman

store quotient, pop remainder, store, pop remainder,

You can of course use a 64/32->32,32 (result/remainder) in a loop to divide an arbitrarily long bitstring by a 32-bit number.

If this needs to be efficient, then it is much faster to do the long divisions by 1e9 (which fits in 32 bits), then use regular ASCII conversion code to convert each 9-digit part into ASCII/BCD digits.

If time is _really_ critical, then you should search for my _fast_ 10-digit 32-bit binary to ascii conversion algorithm, which I posted on usenet (in an asm newsgroup) many years ago. It is fast enough that AMD 'borrowed' it without attribution for their code optimization manual! :-(

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

You don't provide much context for this problem, so I'll just drop this into the thread. If you have a C compiler available, why not just port the Calc program (supports arbitrary precision arithmetic) to your computer?


Dave Feustel

Reply to
dave

Seems overkill to use arbitrary precision when only 64 bits are needed. Most modern C compilers have a 64 bit type, usually implemented as 'long long'.

In that case, a simple one line call:

sprintf( asc, "%lld", bin )

will do the requested conversion.

Of course, the OP said he wants to use assembler, presumably because the available memory does not allow a few kB for the library calls. For code efficiency, it's hard to beat the solution proposed by another poster, where you do an initial 64/32-bit division by 1 million, followed by repeated 32-bit divisions by 10 to get the BCD digits.

Reply to
Arlet

I agree. I have found calc to be quite useful on OpenBSD, although it's probably not useful if the OP is developing code for an embedded system.

Reply to
dave

Ha, I've just implemented such a thing in C to display the speed of my motor controller on a dsPIC30, but it uses a modulus and divide per digit, which is probably too slow for you if you are doing it in assembler, as you probably need speed, else why do it in assembler.

But here's an idea I just made up to do it quickly with no divides or multiplications. It uses lots of code, but it's very fast I would think, executing only 4 compares and 1 subtract per decimal digit (78 ops).

It's arranged like a binary decision tree, each decision splitting the problem into two approximate halves.

Pseudo code :- (well, OK, it's C actually, and totally untested!)

unsigned64 result = 0;
x = input;
if (x >= 5e12)
  if (x >= 7e12)
    if (x >= 9e12)
    { result &= 9
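Not colin's exact code (which was cut off above), but the idea can be sketched for one digit position: pick off the 10^12-place digit with a compare tree instead of a divide, reading the thresholds d*10^12 from a table so there are no multiplies either. At most 4 compares and 1 subtract per digit, as claimed. (Names hypothetical.)

```c
#include <stdint.h>

static const uint64_t t12[9] = {
    1000000000000ULL, 2000000000000ULL, 3000000000000ULL,
    4000000000000ULL, 5000000000000ULL, 6000000000000ULL,
    7000000000000ULL, 8000000000000ULL, 9000000000000ULL
};

/* Returns the 10^12-place digit of *x (x < 1e13) and subtracts it off,
   via a binary decision tree: worst case 4 compares, 1 subtract. */
int top_digit_1e12(uint64_t *x)
{
    uint64_t v = *x;
    int d;
    if (v >= t12[4]) {                       /* 5..9 */
        if (v >= t12[7])      d = (v >= t12[8]) ? 9 : 8;
        else if (v >= t12[6]) d = 7;
        else                  d = (v >= t12[5]) ? 6 : 5;
    } else {                                 /* 0..4 */
        if (v >= t12[2])      d = (v >= t12[3]) ? 4 : 3;
        else if (v >= t12[1]) d = 2;
        else                  d = (v >= t12[0]) ? 1 : 0;
    }
    if (d > 0)
        *x = v - t12[d - 1];
    return d;
}
```

The same tree, with a threshold table per digit position, repeats for the remaining 13 digits.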

Reply to
colin


I spotted a few mistakes already, as I made some changes just before posting; x is an unneeded temp and should be replaced by input. I assumed input is actually

Reply to
colin

store quotient, pop remainder, store, pop remainder,

My thoughts exactly.

Here's my code, based on Richard's suggestion, but dividing by 1e9 instead of 1e6. I haven't read it through enough, so there's likely a bug or so. I usually code bugs here and there, but catch them on read-throughs, before I actually run the code.

ACORN is separately callable to convert 32-bit integers.

John

.SBTTL . ASCII CONVERSIONS

; GIVEN A 64-BIT TIME IN PICOSECONDS, CONVERT THAT INTO AN ASCII
; STRING IN OUR "BCD" STRING BUFFER.

; INPUT IS IN D0:D1, MAX VALUE 99,999,999,999,999, 14 DIGITS!

; WE'LL BREAK D0:D1 INTO TWO CHUNKS BY FIRST DIVIDING BY 1E9

ATIME:  MOVE.L  #1000000000, D3    ; ONE WHOLE BILLION! WOW.
        DIVU.Q  D3, D0:D1          ; DO THAT DIVIDE

; D0 (REMAINDER) HOLDS THE LOW 9 DECIMAL DIGITS, 999,999,999 MAX
; D1 (QUOTIENT) NOW HOLDS THE NUMBER OF BILLIONS, 10,999 MAX

        MOVEA.W #BCX, A0           ; AIM AT END OF 'BCD' BUFFER, +1
        MOVE.W  #9-1, D7           ; REQUEST 9 DIGITS
        MOVE.L  D0, D6             ; COPY CORRESPONDING CHUNK
        BSR.S   ACORN              ; CONVERT THEM, POKE INTO BUFFER

        MOVE.W  #5-1, D7           ; MAKE READY FOR BIG 5 DIGITS
        MOVE.L  D1, D6             ; COPY THE GIGAS
                                   ; AND FALL INTO FINAL CONVERSION

; THIS LITTLE SUB CONVERTS THE CONTENTS OF D6 INTO
; AN ASCII STRING.

; D7 IS NUMBER OF DIGITS TO OUTPUT, DBF STYLE
; D6 HOLDS UNSIGNED LONG TO CONVERT
; A0 AIMS AT START, THE *LS* DIGIT OF THE STRING+1,
; MEANING WE'LL WORK BACKWARDS, -(A0) MODE

ACORN:  MOVE.L  #10, D3            ; RADIX CONSTANT FOR HUMANOIDS

ACRID:  CLR.L   D5                 ; MAKE INPUT INTO 64-BIT INT
        DIVU.Q  D3, D5:D6          ; DIVIDE, REMAINDER IN D5
        ADDI.W  #"0", D5           ; CONVERT TO ASCII 0..9
        MOVE.B  D5, -(A0)          ; POKE A NEW DIGIT

        DBF     D7, ACRID          ; AND LOOP
        RTS

; ON EXIT, A0 AIMS AT THE MS CHARACTER
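A C model of the ATIME/ACORN pair above, handy for checking the assembly against known values on a PC (names mirror the asm labels; the DIVU.Q is modeled with ordinary / and %).

```c
#include <stdint.h>

/* ACORN: emit exactly ndigits decimal digits of v, LS digit first,
   working backwards through the buffer like -(A0). */
static void acorn(uint32_t v, int ndigits, char **p)
{
    for (int i = 0; i < ndigits; i++) {
        *--(*p) = (char)('0' + v % 10);
        v /= 10;
    }
}

/* ATIME: split by 1e9, convert the low 9 digits, then the top 5.
   'end' points one past the last digit of the 14-byte buffer. */
void atime(uint64_t ps, char *end)
{
    char *p = end;
    acorn((uint32_t)(ps % 1000000000ULL), 9, &p);  /* low 9 digits */
    acorn((uint32_t)(ps / 1000000000ULL), 5, &p);  /* top 5 digits */
}
```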

Reply to
John Larkin

store quotient, pop remainder, store, pop remainder,

I'd love to see that; could you possibly repost here?

Is there a competition for the *slowest* binary to decimal-ascii conversion? I think I may have written a candidate, for the 6802 uP, a while ago.

John

Reply to
John Larkin

Partly for elegance. The box has a 38.4 kbaud serial interface, and users can interrogate it for current settings, so I would like to not be speed-limiting the communications. I guess it would be OK to do the 64-bit binary to 14-char ASCII conversion in a couple of milliseconds, which ain't hard on a 68332 no matter how you do it. The other efficiency is my coding time.

I posted what looks like a nice routine, based on Richard's divide breakup, followed by using the 68332's very nice 64/32 divide with remainder opcode.

I've done that, on machines without the divide opcode, but the divide thing just spits out ASCII digits in a 5-opcode loop, a few microseconds per character.

I considered having my FPGA guy do the whole thing for me in VHDL, sort of as a math coprocessor, but the split divide thing looks fine.

John

Reply to
John Larkin

Yeah, I expect the entire application (managing an embedded digital delay generator) to come in around 6 kilobytes or so, including serial command parser, 'help' text, FPGA loader, flash manager, and of course the hardware management.

John

Reply to
John Larkin

one simple way is to have a powers of ten table:

dq 10000000000000,1000000000000,100000000000,10000000000,1000000000
dq 100000000,10000000,1000000,100000,10000,1000,100,10,1

Start by seeing how many times you can subtract the first entry -- that's your top digit. Continue doing the same for each succeeding table entry. Quite fast, and no divides needed.
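The subtract-and-count scheme in rough C (names hypothetical): for each power of ten, count how many times it can be subtracted; that count is the digit. At most 9 subtract-and-compare rounds per digit, no divides.

```c
#include <stdint.h>

static const uint64_t pow10_tab[14] = {
    10000000000000ULL, 1000000000000ULL, 100000000000ULL, 10000000000ULL,
    1000000000ULL, 100000000ULL, 10000000ULL, 1000000ULL,
    100000ULL, 10000ULL, 1000ULL, 100ULL, 10ULL, 1ULL
};

void subcount_to_ascii14(uint64_t x, char out[15])
{
    for (int i = 0; i < 14; i++) {
        int d = 0;
        while (x >= pow10_tab[i]) {  /* subtract until it no longer fits */
            x -= pow10_tab[i];
            d++;
        }
        out[i] = (char)('0' + d);
    }
    out[14] = '\0';
}
```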

Reply to
Ancient_Hacker

I know you're using assembly, but I can't be bothered writing it out in assembly (certainly not tight and fast code) at this stage. But here's an idea in rough C, that you should be able to translate well enough.

static const uint64 bcds[8][256] = {
    { 0, 0x0001, 0x0002, ..., 0x0255 },
    { 0, 0x00000256, 0x00000512, 0x00000768, ..., 0x00065535 },
    { 0, 0x00065536, 0x00131072, ..., 0x16711425 },
    ...
    { 0, 0x0281474976710656, ... },   // Overflows after 35th entry
    { ... }   // Row 8 is all overflows, and can be skipped altogether
};

uint64 intToBcd(uint64 x)
{
    uint64 sum = 0;
    int i;
    uint8 *p = (uint8 *)&x + 7;   // Point to LSB (big-endian)

    for (i = 0; i < 7; i++) {     // skip the MSB (row 8 of the table)
        sum = bcd64add(sum, bcds[i][*p--]);
    }
    return sum;
}

The idea is to have 8 lookup tables, one for each byte position in the 64-bit number (excluding the MSB, as that will certainly overflow - you could skip some of the other bytes too, as you don't need the full 64 bits). Each table gives the BCD value of each of the 256 values for a byte in that place. You decompose the original 64-bit int into bytes, and build up the BCD sum. If you want to save space, you could use 16 4-bit lookup tables instead of 8 8-bit tables.
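The bcd64add helper above is left undefined; if no hardware BCD add is handy, one possible portable implementation (an assumption, not part of the original post) is the classic nibble-carry trick: pre-bias every nibble by 6 so decimal carries propagate like binary ones, then remove the bias from each nibble that did not produce a carry.

```c
#include <stdint.h>

/* Packed-BCD add of two 64-bit (up to 15-digit) BCD values. */
uint64_t bcd64add(uint64_t a, uint64_t b)
{
    uint64_t t1 = a + 0x0666666666666666ULL;   /* pre-bias nibbles      */
    uint64_t t2 = t1 + b;                      /* binary add            */
    uint64_t t3 = t1 ^ b;
    uint64_t t4 = t2 ^ t3;                     /* carry-outs per nibble */
    uint64_t t5 = ~t4 & 0x1111111111111110ULL; /* nibbles w/o carry-out */
    uint64_t t6 = (t5 >> 2) | (t5 >> 3);       /* each 1-bit becomes 6  */
    return t2 - t6;                            /* remove unused bias    */
}
```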
Reply to
David Brown
