Combinatorial Division?

- Josh Rosen

Sun, Feb 26, 2006 12:45 AM

This is the two-bit-at-a-time algorithm I was talking about. Back in the '70s it was the most cost-effective way to implement division. Modern machines, and FPGAs, have fast multipliers, so convergence division is the best choice today.
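"Convergence division" isn't spelled out here. One common form is Goldschmidt division, which multiplies the numerator and denominator by the same factor until the denominator approaches 1, using only multiplies. Below is a minimal floating-point sketch of that idea. The function name `goldschmidt_divide` and the six-iteration default are illustrative assumptions, not anything from this thread.

```python
def goldschmidt_divide(n: float, d: float, iterations: int = 6) -> float:
    """Approximate n / d by Goldschmidt convergence division.

    Assumes d has already been normalized into [0.5, 1.0), as a hardware
    divider would do by shifting the divisor's leading one into place
    (and shifting n by the same amount).
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be normalized into [0.5, 1.0)")
    for _ in range(iterations):
        f = 2.0 - d   # correction factor: if d = 1 - e, then d * f = 1 - e**2
        n *= f        # the two multiplies are independent, so hardware
        d *= f        # can perform them in parallel
    return n          # d is now very close to 1, so n approximates the quotient
```

Dividing 3 by 12, for example, first scales both operands by 1/16 (a shift in hardware), giving 0.1875 / 0.75, which converges to 0.25. The error term squares on every pass, so the number of correct bits roughly doubles per iteration.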

- fpga_toys

Sun, Feb 26, 2006 1:45 AM

Is the two-bit-at-a-time algorithm a Booth-recoding analog, or a variable-shifter algorithm?


Except for the fact that logjam is "I'm building a 64bit ALU using standard TTL devices."

- Josh Rosen

Sun, Feb 26, 2006 2:17 AM


It's been 25 years since I've used it, so I don't remember the details precisely. As I recall, there is a lookup on the MSBs of the numerator and the denominator, and it only required two or three bits of each. With that two-bit quotient approximation, you subtract the product of the quotient and the denominator. I think you can combine it with a Booth algorithm. As I said, I don't remember all of the details, but it wasn't very hard to figure out. I'll leave it as an exercise for the reader.
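The lookup-based quotient selection described above is essentially SRT-style radix-4 division. As a simpler stand-in, the sketch below keeps the two-quotient-bits-per-step structure but chooses each digit by comparing directly against d, 2d and 3d, rather than by a lookup on a few MSBs. It is unsigned-only, and `radix4_divide` is a hypothetical name.

```python
def radix4_divide(dividend: int, divisor: int, width: int = 64) -> tuple[int, int]:
    """Unsigned division producing two quotient bits per step.

    A simple radix-4 restoring sketch: each step picks the quotient digit
    q in {0, 1, 2, 3} by comparing the partial remainder against the
    multiples d, 2d, 3d. SRT-style hardware would instead pick q from a
    small lookup on the top bits of remainder and divisor, and use a
    redundant digit set so that an estimate is good enough.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if width % 2:
        raise ValueError("width must be even")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend out of range")
    multiples = [0, divisor, 2 * divisor, 3 * divisor]  # 3d is precomputed once
    remainder = 0
    quotient = 0
    for shift in range(width - 2, -2, -2):
        # bring down the next two dividend bits; remainder stays below 4 * divisor
        remainder = (remainder << 2) | ((dividend >> shift) & 3)
        q = 3
        while multiples[q] > remainder:
            q -= 1
        remainder -= multiples[q]
        quotient = (quotient << 2) | q
    return quotient, remainder
```

Each loop iteration retires two quotient bits, so a 64-bit divide takes 32 steps instead of 64. The price is the extra 3d multiple and the three-way comparison, which is the work the MSB lookup is meant to shrink.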

- fpga_toys

Sun, Feb 26, 2006 2:56 AM

OK. I picked this thread up because writing a portable intrinsic for FpgaC was on my list this weekend. And it's been a few years since I last had to do this too :)

All of the optimized software routines are built around processor-specific and table-lookup-specific designs, which aren't exactly FPGA-friendly. So I guess it's time to dig out the textbooks again :)

I wish all the FPGAs, even the smaller ones, had multipliers; unfortunately, most of the small FPGAs on student boards just have LUTs/FFs.

- Peter Alfke

Sun, Feb 26, 2006 3:39 AM

Well, Spartan-3 and 3E have lots of multipliers, and they are about as cheap as any FPGA. Your wish has been granted... Peter Alfke

- fpga_toys

Sun, Feb 26, 2006 6:37 AM

Wow ... how did you make 6 years of smaller XC4K, Spartan and Spartan2 boards just disappear?

- Philip Freidin

Sun, Feb 26, 2006 9:44 AM

A very detailed description of a two's complement divide, including the management of status flags and a microcode listing for the AM2903 and 29203 bit-slice processor products, is documented by AMD in their 2900 data book.

You can read about it (and see if I remember correctly) in this PDF file:

formatting link

(it is a 32 MB file)

on page 2-51

The 32-bit AM29332 and 29C322 non-slice microcoded ALUs also implemented these divide algorithms (PDF at the same site).

I think, though, that these are one-bit-at-a-time algorithms. These chips did do two-bit-at-a-time modified-Booth multiplies.
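For reference, a two-bit-at-a-time modified-Booth (radix-4) multiply recodes the multiplier into digits in {-2, -1, 0, +1, +2}, which halves the number of partial products. Below is a minimal sketch, assuming a two's-complement multiplier of even width. `booth_radix4_multiply` is an illustrative name, and this is not taken from the AMD documentation.

```python
def booth_radix4_multiply(x: int, y: int, width: int = 16) -> int:
    """Signed multiply using radix-4 (modified) Booth recoding of y.

    y is treated as a width-bit two's-complement number. Each overlapping
    bit triplet (y[2i+1], y[2i], y[2i-1]) recodes to a digit in
    {-2, -1, 0, +1, +2}, so only width/2 partial products are needed and
    each one is just 0, +/-x or +/-2x (a shift and an optional negate).
    """
    if width % 2:
        raise ValueError("width must be even")
    bits = y & ((1 << width) - 1)   # two's-complement bit pattern of y
    product = 0
    prev = 0                        # implicit y[-1] = 0
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digit = -2 * b1 + b0 + prev # Booth digit in {-2, ..., 2}
        product += (digit * x) << i # partial product weighted by 4**(i // 2)
        prev = b1
    return product
```

A 16-bit multiplier therefore needs 8 partial products instead of 16. The top triplet's -2 weight on the sign bit is what makes signed operands come out right without a separate correction step.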

Another source of info on these multistep algorithms would be the AM29000 RISC processor documentation, that had both multiply and divide step instructions.
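A divide-step instruction typically performs one iteration of a one-bit-per-step shift-and-subtract loop, and software calls it repeatedly. The exact AM29000 step semantics are not reproduced here. As a generic illustration, this sketch models unsigned non-restoring division, one quotient bit per loop iteration. `nonrestoring_divide` is a hypothetical name.

```python
def nonrestoring_divide(dividend: int, divisor: int, width: int = 32) -> tuple[int, int]:
    """Unsigned division, one quotient bit per step, non-restoring.

    Each step either subtracts or adds the divisor depending on the sign
    of the partial remainder, so no step ever has to undo ("restore") a
    failed subtraction. A single final correction fixes the remainder.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend out of range")
    r = 0   # signed partial remainder
    q = 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        if r >= 0:
            r = ((r << 1) | bit) - divisor   # shift in next bit, subtract
        else:
            r = ((r << 1) | bit) + divisor   # shift in next bit, add back
        q = (q << 1) | (1 if r >= 0 else 0)  # quotient bit = sign of result
    if r < 0:  # final remainder correction
        r += divisor
    return q, r
```

Each iteration of the loop body is the kind of work a single divide-step instruction would do. Because the add-or-subtract choice depends only on the sign of the previous remainder, every step has the same fixed latency.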

Philip Freidin

- Peter Alfke

Sun, Feb 26, 2006 4:30 PM

Are you trying to be nasty, funny or whatever? I answered your complaint about the alleged lack of cheap boards with dedicated multipliers. I did not make anything older disappear. But those old boards have no dedicated hardware-implemented combinatorial multipliers. That's all. Peter Alfke, from home.

- Isaac Bosompem

Sun, Feb 26, 2006 5:41 PM

The SuperH series of CPUs also has a divide-step instruction.

- logjam

Sun, Feb 26, 2006 10:43 PM

That's right. I've learned a lot about microcode and processor instruction sets from trying to design my own ALU. For example, a compare instruction is usually a subtract without store.
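To make "a subtract without store" concrete, here is a sketch that computes a - b, keeps only the flags, and discards the difference. The flag names and the carry-as-borrow convention are assumptions (x86 style; ARM, for instance, sets C when there is *no* borrow). `compare_flags` is a hypothetical helper.

```python
def compare_flags(a: int, b: int, width: int = 64) -> dict[str, bool]:
    """Model CMP a, b as SUB a - b that updates flags but discards the result.

    Flag conventions vary by architecture; this sketch uses x86-style
    carry-as-borrow. Operands are reduced to width-bit values.
    """
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    diff = (a - b) & mask          # the result a real SUB would store
    return {
        "Z": diff == 0,            # operands equal
        "N": bool(diff & sign),    # sign bit of the difference
        "C": a < b,                # borrow out: unsigned a < b
        # signed overflow: operand signs differ and result sign differs from a
        "V": bool((a ^ b) & (a ^ diff) & sign),
    }
```

Conditional branches then read only the flags. With this convention, unsigned "less than" is C, and signed "less than" is N != V.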

Probably the coolest thing I've done yet is add FFFFFFFFFFFFFFFF to 0, apply the carry input, and watch the Altera simulator propagate the carries. :)
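Adding FFFFFFFFFFFFFFFF to 0 with the carry input set is the worst case for a ripple-carry adder: every bit position propagates the carry, so it has to travel all 64 stages. The bit-level model below (hypothetical name `ripple_carry_add`) returns the sum and carry-out. It also reports the longest generate/propagate carry path as a rough stand-in for the delay a simulator would show.

```python
def ripple_carry_add(a: int, b: int, carry_in: int, width: int = 64) -> tuple[int, int, int]:
    """Bit-by-bit model of a ripple-carry adder.

    Returns (sum, carry_out, longest_chain). longest_chain counts the most
    consecutive bit positions a single carry travelled through, a rough
    proxy for the adder's critical path on these inputs.
    """
    total = 0
    carry = carry_in & 1
    chain = longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y                   # this bit creates a carry by itself
        propagate = x ^ y                  # this bit passes an incoming carry on
        total |= (propagate ^ carry) << i  # full-adder sum bit
        if generate:
            chain = 1                      # a fresh carry path starts here
        elif propagate and carry:
            chain += 1                     # the incoming carry ripples through
        else:
            chain = 0                      # carry path is broken
        carry = generate | (propagate & carry)
        longest = max(longest, chain)
    return total, carry, longest
```

For the case above, `ripple_carry_add(0xFFFFFFFFFFFFFFFF, 0, 1)` gives a zero sum, a carry-out of 1, and a 64-position chain. Carry-lookahead and carry-select adders exist to shorten exactly this path.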

One thing I have that 1975 didn't is a P4 3.2 GHz PCB autorouter, and a way to test the design to pretty much guarantee it will work. :)

That sounds very interesting. Do you have a project web page, journal, or anything like that?

An interesting thing to think about... how long do you think it will be before through-hole devices are discontinued? What will the electronics hobby look like? Computer programming classes today are in Java and Visual Basic. I don't think the local university even teaches C.

One last note: I live in Alaska. It's cold and snowing right now. What else am I supposed to do? :)

- Peter Alfke

Sun, Feb 26, 2006 11:08 PM

For high-pin-count and/or high-performance circuits, through-hole has been dead for more than a decade. Through-hole means 100 mil (~2.5 mm) pin spacing, which becomes hopeless above 200 pins. Add to that the bondwire and lead-frame inductance, and it just cannot support today's top requirements of speed and signal integrity. The trend is to flip-chip, bonding the chip surface directly to the package, without any bondwires. That also means ball grid arrays, which are great for professional assembly but a killer for the hobbyist. I really feel sorry about this, for I grew up designing and soldering (ham) radio gear. The combined smell of hot solder and burned fingertips is still in my nose. But it is not for the future. Just like carburetors, generators, breaker points and grease nipples on cars. Fond memories, some not so fond... Peter Alfke

- Isaac Bosompem

Sun, Feb 26, 2006 11:33 PM

I read a site somewhere about a couple of students who built a full 32-bit CPU out of standard 74-series parts. Obviously it was a huge power guzzler, but it worked nonetheless!

As for myself, I hand-built a Z80 SBC about two years ago, and it still works today :) I clocked the CPU at 2.45 MHz (the same clock feeds the USART), and it has 2 KB of flash, 32 KB of RAM and a single 8-bit output port. It is a nice development system. I wrote an IEEE 754 FP library in Z80 assembly. It was relatively painless, since I am fairly comfortable with the instruction sets of the x86 and scores of other CPUs.

I just don't have the cash to purchase all the TTL components to do it your way (broke student syndrome, it is well documented :) ). But I did design a 16-bit CPU in VHDL and used the 200K-gate Spartan-3 to implement it. It is working fine too; I just tested it last night in the FPGA. The CPU I designed is relatively complex (its ISA definitely has some CISC elements to it) and only took about 30-40% of the space in my FPGA, so I have a TON of space to implement peripherals and the like. I plan on writing an assembler and C compiler for it; I definitely need to learn how those work. I also used BlockRAM to implement the registers (banked) and the stack space, so I took a speed hit (in terms of clocks per instruction), but that is no problem, since it was my first CPU design. Also, let me make it clear that I DO NOT work for Xilinx. The Spartan-3 devkit I bought was very affordable, and I am very happy with the amount of features you get for the price.

Forge on, and show us pics, lots of juicy pics :)

- Michael Hennebry

Mon, Feb 27, 2006 12:43 AM

The long path shouldn't be any longer than 64 times the length of the short path. The latency wouldn't change much, but in a pipelined environment, the rate could change a lot.

If one knows the division is exact, one can make a divider similar to a ripple carry array multiplier:

formatting link

Arrange for the divisor to be odd.
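The linked diagram is not reproduced here, but the idea can be sketched under the assumption that the division really is exact. Once the divisor is odd, the low bit of the running remainder alone determines each quotient bit, working LSB first. Each row is then a conditional subtract and a shift, mirroring the conditional-add-and-shift rows of an array multiplier. `exact_divide` is a hypothetical name.

```python
def exact_divide(n: int, d: int, width: int = 64) -> int:
    """LSB-first division, valid only when d divides n exactly.

    Once d is odd, the low bit of the running remainder alone decides each
    quotient bit (low bit 1 means d must be subtracted to clear it), so every
    step is a conditional subtract and a shift: one row of an array, much like
    the conditional-add-and-shift rows of an array multiplier.
    """
    if d <= 0 or n < 0:
        raise ValueError("expects n >= 0 and d > 0")
    while d & 1 == 0:        # arrange for the divisor to be odd;
        if n & 1:            # an exact dividend shares its trailing zeros
            raise ValueError("division is not exact")
        d >>= 1
        n >>= 1
    q = 0
    r = n
    for i in range(width):   # one "row" per quotient bit
        bit = r & 1
        q |= bit << i
        r = (r - bit * d) >> 1   # r - bit*d is even, so the shift is exact
    if r != 0:               # r * 2**width == n - q * d
        raise ValueError("division was not exact (or width too small)")
    return q
```

No quotient-digit guessing or comparison is needed, which is why the structure can be as regular as a multiplier array. The catch is that it gives garbage (here, an error) when the remainder is not zero.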

If one calculates a reciprocal using Newton-Raphson, each iteration roughly doubles the number of significant bits. The amount of hardware needed to unroll the loop will not be much more than that required for the last iteration. The delay will also be dominated by the delay in the last iteration.

6 or 7 iterations should be sufficient. After taking the reciprocal, one must multiply by it. The amount of hardware required will be roughly twice that required for multiplication. The delay will also be dominated by the delay for the last iteration. I suspect Newton-Raphson is not the OP's idea of simple.
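Below is a sketch of the reciprocal iteration x <- x * (2 - d*x). It uses exact rationals so that the error behavior is not masked by floating-point rounding. The linear seed 48/17 - (32/17)d (error at most 1/17 on [1/2, 1)) is a common textbook choice and is an assumption here. With that roughly 4-bit seed, four iterations already exceed 64 bits; a cruder seed needs more, in line with the 6 or 7 above. `nr_reciprocal` is a hypothetical name.

```python
from fractions import Fraction


def nr_reciprocal(d: Fraction, iterations: int) -> Fraction:
    """Newton-Raphson estimate of 1/d for d normalized into [1/2, 1).

    With error e = 1 - d*x, each step maps e to e**2, so the number of
    correct bits roughly doubles per iteration.
    """
    if not Fraction(1, 2) <= d < 1:
        raise ValueError("d must be normalized into [1/2, 1)")
    x = Fraction(48, 17) - Fraction(32, 17) * d  # linear seed, |e| <= 1/17
    for _ in range(iterations):
        x = x * (2 - d * x)                      # two multiplies per iteration
    return x
```

The quotient then comes from one more multiply, e.g. `n * nr_reciprocal(d, 4)`. That final multiply plus the two per iteration is where the "roughly twice the hardware of a multiplier" estimate comes from when the loop is unrolled.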

A Wallace tree is probably not best for multiplying large numbers. Given 64-bit factors and single-bit partial products, the Karatsuba formula is probably useful.

formatting link
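Below is a software sketch of Karatsuba's split. It replaces the four half-width products of schoolbook multiplication with three, plus some additions and subtractions; in hardware terms, that means three smaller multiplier arrays instead of four. `karatsuba` and its `cutoff_bits` threshold are illustrative choices, not taken from the linked page.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 16) -> int:
    """Multiply non-negative integers with Karatsuba's three-multiply split."""
    if x < 0 or y < 0:
        raise ValueError("sketch handles non-negative operands only")
    if cutoff_bits < 1:
        raise ValueError("cutoff_bits must be at least 1")
    n = max(x.bit_length(), y.bit_length())
    if n <= cutoff_bits:
        return x * y                  # small enough: use the base multiplier
    half = n // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    # one extra product replaces the two cross products x_hi*y_lo + x_lo*y_hi
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The middle product works on (half + 1)-bit sums, so a hardware version needs slightly wider arrays for that term. Whether the saved multiplier outweighs the extra adders depends on the operand width and the target.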

- Eric Smith

Mon, Feb 27, 2006 2:51 AM

Huh? I'm writing this email on a computer that has a 940-pin through-hole processor with three gigabit interfaces to I/O subsystems. The pin spacing is less than 100 mils.

- fpga_toys

Mon, Feb 27, 2006 5:26 AM

The funny thing is that BGA brings SMT to the hobbyist. High-density flat packs have a pitch so narrow that the parts cannot be hand-placed on paste, or easily soldered by hand even with a stereo microscope, although it is possible for smaller-pin-count devices like memory.

I've shown the home-brew computer group here how to reliably solder BGAs, and reball them, so they can use salvage at low cost. Placing and soldering a 400-700 ball BGA is a piece of cake, while hand-soldering the SDRAM and EEPROMs for the design is REALLY painful for TSOPs.

It's actually easier to do powerful hobby designs in BGA than it was in DIP parts.

- Falk Brunner

Mon, Feb 27, 2006 1:17 PM

Naaa. Some years ago I did my own Spartan-II demo board, using a TQ144 package with 0.5 mm pitch. It was not so difficult. OK, you need some training and the right soldering iron, but after a while it will be no big deal.

I wouldn't agree. DIP is still easier to handle, but sure, if you want to achieve a given functionality, a BGA device with tons of power inside can easily replace a bunch of standard Euro cards filled with legacy DIP stuff.

Regards Falk


- Jan Panteltje

Mon, Feb 27, 2006 3:22 PM

On a sunny day (26 Feb 2006 15:33:20 -0800) it happened "Isaac Bosompem" wrote in :

Hey, Z80, cool. I built a Z80 system in the eighties and needed an OS too, so I wrote a CP/M emulator for it, a disassembler, and practically any application software you can think of (it even has an audio editor). Then I wrote a multitasking kernel for the Z80 that ran text windows and a mouse... then the 64 KB was full.

formatting link

Diagrams are there too. The thing is in the attic: two Euro-card backplanes with CPU, I/O (EPROM programmer), serial I/O, DRAM RAM disk, VDU, and more I cannot remember.... lots of plug-in Euro cards. But honestly, I would not want to go back to the Z80 today. I should take some pictures some day; the EPROMs are probably duff by now... For 1 M$ you can buy it and the rights to the CP/M emulator for embedded ;-) However, the multitasker is still on a 5-inch floppy, and I have no way these days to make a copy. That dz80 disassembler was actually one of my first C programs, and it shows.... People seem to be using it, though.

- Derek Simmons

Mon, Feb 27, 2006 4:39 PM

I kind of came late to this party, but you might be interested in the TI datasheet (if you can find it) for the 74ACT8867. It was a high-speed math processor. Four of them were used in the GE6 geometry engine of the SGI VGX graphics system (pre-RealityEngine).

If you can find the data sheet or anybody has it, I'd appreciate it if you would share it with me.

Derek

- logjam

Mon, Feb 27, 2006 9:12 PM

I soldered a PQFP144 package with one of those dull-pointed Radio Shack irons. I used a lot of solder flux and dragged the tip along the four sides of the chip. It worked great. Out of the two chips I soldered, I had one bridge (which I found before the smoke test ;) ). I haven't had to do one of those in a few years, but now I have a much better soldering station, so hopefully I could do better this time. :)

I was thinking about a transistor based computer last night. What kind of memory can I use in a transistor computer without cheating? Core memory? Should I just have a goal of "transistor CPU" and consider the ram a peripheral?

One guy built a relay computer but used a 62256 for ram. :)

- Isaac Bosompem

Mon, Feb 27, 2006 9:52 PM


Neat site. Your system is far more complex than mine; from the schematics I can see you are using some form of memory banking? I opted out of using banking and DMA to make it easy for me to build. I will take a look at your source. Thanks for sharing! :)

-Isaac