32bit multiplication using TTL logic

I want to design what would have been a super computer in 1975, using parts that would have been "easily" available. Several people have done this already, just not on an insane scale. My favorite is the Magic-1, which I'm currently designing PCBs for. It has been a GREAT learning experience studying the schematics from the Magic-1. I never thought much about what a computer did during the mysterious clock cycles between instruction cycles, and now with the microcode muxes, registers, it all just makes sense.

Any way, I need my computer to be very good at math. The magic-1 is a

16bit Add/sub/compare machine. I want to make mine capable of 32bit operations and have a 32 bit data path.

I'm starting with 32bit multiplication because it gets 32bit addition out of the way as well.

I threw together a diagram of a 32*32 multiplier using around 1040 AND gates and 400 74283 Adder circuits. This schematic would be what I think is called "asynchronous".

Once I completed my diagrams I remembered something horrible. There is a LOT of time spent waiting for the carry-in-out propagation in the hundreds of adders which leads me to building an adder out of XOR and AND gates. This comes at the added expense of power consumption board space, but should speed things up by about 250x. Could anyone suggest the best 1 bit addition "block" with carry? Its pretty late here and I'm starting to loose brain function.

Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to make a display...which I haven't finished yet... (building the drivers)

One last thought for those of you who, well, you know who you are. Idle, what would the power consumption of some 600 odd ICs be? 20ma per device? This just might make more heat than a pentium!

Now...for floating point... ;)

Reply to
logjam
Loading thread data ...

You really need (in this order) a girlfriend and an FPGA development kit.

Reply to
Anthony Fremont

In the good old days, as I remmeber, Mulipications /dvisions were done by sofwtare ( OS libraries ), Some Aritmetics unit did it Async, most of the time using some microcode or a hardwwre ( that's even older ) but none did it Sync ( 32 bits in parall ) Not sure butseems to to me 32bits* 32 bits = 64bits ??

Regards

Reply to
zoot

Bipolar Integrated Technologies, I think they were in Beaverton, Oregon or nearby, used to make pure high-speed floating point ALU chips. Might see if there is a white paper or two floating around from them.

Also, the PDP-11 had some entries added later on which supported floating point. It's possible that someone has already developed some Verilog or VHDL for such a beast and you might be able to look at the code for ideas.

Finally, this was posted over on comp.arch.embedded and it may help you think more closely about the class of individuals you are soon to join:

formatting link

;)

Jon

Reply to
Jonathan Kirwan

The fabled "Machina Analytica"! Terminal blocks for the address and data busses! Relays everywhere! St. Leibowitz and the Venerable Boedullos are smiling somewhere, Jonathan.

But is that a 32K X 8 SRAM I see?...

"...He turned to the community, spread his hands, shrugged eloquently.

" 'It's still not Him', he told them sourly, then hobbled away.

"Afterwards, there was little formality."

Chris

Reply to
Chris

The fabled "Machina Analytica"! Terminal blocks for the address and data busses! Relays everywhere! St. Leibowitz and the Venerable Boedullos are smiling somewhere, Jonathan.

But is that a 32K X 8 SRAM I see?...

"...He turned to the community, spread his hands, shrugged eloquently.

" 'It's still not Him', he told them sourly, then hobbled away.

"Afterwards, there was little formality."

Chris

Reply to
Chris

I found his web page just a month or two ago also. But the relatively short life of a relay worries me. I found a few auctions on ebay for

3000 relays at ~$200, but I don't want to hand wire such a thing. :) SOMEONE needs to make a relay based web server. Maybe it only says "Hello World", which would probably take about a minute or two...

When I thought about it last night I came up with a new solution. A compromise between speed and board space. This is my first experience with binary multiplication, so my first implimentation was very wasteful!

I wanted to have a 64bit adder circuit with parallel carry generation, but I forgot about fan-in/out. Getting to the end of the adder circuit would require a LOT of components too. Maybe 16bit parallel by 4 serial is more reasonable.

If I try to multiply 1111 by 1111 I should get 11100001.

Carry bit | multiplicand | multiplier

Start:

0 0000 1111 - Next Op, P0=1, add 1111 0000 to 0000 1111 0 1111 1111 - Next Op, shift right 1 bit 0 0111 1111 - Next Op, P0=1, add 1111 0000 to 0111 1111 1 0110 1111 - Next Op, shift right 1 bit 0 1011 0111 - Next Op, P0=1, add 1111 0000 to 1011 0111 1 1010 0111 - Next Op, shift right 1 bit 0 1101 0011 - Next Op, P0=1, add 1111 0000 to 1101 0011 1 1100 0011 - Next Op, shift right 1 bit 0 1110 0001 - 4 repetitions? yes Done

Requirements are a register to store the 32bit multiplicand, 64bit flip flop shift right register (carry out from adder gets shifted in), possibly a 64bit register to store the product/working value.

This would require the least amount of parts, but limit me to multiplication with the circuit. It would be pretty fast since it only needs a 32bit adder instead of a 64bit.

In the end it might be worth having a separate multiply / shift right circuit and then an additional add / subtract circuit. It all depends on the speed path of the full 64bit adder.

ALU flow would probably come from a fast ROM or even SRAM loaded by the host at boot-up.

Anything I've missed that could speed up this operation? I'm interested in reducing components, but not at a huge speed drop. The process will take 32 loops, so 10ns more per loop is a lot.

Reply to
logjam

What's really neat for me, I think... it's right here in my city!! So I may hope to meet him and watch it work.

Jon

Reply to
Jonathan Kirwan

hehe.

Nor would I.

Well, remember that each and every "bit" of memory requires a lot of space and handiwork to make. It would be excessively large, memory wise. I think anyone doing something like this, with relays, is going to be very conscious of getting the most from the least effort. And so a lot of thought will go into that aspect. (Which is why I'd like to visit, because I'm sure I'd learn a few things developed out of that kind of close thinking about where each additional relay added has the most payoff in the result.)

I think a lot of folks have been down the same path you are going, as well. Trade-offs between combinatorial and sequential approaches. Particularly, in this area of division, multiplication, and so on.

When thinking about an adder, though, why not take this to an extreme here and think about doing it with a single full adder and sequential logic? One bit at a time. Between that and a fully combinatorial solution lays a lot of fun.

First off, just by way of interesting information on the side, in multipliers there is a trade-off in combinatorial vs synchronous implementations. A 4x4's combinatorial equivalent gate area just about equals a sequential implementation of the same thing, in fact. Below a 4x4, combinatorial is actually less gates. Above 4x4, sequential is less gates. Your 4x4 suggestion here is right on the cusp. You can go either way.

Since you are talking sequential here, the common approach I've studied is Booth's algorithm and "shift-and-add." For shift and add, the structure looks like:

4 X ================================\\ \\\\ 4 ,-------, || Y =======>| YR | || '-------' /===\\ || ||4 // \\\\ || || || || || \\/ \\/ || || ,-----, ,-----, || || \\ \\ / / || || \\ \\ / / || || \\ \\/ / || || /\\ / || || / \\ / || || 1/ '----' // || / || // || / ||4 // 4 || 8 / || /====== || =========> ANSWER | || // || // | || // || //4 v \\/ // \\/ // ,---, ,--------, ,--------, 0-->| C |--->| ALUR |--->| XR | '---' '--------' '--------' /\\ || 0000

You start out through your execution control logic (not shown), by latching X into XR, Y into YR, 0000 into ALUR, 0 into C, and some sequential step counter (not shown) to an initial value of 4. That counter will have to be able to decrement and test for 0 (not shown.)

That is step (0).

Then the following logic is followed:

(1) if LSB of XR is 1, latch C:ALUR from ALU. (2) shift right, "0":C:ALUR:XR (3) decrement counter (4) if counter not zero, go to step (1) (5) Result is ALUR:XR

YR C ALUR XR counter (0) 1111 0 0000 1111 4 (1) 1111 0 1111 1111 4 (2) 1111 0 0111 1111 4 (3) 1111 0 0111 1111 3 (1) 1111 1 0110 1111 3 (2) 1111 0 1011 0111 3 (3) 1111 0 1011 0111 2 (1) 1111 1 1010 0111 2 (2) 1111 0 1101 0011 2 (3) 1111 0 1101 0011 1 (1) 1111 1 1100 0011 1 (2) 1111 0 1110 0001 1 (3) 1111 0 1110 0001 0 (5) ============

Is that about what you are talking about here? It only requires a

4-bit adder to produce an 8-bit output.

Of course.

Have you looked at Booth's?

I don't want to spend a lot of time thinking more about this. It's all been done, somewhere. Have you looked over the literature on the subject?

Jon

Reply to
Jonathan Kirwan

The IBM 704 did floating point with tubes!

formatting link

John

Reply to
John Larkin

I'd like to see it in operation, too. Obviously a rather extensive object lesson in building an ALU -- arcane, but very appropriate to his classwork. No one will ever accuse him of not taking his job seriously. ;-)

There are four relays for the clock -- I'm wondering if he's got a three-phase or two phase clock.

Clock speed = 10Hz? Mirth still lives in our world -- if nothing else, Professor Porter has a sense of humor.

Chris

Reply to
Chris

If you guys want to see/hear a much bigger relay computer, look here:

formatting link

Click on Downloads and then listen to all of the sounds. In one of them you can here the electric typewriter banging out the answer! :)

1500 relays vs 400
Reply to
logjam

Well, I've drawn the 4 bit adder in my schematic program and I'm not sure the board space consumption from a 32bit parallel carry adder are worth the very little speed increase.

I want to make sure that I'm using parts that were available in the late 70s. Does anyone know of a list of parts and their introduction date? Or an old databook in PDF format?

Reply to
logjam

formatting link

Cheers! Rich

Reply to
Rich Grise

Look up "Carry Lookahead". :-)

Have Fun! Rich

Reply to
Rich Grise

logjam wrote: : I want to design what would have been a super computer in 1975, using : parts that would have been "easily" available. Several people have : done this already, just not on an insane scale. My favorite is the : Magic-1, which I'm currently designing PCBs for. It has been a GREAT : learning experience studying the schematics from the Magic-1. I never : thought much about what a computer did during the mysterious clock : cycles between instruction cycles, and now with the microcode muxes, : registers, it all just makes sense.

: Any way, I need my computer to be very good at math. The magic-1 is a : 16bit Add/sub/compare machine. I want to make mine capable of 32bit : operations and have a 32 bit data path.

: I'm starting with 32bit multiplication because it gets 32bit addition : out of the way as well.

: I threw together a diagram of a 32*32 multiplier using around 1040 AND : gates and 400 74283 Adder circuits. This schematic would be what I : think is called "asynchronous".

: Once I completed my diagrams I remembered something horrible. There is : a LOT of time spent waiting for the carry-in-out propagation in the : hundreds of adders which leads me to building an adder out of XOR and : AND gates. This comes at the added expense of power consumption board : space, but should speed things up by about 250x. Could anyone suggest : the best 1 bit addition "block" with carry? Its pretty late here and : I'm starting to loose brain function.

: Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to : make a display...which I haven't finished yet... (building the drivers)

: One last thought for those of you who, well, you know who you are. : Idle, what would the power consumption of some 600 odd ICs be? 20ma : per device? This just might make more heat than a pentium!

: Now...for floating point... ;)

While some others have suggested changes like using a carry-lookahead adder for the final carry-propagate adder in the multiplier, I would suggest that you also look into Booth encoding (the two techniques can both be used.) At the expense of a little bit of extra complexity to generate the partial products, you can cut the number of rows of partial products approximately in half. As far as using a CLA for the final adder, I think that that's a good idea, because, if you are hell-bent on building this out of TTL parts,there is a 4-bit TTL carry-lookahead block (74x182?, if I recall.)

So, a Booth-Array Carry-Save portion, followed by a Carry-Lookahead Carry Propagate portion, will be fast, and do-able, if you're really hell-bent on making this out of TTL parts. I also agree with the other poster that said that you really should get yourself a FPGA development platform, but whatever floats your boat!

Have Fun,

Joe

Reply to
<jwelser

Yup. Crazy. Nuts. Certifiable. Bonkers. Wacko. Insane. ;-P Stupid? Remains to be seen. ;-D ;-D ;-D

Get the integers going first! ;-)

[enlightening discussion of carry techniques snipped]

I've drawn a schematic using Xilinx's WebPack, of a 4x4 multiplier:

formatting link

The ANDs are just plain ANDs, and the ALU-looking thingies are nothing but 1-bit full adders.

And I've been procrastinating on doing enough of their tutorials to figure out how to do the guzintas[1] and the guzoutas[2] and schtuff. ;-)

Cheers! Rich [1] Goes Into [2] Goes [comes] Out Of

Reply to
Rich Grise

Do you think I will gain anything useful while trying to use the fewest parts to generate the result fast? From what I've seen so far of some FPGA desings they're pretty wastful "because you can".

Reply to
logjam

I think I've fallen in love with 74S274 and 74S275. I don't think the

130ns for a 32 bit product includes addition, maybe it does. I wonder why these parts had to be killed off???

The parts are pretty expensive...$1.99 for a 274 and 2.99 for a 275, but compared to a few hundred logic gates??? Using these chips I might be able to get more than a million multiplications per second.

Reply to
logjam

Here some links I found helpful:

formatting link
formatting link
formatting link

I think I like the combinational method used in the 1976 databook. Very fast, exceeds everything else I've found.

I have two choices...use parts FROM 1975 for the circuit, or make it out of modern 283s, etc. Using the origional 74S parts I estimate 2A power consumption.

I've found list prices at:

74S274s for 1.99, I would need 64 of them 74S275s for 2.99, (I haven't quite figured this one out yet, somewhere around 15)

I would need 4 74F283s for every 74S274, but they're pretty cheap. In the end it would be smarter to go with 74S274s even though 256 16 pin devices on a PCB would look cool. I could go either way on this one. :) According to the 1975 ttl book a 74S283 completes a carry in 11ns and a 74S274 4x4 in 45ns, so I assume the 74S274 is pretty much 4

74S283 and 16 AND gates The critical speed path is carry-in-out on all the 64 partial products is typ 5ns for all the AND gates and then 34ns through the adders (maximum of 7.5ns AND and 48ns adders).

As long as the circuit would work fine with the parts from the 75 I'm happy. I don't need to buy a bunch of NOS chips, but I want to make sure I account for the delay of the old chips.

I'll work a little more on this and post some schematics that hopefully one of you can look over. :) This is pretty exciting for me. :)

Reply to
logjam

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.