Pipelined 6502/z80 with cache and 16x clock multiplier

There is another feature at play here, AFAIR. When the indirect chain is interrupted, the original instruction is abandoned, the PC is saved, and a context swap is done. When the interrupt has been serviced and the machine gets around to scheduling the instruction again, the whole chain has to start over and be evaluated from the beginning.

With a sufficiently long chain on a sufficiently memory-starved machine this set of events may never terminate.
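
To make that failure mode concrete, here is a toy simulation (not modelling any particular machine; the chain length, frame count and LRU policy are just assumptions) of an instruction that touches three pages, gets restarted from scratch after every fault, and only ever has two frames to work with:

/* Toy model: an indirect chain touching 3 distinct pages, a machine with
 * only 2 page frames, LRU replacement, and "restart the instruction from
 * scratch after every fault".  The instruction never completes. */
#include <stdio.h>

#define FRAMES 2
#define CHAIN  3          /* pages touched by the indirect chain, in order */

static int frame[FRAMES];      /* which page each frame holds, -1 = empty  */
static int stamp[FRAMES];      /* last-use time, for LRU                   */
static int now;

static int touch(int page)     /* returns 1 on hit, 0 on fault             */
{
    int i, victim = 0;
    for (i = 0; i < FRAMES; i++)
        if (frame[i] == page) { stamp[i] = ++now; return 1; }
    for (i = 1; i < FRAMES; i++)        /* fault: evict least recently used */
        if (stamp[i] < stamp[victim]) victim = i;
    frame[victim] = page;
    stamp[victim] = ++now;
    return 0;
}

int main(void)
{
    int attempts, step;
    for (step = 0; step < FRAMES; step++) frame[step] = -1;

    for (attempts = 1; attempts <= 20; attempts++) {
        for (step = 0; step < CHAIN; step++)
            if (!touch(step)) break;    /* fault: abandon and restart       */
        if (step == CHAIN) {
            printf("completed after %d attempts\n", attempts);
            return 0;
        }
    }
    printf("still not finished after 20 attempts: livelock\n");
    return 0;
}

With restart-from-the-beginning semantics the fault on the last page always evicts a page needed at the start of the next attempt, so the instruction never completes.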

I certainly don't miss the quality issues with hardware from the era before RISC processors, RAID and real networks.

-- mrr

Reply to
Morten Reistad

So you much prefer the current failure modes? Yes, they are much rarer, but typically FAR more evil when they occur - just as with modern versus older automobiles. If you do a proper cost-benefit analysis (i.e. using game theory, not benchmarketing), modern systems aren't as much better as most people think.

Some of that could be improved by proper documentation and not just some recipes to follow when all goes well, and more could be improved by putting more resources into better and more pervasive diagnostics, but some of the degradation is fundamental. Where timing problems were rare and obscure, now they are common and ubiquitous.

Even 40 years ago, it was EXTREMELY rare to have to cancel a whole project because of a failure mode IN PRODUCTION EQUIPMENT which couldn't be located or even reduced to a tolerable level, but nowadays it is merely unusual. In a few decades, it may even become common.

Regards, Nick Maclaren.

Reply to
nmm1

In comp.arch snipped-for-privacy@cam.ac.uk wrote: ...

...

There is something to this. :)

A couple of decades back embedded work was fairly straightforward. Components may have been trivial and slow, but because of that hooking them together was generally straightforward, and the "mental model" needed to get things to work as expected was simple, too. You didn't need (as now) to rely on masses of very buggy documentation to make progress.

I remember a few projects in the "early days" making microprocessors (Z80, 6809, 68k, and even the odd 8080 in the *very* early days) do things they were never "designed" for, and generally ending up with something that did a job reliably. There were still quite a few "undocumented features" you'd run across, but they tended, if anything, to provide shortcuts rather than roadblocks.

Just a couple of years back I worked on an embedded system to provide simultaneous data, SMS and multi-channel voice over a 3G network. Not only was the wireless module quirky (I am being charitable), with at least 50% of the functionality in its executive summary undocumented and maybe not entirely thought out, but the large multinational responsible seemed uncooperative in getting our product past the prototype stage. If it weren't for some arm twisting from our arm-twisting dept vis-a-vis some regional company rep, the project would have foundered. Timing issues abounded, and the basic design of the module seemed calculated to make operation unreliable at best.

After various people assured us the provided documentation was completely up to date, the regional rep managed to send us tantalising photocopies of clearly more recent documentation that described features we needed to co-ordinate operations. Not that it entirely worked as described. :)

We ended up just having to wear the concurrency issues and put in a few "grand mal" resets, plus numerous sleeps and timeouts with empirically determined max parameters, etc., at judicious points to try to discourage, and then recover from, various races, deadlocks and starvations.
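
In outline it amounted to wrappers like the following (a sketch only; the module_* calls are hypothetical stand-ins for the real, undocumented vendor API, stubbed here so it compiles, and the constants are the sort of empirically determined values mentioned above, not recommendations):

/* Sketch of the "sleeps, timeouts, then grand-mal reset" pattern described
 * above.  The module_* calls are invented stand-ins for the real wireless
 * module API, stubbed so the sketch compiles and behaves flakily. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CMD_TIMEOUT_MS 3000     /* per-attempt wait, found by experiment */
#define MAX_RETRIES    5        /* soft retries before a hard reset      */

/* Stubbed vendor API: randomly flaky, like the real thing. */
static bool module_send(const char *cmd) { (void)cmd; return rand() % 4 != 0; }
static bool module_poll(int timeout_ms)  { (void)timeout_ms; return rand() % 3 != 0; }
static void module_hard_reset(void)      { puts("hard reset of module"); }

static bool send_with_recovery(const char *cmd)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        if (module_send(cmd) && module_poll(CMD_TIMEOUT_MS))
            return true;                 /* got a sane response          */
        usleep(200 * 1000);              /* back off, let races settle   */
    }
    module_hard_reset();                 /* nothing worked: start over   */
    return false;
}

int main(void)
{
    printf("AT command %s\n", send_with_recovery("AT+CSQ") ? "ok" : "failed");
    return 0;
}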

The development of consumer-level products is largely a matter of stage magic. Provided the end user (or even your supervisor :) doesn't know exactly what your gadget is doing, it can *appear* to work fine. As in the music hall, a bit of misdirection in the form of a "simplified explanation" or two, a few flashing LEDs, and a couple of potted "information messages" can convince observers the product is not only doing its job but miraculously exceeding design specs.

Just -- *please* -- don't look behind the curtain.

--
Generally, an empty answer. Try again.
  -- John Stafford , 08 Dec 2010 10:16:59 -0600
Reply to
kym

But if you do a proper systems analysis, they are. Because they are cheap, you can have multiple systems. With different components.

And we have tools to handle faults.

We can use raid for disks. And multiple power sources.

Done right, we can afford to throw one out.

One PPOE had a principle of _always_ having separate implementations of all critical systems, running as live as possible. I learnt a lot from that. We even found a floating point bug in hardware.

But the point you are making is important. The open hardware movements are important, because we need the transparency.

It is not just that the driver works with Linux. It is that you can actually see what it is doing.

And yes, we have to be a lot more proactive on this front.

-- mrr

Reply to
Morten Reistad

It is evident that this poster never handled SMD or ESMD disks, large X.25 network devices, MAU-based Token Ring, or pre-internet multiplexing equipment.

The 6502 and the other 650x processors had a lot of surprises, and they were not exactly a showcase in terms of documentation.

Bad designs exist everywhere. But get the contract right, and they have to deliver, or perish.

The extreme top-down Telco model for implementation never worked. Not then, not now. Read RFC 875 for an ideological handle on it.

Perhaps you are ready for the internet model of consensus and working systems now?

-- mrr

Reply to
Morten Reistad

That seems _very_ similar to

LEA EAX,[EBX+10]

on an x86, which also has separate address calc and integer execution paths.

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

The last sentence is the key:

Yes, you _CAN_ have redundant systems with different components, but I have yet to see a single vendor who will certify and/or recommend this!

Instead they want you to make sure that the hardware and software are as identical as possible on each node, significantly increasing the risk of a common-mode hardware problem hitting all nodes at the same time.

E.g. NetWare's System Fault Tolerant setup mirrored the state between two servers, so that the slave could take over more or less immediately (i.e. well within the software timeout limits). I always wanted those two servers to use totally separate motherboards, CPUs, disk and network controllers, etc., but was told that the HW had to be identical. :-(

[snip]

That's very interesting, I'll have to get the full story from you at some point in time. :-)

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

I am afraid that you have completely missed the point. To a very good first approximation, any problem that is localised within a single component is trivial; the hard ones are all associated with the global infrastructure or the interfaces between components. And remember the 80:20 rule - eliminating the 80% of the problems that account for only 20% of the cost isn't a great help.

Even worse, almost all of the tools to handle faults are intended to make it possible for a trained chimpanzee to deal with the 80% of trivial faults, and completely ignore the 20% of nasty ones. In a bad case, the ONLY diagnostic information is through the tool, and it says that there is no problem, that the problem is somewhere it demonstrably isn't, or is similarly useless.

Let me give you just one example. A VERY clued-up colleague had a RAID controller that went sour, so he replaced it. Unfortunately, the dying controller had left the system slightly inconsistent, so the new controller refused to take over and wanted to reinitialise all of the disks. Yes, he had a backup, but it would have taken a week to do a complete reload (which was why he was using a fancy RAID system in the first place).

He solved the problem by mounting each disk, cleaning it up using an unrelated 'fsck', manually fiddling a few key files, and then restarting the controller. Damn few people CAN do that, because none of the relevant structure was documented, and the whole process was unsupported.

I have several times had a problem where even the vendor admitted defeat, and where a failure to at least bypass the problem would mean that a complete system would have had to be written off before going into production. Each took me over a hundred hours of hair-tearing.

Well, yes. I regret not having access to a range of systems any longer - inter alia, it makes it hard to check code for portability.

I fully agree with that.

Regards, Nick Maclaren.

Reply to
nmm1

If you want real redundancy you need sufficient separation between systems, and transparency in failover methods.

Today this eliminates tightly coupled systems, like RAID controllers, "intelligent" switches and fancy hardware failovers.

RAID controllers, EtherChannel, multiple power supplies and separate processors are used, but the performance benefits are as important as the redundancy. The redundancy (or really, the extra uptime) these bring is "nice to have", but not something to depend on.

For real redundancy you need separate power (different mains feeds, at least), separate networks, and physical separation.

A "trust me" tool, tightly coupled to the system, without transparency, from a single vendor.

At what point in the deployment did this show up?

I keep Linux, FreeBSD and OpenBSD around. And in the cases where we only have Linux support I deliberately install 64-bit systems in location A and 32-bit in location B.

-- mrr

Reply to
Morten Reistad

One of my customers runs their systems on various HW platforms (both big- and little-endian) on different base operating systems, and there should not be much of a problem implementing the same functionality on different HW.

Using platform diversity did not create much interest, since, after all, the same application-level software would be used.

The only publicly known truly redundant software that I have heard of is the US Space Shuttle, with more or less triple (voting) flight control computers and a 4th independent computer, programmed by a different team, capable of (only) landing the Shuttle.

Reply to
upsidedown

I think this is a requirement for all modern fly-by-wire (commercial) airplanes:

I.e. you have two identical sets of flight control computers which should normally agree. These run the full-blown software, capable of optimally flying the plane during all phases of a flight.

On top of this you have a totally separate implementation which is intentionally kept as simple as possible: just the bare minimum to let the plane fly and land.

The idea is obviously that as long as the primary computers agree, they stay in control, but when/if they don't, you have a fallback option which still works.
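
A toy sketch of that arbitration, in case it helps (the command type, the agreement tolerance and both control laws are invented for illustration; real flight control logic is of course far more involved):

/* Two identical primaries that must agree, with a deliberately simple
 * backup taking over when they don't.  Everything here is invented. */
#include <math.h>
#include <stdio.h>

#define AGREE_TOL 0.01          /* how closely the primaries must agree */

typedef struct { double elevator; } command_t;

static command_t primary_a(double pitch_err) { return (command_t){ 0.8 * pitch_err }; }
static command_t primary_b(double pitch_err) { return (command_t){ 0.8 * pitch_err }; }
/* Backup: crude proportional law, just enough to keep flying. */
static command_t backup(double pitch_err)    { return (command_t){ 0.5 * pitch_err }; }

static command_t select_command(double pitch_err)
{
    command_t a = primary_a(pitch_err), b = primary_b(pitch_err);
    if (fabs(a.elevator - b.elevator) <= AGREE_TOL)
        return a;               /* primaries agree: they stay in control */
    return backup(pitch_err);   /* disagreement: fall back to simple law */
}

int main(void)
{
    printf("elevator command: %.3f\n", select_command(0.2).elevator);
    return 0;
}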

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

A 64-bit 7600 would be a Cray-1 :)

Seriously, the max memory addressing of the 7600 was ~2 Megabytes (256K 60-bit words), small enough to put right on the chip, and without page table & tag lookups it could probably be accessed in 10 cycles (which would be consistent with the 6600/7600). Then you could treat DDR3 as ECS (OK, the interface would have to be changed; the old interface maxed out at 4 Megawords IIRC (~32 Megabytes)).

The biggest performance improvement would be to re-think the PPs, but then you have a lot of code to rewrite, since a great deal of the OS was located there. Of course, the console code would have to be rewritten anyway; display technology has taken a different turn since then.

- Tim

Reply to
Tim McCaffrey

On Jan 4, 4:40 am, Terje Mathisen wrote: [snip]

Of course, this will not properly handle a systematic error in the primary (identical) computers. (I suppose one might have an additional system that detects misbehavior and hands control over to the alternate control computer, but by the time such is detected the problem might be beyond the ability of a simple system to handle [perhaps the primary control computers would take over again after a reset??]. ISTR that there was a failure mode in which doing nothing for a brief time would return the system to a controllable state. [In an extreme circumstance--in which reasonable actions do not help and disaster seems unavoidable--random or counter-intuitive actions might be worth trying.])

Paul A. Clayton just a technophile

Reply to
Paul A. Clayton

This is incorrect.

The shuttle has 5 computers. Four are used as a 4-way voting system and one is used as a safety net. After the first failure it degrades to a triple-redundant system. If a second failure occurs before the first computer has recovered, all control passes to the 5th computer, running different software. {All written in HAL}
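
For illustration only (the real shuttle GPC logic was far more involved), a bare-bones vote over four channels that drops a dissenter and hands control to the backup when no usable majority remains might look like this; the outputs, thresholds and handover rule are invented:

/* Bare-bones N-way voting with degradation, loosely in the spirit of the
 * scheme described above.  All values here are made up. */
#include <stdbool.h>
#include <stdio.h>

#define CHANNELS 4

static bool healthy[CHANNELS] = { true, true, true, true };

/* Vote: majority of the healthy channels wins; any healthy channel that
 * disagrees with the majority is marked failed.  Returns false when no
 * usable majority remains and control should pass to the backup computer. */
static bool vote(const int out[CHANNELS], int *result)
{
    int counts[CHANNELS] = { 0 }, best = -1, alive = 0;

    for (int i = 0; i < CHANNELS; i++) {
        if (!healthy[i]) continue;
        alive++;
        for (int j = 0; j < CHANNELS; j++)
            if (healthy[j] && out[j] == out[i]) counts[i]++;
        if (best < 0 || counts[i] > counts[best]) best = i;
    }
    if (alive < 2 || counts[best] * 2 <= alive)
        return false;                         /* no majority: use backup    */

    for (int i = 0; i < CHANNELS; i++)        /* drop dissenting channels   */
        if (healthy[i] && out[i] != out[best]) healthy[i] = false;
    *result = out[best];
    return true;
}

int main(void)
{
    int cmd, outputs[CHANNELS] = { 42, 42, 42, 41 };   /* one dissenter */
    if (vote(outputs, &cmd))
        printf("voted command: %d\n", cmd);
    else
        printf("handing over to backup computer\n");
    return 0;
}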

Mitch

Reply to
MitchAlsup

Maybe, maybe not.

What I was thinking of was a machine with 3 register sets of 8 registers each, each register being 64 bits wide. There would be a number of computation units, and instructions would be orchestrated with a CDC 6600-like scoreboard mechanism, but with the computation units pipelined like the CDC 7600, all done with modern RISC-like pipeline timing.

Addresses are 64 bits, and an MMU creates a paged virtual memory environment of whatever size is appropriate for the intended application (about 50 physical bits in 2010).

Accompanying this computation unit is a barrel-programmed I/O processor with a number of threads to run the peripherals and do OS scheduling chores.


Yes, you could, but this would be a fatal disaster.

The reason Seymour stuck with 20-odd bits of physical address was that that was as big as he could package while keeping fully pipelined, short memory access latencies and full vector bandwidth. Memory hierarchies no longer have those constraints.

Placing the main memory on die would seriously limit the kinds of problems that could be run.


I still happen to think the PPs were a brilliant idea. Leave the computations to the computation unit, and leave the mucky queueing and I/O polling to slow processors with minimal processing capabilities. Hide the entire OS in the PPs, so the CUs just compute, getting interrupted only when the OS has decided to compute something else. Then the CUs jump from full speed on one computation to full speed on another computation in one main memory cycle time (16 clocks).

In my conjectured 'processor' the PPs are also 64 bits wide, with a barrel length of 16-ish steps.
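
A minimal sketch of that scheduling discipline (the per-thread state and the "work" are invented; the only point is the strict rotation that gives each PP thread a fixed one-in-sixteen issue rate):

/* Barrel processor scheduling in miniature: N hardware threads, one
 * instruction issued per thread in strict rotation. */
#include <stdio.h>

#define SLOTS 16                /* barrel length, as conjectured above */

typedef struct {
    int pc;                     /* per-thread program counter          */
    int acc;                    /* per-thread accumulator              */
} pp_thread;

static pp_thread barrel[SLOTS];

/* One rotation of the barrel: every PP thread gets exactly one issue slot. */
static void rotate_once(void)
{
    for (int slot = 0; slot < SLOTS; slot++) {
        pp_thread *t = &barrel[slot];
        t->acc += slot;         /* stand-in for executing one instruction */
        t->pc++;
    }
}

int main(void)
{
    for (int cycle = 0; cycle < 4; cycle++)
        rotate_once();
    printf("thread 3 after 4 rotations: pc=%d acc=%d\n",
           barrel[3].pc, barrel[3].acc);
    return 0;
}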

Mitch

Reply to
MitchAlsup

Well, it was mostly for fun, not that I would try to sell this as the next great thing.

Michigan State had a largish systems programming group; they basically rewrote most of the SCOPE 3.4 OS (they called the result Hustler). One of the things they did was move a lot of the OS back into the main CPU, which sped things up considerably. Even on the 7600 Cray must have had second thoughts, because the CPU could talk to the channels directly without the PPs getting involved.

64-bit PPs help, and giving them interrupts helps as well. The CDC PPs did not have interrupts; boy, was that a pain to program.

- Tim

Reply to
Tim McCaffrey

Hm, what's the point of that fifth computer?

Ok, I understand: the idea is the following: Alice, Dilbert, Wally, and Asok are four co-workers with lots of experience in the field. If one of them disagrees (usually Wally, who was once a super-programmer, but broken by futility), Alice punches him hard to shut him up. If two people disagree, Alice's fist of death is already stuck in Wally, the fuss doesn't die down, and then the PHB decides (that's module five, running completely different software). Fatal failure guaranteed.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
Reply to
Bernd Paysan

Neither the PDP-6 nor any of the PDP-10 models had this problem. The 7094 did, however, and one could hang CTSS (the time sharing system) by executing an infinite indirect loop. Multics (645) could not interrupt and restart, but had a timeout trap. There were instructions on the PDP-10 which would never finish, even though legal and (on paper) terminating, since they could reference a pattern of pages which could never be in memory simultaneously. Since instructions restarted each time from the beginning, they could never complete execution.

WRT reliability, we ran a time-shared KA-10 processor for periods of over six months, and took the system down for an update rather than having it crash (this was with the ITS operating system). I find it remarkable that today's software is so chock-a-block full of fixable bugs. No one seems to actually look at what caused a crash any more.

Reply to
Tom Knight

Now, *that* rings a bell. :)

[...]
--
Ever seen film of the Polar bear bashing through the ice to get seal
cubs?  Less ice more food for the Polar Bear
  -- george , 27 Oct 2010 15:55:37 -0700
Reply to
kym
+---------------
| WRT reliability, we ran a time-shared KA-10 processor for periods of
| over six months, and took the system down for an update rather than
| having it crash (this was with the ITS operating system).
+---------------

TOPS-10 was often just as reliable. At the Emory Univ. Chem. Dept., we ran a time-shared KA-10 processor running TOPS-10 for multiple periods of over six months, and in one instance, over a year. Note: We filed a complaint with DEC Field Circus over that one, since we had a service contract and they were supposed to do preventative service more often than that!! ;-}

+---------------
| I find it remarkable that today's software is so chock-a-block
| full of fixable bugs.
+---------------

My personal experience with FreeBSD has been pretty good:

$ uname -mrs ; uptime
FreeBSD 6.2-RELEASE-p4 amd64
12:09AM  up 401 days, 22:50, 19 users, load averages: 0.03, 0.02, 0.00
$

That machine runs web, mail, SSH, and DNS servers, and sits right on the 'Net with no hardware firewall.

'Course, the 2-hr UPS helps a lot... ;-} ;-}

-Rob

-----
Rob Warnock
627 26th Avenue
San Mateo, CA 94403
(650) 572-2607
Reply to
Rob Warnock
