How to develop a random number generation device

[...]

Beware of anything that is claimed to lead to better programming. When Intel introduced segmentation on the 8086, they said it improved program modularity etc. At that time I suggested that the program counter should have been made to decrement, to help with top-down program design.

One thing about Windows and products like that is that they have to make buggy code in order to be able to sell upgrades. At one time I had a toaster with a bug in it. I got tired of turning it upside down and jiggling the lever to get the system back to its "idle state". I upgraded to a new toaster. Without the bug, I'd still be using that old toaster.

Reply to
MooseFET

I was impressed by the COPS processor that used a pseudo-random shift register as the program counter. That was all-over-the-place program design.

John

Reply to
John Larkin

I agree entirely. People don't seem to appreciate how complex modern software is, nor for that matter just how difficult it is to write absolutely bulletproof code that will never fail no matter what the provocation.

It is actually a better metric for deciding on the number of test cases needed to exercise every path in a complex decision network at least once. Essentially it gives a path complexity count of all the control flows through the code.

formatting link
and
formatting link

It should be better known in the industry.

I like McCabe's CCI, which I find a *very* good indicator of code likely to contain bugs. It comes from a graph theory analysis of the decision nodes in a routine, although I disagree with the proponents of this metric about exactly where the thresholds should be placed. It is a good way to find dangerous spaghetti code sections in an inherited large project without having to read through everything, and a good way to check for future maintenance traps.

I can pretty much guarantee that above a certain size or complexity there will be bugs in a given routine. You will get more hits Googling with the longer "Tom McCabe" and "cyclomatic complexity index". Sadly it is yet another useful tool ignored by the mainstream. "McCabe's CCI" gets mostly my own postings and a medical usage.
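As a minimal illustration (the routine below is made up purely for this post, and I'm using the common decisions-plus-one shortcut rather than the full graph calculation, which for a single connected routine works out to edges minus nodes plus two): each if, loop and short-circuit operator adds one decision, and the index is the decision count plus one, which is also the number of linearly independent paths a basis set of tests has to cover.

    /* A toy routine, invented purely to illustrate the count.  Decision
     * points: the '||', the outer 'if', the 'for' and the inner 'if' -> 4.
     * Cyclomatic complexity = 4 + 1 = 5, i.e. five linearly independent
     * paths that a basis set of test cases has to cover. */
    int clamp_and_count(int *buf, int len, int limit)
    {
        int clipped = 0;

        if (buf == 0 || len < 0)           /* decisions 1 and 2 */
            return -1;

        for (int i = 0; i < len; i++) {    /* decision 3 */
            if (buf[i] > limit) {          /* decision 4 */
                buf[i] = limit;
                clipped++;
            }
        }
        return clipped;
    }

A real tool derives the same number from the control-flow graph; the shortcut above is just the easy way to do it by eye on a single routine.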

Regards, Martin Brown

Reply to
Martin Brown

If Microsoft followed their own advice (see Code Complete by Steve McConnell), they could develop acceptably bug-free versions of their operating systems. However, that would require them to follow a pattern of testing and bug repair before release that would mean we would still be waiting for the release of Windows 98.

Reply to
Richard Henry

"Complex" hardware systems are bug free for small values of complex.

The OS/2 team more or less followed that model and did ship an OS that, after a short while, was virtually bulletproof and bug-free. IBM delayed shipping the Presentation Manager GUI until it was (nearly) right.

MS Windows shipped to the mass market bugs and all - the rest is history...

IBM also confused the market by linking OS/2 to their new brand of PS/2 hardware with a proprietary lock-in MCA bus (anyone remember that?). We only bought Compaqs & Dells afterwards.

I do believe that the MS programmers are for the most part very bright guys, but that the process is flawed. Senior management claim to endorse bug free quality, but their bonuses will always depend on the bottom line.

Regards, Martin Brown

Reply to
Martin Brown

Well, that's where the "design" part comes in. :-)

OK, fair enough.

Well, better tools might help, but the best solution would be to hire programmers who actually give a hoot about the quality of their work product. :-)

Cheers! Rich

Reply to
Rich Grise

You have repeatedly said that current OS's (software OS's running on one or a few cores) are inherently unreliable, while your idea of a massively multi-core CPU running a task per core would be totally reliable. As far as I can see, you are the only person who believes this. If I've misunderstood (either about your claims, or if you can show that others share the idea), please correct me.

No it isn't. At best, you can compare apples and oranges and note that a RAM chip is more reliable than Windows, despite the former having more transistors than the latter has lines of source code.

We agree that typical hardware design processes are more geared to producing reliable and well-tested designs than common software design processes. But that does not translate into a generalisation that a given task can be performed more reliably in hardware than software.

Perhaps "guarantee" was a bit strong - but you stated confidently that your 1024-core one-core-per-task devices were "gonna happen".

This is beginning to sound a lot more like a practical system - devices exist today with several specialised cores, particularly in the embedded market. Arguably graphics cards fall into this category, as do high-end network cards with offload engines. But that's a far cry from your cpu-per-thread idea, and it is done for performance reasons - *not* reliability.

Forget Windows - it's a bad example of an OS, and it's an extreme example of unreliable software. There is no "Microsoft big OS" model - they just have a bad implementation of a normal monolithic kernel OS.

There are uses for computers based on running large numbers of threads in parallel - the Sun Niagara processors can handle 64 threads in hardware (running on 8 cores). But these do not use a core (or even a virtual core) per thread - the cores still perform context switches as threads and processes come and go, or sleep and resume. Clearly you will get better *performance* when you can minimise context switching - but no one would plan for a system where context switching did not happen. There is nothing to suggest that the system could be made more reliable by avoiding context switches, except in the sense of reliably being able to complete tasks at the required speed - it's a performance issue.

Perhaps I didn't explain it well, or perhaps you didn't read those posts - it's hard to follow everything on s.e.d.

The problem with so many cores accessing a shared cache is that you have huge contention for the cache resources. RAM cells get bigger, slower and more complex the more ports they have - it's rare to get more than dual-ported RAM blocks. So if you have 1000 cores all trying to access the same cache, you're going to have huge latencies. You also need complex multiplexing hierarchies for your cross-switches - as each cpu needs to access the cache, you basically require a 1000:1 multiplexer. Assuming your cache has multiple banks and access to some IO or other buses, you'd need something like a 1000:10 cross-switch. That would be really horrible to implement - you'd need to find a compromise between vast switching circuits and multiple levels introducing delays and bottlenecks.
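To put rough numbers on that (a back-of-envelope sketch - the 1024-requester figure is assumed, and a real design would use wider mux stages than 2:1, but the shape of the problem is the same):

    #include <stdio.h>

    /* Back-of-envelope only: count the levels and total number of 2:1
     * multiplexer elements needed to funnel N requesters onto one port. */
    int main(void)
    {
        unsigned cores  = 1024;   /* assumed number of requesters */
        unsigned levels = 0;
        unsigned muxes  = 0;

        for (unsigned n = cores; n > 1; n = (n + 1) / 2) {
            muxes += n / 2;       /* pairs merged at this level */
            levels++;
        }
        printf("%u requesters: %u mux levels, %u 2:1 muxes per cache port\n",
               cores, levels, muxes);   /* 1024 -> 10 levels, 1023 muxes */
        return 0;
    }

Every one of those levels is extra latency added to every cache access, and that is before arbitration, banking and the return path are considered.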

Here's a brief view of the Niagara II - your device would face similar challenges, but greatly multiplied:

formatting link

If each core has an L1 cache to relieve some of the pressure (without it, the system would crawl), you then have a very nasty problem of tracking cache coherency. Current cache coherency strategies do not scale well - they are a big problem on multicore systems.

With existing multiprocessor systems, it is the cache and memory interconnection systems that are the big problem. If you look at high-end motherboards with 8 or 16 sockets, the cross-bar switches that keep memory coherent and provide fast access for all the cores cost more than the processors themselves. Building it all in one device does not make it significantly easier (although it saves on some buffers).

There are alternative ways to connect up large numbers of cores - a NUMA arrangement with cores passing memory requests between each other would almost certainly be easier. But you would have very significant latencies and bottlenecks, a very large number of inter-core buses, and you'd still have trouble with the L1 cache coherence.

With a new OS, and certain significant restraints on the software, you could perhaps avoid many of the L1 cache coherence problems. In particular, being even more restrictive about memory segments would allow you to assume that L1 data is private, and thus always coherent. For example, if all memory came from either a read-only source for code, or was private to the task using it, then you'd have coherency. You'd need a system for read and write locks for memory areas, with a central controller responsible for dishing out these locks and broadcasting cache invalidations when these changed, but it might work.
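A sketch of what that central lock controller might look like (entirely hypothetical - the structure, names and broadcast hook below are my inventions for illustration, not an existing design):

    /* Illustrative only - the names, layout and broadcast hook are
     * assumptions made up for this sketch, not an existing API. */

    enum region_state { REGION_FREE, REGION_SHARED_READ, REGION_EXCLUSIVE };

    struct region_lock {
        unsigned long base;       /* start address of the memory region */
        unsigned long length;
        enum region_state state;
        unsigned owner;           /* core id holding exclusive access */
        unsigned readers;         /* cores currently holding read access */
    };

    /* Placeholder for the hardware broadcast that would tell every core's
     * L1 to drop its copies of the region. */
    static void broadcast_invalidate(unsigned long base, unsigned long length)
    {
        (void)base;
        (void)length;
    }

    /* Central controller grants write ownership of a region to one core;
     * shared readers must lose their cached copies first. */
    int acquire_write(struct region_lock *r, unsigned core_id)
    {
        if (r->state == REGION_EXCLUSIVE && r->owner != core_id)
            return -1;                        /* caller must block or retry */
        if (r->state == REGION_SHARED_READ)
            broadcast_invalidate(r->base, r->length);
        r->state = REGION_EXCLUSIVE;
        r->owner = core_id;
        r->readers = 0;
        return 0;
    }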

However, you've lost out on a range of requirements here. First off, your cores are now far from simple, and the glue logic is immense. Thus you have lost all hope of making the device cheap and reliable. Secondly, you've still got significant latencies for all memory access, slowing down the throughput of any given core, crippling your maximum thread speed. The bottlenecks don't matter so much in the grand view of the device - the total bandwidth to the cpus should still be more than if it were a normal multi-core device. Thirdly, you've lost compatibility with all existing software - it won't run most programs, as they rely on being able to have shared data access.

Yes, that's about it. To be more precise, it will be impractical for general purpose computing because it won't run common general purpose programs. Even with the required major changes to the software and compilation tools, and without the cache restrictions mentioned earlier, it would run common programs painfully slowly.

A microkernel *may* be more reliable because of its modular design - each part is relatively simple and communicates through limited, controlled ports. That's far from saying it always *will* be more reliable. Much of the theoretical reliability gains of a microkernel do not actually help in practice. For example, the ability of low-level services to be restarted if they crash is useless when the service in question is essential to the system. Thus there are no reliability benefits from putting your memory management, task management, virtual file system, or interrupt system outside the true kernel - if one of these services dies, you're buggered whether it kills the kernel or not. A similar situation is found in Linux - because X is separate from the kernel, it can die and restart independently of the OS itself. But to the desktop user, their system has died - they don't know or care if the OS itself survived.

Most of the benefits of a microkernel can actually be achieved in a monolithic kernel - you keep your services carefully modularised, developed and tested as separate units with clear and clean interfaces. It's a good development paradigm - it does not matter in practice if the key services are directly linked with the kernel or not, since they are all essential to the working of the OS. About the only way a microkernel improves reliability is by enforcing this model - you are not able to cheat.
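As an illustration of that style (a hypothetical interface - real monolithic kernels such as Linux use the same function-pointer-table idea for things like the VFS, but the details below are made up): a service compiled into the kernel can still be forced through a narrow table of entry points, much as it would be behind a microkernel message port.

    /* Hypothetical service interface - the names are invented for this
     * sketch; only the function-pointer-table idea is the point. */

    struct block_service_ops {
        int  (*init)(void);
        int  (*read)(unsigned long block, void *buf, unsigned count);
        int  (*write)(unsigned long block, const void *buf, unsigned count);
        void (*shutdown)(void);
    };

    static const struct block_service_ops *active_block_service;

    /* The rest of the kernel sees only the table, never the internals, so
     * the service can be developed and tested as a separate unit even
     * though it ends up linked into the same image. */
    int kernel_register_block_service(const struct block_service_ops *ops)
    {
        if (!ops || !ops->read || !ops->write)
            return -1;            /* refuse an incomplete implementation */
        active_block_service = ops;
        return 0;
    }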

What *does* make sense is keeping as many device drivers as possible out of the kernel itself. Non-essential services should not be in the kernel.

You underestimate the power of software bugs - you'll *always* be able to crash the kernel!

The context switches in this case are completely irrelevant to reliability. The issue with microkernels and context switches is purely a matter of performance - they cost a lot of time, especially since they involve jumps to a different processor mode or protection ring. If you want to produce a cpu that minimises the cost of a context switch through hardware acceleration, then it would definitely be a good idea and would benefit microkernel OS's in particular. But it's a performance improvement, not a reliability improvement. Other hardware for accelerating key OS concepts such as locks or IPC would help too.

So do I - but we both make and sell practical solutions which are a step beyond our competitors. We would not try to sell something that looks like a revolutionary new idea at first sight, but turns out to be terribly impractical to implement and to lack the very benefits we first imagined.

There's nothing wrong with dreaming, quite the opposite. But you have to be able to see when it is nothing but a dream.

Best regards,

David

Reply to
David Brown

That's probably true. Sun will soon be shipping 8-core, multithread processors. Looks like the number of cores per chip is at least doubling every year, now that clock speeds are no longer the holy grail.

So, in 5 years, with 8 * 2^5 = 256 cores, running maybe 1K threads, why context switch?

Well, the world is ready for OS reliability.

Exactly! Except there is no context switch overhead.

Thirdly, you've lost

Exactly! We can't run .NET forever.

Circular reasoning. Why aren't we still running 1401 code?

If the cache throughput is the limit, you get the same amount of computing no matter how many CPUs are running. CPUs can also have a little bit of local instruction cache, since code does not have to be kept globally coherent.

No. Not if it's small and correct, and it's absolutely protected by the hardware, and it runs on a CPU that runs nothing else. I've written RTOS's that never crashed.

Do I have to give all that money back? Roughly $200 million so far.

John

Reply to
John Larkin

The programmers don't normally get to choose the budget or deadlines.

The choice of "done now" vs "done right" is usually based upon which one is more likely to result in you keeping your job. Most of the time, "done now" wins.

Reply to
Nobody

Cores don't scale like that - Sun have done pretty well with their 8-core, 64-thread CPU. I don't imagine we'll see very many more cores on a device, because the interconnections and cache scaling get too difficult, and the whole device is too limited in memory bandwidth. It could get more, but why would anyone bother when there is a better way? The sort of application that works well with these devices is multi-process (or multi-threaded) server software, such as web, email and database serving. These also scale well in clusters - there is little point in trying to run one OS on a 64-core machine when you can just as easily run eight OS's on eight 8-core machines and get the same performance without anything more sophisticated than standard network connections.

If you are looking for a high performance server today, you can buy rack units with 4 separate PC's, each with 1 or 2 sockets for 4 or 8 core SMP. Load them all up with Linux in a cluster, and you have the same processing power as a 32 core system at a fraction of the cost.

And unlike in a single massively multi-core chip, such clusters can have redundancy built in - thus greatly increasing their reliability.

As a future prediction, I would expect to see motherboards with multiple independent PC's on the one board, designed specifically for this sort of cluster. I also expect to see virtual Ethernet links on these boards.

And as for your context switch obsession, you do realise that in an SMP server system, context switches waste only a fraction of a percent of the processing power? On a web server with 64 virtual cores, you'd expect something like a few thousand processes to be alive at a time, but most of them will be sleeping - the main worker threads will occupy the cores with very little context switching, while the sleeping threads wake and run as needed.
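To put a rough number on that claim (the per-switch cost and the switch rate below are assumptions chosen only to show the order of magnitude, not measurements from any particular server):

    #include <stdio.h>

    int main(void)
    {
        /* Assumed figures, chosen only to show the order of magnitude. */
        double switch_cost_us   = 5.0;     /* one context switch, in us */
        double switches_per_sec = 1000.0;  /* per core, on a busy server */

        double fraction = switch_cost_us * 1e-6 * switches_per_sec;
        printf("context-switch overhead: %.2f%% of one core\n",
               fraction * 100.0);          /* prints 0.50% with these numbers */
        return 0;
    }

Even if you make the assumed numbers several times worse, the overhead stays far below the noise of ordinary I/O waits.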

In other words, the organisation that exists today works perfectly well. There are plenty of things that can be improved in the software and hardware, but nothing is fundamentally broken (except perhaps the software development methods of many companies).

On the desktop side, there will be a gradual shift towards more multi-threading software for processor intensive applications like games and media converters.

There *are* reliable OS's available today - they just don't begin with the letter "W". There is plenty of unreliable software that runs on these OS's, but *nix systems designed for servers are solid (as are VMS, Netware, and many embedded OS's).

What you are trying to say, I think, is that the world is ready for reliable desktop software - that's a very different matter, and one I'd agree with.

So what? In all practical measurements, there is no context switch overhead in SMP systems today - it all drowns out in comparison to delays in I/O. If you are running cpu-intensive work on a desktop with one core and fast pre-emptive switching, then the switch overhead can be noticeable - but not on a server.

Who would want to run .NET at all, especially on a server?

I'm talking about compatibility at the source code level and above (i.e., the design of the software, and the way it works), not the object code. Many essential building blocks of the internet are based on code 15 years old or more - the *nix architecture is 30 years old or so. Any hardware that can't run this *kind* of software (even after modifications) can't run common general purpose software.

Code does have to be kept globally coherent (though it is easier to do so than for data), and cores can't keep running without data.

But you are right about the bandwidth limitations being a similar problem for having a few cores or many. It will be less of an issue for the few core device, since you'd have fewer latencies in switching all the data around the device. And if your 256 cores cannot do more real work than 4 cores could - what is the point in having them? Please don't just repeat that it avoids context switches - the tiny advantage that might give does not outweigh the costs of the rest of the device.

You can make systems small and correct when they are doing a limited and well-understood job - that's why we can make embedded systems that are reliable. It is also possible to make big systems that are correct and reliable, if you do it well enough (look at mainframes). But dividing a complex system into parts does not by itself make it more reliable - it only makes it easier for the developer to use solid development and test methodologies.

I'm not saying that smaller kernels and better protection through more advanced hardware are not helpful - merely that they are not a magic bullet (who cares if your RTOS never crashes if the application running on it dies? It's the whole system's reliability that is important), nor are they essential.

Only if you've sold them nothing more than dreams!

Reply to
David Brown

It's going to take a *LOT* more than a clock cycle. You have to find all the data in the register file and you can't broadside that much data.

Why bother then? If you're giving it that much dead time simply do things serially. You're essentially allowing time for a complete context switch.

Because that's how it's done? You have another source/destination accessing the register file.

What does an amp matter at ...

> Like many problems, start with a lookup table.

There is nothing in an 8051 that can be considered "fairly quick". ...and I rather like 8051s.
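If the lookup-table remark refers to the variable-shift problem discussed further down the thread - that's my reading, not something stated here - then on an 8-bit part the idea could be sketched like this (the multiply-by-a-table-entry trick is my assumption about the intent, not anyone's actual code):

    #include <stdint.h>

    /* Sketch of the table idea for Y = X << n on an 8-bit value: one table
     * read plus one 8x8 multiply (MUL AB on the 8051) replaces an
     * n-iteration shift loop. */
    static const uint8_t shift_factor[9] = {
        1, 2, 4, 8, 16, 32, 64, 128, 0    /* 2^n for n = 0..7, then 0 */
    };

    uint8_t fast_shift_left(uint8_t x, uint8_t n)
    {
        if (n > 8)
            n = 8;                         /* everything falls off the top */
        return (uint8_t)(x * shift_factor[n]);
    }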

--
  Keith
Reply to
krw

One of the ones for video games had a whole bunch more pseudo-random circuits. It had timers and sound generators and all sorts of stuff using that method. The sound one had a very long complex pattern and a few much shorter ones. You could make all sorts of weird noises by programming it back and forth among the different codes.

Reply to
MooseFET

MooseFET snipped-for-privacy@rahul.net posted to sci.electronics.design:

A rather minimal article but the basic definition is there.

formatting link

The first result from a google search for cyclomatic complexity.

Further reading is recommended.

Reply to
JosephKK

Richard Henry snipped-for-privacy@hotmail.com posted to sci.electronics.design:

Yet another case of "do as I say and not as I do".

Reply to
JosephKK
[... cyclomatic complexity ...]

That seems to be a slightly different meaning than Mr. Larkin was thinking of.

There are a few reasons that I can think of as to why it hasn't caught on widely.

1) It measures something about a software design. It doesn't give you a tool to do anything with that measurement, and it doesn't help you to simplify things.

2) It gives you the bad news once you are a long way into the design. By then you've likely already got a feeling for how much you underestimated the testing difficulty.

3) It produces a bad-news number. The number gets bigger the worse the situation is. Nobody likes to pay for bad news.

I agree that it should be better known, but I don't see it as a very good measure for software. Consider a bit of code like this:

FastShift:  switch-like construct on (ShiftNo)

    case 8:  Y = 0                     ; all the bits are gone out the top
             goto ShiftDone

    case 7:  move carry, X.0           ; get the LSB
             Y = 0
             rotate left thru carry Y
             goto ShiftDone

    ... etc ...

    case 0:                            ; nothing to do
ShiftDone:

This would be considered fairly complex by a measure that counts the number of paths through the code, yet it can be checked visually fairly quickly. If the complexity is measured as a number of states and the number of ways to change states, it would also be rated fairly complex, because X and ShiftNo can start in 256*256 states.

Something that used a 12 bit pseudo-random generator to control just two paths would be rated as less complex by both measures but would be a lot harder to check.
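For comparison, such a generator might be sketched as below (the tap positions are ones I believe give the full 4095-step sequence for 12 bits, but check a tap table before trusting that): the two code paths are trivial, yet verifying the behaviour means reasoning about thousands of states rather than a handful of switch cases.

    #include <stdint.h>

    /* 12-bit Fibonacci LFSR.  Taps 12, 11, 10 and 4 should give the full
     * 4095-step sequence, but verify that against a tap table before
     * relying on it.  The state must never be zero. */
    uint16_t lfsr12_step(uint16_t state)
    {
        uint16_t bit = ((state >> 11) ^ (state >> 10) ^
                        (state >> 9)  ^ (state >> 3)) & 1u;
        return (uint16_t)(((state << 1) | bit) & 0x0FFFu);
    }

    /* Two trivial code paths chosen by one pseudo-random bit: easy to
     * write, hard to verify, which is the point made above. */
    void run_one_step(uint16_t *state, unsigned *count_a, unsigned *count_b)
    {
        *state = lfsr12_step(*state);
        if (*state & 1u)
            (*count_a)++;    /* path A */
        else
            (*count_b)++;    /* path B */
    }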

It is quicker just to look for the word "windows" on the packaging. :)

I think you will agree that there is a weakness in the method. This weakness may also be part of why it is not so widely used.

[...]

It may be that you need to look for a different term. Some people may have a different name for basically the same measure. "Measure of system complexity" leads to many hits in Google. Some may be the CCI under a different name.

Reply to
MooseFET
