How to develop a random number generation device

Where I work, we just did another install of SUSE Linux. We have a huge investment in DOS-based code in the test department. Running under "dosemu" on SUSE, those programs work just fine. Under XP there were lots of problems; under Vista there was no hope at all.

XP has a character-dropping rate on the RS-232 of about 1 in 10^5 to 10^7. This is much worse than the "it's broken" limit on what is being tested.

XP also doesn't let DOS talk to USB to RS-232 converters. Under SUSE, it works just fine.

One of the machines is not on a network. XP seems to get unhappy if it is not allowed to phone home every now and then.

Reply to
MooseFET

Dang, how do you get it to work that well? By waiting 15 seconds for all the characters to dribble in from the buffers?

John

Reply to
John Larkin

Yes, I think I mistook your point.

The way I had imagined it was that the registers of the virtual CPUs that are not currently running would live in a different place than the ones actually in use. My concern was to avoid increasing the fan-in and fan-out of the buses on the ALU, so that there would be no increase in loading, and hence no added delay, in those circuits.

I also imagined the register exchange having its own set of buses. Perhaps I was too worried about bus times and not worried enough about ALU times.

You may have a point here. I've never actually measured the sizes of such things. I was thinking back to the designs of bit slice machines.

The throughput continues to grow fairly quickly, but you end up with a pipeline. When the circuit gets to a certain point, the stages become equivalent to a multiplier circuit.

BTW: There are four ways of getting to a sqrt() function. If you are doing it on a microcontroller or other machine where dividing is very costly, Newton's method is the slowest. If you have a fast multiply, finding 1/sqrt(X) is much quicker.
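
For what it's worth, here is a minimal C sketch of the multiply-only approach mentioned above: Newton-Raphson on 1/sqrt(X), followed by one multiply to recover sqrt(X). The seed value and iteration count are illustrative guesses, not tuned numbers.

#include <stdio.h>

/* Newton-Raphson for 1/sqrt(x):  y <- y * (1.5 - 0.5 * x * y * y)
 * Only multiplies and adds appear in the loop, which is why this beats
 * a divide-based Newton iteration on machines where divides are very
 * costly. Once y has converged, sqrt(x) = x * y.
 */
static double recip_sqrt(double x, double seed, int iterations)
{
    double y = seed;                      /* crude initial guess */
    for (int i = 0; i < iterations; i++)
        y = y * (1.5 - 0.5 * x * y * y);
    return y;
}

int main(void)
{
    double x = 2.0;
    double y = recip_sqrt(x, 0.7, 5);     /* seed and count chosen for illustration */
    printf("1/sqrt(%g) ~= %g, sqrt(%g) ~= %g\n", x, y, x, x * y);
    return 0;
}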

Reply to
MooseFET

As a general rule, I don't install OS patches either - they lead to too much trouble. When managing a network with Windows machines, especially with a mixture of flavours, you just have to accept that they are vulnerable and ensure that you avoid bad stuff getting into the network - there is no point in hoping that the latest Windows patches will help. That means serious checking on incoming email (kill *all* executable attachments, and virus-check the rest) with all other POP3 access blocked, a decent firewall (obviously not running Windows!), and good user training backed up by nasty threats if anyone tries to do anything risky on the machines.

Reply to
David Brown

CPUs *are* a valuable resource - modern CPU cores take up a lot of space, even when you exclude things like the cache (which takes more space, but costs less per mm^2 since you can design in a bit of redundancy and thus tolerate some faults).

The more CPUs you have, the more time and space it costs to keep caches and memory accesses coherent. There are some sorts of architectures which work well with multiple CPU cores, but these are not suitable for general purpose computing.

I would be very surprised to see a system where the number of CPU cores was greater than the number of processes. I expect to see the number of cores increase, especially for server systems, but I don't expect to see systems where it is planned and expected that most cores will sleep most of the time.

Multiple cores give absolutely no benefit in terms of reliability or stability - indeed, they open up all sorts of possibilities for hard-to-debug race conditions.

Reply to
David Brown

There's not a lot I can do about Windows (I have to run it for some of the apps I use) but it's certainly worth $3K to have reliable hardware and drives. Every time a Dell dies, it costs me or one of my people a week or two to get everything back to where it was, and we're surely worth more than $3K a week.

The cool thing about RAID hot-plug is that I can occasionally plug in a blank drive, and my C: drive gets cloned, OS and all. I stash the clone in a baggie. If my machine dies for any reason, I grab a spare box from down the hall, plug in the copy of C:, and I'm back online in 5 minutes.

And, once a year maybe, I plug a brand-new drive into one of the RAID slots, so my drives never die from sheer wear-out.

John

Reply to
John Larkin

Well, I remember 64-bit static RAMs, and 256-bit DRAMs. I can't see any reason we couldn't have 256 or 1024 CPUs on a chip, especially if a lot of them are simple integer RISC machines.

They don't if you insist on running a copy of a bloated OS on each. A system designed, from scratch, to run on a pool of cheap CPUs could be incredibly reliable.

It's gonna happen.

John

Reply to
John Larkin

Right. Move a lot of the functionality of the OS into hardware. Whether the 1024 CPUs are real hardware or pipeline tricks, similar to multithreading, we can count on the hardware to work right.

John

Reply to
John Larkin

Dear John, Try developing a perfect OS of your own. I did. That was a very enlightening experience of why certain things have to be done in certain ways. Particular questions welcome.

1024 CPUs = 1,048,576 potential software interfaces (1024^2) and a hell of a bus-arbitration problem.

The weak link is the developer. It is obviously more difficult to develop multicore stuff; hence there is a higher probability of flaws.

Especially if you remember the 50-page silicon errata for pretty much any modern CPU.

What do you think in particular would be better for typical desktop applications?

You have to listen to the screams of the SEL software developers...

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Reply to
Vladimir Vassilevsky

I did write three RTOS's, one for the 6800, one for the PDP-11, one for the LSI-11. As far as I know, they were perfect, in that they ran damned fast and had no bugs. The 6800 version included a token ring LAN thing, which I invented independently in about 1974.

No worse a software interface than if each process was running on a single shared CPU; much less, in fact, since irrelevant interrupts, swapping, and context switches aren't going on. Each process absolutely owns a CPU and only interacts with other processes when *it* needs to, probably through shared memory and semaphores.
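
As a rough illustration of that "shared memory and semaphores" interaction (my own toy example, not anything from John's RTOSes): a parent and child process share one mmap()'d region, and a process-shared semaphore marks when the data is ready. Error handling is omitted to keep the sketch short; build with -pthread.

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* One producer and one consumer share a small region; the semaphore
 * tells the consumer when the data is ready. Each side otherwise runs
 * undisturbed, as it would on its own core.
 */
struct shared {
    sem_t ready;
    char  msg[64];
};

int main(void)
{
    /* MAP_SHARED | MAP_ANONYMOUS memory is inherited across fork(). */
    struct shared *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&s->ready, 1 /* shared between processes */, 0);

    if (fork() == 0) {                  /* child: producer */
        strcpy(s->msg, "result from the other core");
        sem_post(&s->ready);            /* signal only when *it* needs to */
        _exit(0);
    }

    sem_wait(&s->ready);                /* parent: consumer */
    printf("got: %s\n", s->msg);
    return 0;
}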

As far as bus arbitration goes, they all just share a central cache on the chip, with a single bus going out to dram. Cache coherence becomes trivial.

Putting a few hundred RISC cores on a chip, connecting to a central cache, is easy. You only have to get it right once. In our world, incredibly complex hardware just works, and modestly complex software is usually a bag of worms. Clearly we need the hardware to help the software.

Intel, maybe. Are any of the RISC machines that bad? But my PC doesn't have hardware problems, it has software problems.

Oh, 256 CPUs and, say, 32 FPUs should be plenty.

Of course lots of software people won't like this. Well, they had their chance and blew it.

John

Reply to
John Larkin

Sure it does (have hardware problems); Intel was just smart enough to allow patching of the microcode so that buggy features are worked around or simply not used. When the bugs simply can't be fixed, it's the compiler writers who get burdened with keeping track of which bugs are still present and ensuring that code isn't generated that would expose them (very few people program x86 CPUs in assembly anymore!). Intel is also smart enough to do a lot of testing any time a new CPU comes out -- I'm sure there are still plenty of people there who remember the nasty FDIV bug, as well as the lesser-known problem with the first 1 GHz CPUs that would randomly fail to compile the Linux kernel.

I've been surprised at just how buggy a lot of PC software is if you actually start pushing it to its limits -- it's clear that much software today only gets very rudimentary testing. (And as I've stated before, I personally know "programmers" who believe they can claim on a progress report that they "finished such and such software" as soon as it *compiles*. :-( )

Reply to
Joel Kolstad

You can certainly get 1024 CPUs on a chip - there are chips available today with hundreds of cores. But there are big questions about what you can do with such a device - they are specialised systems. To make use of something like that you'd need a highly parallel problem (most desktop applications have trouble making good use of two cores, and it takes a really big web site or mail gateway to scale well beyond about 16 cores). You also have to consider the bandwidth needed to feed these cores, and be careful that there are no memory conflicts (since cache coherency does not scale well enough).

That's a conjecture plucked out of thin air. Of course a dedicated OS designed to be limited but highly reliable is going to be more reliable than a large general-purpose OS that must run on all hardware and support all sorts of software - but that has absolutely nothing to do with the number of cores!

Reply to
David Brown

A shared memory interface for 1024 cpus? That's going to be absolutely vast, or have terrible latency.

I still don't understand why you think that interrupts or context switches are a reliability issue - processors don't have problems with them.

And I'd love to hear you explain to customers that while their web server has a load average of a couple of percent, they need to buy a second processor chip just to run an extra cron job. A single cpu per process will *never* be realistic.

"Just share a central cache?" It might sound easy to you, but I suspect it would be *slightly* more challenging to implement.

You are too used to solid, reliable, *simple* cores like the CPU32. Complex hardware is like complex software - it *is* complex software, written in design languages and then "compiled" to silicon. Like software, big and complex hardware has bugs.

Yes, many RISC machines have substantial errata. The more complex you make the design, the more bugs you get.

What you seem to be missing is that although the cores on your 1K-CPU chip are simple (and can therefore be expected to be reliable, if designed well), they don't exist alone. If you want them to support general-purpose computing tasks, rather than a massive SIMD system, then you need a huge infrastructure around them to feed them with instruction streams and data, and you face enormous complications in trying to keep memory consistent.

My desktop machine might well run more than 256 processes. How does that fit in your device? But most of the time, there are only 2 or 3 processes doing much work - often there will be 1 process which should run as fast as possible, as single-thread performance is the main bottleneck for desktop cpus.

Reply to
David Brown

That's because "Unit testing" is the next block on the Work Breakdown Structure.

Reply to
Richard Henry

Doing so is the essence of a "buffer overrun exploit", one of the most common types of security vulnerability for code written in C/C++.

It allows a malicious user to make a program do something that it isn't supposed to do.

E.g. consider a program being run on a web server to process form input from a web page. If the program suffers from a buffer overrun flaw, simply sending the right data in a POST request can allow the attacker to execute arbitrary code on the web server.

Or a buffer overrun in a mail client could allow someone to run arbitrary code on the user's machine by sending them a specially crafted email.

This is one of the common ways that computer systems get "hacked".
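
For concreteness, the classic shape of the flaw described above looks roughly like this (deliberately unsafe, illustrative C; the function and buffer names are made up):

#include <string.h>

/* Deliberately unsafe: if 'input' is longer than 63 bytes plus the
 * terminator, strcpy() writes past the end of 'name' and overwrites
 * whatever sits next to it on the stack -- possibly the saved return
 * address, which is what an attacker aims for.
 */
void handle_form_field(const char *input)
{
    char name[64];
    strcpy(name, input);     /* no bounds check: the buffer overrun */
    /* ... use name ... */
}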

Persuading a process to write outside of its allotted address space is harmless. The CPU will cause an exception and the OS will typically terminate the process. Even if it didn't, there's nothing there for it to damage. With modern hardware (e.g. 80286 and later running in protected mode), the address space of one process (or the OS kernel) simply isn't "visible" to another process.

Reply to
Nobody

True, but if you can manage to create a buffer overflow in a kernel process (the TCP/IP stack being a common target here, often implemented as a kernel-level driver), you have the keys to the kingdom.

Reply to
Joel Kolstad

Yes; subject to the caveat that the term "buffer overrun" is normally used in reference to the exploitable case, where the overrun occurs between buffers within the process' address space. E.g. the wikipedia entry for "buffer overrun" or "buffer overflow" only addresses this case:

formatting link

Technical description

A buffer overflow occurs when data written to a buffer, due to insufficient bounds checking, corrupts data values in memory addresses adjacent to the allocated buffer. Most commonly this occurs when copying strings of characters from one buffer to another.

The case where the buffer is at the beginning or end of a mapped region, and the overrun attempts to modify a different region, is usually ignored. The OS will just kill the process, so there's no exploit potential, and it's statistically far less likely (most buffers aren't at the beginning or end of a mapped region).

What's to re-read? Exploitation requires the write to succeed, which requires that the overrun has to occur into memory which is writable by the task.

I know what *I'm* talking about, which is what most programmers mean by "buffer overrun", i.e. the exploitable case, not the segfault case.

The segfault case is uninteresting; it's a "solved" problem. The exploitable case is one of the main mechanisms through which computers get hacked. It's *the* main mechanism for most C/C++ code.

Indeed. But none of the current OSes are defective in this regard.

Windows 95/98/ME lacked memory protection on certain parts of memory for backwards compatibility (i.e. portions of the bottom megabyte were shared between processes and globally writable, for compatibility with real-mode (8086) code).

And this case isn't what people are normally referring to if they're talking about "problems", "vulnerabilities" etc of buffer overruns.

An attacker *may* gain control through this mechanism. Whether or not they can depends upon how the variables are laid out in memory and how the variables affect the program.

The point is that protection against this issue mostly has to be done by the language and/or compiler.
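
As a small illustration of that point, the same copy written with an explicit bound removes the simple form of the attack, and compiler options can add further checks behind the programmer's back. The snippet below is illustrative; the flags are the usual GCC/Clang ones.

#include <stdio.h>

void handle_form_field(const char *input)
{
    char name[64];
    /* snprintf() never writes more than sizeof name bytes, including the
     * terminating NUL, so adjacent memory is left untouched no matter how
     * long 'input' is. */
    snprintf(name, sizeof name, "%s", input);
    /* ... use name ... */
}

/* On top of source-level fixes, building with e.g.
 *   gcc -O2 -fstack-protector-strong -D_FORTIFY_SOURCE=2 form.c
 * makes the compiler insert stack canaries and use checked versions of
 * the common string functions.
 */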

Reply to
Nobody

I've done lots of all of those.

The linker only gets to "see" exported variables, i.e. global variables which aren't declared "static". The most common form of buffer overrun involves automatic (stack-based) variables, which don't exist outside of the compiler (i.e. they don't appear in the symbol table of an object file, library, or executable).

Depending upon the platform, the executables and DLLs may contain information on inter-object symbols (i.e. those exported by one object file and imported by another), or these may be eliminated during linking, leaving only those symbols which are required by the loader.

The former is the case on Linux ("nm -D" will list the symbol table), the latter on Windows.
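
A tiny example of the distinction (file, variable, and function names are mine):

/* symbols.c -- build as a shared object, e.g.
 *   gcc -shared -fPIC -o libsymbols.so symbols.c
 */
int exported_counter;        /* external linkage: visible to the linker and
                                listed by "nm -D libsymbols.so"             */
static int private_counter;  /* internal linkage: never leaves this
                                translation unit, absent from "nm -D"       */

void bump(void)
{
    char scratch[32];        /* automatic (stack) variable: exists only at
                                run time, no symbol table entry anywhere    */
    (void)scratch;
    exported_counter++;
    private_counter++;
}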

As for memory management:

The process maintains a "heap" built from large chunks of memory, obtained from the OS via e.g. brk() or mmap(..., MAP_ANONYMOUS). The process then satisfies individual requests (malloc etc) from the heap. Memory which is released by e.g. free() is returned to the heap for later re-use; memory is seldom returned to the OS.

The OS only gets to see the bigger picture, i.e. brk(), mmap(), maybe munmap(), not all of the individual malloc() and free() calls.
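
To make that division of labour concrete, here is a toy "bump" allocator: one large anonymous mmap() from the OS up front, then user-space pointer arithmetic for every individual request. It is a deliberate oversimplification (no free(), no free list), not how a real malloc works.

#include <stddef.h>
#include <sys/mman.h>

/* Toy allocator: grab one large anonymous mapping from the OS up front,
 * then hand out pieces of it with pointer arithmetic. The OS sees exactly
 * one mmap(); the individual toy_alloc() calls never reach the kernel.
 */
#define ARENA_SIZE (1u << 20)            /* 1 MiB chunk from the OS */

static unsigned char *arena;
static size_t         used;

void *toy_alloc(size_t n)
{
    if (arena == NULL) {
        void *p0 = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p0 == MAP_FAILED)
            return NULL;
        arena = p0;
    }
    n = (n + 15) & ~(size_t)15;          /* keep 16-byte alignment */
    if (used + n > ARENA_SIZE)
        return NULL;                     /* a real malloc would ask the OS
                                            for another chunk here */
    void *p = arena + used;
    used += n;
    return p;
}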

Reply to
Nobody

On the contrary, the Wikipedia entry for "buffer overrun" only mentions the case which I've been discussing, not the process isolation issue.

formatting link

Reply to
Nobody

My God! You've got to quit using MICRO$~1 web servers!

Good Luck! Rich

Reply to
Rich Grise
