Intel tri-gate finFETs

What a wonderful idea. Just imagine the development of an application, parts of which have to run on completely different cores.

JFYI, I had to develop an application for a TI OMAP DSP, which had to use the ARM and the TMS64xx cores at the same time. That wasn't my choice.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Reply to
Vladimir Vassilevsky

What, have there been no critical Windows security bugs discovered in the last 5 years?

John

Reply to
John Larkin

The problem is not the application design, but the host OS design. The application should be written for one platform and one OS.

Instead of running a foreign application in an emulator (implemented in software, or possibly with firmware-assisted instruction emulation), having different types of CPUs on a chip would allow throwing the application to an appropriate CPU core, with a suitable virtual memory, based on the program's origin.

Of course, some of the lowest-level kernel system services would have to be executed in the host OS environment.

Reply to
upsidedown

Do you mean, besides it's a piece of crap? Whoever came up with that "segmentation" scheme should have been hanged by their thumbs.

Thanks, Rich

Reply to
Rich Grise

Linux is, if you don't have some IT weinerhead assigning passwords like "password."

Cheers! Rich

Reply to
Rich Grise

Oh, let's not get into a holy war (Dawg knows we've got enough of them on our hands already), but one thing I noticed was that the first two bits of the instruction were the "opcode," and for the "MOV" instruction, the next three bits went directly to the "destination" data selector, and the last three bits went directly to the "source" data selector.
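
For what it's worth, that field layout is trivial to pick apart in software. A minimal sketch of the decode being described, in C (the variable names and the sample byte are mine, for illustration):

#include <stdio.h>

/* Decode an 8008-style register-move byte, laid out as described
 * above: two opcode bits, then three destination bits, then three
 * source bits. */
int main(void)
{
    unsigned char insn = 0xC1;        /* 11 000 001: MOV A,B on the 8008 */
    unsigned op  = (insn >> 6) & 0x3; /* first two bits: the opcode */
    unsigned dst = (insn >> 3) & 0x7; /* next three: destination selector */
    unsigned src =  insn       & 0x7; /* last three: source selector */
    printf("op=%u dst=%u src=%u\n", op, dst, src);
    return 0;
}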

I also was a little miffed that the resulting assembly language gave us "Move destination, source", which is kind of counterintuitive. Like, "Move the glass from the coffee table to the sink" kinda "feels" natural (maybe because I'm a native English speaker), while "Move the glass to the sink from the coffee table" doesn't.

But that's about the extent of my complaints - my first "real" computer was a Scelbi 8H, 8008-based, and I learned to live with it. :-)

Cheers! Rich

Reply to
Rich Grise

Multicore programming isn't that bad, because the communications latency is fairly small. SMP machines like most multicore CPUs have global cache coherence, so there are no big delays when two or more cores want to share memory objects. It's really no different from normal multithread programming, except that you have to pay attention to how much work each thread is doing.
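
A minimal sketch of that point, in C with POSIX threads (all names mine, not from the post): each core gets an equal slice of an array to sum, and cache coherence makes the partial results visible to the joining thread with no special effort.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N        1000000           /* divisible by NTHREADS */

static double data[N];
static double partial[NTHREADS];

static void *worker(void *arg)
{
    long id = (long)arg;
    long lo = id * (N / NTHREADS); /* equal-sized slices, so each */
    long hi = lo + (N / NTHREADS); /* thread does the same amount of work */
    double sum = 0.0;
    for (long i = lo; i < hi; i++)
        sum += data[i];
    partial[id] = sum;  /* cache coherence makes this visible after join */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long i = 0; i < N; i++)
        data[i] = 1.0;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %.0f\n", total);  /* 1000000 */
    return 0;
}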

Cheers

Phil Hobbs

(Who has been writing 32-bit multithreaded apps since 1992)

--
Dr Philip C D Hobbs
Principal
ElectroOptical Innovations
55 Orchard Rd
Briarcliff Manor NY 10510
845-480-2058

email: hobbs (atsign) electrooptical (period) net
http://electrooptical.net
Reply to
Phil Hobbs

Not in Larkin's fantasy world it isn't. He quietly forgets the need to do physical I/O, exchange data, and synchronise between tasks.

I have some sympathy for the idea that the OS should make it impossible for tasks to damage each other, and should kill stone dead anything that makes a fetch from an uninitialised memory location. It would instil discipline.

But all of this can be done with existing virtual machine techniques...

If you have ever done any serious multi-CPU processing you quickly learn that the highest-priority task by far is the one that divides up the work amongst the rest. After about 8 CPUs it becomes hard to keep them all busy unless you have a problem that lends itself to vectorisation or some other easy parallelism. Graphics rendering (GPU) cores have been harnessed for cryptography research, for instance. Chess engines are another area, but there is a nasty law of diminishing returns there: you end up searching, more quickly, total dross lines that the high-level algorithm would have culled. Nodes searched per second rockets up, but playing strength doesn't improve correspondingly fast.
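
For concreteness, a rough sketch of that "one task feeds the rest" pattern, in C with POSIX threads (queue size, chunk counts, and all names are mine): a single dispatcher pushes work chunks into a bounded queue and a pool of workers drains it. If the dispatcher falls behind, the workers sit idle in get_chunk() - which is exactly why it deserves the highest priority.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define QSIZE    8
#define NWORKERS 4
#define NCHUNKS  32

static int  queue[QSIZE];
static int  head, tail, count;
static bool done;
static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  nonfull  = PTHREAD_COND_INITIALIZER;

static void put_chunk(int chunk)          /* dispatcher side */
{
    pthread_mutex_lock(&lock);
    while (count == QSIZE)
        pthread_cond_wait(&nonfull, &lock);
    queue[tail] = chunk;
    tail = (tail + 1) % QSIZE;
    count++;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

static int get_chunk(void)                /* worker side; -1 = all done */
{
    pthread_mutex_lock(&lock);
    while (count == 0 && !done)
        pthread_cond_wait(&nonempty, &lock);
    int chunk = -1;
    if (count > 0) {
        chunk = queue[head];
        head = (head + 1) % QSIZE;
        count--;
        pthread_cond_signal(&nonfull);
    }
    pthread_mutex_unlock(&lock);
    return chunk;
}

static void *worker(void *arg)
{
    int chunk;
    while ((chunk = get_chunk()) != -1)
        printf("worker %ld got chunk %d\n", (long)arg, chunk);
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];
    for (long t = 0; t < NWORKERS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (int c = 0; c < NCHUNKS; c++)     /* the dispatcher's job */
        put_chunk(c);
    pthread_mutex_lock(&lock);
    done = true;
    pthread_cond_broadcast(&nonempty);    /* wake idle workers so they exit */
    pthread_mutex_unlock(&lock);
    for (int t = 0; t < NWORKERS; t++)
        pthread_join(tid[t], NULL);
    return 0;
}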

He already has done. His is a "solution" looking for a problem. He knows he is right so he doesn't have to provide any evidence.

The most pressing need in modern CPUs at present is improvement to the set-associative multilevel cache logic, so that it doesn't slow down on large datasets with popular 2^N lengths. Using the low-order bits of addresses as part of the cache index is fast, but it also causes problems.
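
A small illustration of the 2^N effect (sizes and names are mine; the exact slowdown depends on the machine's cache geometry): summing down a column of a matrix whose row length is an exact power of two generates a stride that keeps landing in the same few cache sets, while padding each row by one element spreads the accesses out.

#include <stdio.h>
#include <time.h>

#define ROWS 2048
#define COLS 2048           /* 2048 doubles = 16 KB per row: a 2^N stride */
#define PAD  1              /* one extra element per row breaks the pattern */
#define REPS 2000

static double a[ROWS][COLS];
static double b[ROWS][COLS + PAD];

static double column_sum(const double *base, long stride, int col)
{
    double s = 0.0;
    for (long r = 0; r < ROWS; r++)
        s += base[r * stride + col];
    return s;
}

int main(void)
{
    /* touch everything first so both arrays really occupy memory */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            a[r][c] = b[r][c] = 1.0;

    clock_t t0 = clock();
    double s1 = 0.0;
    for (int i = 0; i < REPS; i++)
        s1 += column_sum(&a[0][0], COLS, 0);        /* power-of-two stride */
    clock_t t1 = clock();
    double s2 = 0.0;
    for (int i = 0; i < REPS; i++)
        s2 += column_sum(&b[0][0], COLS + PAD, 0);  /* padded stride */
    clock_t t2 = clock();

    printf("unpadded: %ld ticks (sum %g)\n", (long)(t1 - t0), s1);
    printf("padded:   %ld ticks (sum %g)\n", (long)(t2 - t1), s2);
    return 0;
}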

Regards, Martin Brown

Reply to
Martin Brown

Starting with the i286, that was supported in hardware. However, it turned out that nobody wants to bear the associated overhead at runtime, and nobody likes the additional layer of formality in development.

The 286 dates from 1982; I am pretty sure there were CPUs before the 286 which supported memory protection and virtualization.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Reply to
Vladimir Vassilevsky

The 286 was broken though--you had to reset the processor to make a ring transition. Real Rube Goldberg stuff.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal
ElectroOptical Innovations
55 Orchard Rd
Briarcliff Manor NY 10510
845-480-2058

email: hobbs (atsign) electrooptical (period) net
http://electrooptical.net
Reply to
Phil Hobbs

Separating I and D space, and hardware protecting executable code from being paved over, was taken for granted on minicomputers ca 1975. For some reason, Microsoft's compilers insist on mixing code, data, buffers, and stacks in the same address space. I've never understood why. Microsoft's latest fix, randomizing program mapping, is pitiful.
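
For reference, the protection being described is expressible on any modern MMU; a minimal POSIX sketch (mmap/mprotect; the details are mine, not from the post): a page is filled with code while writable, then remapped read+execute, after which any attempt to pave over it faults instead of silently corrupting the program.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);

    /* Stand-in for a code segment: one page, writable while loading. */
    unsigned char *code = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return 1;
    memset(code, 0xC3, pagesz);        /* fill with x86 RET opcodes */

    /* Lock it down: read+execute, never again writable. */
    mprotect(code, pagesz, PROT_READ | PROT_EXEC);

    ((void (*)(void))code)();          /* executing the page is fine... */
    puts("executed the page");

    code[0] = 0x90;                    /* ...but paving over it now dies
                                          with SIGSEGV instead of quietly
                                          corrupting the program */
    puts("never reached");
    return 0;
}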

Microsoft and Intel were both out of the mainstream of computing when they designed their stuff.

Maybe C++ dynamic allocations make space separation harder to think about.

John

Reply to
John Larkin

All those become easier if you assign a core to each, and have memory management hardware (controlled by the OS CPU) that enforces resource and memory access. Wintel systems tangle everything and protect very little. A VAX, or even a PDP-11, had serious hardware protections in 1980. Multicore would be even better.

Problem is, Wintel didn't take security seriously, and still barely does.

Multicore has serious power and speed advantages. Idle the cores that aren't busy. No context-switch overhead. Hardware-sandbox processes so that no application error can hang the system or corrupt anything.

It's 2011 for Pete's sake. We can do better.

John

Reply to
John Larkin

I wouldn't call it a "ring transition." The 80286 required a reset to get back into "real mode." Once set into protected mode, the only way out was a hardware reset. It used the keyboard controller for that, by sending it a command. It also used the calendar chip, powered by a lithium battery, to store a special byte that the BIOS could use to determine that the hardware reset was the result of such software and not a general reboot. So achieving it involved the keyboard system (which, by the way, could also be used to download binary, executable code into the PC) and the clock/calendar chip. The delay in getting back into real mode was what slowed down access to protected memory.
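
For concreteness, a hedged sketch of the sequence being described, using the conventional PC/AT port assignments (CMOS index/data ports 0x70/0x71, 8042 keyboard-controller command port 0x64, shutdown status byte at CMOS offset 0x0F). It's written as Linux-flavoured C purely for illustration - the real thing lived in the BIOS and in protected-mode OS code, and actually running this as root on PC hardware would reset the machine:

#include <stdio.h>
#include <sys/io.h>     /* ioperm(), outb(); glibc on x86 Linux */

int main(void)
{
    /* Gain access to ports 0x60-0x7F (covers 0x64, 0x70, 0x71).
       Needs root; on anything but real PC hardware this is a
       demonstration at best. */
    if (ioperm(0x60, 0x20, 1) != 0) {
        perror("ioperm");
        return 1;
    }

    outb(0x0F, 0x70);   /* select the CMOS "shutdown status" byte */
    outb(0x0A, 0x71);   /* one of the conventional "resume via the
                           pointer at 40:67" codes */
    outb(0xFE, 0x64);   /* 8042 command: pulse the CPU reset line */

    for (;;)            /* spin until the reset takes effect */
        ;
}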

Jon

Reply to
Jon Kirwan

formatting link

Look at AD

--
Thanks,
Fred.
Reply to
Fred Bartoli

There doesn't have to be a significant runtime overhead to these defensive tricks if they are hardware-assisted - but there is a memory overhead, since you need to keep track of read-before-write accesses. Even the 286 had an array bounds-checking instruction (BOUND).

People decided that buffer overruns were a small price to pay for speed - most of the 'Doze exploits rely on that design defect and careless coders. Some of the "cures" are mostly worse than the problem. Flat memory architectures do not help the situation at all.

If you want to be purist, a Harvard or segmented architecture, where segments have to be marked as either executable or data, gets around most of the executing-data-as-code problem (and vice versa).

It was horror-story stuff, and slow as hell, to jump from real mode to protected mode and back. I don't think the 286 had true ring transitions, but it did provide something that would just about work on a good day.

It was the insistence of IBM marketing that OS/2 should support the legacy 286 AT hardware that made OS/2 both late and defective. And the pig-headed S/3x guys who wanted to protect their own market share, and so wanted PCs hobbled. It left the way open for Microsoft to go its own way with Windows and claim the pure 386-only high ground.

We are still paying the price for that design decision/error.

Regards, Martin Brown

Reply to
Martin Brown

At least at this point I can understand the Intel design tradeoffs.

At that time, it was quite obvious that the 8086 segmented 20-bit addressing system was a mess.

With at least some (bad) 80286 virtualisation, it was assumed that the 8086 system would be used only for booting, and after that everything would be in some 32-bit addressing mode, without ever wanting to touch anything 16-bit.

However, commercial requirements demanded compatibility with existing 8086 code.

Reply to
upsidedown

With the current cost of silicon, why do you want to keep all the CPUs occupied?

Isn't it enough to power down those CPUs that are not doing any usable work?

Reply to
upsidedown

Practically any 1970s minicomputer with memory management (segmented/paged) had some means of trapping references outside the allowed addressable program memory range.

Regarding detecting uninitialized memory areas, some Burroughs mainframes in the 1960s initialized the data memory with core-memory words carrying parity errors. When a program attempted to read such a location, it caused a memory parity error trap, which could be translated to a non-initialised-access trap.

Reply to
upsidedown

Nice one. How about «magic bytes» like 0xDEADBEEF?
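
Magic fill patterns are the cheap software cousin of the parity trick. A minimal sketch of a poisoning allocator in C (debug_alloc is a made-up name, not from the post):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Fill freshly allocated memory with a recognizable pattern so that
 * a read-before-write shows up as 0xDEADBEEF instead of plausible
 * garbage. */
static void *debug_alloc(size_t nbytes)
{
    size_t nwords = (nbytes + 3) / 4;   /* round up to whole 32-bit words */
    uint32_t *p = malloc(nwords * 4);
    if (p)
        for (size_t i = 0; i < nwords; i++)
            p[i] = 0xDEADBEEFu;         /* poison marker */
    return p;
}

int main(void)
{
    uint32_t *p = debug_alloc(16);
    printf("fresh word: 0x%08" PRIX32 "\n", p[0]);  /* prints DEADBEEF */
    free(p);
    return 0;
}

Unlike the parity-error scheme, nothing traps at read time; the pattern just makes a read-before-write obvious in a debugger or a post-mortem dump.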

Reply to
John

Exactly. Stop thinking of CPUs as valuable resources that have to be kept busy.

John

Reply to
John Larkin
