Lack of bit-field instructions in the x86 instruction set because of patents?

...yet the newer processors don't offer any significant breakthrough in computing performance compared to the x86s.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Reply to
Vladimir Vassilevsky

And the mispredicted branches are so expensive because of the huge pipeline required to process the x86 instructions.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Reply to
Vladimir Vassilevsky

I question the use of "far". Others here have said the overhead of decoding the x86 instructions is a few percent of the total logic. Besides, on a current desktop or server chip, the overwhelming part of the silicon is taken up by cache, not CPU logic. So while I suspect there would be some savings in logic and power, I don't think it would be "far". And there is a countervailing effect: smaller instructions mean more instructions fit in a given size of I-cache, so perhaps a higher hit rate. I suspect this effect is small, but it is something.

--
  - Stephen Fuld
(e-mail address disguised to prevent spam)
Reply to
Stephen Fuld

Nonsense. All modern bleeding edge processors have long pipes. x86 has little to do with it.

Reply to
krw

At their cores all the x86 CPUs are now RISC anyway -- they're just surrounded by circuitry that breaks apart the x86 instructions into "micro operations." Hence, yeah, for the ultimate in performance, RISC is not really any better than x86 (and whatever inherent performance advantage a native RISC design might have is probably offset by Intel's excellent manufacturing/die-shrinking abilities). They do still suffer a bit of a power penalty, though, but even there Intel is aware of the shortcoming and will steer you towards the Atom CPU, which is quite respectable when it comes to performance per watt. (I wonder if we will see Atom-based phones and PDAs?)

I almost hate to say it, but for all the failed projects that Intel has had, they've done a much better job at continuing to manage and evolve their core x86 product line than most other companies, e.g., Microsoft and Windows. Even their marketing campaigns, while sometimes completely absurd ("the Internet was designed to run on Intel processors" -- say what!? When the Internet began I suspect the only Intel CPUs in use were inside of the keyboards hooked up to the mainframes and workstations!?), have been effective.

---Joel

Reply to
Joel Koltner

In the US alone, several gigawatt-level power plants are working 24/7 to overcome Intel's and Microsoft's crappy designs.

More people would use "suspend" if it worked. More people would turn off computers if they booted up quicker.

John

Reply to
John Larkin

Actually, "nobody" has a point. Architectural ugliness has very little to do with the instruction set and a great deal to do with the basic computational model. However, in this respect, many "RISC" designs are as ugly as the x86 :-(

Take, for example, floating-point and page table management (TLBs). A well-designed architecture ensures that functionally separate instructions can be executed independently. But almost all of them fail to carry that through to interrupt handling, so the first TLB miss or floating-point exception/fixup causes the pipeline to glitch! Or doesn't, and causes the FLIH to have the most disgusting hacks to cover that up, and that STILL leaves a race condition that can cause serious problems!

My understanding is that a lot of the logic is concerned with trying to combine aggressive pipelining/parallelism, while still ensuring that such problems don't cause chaos. Interrupts are just an extreme case, and there are a zillion others in most architectures, often at a much lower level.

Yes. But the x86 is bad there - not as bad as most "RISC" systems, true. A good design could probably cut the cache requirement very considerably - or make the current amounts more effective.

Regards, Nick Maclaren.

Reply to
nmm1

I was talking about 16-bit code, check the register names!

In 32-bit mode nearly all the important restrictions were removed, making the cpu a lot more orthogonal.

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

They do, per unit of power consumed. Another roadblock: power consumption at high clock frequencies. Put these two together, go back a few years, and make a roadmap: multi-core architectures emerge.

Reply to
hanukas

"Suspend" used to work on my machine. Now it works if I turn off the DSL modem.

Grrrrins, James Arthur

Reply to
James Arthur

The CISC-to-RISC decoder consumes a negligible fraction of the silicon and power in modern x86 chips. It's the register renaming, out-of-order execution, and on-die cache that consume the majority of the silicon and power these days on all high (single-threaded) performance chips -- regardless of the ISA.

x86 isn't the liability that you think it is.

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS          smart people who disagree with them."  --Isaac Jaffe
Reply to
Stephen Sprunk

"Suspend" works fine for me. Most of the time, though, I "hibernate" my machine rather than turning it off (in fact, it's been at least two years since I rebooted or shut down my machine other than when Windows Update forces me to). However, some genius at Microsoft apparently decided "hibernate" should be disabled by default, so you have to go digging into system menus to enable it. They could also add logic to switch from sleeping to hibernating after a period of inactivity, which would make both features easier to use. And, of course, we would save untold amounts of power if the default screen-saver configuration put the machine to sleep after an hour or two instead of bouncing a Windows logo around the screen for days on end...

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS          smart people who disagree with them."  --Isaac Jaffe
Reply to
Stephen Sprunk

Well, there were the Nokia 9000 and 9110 Communicators.

Phil

--
I tried the Vista speech recognition by running the tutorial. I was 
amazed, it was awesome, recognised every word I said. Then I said the 
wrong word ... and it typed the right one. It was actually just 
detecting a sound and printing the expected word! -- pbhj on /.
Reply to
Phil Carmody

Cache doesn't consume anywhere near as much power as the "active" parts of the CPU.

Reply to
Nobody

On this topic, I see many statements like "as much", "far less", "overwhelming" and so on. Those adjectives mean nothing.

Can anyone back up his point with specific facts, figures, and citations of the sources of the information?

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Reply to
Vladimir Vassilevsky

Nevertheless, he's right. Caches draw next to nothing per unit area. Remember, it's STATIC RAM: nothing is switching other than the line currently being accessed. Leakage is the far bigger contributor.

No one is going to give specifics publicly, but he's right. Just think about it.

Reply to
krw

My Eee PC boots quickly :>

Reply to
MooseFET

Actually, the decoders in Opteron (pre-Barcelona), all 7 of them, are smaller than a single 4 KByte chunk of SRAM. Four are one-byte decoders used when the predecode information is not present, and the other three are the superscalar decoders that handle a multi-byte instruction at a time.

The out-of-order stuff (reservation stations / reorder buffer / future file / LS1/LS2) is several times larger (like 5X) than all the computation circuits put together.

The branch predictor and associated circuitry is larger than all the computational circuitry put together.

Take the pipeline flip-flops out of all the computational circuitry (int, mem, float), and the total area of the computational circuits is smaller than 4 KB of SRAM. Leave the pipeline flip-flops in, and the computational circuitry is still less than 8 KBytes of SRAM.

New instruction-idiom-recognition decoders are even converting MOV + 2-op instruction pairs into 3-op instructions so as to execute them in a single cycle; compare+branch is handled similarly, and a few others.

x86 (the instruction set) is not as hard to decode as SPARC V9+VIS (and whatever they may have done to it over the last 9 years).

x86 is not any liability whatsoever (excepting perhaps the legal challenges that might be brought forth).

Mitch

Reply to
MitchAlsup

[...]
[...]

What about the semantic complexity of the x86 ISA?

I suppose all of the trickier instructions get delegated to microcode, but isn't there a cost imposed by segmentation or the density of loads and stores?

I know the CPUs can optimize for the "flat" segment model, and bypass the logic for bounds checking and adding base addresses. But just because you can bypass the complex logic doesn't mean it was free. Perhaps I overestimate how those costs add up, but it seems like circuitry that must be located ~1 clock cycle's wire delay from the load/store units occupies some prime real estate. Especially considering that you're hoping to never use it!

Similarly, I thought having an extra source of faults (segment-limit violations) contributes to the complexity of the out-of-order stuff. Those issues would be compounded by the load/store density of x86/x64 code: those have to be fast paths.

Out of curiosity, are the 4/8 KB SRAMs you mentioned for size comparison the vanilla single-ported variety? And does that comparison include the vector unit?

-Eric

Reply to
Eric Northup

One of the things that the IA64 got right in principle and wrong in practice was to try to simplify that area by making it more explicit. I still think that could be done - but not that way!

Indeed? It is, of course, a computationally intractable (in the CS sense) task.

Including a full, glorious, optimised IEEE 754 unit? Boggle. If one adds full support for denormalised numbers, exceptional results and (heaven help us) decimal floating-point, that will clearly go up, but not by a huge factor.

That fails to surprise me! I have always been a supporter of RISC, the principle, and very unimpressed with RISC, the dogma.

Grrk. Now, THERE I disagree. It's extremely unclear how to extend it to allow for scalable parallelism, except by the tried (and not very successful) heavyweight threading approach. Of course, the same remark applies to all of the current 'RISCs' ....

Regards, Nick Maclaren.

Reply to
nmm1
