Lack of bit field instructions in x86 instruction set because of patents?

POSIX threading model aside for a moment... C++ will finally allow an expert to create highly efficient, portable non-blocking algorithms that can indeed scale up to 32 cores and beyond. C/C++ is very versatile. You can use C/C++ and highly platform-specific techniques to create user-space RCU today. As you know, RCU can scale to a boatload of processors and is NUMA friendly.
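To make that concrete, here is a minimal, illustrative sketch (Node and g_head are names made up for this example, not anything from the thread) of the sort of portable non-blocking structure the new standard's atomics make expressible. A real version would also need a safe memory reclamation scheme such as RCU or hazard pointers on the pop side.

#include <atomic>

// Illustrative lock-free stack push using the proposed std::atomic
// facilities.  Node and g_head are made-up names.
struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> g_head{nullptr};

void push(int v) {
    Node* n = new Node{v, nullptr};
    n->next = g_head.load(std::memory_order_relaxed);
    // Publish with release semantics so readers observe a fully
    // constructed node; retry if another thread won the race.
    while (!g_head.compare_exchange_weak(n->next, n,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {
        // n->next now holds the current head; just loop and retry.
    }
}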

I am all for NUMA models that have _very_ weak cache coherency mechanisms; AFAICT, it's basically the only way to scale. Luckily, for me anyway, C/C++ can address these architectures quite nicely.

What type of threading model do you have in mind?

:^o

Reply to
Chris M. Thomasson

However, I agree that there are very few experts who can actually create these types of exotic algorithms. I personally don't have a problem, and have been creating and implementing scalable synchronization techniques for years, but that puts me in a fairly narrow minority. Oh well.

;^(...

Reply to
Chris M. Thomasson

Yes and no. The answer to your other remark is the killer:

I am all for NUMA models that have _very_ weak cache coherency mechanisms;

Oh, no, they can't! That's precisely the problem. The new C++ standard will move some way towards that - but ONLY if you use none of the C features in C++ (including cstring and, even worse, some C++ features that inherit their semantics from C).

The point is that there is no consistent consistency model for either C or POSIX, and the C++ addresses only the pure C++ aspects. Unless it has been vastly extended since I tried to get that aspect addressed.

Regards, Nick Maclaren.

Reply to
nmm1

This is a _major_ cop-out, but I do indeed make heavy use of compiler- and architecture-specific techniques/guarantees to get the job done. For instance, I create most of my sensitive synchronization algorithms in externally assembled libraries and link them into a C program, with link-time optimizations turned off, of course. So, I should really have said that assembly language, and some specific C/C++ compilers (e.g., GCC), can be used to address NUMA models with weak cache coherency. You can get some degree of portability this way, but it's definitely not fully portable in any way, shape or form. It can be a pain to port synchronization algorithms to new architectures because I have to rewrite all of the damn assembly language files, and then _hope_ a C compiler that gives me the guarantees I need will be available. Basically, if you like to juggle running chainsaws, and you have patience, you can use C/C++ and ASM to bring great scalability, throughput and performance characteristics to concurrent programs.
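For instance, the same kind of compiler- and architecture-specific primitive can also be expressed with GCC's extended inline asm instead of a separately assembled file. This is only a sketch, assuming GCC on x86-64; my_xchg is an illustrative name.

// GCC-specific, x86-64 only: atomic exchange via XCHG, which is
// implicitly LOCKed when it has a memory operand.  The "memory"
// clobber also acts as a compiler barrier.
static inline long my_xchg(volatile long* target, long value) {
    __asm__ __volatile__("xchgq %0, %1"
                         : "+r"(value), "+m"(*target)
                         :
                         : "memory");
    return value;   // returns the previous contents of *target
}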

Yeah. I am mostly interested in the fairly fine-grain memory barriers, specifically the relaxed barriers and data-dependent loads, that should be incorporated into the standard. It pleases me to know that Paul E. McKenney is giving his advice in the development process...
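A sketch of what those fine-grain barriers look like in the draft standard's terms (Config and g_config are illustrative names; this is the usual publish/subscribe idiom, not anyone's production code):

#include <atomic>

struct Config {
    int timeout;
};

std::atomic<Config*> g_config{nullptr};

void publisher(Config* c) {
    // Release: all writes to *c happen-before the pointer store.
    g_config.store(c, std::memory_order_release);
}

int reader() {
    // Consume: ordering rides on the data dependency through the
    // pointer, which needs no explicit barrier on most architectures
    // (DEC Alpha being the notorious exception).
    Config* c = g_config.load(std::memory_order_consume);
    return c ? c->timeout : -1;
}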

POSIX guarantees absolutely nothing if you don't use locks to guard any access to shared data. So, if you follow the standard, it can be extremely difficult to scale. There are some things you can do, but they have their limitations:

formatting link

This seems to scale better than most native POSIX rw-locks; however, the overhead of write access is increased. Or:

formatting link

This allows for concurrent mutations; however, it has limitations with respect to traversals:

formatting link

formatting link

I could easily use RCU to manage the traversal, but then I lose all sense of portability. Therefore, I conclude that it's very difficult, if not impossible, to scale using 100% pure PThreads.
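For reference, the 100% pure PThreads baseline being compared against above is just the native rw-lock (a minimal sketch; shared_value is an illustrative name). Every reader still has to bounce the lock's cache line around, which is exactly where the scaling goes away:

#include <pthread.h>

static pthread_rwlock_t g_lock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_value = 0;

int read_value(void) {
    pthread_rwlock_rdlock(&g_lock);   // many readers may hold this at once
    int v = shared_value;
    pthread_rwlock_unlock(&g_lock);
    return v;
}

void write_value(int v) {
    pthread_rwlock_wrlock(&g_lock);   // writers are exclusive
    shared_value = v;
    pthread_rwlock_unlock(&g_lock);
}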

Yes; you're right.

I haven't been following the cpp-threads group lately. I have made some comments on that list. Sadly, it does seem like there are a few people on there who don't see a need for fine-grain membars; they seem to think that sequential consistency is all that is needed, because the only programmers who would ever use fine-grain barriers are hard-core thread monkeys who are few and far between. It's good that Paul E. McKenney seems to have successfully convinced them that data-dependent load barriers are an essential tool.

Reply to
Chris M. Thomasson

Yes, that describes the situation. But what you are really doing is using C/C++ as a syntactic harness for some wholly implementation-dependent semantics. That's where C started, after all :-)

That can also be done in any other language, and used to be done very extensively in Fortran. But there's no way that it will come back to the mainstream - damn few people can handle that sort of thing (and, yes, juggling running chainsaws is the right analogy).

Regards, Nick Maclaren.

Reply to
nmm1

All of MSOffice, Outlook, etc. are written in C. All those SQL servers are written in C. All of those web servers and web browsers are written in C, and the Perl/PHP/Python/ASP/etc. scripts that the servers are running are being interpreted by a program written in C. The OSes that they're running on are written in C.

Most people's computers spend the vast majority of their time running code that was, at one time or another, output by a C compiler -- to the point that virtually nothing else matters except in very specialized applications.

HPCC is almost all Fortran or C, in my experience, because the libraries for the clustering code are only available in Fortran and C. Matlab is fine for small-scale stuff, but I bet that was written in C itself.

The most popular ones are still in C or C++; only the newest stuff is being written in C# and the like, and that's still running in a VM that's written in C calling libraries written in C on top of an OS written in C.

If the VM itself is written in C/C++, then I am assuming that the language that the VM is interpreting is subject to the same limitations as C/C++. And, of course, that VM is calling libraries written in C and running on top of an OS that's written in C.

Haskell was mentioned; I don't know the language, but if it's truly revolutionary to the point it enables new chip designs that can provide ten times the performance at a quarter the power consumption, I doubt that it's "incrementally different" from the C/C++/C#/Java/etc. that we're using today. If it were, by now (a) everyone would have switched, or (b) whatever it is that makes Haskell special would have been added to the more common languages.

I've not seen much progress in that direction; most libraries are still implemented in C/C++, though they may have bindings for "cooler" languages. That's because they're written for the largest possible audience, and virtually every language has _some_ way to call C functions, while the reverse is rarely true. Good luck getting your Java program to use a library written in Perl or vice versa -- but they can both use a library written in C.

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS          smart people who disagree with them."  --Isaac Jaffe
Reply to
Stephen Sprunk

It's very conventional - just a different convention! Functional languages.

Regards, Nick Maclaren.

Reply to
nmm1

Radical change _is_ accepted now and then, but you need a heck of a motivator. Imagine, for example, the effect that anti-gravity devices would have on the aviation world...

For the record, I don't like it any more than you do.

If your processor is 10 times faster than today's x86 chips, then x86 chips in ~5 years will have caught up with it. Of course, if you can keep improving just as rapidly (which nobody has ever managed to do for long), you might get some converts, but it's still only a single order of magnitude faster.

From all appearances, virtually nobody can manage to write C code that doesn't have potential buffer overruns all over the place. That's a far bigger problem for most projects, but it still doesn't stop people from using C.

Lots of folks use POSIX threads and it mostly works; even more people use Windows threads, which are roughly the same, and those mostly work, too. As the saying goes, "good enough" is the enemy of "great" -- and you're trying to sell "great" in a world that has already bought several "good enough" solutions.

That seemed rather obvious several years ago. What is not obvious is how to take advantage of all those cores...

You'd need two or three orders of magnitude in performance gains before people will accept a radical change. Even then, most programmers these days write the kind of code that spends 99.999% of its life waiting for the user to do something, so they won't care even if you manage to make their idle loops run a million times faster -- and when they go to write that one performance-critical function in their program, they're going to use the same language that they used for the rest of their code.

The change you propose might happen before the time I retire -- or it might be the cause of my (and many other folks') retirement ;)

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS          smart people who disagree with them."  --Isaac Jaffe
Reply to
Stephen Sprunk

We are at cross-purposes. I am NOT talking about running any faster (serially) - I am talking about getting the performance out of the multiple cores. Currently, that ain't happening, except in HPC, video rendering and a few other specialist, embarrassingly parallel applications.

God help us, yes. But the first spectacular accident that is blamed on that will change things. And I don't mean a diddy little thing like an airliner crashing, killing 300 people - I mean a chemical plant going up near a population centre in the USA, total failure of air traffic control systems for 6 months, complete collapse of the banking system for a month (not the current loss of confidence, no transactions), and so on. It will happen - but when?

Actually, no, they don't. You may not realise it, but a significant proportion of the increasing unreliability of computer applications (and it IS increasing) is due to that usage. How much, I am not sure, but I have seen the signature fairly often.

Yup. And tackling that problem is precisely my point!

Not a problem. But just not before I retire. By 2025, certainly.

Regards, Nick Maclaren.

Reply to
nmm1

snipped-for-privacy@cam.ac.uk wrote in news:gped54$ds6$ snipped-for-privacy@soup.linux.pwf.cam.ac.uk:

Could you give a little detail of what constitutes the tell-tales in such signatures?

thanks

Reply to
Tom

The SRAM cells are idle unless they're actually being read or written.

Tags are active, but:

  1. Most of the cache is L2 (e.g. 8KiB L1 vs 256KiB L2 for the original P4), and that is only accessed if L1 misses.
  2. N-way set-associative caches (4-way for L1, 8-way for L2 for P4) mean that only N tags need to be compared for any given access.
  3. Relatively large cache lines (e.g. 64 bytes for P4) in combination with #2 limit the proportion of the transistors which are used for tags (active) versus SRAM cells (idle).

Looking at it from an engineering perspective, if most of the silicon is used for cache, anything you can do to reduce the power consumption of the cache will have a greater impact on overall power consumption than a similar reduction elsewhere. Furthermore, a cache consists of a few specific building blocks, each replicated a large number of times, so the impact of any specific design change will be magnified.

Coupled with the fact that there is no inherent *need* for the cache to be continually transitioning (unlike, e.g. the ALU and register file), I can't see how the cache *wouldn't* be using far (!) less energy per transistor than the "core".
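To make the set-associative arithmetic above concrete, here is a small sketch using the P4 L1 figures quoted earlier (8 KiB, 4-way, 64-byte lines). The field split is generic and the real P4 indexing details are more involved, so treat the numbers as illustrative only.

#include <cstdint>

constexpr unsigned LINE_BYTES = 64;
constexpr unsigned WAYS       = 4;
constexpr unsigned CACHE_SIZE = 8 * 1024;
constexpr unsigned SETS       = CACHE_SIZE / (LINE_BYTES * WAYS);  // 32 sets

struct CacheFields {
    uint64_t tag;      // compared against only WAYS (= 4) stored tags
    unsigned set;      // selects one of SETS sets
    unsigned offset;   // byte within the 64-byte line
};

CacheFields split(uint64_t addr) {
    CacheFields f;
    f.offset = addr % LINE_BYTES;
    f.set    = (addr / LINE_BYTES) % SETS;
    f.tag    = addr / (LINE_BYTES * SETS);
    return f;
}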

Reply to
Nobody

Not at all. Any interpreter can be converted to assembler, but that doesn't mean that the language has the same limitations as assembler.

Clearly, anything that can be implemented in any language which runs on a particular architecture can be implemented in assembler. But different languages make different things simple and different things complex.

In that regard, a language's "limitations" aren't about what the language absolutely prohibits (you can write an emulator in almost any language), but what it makes impractical.

Also, there doesn't have to be a VM, e.g. many ARM chips can execute Java bytecode natively (Jazelle).

No, Haskell is radically different. It's a pure functional language, not an imperative language. The language itself doesn't have any concept of mutable state, although there are specific modules (IO and ST) to support this.

FWIW, the most popular Haskell compiler (GHC) is written in Haskell. It does have the option to compile to C, although the resulting C code will look nothing like the original program; in this situation, it's using C as "portable assembler".

Judging from the most common criticisms levelled at Haskell by people who had to (superficially) learn it in academia, the biggest stumbling block seems to be psychological.

Programmers who have become fluent in imperative programming seem to have a significant aversion to discovering that there are actually other programming paradigms, and that they are really only fluent in one specific paradigm rather than in programming generally.

Reply to
Nobody

Not all that radically. In The Great Scheme Of Things, imperative and functional languages aren't all that different. There are some MUCH more radical designs! Even Prolog is more different from both Haskell and Fortran than the latter two are from each other.

Regards, Nick Maclaren.

Reply to
nmm1

I don't think so. Part of the issue involved in the difference was a cascaded (second) programmable interrupt controller, which was necessary to use the 286.

Reply to
JosephKK

MSOffice (and large chunks of Windows itself) was originally written in Pascal, as was quite a bit of Mac software of the same era. No doubt some or all of it has been re-written in C or C++ in the mean-time, but it's not necessarily the case.

Yeah, probably, but while they're used by lots of people, I doubt that you could say that a very large fraction of the programming community are actively involved in their production or maintenance.

There are some fairly widely-deployed web servers in Java (Tomcat, Glassfish). If anyone was starting to write a new SQL server today (rather than tweak one of the existing ones), I'd be very surprised if they chose to write it in C.

Firefox is to Javascript a bit like emacs is to emacs-lisp: most of the user-facing code, and all of the GUI is written in the "scripting" language, which is rendered by the underlying rendering engine, which is, indeed, in C. Similarly for Adobe lightroom vs Lua, I believe.

Sure, but there's only one Larry Wall and one Guido van Rossum (and they each have teams of maintenance helpers, of course). Most of the code that is written *in* those languages doesn't care what the language implementation is written in, and indeed there are versions of Perl and Python that run on top of Java and .NET: no C involved (apart from some system-interface shim libraries, probably.) There's a Python compiler written in Python.

Of course. As I said, that's what it's for. Maybe that will always be the case, but maybe not. In any case, if someone were to give you a new Linux distribution that had been re-written in Java or something else, why would you care, if it offered the same system call API?

Yes, but quite a lot of that C has now been generated by a compiler for a different language, and so does not necessarily have the same code profile or idiom (or, in particular, propensity for buffer overflow bugs) as hand-written C.

Matlab was originally an interpretive wrapper around Fortran BLAS and LAPACK. I suspect that it is a very complicated beast underneath, these days. Much of it seems to have been re-written in Java (you can load and run Java objects almost transparently), and it claims to have FFTW3 in it, and that's C code that was written by an OCaml program.

Sun's JVM and libraries seem to be Java almost all the way down. MS's VM might be based on C, or it might not. I don't know. In any case, both VMs translate straight from byte-codes to machine language: there's no C idiom involved in most of that code execution. Same with the new javascript JIT compilers, and many (most) of the LISP compilers.

No, not at all. If a language has strong typing and checked array accesses, for example, then the generated C code will have those features too, even though C as a language doesn't. The checks are just more code that humans typically don't bother to write for themselves.
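A tiny sketch of that point (checked_get is a made-up name): the bounds check that a safe language's compiler emits into its generated C/C++ is just ordinary code that human C programmers usually don't bother to write.

#include <cstddef>
#include <stdexcept>
#include <vector>

int checked_get(const std::vector<int>& a, std::size_t i) {
    if (i >= a.size())                      // the "extra" generated check
        throw std::out_of_range("index");   // trap instead of scribbling on memory
    return a[i];
}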

Same goes for parallel processing/threading/sharing models: if a language has a useful definition of those, then it can emit whatever is necessary in C or whatever to get the job done. Might not be as fast as a hand-tuned C or C++ thread application, where the programmer can make implicit a lot of the sharing assumptions, but it won't break because of threading model mistakes, either. Well, that depends on the model of course: it's pretty easy to break a multi-threaded application written in a wide variety of languages.

Sure, some of them.

Sure. Same goes for scheme+termite (or other actor-model implementations), or Clojure, or erlang. That's why I said that there's a lot more to be gained by those who *are* prepared to reexamine their programming preconceptions and fundamentals.

Indeed, that's what's happening. For example, clojure is more-or-less a LISP, but it comes with a bunch of pure-functional (immutable) data structure libraries and other features that are there to make large-scale parallelism more reliable and easier to program. I don't imagine that there will ever be a visible "switch-over", but there may well eventually be a "tipping point" where people start to wonder whether they need to dare go with "C", or stick with the safer, more parallel language that they're familiar with...

Well, certainly many libraries are like that, but I've started to notice that large chunks of the libraries in the Perl and Python distributions, and certainly all of those in the Java collections are no longer wrappers around the equivalent C library, but are new or re-writes in the native language. Mostly that's because they can be easier to use when they use the native language's object model and idioms. Probably simplifies the library portability and build requirements significantly, too.

Sure, but you'll find that most of the useful libraries are already native in Java, Perl, whatever. Indeed, there's a whole pile of web- development stuff that more-or-less *only* exists in Perl, Python and Ruby, and the only way to use that stuff from C, should you want to, is to import the relative language subsystem as a library.

Cheers,

--
Andrew
Reply to
Andrew Reilly

Most (the vast majority) of the illegal business practices were Microsoft's doing, not Intel's. Intel sold industrial products (tangible goods); MS sold software (intangible goods), where somebody's opinion or general popularity mattered more.

Not so. The i860 and i960 were targeted at the embedded market and did quite well there. The ARM is not an Intel design, and is successful still.

Reply to
JosephKK

I can't document it very well, but in gigascale chips leakage current losses (heat) are about equal to switching energy losses. At least so says my cow-orker who actually did work for Intel in the Pentium sales support part of the business.

Reply to
JosephKK

The thing that worries me about basing the cost on the decoder size is that perhaps an alternative architecture wouldn't need the same level of branch prediction/O-o-O/register renaming that x86 needs.

Ultimately, the question we face is what overhead the x86 architecture places on the whole thing, from the compiler right through to the chips. Would a different architecture be able to provide more semantics to the chip, allowing a simpler design? That could mean that the x86 overhead question has a different answer than one based solely upon decoder size.

Cheers, Nicholas King

Reply to
Nicholas King

there was an i432 presentation at annual SIGOPS circa 1980. one of the things they mentioned was that some complex scheduling stuff had been moved to the hardware ... in theory "hiding" how many physical processors actually existed (i.e. tasks were placed on a hardware queue for execution). this and some number of other complex things were "burned" into silicon. because of the complexity ... there were bugs ... and stuff being in silicon made it difficult to distribute fixes (required new chips).

i had done something a few years earlier in a 370 SMP effort ... but it was microcode (however the project was canceled before it was announced/shipped). The "scheduler" was in software. the microcode and the software shared a queue structure. the microcode basically looked for the first available ... not already executing ... task to hand off to a processor. it would also "mask" the number of actual processors ... but the microcode part was much simpler ... and being in microcode ... it was about as easy to ship fixes as it was for the kernel software.

I also moved one or two other things into microcode.

--
40+yrs virtualization experience (since Jan68), online at home since Mar70
Reply to
Anne & Lynn Wheeler

That is not a problem with x86 per se; modern RISC chips are just as bad or, in some cases, even worse. Extracting instruction-level parallelism from a stream of sequential instructions and hiding memory latency (the two purposes of the OoO engines) and predicting branches (to hide the long pipeline which is a result of the OoO engine) are difficult problems -- not decoding or actually executing the instructions.

Remember, modern x86 chips are really RISC cores with an x86 decoder slapped on the front; the _only_ burden that x86 imposes compared to a native RISC chip is the cost of that decoder -- and it's a very, very tiny fraction of the chip's total cost.

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS          smart people who disagree with them."  --Isaac Jaffe
Reply to
Stephen Sprunk
