non-load/store architecture?

Hold onto that thought, that it matters in "a lot" of code.

Sure, but I got used to the load low, load high drill on the Alpha easily enough.

So it's faster at loading an immediate 32-bit constant. Big deal! That is not an important job. The RISC tradeoff is legitimate here. You don't really need to load immediate constants very much, so it is acceptable to do an "instruction dance" that doesn't typically run slower anyway, because pipelining masks the latency. So you get to keep all your instructions 4 bytes long, which simplifies your decoder, and also your code cache alignment.
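The "instruction dance" on a fixed-width RISC amounts to splitting the 32-bit value into two 16-bit halves and merging them. A minimal sketch in Python (MIPS-style `lui`/`ori` is used here as an assumed example, since the post doesn't name a specific ISA):

```python
# Sketch (not from the post): how a fixed-width RISC builds a 32-bit
# immediate from two instructions, MIPS-style. lui places the upper 16
# bits, ori merges in the lower 16 bits.

def lui(imm16):
    # "load upper immediate": put the 16-bit value in the high half
    return (imm16 & 0xFFFF) << 16

def ori(reg, imm16):
    # "or immediate": merge the low 16 bits into the register
    return reg | (imm16 & 0xFFFF)

def load_const32(value):
    hi = (value >> 16) & 0xFFFF
    lo = value & 0xFFFF
    return ori(lui(hi), lo)   # two 4-byte instructions instead of one

assert load_const32(0xDEADBEEF) == 0xDEADBEEF
```

Since the two instructions write and read the same register back to back, it's exactly the kind of short dependent chain that pipelining hides when the result isn't needed immediately.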

The Alpha had instructions for loading the immediate constants that were more likely to matter: 16-bit constants could be done in 1 instruction, and 3- or 4-bit constants were typically part of the instruction itself.

Yep, very different. Part of why I started posting here, to figure out what's different about "the kind of ASM I know" vs. "the kind of ASM embedded engineers typically do."

Not unless you can *really* compact the code.

Cheers, Brandon Van Every

Reply to
Brandon J. Van Every

The load was, of course, just a simple example.

Pipelining increases latency, but lets the instructions overlap, which may or may not hide the extra delay, depending on when the loaded constant is needed in the following instruction stream, and on how superscalar the CPU is (you know this better than me, but I'm learning a little here).

Consistent instruction lengths certainly simplify the decoder and alignments - but it costs code space when 32 bits is too much, and extra instructions when 32 bits is too little. Is this cost worth paying? As always, it depends on the typical use of the processor, and on the details of the implementation.

Certainly the x86's widely variable instruction lengths with only 8-bit alignment are a poor match for caches and fast decoders. The ColdFire has 16-bit alignment, and instructions are at most 3 words long - a compromise solution. The original m68k design, especially the advanced cores like the 68040, had several more addressing modes that could lead to longer instructions. When designing the ColdFire, Freescale (then Motorola) removed the more complex modes exactly so that they could get a faster, simpler decoder and execution engine.

And the ColdFire can load 8-bit constants as part of a single 16-bit instruction word (v4 cores can also store a 3-bit constant directly to memory in a single instruction, without passing through a register).

It's certainly the case that many data constants are small, and it's important to optimise the ISA for that case. But addresses are often 32-bit (small data segments can help here - I don't know about the Alpha, but the PPC certainly uses them), and need to be loaded. The code "x = 1;" takes 3 instructions on the PPC, and 1 on the v4 ColdFire (12 bytes vs. 6 bytes, and 3 clocks vs. 1, excluding the actual write). Of course, real code sequences are unlikely to exhibit such a difference.

It's always interesting to hear different viewpoints. I have never used an Alpha (though I've read nice things about it), and my PPC experience is as a microcontroller core.

I could not give you any figures without making things up. But are you familiar with the ARM, and its Thumb mode? The ARM is a 32-bit pure RISC design (16 registers, 32 bits wide, 32-bit fixed instruction size with "Ra = Rb op Rc" style instructions). It also has a "Thumb" mode, in which the instruction set is 16 bits wide, with immediate data or addresses as extension words, and using a smaller register set and "Ra = Ra op Rb" instructions. These Thumb instructions are translated into full ARM instructions by an extra decoder. The reason for having the Thumb mode is to get significantly smaller code, for embedded systems. In general, the Thumb code is slightly slower than pure ARM code, but if the bandwidth to the code store (i.e., Flash) is slow, then the Thumb code is faster.

So somebody at ARM thought a compromise ISA (closer to the ColdFire than the full ARM) was worth the effort, at least for embedded systems.

mvh.,

David

Reply to
David Brown

It does when you know what you're doing, i.e. either you're coding ASM by hand and know how to write optimizable ( == pipelineable) algorithms, or else your compiler is fairly clever. DEC compilers actually were clever; they'd get the basics right.

Of course, the vast majority of programmers do not know what they're doing. That's why Intel went with out-of-order execution cores. They figure, the programmers suck, the compilers suck, let's just throw some gratuitous reshuffling at it to make it suck less. In terms of 80% solutions it's a reasonable idea. Just damn awful when you actually get in there to write a decent instruction schedule. You have to do everything "through the looking glass," it's rather baroque.

The DEC Alpha could do a FPU add and a FPU multiply simultaneously, with a latency of 4 clocks. It was possible to do a 4x4 matrix transform in 16 clocks, if you're an ASM guru and know how to write pipelined array iteration code. Knowing what was possible in aggressively pipelined RISC ASM, made me annoyed with vector instructions like SSE and Altivec. They are more of a marketing vehicle than an actually useful feature. Their units of data are too big, too bound together, and they're too latent. You'd be better off just doing the RISC pipelined code.
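As a rough illustration (plain Python, not the original Alpha assembly), a 4x4 transform of one vector is 16 multiplies and 12 adds; with separate add and multiply pipes, the products of one row are independent of the adds of the previous row, which is what makes the kind of overlap described above schedulable:

```python
# Sketch: a 4x4 matrix times a 4-vector. On a CPU with independent
# FPU add and multiply units, the four products per row can issue on
# the multiply pipe while the previous row's adds drain through the
# add pipe -- the software-pipelining idea described in the post.

def transform(m, v):
    # m is a 4x4 matrix as a list of rows, v is a 4-vector
    out = [0.0] * 4
    for i in range(4):
        # four independent products: candidates for the multiply pipe
        p = [m[i][j] * v[j] for j in range(4)]
        # balanced adds: two independent pairs, then one combining add,
        # shortening the dependent chain versus a straight left-to-right sum
        out[i] = (p[0] + p[1]) + (p[2] + p[3])
    return out

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
assert transform(identity, [1.0, 2.0, 3.0, 4.0]) == [1.0, 2.0, 3.0, 4.0]
```

The balanced add tree is the relevant trick: it exposes independent operations to the scheduler instead of one long dependent chain.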

Yep. People whinge about this stuff, but it doesn't actually matter.

Again, my bias is towards number crunching performance.

Embedding 8 bits in an instruction that really really needs it ain't hard though. You don't need 2, 4, 6 byte instruction lengths to achieve that.

But addresses are often

The Alpha, despite being 64-bit, had these funky 32-bit pointers to save space.

I do not know whether a factor of 2 is important to embedded devices that have no cache. No experience there.

Cheers, Brandon Van Every

Reply to
Brandon J. Van Every

Since most code is compiled rather than hand-coded, especially on bigger processors, the compiler+cpu combination is very important. One of the reasons why the Itanium is such a bad solution in most cases is that compilers are not good enough to keep the core running at its full potential (no slur on the Itanium compiler writers - it's just that there has not been the magical progression in compiler technology that Intel predicted when it started the Itanium development).

For the kind of code you are doing (as far as I can guess), you have a lot of tight loops with predictable code. That means you can figure out the optimal instruction ordering and register usage when writing the assembly (or a smart enough compiler could figure it out). For a lot of code, however, instruction paths vary at run-time depending on the data. Dynamic out-of-order execution is then the only way to get work done during unpredictable pipeline stalls or cache misses. With a decent cpu architecture, you need much less of it, but an Alpha with out-of-order execution could be made faster than one without it.

The "best" architecture is dependent on the problem at hand. I use the ColdFire as a high-end microcontroller - consistent and fast hardware access, interrupt handling, and good debugging features are more important than number crunching ability (though it's not bad, with a fairly powerful MAC engine, depending on the exact ColdFire core). The devices I use don't even have hardware floating point (though the faster ColdFire cores do). If I wanted top-speed number crunching, I'd pick a different architecture.

A factor of 2 (in speed or size) can be very important, especially for volume products. If you can save a dollar on components by reducing the size of flash needed, then you've earned your company an extra dollar profit per unit sold. I have not worked with many products where our margins are that small, but certainly others here in c.a.e. have.

mvh.,

David

Reply to
David Brown

Yeah, I don't advocate VLIW. The world would have been better off working on RISC compilers on clean architectures like the Alpha. It would have been a lot easier. It's pretty frustrating to see a chip that was better than the Itanium in the real world disappear from the marketplace.

Not strictly true. Conditional moves can eliminate certain kinds of branches, and in general are not well understood or well utilized by either programmers or compilers. If your branch usually goes along the same path, branch prediction is fine. If your branch is fairly random, you want a conditional move if it'll work with the code. Some kinds of code are inherently "butt code" though, and no tricks are going to help them.
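The idea behind a conditional move can be sketched in software (a Python illustration, not anything from the post): turn the condition into an all-ones or all-zeros mask and blend the two values, so there is no branch for the predictor to guess at:

```python
# Sketch: a branch-free select, the software analogue of a conditional
# move. The condition becomes an all-ones or all-zeros 32-bit mask,
# and the result is blended with AND/OR -- nothing to mispredict.

MASK32 = 0xFFFFFFFF

def select(cond, a, b):
    # mask is 0xFFFFFFFF when cond is true, 0x00000000 when false
    mask = (-int(bool(cond))) & MASK32
    return (a & mask) | (b & ~mask & MASK32)

assert select(True, 7, 99) == 7
assert select(False, 7, 99) == 99
```

A hardware cmov does the same blend in one instruction; the win over a branch shows up exactly when the condition is data-dependent and near-random, as the post says.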

Again, not true. Whether you're winning with out-of-order is entirely algorithm dependent. "Butt code" will choke any processor.

Fair enough.

Cheers, Brandon Van Every

Reply to
Brandon J. Van Every

It's amazing how such an excellent architecture as the Alpha (I haven't used it, but I read many nice things about it) could be virtually destroyed by thoughtless greed.

Yes, conditional instructions would be a help (the ColdFire does not have conditionals, other than branches, so I didn't think of that!).

Agreed.

There's a lot of differences between designing and programming for "big systems" and small embedded systems. But I'm sure you'd enjoy it - there's a lot of low-level coding with full control of all details of your system, which requires the same sort of thinking as your low-level graphics drivers.

mvh.,

David

Reply to
David Brown

Well, be sure to include thoughtless Dilbertism as well. DEC was an excellent engineering company, but couldn't market its way out of a paper bag. It had the opportunities to evangelize the Alpha but it squandered them. The competition was indeed tough: although the Alpha was a far better CPU than anything Intel had to offer, Intel's fabrication process was 1 to 1.5 years more advanced than DEC's. This is another aspect of Worse Is Better. The vendor that can achieve volume economy wins.

There's actually a debate about the IBM Cell chip now, which is being used in the Sony Playstation 3. IIRC it doesn't do out-of-order execution. It seems to be based more on the philosophy of "you the programmer, or the compiler, will figure out the proper way to schedule all these sub-units." We debated this at a local SeaFunc meeting one time. One developer was saying that lotsa game industry types were bitching and moaning about how this code wasn't going anywhere, and they're all shipping PS3 titles imminently. I countered that people were performance tweaking the PS2 for many, many years after it was considered an arcane architecture. It simply takes a lot of time for people to wrap their heads around such learning curves.

The XBox 360 may have an advantage with its 3 chip design. Its threading is more "coarse" and the programming style won't be anything new to most people. Can't remember what the Nintendo is like. There is a problem of different proliferating architectural styles, if you want to port to multiple consoles. Well hey maybe that's a business model for a consultant, heh!

I wonder. I don't think "getting it to work" is the fun part. The fun part is speeding it up once it already works, and seeing big improvements for the effort spent.

Cheers, Brandon Van Every

Reply to
Brandon J. Van Every

THINGS YOU SHOULD KNOW ABOUT BRANDON J. VAN EVERY BEFORE REPLYING TO ONE OF HIS POSTS

  1. He has never designed any game, nor contributed to the design of any game, which has ever seen the light of day, despite referring to himself as a "game designer." (In rebuttal, he pointed out his "one complete game" from "1983" on the "Atari 800" which he showed to his "8th grade math teacher.")

  2. He has never been employed in the game industry as a game designer or game developer of any sort. Despite this, for some reason he managed to get named as an Independent Games Festival judge; a curious turn of events, since their stated intent is to appoint "professionals in the game industry" (their quote, not his).

  3. In fact, the only programming job he had listed on his resume was for only "2 years" ending in "1998," working in C and assembly on a graphics driver, as a "Sr. Software Engineer" -- a curious title, since this was his first (and only) mentioned job in the software industry. There is no evidence he has used C++, nor any other language, professionally. (And the company in question is defunct, anyway, so there is no way to verify his claim.)

  4. The other jobs he has mentioned having after this one and only item on his resume are: "yard maintenance work," "painting apartments," "scrubbing floors," "sub minimum wage signature gathering," and working for "/hour at a Vietnamese restaurant."

  5. The only personal project he actually wrote code for and released in some fashion was Free3d, a software 3D rendering engine. Stating that its goals were to be "100% efficient, 100% portable" and to release it in a "one year time frame," he started in "1993" and abandoned it in "1996," admitting that it "barely drew even a single polygon" and "did hardly anything in the 3D department."

  6. Almost every Internet community (Usenet newsgroup, mailing list, etc.) he has ever introduced himself to has resulted in him repeating the same pattern: asking leading questions, demanding people do things his way, becoming hostile, annoying the other participants, alienating them, and finally leaving in disgust.

  7. The project he has intermittently claimed to be working on, Ocean Mars, is vaporware -- and is one of his admitted "failures." He allegedly sunk "1.5 man years" and about "K" into it with only a "spherical hexified icosahedron" display to show for it, and has stated that he's "unlikely to use any of it." Ultimately he filed for a "Chapter 7" "bankruptcy of K," since he was "completely destitute" and "had no earning power to ever pay the money back."

  8. Since his embarrassing frustration with his Ocean Mars project, he has decided that C and C++ aren't "worth anything as a resume skill anymore," and embarked on a quest in 2003 to find a high-level language that will suit his needs. After more than a year, at least ten languages, and not having even "written a line of code" in any of them, he still has yet to find a language that will suit him.

  9. Finally, despite vehemently insisting that he is not a troll, many people quite understandably have great difficulty distinguishing his public behavior from that of a troll.
Reply to
fredaster122

In case anyone is wondering what this is about... I acquired an anonymous internet stalker almost 2 years ago. He follows me around to any forum I post in. Some of what he writes is true; they are things I've publicly disclosed. Others are, well... the product of his obsession.

Cheers, Brandon Van Every

Reply to
Brandon J. Van Every

Do you forward his stalks, with full headers, to the appropriate abuse address? In this case it would be snipped-for-privacy@google.com. Make sure you describe the problem.

--
Chuck F (cbfalconer at maineline dot net)
   Available for consulting/temporary embedded and systems.
Reply to
CBFalconer

I did once. Got a reply back from either Yahoo or Google, saying they would take no action. I suppose I could be more persistent about it, but I think the energy trade is not likely to be profitable. Hm, I wonder how many people would pay a nominal fee for automated stalker reporting? Or maybe it would be a good open source capability to add to Thunderbird.

Last night I did some research on what kind of stalker he is. Looks like he's a "Cyber Bully" and a sociopath (psychologically rather than physically violent). One thing I did learn is that such a person always starts with grains of truth, then distorts, to make the critique look legitimate. Actually the critique is a projection of the stalker's own insecurities, and every missive gives a lot of information about the stalker. Another thing I learned is just how far gone such a stalker actually is. There's no point going through the typical Usenet debating drill with such a person, as you're dealing with a serious mental illness.

What his posts are telling me:

- he is fixated on his personal prowess as a game developer, or lack thereof. That I can end up being an IGF judge, with my minimal qualifications, and he cannot, is disconcerting to him. Stalkers have often been rejected in their professions and attack people who are part of the profession.

- he is fixated on whether people have authority and legitimacy in Usenet forums. I became a target because I argue well and often hold controversial views. There's a definite need on his part to knock someone like me down. He calls me a troll because he has himself been called a troll. This is also why he remains anonymous, so that no one on Usenet can have any perceived or actual authority over him.

- he is fixated on matters of personal income. I'll wager he's had his own career financial problems. That I've worked odd jobs to support my open source habit is disproportionately important to him. Also he needs to cast doubt on whether I've ever made money in the computer industry, or ever had a job title.

- he may be a disgruntled open source developer. My recent successes in open source are notably absent from his critique, even though several people in comp.lang.scheme have called him to account on it.

I don't have a good link describing exactly his profile, though. Most of the stalking materials on the internet are focused on:

- pedophiles

- women being stalked for possible rape

- teenage bullies, preventing more Columbine shootings

- bullies in the workplace, where lawsuits are potentially profitable

He's closest to matching the "bullies in the workplace" modus operandi. Fortunately, we have no work relationship, so there's little he can actually do.

Cheers, Brandon Van Every

Reply to
Brandon J. Van Every

yes,

seems like he's changing the subject and so he's quite OT.

so back to load/store architectures, and start a new thread for possible ID (you say you were a judge at some games fest?)

cheers.

Reply to
jacko

I have worked on quite a few compiler code generators. What I have found is that it is better to develop code generation that deals with instruction scheduling. Initially it doesn't make much difference, but over time the compilers reflect learned experience and often use instruction scheduling as an important tool in overall optimization.
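The scheduling Walter describes is typically a list scheduler run over the dependences of a basic block. A toy sketch (hypothetical instruction tuples, nothing from any real code generator): each cycle, issue an instruction whose source operands' latencies have elapsed, or stall if none is ready.

```python
# Toy list scheduler (illustrative only). Instructions are
# (name, dests, srcs, latency) tuples; each cycle we issue the first
# instruction whose source registers are available, else stall.

def schedule(instrs):
    ready_at = {}      # register -> cycle its value becomes available
    issued = []        # (cycle, name) pairs, in issue order
    cycle = 0
    pending = list(instrs)
    while pending:
        for ins in pending:
            name, dests, srcs, lat = ins
            if all(ready_at.get(r, 0) <= cycle for r in srcs):
                for d in dests:
                    ready_at[d] = cycle + lat
                issued.append((cycle, name))
                pending.remove(ins)
                break
        else:
            cycle += 1  # nothing ready this cycle: stall
            continue
        cycle += 1
    return issued

# A 3-cycle load; the independent add fills one of the stall slots
# before the dependent use can issue.
prog = [("load", ["r1"], [], 3),
        ("add",  ["r2"], [], 1),
        ("use",  ["r3"], ["r1"], 1)]
assert schedule(prog) == [(0, "load"), (1, "add"), (3, "use")]
```

Even this crude greedy pass shows the payoff: the independent `add` is hoisted into a slot that would otherwise be a stall waiting on the load.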

w..

Reply to
Walter Banks
