Portable Assembly

So, then, what is a portable assembler?

One major variable of a processor architecture is the number of registers, and what you can do with them. On one side of the spectrum, we have PICs or the 6502 with pretty much no registers; on the other side, there are things like x86_64 or ARM64 with plenty of 64-bit registers. Using an abstraction like C to let the compiler handle the distinction (which register to use, when to spill) sounds like a pretty good idea to me. If you were closer to assembler, you'd either limit yourself to a not-very-useful subset that works everywhere, or to a set that works only in one or two places.

Stefan

Reply to
Stefan Reuther

One which is not tied to a particular architecture, but rather to an idealized machine model. It makes sense to use this assuming that processors evolve towards better, larger register sets - which has been the case for the last few decades. It would be impractical to try to assemble something written once for, say, the 68k and then assemble it for a 6502 - perhaps doable, but insane.

Using a phrase book is of course a good idea if you want to conduct a quick conversation. It is a terrible idea if you try to use the language for years and choose to stay confined within the phrases you have in the book.

Like I said before, there is no point in writing code which can work on any processor ever made. I have no time to waste on that; I just need my code to work on what is the best silicon available. This used to be 68k, now it is power. You have to program with some constraints - e.g. knowing that the "assembler" (which in reality is more of a compiler) may use r3-r4 as it wishes and not preserve them on a per-line basis, etc. Since the only person who could make a comparison between an HLL and my vpa is me, I can say it has made me orders of magnitude more efficient. Obviously you can take my word for that or ignore it; I can only say what I know.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI


------------------------------------------------------


Reply to
Dimiter_Popoff

1) It's what Unicorns use when writing code to run the automation equipment used by Elves to mass-produce cookies inside hollow trees.

2) It's a trigger phrase that indicates the person using it is delusional and is about to lure you into a time-sink of relativistic proportions.

If I were you, I'd either smile politely and change the topic or just turn and run.

--
Grant
Reply to
Grant Edwards

Oh smile as much as you want. Then try to match 10% of what I have made and try to smile again.

Reply to
Dimiter_Popoff

Not so much. Perhaps Fortran plus, say, LINPACK.

"I am returning this tobacconist; it is scratched." - Monty Python.

It has been a long time since C presented a serious constraint in performance for me.

Mostly, I've seen the source code outlast the company for which it was written :)

I would personally view "megabytes of source" as an opportunity to infuse a system with better ideas through a total rewrite. I understand that this view is rarely shared; people prefer the arbitrage of technical debt.

L'il MCU projects are essentially disposable. Too many heresies.

--
Les Cargill
Reply to
Les Cargill

I have used some very good portable C code across three or four different architectures (depending on whether you view a 188 and a 286 as different architectures). This was all in one company, over the span of 9 years or so.

So -- perhaps your scope is limited?

--
www.wescottdesign.com
Reply to
Tim Wescott

Sometimes people call C a "portable assembly" - they are wrong. But one of the purposes of C is so that you don't /need/ assembly, portable or not.

What has been discussed so far in this branch (I haven't read the whole thread yet) has been a retargetable assembler - a way to generate an assembler program for different processors without going through all the work each time. Such tools have existed for many years, and are an efficient way to make an assembler if you need to cover more than one target. They don't help much for writing the actual target assembly code, however - though usually you can share the same directives (commands for sections, macros, etc.). GNU binutils "gas" is the most widely used example.

As far as a portable assembly language is concerned, that does not and cannot exist. Assembly language is by definition too tightly connected to the ISA of the target. It is possible to have a language at a higher level of abstraction than assembler, but still lower level and with tighter control than C, and which can be translated/compiled to different target assemblies. LLVM is a prime example.

Reply to
David Brown

Writing a game involves a great deal more than just the coding. Usually, the coding is in fact just a small part of the whole effort - all the design of the gameplay, the storyline, the graphics, the music, the algorithms for interaction, etc., is inherently cross-platform. The code structure and design are also mostly cross-platform. Some parts (the graphics and the music) need to be adapted to suit the limitations of the different target platforms. The final coding in assembly would be done by hand for each target.

Reply to
David Brown

Exactly - you use a programming language appropriate for the job. For most low-level work, that is C (or perhaps C++, if you /really/ know what you are doing). Some parts of your code will be target-specific C, some parts will be portable C. And a few tiny bits will be assembly or "intrinsic functions" that are assembly made to look like C functions.
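
To make that split concrete, here is a minimal C sketch (an editorial illustration, not from the original post; the register address is invented, and the cpsid instruction is the ARM Cortex-M case that a vendor-provided __disable_irq() intrinsic boils down to):

#include <stdint.h>

/* Portable C: compiles the same on msp430, ARM, x86, 68K or PPC. */
static uint32_t checksum(const uint8_t *p, uint32_t len)
{
    uint32_t sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}

/* Target-specific C: a peripheral register accessed by address
   (the address below is made up for illustration). */
#define UART0_DR (*(volatile uint32_t *)0x4000C000u)

static void uart_send(uint8_t c)
{
    UART0_DR = c;
}

/* An "intrinsic" - assembly made to look like a C function. */
static inline void disable_interrupts(void)
{
    __asm volatile ("cpsid i");
}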

Most of the assembly used will actually be written by the toolchain provider (startup code, library code, etc.) - and if you are using a half-decent processor, this would almost certainly have been better written in C than assembly.

C is /not/ a "portable assembly" - rather, it means you don't /need/ a portable assembly.

No, it is not a "portable assembler". It is just a translator to generate PPC assembly from 68K assembly, because you had invested so much time and code in 68K assembly and wanted to avoid re-writing everything for the PPC. That's a reasonable enough business strategy, and an alternative to writing an emulator for the 68K on the PPC, or some sort of re-compiler.

But it is not a "portable assembler". If you can take code written in your VPA and translate it into PIC, 8051, msp430, ARM, and x86 assembly, in a way that gives near-optimal efficiency on each target, while letting you write your VPA code knowing exactly which instructions will be generated on the target, /then/ you would have a portable assembler. But such a language cannot be made, for obvious reasons.

What you have is a two-target sort-of assembler that gives you reasonable code on two different targets. You could also say that you have your own personal low-level programming language with compiler backends for two different targets. Again, that's fair enough - and if it lets you write the code you want, great. But it is not a portable assembly.

Spoken like a true fanatic (or salesman).

Reply to
David Brown

For embedded systems (before we called them that), yes. There were few compilers that were really worth the media they were delivered on -- and few meant to generate code for bare iron.

Speaking from the standpoint of the *arcade* game industry, games were developed on hardware specific to that particular game (trying, where possible, to leverage as much of a previous design as possible -- for reasons of economy).

Most games were coded from scratch in ASM; very little was "lifted" from Game X to act as a basis for Game Y (this slowly changed over time -- but mainly in terms of core services... runtime executives predating "real" OS's).

Often, the hardware was *very* specific to the game (e.g., a vector graphic display didn't draw vectors in a frame buffer but, rather, directly controlled the deflection amplifiers -- X & Y -- of the monitor to move the "beam" around the display tube in a particular path). As such, the "display I/O" wasn't really portable in an economic sense -- no reason to make a Z80 version of a 6502-based game with that same wonky display hardware. E.g., Atari had a vector graphic display system (basically, a programmable display controller) that could ONLY draw curves -- because curves were so hard to draw with a typical vector graphic processor! (You'd note that every "line segment" on the display was actually a curve of a particular radius)

Also, games taxed their hardware to the limit. There typically wasn't an "idle task" that burned excess CPU cycles; all cycles were used to make the game "do more" (players are demanding). The hardware was designed to leverage whatever features the host CPU offered (often more than one CPU for different aspects of the game -- e.g., "sound" was its own processor, etc.) to the greatest advantage. E.g., 680x processors were a delight to interface to a frame buffer as the bus timing directly lent itself to "display controller gets access to the frame buffer for THIS half clock cycle... and the CPU gets access for the OTHER half cycle" (no wait states, as would be the case with a processor having variable bus cycle timings, e.g., the Z80).

Many manufacturers invested in full custom chips to add value (and make the games harder to counterfeit).

A port of a game to another processor (and perhaps entire hardware platform) typically meant rewriting the entire game, from scratch. But, 1980's games (arcade pieces) weren't terribly big -- tens of KB of executables. Note that any graphics for the game were directly portable (many of the driving games and some of the Japanese pseudo-3D games had HUGE image ROMs that were displayed by dedicated hardware -- under the control of the host CPU).

In practical terms, these were small enough projects that *seeing* one that already works (that YOU coded or someone at your firm/affiliate coded) was the biggest hurdle to overcome; you know how the game world operates, the algorithms for the "robots", what the effects should look like, etc.

If you look at emulations of these games (e.g., MAME), you will see that they aren't literal copies but, rather, just intended to make you THINK you're playing the original game (because the timing of the algorithms in the emulations isn't the same as that in the original game). E.g., the host (application) typically synchronized its actions to the position of the "beam" repainting the display from the frame buffer (in the case of a raster game; similar concepts for vector games) to minimize visual artifacts (like "object tearing") and provide other visual features ("OK, the beam has passed this portion of the display, we can now go in and alter it in preparation for its next pass through").

In a sense, the games were small systems, by today's standards. Indeed, many could be *emulated* on SoC's, today -- for far less money than their original hardware and far less development time!

Reply to
Don Y

Or, the platform on which it was originally intended to run!

OTOH, there are many "regulated" industries where change is NOT seen as "good". Where even trivial changes can have huge associated costs (e.g., formal validation, reestablishing performance and reliability data, etc.)

[I've seen products that required the manufacturer to scour the "used equipment" markets in order to build more devices simply because the *new* equipment on which the design was based was no longer being sold!] [[I've a friend here who hoards big, antique (Sun) iron because his enterprise systems *run* on old SPARCservers and the cost of replacing/redesigning the software to run on new/commodity hardware and software is simply too far beyond the company's means!]]

I've never seen this done, successfully. The "second system" effect seems to sabotage these attempts -- even for veteran developers! Instead of reimplementing the *same* system, they let feeping creaturism take over. The more developers, the more "pet features" try to weasel their way into the new design.

As each *seems* like a tiny little change, no one ever approaches any of them with a serious evaluation of their impact(s) on the overall system. And, everyone is chagrined at how much *harder* it is to actually fold these changes into the new design -- because the new design was conceived with the OLD design in mind (i.e., WITHOUT these additions -- wasn't that the whole point of this effort?).

Meanwhile, your (existing) market is waiting on the new release of the OLD product (with or without the new features) instead of a truly NEW product.

And, your competitors are focused on their implementations of "better" products (no one wants to play "catch-up"; they all aim to "leap-frog").

Save your new designs for new products!

Reply to
Don Y

I might agree with that - if we understand "portable" as "universally portable".

Well, who in his right mind would try to port serious 68020 or similar code to a PIC or MSP430 etc.? I am talking about what is practical and has worked for me. It would be a pain to port back from code I have written for power to something with fewer registers, but within reason it is possible and can even be practical. Yet porting to power has been easier because it has more registers than the original 68k and many other things; it is just more powerful and very well thought out, whoever did it knew what he was doing. It even has little endian load and store opcodes... (I wonder if ARM has big endian load/store opcodes.)

Yet I agree it is not an "assembler", I suppose. I myself refer to it at times as a compiler, at other times as an assembler... It can generate many lines per statement - many opcodes; e.g. the 64/32 bit divide the 68020 has is done in a loop, no way around that (17 opcodes, just counted it). Practically the same as what any other compiler would have to do.

It comes pretty close to that as long as your CPU has 32 registers, but you only need to know exactly what each line does during debugging, running step by step through the native code.

It may sound so, but it is not what I intended. VPA has made me a lot more efficient than anyone else I have been able to compare myself with. Since I also am only human, it can't be down to me, not by *that* much. It has to be down to something else; in all likelihood it is the toolchain I use. My "phrasebook" comment stays, I'm afraid.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI


------------------------------------------------------


Reply to
Dimiter_Popoff

"Universally portable" is perhaps a bit ambitious :-) But to be called a "portable assembly", I would expect a good deal more than two architectures that are relatively closely related (32-bit, reasonably orthogonal instruction sets, big endian). I would imagine that translating 68K assembly into PPC assembly is mostly straightforward - unlike translating it into x86, or even ARM. (The extra registers on the PPC give you the freedom you need for converting complex addressing modes on the 68K into reasonable PPC code - while the ARM has fewer registers available.)

If the code were /portable/ assembly, then it would be possible. Standard C code will work fine on the msp430, ARM, x86, 68K and PPC - though it is unlikely to be efficient on a PIC or 8051.

(I agree that the PPC is a fine ISA, and have enjoyed using it on a couple of projects. ARM Cortex M, the most prevalent cores for microcontrollers, does not have big endian load or store opcodes. But it has byte-reverse instructions for both 16-bit and 32-bit values. The traditional ARM instruction set may have them - I am not as familiar with that.)
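
(For readers who want to see what that looks like from C: the compiler's byte-swap builtins typically map straight onto those reverse instructions. A minimal sketch, assuming GCC/Clang builtins - an editorial illustration, not from the original post:)

#include <stdint.h>

/* Big-endian 32-bit load on a little-endian Cortex-M:
   one LDR plus one REV, generated from plain C. */
static inline uint32_t load_be32(const void *p)
{
    uint32_t v;
    __builtin_memcpy(&v, p, sizeof v);   /* LDR (alignment-safe) */
    return __builtin_bswap32(v);         /* REV */
}

/* 16-bit variant: LDRH plus a 16-bit byte reverse. */
static inline uint16_t load_be16(const void *p)
{
    uint16_t v;
    __builtin_memcpy(&v, p, sizeof v);
    return __builtin_bswap16(v);
}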

If it were /portable/ assembly, then your code that works well for the PPC would automatically work well for the 68K.

The three key points about assembly, compared to other languages, are that you know /exactly/ what instructions will be generated, including the ordering, register choices, etc.; that you can access /all/ features of the target cpu; and that you can write code that is as efficient as possible for the target. There is simply no way for this to be portable. Code written for the 68k may use complex addressing modes - these need multiple instructions in PPC assembly. If you do this mechanically, you will know exactly what instructions it generates - but the result will not be as efficient as code that re-uses registers or re-orders instructions for better pipelining. Code written for the PPC may use more registers than are available on the 68K - /something/ has to give.

Thus your VPA may be a fantastic low-level programming language (having never used it or seen it, I can't be sure - but I'm sure you would not have stuck with it if it were not good!). But it is not a portable assembly language - it cannot let you write assembly-style code for more than one target.

That's fine - you have a low-level language and a compiler, not a portable assembler.

Some time it might be fun to look at some example functions, compiled for either the 68K or the PPC (or, better still, both) and compare both the source code and the generated object code to modern C and modern C compilers. (Noting that the state of C compilers has changed a great deal since you started making VPA.)

Fair enough.

Good comparisons are, of course, extremely difficult - and not least, extremely expensive. You would need to do large scale experiments with at least dozens of programmers working on a serious project before you could compare efficiency properly.

Reply to
David Brown

Nobody.

Yet, that's what a "Universal Assembler" would be able to do.

And it is not anything close to a "Universal Assembler".

--
Grant
Reply to
Grant Edwards

So, to what *is* it tied then? What is its *concrete* machine model?

Doable and not insane with C.

Actually, you can program the 6502 in C++17.

My point being: if you work on assembler level, that is: registers, you'll not have anything more than a phrase book. A C compiler can use knowledge from one phrase^Wstatement and carry it into the next, and it can use grammar to generate not only "a = b + c" and "x = y * z", but also "a = b + (y*z)".
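
(A small C illustration of that point - my example, not Stefan's:)

/* Written as separate "phrases"... */
int f(int b, int y, int z)
{
    int t = y * z;   /* "x = y * z" */
    int a = b + t;   /* "a = b + c" */
    return a;
}
/* ...but the compiler sees the whole "sentence" a = b + y*z:
   it keeps t in a register (or eliminates it entirely), picks the
   registers itself, and schedules the multiply and add for the target. */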

I am not an expert in either of these two architectures, but 68k has 8 data + 8 address registers whereas Power has 32 GPRs. If you work on a virtual pseudo-assembler level you probably ignore most of your Power.

A classic compiler will happily use as many registers as it finds useful.

The only possible gripe with C would be that it has no easy way to write a memory cell by number. But a simple macro fixes that.
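
(For instance, a minimal sketch of such a macro - the address used below is invented purely for illustration:)

#include <stdint.h>

/* Access a memory cell by its numeric address. */
#define MEM32(addr)  (*(volatile uint32_t *)(uintptr_t)(addr))

void example(void)
{
    MEM32(0x20000100u) = 0xDEADBEEFu;   /* write a cell by number */
    uint32_t v = MEM32(0x20000100u);    /* read it back */
    (void)v;
}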

Stefan

Reply to
Stefan Reuther

Indeed, having more registers is of huge help. But it is not as straightforward as it might seem at first glance. And while at it I did a lot more than just emulate the 68k - on power we have a lot more on offer and I wanted to take advantage of it: adding syntax to not touch the CCR (which the 68k unavoidably does on moves, adds and many others), using the 3-address mode - source1, source2, destination - and having this available not just for registers but for any addressing mode, etc. If you assemble plain CPU32 code, the resulting power object code size is about 3.5 times the native CPU32 code size. If you write with power in mind - e.g. you discourage all the unnecessary CCR (CR in power) updates - code sizes get pretty close. I have designed in a pass for optimizing that automatically; 1.5 decades later it is still waiting to happen... :-). No need for it strong enough to deflect me from more pressing issues, I suppose.

This is semantics - but since user-level 68k code assembles directly, I think it is fair enough to borrow the word "assembler". Not what everyone understands by it every time, of course, but it must have sounded OK to me back then. Then again, I am a practical person and tend not to waste much time on names as long as they do not look outright ugly or misleading (well, I might go for "ugly" on purpose of course, but have not done so for vpa).

Well yes, if we accept that we have to accept that VPA (Virtual Processor Assembler) is not exactly an assembler. But I think the name is telling enough what to expect.

That is completely possible with vpa for power; nothing is stopping you from using native power opcodes. (I use rlwinm and rlwimi quite often, realizing there might be no efficient way to emulate them, but I do what I can do best. If I get stuck in a world with x86 processors only, which have just the few original 8086 registers, I'll switch occupation to herding kangaroos or something. Until then I'll change to a new architecture only if I see why it is better than the one I use now; for me portability is just a means, not a goal.)
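
(For readers not fluent in PPC: roughly what rlwinm and rlwimi compute, expressed in C - an editorial paraphrase, noting that the real instructions encode the mask as begin/end bit positions rather than a literal value:)

#include <stdint.h>

static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* rlwinm: rotate left word immediate, then AND with mask. */
static inline uint32_t rlwinm_c(uint32_t rs, unsigned sh, uint32_t mask)
{
    return rotl32(rs, sh) & mask;
}

/* rlwimi: rotate left word immediate, then insert under the mask,
   keeping the destination's bits outside the mask. */
static inline uint32_t rlwimi_c(uint32_t rd, uint32_t rs, unsigned sh, uint32_t mask)
{
    return (rotl32(rs, sh) & mask) | (rd & ~mask);
}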

Yes, but they run in fewer cycles. Apart from the PC-relative ones - there is no direct access to the PC on power, it takes 2-3 opcodes just to get to it - the rest works faster. And, say, the (An,Dn.l*4) mode can take not just powers of 2... etc.; it is pretty powerful.

Oh yes, backward porting would be quite painful. I do use all registers I have - rarely resorting to r4-r7, they are used for addressing mode calculations, intermediate operands etc., use one of them and you have something to work on when porting later. I still do it at times when I think it is justified... may well bite me one day.

Hmmm, not for any target - yes. For more than one target with the code not losing efficiency - it certainly can, if the new target is right (as was the case 68k -> power). Basically I have never been after a "universal assembler"; I just wanted to do what you already know I wanted. What we call it is of secondary interest to me, to be fair :-).

Yes, I would also be curious to see that. Not just a function - as it will likely have been written in assembly by the compiler author - but some sort of standard thing, say a base64 encoder/decoder or some vnc server thing etc. (The vnc server under dps is about 8 kilobytes, I just looked at it. It does one type of compression (RRE misused as RLE) and raw.)
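
(For reference, the sort of "standard thing" being proposed might look like this in portable C - a minimal base64 encoder, an editorial sketch unrelated to either toolchain:)

#include <stddef.h>
#include <stdint.h>

/* Minimal base64 encoder, RFC 4648 alphabet, '=' padding.
   out must hold 4 * ((len + 2) / 3) + 1 bytes. */
static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

size_t base64_encode(const uint8_t *in, size_t len, char *out)
{
    size_t o = 0;
    while (len >= 3) {
        uint32_t v = (uint32_t)in[0] << 16 | (uint32_t)in[1] << 8 | in[2];
        out[o++] = b64[(v >> 18) & 63];
        out[o++] = b64[(v >> 12) & 63];
        out[o++] = b64[(v >> 6) & 63];
        out[o++] = b64[v & 63];
        in += 3;
        len -= 3;
    }
    if (len) {   /* 1 or 2 trailing bytes */
        uint32_t v = (uint32_t)in[0] << 16 | (len == 2 ? (uint32_t)in[1] << 8 : 0);
        out[o++] = b64[(v >> 18) & 63];
        out[o++] = b64[(v >> 12) & 63];
        out[o++] = (len == 2) ? b64[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}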

Dimiter

====================================================== Dimiter Popoff, TGI

======================================================

Reply to
Dimiter_Popoff

"Only" gripe?

Every language choice makes implicit tradeoffs in abstraction management. The sorts of data types and the operations that can be performed on them are baked into the underlying assumptions of the language.

What C construct maps to the NS16032's native *bit* array instructions? Or, the test-and-set capability present in many architectures? Or, x86 BCD data types? Support for 12 or 60 bit integers? 24b floats? How is the PSW exposed? Why pointers in some languages and not others?

Why do we have to *worry* about atomic operations in the language in a different way than on the underlying hardware? Why doesn't the language explicitly acknowledge the idea of multiple tasks, foreground/background, etc.?
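
(For what it's worth, C11's <stdatomic.h> is the construct portable C offers for test-and-set - though, as noted, it abstracts it differently from the underlying hardware. A minimal sketch, not from the original post:)

#include <stdatomic.h>

/* A spinlock built on atomic_flag, C11's closest counterpart to a
   hardware test-and-set (it maps to TAS, LDREX/STREX, XCHG, etc.
   depending on the target). */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;   /* busy-wait until the flag was previously clear */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}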

Folks designing languages make the 90-10 (%) decisions and hope the 10 aren't unduly burdened by the wins afforded to the 90. Or, that the applications addressed by the 10 can tolerate the contortions they must endure as a necessary cost to gain *any* of the benefits granted to the 90.
Reply to
Don Y

When did the "embedded system" term become popular?

Of course, there were some military systems (such as SAGE) that used purpose-built computers in the 1950s.

In the 1970s the PDP-11/34 was very popular as a single-purpose computer, as was the PDP-11/23 in the 1980s. After that, the 8080/Z80/6800 became popular as the low-end processors.

Reply to
upsidedown

No idea. I was "surprised" when told that this is what I did for a living (and HAD been doing all along!).

I now tell people that I design "computers that don't LOOK like computers" (cuz everyone thinks they KNOW what a "computer" looks like!) "things that you know have a computer *in* them but don't look like the stereotype you think of..."

11's were used a lot as they were reasonably affordable and widely available (along with folks who could code for them). E.g., the Therac was 11-based.

The i4004 was the first real chance to put "smarts" into something that didn't also have a big, noisy box attached. I recall thinking the i8080 (and 85) were pure luxury coming from that more crippled world ("Oooh! Kilobytes of memory!!!").

Reply to
Don Y

To be practical, it /should/ be a function - or no more than a few functions. (I don't know why you think functions might be written in assembly by the compiler author - the compiler author is only going to provide compiler-assist functions such as division routines, floating point emulation, etc.) And it should be something that has a clear algorithm, so no one can "cheat" by using a better algorithm for the job.

Reply to
David Brown
