Portable Assembly - Page 3

Re: Portable Assembly
On 27/05/17 23:31, Dimiter_Popoff wrote:
<snipped quoted text>

Exactly - you use a programming language appropriate for the job.  For
most low-level work, that is C (or perhaps C++, if you /really/ know
what you are doing).  Some parts of your code will be target-specific C,
some parts will be portable C.  And a few tiny bits will be assembly or
"intrinsic functions" that are assembly made to look like C functions.

Most of the assembly used will actually be written by the toolchain
provider (startup code, library code, etc.) - and if you are using a
half-decent processor, this would almost certainly have been better
written in C than assembly.

C is /not/ a "portable assembly" - it means you don't /need/ a portable assembly.

<snipped quoted text>

No, it is not a "portable assembler".  It is just a translator to
generate PPC assembly from 68K assembly, because you had invested so
much time and code in 68K assembly and wanted to avoid re-writing
everything for the PPC.  That's a reasonable enough business strategy,
and an alternative to writing an emulator for the 68K on the PPC, or
some sort of re-compiler.

But it is not a "portable assembler".  If you can take code written in
your VPA and translate it into PIC, 8051, msp430, ARM, and x86 assembly,
in a way that gives near-optimal efficiency on each target, while
letting you write your VPA code knowing exactly which instructions will
be generated on the target, /then/ you would have a portable assembler.
But such a language cannot be made, for obvious reasons.

What you have is a two-target sort-of assembler that gives you
reasonable code on two different targets.  You could also say that you
have your own personal low-level programming language with compiler
backends for two different targets.  Again, that's fair enough - and if
it lets you write the code you want, great.  But it is not a portable assembler.

<snipped quoted text>

Spoken like a true fanatic (or salesman).

<snipped quoted text>

Re: Portable Assembly
On 29.5.2017 12:00, David Brown wrote:
<snipped quoted text>

I might agree with that - if we understand "portable" as "universally

<snipped quoted text>

Well, who in his right mind would try to port serious 68020 or similar
code to a PIC or MSP430 etc.?
I am talking about what is practical and has worked for me. It would be
a pain to port back from code I have written for power to
something with fewer registers but within reason it is possible and
can even be practical. Yet porting to power has been easier because
it had more registers than the original 68k and many other things,
it is just more powerful and very well thought out; whoever did it knew
what he was doing. It even has little endian load and store opcodes...
(I wonder if ARM has big endian load/store opcodes.)

Yet I agree it is not an "assembler" I suppose. I myself refer to it
at times as a compiler, then as an assembler... It can generate many
lines per statement - many opcodes, e.g. the 64/32 bit divide the 68020
has is done in a loop, no way around that (17 opcodes, just counted it).
Practically the same as what any other compiler would have to do.

<snipped quoted text>

It comes pretty close to that as long as your CPU has 32 registers,
but you need to know exactly what each line does only during debugging,
running step by step through the native code.

<snipped quoted text>

It may sound so but it is not what I intended.
VPA has made me a lot more efficient than anyone else I have been able
to compare myself with. Since I also am only human it can't be down
to me, not by *that* much. It has to be down to something else; in all
likelihood it is the toolchain I use. My "phrasebook" comment
stays I'm afraid.


Dimiter Popoff, TGI             http://www.tgi-sci.com

Re: Portable Assembly
On 29/05/17 14:08, Dimiter_Popoff wrote:
<snipped quoted text>

"Universally portable" is perhaps a bit ambitious :-)  But to be called
a "portable assembly", I would expect a good deal more than two
architectures that are relatively closely related (32-bit, reasonably
orthogonal instruction sets, big endian).  I would imagine that
translating 68K assembly into PPC assembly is mostly straightforward -
unlike translating it into x86, or even ARM.  (The extra registers on
the PPC give you the freedom you need for converting complex addressing
modes on the 68K into reasonable PPC code - while the ARM has fewer
registers available.)

<snipped quoted text>

If the code were /portable/ assembly, then it would be possible.
Standard C code will work fine on the msp430, ARM, x86, 68K and PPC -
though it is unlikely to be efficient on a PIC or 8051.

<snipped quoted text>

(I agree that the PPC is a fine ISA, and have enjoyed using it on a
couple of projects.  ARM Cortex M, the most prevalent cores for
microcontrollers, does not have big endian load or store opcodes.  But
it has byte-reverse instructions for both 16-bit and 32-bit values.  The
traditional ARM instruction set may have them - I am not as familiar
with that.)

If it were /portable/ assembly, then your code that works well for the
PPC would automatically work well for the 68K.

The three key points about assembly, compared to other languages, are
that you know /exactly/ what instructions will be generated, including
the ordering, register choices, etc., that you can access /all/ features
of the target cpu, and that you can write code that is as efficient as
possible for the target.  There is simply no way for this to be portable.
Code written for the 68k may use complex addressing modes - they need
multiple instructions in PPC assembly.  If you do this mechanically, you
will know exactly what instructions this generates - but the result will
not be as efficient as code that re-uses registers or re-orders
instructions for better pipelining.  Code written for the PPC may use
more registers than are available on the 68K - /something/ has to give.

Thus your VPA may be a fantastic low-level programming language (having
never used it or seen it, I can't be sure - but I'm sure you would not
have stuck with it if it were not good!).  But it is not a portable
assembly language - it cannot let you write assembly-style code for more
than one target.

<snipped quoted text>

That's fine - you have a low-level language and a compiler, not a
portable assembler.

Some time it might be fun to look at some example functions, compiled
for either the 68K or the PPC (or, better still, both) and compare both
the source code and the generated object code to modern C and modern C
compilers.  (Noting that the state of C compilers has changed a great
deal since you started making VPA.)

<snipped quoted text>

Fair enough.

Good comparisons are, of course, extremely difficult - and not least,
extremely expensive.  You would need to do large scale experiments with
at least dozens of programmers working on a serious project before you
could compare efficiency properly.

Re: Portable Assembly
On 29.5.2017 16:06, David Brown wrote:
<snipped quoted text>

Indeed, having more registers is of huge help. But it is not as
straightforward as it might seem at first glance. Then while at it I did a lot
more than just emulate the 68k - on power we have a lot more on offer,
I wanted to take advantage of it, like adding syntax to not touch
the CCR - as the 68k unavoidably does on moves, add and many others,
use the 3 address mode - source1, source2, destination - and have this
available not just as registers but as any addressing mode etc.
If you assemble plain CPU32 code the resulting power object code size is
about 3.5 times the native CPU32 code size. If you write with power
in mind - e.g. you discourage all the unnecessary CCR (CR in power)
updates - code sizes get pretty close. I have designed in a pass
for optimizing that automatically, 1.5 decades later still waiting
to happen... :-). No need for it which would deflect me from more
pressing issues I suppose.

<snipped quoted text>

This is semantics - but since user level 68k code assembles directly
I think it is fair enough to borrow the word "assembler". Not what
everyone understands by it, of course, but it must have
sounded OK to me back then. Then I am a practical person and tend
not to waste much time on names as long as they do not look
outright ugly or misleading (well I might go on purpose for "ugly"
of course but have not done it for vpa).

<snipped quoted text>

Well yes, if we accept that we have to accept that VPA (Virtual
Processor Assembler) is not exactly an assembler. But I think the
name is telling enough what to expect.

<snipped quoted text>

That is completely possible with vpa for power, nothing is stopping you
from using native to power opcodes (I use rlwinm and rlwimi quite often,
realizing there might be no efficient way to emulate them but I do what
I can do best, if I get stuck in a world with x86 processors only
which have just the few original 8086 registers I'll switch occupation
to herding kangaroos or something. Until then I'll change to a new
architecture only if I see why it is better than the one I use now,
for me portability is just a means, not a goal).

<snipped quoted text>

Yes but they run in fewer cycles. Apart from the PC relative - there is
no direct access to the PC on power, takes 2-3 opcodes to get to it
alone - the rest works faster. And say the An,Dn.l*4 mode can take
not just powers of 2... etc., it is pretty powerful.

<snipped quoted text>

Oh yes, backward porting would be quite painful. I do use all registers
I have - rarely resorting to r4-r7, they are used for addressing mode
calculations, intermediate operands etc., use one of them and you
have something to work on when porting later. I still do it at times
when I think it is justified... may well bite me one day.

<snipped quoted text>

Hmmm, not for any target - yes. For more than one target with the code
not losing efficiency - it certainly can, if the new target is right
(as was the case 68k -> power).
Basically I have never been after a "universal assembler", I just wanted
to do what you already know I wanted. What we call it is of secondary
interest to me to be fair :-).

<snipped quoted text>

Yes, I would also be curious to see that. Not just a function - as it
will likely have been written in assembly by the compiler author - but
some sort of standard thing, say a base64 encoder/decoder or some
vnc server thing etc. (the vnc server under dps is about 8 kilobytes,
just looked at it. Does one type of compression (RRE misused as RLE) and


Dimiter Popoff, TGI             http://www.tgi-sci.com

Re: Portable Assembly
On 29/05/17 19:02, Dimiter_Popoff wrote:
<snipped quoted text>

<snipped some interesting stuff about VPA>

<snipped quoted text>

To be practical, it /should/ be a function - or no more than a few  
functions.  (I don't know why you think functions might be written in  
assembly by the compiler author - the compiler author is only going to  
provide compiler-assist functions such as division routines, floating  
point emulation, etc.)  And it should be something that has a clear  
algorithm, so no one can "cheat" by using a better algorithm for the job.

Re: Portable Assembly
On 30.5.2017 00:13, David Brown wrote:
<snipped quoted text>

I am pretty sure I have seen - or read about - compiler generated
code where the compiler detects what you want to do and inserts
some assembly prewritten piece of code. Was something about CRC
or about tcp checksum, not sure - and it was someone who said that,
I don't know it from direct experience.

But if the compiler does this it will be obvious enough.

Anyway, a function would do - if complex and long enough to
be close to real life, i.e. a few hundred lines.

But I don't see why not compare written stuff, I just checked
again on that vnc server for dps - not 8k, closer to 11k (the 8k
I saw was a half-baked version, no keyboard tables inside it etc.;
the complete version also includes a screen mask to allow it
to ignore mouse clicks at certain areas, that sort of thing).
Add to it some menu (it is command line option driven only) -
a much more complex menu than the Windows and Android RealVNC have -
and it adds up to 25k.
Compare this to the 350k exe for windows or to the 4M for Android
(and the android does only raw...) and the picture is clear enough
I think.


Dimiter Popoff, TGI             http://www.tgi-sci.com


Re: Portable Assembly
On 30/05/17 15:53, Dimiter_Popoff wrote:
<snipped quoted text>

A compiler sees the source code you write, and generates object code
that does that job.  It may be smart about it, but it will not insert
"pre-written assembly code".  Code generation in compilers is usually
defined with some sort of templates (such as a pattern for reading data at
a register plus offset, or a pattern for doing a shift by a fixed size,
etc.).  They are not "pre-written assembly", in that many of the details
are determined at generation time, such as registers, instruction
interleaving, etc.

The nearest you get to pre-written code from the compiler is in the
compiler support libraries.  For example, if the target does not support
division instructions, or floating point, then the compiler will supply
routines as needed.  These /might/ be written in assembly - but often
they are written in C.

A compiler /will/ detect patterns in your C code and use that to
generate object code rather than doing a "direct translation".  The
types of patterns it can detect varies - it is one of the things that
differentiates between compilers.  A classic example for the PPC would be:

#include <stdint.h>

uint32_t reverseLoad(uint32_t * p) {
  uint32_t x = *p;
  return ((x & 0xff000000) >> 24)
       | ((x & 0x00ff0000) >> 8)
       | ((x & 0x0000ff00) << 8)
       | ((x & 0x000000ff) << 24);
}

I am using gcc 4.8 here, since there is a convenient online version as
part of the <https://gcc.godbolt.org/> "compiler explorer".  gcc is at
7.0 these days, and has advanced significantly since then - but that is
the version that is most convenient.

A direct translation (compiling with no optimisation) would be:

        stwu 1,-48(1)
        stw 31,44(1)
        mr 31,1
        stw 3,24(31)
        lwz 9,24(31)
        lwz 9,0(9)
        stw 9,8(31)
        lwz 9,8(31)
        srwi 10,9,24
        lwz 9,8(31)
        rlwinm 9,9,0,8,15
        srwi 9,9,8
        or 10,10,9
        lwz 9,8(31)
        rlwinm 9,9,0,16,23
        slwi 9,9,8
        or 10,10,9
        lwz 9,8(31)
        slwi 9,9,24
        or 9,10,9
        mr 3,9
        addi 11,31,48
        lwz 31,-4(11)
        mr 1,11

Gruesome, isn't it?  Compiling with -O0 puts everything on the stack
rather than holding variables in registers.  Code like that was used in
the old days - perhaps at the time when you decided you needed something
better than C.  But even then, it was mainly only for debugging - since
debugger software was not good enough to handle variables in registers.

Next up, -O1 optimisation.  This is a level where the code becomes
sensible, but not too smart - and it is not uncommon to use it in
debugging because you usually get a one-to-one correspondence between
lines in the source code and blocks of object code.  It makes it easier
to do single stepping.

        lwz 9,0(3)
        slwi 3,9,24
        srwi 10,9,24
        or 3,3,10
        rlwinm 10,9,24,16,23
        or 3,3,10
        rlwinm 9,9,8,8,15
        or 3,3,9

Those that can understand the PPC's bit field instruction "rlwinm" will
see immediately that this is a straightforward translation of the source
code, but with all data held in registers.

But if we ask for smarter optimisation, with -O2, we get:

        lwbrx 3,0,3

This is, of course, optimal.  (Even the function call overhead will be
eliminated if the compiler can do so when the function is used.)

<snipped quoted text>

If you had some examples or references, it would be easier to see what
you mean.

<snipped quoted text>

A function that is a few hundred lines of source code is /not/ real life
- it is broken code.  Surely in VPA you divide your code into functions
of manageable size, rather than single massive functions?

<snipped quoted text>

A VNC server is completely useless for such a test.  It is far too
complex, with far too much variation in implementation and features, too
many external dependencies on an OS or other software (such as for
networking), and far too big for anyone to bother with such a comparison.

You specifically need something /small/.  The algorithm needs to be
simple and clearly expressible.  Total source code lines in C should be
no more than about a 100, with no more than perhaps 3 or 4 functions.
Smaller than that would be better, as it would make it easier for us to
understand the VPA and see its benefits.

Here is a possible example:

// Type for the data - this can easily be changed
typedef float data_t;

static int max(int a, int b) {
  return (a > b) ? a : b;
}

static int min(int a, int b) {
  return (a < b) ? a : b;
}

// Calculate the convolution of two input arrays pointed to by
// pA and pB, placing the results in the output array pC.
void convolute(const data_t * pA, int lenA, const data_t * pB,
    int lenB, data_t * pC, int lenC) {

  // i is the index of the output sample, run from 0 to lenC - 1
  // For each i, we calculate the sum as j goes from -inf to +inf
  // of A(j) * B(i - j)
  // Clearly we can limit j to the range 0 to (lenA - 1)
  // We use k to hold i - j, which will run down as j runs up.
  // k will be limited to (lenB - 1) down to 0.
  // From (i - j) >= 0, we have j <= i
  // From (i - j) < lenB, we have j > (i - lenB)
  // These give us tighter bounds on the run of j

  for (int i = 0; i < lenC; i++) {
    int firstJ = max(0, 1 + i - lenB);
    int endJ = min(lenA, i + 1);
    data_t x = 0;
    for (int j = firstJ; j < endJ; j++) {
      int k = i - j;
      x += (pA[j] * pB[k]);
    }

    pC[i] = x;
  }
}

With gcc 4.8 for the PPC, that's about 55 lines of assembly.  An
interesting point is that the size and instructions are very similar
with -O1 and -O2, but the ordering is significantly different - with
-O2, the pipeline scheduling is considered.  (I don't know which
particular cpu model is used for scheduling by default in gcc.)

To be able to compare with VPA, you'd have to write this algorithm in
VPA.  Then you could compare various points.  It should be easy enough
to look at the size of the code.  For speed comparison, we'd have to
know your target processor and compile specifically for that (to get the
best scheduling, and to handle small differences in the availability of
particular instructions).  Then you would need to run the code - I don't
have any PPC boards conveniently on hand, and of course you are the only
one with VPA tools.

Comparing code clarity and readability is, of course, difficult - but
you could publish your VPA and we can maybe get an idea.  Productivity
is also hard to measure.  For a function like this, the time is spent on
the details of the algorithm and getting the right bounds on the loops -
the actual C code is easy.

You can get a gcc 5.2 cross-compiler for PPC for Windows from here
<http://tranaptic.ca/wordpress/downloads/>, or you can use the online
compiler at <https://gcc.godbolt.org/>.  The PowerPC is not nearly as
popular an architecture as ARM, and it is harder to find free
ready-built tools (though there are plenty of guides to building them
yourself, and you can get supported commercial versions of modern gcc
from Mentor/CodeSourcery).  You can also find tools directly from

Re: Portable Assembly
On 31.5.2017 12:36, David Brown wrote:
> Code generation in compilers is usually
<snipped quoted text>
> types of patterns it can detect varies - it is one of the things that
> differentiates between compilers.

We are referring to the same thing under different names - again.
At the end of the day everything the compiler generates is written
in plain assembly, it must be executable by the CPU.
Under "prewritten" I mean some sort of template which gets filled
with addresses etc. thing before committing.
Only the compiler writers themselves know to what lengths they go
to make common cases look good; my memory is vague but I
do think the guy who said that a few years ago knew what he was
talking about.

<snipped quoted text>

Above all this is a good example how limiting the high level language
is. Just look at the source and then at the final result.

You will get *exactly* the same result (minus the return) with no
optimization in vpa from the line:

  mover.l (source),r3

Logic optimization is more or less a kindergarten exercise. If you need
logic optimization you don't know what you are doing anyway so the
compiler won't be able to help much, no matter how good.

Of course if you stick by a phrase book at source level - as is the case
with *any* high level language - you will need plenty of optimization,
like your example demonstrates. I bet it will be good only in demo
cases like yours and much less useful in real life, so the only benefit
of writing this in C is the source length, 10+ times the necessary (I  
counted it and I included a return line in the count, 238 vs. 23 bytes).
While 10 times more typing may seem no serious issue to many, a 10 times
higher chance to insert an error is no laughing matter, and 10 times
more obscurity just because of that is a productivity killer.

<snipped quoted text>

I meant "function" not in the C subroutine kind of sense, I meant it
more as "functionality", i.e. some code doing some job. How it split
into pieces etc. will depend on many factors, language, programmer
style etc., not relevant to this discussion.

<snipped quoted text>

Actually I think a comparison between two pieces of code doing the same
thing is quite telling when the difference is in the orders of
magnitude, as in this case.
Writing small benchmarking toy sort of stuff is a waste of time, I am
interested in end results.

<snipped quoted text>

No, something "small" is kind of kindergarten exercise again, it can
only be good enough to fool someone into believing this or that.
It is end results which count.


Dimiter Popoff, TGI             http://www.tgi-sci.com

Re: Portable Assembly
On 01/06/17 21:43, Dimiter_Popoff wrote:
<snipped quoted text>

OK.  I think your naming and description is odd, but I am glad to see we
are getting a better understanding of what the other is saying.

<snipped quoted text>

I think of "prewritten" as referring to larger chunks of assembly code,
with much more concrete choices of values, registers, scheduling, etc.
You described the "prewritten" code as being easily recognisable - in
reality, the majority of the code from modern compilers is generated
from very small templates with great variability.  And on a processor
like the PPC, these will be intertwined with each other according to the
best scheduling for the chip.

As an example, if we have the function:

int foo0(int * p) {
  int a = *p * *p;
  return a;
}

The template for reading "*p" generates

    lwz 3, 0(3)

(Register r3 is used for the first parameter in the PPC eabi.  It is
also used for the return value from a function, which is why it may seem
"over used" in the examples here.  In bigger code, and when the compiler
can inline functions, it will be more flexible about register choices.
I don't know whether you follow the standard PPC eabi in your tools.)

Multiplication is another template:

    mullw 3, 3, 3

As is function exit, in this case just:

    blr

I find it very strange to consider these as "pre-written assembly".

And if the function is more complex, the intertwining causes more
mixups, making it less "pre-written":

int foo1(int * p, int * q) {
  int a = *p * *p;
  int b = *q * *q;
  return a + b;
}

    lwz 9,0(3)
    lwz 10,0(4)
    mullw 9,9,9
    mullw 3,10,10
    add 3,9,3

<snipped quoted text>

Well, it is known to the compiler writers and to users who look at the
generated code!  Certainly there is plenty of variation between tools,
with more advanced compilers working harder at this sort of thing.
Command line switches with choices of optimisation levels can also make
a big difference.

How much experience do you have of using C compilers, and studying their output?

<snipped quoted text>

<skipping the details>

<snipped quoted text>

No, that is a good example of how smart the compiler is (or can be)
about generating optimal code from the source.

You may in addition view this as a limitation of the C language, which
has no direct way to specify a "byte-reversed pointer".  That is fair
enough.  However, it is not really any harder than defining a function
like this, and then using it.  For situations where the compiler can't
generate ideal code, and it is particularly useful to get such optimal
assembly, it is also possible to write a simple little inline assembly
function - it is not really any harder than writing the same thing in
"normal" assembly.

Another option (for newer gcc) is to define the endianness of a struct.
Then you can access the fields directly, and the loads and stores will
be reversed as needed.

typedef struct __attribute__((scalar_storage_order ("little-endian"))) {
  uint32_t x;
} le32_t;

uint32_t reverseLoad2(le32_t * p) {
  return p->x;
}

        lwbrx 3,0,3

So the high level language gives you a number of options, with specific
tools giving more options, and the implementation gives you efficient
object code in the end.  You might need to define a function or macro
yourself, but that is a one-time job.

<snipped quoted text>

When you say "no optimisation" here, does that mean that VPA supports
some kinds of optimisations?

<snipped quoted text>

What do you mean by "logic optimisation" ?  It is normal for a good
compiler to do a variety of strength reduction and other re-arrangements
of code to give you something with the same result, but more efficient
execution.  And it is a /good/ thing that the compiler does that - it
means you can write your source code in the clearest and most
maintainable fashion, and let the compiler generate better code.

For example, if you have a simple division by a constant:

uint32_t divX(uint32_t a) {
  return a / 5;
}

The direct translation of this would be:

    li 4,5
    divwu 3,3,4

But a compiler can do better:

divX:            // divide by 5
        lis 9,0xcccc
        ori 9,9,52429
        mulhwu 3,3,9
        srwi 3,3,2

Such optimisation is certainly not a "kindergarten exercise", and doing
it by hand is hardly a maintainable or flexible solution.  Changing the
denominator to 7 means significant changes:

divX:            // divide by 7
        lis 9,0x2492
        ori 9,9,18725
        mulhwu 9,3,9
        subf 3,9,3
        srwi 3,3,1
        add 3,9,3
        srwi 3,3,2

<snipped quoted text>

I still don't know what you mean with "phrase book" here.

<snipped quoted text>

Nonsense.  The benefits of using a higher level language and a compiler
get more noticeable with larger code, as the compiler has no problem
tracking register usage, instruction scheduling, etc., across large
pieces of code - unlike a human.  And it has no problem re-creating code
in different ways when small details change in the source (such as the
divide by 5 and divide by 7 examples).

<snipped quoted text>

You have this completely backwards.  If I write a simple example like
this, in a manner that is compilable code, then it is going to take
longer in high-level source code.  But that is the effect of giving that
function definition.  In use, writing "reverseLoad" does not take
significantly more characters than "mover" - and with everything else
around, the C code will be much shorter.  And this was a case picked
specifically to show how some long patterns in C code can be handled by
a compiler to generate optimal short assembly sequences.

The division example shows the opposite - in C, I write "a / 7", while
in assembly you have to write 7 lines (excluding labels and blr).  And
the C code there is nicer in every way.

<snipped quoted text>

In real code, the C source will be 10 times shorter than the assembly.
And if the assembly has enough comments to make it clear, there is
another order of magnitude difference.

<snipped quoted text>


But again, it has to be a specific clearly defined and limited
functionality.  "Write a VNC server" is not a specification - that would
take at least many dozens of pages of specifications, not including the
details of the interfacing to the network stack, the types of library
functions available, the API available to client programs that will
"draw" on the server, etc.

<snipped quoted text>

No, it is not.  The code is not comparable in any way, and does not do
the same thing except in a very superficial sense.  It's like comparing
a small car with a train - both can transport you around, but they are
very different things, each with their advantages and disadvantages.

If you want to compare your VNC server for DPS written in VPA to a VNC
server written in C, then you would need to give /exact/ specifications
of all the features of your VNC server, and exact details of how it
interfaces with everything else in the DPS system, and have someone
write a VNC server in C for DPS that follows those same specifications.
That would be no small feat - indeed, it would be totally impossible
unless you wanted to do it yourself.

The nearest existing comparison I can think of would be the eCos VNC
server, written in C.  I can't say how it compares in features with your
server, but it has approximately 2100 lines of code, written in a wide
style.  Since I have no idea about how interfacing with DPS compares
with interfacing with eCos (I don't know either system), I have no idea
if that is a useful comparison or not.

<snipped quoted text>

Then we will all remain in ignorance about whether VPA is useful or not,
in comparison to developing in C.

Re: Portable Assembly
<snipped quoted text>


Yet, that's what a "Universal Assembler" would be able to do.

<snipped quoted text>

And it is not anything close to a "Universal Assembler".


Re: Portable Assembly
On 5/27/2017 2:17 PM, Les Cargill wrote:
<snipped quoted text>

Arguably, ANY HLL.

<snipped quoted text>

Depends on how you handle your abstractions in the design.
If you tie the design directly to the hardware, then you've
implicitly made it dependent on that hardware -- without
even being aware of the dependencies.

OTOH, you can opt to create abstractions that give you a "slip sheet"
above the bare iron -- at some (small, if done well) cost in efficiency.
(e.g., "Hardware Abstraction Layer" -- though not necessarily as
explicit or limiting therein)

E.g., my current RTOS moves reasonably well between different hardware
platforms (I'm running on ARM and x86, currently) with the same sorts of
services exported to the higher level API's.

OTOH, the API's explicitly include provisions that allow the "application"
layers to tailor themselves to key bits of the hardware made largely
opaque by the API (e.g., MMU page sizes, number and granularity of hardware
timers, etc.)

But, this changes the level of proficiency required of folks working
with those API's.  Arguably, I guess it should (?)

Of course, if you want to shed all "hardware dependencies" and just code
to a POSIX API...  <shrug>

One could make an abstraction that is sufficiently *crude* (the equivalent of
single-transistor logic) and force the coder to use that as an implementation
language; then, recognize patterns of "operations" and map those to templates
that correlate with opcodes of a particular CPU (i.e., many operations -> one
opcode).  Or, the HLL approach of mapping *an* operation into a sequence of
CPU-specific opcodes.  Or, many<->many, in between.

Re: Portable Assembly
On Sat, 27 May 2017 16:17:57 -0500, Les Cargill wrote:

Quoted text here. Click to load it

I have used some very good portable C code across three or four different  
architectures (depending on whether you view a 188 and a 286 as different  
architectures).  This all in one company over the span of 9 years or so.

So -- perhaps your scope is limited?


Re: Portable Assembly
On 27.5.2017 22:39, rickman wrote:
Quoted text here. Click to load it

The only thing of that kind I know of is vpa (virtual processor
assembler), which I created some 17 years ago.
It takes 68k source and assembles it into power architecture code.
It is a pretty huge thing; all of dps (the OS I had originally written
for the 68k (CPU32)), the toolchains, application code for our products
etc. (millions of lines) go through it - and it can do a lot more
than just assemble statements. It does everything I ever wanted
it to do - and when it could not, I extended it so it could.

It would be a lot simpler for a smaller processor and less
demanding code, of course - as typical MCU firmware would be.
Basically, apart from some exceptions, any source working on one
processor can be assembled into code for another; and the
exceptions are not bulky, though they are critical - e.g. some handlers
within the supervisor/hypervisor code, task switching, etc. In general,
dealing with exceptions is highly processor dependent - though in large
part the code which does the handling is still processor independent,
one has to go through it manually.


Dimiter Popoff, TGI             http://www.tgi-sci.com

Re: Portable Assembly
On Saturday, May 27, 2017 at 2:39:41 PM UTC-5, rickman wrote:
Quoted text here. Click to load it

]>ported to a number of different embedded processors including custom
processors in FPGAs

It's possible to do direct threaded code in C.  For small projects, the
number of threaded code routines is small and highly application specific.
So all the threaded code segments are very portable and the debugging is
in the threaded code routines (e.g. one can perfect the application in C
on a PC and then migrate to any number of custom ISAs).

That said, am currently creating a system of symbolic constants for all
the op-codes and operand values (using VHDL and for each specific ISA).
One can create symbolic constants for various locations in the code (and
manually update the constants as code gets inserted or deleted).
Per-opcode functions can be defined that make code generation less
troublesome.  The code (either constant expressions or function calls)
is laid out as initialization for the instruction memory.  Simulation
can be used to debug the code and the ISA.  A quick two step process:
edit the code and run the simulator.

One can also write a C or any other language program that generates the
binary code file which is then inserted into FPGA RAM during the FPGA
compile step.  Typically one writes a separate function for each op-code
or label generator (and for each ISA).  Two passes through all the
function calls (e.g. the application program): first pass to generate
the labels and the second pass to generate the binary file.  For use
with FPGA simulation this is a three step process: edit the application
program, run the binary file generator and run the FPGA simulator.
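The two-pass scheme described above can be sketched in C: the
"application program" is a function made of per-opcode calls, run once
to learn label addresses and once to emit bytes.  The opcodes and names
here are invented for an imaginary ISA, not taken from any real one:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAX_CODE = 256, MAX_LABELS = 16 };

static uint8_t code[MAX_CODE];   /* emitted binary image */
static int pc;                   /* current output address */
static int pass;                 /* 1 = collect labels, 2 = emit */
static struct { const char *name; int addr; } labels[MAX_LABELS];
static int nlabels;

static void emit(uint8_t b) { if (pass == 2) code[pc] = b; pc++; }

/* Record a label's address on pass 1 only. */
static void label(const char *name) {
    if (pass == 1) { labels[nlabels].name = name; labels[nlabels].addr = pc; nlabels++; }
}

static int addr_of(const char *name) {
    for (int i = 0; i < nlabels; i++)
        if (strcmp(labels[i].name, name) == 0) return labels[i].addr;
    return 0;   /* forward reference: unknown on pass 1, resolved on pass 2 */
}

/* One function per (imaginary) opcode. */
static void op_nop(void)            { emit(0x00); }
static void op_jmp(const char *dst) { emit(0x40); emit((uint8_t)addr_of(dst)); }

/* The "application program": called once per pass. */
static void app(void) {
    op_nop();
    label("loop");
    op_nop();
    op_jmp("loop");
}

static int assemble(void) {
    nlabels = 0;
    pass = 1; pc = 0; app();   /* pass 1: learn label addresses */
    pass = 2; pc = 0; app();   /* pass 2: emit the binary */
    return pc;                 /* total code size */
}
```

The resulting code[] array is what would be written out as the memory
initialization file for the FPGA tools.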

The preferred solution is to support label generators in the memory
initialization sections of the VHDL or Verilog code.
Would be very interested if someone has managed to do label generators?

Re: Portable Assembly
On 2017-05-27 3:39 PM, rickman wrote:
Quoted text here. Click to load it

I have done a few portable assemblers of the general type you're
describing. There are two approaches. One is to write macros for the
instruction set of the target processor and effectively assemble
processor A into processor B with macros. This might work for
architecturally close processors, but even then it has significant
problems. To give an example, 6805 to 6502: the carry following the
subtract of 0 - 0 is different.
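The carry mismatch can be modelled in C.  The semantics assumed here are
the usual descriptions of the two parts: the 6805's SUB sets carry when
a borrow occurs, while the 6502's SBC treats carry as an *inverted*
borrow (set means no borrow, and it is subtracted as 1 - C).  A macro
that maps one SUB blindly onto the other therefore gets the flag
backwards for 0 - 0:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t a; int c; } cpu6805;
typedef struct { uint8_t a; int c; } cpu6502;

/* 6805-style SUB: carry = borrow occurred. */
static void sub6805(cpu6805 *f, uint8_t m) {
    f->c = (f->a < m);
    f->a = (uint8_t)(f->a - m);
}

/* 6502-style SBC: carry = NOT borrow, and (1 - C) is subtracted too. */
static void sbc6502(cpu6502 *f, uint8_t m) {
    int r = f->a - m - (1 - f->c);
    f->c = (r >= 0);
    f->a = (uint8_t)r;
}
```

For 0 - 0 both accumulators end up 0, but the 6805 leaves carry clear
while the 6502 leaves carry set - exactly the kind of difference a
macro-level translation has to patch around after every subtract.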

There is one approach that I have used that does work reasonably well:
assemble processor A into a functionally rich intermediate code and
compile the intermediate code into processor B. The resulting code is
quite portable between processors and is capable of supporting
diverse architectures quite well.
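The intermediate-code approach can be sketched as one IR lowered by
per-target backends.  The IR, the mnemonics and both "targets" below are
invented purely for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A functionally rich (well, three-operation) intermediate form. */
typedef enum { IR_LOAD_IMM, IR_ADD_IMM, IR_STORE } ir_op;
typedef struct { ir_op op; int arg; } ir_insn;

/* Backend for an imaginary accumulator machine. */
static void lower_acc(const ir_insn *p, int n, char *out) {
    char line[32];
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        switch (p[i].op) {
        case IR_LOAD_IMM: sprintf(line, "LDA #%d\n", p[i].arg); break;
        case IR_ADD_IMM:  sprintf(line, "ADD #%d\n", p[i].arg); break;
        case IR_STORE:    sprintf(line, "STA $%02X\n", p[i].arg); break;
        }
        strcat(out, line);
    }
}

/* Backend for an imaginary register machine (r0 stands in for the
   accumulator). */
static void lower_reg(const ir_insn *p, int n, char *out) {
    char line[32];
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        switch (p[i].op) {
        case IR_LOAD_IMM: sprintf(line, "li r0,%d\n", p[i].arg); break;
        case IR_ADD_IMM:  sprintf(line, "addi r0,r0,%d\n", p[i].arg); break;
        case IR_STORE:    sprintf(line, "stw r0,%d\n", p[i].arg); break;
        }
        strcat(out, line);
    }
}
```

The same IR program lowers to either target; only the backend knows the
target's instruction set, which is what keeps the translation penalty
low for "pure" code.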

I have done mostly 8-bit processors this way: 6808 (3 major families) to
PIC (many varieties: 12, 14, 14x, 16 families). In all cases I set up the
translation so I could go either way. I have also targeted some 16-, 24-,
and 32-bit processors. For pure code this has worked quite well, with a
low penalty for the translation.

Application code usually has processor specific I/O which can actually
be detected by the translator but generally needs to have some hand


Re: Portable Assembly
Quoted text here. Click to load it

LLVM has a pretty generic intermediate assembler language, though I'm not  
sure if it's meant for actually writing code in.


Another portable assembly language is Java Bytecode, though it assumes a  
32-bit machine.

(Remove the obvious prefix to reply privately.)
Made with Opera's e-mail program: http://www.opera.com/mail/

Re: Portable Assembly
On 02/06/2017 16:03, Boudewijn Dijkstra wrote:
Quoted text here. Click to load it

Interesting, but it's not obvious who the audience is. Why would anyone  
want to learn another language that is not in common use or aligned to  
any specific CPU?

Quoted text here. Click to load it

I've been watching this thread for some time. My first impression was  
why not just write in C? So far that impression hasn't changed. Despite  
the odd line of CPU specific assembler code for those occasions that  
require it, C is still perhaps the most portable code you can write?

Mike Perkins
Video Solutions Ltd
Re: Portable Assembly
On 6/5/2017 7:39 AM, Mike Perkins wrote:
Quoted text here. Click to load it

Esperanto?  :>

Quoted text here. Click to load it

The greater the level of abstraction in a language choice, the less
control you have over expressing the minutiae of what you want done.

When I design a new processor (typ. application specific), I code up
sample algorithms using a very low level set of abstractions... virtual
registers, virtual operators, etc.

Once I'm done with a number of these, I "eyeball" the "code" and sort out
what the instructions (opcodes) should be for the processor.  I.e., invent
the "assembly language".

If I'd coded these algorithms in a HIGHER level language, I'd end up
implementing a much more "complex" processor (because it would have
to implement much more capable "primitives").

C's portability problem isn't with the language, per se, as much as it is
with the "practitioners".  It could benefit from much stricter type
checking and a lot fewer "undefined/implementation-defined behaviors"
(cuz it seems folks just get the code working on THEIR target and
never see how it fails to execute properly on any OTHER target!)

Re: Portable Assembly

Quoted text here. Click to load it

The argument always has been that if implementation defined behaviors
are locked down, then C would be inefficient on CPUs that don't have
good support for <whatever>.

Look at the (historical) troubles resulting from Java (initially)
requiring IEEE-754 compliance and that FP results be exactly
reproducible *both* on the same platform *and* across platforms.

No FP hardware fully implements any version of IEEE-754: every chip
requires software fixups to achieve compliance, and most fixup suites
are not even complete [e.g., ignoring unpopular rounding modes, etc.].
Java FP code ran slower on chips that needed more fixups, and the
requirements prevented even implementing a compliant Java on some
chips despite their having FP support.

Java ultimately had to entirely back away from its reproducibility
guarantees.  It now requires only best consistency - not exact
reproducibility - on the same platform.  If you want reproducible
results, you have to use software floating point (BigFloat), and
accept much slower code.  And by requiring consistency, it can only
approximate the performance of C code which is likewise compiled.  Most
C compilers allow you to eschew FP consistency for more speed ... Java
does not.
Of course, FP in general is somewhat less important to this crowd than
to other groups, and C has a lot of implementation defined behavior
unrelated to FP.  But the lesson of trying to lock down hardware
(and/or OS) dependent behavior still is important.

There is no question that C could do much better type/value and
pointer/index checking, but it likely would come at the cost of far
more explicit casting (more verbose code), and likely many more
runtime checks.

A more expressive type system would help [e.g., range integers, etc.],
but that would constitute a significant change to the language.  

Some people point to Ada as an example of a language that can be both
"fast" and "safe", but many people (maybe not in this group, but many
nonetheless) are unaware that quite a lot of Ada's type/value checks
are done at runtime and throw exceptions if they fail.

Obviously, a compiler could provide a way to disable the automated
runtime checking, and even when enabled checks can be elided if the
compiler can statically prove that a given operation will always be
safe.  But even in Ada with its far more expressive types there are
many situations in which the compiler simply can't do that.

More stringent languages like ML won't even compile if they can't
statically type check the code.  In such languages, quite a lot of
programmer effort goes toward clubbing the type checker into submission.


Re: Portable Assembly
On 06/06/17 06:24, George Neuner wrote:
Quoted text here. Click to load it

I don't think C would benefit from a /lot/ fewer undefined or
implementation-dependent behaviours.  Some could happily be removed
(IMHO), but most are fine.  However, I would like to see
/implementations/ working harder towards spotting this sort of thing in
user code - working to fix the /real/ problem of bad programmers, rather
than changing the language.

For example, some people seem to think that ints have two's complement
wrap-around behaviour on overflow in C, just because that is how the
underlying cpu handles it.  Some languages (like Java) avoid undefined
behaviour by giving this a definition - they say exactly how signed
overflow should be handled.  In my opinion, this is missing the point -
if your code has signed integers that overflow, you've got a bug.  There
is /no/ right answer - picking one and saying "we define the behaviour
/this/ way" does not make it right.  So allowing the compiler to assume
that it will never happen, and to optimise accordingly, is a good idea.
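Because the compiler may assume signed overflow never happens, a
portable program has to test *before* the operation overflows rather
than after.  A common idiom (the function name is mine):

```c
#include <assert.h>
#include <limits.h>

/* Returns nonzero if a + b would overflow int.  Naive after-the-fact
   tests like "a + b < a" are themselves undefined behaviour, and the
   optimiser is entitled to delete them; this check never overflows. */
static int add_would_overflow(int a, int b) {
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}
```

The point is exactly the one above: there is no "right answer" for the
overflowed sum, so the only defensible code never computes it.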

But compilers should do their best to spot such cases, and hit out hard
when they see it.  When the compiler sees "for (int i = 0; i >= 0;
i++)", it should throw a tantrum - it should not merely break off
compilation with an error message, it should send an email to the
programmer's boss.  (I'll settle for a warning message that is enabled
by default.)

Compilers /are/ getting better at warning on undefined behaviour, but
they could always be better.

Quoted text here. Click to load it

Yes - and it is still a good argument.

It might be a nice idea to do a little bit of clean-up of some of the
options.  I don't think it would do much harm if future C standards
enforced two's complement signed integers without padding, for example -
one's complement and signed-magnitude machines are extremely rare.

Quoted text here. Click to load it

That is not necessarily the case - but to get much stronger type
checking in C, you would need to include so many features to the
language that you might as well use C++.  For example, it is quite
possible in C to define types "speed", "distance" and "time" so that you
can't simply add a "distance" and a "time", while still being able to
generate optimal code.  But you can't use normal operators in
expressions with the types - you can't write "v = d / t;", but need to
write "v = divDistanceTime(d, t);".
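The single-member-struct trick referred to above looks like this (the
type and helper names are illustrative; "time_s" avoids clashing with
the standard time() function):

```c
#include <assert.h>

/* Distinct struct types are incompatible in C, so mixing them is a
   compile-time error - at zero runtime cost, since a single-member
   struct has the same representation as its member. */
typedef struct { int m; }  distance;  /* metres */
typedef struct { int s; }  time_s;    /* seconds */
typedef struct { int ms; } speed;     /* metres per second */

/* The price: arithmetic needs named helpers instead of operators. */
static speed divDistanceTime(distance d, time_s t) {
    speed v = { d.m / t.s };
    return v;
}
```

With these definitions, "d.m + t.s" compiles (the raw members are plain
ints) but "d + t" does not, which is the type safety being described -
while "v = d / t;" must indeed be spelled divDistanceTime(d, t).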

Quoted text here. Click to load it

Yes, and yes.

Ranged integer types would be nice, and would give not just safer code,
but more efficient code.
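Lacking language support, the closest C sketch of a ranged integer is a
wrapper type whose constructor enforces the invariant - but note the
check lands at runtime, where a true language-level range type could
often be proven away statically (the names here are mine):

```c
#include <assert.h>

/* A "ranged integer" substitute: invariant 0 <= v <= 100. */
typedef struct { int v; } percent;

static percent make_percent(int v) {
    assert(v >= 0 && v <= 100);   /* runtime check, not a static proof */
    percent p = { v };
    return p;
}
```

This is roughly what Ada generates implicitly for a range type; the
efficiency argument above is that a compiler which *knows* the range can
both elide such checks and pick narrower representations.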

There are many things that would be nice to add to the language (and to
C++), some of which are common as extensions in compilers but which
could usefully be standardised.  An example is gcc's
"__builtin_constant_p" feature.  This can be used to let the compiler do
compile-time checking where possible, but skip run-time checks for code

extern void __attribute__((error("Assume failed"))) assumeFailed(void);

// The compiler can assume that "x" is true, and optimise or warn
// accordingly
// If the compiler can see that the assume will fail, it gives an error
#define assume(x) \
    do { \
        if (__builtin_constant_p(x)) { \
            if (!(x)) { \
                assumeFailed(); \
            } \
        } \
        if (!(x)) __builtin_unreachable(); \
    } while (0)

If such features were standardised, they could be used in all code - not
just gcc-specific code.
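For illustration, a typical use of an assume() macro like the one above
(here a plain assert stands in for the gcc builtins so the sketch is
self-contained; the function is invented):

```c
#include <assert.h>

/* Portable stand-in for the gcc-specific assume() defined earlier. */
#define assume(x) assert(x)

/* Telling the compiler the size is a power of two lets it turn the
   modulo into a mask; with the real gcc version, a provably false
   assume becomes a compile-time error instead of a runtime trap. */
static unsigned wrap_index(unsigned i, unsigned size) {
    assume(size == 16);
    return i % size;   /* foldable to i & 15 when the assume holds */
}
```

The value of standardising such a feature is precisely that callers
could write wrap_index() once and get the compile-time diagnosis on any
conforming compiler, not just gcc.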

Quoted text here. Click to load it

They also involve a good deal more verbose code.

Quoted text here. Click to load it
