Apologies where they are due

Gareth's Downstairs Computer 8 years ago

I must apologise if I periodically raise the same subjects as before, such as the graphics processor and instruction encodings, but I have a number of very deep-seated interests other than low-level computing, particularly amateur radio, horology, model railways and bellringing. I tend to be single-mindedly and utterly fascinated by only one of them at any time, with the result that it can be many weeks or months before I return to any one interest.

However, it remains that my computing interest, resulting from my first real job 45 years ago, is to produce an interactive programming language (the interactivity of BASIC or FORTH) but that runs at the speed of compiled code and yet is fully reconstructable as the original source.

For example, operator precedence might make a pair of parentheses unnecessary, but you'd be justifiably miffed if they disappeared from the source code, as source code has human understandability considerations. So it becomes necessary to insert a special form of NOP in the compiled code to represent the parentheses; only one NOP for each pair.

So, the reason for wanting the instruction encoding of the A53 (now discovered in excelsis) is to seek ambiguous forms of the same instruction to let me effectively insert such NOPs but without unnecessarily slowing down a program.

This interest might well be obsolete nowadays, now that there is no shortage of either RAM or disk files to store both source and object, but old habits die hard!

I also have no doubt that these ideas of mine will have been well resolved by countless others over the decades and that I will be re-inventing the wheel but calling it fire.

I previously worked on this project prior to retirement 32 years ago with the CRAP and SOAS languages on the x86 processors but found that intensive programming in the evenings after intensive programming during the working hours was just too tiring.

CRAP, Create Rapidly Application Programs, was a tokenised interpreter, just to get the ideas down (just 6 weeks of evenings and weekends to get an interpreter that could list itself) and SOAS, S*it Off A Shovel, was the faster version where the source code tokens were the compiled code itself.

Sorry, apologies again, waffling on!

Martin Gregorie 8 years ago

You probably already know this, but don't forget that many BASICs were in reality compiled because RUN compiled into some more compact form and then executed that. Many UNIX scripting languages are the same (AWK, Perl, etc): they generate some form of P-code, which is then interpreted by the runtime system. This seems to be a nice compromise, especially if the code generated from each statement is the minimum needed to pass variables to the previously compiled runtime system - the result could be nearly as fast as optimised compiled code.

BTW, have you looked at JOSS or its derivatives, JEAN and FOCAL? JOSS was one of the earlier interactive languages. JEAN ran on 1900 mainframes and FOCAL was a DEC language.

There may be some nice ideas there, especially that, like BASIC, JEAN treated line numbers as labels but, unlike BASIC, the line numbers were real numbers. This made the editor simple since lines were always in numeric order and, if you needed to add a line between 1.2 and 1.3, you just gave it a line number of 1.25. Another nice trick was that all the lines with the same integer part (known in JEAN as a 'part') could be treated as a subroutine.

The main drawback was that conditions were suffixes to the statement they controlled, so didn't have ELSE branches:

1.1 INPUT R as "Input a radius"
1.15 FINISH IF R = 0.0
1.2 DO PART 2 IF R > 0.0
1.25 GOTO 1.1
1.3 PRINT "Radii must be positive values"
1.4 GOTO 1.1
2.0 A = Pi * R * R
2.1 PRINT A "Area is %.%%"

.... or something like that. It's been 50 years since I last used JEAN.
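The real-number line scheme described above could be sketched roughly as follows in Python. This is only an illustration and not how JEAN was implemented: the `Program` class and its method names are made up for the example, and `Decimal` is assumed as the key type so that 1.25 sorts exactly between 1.2 and 1.3.

```python
from decimal import Decimal


class Program:
    """Hypothetical store of numbered lines, always listed in numeric order."""

    def __init__(self):
        self.lines = {}  # Decimal line number -> statement text

    def enter(self, number, text):
        # Entering a line whose number already exists replaces it.
        self.lines[Decimal(number)] = text

    def listing(self):
        # Lines list in numeric order, whatever order they were typed in.
        return [(str(n), self.lines[n]) for n in sorted(self.lines)]

    def part(self, k):
        # A 'part' is every line sharing the same integer part.
        return [(str(n), self.lines[n]) for n in sorted(self.lines) if int(n) == k]


prog = Program()
prog.enter("1.2", "DO PART 2 IF R > 0.0")
prog.enter("1.3", 'PRINT "Radii must be positive values"')
prog.enter("2.0", "A = Pi * R * R")
prog.enter("1.25", "GOTO 1.1")  # slots in between 1.2 and 1.3
```

Here `prog.part(1)` returns lines 1.2, 1.25 and 1.3 in that order, which is the group a `DO PART 1` would treat as a subroutine.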

Martin | martin at Gregorie | gregorie dot org

Gareth's Downstairs Computer 8 years ago

Interesting response, thank you.

The essence behind my thinking is that there has to be a unique tokenised form of source code, which is usually ASCII or Unicode, and why should that unique form not also be the compiled code?

The sort of ambiguities that I seek are the two different ways of adding R3 to R2 ...

ADD R2,R2,R3

or

ADD R2,R3,R2
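As a rough sketch of how such an ambiguity could carry one hidden bit (for instance "this expression was parenthesised in the source"), the following Python models an instruction as a plain tuple. It does not use real A64 encodings, and the convention that ascending source registers mean 0 and descending mean 1 is purely an assumption made for the example.

```python
def emit_add(rd, ra, rb, hidden_bit):
    """Return ("ADD", rd, rn, rm) with the source order chosen to carry hidden_bit.

    Assumed convention: ascending source registers mean 0, descending mean 1.
    """
    if ra == rb:
        raise ValueError("no spare ordering when both sources are the same register")
    lo, hi = sorted((ra, rb))
    rn, rm = (lo, hi) if hidden_bit == 0 else (hi, lo)
    return ("ADD", rd, rn, rm)


def hidden_bit_of(instr):
    """Recover the hidden bit from the operand order of a commutative ADD."""
    _, _, rn, rm = instr
    return 0 if rn < rm else 1


plain = emit_add(2, 2, 3, 0)   # models ADD R2,R2,R3
marked = emit_add(2, 2, 3, 1)  # models ADD R2,R3,R2
```

Both tuples compute the same sum, so the marker costs no extra instruction; the limitation is that nothing can be hidden when both sources are the same register.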

OK, can't compile completely as there has to be some symbol table info somewhere so that a variable defined as, eg, CurrentArrayIndex does not list back as V32 :-)

Martin Gregorie 8 years ago

You might also want to think about the editor part of your runtime. I'm saying this because a major difference in usability between interactive languages (and even between various BASICs) is the ease of use or plain horridness of some of the editors they've used to view and change programs.

I had a quick search for a runnable version of JEAN but, unless there is one bundled in with the George 3 emulator, I don't think there are any runnable versions of it or JOSS available. I know I used JEAN under Minimop way back when and *thought* I had also used it during George 3 interactive MOP sessions but maybe I was mistaken.

Martin | martin at Gregorie | gregorie dot org

Gareth's Downstairs Computer 8 years ago

Writing an editor is something to come after the basic text in/out has been resolved and bootstrapped into its own language.

It's also something with pitfalls, for an edit made directly into the compiled code may take several steps in different parts of the source, and the intermediate states could be incompatible with each other and hence cause a compilation error!

Dennis Lee Bieber 8 years ago

On Sun, 22 Apr 2018 14:16:06 +0100, Gareth's Downstairs Computer declaimed the following:

Unless there's a significant timing difference I'd probably pick the first -- as mentally I still think of 2-argument operations and the first looks closer to "add r3 to r2".

But unless you are adding a lot of register optimization to variable allocations, a high-level statement like

varA = varA + varB

is going to turn into (pseudo-code, I'm not going to dig up an ARM instruction table)

LD r2,varA
LD r3,varB
ADD r2,r2,r3
STO r2,varA

whilst

varA = varB + varA

might become

LD r2,varB
LD r3,varA
ADD r3,r2,r3
STO r3,varA

just due to the order of assigning free registers to the variables of the expression first, and only later handling the saving of the result.

OR, using an extra register which you dedicate as an accumulator

LD r2,varX
LD r3,varY
ADD r1,r2,r3
STO r1,varX

(for chained calculations you'd then repeat the "accumulator" as the first source)

Hence why early BASIC only supported variable names of the form: [A..Z][|0..9], along with data type suffixes (originally -- no suffix->numeric, $->string; later adding codes for integer, float, double).

Only 26*11 => 286 variable names maximum, *4 (type codes) => 1144 possible combinations; easily regenerated from a simple numeric index. It only takes 11 bits to encode the entire variable space, leaving 5 bits to encode other indicators (use one bit to indicate byte-code with the rest indexing the keyword/operator table)... Oh -- a bit to indicate Array, but as I recall, Arrays still used the core variable name space.
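The arithmetic above can be checked with a small Python sketch that maps each name to an index and back. The particular suffix characters (`$`, `%`, `#`) and the ordering of the fields are assumptions made for illustration; the post only says there were four type codes.

```python
import string

SUFFIXES = ["", "$", "%", "#"]  # assumed type codes; the post does not list them


def encode(name):
    """Map a name such as 'A', 'B7' or 'Z9$' to an index in 0..1143."""
    suffix = name[-1] if name[-1] in "$%#" else ""
    core = name[: len(name) - len(suffix)]
    letter = string.ascii_uppercase.index(core[0])
    digit_slot = 0 if len(core) == 1 else int(core[1]) + 1  # 0 means "no digit"
    return (letter * 11 + digit_slot) * 4 + SUFFIXES.index(suffix)


def decode(index):
    """Regenerate the variable name from its index."""
    rest, type_code = divmod(index, 4)
    letter, digit_slot = divmod(rest, 11)
    digit = "" if digit_slot == 0 else str(digit_slot - 1)
    return string.ascii_uppercase[letter] + digit + SUFFIXES[type_code]
```

The largest index is `encode("Z9#")`, which is 1143 and so fits in 11 bits (0..2047), and `decode(encode(name))` gives back the original spelling, which is what lets LIST regenerate names without a symbol table.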

Use one bit to indicate "line number" and you have 32K lines to identify -- that's a pretty large program in those days. But the only way you're going to encode line numbers into the object file is going to require either a tag-on symbol table or some wasteful

BR PC+2
DATA

(branch program counter relative to jump over the line number itself)

Many years ago there was a book "Implementing BASICs" available (along with one for "Threaded Interpreted Languages") -- but Amazon searches find nothing for either subject.

Wulfraed Dennis Lee Bieber AF6VN wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/

Ahem A Rivet's Shot 8 years ago

Another common approach with BASIC was semi-compiled where statements would be compiled on the fly as they were encountered and a cache kept of compiled statements - the first target when memory runs short of course. The earliest of these caused some consternation as they performed unreasonably well on benchmarks which were mostly short loops and were thus fully compiled after the first iteration.
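A minimal sketch of that compile-on-first-use idea, using Python's built-in `compile` and `exec` as a stand-in for a BASIC statement compiler. The `CachingRunner` class and its names are hypothetical, and a real system would need some eviction policy rather than the all-or-nothing `drop_cache` shown here.

```python
class CachingRunner:
    """Toy runner: each source line is compiled once, on first execution."""

    def __init__(self):
        self.cache = {}        # source text -> compiled code object
        self.compilations = 0  # how many times the compiler was invoked

    def run_line(self, source, variables):
        code = self.cache.get(source)
        if code is None:
            # First encounter: compile the statement and remember the result.
            code = compile(source, "<line>", "exec")
            self.cache[source] = code
            self.compilations += 1
        exec(code, {}, variables)

    def drop_cache(self):
        # What a real system might do when memory runs short.
        self.cache.clear()


runner = CachingRunner()
state = {"total": 0, "i": 0}
for _ in range(1000):  # a short benchmark-style loop
    runner.run_line("i = i + 1", state)
    runner.run_line("total = total + i", state)
```

After the loop `runner.compilations` is 2 even though 2000 statements were executed, which is why short-loop benchmarks flattered this kind of implementation.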

Steve O'Hara-Smith | Directable Mirror Arrays C:\>WIN | A better way to focus the sun The computer obeys and wins. | licences available see You lose and Bill collects. | http://www.sohara.org/

Martin Gregorie 8 years ago

That's exactly why it may be a good idea to bear the edit requirements in mind when designing the language.

For instance, a design using line numbers and single-line statements should be fairly bulletproof even for incremental compilation, especially if line numbers automatically control statement sequence (unlike some BASICs!) and each statement corresponds to a chunk of position-independent code that's independent of the code in adjacent statements, but a block-structured language probably can't be made to work that way. As others have said, if this is the way you'd want the language to work, it suggests quite strongly that automatic variable creation combined with BASIC or Perl-like variable type annotation would be useful.

Martin | martin at Gregorie | gregorie dot org

Gareth's Downstairs Computer 8 years ago

Threaded Interpretive Languages by Loeliger.

I got a copy recently from Abebooks.com, although had to wait for delivery from Yankland.

Gareth's Downstairs Computer 8 years ago

The sort of difficulties that I had in mind were changing the number of parameters passed to a procedure / function / subroutine when there are already compiled calls to it with the wrong number; or possibly changing the definition of a record or structure, again with compiled references already existing.

Perhaps there are some changes, particularly those changing the number of parameters or structure members, that call for complete source reconstruction and recompilation.

Not thought everything through yet, so may be some inconsistencies in the above.

Martin Gregorie 8 years ago

The only place where I know that approach is currently used is the JIT optimising compiler in the depths of the Java JVM.

The closest I've got to doing anything like the OP wants to do has involved using lex + yacc or (better) Coco/R to define a special purpose language that was implemented by generating C or Java and compiling that.

I think the weirdest special purpose language I've used has been NCC Filetab. It did the sort of job we'd now do with awk, Perl or even RPG, but it used one or more linked decision tables to describe what needed to be done with a file's contents plus an almost pictorial printed output definition. I remember it ran surprisingly fast as well as being fairly quick and easy to write.

Martin | martin at Gregorie | gregorie dot org

Gerald Lester 8 years ago

You may want to take a look at the current developments of using LLVM to compile Tcl to native code (Tcl already does a JIT byte code compilation).

Not Basic like you want, but you may get some ideas on how to do it for Basic.

+----------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC                      |
| Email: Gerald.Lester@kng-consulting.net                              |
+----------------------------------------------------------------------+

druck 8 years ago

No, only the source needs such human readable information, it should not appear in compiled code, or even in the output of hand written assembler.

I couldn't think of a worse way to encode that information. Who will remember in a year's time which ambiguous form means what?

Incidentally, there is only one guaranteed NOP in ARM, and that is MOV R0,R0. The 32-bit conditional-on-never (NV) instructions have been deprecated and reused for new instructions, and this may well happen to other "non useful with no side effects" instructions such as MOV Rx,Rx where Rx is not 0.

---druck

Gareth's Downstairs Computer 8 years ago

Not wishing to be rude, but may I suggest that you read this thread from the top, for the discussion, in my terms at least, is for the source to be the compiled code!

I repeat my exhortation as above.

Exhortation repeated, the discussion revolves around the 64-bit ARMv8.

Nevertheless, thank you for your contribution.

Richard Kettlewell 8 years ago

It's a common strategy. Other examples are CLR implementations and QEMU. Modern x86 CPUs do something conceptually very similar, caching micro-ops rather than (only) architectural instructions.

https://www.greenend.org.uk/rjk/

Michael J. Mahon 8 years ago

Loved JOSS and its only error message: EH?

-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com

Gordon Henderson 8 years ago

I'm not convinced this would be possible and I think a lot will depend on the underlying architecture (although you've "fixed" that for now, so it may be less of an issue).

One of many issues regarding speed might be the efficiency of current compilers for, e.g., C, which can perform instruction re-ordering, register re-use and so on. De-compiling that into something that resembles the original source code may well be impossible, that is, if you're trying to achieve the speed and efficiency of a modern compiler. If you take a more linear/simplified approach then you might get something that can work.

FWIW: My experience of BASICs over the past 40 years is that most of the interactive ones (but not all) tokenise the input, line at a time, as you enter it. So the LIST command essentially "de-compiles" the tokenised code. The tokenised code is effectively pseudo-code for some "perfect" BASIC CPU (or not, in some cases).

Having written my own BASIC interpreter in recent years, I found that this way was not always perfect - you lose things like indentation and your own personal style of capitalisation (PRINT, Print, ?, GoTo and so on) and you only need to look at something like AppleSoft BASIC for another example where it tries to line statements up on the left, but when you miss out the LET command it looks weird... You type in:

100 REM Test
110 A = 5
120 FOR I = 1 to A

and you get

LIST
100 REM Test
110 A = 5
120 FOR I = 1 to A

and so on.

In my BASIC, I thought I'd be clever and store the binary form of numbers to speed-up run-time execution. That worked well until:

100 a = 3.3

LIST

100 a = 3.300000001

(contrived example, but I'm sure most folks here know instantly what's happening)
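For anyone who wants to see the effect, here is a small Python sketch. It assumes the literal is tokenised as 4 bytes of IEEE 754 single precision, which may not be the format the interpreter above actually used; the function names are made up for the example.

```python
import struct


def store_single(text):
    """Tokenise a numeric literal as 4 bytes of IEEE 754 single precision."""
    return struct.pack("<f", float(text))


def list_single(token):
    """Regenerate text from the stored bytes, as a naive LIST would."""
    (value,) = struct.unpack("<f", token)
    return repr(value)


def store_with_text(text):
    """One possible workaround: keep the original spelling beside the value."""
    return (float(text), text)


listed = list_single(store_single("3.3"))
value, spelling = store_with_text("3.3")
```

`listed` comes back as something close to, but not spelled as, `3.3`, because 3.3 has no exact binary representation. Keeping the source spelling alongside the binary value costs memory but lets LIST reproduce what was typed.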

Then there's expression evaluation - in a fit of CompSci, I used Dijkstra's Shunting Yard to turn expressions into an RPN stack which could then be executed at run-time, so

100 a = b + c * d

becomes

c d * b + => a

How do you de-compile that into the original? It's harder, as the ordering has now been changed, but a lot will depend on how you choose to evaluate expressions as to how de-compilable they become.
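To make the round trip concrete, here is a textbook shunting-yard and a matching RPN-to-infix lister in Python. It is a generic sketch, not the interpreter described above: it handles only left-associative binary operators and single-token operands, and it emits operands in source order (`b c d * +`) rather than the `c d * b +` ordering shown in the post.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_rpn(tokens):
    """Shunting-yard for left-associative binary operators and parentheses."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand
    while stack:
        output.append(stack.pop())
    return output


def to_infix(rpn):
    """Rebuild infix text, adding only the parentheses that precedence requires."""
    stack = []  # entries are (text, precedence); bare operands get precedence 3
    for tok in rpn:
        if tok in PRECEDENCE:
            (rt, rp), (lt, lp) = stack.pop(), stack.pop()
            p = PRECEDENCE[tok]
            if lp < p:
                lt = "(" + lt + ")"
            if rp <= p:
                rt = "(" + rt + ")"
            stack.append((lt + " " + tok + " " + rt, p))
        else:
            stack.append((tok, 3))
    return stack[0][0]
```

With this version `b + c * d` and `b + ( c * d )` produce the same RPN, so the lister can only ever print `b + c * d`. Necessary parentheses, as in `( b + c ) * d`, are regenerated, but redundant ones typed by the programmer are gone unless something extra is stored to mark them.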

I have looked at compiling my BASIC to native code, and there is more or less a 1:1 relationship between a line of BASIC and a line/block of ASM. It's nowhere near as efficient as C though (and 'compiling' BASIC into C is just as easy and may actually result in faster code, but that's a job for another day).

Another thing you might want to look at is the Gigatron computer. It's a kit computer project, but part of it features code for a virtual CPU which runs on top of the (very) RISC 8-bit computer. The syntax is somewhat esoteric, but I suspect it could be de-compilable.

This is a horizontal/or vertical line drawing function in their GCL (Gigatron Control Language)

[def {-
  DrawLine -- Draw line
-}
  Count i=          // Take value of Count, store in i
  [do if>0          // While Acc > 0
    Color Pos.      // Fetch Colour, store @ Pos: poke (pos, colour)
    Pos Step+ Pos=  // Add Step to pos, store in pos.
    i 1- i=         // i = i - 1
    loop]
  ret
] DrawLine=

"Acc" is the virtual machine's accumulator, which is effectively the last variable used. It's sort of like a 1-deep Forth-type stack.


I've no commercial interest in this other than just having bought and built one.

Sounds like an interesting project to embark on though. I've written many scripting-type languages over the years and a full-on BASIC; not sure I want to do another!

Cheers,

-Gordon

jack4747 8 years ago

It's not that C is inherently efficient, it's the compiler that optimizes the asm. If you compile C source code with optimization disabled, you'll get an (almost) 1:1 relationship between a line of C and a line of asm.

Bye Jack

The Natural Philosopher 8 years ago

I think that is only half of it: the other half is that C IS assembler, written in shorthand. I cannot offhand think of any native C operation that is not an atomic assembler operation.

OK, sometimes on some processors an operation on the contents of memory is two opcodes.

The difference with for example C++ is astonishing.

"it should be clear by now to everyone that activist environmentalism (or environmental activism) is becoming a general ideology about humans, about their freedom, about the relationship between the individual and the state, and about the manipulation of people under the guise of a 'noble' idea. It is not an honest pursuit of 'sustainable development,' a matter of elementary environmental protection, or a search for rational mechanisms designed to achieve a healthy environment. Yet things do occur that make you shake your head and remind yourself that you live neither in Joseph Stalin's Communist era, nor in the Orwellian utopia of 1984." Vaclav Klaus

Richard Kettlewell 8 years ago

! compiles to two instructions on ARM and three on x86, in isolation. Signed division is a subroutine call on ARM and two instructions on x86. Initialization and assignment of large objects may involve many instructions or a subroutine call.

The prevalence of undefined behavior in C also punches a rather large hole in any assumption of a direct mapping into assembler.

https://www.greenend.org.uk/rjk/
