Writing a simple assembler

- Alex

Wed, Mar 8, 2006 5:34 PM

First, I would like to thank everyone for the responses and advice.

Second, the purpose was to write a simple assembler that generates opcodes on a PC, which I then run on my IC (an FPGA is used as the host controller). I understand that the task is trivial for gurus, but being a novice at this, the first thing that came to my mind was simply to make two passes: the first, a preprocessor, detects all the variables etc.; the second does the actual translation, recognising mnemonics and generating opcodes. Obviously it is not the "proper" way to do it (grammar descriptions and so on). That's why I was asking about examples and articles on this issue.

--
Alex

- randyhyde

Wed, Mar 8, 2006 5:48 PM

Speaking from experience ... :-)

First, choosing a good algorithm up front is not "premature optimization." It's simply good design.

Second, despite the best intentions (encapsulation and other good software engineering methods), it's often not so simple to just replace one symbol table search routine with another.

Third, "a simple assembler" today may be a complex assembler tomorrow. Better to do it right the first time around, especially as using a hash table lookup isn't a whole lot more complex than a linear search.

I regret the day I said to myself "heck, this is just a prototype, I'll use a simple linear search right now and fix it in the final version."

82 versions later and over 100,000 lines of code, I can attest that this was the second worst design decision I made for my "prototype assembler" (the #1 bad design decision was using Flex and Bison).

Cheers,
Randy Hyde

- Hans-Bernhard Broeker

Wed, Mar 8, 2006 6:19 PM

In all fairness, a "symbol table search routine" that fails to be easily replaceable by another should immediately be reported to the nearest Committee on Abusive Nomenclature and Fraudulent Assumption of Titles. If it can't be replaced, it's clearly not a search routine, but a hack.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

- randyhyde

Wed, Mar 8, 2006 6:33 PM

I couldn't agree more. But more often than not, guess what happens. Better to plan ahead, and be realistic.

Cheers,
Randy Hyde

- Grant Edwards

Wed, Mar 8, 2006 7:06 PM

Then you did it wrong. The API for a symbol lookup should be dead simple.
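
To illustrate the point argued above, here is a minimal Python sketch of a "dead simple" symbol table API with two interchangeable back ends: a linear search and a hash table (Python's built-in dict). The class names, the `define`/`lookup` methods and `resolve_all` are hypothetical names chosen for this example, not taken from any poster's assembler.

```python
class LinearSymbolTable:
    """Symbol table backed by a list of (name, value) pairs; O(n) lookup."""

    def __init__(self):
        self._entries = []

    def define(self, name, value):
        if self.lookup(name) is not None:
            raise KeyError(f"duplicate symbol: {name}")
        self._entries.append((name, value))

    def lookup(self, name):
        for entry_name, value in self._entries:
            if entry_name == name:
                return value
        return None


class HashSymbolTable:
    """Same interface, backed by Python's built-in dict (a hash table)."""

    def __init__(self):
        self._entries = {}

    def define(self, name, value):
        if name in self._entries:
            raise KeyError(f"duplicate symbol: {name}")
        self._entries[name] = value

    def lookup(self, name):
        return self._entries.get(name)


def resolve_all(table, names):
    """Caller code depends only on define()/lookup(), not on the storage."""
    return [table.lookup(name) for name in names]


# Either implementation can be dropped in without touching the caller.
for table in (LinearSymbolTable(), HashSymbolTable()):
    table.define("start", 0x0000)
    table.define("loop", 0x0004)
    print(type(table).__name__, resolve_all(table, ["loop", "start", "missing"]))
```

As long as the rest of the assembler only ever calls `define` and `lookup`, swapping the search strategy later is a one-line change.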

--
Grant Edwards                   grante             Yow!  I feel like I'm
                                  at               in a Toilet Bowl with a
                               visi.com            thumbtack in my forehead!!

- samIam

Wed, Mar 8, 2006 7:43 PM

Randy, I tip my hat to you, sir. It seems many here have forgotten the tenets of Software Engineering.

I said it before and I'll say it now ... it SHOWS in the sort of software currently developed.

Another thing that irks me is the "it's done in this library so just use it" argument. No one stops to question WHETHER that library is doing it right/efficiently.

Anyway I am getting off the point ...

- Isaac Bosompem

Wed, Mar 8, 2006 8:05 PM

Well the hash lookups were not for premature optimization. I first decided on doing simple string parsing but I quickly realized it would be a nightmare to code. I am simply looking for a solution that is easy to code.

That choice might have been overambitious, but I do feel I need to get a grasp of OOP programming.

I must ask though why you guys are so adamant about using TCL or Python?! I have not started to code it yet, I am still in the planning phase.

--
I am an EE student looking for summer employment in the Toronto, Canada area. If you have any openings, please contact me at isaacb[AT]rogers[DOT]com.

- toby

Wed, Mar 8, 2006 9:03 PM

It can be argued that since you are in learning mode there is no 'wrong' way to go about it. Go forth and build your prototype; doing so is a great way to learn about languages and tools that might be related or make the task easier. :)

- Grant Edwards

Wed, Mar 8, 2006 9:33 PM

Because the sort of stuff you're planning wouldn't need planning in a high level language. Things like symbol lookups are just basic built-in operations.

--
Grant Edwards
grante@visi.com

- Jim Stewart

Wed, Mar 8, 2006 9:58 PM

Total agreement. In the total time spent posting on this thread, a simple assembler could have been written and debugged :)

Myself, I'd just define some macros for MASM, the old Microsoft DOS assembler.

- msg

Wed, Mar 8, 2006 9:58 PM

Hi,

I am interested in seeing your BASIC assembler; if it is on paper tape, I can read it (and return the tape with conversion of your choice). I imagine this predates RMB (Rocky Mountain Basic)?

Regards,

Michael Grigoni
Cybertheque Museum

- Paul Keinanen

Wed, Mar 8, 2006 10:07 PM

A two pass assembler is definitely a practical way of doing a cross-assembler. The first pass must generate the correct amount of code for each instruction, in order to "detect" the locations of all branch target labels in the program. In the second pass do the same thing again and using the label addresses stored in the symbol table, generate the correct code (especially the branch/jump instructions).
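
A minimal sketch of the two-pass scheme described above, in Python (the language suggested elsewhere in this thread). The instruction set in `OPCODES`, the little-endian operand encoding and the pre-parsed `(label, mnemonic, operand)` tuples are all assumptions made up for this example; a real target will differ.

```python
# Hypothetical instruction set: mnemonic -> (opcode byte, operand size in bytes).
OPCODES = {"NOP": (0x00, 0), "LDI": (0x01, 1), "JMP": (0x02, 2), "HALT": (0xFF, 0)}


def assemble_two_pass(program):
    """program is a list of (label or None, mnemonic, operand or None) tuples."""
    # Pass 1: only count bytes, so that every label gets its final address.
    symbols, address = {}, 0
    for label, mnemonic, _ in program:
        if label:
            symbols[label] = address
        address += 1 + OPCODES[mnemonic][1]

    # Pass 2: walk the program again and emit code using the finished symbol table.
    code = bytearray()
    for _, mnemonic, operand in program:
        opcode, size = OPCODES[mnemonic]
        code.append(opcode)
        if size:
            # An operand is either a label name or an integer literal.
            value = symbols[operand] if operand in symbols else int(operand, 0)
            code += value.to_bytes(size, "little")
    return bytes(code)


program = [
    ("start", "LDI", "5"),
    (None, "JMP", "end"),    # forward reference, resolved thanks to pass 1
    (None, "NOP", None),
    ("end", "HALT", None),
]
print(assemble_two_pass(program).hex(" "))
```

Because pass 1 already knows how many bytes each instruction occupies, the forward jump to `end` needs no special handling in pass 2.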

When running the assembler on a system with a huge memory (such as a PC), you can instead generate the code into memory in a single pass, with the address field of any forward jump/branch set to 0 and its location saved in a fix-up record. When the assembler reaches the label and can calculate its address, it has to patch all the locations that referenced that label. Finally, write the patched code to a file or to the target.
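
The fix-up approach can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the `OPCODES` table is hypothetical, operands are assumed to always be label names, and for brevity all patches are applied once at the end of the pass rather than at the moment each label is defined.

```python
# Hypothetical instruction set: mnemonic -> (opcode byte, operand size in bytes).
OPCODES = {"NOP": (0x00, 0), "JMP": (0x02, 2), "HALT": (0xFF, 0)}


def assemble_one_pass(program):
    """program is a list of (label or None, mnemonic, operand or None) tuples."""
    code = bytearray()
    symbols = {}    # label -> address, filled in as labels are encountered
    fixups = []     # (offset into code, operand size, label) per unresolved reference

    for label, mnemonic, operand in program:
        if label:
            symbols[label] = len(code)
        opcode, size = OPCODES[mnemonic]
        code.append(opcode)
        if size:
            if operand in symbols:      # backward reference: address already known
                value = symbols[operand]
            else:                       # forward reference: emit 0, patch later
                value = 0
                fixups.append((len(code), size, operand))
            code += value.to_bytes(size, "little")

    # All labels are known now; patch every recorded location.
    for offset, size, label in fixups:
        if label not in symbols:
            raise KeyError(f"undefined label: {label}")
        code[offset:offset + size] = symbols[label].to_bytes(size, "little")
    return bytes(code)


program = [
    ("top", "NOP", None),
    (None, "JMP", "end"),    # forward reference: recorded as a fix-up
    (None, "JMP", "top"),    # backward reference: resolved immediately
    ("end", "HALT", None),
]
print(assemble_one_pass(program).hex(" "))
```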

I would consider implementing a two pass cross assembler to be the simplest thing, just use brute force :-).

If you are comfortable with a single pass assembler with fix-ups, you could just as well write a linker or a linking loader.

Paul

- CBFalconer

Wed, Mar 8, 2006 10:09 PM

... snip ...

If you are talking about my recommendation of hashlib, you are perfectly free to comment on the correctness and efficiency of that library. I claim it is both, although it may give up some slight efficiencies for ease of use. It is out there in source form, totally exposed to the winds of criticism.

--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson

- toby

Wed, Mar 8, 2006 11:56 PM

Hi Jim,

Now that is a technique I have heard of before! A gentleman, Tom Evans, clued me in to this, and I hope he will allow me the liberty of quoting:

There's two ways to "write an assembler in Macro 11".

The IMP/PACE one I mentioned was "the classic version". It had the parser, symbol table management and everything, all lovingly coded in individual machine instructions. Lots of them. How redundant...

The SC/MP cross assembler (and other ones I've worked on ... that generated microcode) consisted of MACROS.

This is cheating big-time. The Macro-11 assembler is being abused to assemble and emit code for a different CPU, or sometimes not even for a CPU but for a ROM Sequencer or worse. The macros have the same names as the target CPU's op-codes and they simply generate the appropriate code, (ab)using the symbol table management built into Macro-11.

As a huge benefit you can also use all of the powerful macro facilities in Macro-11. Try emulating all of that in lex/yacc.

Of course if the targeted CPU uses opcodes with the same names as the ones the PDP-11 uses there's a bit of strife, ...

Macro-11 isn't exactly fast when abused like this. It took about 5 minutes to make [a] 1023-byte ROM ...
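
For readers without a macro assembler to hand, the same trick can be imitated in a scripting language. The sketch below is a Python analogue, not Macro-11: each method is named after a target mnemonic and simply emits bytes, while the host language supplies variables, loops and arithmetic in place of the macro facilities. The `Program` class and its three-instruction target are hypothetical, and forward references are deliberately not handled.

```python
class Program:
    """Collects emitted bytes; label addresses are just host-language variables."""

    def __init__(self):
        self.code = bytearray()

    def here(self):
        """Current location counter, i.e. the address of the next instruction."""
        return len(self.code)

    # One method per (hypothetical) target mnemonic; each simply emits bytes.
    def nop(self):
        self.code.append(0x00)

    def ldi(self, value):
        self.code += bytes([0x01, value & 0xFF])

    def jmp(self, address):
        self.code.append(0x02)
        self.code += address.to_bytes(2, "little")


p = Program()
loop = p.here()          # a "label" is an ordinary Python variable
for value in range(3):   # a host-language loop plays the role of a repeat macro
    p.ldi(value)
p.nop()
p.jmp(loop)
print(p.code.hex())
```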

Kids these days have it easy! Lex! Yacc! Puts me in mind of:

"SECOND YORKSHIREMAN: Luxury. We used to have to get out of the lake at six o'clock in the morning, clean the lake, eat a handful of 'ot gravel, work twenty hour day at mill for tuppence a month, come home, and Dad would thrash us to sleep with a broken bottle, if we were lucky!"

--Toby

- Jonathan Kirwan

Thu, Mar 9, 2006 12:18 AM

Macro-11 is pretty good and I've read about exactly these cases, back in the 1970s when I was using Macro-11. I never wrote Macro-11 code for a cross-assembler, though. Just heard about it.

I've actually used MASM/ML from Microsoft for such things, though. From my vague recollection of Macro-11 macros, MASM/ML's macro facilities aren't nearly as general and can be confusing to figure out, at times. But the linker will actually punch out a .COM file, which is a clean, exact, binary image. MASM/ML will allow you to place things in separate segments so that you can, on the fly, place things into nicely organized sections which will later be fused together as you see fit. (You can generate a .EXE, but you will need another tool to 'fix' it up.)

Jon

- Roger Ivie

Thu, Mar 9, 2006 1:56 AM

I haven't done this with either MACRO-11 or MASM, but I have done it a few times with MACRO-32.

First time was for FQAM, the QBus adapter for the VAXstation 3520/3540. It was built around a simple state machine implemented with registered EPROMs.

Currently, I'm using a set of MACRO-32 macros that let me migrate microcode assembly for a 2910/29116 based system off an old META29R setup that required a VAX onto an Alpha using MACRO-32. I've managed to retain all the original mnemonics used in the META29R code and quite a bit of the syntax. This let me do automated source code conversion between the two assemblers, which was nice.

The result is a .PSECT containing an initialized array that I link against a FORTRAN program to extract the binary in a variety of formats.

MACRO-32 isn't exactly fast when abused like this, either. It took a MicroVAX 2000 half an hour to assemble the microcode for the FQAM.

--
roger ivie
rivie@ridgenet.net

- randyhyde

Thu, Mar 9, 2006 2:01 AM

Having a good compile-time language (macro processor plus other goodies) in an assembler is useful for creating all kinds of different languages, not just assemblers. Interested individuals might want to take a look at my chapter on "Domain Specific Languages" in "The Art of Assembly Language" where it discusses how to use HLA's macros and compile-time language to create "mini-languages". Certainly, an assembler would be fairly trivial to write. Indeed, I'm using this technique to create a small assembler for a virtual machine I've created to help with some object code obfuscation.

Cheers,
Randy Hyde

- Jonathan Kirwan

Thu, Mar 9, 2006 3:24 AM

I was hoping you'd pop in on this.

Jon

- Jim Granville

Thu, Mar 9, 2006 4:08 AM

Is your HLA v2.0 stable enough yet to be used as a macro assembler? I see it had a recent update.

-jg

- David Brown

Thu, Mar 9, 2006 8:07 AM

That would be why people are recommending languages like Python - so that such basic blocks like string parsing and hash tables are as easy as possible.
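
As a rough illustration of how little code those building blocks take in Python, here is a sketch that parses one source line with plain string methods and resolves an operand through a dict. The `label: MNEMONIC op1, op2 ; comment` syntax and the names `Statement`, `parse_line` and `operand_value` are assumptions for this example, not anything specified in the thread.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: Optional[str]
    operands: list


def parse_line(line):
    """Parse 'label: MNEMONIC op1, op2 ; comment' (a hypothetical syntax)."""
    text = line.partition(";")[0].strip()       # strip the comment
    label = None
    if ":" in text:
        head, _, text = text.partition(":")
        label, text = head.strip(), text.strip()
    parts = text.split(None, 1)                 # mnemonic, then the rest
    mnemonic = parts[0].upper() if parts else None
    rest = parts[1] if len(parts) > 1 else ""
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return Statement(label, mnemonic, operands)


def operand_value(operand, symbols):
    """Look the operand up as a symbol; otherwise read it as a number."""
    if operand in symbols:
        return symbols[operand]
    return int(operand, 0)      # base 0 accepts 10, 0x0A and 0b1010


print(parse_line("loop:  add r1, 0x10   ; bump the pointer"))
print(operand_value("0x10", {}), operand_value("loop", {"loop": 4}))
```

The symbol table here is nothing more than a dict, and the whole parser is a handful of `partition` and `split` calls.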

OOP would be a good idea for such a task. May I recommend a good OOP language, such as ... Python? There are actually a fair number of good OOP programming languages that could be used for such a task - C++ isn't really one of them (multiple inheritance, friends, and templates are some of the "features" of C++ OOP that quickly lead to horribly ugly, unreadable and unmaintainable code).

I don't think anyone has recommended TCL, nor are they likely to. TCL has its place, and can be a useful language - but not for a task such as this. I'm a Python fan, and I have no doubts in recommending Python as a suitable language for writing an assembler. I'm not a Perl expert, but I know enough about it to know it is also up to the job, but I don't think it would be as good a fit. There are plenty of other options that may or may not be good fits - I don't know them well enough (such as Java, Smalltalk, OCAML, Ruby). And I know several languages that could be used, such as C, C++ and Pascal, but which force you to start much nearer the bottom instead of providing the basic building blocks.

Choosing your language is a much more important step than choosing the implementation for a particular data structure, and is definitely part of the planning phase (do you want to choose languages *after* starting coding?).