About 30 years ago I wrote a (cross) assembler for the Intel 8080 processor (8-bit) in BASIC (from Hewlett-Packard). I believe I still have the listing of this program somewhere. If you are very interested I will look at what I still have and try to scan it (I do not have it in electronic format, maybe as punched paper tape :-)
As far as I remember it is a two-pass assembler with simple macros.
Why such overkill? I held back this long, but now there's no longer any way of avoiding it. It has to be mentioned now, to put an end to this discussion:
Henry Spencer's "Amazing Awk Assembler"
And yes, you can still find that beast out there, if you want to.
--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Hi Alex,

You might look at the example of an assembler for the 8080, written by John Cassidy in the language Forth. Although it is only a single-pass assembler, it includes control structures. The part that really makes it worthwhile is that the entire source code fits easily on a single 8.5x11 sheet of paper. This makes it easy to understand and easy to modify. With a few more modifications, it can still be a single-pass assembler with forward referencing. It is possible to handle all of one's referencing problems in a simple, fast, single-pass assembler. The other thing to remember is that the order in which you assemble into the address space does not need to be sequential. Simple control structures like 'if-then-else' help to make your assembly source code more readable and reduce the number of simple entry-point labels that tend to clutter most assembly source. Last, you have the Forth language in the background to create macro functions of any complexity you like.

Dwight
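The forward referencing Dwight mentions is usually handled by backpatching: emit a placeholder for a not-yet-defined label, record where it went, and patch it when the label is finally defined. A minimal illustrative sketch in Python (the names and the simplified one-word encoding are hypothetical, not John Cassidy's Forth code):

```python
# Single-pass assembly with backpatching (toy example): forward
# references are recorded and patched once the label's address is known.

code = []            # emitted words
symbols = {}         # label -> address
fixups = {}          # label -> list of code indices awaiting that address

def emit(value):
    code.append(value)

def ref(label):
    """Emit a reference to a label, deferring it if not yet defined."""
    if label in symbols:
        emit(symbols[label])
    else:
        fixups.setdefault(label, []).append(len(code))
        emit(0)  # placeholder, patched later

def define(label):
    """Define a label at the current address and resolve pending fixups."""
    symbols[label] = len(code)
    for index in fixups.pop(label, []):
        code[index] = symbols[label]

# usage: a forward jump, one instruction, then the target label
emit(0xC3)        # JMP opcode (8080)
ref("target")     # forward reference
emit(0x00)        # NOP
define("target")  # backpatch happens here; code is now [0xC3, 3, 0x00]
```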
Not necessarily. For example, my old Athlon XP 1800 does a strcmp() between two matching 4-char strings in 17 nanoseconds. On average, a 4-char opcode can be found in 0.5 microseconds by linear search in an array of 60 entries.
In order to compensate for 10 minutes of time spent implementing a better search, you'll need to run 1.2 billion lines of code through the assembler.
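That back-of-the-envelope claim (600 seconds / 0.5 microseconds per lookup = 1.2 billion lookups) is easy to sanity-check with a toy benchmark. The 60-entry table below is made up, not a real 8080 opcode list:

```python
import time

# Hypothetical 60-entry opcode table; plain linear search as in the post.
opcodes = [(f"OP{i:02d}", i) for i in range(60)]

def find_linear(mnemonic):
    for name, value in opcodes:
        if name == mnemonic:
            return value
    return None

n = 100_000
start = time.perf_counter()
for _ in range(n):
    find_linear("OP59")     # worst case: the last entry
elapsed = time.perf_counter() - start
print(f"{elapsed / n * 1e6:.2f} microseconds per worst-case lookup")
```

Even in interpreted Python the per-lookup cost comes out in the low microseconds, which supports the point: for assembler-sized inputs, linear search is fast enough.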
Chaining is a well-studied and simple idea; see, for instance, Knuth.
There is an informed discussion of symbol table implementations at
formatting link
Hash tables are more efficient. See Anton Ertl's post:
Resizing is an O(n) operation (where n is the no. of entries in the hash table at the time of resizing), but you resize only once every O(n) elements, so the amortized cost of resizing per insertion is O(n/n)=O(1); of course, there is no resizing cost when you are only doing lookups.
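The chaining Knuth describes and Ertl's amortized-resizing argument can both be illustrated with a toy Python hash table (an illustrative sketch only, not anyone's production code): each resize rehashes n entries, but a resize happens only after roughly n insertions, so the cost per insertion stays O(1).

```python
# Toy chaining hash table with doubling on resize.
class ChainedTable:
    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]
        self.count = 0

    def _index(self, key, size):
        return hash(key) % size

    def insert(self, key, value):
        if self.count >= len(self.buckets):        # load factor 1: resize
            self._resize(2 * len(self.buckets))
        chain = self.buckets[self._index(key, len(self.buckets))]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)            # overwrite existing key
                return
        chain.append((key, value))
        self.count += 1

    def lookup(self, key):
        chain = self.buckets[self._index(key, len(self.buckets))]
        for k, v in chain:
            if k == key:
                return v
        return None

    def _resize(self, new_size):
        # O(n) rehash, but triggered only once per O(n) insertions.
        old = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for chain in old:
            for k, v in chain:
                self.buckets[self._index(k, new_size)].append((k, v))
```

As the post notes, lookups never trigger a resize; only insertions pay the (amortized constant) rehashing cost.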
WOW, your package will again make my life a lot easier. Thanks! I will definitely make use of it!
Also, I am writing for the PC; I was not aware that the OP was writing for the CPU itself (well, that is what I have gathered from the replies posted).
Also, guys, I didn't know of any other methods to implement it. I am definitely going to stay away from parsing the text directly because it would be fairly difficult to code.
What is a red-green tree? I know of the binary tree used in Huffman compression and the like; is this data structure similar?
-------------------------------------------------------------------------------------------------------------- I am an EE student looking for summer employment in Toronto, Canada area If you have any openings please contact me at isaacb[AT]rogers[DOT]com.
This attitude is why there isn't much thought given to software development nowadays, and it's clearly reflected in the output and in the drive for faster processors and huge amounts of memory/resources.
The biggest assumption you are making here is that THERE IS SPACE TO ALLOCATE/RESIZE.
Your assembler would not run on the target processor with that mentality. It would exist only as a cross development tool on a system with the resources to house it.
Software should be designed much better than that. But hey, what do I know? Nothing apparently!
I decided to use a hash table for step 2 to make life easy. I originally thought of using a lookup table based on the first letter of the string, but I found it to be too complicated.
With a hash value, I can read the asm source, grab a token, run a hash on it and look it up. It was done for simplicity; I didn't initially believe there would be any speed improvements, because you still have to look the hash value up in the table to check whether it is a valid symbol and decide from there.
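That grab-a-token, hash-it, look-it-up flow is exactly what a Python dict gives you for free, since dict is itself a hash table. A minimal sketch (the four mnemonics are real 8080 opcodes, but the one-byte-per-line encoding is a simplification for illustration):

```python
# Hypothetical subset of an 8080 opcode table, keyed by mnemonic.
OPCODES = {"NOP": 0x00, "HLT": 0x76, "MOV": 0x40, "JMP": 0xC3}

def assemble_line(line):
    """Tokenize one source line and look the mnemonic up by hash."""
    tokens = line.split()
    if not tokens:
        return None                      # blank line
    mnemonic = tokens[0].upper()
    opcode = OPCODES.get(mnemonic)       # the hash lookup
    if opcode is None:
        raise ValueError(f"unknown mnemonic: {tokens[0]}")
    return opcode

print(hex(assemble_line("hlt")))         # prints 0x76
```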
I don't know any scripting languages. I guess that is a weakness (and I am not that great with analog electronics either). I do know VB, though, haha. I don't know Python or Tcl, for that matter. I will pick them up eventually, but the way I learn best is by using the tool to make something useful.
If you do decide to write the assembler in Python, please let me know.
-Isaac
On the contrary. I'm all for optimization, but only if it makes sense. If, for any reasonable input size, the assembler takes no noticeable time to process the entire source, what point is there in further improvements? Instead, I prefer to optimize scarcer resources, such as my own time. If making the assembler is just a step towards the goal of having a binary program ready for a product, and not a goal in itself, then spending more time on the assembler than you will ever gain back from the optimizations simply makes no sense.
Gigahertz processors and gigabyte memories are here. It would be a waste of resources not to use them whenever doing so saves us time.
In the 1970s I wrote several cross-assemblers in Fortran running on PDP-11s (which have a 64 KiB address space) for various processors such as the 8080 and 1802. I never bothered to use anything fancier than linear search for the opcode or symbol table, since the total time was dominated by the program load time, the source file read time (two passes) and the binary output file writing (or even punching to paper tape for downloading to the target :-).
It would be very hard to write modules for any small target processor so huge that they would require such a large number of labels that the inefficient symbol table search time would be of any significance relative to the I/O times.
It's for this sort of reason that such programs should be written in languages like Perl or Python (or even PHP). Just as C++ is a poor choice of language for an 8051, so C/C++ is a poor choice of language for an assembler running on a PC.
Both Perl and Python will give you much easier access to regular expressions than any C library, vastly easier (and more flexible, and probably faster) hash dictionaries, and a host of ready-to-use libraries.
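To give a feel for what that buys you: a plausible assembler line grammar (optional "label:", mnemonic, operands, optional ";" comment) fits in one Python regular expression. The grammar below is a hypothetical sketch, not the OP's actual syntax:

```python
import re

# Hypothetical line grammar: [label:] [mnemonic] [operands] [; comment]
LINE = re.compile(
    r"^\s*(?:(?P<label>\w+):)?\s*"       # optional "label:"
    r"(?P<mnemonic>[A-Za-z]+)?\s*"       # optional mnemonic
    r"(?P<operands>[^;]*?)\s*(?:;.*)?$"  # operands, then optional comment
)

def parse(line):
    """Split one assembly source line into (label, mnemonic, operands)."""
    m = LINE.match(line)
    return (m.group("label"), m.group("mnemonic"),
            m.group("operands") or None)

print(parse("start: MVI A, 42 ; init accumulator"))
# -> ('start', 'MVI', 'A, 42')
```

The same job in C means pulling in a regex library, managing match buffers by hand, and writing your own hash table; here both come with the language.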
That's what I said, but apparently the assembler is running on an 8-bit target with a 64K address space. So Python is out of the question. But C++ and generic libraries are going to fit?
Yup.
--
Grant Edwards grante Yow! Darling, my ELBOW
at is FLYING over FRANKFURT,
visi.com Germany...
No one is writing an assembler to run *on* the target. Someone jumped into this thread with a discussion about running on an 8-bit target (as far as I've noticed, the OP has not mentioned the type of target), and how we should program efficiently in assembler rather than using high-level languages and hash tables. As far as I can figure out, he assumed that the thread was about writing in assembly language rather than writing an assembler, and his confusion has spread.
Yes, or Perl, and perhaps using a lexer/parser library. Even C++ with a decent parser generator.
The fact that a thread headed "Writing a simple assembler" is dominated by the topic of hashing indicates that something is seriously off the rails somewhere.