About 30 years ago I wrote a (cross) assembler for the Intel 8080 processor (8-bit) in BASIC (from Hewlett-Packard). I believe I still have the listing of this program somewhere. If you are very interested I will look at what I still have and try to scan it (I do not have it in electronic format, maybe as punched paper tape :-)
As far as I remember it is a two-pass assembler with simple macros.
Why such overkill? I held back this long, but now there's no longer any way of avoiding it. It has to be mentioned now, to put an end to this discussion:
Henry Spencer's "Amazing Awk Assembler"
And yes, you can still find that beast out there, if you want to.
--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Hi Alex,

You might look at the example of an assembler for the 8080, written by John Cassidy in the language Forth. Although it is only a single-pass assembler, it includes control structures. The part that really makes it worthwhile is that the entire source code fits easily on a single 8.5x11 sheet of paper. This makes it easy to understand and easy to modify. With a few more modifications, it can still be a single-pass assembler with forward referencing. It is possible to handle all of one's referencing problems in a simple, fast, single-pass assembler. The other thing to remember is that the order in which you assemble into the address space does not need to be sequential. Simple control structures like 'if-then-else' help to make your assembly source code more readable and reduce the number of simple entry-point labels that tend to clutter most assembly source. Last, you have the Forth language in the background to create macro functions of any complexity you like.

Dwight
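The forward referencing Dwight mentions is usually handled by backpatching: emit a placeholder for a not-yet-defined label, record where it went, and patch it when the label is finally defined. A minimal illustrative sketch in Python (the names and the simplified one-word encoding are hypothetical, not John Cassidy's Forth code):

```python
# Single-pass assembly with backpatching (toy example): forward
# references are recorded and patched once the label's address is known.

code = []            # emitted words
symbols = {}         # label -> address
fixups = {}          # label -> list of code indices awaiting that address

def emit(value):
    code.append(value)

def ref(label):
    """Emit a reference to a label, deferring it if not yet defined."""
    if label in symbols:
        emit(symbols[label])
    else:
        fixups.setdefault(label, []).append(len(code))
        emit(0)  # placeholder, patched later

def define(label):
    """Define a label at the current address and resolve pending fixups."""
    symbols[label] = len(code)
    for index in fixups.pop(label, []):
        code[index] = symbols[label]

# usage: a forward jump, one instruction, then the target label
emit(0xC3)        # JMP opcode (8080)
ref("target")     # forward reference
emit(0x00)        # NOP
define("target")  # backpatch happens here; code is now [0xC3, 3, 0x00]
```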
Not necessarily. For example, my old Athlon XP 1800 does a strcmp() between two matching 4-char strings in 17 nanoseconds. On average, a 4-char opcode can be found in 0.5 microseconds by linear search in an array of 60 entries.
In order to compensate for 10 minutes of time spent implementing a better search, you'll need to run 1.2 billion lines of code through the assembler.
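That back-of-the-envelope claim (600 seconds / 0.5 microseconds per lookup = 1.2 billion lookups) is easy to sanity-check with a toy benchmark. The 60-entry table below is made up, not a real 8080 opcode list:

```python
import time

# Hypothetical 60-entry opcode table; plain linear search as in the post.
opcodes = [(f"OP{i:02d}", i) for i in range(60)]

def find_linear(mnemonic):
    for name, value in opcodes:
        if name == mnemonic:
            return value
    return None

n = 100_000
start = time.perf_counter()
for _ in range(n):
    find_linear("OP59")     # worst case: the last entry
elapsed = time.perf_counter() - start
print(f"{elapsed / n * 1e6:.2f} microseconds per worst-case lookup")
```

Even in interpreted Python the per-lookup cost comes out in the low microseconds, which supports the point: for assembler-sized inputs, linear search is fast enough.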
Chaining is a well-studied and simple idea; see, for instance, Knuth.
There is an informed discussion of symbol table implementations at
formatting link
Hash tables are more efficient. See Anton Ertl's post:
Resizing is an O(n) operation (where n is the no. of entries in the hash table at the time of resizing), but you resize only once every O(n) elements, so the amortized cost of resizing per insertion is O(n/n)=O(1); of course, there is no resizing cost when you are only doing lookups.
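The chaining Knuth describes and Ertl's amortized-resizing argument can both be illustrated with a toy Python hash table (an illustrative sketch only, not anyone's production code): each resize rehashes n entries, but a resize happens only after roughly n insertions, so the cost per insertion stays O(1).

```python
# Toy chaining hash table with doubling on resize.
class ChainedTable:
    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]
        self.count = 0

    def _index(self, key, size):
        return hash(key) % size

    def insert(self, key, value):
        if self.count >= len(self.buckets):        # load factor 1: resize
            self._resize(2 * len(self.buckets))
        chain = self.buckets[self._index(key, len(self.buckets))]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)            # overwrite existing key
                return
        chain.append((key, value))
        self.count += 1

    def lookup(self, key):
        chain = self.buckets[self._index(key, len(self.buckets))]
        for k, v in chain:
            if k == key:
                return v
        return None

    def _resize(self, new_size):
        # O(n) rehash, but triggered only once per O(n) insertions.
        old = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for chain in old:
            for k, v in chain:
                self.buckets[self._index(k, new_size)].append((k, v))
```

As the post notes, lookups never trigger a resize; only insertions pay the (amortized constant) rehashing cost.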
WOW, your package will again make my life a lot easier. Thanks! I will definitely make use of it!
Also, I am writing for the PC; I was not aware that the OP was writing for the CPU itself (well, that is what I have gathered from the replies posted).
Also, guys, I didn't know of any other methods to implement it. I am definitely going to stay away from parsing the text directly because it would be fairly difficult to code.
What is a red-green tree? I know of the binary tree used in Huffman compression and the like; is this data structure similar?
-------------------------------------------------------------------------------------------------------------- I am an EE student looking for summer employment in Toronto, Canada area If you have any openings please contact me at isaacb[AT]rogers[DOT]com.
This attitude is why there isn't much thought given to software development nowadays, and it's clearly reflected in the output and in the drive for faster processors and huge amounts of memory/resources.
The biggest assumption you are making here is that THERE IS SPACE TO ALLOCATE/RESIZE.
Your assembler would not run on the target processor with that mentality. It would exist only as a cross development tool on a system with the resources to house it.
Software should be designed much better than that. But hey, what do I know? Nothing apparently!
I decided to use a hash table for step 2 to make life easy. I originally thought of using a lookup table based on the first letter of the string, but I found it to be too complicated.
With a hash value, I can read the asm source, grab a token, run a hash on it and look it up. It was done for simplicity; I didn't initially believe there would be any speed improvements, because you still have to look the hash value up in the table to check whether it is a valid symbol and decide from there.
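That grab-a-token, hash-it, look-it-up flow is exactly what a Python dict gives you for free, since dict is itself a hash table. A minimal sketch (the four mnemonics are real 8080 opcodes, but the one-byte-per-line encoding is a simplification for illustration):

```python
# Hypothetical subset of an 8080 opcode table, keyed by mnemonic.
OPCODES = {"NOP": 0x00, "HLT": 0x76, "MOV": 0x40, "JMP": 0xC3}

def assemble_line(line):
    """Tokenize one source line and look the mnemonic up by hash."""
    tokens = line.split()
    if not tokens:
        return None                      # blank line
    mnemonic = tokens[0].upper()
    opcode = OPCODES.get(mnemonic)       # the hash lookup
    if opcode is None:
        raise ValueError(f"unknown mnemonic: {tokens[0]}")
    return opcode

print(hex(assemble_line("hlt")))         # prints 0x76
```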
I don't know any scripting languages. I guess that is a weakness (and I am not that great with analog electronics either). I do know VB, though, haha. I don't know Python or Tcl, for that matter. I will pick them up eventually, but the way I learn best is by using the tool to make something useful.
If you do decide to write the assembler in Python, please let me know.
-Isaac
On the contrary. I'm all for optimization, but only if it makes sense. If, for any reasonable input size, the assembler takes no noticeable time to process the entire source, what point is there in further improvements? Instead, I prefer to optimize scarcer resources, such as my own time. If making the assembler is just a step towards the goal of having a binary program ready for a product, and not a goal in itself, then spending more time on the assembler than you will ever gain back from the optimizations simply makes no sense.
Gigahertz processors and gigabyte memories are here. It would be a waste of resources not to use them whenever doing so saves us time.
In the 1970s I wrote several cross-assemblers in Fortran running on PDP-11s (which have a 64 KiB address space) for various processors such as the 8080 and 1802. I never bothered to use anything fancier than linear search for the opcode or symbol table, since the total time was dominated by the program load time, the source file read time (two passes) and the binary output file writing (or even punching to paper tape for downloading to the target :-).
It would be very hard to write modules for any small target processor so huge that they would require such a large number of labels that the inefficient symbol table search time would be of any significance relative to the I/O times.
It's for this sort of reason that such programs should be written in languages like Perl or Python (or even PHP). Just as C++ is a poor choice of language for an 8051, so C/C++ is a poor choice of language for an assembler running on a PC.
Both Perl and Python will give you much easier access to regular expressions than any C library, vastly easier (and more flexible, and probably faster) hash dictionaries, and a host of ready-to-use libraries.
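To give a feel for what that buys you: a plausible assembler line grammar (optional "label:", mnemonic, operands, optional ";" comment) fits in one Python regular expression. The grammar below is a hypothetical sketch, not the OP's actual syntax:

```python
import re

# Hypothetical line grammar: [label:] [mnemonic] [operands] [; comment]
LINE = re.compile(
    r"^\s*(?:(?P<label>\w+):)?\s*"       # optional "label:"
    r"(?P<mnemonic>[A-Za-z]+)?\s*"       # optional mnemonic
    r"(?P<operands>[^;]*?)\s*(?:;.*)?$"  # operands, then optional comment
)

def parse(line):
    """Split one assembly source line into (label, mnemonic, operands)."""
    m = LINE.match(line)
    return (m.group("label"), m.group("mnemonic"),
            m.group("operands") or None)

print(parse("start: MVI A, 42 ; init accumulator"))
# -> ('start', 'MVI', 'A, 42')
```

The same job in C means pulling in a regex library, managing match buffers by hand, and writing your own hash table; here both come with the language.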
That's what I said, but apparently the assembler is running on an 8-bit target with a 64K address space. So Python is out of the question. But C++ and generic libraries are going to fit?
Yup.
--
Grant Edwards grante Yow! Darling, my ELBOW
at is FLYING over FRANKFURT,
visi.com Germany...
No one is writing an assembler to run *on* the target. Someone jumped into this thread with a discussion about running on an 8-bit target (as far as I've noticed, the OP has not mentioned the type of target), and how we should program efficiently in assembler rather than using high-level languages and hash tables. As far as I can figure out, he assumed that the thread was about writing in assembly language rather than writing an assembler, and his confusion has spread.
Yes, or Perl, and perhaps using a lexer/parser library. Even C++ with a decent parser generator.
The fact that a thread headed "Writing a simple assembler" is dominated by the topic of hashing indicates that something is seriously off the rails somewhere.