Benchmark: STL's list vs. hand-coded one

While this is true, there is still a point to having dynamically linked libraries. The problem is that the dominating desktop OS has only a very limited mechanism for versioning libraries (and packages in general; MSI is a step in a better direction, but it still doesn't handle dependencies, versioning etc.).

A few years ago, in the NT4 era, it was quite common that installing application Y - which overwrote some shared libraries in the system directory - disabled application X because of a version conflict.

The usual workaround is for the installer to put the shared libraries in the application's own directory. But what is the point of having shared libraries after that?

(once again, this is out of the scope of this ng)

Right. At least I have been taught (though I had previous work experience) to write custom memory allocation for a system that simply cannot crash because a memory allocation failed.

For instance, many people think that garbage collection in Java must be very inefficient; actually it is quite the opposite: the garbage collector has information about memory usage, fragmentation and so on. It is likely that it can do a better job than the average programmer using manual memory allocation/deallocation.

I have a four-year-old computer, and I'm happy with it running Ubuntu 7.10. I bought it with 1 gigabyte of memory right away, since it was quite cheap at the time. XP is definitely heavier than Ubuntu with the same number of concurrent applications. Vista I wouldn't mention in the same sentence..

Actually, there are applications that really require huge amounts of memory: image processing, CAD, etc. It is not just the M$ "bloat" (not saying that it doesn't exist..).

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

And having allocated a 64MB buffer, you will need to write your own new(), free(), or malloc() implementations to manage memory within that buffer - and do you *know* they are going to be more efficient than the library and OS code you are avoiding?

Reply to
nospam

Since this discussion started with a lexer: I, at least, find it quite unlikely that a custom memory allocator can be any faster than a generic one there; a lexer simply needs a generic allocator (unless all the tokens of the language are, let's say, fixed to sizeof(char).. :)).

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Custom allocators are typically inlined and only need to increment an index or a pointer that is already in a register (think *token_ptr++ = token) and often don't need to worry about deallocation or running out of memory etc. That's a lot less work than a standard allocator is required to do.
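As a rough sketch (all names here are made up for illustration, not from any particular compiler), such a bump allocator over the 64MB buffer discussed earlier might look like this:

#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE (64u * 1024u * 1024u)   /* the 64MB buffer from the discussion */

static uint8_t arena_buf[ARENA_SIZE];
static size_t  arena_used;

/* With optimization this inlines to roughly an add and a compare; there is
   no free() - the whole arena is discarded at once when the job is done.
   Alignment handling is omitted for brevity. */
static inline void *arena_alloc(size_t n)
{
    if (arena_used + n > ARENA_SIZE)
        return NULL;                       /* out of arena space */
    void *p = &arena_buf[arena_used];
    arena_used += n;
    return p;
}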

Wilco

Reply to
Wilco Dijkstra

Hmmm, can you tell me how

myptr = malloc(UNITSIZE1);

can be more efficient than

myptr1 = &bigbuf[Endptr]; Endptr += UNITSIZE1;

Equally efficient I could believe, if you can show me a zero-overhead function call. More efficient is hard to believe without an example.

Mark Borgerson

Reply to
Mark Borgerson

Hey, memory is cheap. Simply fix each element to a defined maximum size, say 100 bytes. Then allocate enough memory for an array of 640,000 of those elements (640,000 * 100 bytes ~= 64 MBytes). How much simpler than

char *nextptr = &bigbuf[0];

elemptr = nextptr; nextptr += 100;

can your hypothetical generic memory allocator get??? The above two lines of C should compile down to about two instructions on a 32-bit processor. You can't get much more efficient than that.
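Spelled out as a complete helper (hypothetical code, using the same names as above), the whole allocator is:

#include <stddef.h>

#define ELEM_SIZE  100
#define NUM_ELEMS  640000                    /* ~64 MBytes total */

static char bigbuf[NUM_ELEMS * ELEM_SIZE];
static char *nextptr = &bigbuf[0];

/* One pointer copy and one add per allocation - about two instructions
   once inlined. Bounds checking is left out, as in the lines above. */
static inline char *alloc_elem(void)
{
    char *elemptr = nextptr;
    nextptr += ELEM_SIZE;
    return elemptr;
}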

Mark Borgerson

Reply to
Mark Borgerson

Thanks. My point exactly. A custom allocator like this already has access to a large block of memory and is allowed to do what it likes with that memory. Compared to maintaining the memory yourself, it still has function call overhead, which inlining will eliminate if your compiler allows it.

Mark Borgerson

Reply to
Mark Borgerson

While this is also true, I wouldn't consider the lexer to be the problem in a modern compiler, in terms of either A) memory allocation or B) CPU processing. All the other stages of the pipeline are likely to consume a larger share of resources.

But this is once again out of scope.. :)

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Even in a lexer, memory allocation is not likely to be the bottleneck. Besides, tokens are often pulled _on demand_ by the parser, which means that you can get by with one static buffer of MAX_TOKEN_SIZE (i.e. the whole source is not tokenized at once with an array/list of tokens then handed to the parser - it is the other way around).
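As a sketch of that pull-style interface (hypothetical names, and a toy that just splits on whitespace rather than lexing a real language), the parser simply calls next_token() whenever it wants another token:

#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

#define MAX_TOKEN_SIZE 256

static char token_buf[MAX_TOKEN_SIZE];

/* Returns the next whitespace-delimited token from fp, or NULL at end of
   input. The text lives in the one static buffer, so it is only valid
   until the next call - and no dynamic allocation happens per token. */
static const char *next_token(FILE *fp)
{
    int c;
    size_t len = 0;

    while ((c = fgetc(fp)) != EOF && isspace(c))
        ;                                   /* skip whitespace */
    if (c == EOF)
        return NULL;

    do {
        if (len < MAX_TOKEN_SIZE - 1)
            token_buf[len++] = (char)c;
        c = fgetc(fp);
    } while (c != EOF && !isspace(c));

    token_buf[len] = '\0';
    return token_buf;
}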

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

I guess you've never written a fast lexer then... Lexing is quite a performance critical part of a compiler as it looks at every character and compresses them into tokens. It also does symbol lookup, another critical part of a compiler. It's not unusual for the lexer to take 10-15% of the total compilation time when maximum optimization is enabled. Even skipping spaces and comments takes a lot of time.
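For a feel of the per-character work involved, here is what a hand-written skip loop might look like (illustrative only, names made up):

/* Advance past whitespace and C/C++ style comments, returning a pointer
   to the start of the next token. Every skipped character is touched. */
static const char *skip_space_and_comments(const char *p)
{
    for (;;) {
        while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
            p++;                                  /* plain whitespace */
        if (p[0] == '/' && p[1] == '/') {         /* line comment */
            p += 2;
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (p[0] == '/' && p[1] == '*') {  /* block comment */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;
        } else {
            return p;                             /* next token starts here */
        }
    }
}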

Of course I'm talking about advanced professional compilers here which can easily lex and preprocess over 100MBytes/s of source code.

Wilco

Reply to
Wilco Dijkstra

What I have or haven't done has very little relevance to this discussion..

Classically, the only task of a lexer is to provide tokens to the parser when the parser "pulls" them. For this, you need only one static buffer, sized to the maximum length of one token. No dynamic allocation needed.

While you can build a symbol table while lexing (at least lex/flex allows you to do this), I consider building the symbol table to belong to a later stage of the compiler pipeline.

Do you consider 10-15% of the CPU time used by a compiler to be critical?

Lexer generators have produced "fast enough" lexers, at least for me, and those are usually table-driven. Sounds like you have hand-crafted your lexers?

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Hmm, if you bother to check the pointer from malloc(), or use new and handle the exception, when you meet a big enough program - instead of producing garbage or crashing - would that still be more efficient?

Yes, you can write a custom memory management system which trades memory for speed, or provides a more efficient subset of functionality, or is more efficient for the kind of memory management you most often use, but it isn't trivial, and you are competing against well-tested and optimised library code. It certainly isn't as trivial as allocating a 64MB buffer at the start of the program.
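For example, one common "subset" trade-off is a fixed-block pool: it only hands out blocks of a single size, which is exactly what makes it simple and fast. A sketch (names and sizes are illustrative, not from anyone's actual code):

#include <stddef.h>

#define BLOCK_SIZE   64
#define BLOCK_COUNT  100000

static union block {
    union block *next;            /* used while the block is on the free list */
    char payload[BLOCK_SIZE];
} pool[BLOCK_COUNT];

static union block *free_list;

static void pool_init(void)
{
    for (size_t i = 0; i + 1 < BLOCK_COUNT; i++)
        pool[i].next = &pool[i + 1];
    pool[BLOCK_COUNT - 1].next = NULL;
    free_list = &pool[0];
}

static void *pool_alloc(void)     /* O(1): no searching, no block splitting */
{
    union block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

static void pool_free(void *p)    /* O(1): no coalescing needed */
{
    union block *b = p;
    b->next = free_list;
    free_list = b;
}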

Reply to
nospam

But not everyone runs out and buys a new computer every year or so just so it can run the idle task faster.

Quite true. But if its performance is adequate, why replace it?

I've seen humorous writings suggesting running a 6502-hosted compiler on a PC with a 6502 instruction simulator so as to slow things to the pace we had in the "good old days".

None, probably. Which is my only use of PCs...

Reply to
Everett M. Greene

In the ten years I've been self-employed, I've bought 5 desktop PCs and 2 laptops. I'm still using both laptops and 3 of the desktops. I think each computer purchased has been cheaper and included as much or more RAM than its predecessor.

I suppose that depends on whether you are doing the same things with your computers that you were 10 years ago. I suppose I could still run some version of MS Word on a 10-year-old system, but I don't think MATLAB would run, nor would I be able to debug new embedded systems with my IAR EWARM setup, with its USB dongle and USB-attached debugger.

I do remember multipass assemblers for the 6502 or 6800 that would allow me time to go to the fridge to refill my Pepsi glass while the program thrashed away at the floppy disk drive.

I was a pretty dedicated Mac user myself until the early 90's. I then had to switch to the PC, primarily for the cross compilers that weren't available for the Mac.

Mark Borgerson

Reply to
Mark Borgerson

"Big enough program"??? Have you ever seen a program that would need more than one million symbol table entries of 64 bytes each? If that isn't enough, spend another $100 on RAM and increase the buffer to 640MBytes.

In any case, there's no reason the compiler should crash or produce garbage with your hypothetical large program. The program can certainly keep track of the number of symbol table entries used and throw an exception as easily as you could with malloc().
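A sketch of that bookkeeping (hypothetical names, one million 64-byte entries as in the figures above):

#include <stddef.h>

#define MAX_SYMBOLS 1000000                 /* one million entries */
#define SYMBOL_SIZE 64                      /* 64 bytes per entry */

struct symbol {
    char name[SYMBOL_SIZE];
};

static struct symbol symtab[MAX_SYMBOLS];   /* ~64 MBytes, reserved up front */
static size_t symcount;

/* Returns a fresh entry, or NULL when the table is full, so the caller can
   report "program too large" just as it would after a failed malloc(). */
static struct symbol *symtab_new(void)
{
    if (symcount >= MAX_SYMBOLS)
        return NULL;
    return &symtab[symcount++];
}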

Actually, it IS that trivial for some applications. The most recent time that I used this RAM resource was when I wrote a strip-chart display program. I wanted to keep a significant trace history available to scroll through, so I simply allocated space for 8 arrays of 86400 floating point values. That allowed me room for one day's worth of data for 8 traces at 1Hz. I suppose that doing the same thing on a 16-bit OS would have involved either much smaller data retention or the use of history files.
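In C that history buffer is just a static declaration (names invented here); 8 traces * 86400 samples * 4 bytes per float is roughly 2.6 MBytes:

#define NUM_TRACES       8
#define SAMPLES_PER_DAY  86400               /* one sample per second for 24 hours */

static float trace_history[NUM_TRACES][SAMPLES_PER_DAY];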

I suppose one of the nicest things about large static arrays is that you never need to worry about uninitialized pointers, or about memory leaks when lots of 'new' or 'malloc()' calls aren't matched with the same number of 'delete' or 'free()' calls.

It's my, perhaps biased, opinion that new(), malloc(), and free() on PCs are sad remnants of a time when PC memory was about as limited as that on a mid-size ARM chip. Today, you might as well simply specify that the PC app reserve 8MB for its stack and keep most of your temporary variables as automatic variables on the stack.

I admit that my reluctance to use a lot of malloc()s in my code is because I write most of my code for embedded systems where I control all the memory in the system and generally don't have to compete with an OS for the limited resources. If I use malloc() it is to allocate a few large arrays at startup. I would declare the arrays as static data, but some of the systems for which I generate code include those static arrays as part of the load image. I find that same approach works well on the PC also, although I usually don't end with a hardware reset!

Mark Borgerson

Reply to
Mark Borgerson

To support your efficient memory management, you limit symbol size to something significantly less than 64 characters? What about debug information? What about tags support, requiring all symbol references to be stored? What about all the other memory a compiler needs? Function and data structure definitions, call trees for register tracking, optimisation structures, etc., etc. - are you going to allocate 64 (or 640) MB buffers for all of these too?

Yes, you could require the compiler (and the programmer writing it) to jump through hoops to make up for deficiencies in the memory management system you are suggesting,

Reply to
nospam

Sure - why not? I see no particular problem with 10 separate 64MByte buffers, one for each function.

If limiting symbol names to 64 bytes is a problem, just double the size of that buffer and limit symbols to 128 bytes. Or you could just allocate room for 512,000 symbols of 256 bytes each - similar to the decorated symbol names used in VC++.

I don't see those deficiencies being any more annoying than the ones inherent in malloc() and free(), but then I don't write compilers; I write application programs for embedded systems, where the problems with malloc() are more significant.

Mark Borgerson

Reply to
Mark Borgerson

That's certainly possible. The reason lexers often do tokenization a line at a time is to avoid the overhead of calling the lexer once per token. A lot of things are pushed into the lexer to avoid the overhead of doing things multiple times.

Yes. This was after a lot of optimization and fine tuning after profiling.

All compilers I know of use handwritten lexers and parsers. A generator is way too slow. Lexing is easier to do by hand anyway, and C++ is way too complex to be handled by parser generators.

Wilco

Reply to
Wilco Dijkstra
