OT:C/C++ Opinion Poll

Like your first claim of STL's string being 40x (or was it 50x?) slower than a C string?

I don't know enough about this subject to comment on your claim, but one thing about languages with GC is that a separate thread can do the GC work while the CPU is idle (even better with multiple cores, where a parallel GC can be used). With manual deallocation, the work happens at the moment of the call (well, nitpicking, free() could put the pointer on a queue that a deallocation thread would drain while the rest of the system is idle...). It would be interesting to see a benchmark on this subject.

Maybe the most significant point of having GC is that it enables even incompetent programmers to program without memory leaks... ;) (Whether this is a sad thing or a happy thing depends...)

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen

It turned out to be ~30 times on modern compilers/CPUs.

You got the idea exactly. One could process the free list and merge adjacent free blocks when absolutely required or in a separate thread (various mallocs do the former, often called lazy merging). The difference with GC is that a collector first needs to scan all memory to create the free list - and that is the time consuming part.

That's another GC myth. You do have memory leaks with GC if you forget to set pointers to unused objects to null. So you do have to "free" memory by clearing it.
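A minimal sketch of the kind of leak meant here, assuming a hypothetical fixed-capacity stack (the class and method names are invented for illustration). The popped object is never used again, but the array slot still refers to it, so a collector has to keep it alive until the slot is cleared:

```java
// Hypothetical illustration: a stack whose pop can leave a stale reference behind.
class LeakyStack {
    final Object[] slots = new Object[16];
    int size = 0;

    void push(Object o) { slots[size++] = o; }

    // Returns the top element but leaves the old reference in the array,
    // so the collector must treat the popped object as still reachable.
    Object popLeaky() { return slots[--size]; }

    // Same operation, but the slot is cleared: the "free" of a GC'd language.
    Object popCleared() {
        Object top = slots[--size];
        slots[size] = null;
        return top;
    }
}

class LeakyStackDemo {
    public static void main(String[] args) {
        LeakyStack s = new LeakyStack();
        s.push(new byte[1024]);
        s.popLeaky();
        System.out.println("stale reference after popLeaky: " + (s.slots[0] != null));
        s.push(new byte[1024]);
        s.popCleared();
        System.out.println("stale reference after popCleared: " + (s.slots[0] != null));
    }
}
```

The first line printed reports true and the second false: only the cleared slot lets the buffer become garbage.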

Any particular language or safety feature can never turn a bad programmer into a good programmer.

Wilco

Wilco Dijkstra

... snip ...

Which can leave copies of that pointer.

I.e. programmer action is always needed. This means that the only advantage to GC is that you set the pointer to NULL, rather than passing it to free(). The call to free will normally detect that the pointer is fouled, the assignment will not.

So this leaves the timing of the actual action. With GC you don't know just when that action occurs, or how long it will take. With free it will occur when you call free. The duration depends on the malloc package, and can be O(1) (but normally is not). For an O(1) system, compiled for DJGPP (but fairly portable) see nmalloc.zip, on my download section.

Use of threads (in GC) just tries to hide the timing problems, and makes them less amenable to correction.

--
 Chuck F (cbfalconer at maineline dot net)
   
CBFalconer

While this is true, it is something of an overreaction. Your statement may well be misunderstood by other readers.

For instance, in Java it is common to do something like this:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.security.cert.CertificateException;
    import java.security.cert.CertificateFactory;
    import java.security.cert.X509Certificate;

    public static X509Certificate parseCertificateFromFile(String fileName)
            throws CertificateException, IOException {
        FileInputStream fileInputStream = null;
        try {
            fileInputStream = new FileInputStream(fileName);
            CertificateFactory cf = CertificateFactory.getInstance("X.509");
            return (X509Certificate) cf.generateCertificate(fileInputStream);
        } finally {
            // Let's do a graceful exit and close resources explicitly,
            // and not rely on the finalizer to do the cleanup.
            // (In this example it is not a real problem, but it is when
            // people don't explicitly close database connections, for
            // example, which are a limited resource.)
            if (fileInputStream != null) {
                fileInputStream.close();
            }
        }
        // Now fileInputStream (and cf) become garbage, without
        // explicitly setting them to null.
    }

Agreed.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen

Absolutely. To be really sure you have no memory leaks you have to clear ALL pointers to a particular object. And that is a lot harder than just calling free when you know an object is dead.

Yes. And if you accidentally keep using a freed object, you often notice quickly when the memory is reused (debug libraries often clear freed memory so that you notice immediately). With GC you can access dead objects without ever noticing it - they never get collected.

I thought all but the most trivial mallocs support O(1) free. With doubly linked lists you can quickly merge adjacent free blocks. The space overhead for this can be as low as 4 bytes per allocated object. The one time I made free O(log N) was to support O(log N) allocation for a real-time heap (malloc has O(N) worst case).

Wilco

Wilco Dijkstra

... snip ...

All operations in nmalloc are O(1). I developed this when I found that every system on my machines had O(N) free, which caused ridiculous delays when freeing a few thousand items. This included at least two Microsoft libraries and one GNU (well, DJGPP) one.

Not many programs are bothered by O(N) free. But when they are, it is very obvious.

--
 Chuck F (cbfalconer at maineline dot net)
   
CBFalconer

It's always a good idea to clear pointers, especially if you use a conservative collector. It is less important on local variables with short lifetimes in a GC'd language as in your example, but you may later add more code and thus extend the lifetime of some objects. Of course a good compiler could mark dead pointers in its GC tables or add the nullptr assignment by itself, but I'm not sure whether this is ever done.

Wilco

Wilco Dijkstra

The word "only" is a bit too strong. Maybe it seems that way with Java or similar things. But the original point of GC was to first remove the memory allocation/deallocation abstraction entirely in higher level languages. There is not even any assignment to null needed to cause something to be collected, just stop referring to the object (assign something else to the variable, return from the routine that was referring to it, etc). Languages like Smalltalk and Lisp don't even have pointers, and some functional languages with GC don't even have assignment...

If programmer action is required, then either the GC is broken (i.e., another attempt to use reference counting) or the language is not well suited for GC. If an object is unused, then a good GC will be able to collect it.

Second, GC can give a performance boost. Most good garbage collectors need to move memory, and they do this in a way that also happens to improve locality of reference. That is, caching and swapping are more efficient. With generational scavengers there can be less CPU time spent doing GC over the life of a program than would be spent in explicit alloc/free. Fragmentation is eliminated as well.

The snags though are that the run time system for a good GC system requires virtual memory support, and of course a language with a high level abstraction. C++ is a low level language, so GC doesn't really fit (though some people have tried to retrofit GC which seems a bit misguided I think).

-- Darin Johnson

Darin Johnson

... snip ...

However, as discussed in this thread, the object language was C (or possibly C++). In either case the loose bandying of pointers seems to make any GC system pretty pointless. However in a better structured language, such as Pascal or Ada, the system might well be useful for more than relieving the Poor Programmer of the dire memory control pressures. :-)

--
 Chuck F (cbfalconer at maineline dot net)
   
CBFalconer

Removing references is what was meant. This is done differently in various languages. In imperative languages you'd need to assign NULL to an object reference in order to remove it, in Lisp you'd need to unlink a cell from a list etc.

A GC can only collect objects that are not referenced, not objects that are referenced but never used again. In order to collect these objects you have to explicitly remove references to them. Once we're talking about complicated data structures with many references, you effectively have to do all collection manually.

In other words, a GC can only automatically collect a subset of the calls to free. Even replacing all calls to free with a null pointer assignment doesn't make a GC as powerful as malloc/free...

A moving generational GC is indeed a lot better than the mark&sweep style (which effectively uses malloc) but has higher runtime overhead. For example, every pointer assignment has to update the state of the object, and compiler optimization is often severely constrained. GCs typically have a higher per-object overhead than malloc, and need more memory to allow for less frequent collections, so I'd find it hard to believe you get significant locality benefits.

Agreed. If you do use GC, you need to design the whole system (language, runtime, OS integration) around it, or it'll be far from optimal.

Wilco

Wilco Dijkstra

Or nothing if the reference goes out of scope or when the object holding the reference is garbage collected.

See above. That being said, one of the disadvantages of GC is that it appears to make memory management deceptively simple ("you don't have to worry about it, the system will take care of it"), which is not always true. GC assures that an object will not be prematurely released, but doesn't assure that an object is released when it is no longer needed. Even with GC the programmer does need to concern him/herself with lifetime management (constructs like weak references were added to Java for a reason).
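As a rough sketch of what a weak reference does in Java, using the standard java.lang.ref.WeakReference class (the demo class name is made up for illustration): a weak reference does not by itself keep its referent alive, so a cache or registry holding only weak references cannot cause the kind of leak discussed above.

```java
import java.lang.ref.WeakReference;

class WeakRefDemo {
    public static void main(String[] args) {
        Object payload = new Object();                       // strong reference
        WeakReference<Object> weak = new WeakReference<>(payload);

        // While the strong reference is in use, the referent cannot be collected.
        System.out.println("intact while strongly held: " + (weak.get() == payload));

        payload = null;   // drop the only strong reference
        System.gc();      // only a hint; the JVM may or may not collect now

        // After a collection get() returns null; when that happens is up to the JVM.
        System.out.println("still present after gc hint: " + (weak.get() != null));
    }
}
```

The second result is deliberately not predicted here: it depends on when the collector actually runs, which is exactly the nondeterminism being discussed.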

The problem with garbage collection in most programming is that it only concerns itself with memory, while there are a lot of other resources whose lifetimes need to be managed as well. C++ programmers who have moved to a language with GC know what I'm talking about.

OTOH the process of allocating and deallocating with a moving generational GC can be substantially simpler. As often with performance, sweeping statements like "A is faster than B" are meaningless without a precisely defined context to which the statement applies.

Dombo

The big advantage of GC, IMO, has nothing to do with performance, but is mostly to do with an ability to make nicer library designs. To put that backwards: the big problem with the malloc/free model is a design one: which software subsystem should "own" the allocated objects, and therefore have responsibility for cleaning them up. GC makes it significantly easier to have library functions that operate on and produce collections, which in turn makes for better and higher level designs. Yes, it's *possible* to do all of that in C and C++, but it's real work, over and above the work of getting the main algorithm right.

These days I only code in C for things where I don't have enough control of the target environment to arrange for the presence of a garbage collector. Which is to say, I code in C a *lot*, but I do as much as I can in Java, Scheme, Python and their ilk, too.

Cheers,

Andrew

Andrew Reilly

Which, in a non-GC system is a recipe for dangling-pointer bugs. "You" (i.e., the subsystem that you're coding right now) might know that an object is dead, but you've forgotten that some other subsystem still has that pointer sitting in a hash-table cache...

Any object that you can access is, by definition, not dead :-) If it's dead so that you can't access it, then it'll get garbage collected, eventually.

Cheers,

Andrew

Andrew Reilly

Well, I have to concede that my opinions have been limited to the C or C++ use of GC. I have no objection to its use in other more suitable languages.

--
 Chuck F (cbfalconer at maineline dot net)
   
CBFalconer

Wilco Dijkstra wrote

This is frequently brought up in discussions about manual vs. automatic memory management, but I've never quite understood it. An object that is referenced but will never be used again needs to live in a global scope (truly global, global to a compilation unit or something similar).

Suppose there is a point in the execution of the program where you know that the value will never be used again. Now in a malloc/free approach, you would use free to deallocate the memory, and in a garbage collected system you would set the variable to null and let GC take care of the deallocation. In both cases explicit programmer action is needed for the storage to be reclaimed.

The real reason why I've never really bought this is that if you have variables in a program that will never be used again, then it implies that the program has some sort of phase structure. If you design the program to reflect this structure, then variables that will never be used again will go out of scope at phase transition, and GC will deal with them with no extra programmer action.

To me the real power of GC is that one does not need to deal with transitive reference chains. Just drop what you don't need, and if it implies further deallocation the machine will do it for you.
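A toy sketch of the reachability idea, not how any real collector is implemented: objects are modelled as hypothetical HeapNode instances, and only the mark phase is shown. Dropping the one root reference makes the whole chain behind it unreachable in a single step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical heap object: a name plus outgoing references.
class HeapNode {
    final String name;
    final List<HeapNode> refs = new ArrayList<>();
    HeapNode(String name) { this.name = name; }
}

class ToyMark {
    // Mark phase only: returns every node reachable from the given roots.
    static Set<HeapNode> reachable(List<HeapNode> roots) {
        Set<HeapNode> marked = new HashSet<>();
        Deque<HeapNode> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            HeapNode n = work.pop();
            if (marked.add(n)) {          // first visit: follow its references
                work.addAll(n.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        HeapNode a = new HeapNode("a"), b = new HeapNode("b"), c = new HeapNode("c");
        a.refs.add(b);                    // a -> b -> c
        b.refs.add(c);
        List<HeapNode> roots = new ArrayList<>();
        roots.add(a);
        System.out.println("live with root a: " + reachable(roots).size());
        roots.clear();                    // drop the single root reference
        System.out.println("live after dropping a: " + reachable(roots).size());
    }
}
```

With the root in place all three nodes are live; after the root is dropped none are, without any per-object action by the programmer.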

--
Pertti
Pertti Kellomäki

I've had long discussions about this with a colleague who is *very* knowledgeable about C++. What it seems to boil down to is that one should be able to separate memory allocation from resource lifetime management. Trying to use GC for lifetime management using some kind of finalizers is not satisfactory because of the inherent nondeterminism of GC. On the other hand, using mechanisms such as destructors of local variables in C++ for deterministic lifetime management of resources other than memory makes it hard to retrofit GC for plain memory.
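For comparison, a minimal sketch of scope-based cleanup in a GC'd language, assuming Java 7 or later (try-with-resources and AutoCloseable). The TrackedResource class is hypothetical and only records when it is opened and closed; memory is still left to the collector, while the non-memory resource is released at a known point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that records when it is acquired and released.
class TrackedResource implements AutoCloseable {
    static final List<String> log = new ArrayList<>();
    private final String name;

    TrackedResource(String name) {
        this.name = name;
        log.add("open " + name);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

class ScopedCleanupDemo {
    static void useResources() {
        try (TrackedResource db = new TrackedResource("db");
             TrackedResource file = new TrackedResource("file")) {
            TrackedResource.log.add("work");
        } // both resources are closed here, in reverse order of opening
    }

    public static void main(String[] args) {
        useResources();
        System.out.println(TrackedResource.log);
        // prints [open db, open file, work, close file, close db]
    }
}
```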

--
Pertti
Pertti Kellomäki

Andrew Reilly wrote

My background is in Lisp and functional programming, so I always look in amazement when C++ programmers agonize over "ownership". Since arranging for proper deallocation of shared objects can be quite complicated, programmers seem to resort to copying to be on the safe side. In a GC environment handing a pointer to some other piece of code is no big deal, so sharing is probably much more common under a GC regime.

One place where the problems become quite obvious is when one tries to interface pieces of code that use different approaches to memory management. I have worked with Boost::Python, which allows C++ libraries to be used from within Python, and one of the things that the interface code needs to know about is ownership. There is no way to deduce it from the declarations of C++ classes and members. In other words, this crucial piece of information is well hidden inside the implementation, not advertised in the interface as one would hope.

Lisp programmers have a somewhat arrogant saying that every sufficiently complex piece of software will contain a badly implemented subset of Common Lisp. There is some truth to it, though, as evidenced by the many half-way solutions to memory management like auto_ptr, shared_ptr etc. All IMHO of course.

--
Pertti
Pertti Kellomäki

There are also big problems when working with complex data structures where portions are shared between structures. You can do a lot with reference counters but it quickly ends up a long-winded PITA. Then you have the problem of "is this substructure shared or is it mutable?" which can get difficult to diagnose. With GC if you want to reference a substructure you simply link to it with a pointer. If you want to alter it you can simply copy it without the risk of the original being left orphaned. The garbage collector sorts it all out for you. Often the ultimate performance is not an issue: you want something that works, is coded in finite time, and is maintainable.

If I don't need low-level I/O or system facilities then I tend to do a lot of work in ML. I find it a lot more productive than C, where it seems that you spend more time worrying over memory than anything else. Even if you need to recode in C or another language for performance or portability, having a rapidly developed working prototype is always useful, and I rarely find recoding is necessary in practice. Of course, when you need things like ioctls or whatever, while they are available, using ML with them becomes more of a hindrance than a benefit.

--
Andrew Smallshaw
andrews@sdf.lonestar.org
Andrew Smallshaw

I think this is the important thing. In procedural languages like C/C++ where GC is grafted on after the fact, you need to deal with explicitly removing references. In languages with GC designed in from day one, it's almost never necessary. It happens with scoping, or when the variable is reassigned.

Consider it this way. When writing in assembly language, the programmers often have to deal with explicit allocation and deallocation of a stack. When writing in C/C++ it is extremely rare for the programmer to need any explicit operations regarding the stack. The stack is "freed" when the variables go out of scope. That's the same sort of feel that GC gives in a language where it's built in and natural; the programmer rarely needs to consider it.

True. The same is true perhaps for automatic stack management. The novice programmers don't think about overflowing stacks or putting huge buffers there, whereas the experienced programmers know there's important stuff happening underneath.

Weak references solve a problem that is a level up from the normal "remember to free objects" problem. They help deal with what in C++ might cause dangling references. Both are memory problems in C++, but the first case is about not freeing an object that should be freed, and the second is about freeing an object that should not be freed (yet).

There are lots of uses, but many of them are to make some of these lifetime management "clean up" issues as easy as garbage collection issues. I.e., a lot of stuff that might go into a C++ destructor can be done away with by using weak references (in a cleaner way than finalization functions).

True. GC removes a low-level set of management details that the programmer shouldn't have to worry about. But there are still a lot of high-level issues left behind. The programmer soon learns that it's not a magic bullet.

True. I don't know of any research comparing generational GC with explicit alloc/free. But there is old research showing improved locality and performance of generational GC versus mark and sweep. I think GC got a bad rep in the past because a lot of people were exposed to the "go get some coffee while waiting" style of GC in college and built up a notion that GC in general is slow and cumbersome.

-- Darin Johnson

Darin Johnson

A big snag for traditionally compiled C/C++ with GC is that it is very difficult for the runtime system to look at memory and know what type of object is there. An integer can look just like a pointer. You can guess of course, which is essentially what these "conservative garbage collectors" do, and they do a good job of guessing. But presumably these are restricted to mark-and-sweep garbage collection.
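A toy sketch of the guessing involved, with made-up addresses and a hypothetical Block type rather than any real collector's data structures. Every word is treated as a possible pointer, so an integer that happens to fall inside a heap block keeps that block alive:

```java
import java.util.ArrayList;
import java.util.List;

class ConservativeScan {
    // Hypothetical heap block described by its address range [start, start + size).
    static final class Block {
        final long start, size;
        Block(long start, long size) { this.start = start; this.size = size; }
        boolean contains(long word) { return word >= start && word < start + size; }
    }

    // Treat every word as a potential pointer: a block is kept if any word
    // falls inside it, whether that word is a real pointer or just an integer.
    static List<Block> retained(long[] words, List<Block> heap) {
        List<Block> live = new ArrayList<>();
        for (Block b : heap) {
            for (long w : words) {
                if (b.contains(w)) { live.add(b); break; }
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<Block> heap = new ArrayList<>();
        heap.add(new Block(0x1000, 64));
        heap.add(new Block(0x2000, 64));
        heap.add(new Block(0x3000, 64));
        // 0x1000 is a genuine pointer; 0x2010 is an integer that looks like one.
        long[] stack = { 0x1000, 0x2010, 42 };
        System.out.println("blocks retained: " + retained(stack, heap).size());
        // prints "blocks retained: 2"
    }
}
```

Since the scan cannot tell which words are real pointers, it also cannot safely rewrite them, which is one reason moving an object is a problem for this style of collector.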

-- Darin Johnson

Darin Johnson
