Benchmark: STL's list vs. hand-coded one

Hello everyone, in the compiler thread someone (was it Dijkstra?) claimed that M$ STL implementation of the doubly linked list was 50x slower than a hand-coded implementation. Now, I get the metrics that they are equally fast.

OS: Ubuntu 7.10 CPU: Athlon XP 2600+ compiler: gcc 4.1.3 opts: -O3 -march=athlon-xp

Results:

Custom impl. took 4.000000 seconds of CPU time. STL impl. took 4.000000 seconds of CPU time.

(I couldn't find a method of more accurate timing in glibc)

Here's the source:

#include #include #include #include #include

#define N 4000000

struct Node { unsigned int data; struct Node *prev, *next; };

struct List { struct Node *head, *tail; };

/** * Allocates a new doubly linked list. */ static List *list_New() { List *list = (struct List *)malloc(sizeof(struct List)); if (list == NULL) perror("malloc() failed"); list->head = list->tail = NULL; return list; }

/** * Frees a node. */ static void node_Delete(struct Node *node) { free(node); }

/** * Frees resources allocated to a list */ static void list_Delete(struct List *list) { assert(list != NULL); struct Node *node = list->head; while (node != NULL) { struct Node *current = node; node_Delete(current); node = node->next; } free(list); }

static struct Node *list_GetHead(const struct List *list) { assert(list != NULL); return list->head; }

static struct Node *node_GetNext(const struct Node *node) { assert(node != NULL); return node->next; }

/** * Allocates a new node */ static struct Node *node_New() { struct Node *node = (struct Node *)malloc(sizeof(struct Node)); if (node == NULL) perror("malloc() failed."); node->prev = node->next = NULL; return node; }

/** * "Accessor" */ static void node_SetData(struct Node *node, int data) { assert(node != NULL); node->data = data; }

/** * "Accessor". */ static int node_GetData(const struct Node *node) { assert(node != NULL); return node->data; }

/** * Adds a node to the end of the list. */ static void list_Add(struct List *list, struct Node *node) { assert(list != NULL); assert(node != NULL); assert(node->prev == NULL && node->next == NULL); node->prev = list->tail; if (list->head == NULL) list->head = node; if (list->tail != NULL) list->tail->next = node; list->tail = node; }

int main() { time_t t1, t2; t1 = time(NULL); struct List *list = list_New(); int sum1 = 0; for (unsigned int i = 0; i < N; i++) { struct Node *node = node_New(); node_SetData(node, i); list_Add(list, node); } struct Node *node = list_GetHead(list); for (int i = 0; i < N; i++) { sum1 += node_GetData(node); node = node_GetNext(node); } list_Delete(list); t2 = time(NULL); double dt = t2 - t1; printf("Custom impl. took %f seconds of CPU time.\n", dt); t1 = time(NULL); std::list *stlList = new std::list; int sum2 = 0; for (unsigned int i = 0; i < N; i++) { stlList->push_back(i); } std::list::const_iterator i1 = stlList->begin(); std::list::const_iterator i2 = stlList->end();

while (i1 != i2) { sum2 += *i1; i1++; } delete stlList; t2 = time(NULL); dt = t2 - t1; printf("STL impl. took %f seconds of CPU time.\n", dt);

// do prevent optimizer to remove all code as dead return sum1 + sum2; }

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen
Loading thread data ...

On most platforms the clock() function in gives more accurate timing information than the time() function.

Reply to
Patrick de Zeester

On Linux, it returns 0 for some reason, perhaps it is not implemented.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

On the version of Linux you are running that is, on the version of Linux I use the clock() function works just fine.

Reply to
Patrick de Zeester

What distribution, kernel and glibc versions are you using?

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

I checked it on a very old Mandrake distro: Linux version 2.6.8.1-12mdk ( snipped-for-privacy@n5.mandrakesoft.com) (gcc version

3.4.1 (Mandrakelinux (Alpha 3.4.1-3mdk)) #1 Fri Oct 1 12:53:41 CEST 2004 glibc-2.3.3-23.1.101mdk
Reply to
Patrick de Zeester

Ok, gettimeofday() did the trick. Now the times are for 4000000 nodes:

Custom impl. took 0.946796 seconds of CPU time. STL impl. took 0.972560 seconds of CPU time.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

I didn't claim that, nor did anyone else for that matter. I was talking about STL strings and in particular string concatenation.

Anyway, there are some obvious flaws here. For starters few would use a linked list if they had to store such a large number of integers. Applications that do use large linked lists typically do something very different. Your implementation uses 24 bytes for just one node (16 bytes if the heap is 4-byte aligned) compared to just over 4 for a specialized implementation. But worst of all, do you have any idea where over 99% of the time is spent? Let me give you a hint - it is NOT the linked list code...

Wilco

Reply to
Wilco Dijkstra

If this is the case, of course this micro-benchmark is irrelevant. I have to read your post again (if I can find it still..), and perhaps do another benchmark. Easier would be if you could describe the problem with STL string concatenation again?

Well - int was just chosen, it could be anything else. I'll do the same test with a struct that contains a larger amount of data.

What kind of an implementation is this with 4 bytes for a node?

code...

I didn't profile, but very likely it is the memory allocation.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

The problem is as follows: you have 4 small character strings (char *), and create a temporary string by concatenating them with spaces between each string.

Implementing this in the obvious way gives a factor of 50 times in VC++ between strcat/strcpy vs STL.

It wouldn't change the result much, you still spend most of the time in malloc/free. What matters is not the size of the data but the usage pattern - since the benchmark doesn't do anything but create/destroy a long list, it only measures the (high) overhead of memory allocation.

A linked list of arrays.

code...

Correct. This is why big applications use custom allocators that are much faster than malloc/free - think 2 orders of magnitude.

Wilco

Reply to
Wilco Dijkstra

Wilco Dijkstra wrote:

Ok, I'll do a benchmark according to this, when I have some time.

And iterates over the list, that is one point also.

I didn't quite understand. Do you mean something like this:

struct Node { struct Data array[MAX]; struct Node next; };

Ok, profiled it.

g++-4.1.3, -O3 -march=athlon-xp -pg -fno-inline as flags, results:

inlining seems to be quite important for having good STL performance.

(by the way, STL has several memory allocation schemes:

formatting link

gprof output (sorry about the long lines):

Flat profile:

Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 17.01 0.18 0.18 main 16.52 0.35 0.17 1 170.19 300.34 std::_List_base::_M_clear() 7.78 0.43 0.08 16000000 0.00 0.00 std::_List_base::_M_get_Tp_allocator() const 5.83 0.49 0.06 8000001 0.00 0.00 std::list::end() 4.86 0.54 0.05 8000000 0.00 0.00 std::list::_M_create_node(unsigned long long const&) 4.86 0.59 0.05 8000000 0.00 0.00 node_New() 3.89 0.63 0.04 8000000 0.00 0.00 std::_List_const_iterator::operator++(int) 2.92 0.66 0.03 8000000 0.00 0.00 node_GetData(Node const*) 2.92 0.69 0.03 8000000 0.00 0.00 list_Add(List*, Node*) 2.92 0.72 0.03 8000000 0.00 0.00 __gnu_cxx::new_allocator::deallocate(std::_List_node*, unsigned int) 2.92 0.75 0.03 8000000 0.00 0.00 __gnu_cxx::new_allocator::allocate(unsigned int, void const*) 2.92 0.78 0.03 8000000 0.00 0.00 __gnu_cxx::new_allocator::construct(unsigned long long*, unsigned long long const&) 2.92 0.81 0.03 8000000 0.00 0.00 __gnu_cxx::new_allocator::max_size() const 2.92 0.84 0.03 8000000 0.00 0.00 std::list::_M_insert(std::_List_iterator, unsigned long long const&) 2.43 0.86 0.03 8000000 0.00 0.00 operator new(unsigned int, void*) 1.94 0.88 0.02 16000000 0.00 0.00 std::allocator::allocator(std::allocator const&) 1.94 0.90 0.02 8000000 0.00 0.00 node_SetData(Node*, int) 1.94 0.92 0.02 8000000 0.00 0.00 __gnu_cxx::new_allocator::destroy(unsigned long long*) 1.94 0.94 0.02 8000000 0.00 0.00 std::_List_base::_M_put_node(std::_List_node*) 1.94 0.96 0.02 8000000 0.00 0.00 std::list::push_back(unsigned long long const&) 0.97 0.97 0.01 16000001 0.00 0.00 __gnu_cxx::new_allocator::new_allocator() 0.97 0.98 0.01 16000001 0.00 0.00 __gnu_cxx::new_allocator::~new_allocator() 0.97 0.99 0.01 8000001 0.00 0.00 std::_List_const_iterator::operator!=(std::_List_const_iterator const&) const 0.97 1.00 0.01 8000000 0.00 0.00 std::_List_const_iterator::operator*() const 0.97 1.01 0.01 2 5.01 5.01 time_Subtract(timeval*, timeval*) 0.97 1.02 0.01 2 5.01 5.01 std::_List_const_iterator::_List_const_iterator(std::_List_iterator const&) 0.97 1.03 0.01 1 10.01 10.01 std::allocator::allocator() 0.00 1.03 0.00 16000001 0.00 0.00 std::allocator::~allocator() 0.00 1.03 0.00 8000002 0.00 0.00 std::_List_iterator::_List_iterator(std::_List_node_base*) 0.00 1.03 0.00 8000000 0.00 0.00 node_Delete(Node*) 0.00 1.03 0.00 8000000 0.00 0.00 node_GetNext(Node const*) 0.00 1.03 0.00 8000000 0.00 0.00 std::_List_base::_M_get_node() 0.00 1.03 0.00 2 0.00 0.00 __gnu_cxx::new_allocator::~new_allocator() 0.00 1.03 0.00 1 0.00 0.00 list_Delete(List*) 0.00 1.03 0.00 1 0.00 0.00 list_GetHead(List const*) 0.00 1.03 0.00 1 0.00 0.00 list_New() 0.00 1.03 0.00 1 0.00 0.00 __gnu_cxx::new_allocator::new_allocator(__gnu_cxx::new_allocator const&) 0.00 1.03 0.00 1 0.00 0.00 __gnu_cxx::new_allocator::new_allocator() 0.00 1.03 0.00 1 0.00 0.00 std::allocator::allocator(std::allocator const&) 0.00 1.03 0.00 1 0.00 0.00 std::allocator::allocator(std::allocator const&) 0.00 1.03 0.00 1 0.00 0.00 std::allocator::~allocator() 0.00 1.03 0.00 1 0.00 0.00 std::allocator::~allocator() 0.00 1.03 0.00 1 0.00 0.00 std::_List_base::_List_impl::_List_impl(std::allocator const&) 0.00 1.03 0.00 1 0.00 0.00 std::_List_base::_List_impl::~_List_impl() 0.00 1.03 0.00 1 0.00 0.00 std::_List_base::_M_init() 0.00 1.03 0.00 1 0.00 0.00 std::_List_base::_List_base(std::allocator const&) 0.00 1.03 0.00 1 0.00 300.34 std::_List_base::~_List_base() 0.00 1.03 0.00 1 0.00 0.00 std::list::begin() 0.00 1.03 0.00 1 0.00 0.00 std::list::list(std::allocator const&) 0.00 1.03 0.00 1 0.00 0.00 std::list::~list()

% the percentage of the total running time of the time program used by this function.

cumulative a running sum of the number of seconds accounted seconds for by this function and those listed above it.

self the number of seconds accounted for by this seconds function alone. This is the major sort for this listing.

calls the number of times this function was invoked, if this function is profiled, else blank. self the average number of milliseconds spent in this ms/call function per call, if this function is profiled, else blank.

total the average number of milliseconds spent in this ms/call function and its descendents per call, if this function is profiled, else blank.

name the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parenthesis it shows where it would appear in the gprof listing if it were to be printed. Call graph (explanation follows)

granularity: each sample hit covers 2 byte(s) for 0.97% of 1.03 seconds

index % time self children called name [1] 100.0 0.18 0.86 main [1] 0.02 0.32 8000000/8000000 std::list::push_back(unsigned long long const&) [2] 0.00 0.30 1/1 std::_List_base::~_List_base() [4] 0.05 0.00 8000000/8000000 node_New() [13] 0.04 0.00 8000000/8000000 std::_List_const_iterator::operator++(int) [14] 0.03 0.00 8000000/8000000 list_Add(List*, Node*) [16] 0.03 0.00 8000000/8000000 node_GetData(Node const*) [15] 0.02 0.00 8000000/8000000 node_SetData(Node*, int) [21] 0.01 0.00 8000001/8000001 std::_List_const_iterator::operator!=(std::_List_const_iterator const&) const [25] 0.01 0.00 8000000/8000000 std::_List_const_iterator::operator*() const [26] 0.01 0.00 2/2 time_Subtract(timeval*, timeval*) [27] 0.01 0.00 2/2 std::_List_const_iterator::_List_const_iterator(std::_List_iterator const&) [28] 0.01 0.00 1/1 std::allocator::allocator() [29] 0.00 0.00 1/8000001 std::list::end() [8] 0.00 0.00 1/16000001 __gnu_cxx::new_allocator::new_allocator() [23] 0.00 0.00 1/16000001 __gnu_cxx::new_allocator::~new_allocator() [24] 0.00 0.00 8000000/8000000 node_GetNext(Node const*) [38] 0.00 0.00 1/1 list_New() [42] 0.00 0.00 1/1 list_GetHead(List const*) [41] 0.00 0.00 1/1 list_Delete(List*) [40] 0.00 0.00 1/1 std::_List_base::_List_base(std::allocator const&) [52] 0.00 0.00 1/1 std::list::list(std::allocator const&) [54] 0.00 0.00 1/16000001 std::allocator::~allocator() [35] 0.00 0.00 1/1 std::list::begin() [53] 0.00 0.00 1/2 __gnu_cxx::new_allocator::~new_allocator() [39] 0.00 0.00 1/1 std::allocator::~allocator() [48] 0.00 0.00 1/1 std::_List_base::_List_impl::~_List_impl() [50] 0.00 0.00 1/1 std::list::~list() [55]

----------------------------------------------- 0.02 0.32 8000000/8000000 main [1] [2] 32.5 0.02 0.32 8000000 std::list::push_back(unsigned long long const&) [2] 0.03 0.23 8000000/8000000 std::list::_M_insert(std::_List_iterator, unsigned long long const&) [5] 0.06 0.00 8000000/8000001 std::list::end() [8]

----------------------------------------------- 0.17 0.13 1/1 std::_List_base::~_List_base() [4] [3] 29.1 0.17 0.13 1 std::_List_base::_M_clear() [3] 0.04 0.02 8000000/16000000 std::_List_base::_M_get_Tp_allocator() const [7] 0.02 0.03 8000000/8000000 std::_List_base::_M_put_node(std::_List_node*) [12] 0.02 0.00 8000000/8000000 __gnu_cxx::new_allocator::destroy(unsigned long long*) [22] 0.01 0.00 8000000/16000001 __gnu_cxx::new_allocator::~new_allocator() [24] 0.00 0.00 8000000/16000001 std::allocator::~allocator() [35]

----------------------------------------------- 0.00 0.30 1/1 main [1] [4] 29.1 0.00 0.30 1 std::_List_base::~_List_base() [4] 0.17 0.13 1/1 std::_List_base::_M_clear() [3]

----------------------------------------------- 0.03 0.23 8000000/8000000 std::list::push_back(unsigned long long const&) [2] [5] 24.8 0.03 0.23 8000000 std::list::_M_insert(std::_List_iterator, unsigned long long const&) [5] 0.05 0.18 8000000/8000000 std::list::_M_create_node(unsigned long long const&) [6]

----------------------------------------------- 0.05 0.18 8000000/8000000 std::list::_M_insert(std::_List_iterator, unsigned long long const&) [5] [6] 21.8 0.05 0.18 8000000 std::list::_M_create_node(unsigned long long const&) [6] 0.00 0.06 8000000/8000000 std::_List_base::_M_get_node() [10] 0.03 0.03 8000000/8000000 __gnu_cxx::new_allocator::construct(unsigned long long*, unsigned long long const&) [11] 0.04 0.02 8000000/16000000 std::_List_base::_M_get_Tp_allocator() const [7] 0.01 0.00 8000000/16000001 __gnu_cxx::new_allocator::~new_allocator() [24] 0.00 0.00 8000000/16000001 std::allocator::~allocator() [35]

----------------------------------------------- 0.04 0.02 8000000/16000000 std::_List_base::_M_clear() [3] 0.04 0.02 8000000/16000000 std::list::_M_create_node(unsigned long long const&) [6] [7] 10.7 0.08 0.03 16000000 std::_List_base::_M_get_Tp_allocator() const [7] 0.02 0.00 16000000/16000000 std::allocator::allocator(std::allocator const&) [20] 0.01 0.00 16000000/16000001 __gnu_cxx::new_allocator::new_allocator() [23]

----------------------------------------------- 0.00 0.00 1/8000001 main [1] 0.06 0.00 8000000/8000001 std::list::push_back(unsigned long long const&) [2] [8] 5.8 0.06 0.00 8000001 std::list::end() [8] 0.00 0.00 8000001/8000002 std::_List_iterator::_List_iterator(std::_List_node_base*) [36]

----------------------------------------------- 0.03 0.03 8000000/8000000 std::_List_base::_M_get_node() [10] [9] 5.8 0.03 0.03 8000000 __gnu_cxx::new_allocator::allocate(unsigned int, void const*) [9] 0.03 0.00 8000000/8000000 __gnu_cxx::new_allocator::max_size() const [18]

----------------------------------------------- 0.00 0.06 8000000/8000000 std::list::_M_create_node(unsigned long long const&) [6] [10] 5.8 0.00 0.06 8000000 std::_List_base::_M_get_node() [10] 0.03 0.03 8000000/8000000 __gnu_cxx::new_allocator::allocate(unsigned int, void const*) [9]

----------------------------------------------- 0.03 0.03 8000000/8000000 std::list::_M_create_node(unsigned long long const&) [6] [11] 5.3 0.03 0.03 8000000 __gnu_cxx::new_allocator::construct(unsigned long long*, unsigned long long const&) [11] 0.03 0.00 8000000/8000000 operator new(unsigned int, void*) [19]

----------------------------------------------- 0.02 0.03 8000000/8000000 std::_List_base::_M_clear() [3] [12] 4.9 0.02 0.03 8000000 std::_List_base::_M_put_node(std::_List_node*) [12] 0.03 0.00 8000000/8000000 __gnu_cxx::new_allocator::deallocate(std::_List_node*, unsigned int) [17]

----------------------------------------------- 0.05 0.00 8000000/8000000 main [1] [13] 4.9 0.05 0.00 8000000 node_New() [13]

----------------------------------------------- 0.04 0.00 8000000/8000000 main [1] [14] 3.9 0.04 0.00 8000000 std::_List_const_iterator::operator++(int) [14]

----------------------------------------------- 0.03 0.00 8000000/8000000 main [1] [15] 2.9 0.03 0.00 8000000 node_GetData(Node const*) [15]

----------------------------------------------- 0.03 0.00 8000000/8000000 main [1] [16] 2.9 0.03 0.00 8000000 list_Add(List*, Node*) [16]

----------------------------------------------- 0.03 0.00 8000000/8000000 std::_List_base::_M_put_node(std::_List_node*) [12] [17] 2.9 0.03 0.00 8000000 __gnu_cxx::new_allocator::deallocate(std::_List_node*, unsigned int) [17]

----------------------------------------------- 0.03 0.00 8000000/8000000 __gnu_cxx::new_allocator::allocate(unsigned int, void const*) [9] [18] 2.9 0.03 0.00 8000000 __gnu_cxx::new_allocator::max_size() const [18]

----------------------------------------------- 0.03 0.00 8000000/8000000 __gnu_cxx::new_allocator::construct(unsigned long long*, unsigned long long const&) [11] [19] 2.4 0.03 0.00 8000000 operator new(unsigned int, void*) [19]

----------------------------------------------- 0.02 0.00 16000000/16000000 std::_List_base::_M_get_Tp_allocator() const [7] [20] 1.9 0.02 0.00 16000000 std::allocator::allocator(std::allocator const&) [20]

----------------------------------------------- 0.02 0.00 8000000/8000000 main [1] [21] 1.9 0.02 0.00 8000000 node_SetData(Node*, int) [21]

----------------------------------------------- 0.02 0.00 8000000/8000000 std::_List_base::_M_clear() [3] [22] 1.9 0.02 0.00 8000000 __gnu_cxx::new_allocator::destroy(unsigned long long*) [22]

----------------------------------------------- 0.00 0.00 1/16000001 main [1] 0.01 0.00 16000000/16000001 std::_List_base::_M_get_Tp_allocator() const [7] [23] 1.0 0.01 0.00 16000001 __gnu_cxx::new_allocator::new_allocator() [23]

----------------------------------------------- 0.00 0.00 1/16000001 main [1] 0.01 0.00 8000000/16000001 std::_List_base::_M_clear() [3] 0.01 0.00 8000000/16000001 std::list::_M_create_node(unsigned long long const&) [6] [24] 1.0 0.01 0.00 16000001 __gnu_cxx::new_allocator::~new_allocator() [24]

----------------------------------------------- 0.01 0.00 8000001/8000001 main [1] [25] 1.0 0.01 0.00 8000001 std::_List_const_iterator::operator!=(std::_List_const_iterator const&) const [25]

----------------------------------------------- 0.01 0.00 8000000/8000000 main [1] [26] 1.0 0.01 0.00 8000000 std::_List_const_iterator::operator*() const [26]

----------------------------------------------- 0.01 0.00 2/2 main [1] [27] 1.0 0.01 0.00 2 time_Subtract(timeval*, timeval*) [27]

----------------------------------------------- 0.01 0.00 2/2 main [1] [28] 1.0 0.01 0.00 2 std::_List_const_iterator::_List_const_iterator(std::_List_iterator const&) [28]

----------------------------------------------- 0.01 0.00 1/1 main [1] [29] 1.0 0.01 0.00 1 std::allocator::allocator() [29]

----------------------------------------------- 0.00 0.00 1/16000001 main [1] 0.00 0.00 8000000/16000001 std::_List_base::_M_clear() [3] 0.00 0.00 8000000/16000001 std::list::_M_create_node(unsigned long long const&) [6] [35] 0.0 0.00 0.00 16000001 std::allocator::~allocator() [35]

----------------------------------------------- 0.00 0.00 1/8000002 std::list::begin() [53] 0.00 0.00 8000001/8000002 std::list::end() [8] [36] 0.0 0.00 0.00 8000002 std::_List_iterator::_List_iterator(std::_List_node_base*) [36]

----------------------------------------------- 0.00 0.00 8000000/8000000 list_Delete(List*) [40] [37] 0.0 0.00 0.00 8000000 node_Delete(Node*) [37]

----------------------------------------------- 0.00 0.00 8000000/8000000 main [1] [38] 0.0 0.00 0.00 8000000 node_GetNext(Node const*) [38]

----------------------------------------------- 0.00 0.00 1/2 main [1] 0.00 0.00 1/2 std::_List_base::_List_base(std::allocator const&) [52] [39] 0.0 0.00 0.00 2 __gnu_cxx::new_allocator::~new_allocator() [39]

----------------------------------------------- 0.00 0.00 1/1 main [1] [40] 0.0 0.00 0.00 1 list_Delete(List*) [40] 0.00 0.00 8000000/8000000 node_Delete(Node*) [37]

----------------------------------------------- 0.00 0.00 1/1 main [1] [41] 0.0 0.00 0.00 1 list_GetHead(List const*) [41]

----------------------------------------------- 0.00 0.00 1/1 main [1] [42] 0.0 0.00 0.00 1 list_New() [42]

----------------------------------------------- 0.00 0.00 1/1 std::_List_base::_List_impl::_List_impl(std::allocator const&) [49] [43] 0.0 0.00 0.00 1 __gnu_cxx::new_allocator::new_allocator(__gnu_cxx::new_allocator const&) [43]

----------------------------------------------- 0.00 0.00 1/1 std::_List_base::_List_base(std::allocator const&) [52] [44] 0.0 0.00 0.00 1 __gnu_cxx::new_allocator::new_allocator() [44]

----------------------------------------------- 0.00 0.00 1/1 std::_List_base::_List_base(std::allocator const&) [52] [45] 0.0 0.00 0.00 1 std::allocator::allocator(std::allocator const&) [45]

----------------------------------------------- 0.00 0.00 1/1 std::_List_base::_List_impl::_List_impl(std::allocator const&) [49] [46] 0.0 0.00 0.00 1 std::allocator::allocator(std::allocator const&) [46]

----------------------------------------------- 0.00 0.00 1/1 std::_List_base::_List_base(std::allocator const&) [52] [47] 0.0 0.00 0.00 1 std::allocator::~allocator() [47]

----------------------------------------------- 0.00 0.00 1/1 main [1] [48] 0.0 0.00 0.00 1 std::allocator::~allocator() [48]

----------------------------------------------- 0.00 0.00 1/1 std::_List_base::_List_base(std::allocator const&) [52] [49] 0.0 0.00 0.00 1 std::_List_base::_List_impl::_List_impl(std::allocator const&) [49] 0.00 0.00 1/1 __gnu_cxx::new_allocator::new_allocator(__gnu_cxx::new_allocator const&) [43] 0.00 0.00 1/1 std::allocator::allocator(std::allocator const&) [46]

----------------------------------------------- 0.00 0.00 1/1 main [1] [50] 0.0 0.00 0.00 1 std::_List_base::_List_impl::~_List_impl() [50]

----------------------------------------------- 0.00 0.00 1/1 std::_List_base::_List_base(std::allocator const&) [52] [51] 0.0 0.00 0.00 1 std::_List_base::_M_init() [51]

----------------------------------------------- 0.00 0.00 1/1 main [1] [52] 0.0 0.00 0.00 1 std::_List_base::_List_base(std::allocator const&) [52] 0.00 0.00 1/1 __gnu_cxx::new_allocator::new_allocator() [44] 0.00 0.00 1/1 std::allocator::allocator(std::allocator const&) [45] 0.00 0.00 1/1 std::_List_base::_List_impl::_List_impl(std::allocator const&) [49] 0.00 0.00 1/2 __gnu_cxx::new_allocator::~new_allocator() [39] 0.00 0.00 1/1 std::allocator::~allocator() [47] 0.00 0.00 1/1 std::_List_base::_M_init() [51]

----------------------------------------------- 0.00 0.00 1/1 main [1] [53] 0.0 0.00 0.00 1 std::list::begin() [53] 0.00 0.00 1/8000002 std::_List_iterator::_List_iterator(std::_List_node_base*) [36]

----------------------------------------------- 0.00 0.00 1/1 main [1] [54] 0.0 0.00 0.00 1 std::list::list(std::allocator const&) [54]

----------------------------------------------- 0.00 0.00 1/1 main [1] [55] 0.0 0.00 0.00 1 std::list::~list() [55]

-----------------------------------------------

This table describes the call tree of the program, and was sorted by the total amount of time spent in each function and its children.

Each entry in this table consists of several lines. The line with the index number at the left hand margin lists the current function. The lines above it list the functions that called this function, and the lines below it list the functions this one called. This line lists: index A unique number given to each element of the table. Index numbers are sorted numerically. The index number is printed next to every function name so it is easier to look up where the function in the table.

% time This is the percentage of the `total' time that was spent in this function and its children. Note that due to different viewpoints, functions excluded by options, etc, these numbers will NOT add up to 100%.

self This is the total amount of time spent in this function.

children This is the total amount of time propagated into this function by its children.

called This is the number of times the function was called. If the function called itself recursively, the number only includes non-recursive calls, and is followed by a `+' and the number of recursive calls.

name The name of the current function. The index number is printed after it. If the function is a member of a cycle, the cycle number is printed between the function's name and the index number.

For the function's parents, the fields have the following meanings:

self This is the amount of time that was propagated directly from the function into this parent.

children This is the amount of time that was propagated from the function's children into this parent.

called This is the number of times this parent called the function `/' the total number of times the function was called. Recursive calls to the function are not included in the number after the `/'.

name This is the name of the parent. The parent's index number is printed after it. If the parent is a member of a cycle, the cycle number is printed between the name and the index number.

If the parents of the function cannot be determined, the word `' is printed in the `name' field, and all the other fields are blank.

For the function's children, the fields have the following meanings:

self This is the amount of time that was propagated directly from the child into the function.

children This is the amount of time that was propagated from the child's children to the function.

called This is the number of times the function called this child `/' the total number of times the child was called. Recursive calls by the child are not listed in the number after the `/'.

name This is the name of the child. The child's index number is printed after it. If the child is a member of a cycle, the cycle number is printed between the name and the index number.

If there are any cycles (circles) in the call graph, there is an entry for the cycle-as-a-whole. This entry shows who called the cycle (as parents) and the members of the cycle (as children.) The `+' recursive calls entry shows the number of function calls that were internal to the cycle, and the calls entry for each member shows, for that member, how many times it was called from other members of the cycle.

Index by function name

[40] list_Delete(List*) [23] __gnu_cxx::new_allocator::new_allocator() [51] std::_List_base::_M_init() [37] node_Delete(Node*) [24] __gnu_cxx::new_allocator::~new_allocator() [3] std::_List_base::_M_clear() [41] list_GetHead(List const*) [18] __gnu_cxx::new_allocator::max_size() const [52] std::_List_base::_List_base(std::allocator const&) [15] node_GetData(Node const*) [7] std::_List_base::_M_get_Tp_allocator() const [4] std::_List_base::~_List_base() [38] node_GetNext(Node const*) [26] std::_List_const_iterator::operator*() const [36] std::_List_iterator::_List_iterator(std::_List_node_base*) [21] node_SetData(Node*, int) [25] std::_List_const_iterator::operator!=(std::_List_const_iterator const&) const [28] std::_List_const_iterator::_List_const_iterator(std::_List_iterator const&) [27] time_Subtract(timeval*, timeval*) [45] std::allocator::allocator(std::allocator const&) [14] std::_List_const_iterator::operator++(int) [16] list_Add(List*, Node*) [46] std::allocator::allocator(std::allocator const&) [6] std::list::_M_create_node(unsigned long long const&) [42] list_New() [47] std::allocator::~allocator() [8] std::list::end() [13] node_New() [48] std::allocator::~allocator() [53] std::list::begin() [17] __gnu_cxx::new_allocator::deallocate(std::_List_node*, unsigned int) [29] std::allocator::allocator() [5] std::list::_M_insert(std::_List_iterator, unsigned long long const&) [9] __gnu_cxx::new_allocator::allocate(unsigned int, void const*) [20] std::allocator::allocator(std::allocator const&) [2] std::list::push_back(unsigned long long const&) [43] __gnu_cxx::new_allocator::new_allocator(__gnu_cxx::new_allocator const&) [35] std::allocator::~allocator() [54] std::list::list(std::allocator const&) [44] __gnu_cxx::new_allocator::new_allocator() [49] std::_List_base::_List_impl::_List_impl(std::allocator const&) [55] std::list::~list() [39] __gnu_cxx::new_allocator::~new_allocator() [50] std::_List_base::_List_impl::~_List_impl() [19] operator new(unsigned int, void*) [22] __gnu_cxx::new_allocator::destroy(unsigned long long*) [10] std::_List_base::_M_get_node() [1] main [11] __gnu_cxx::new_allocator::construct(unsigned long long*, unsigned long long const&) [12] std::_List_base::_M_put_node(std::_List_node*)
--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Am I not yet awake, or should this be:

struct Node { struct Data array[MAX]; struct Node *next; };

Mark Borgerson

Reply to
Mark Borgerson

You seem to be quite awake, even alert.. :)

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Two points:

1) a "big application" has something wrong in it, if it is memory allocation bound, not I/O bound nor CPU bound

2) you can write a custom allocator in STL, too:

formatting link

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

I think I was dreaming of hundreds of nested little boxes-- each containing a bit of data and another box. Seemed like a good way to stress out the memory allocator! ;-)

Mark Borgerson

Reply to
Mark Borgerson

You're quite easily memory allocation bound if you use data structures with many small nodes. If you end up with fragmentation or simply use memory inefficiently everything will slow down due to worse cache/TLB/ paging behaviour. GCC is a great example of how this sort of thing can cost you dearly even if profiling shows you spend almost no time in the allocator.

A compiler is a good example of a big application that is allocation bound - in fact it is hard to find anything in a compiler that isn't allocation bound (perhaps the scanner as it spends a rather large amount of time skipping spaces and comments...). Reverting the custom allocators in a compiler I used to work on to malloc/free slowed things down by a factor of 2.

Sure it is possible to do that. Many custom allocators work on the same principles. It is not easy to get a good balance between memory use, fragmentation and allocation performance though. One trick you can do is to get rid of deallocation altogether and deallocate whole datastructures in one step.

Wilco

Reply to
Wilco Dijkstra

Why in the hell should a compiler be allocation bound when it should be able to simply start off by allocating enough space in a single chunk for a table of about 500,000 symbols? That would require perhaps 64megabytes out of the gigabyte of memory in the computer.

Allocation problems should have disappeared when we moved to a 32-bit OS and onto PC with 512MBytes or more of memory. I generally don't try to compile and play Far Cry at the same time, so lack of physical memory shouldn't be a problem.

Do cross compiler writers really worry about memory usage these days? That sounds so DOS 3.0!

Mark Borgerson

Reply to
Mark Borgerson

allocation

A symbol table doesn't use much memory, I certainly have never seen a program with that many unique symbols! Reserving memory in bigger chunks (~64KBytes) is indeed how most compilers reduce the cost of memory allocation.

While you could easily use ~200MBytes on current PC's, things won't be going fast if you do. Current L2 caches are around 512KB, so keeping the working set small still matters. Main memory latency hasn't improved much over the last 10 years.

Yes, if you want to compile your applications in a reasonable time. Even if you had 4GBytes, if your working set doesn't fit in the L2, things are going to take forever (main memory is at least 10 times as slow as the L2).

In my previous job the time to build large applications (think many millions lines of code) was essential to many customers. People wanted things to compile as fast as VC++ but with full optimization...

Wilco

Reply to
Wilco Dijkstra

Mark Borgerson writes: [snip]

How many people have a Gbyte of memory? Some of us still get lots of work done on machines that have 8 Mbytes and find that to be more than enough (obviously not using any M$ products).

Your approach just moves the problem from the allocator to the virtual memory manager.

There are still applications around that run on machines using MS-DOS.

Your approach sounds so Microsofty.

Reply to
Everett M. Greene

Every new PC on the shelf at Staples (starting at about $400) seems to have a gig of RAM. If you are still working on a machine that came with 8MB of RAM, I'll bet that machine cost you a lot more than $400! The TRITON-LP embedded linux boards I've used have 16 to 32MB of RAM, but I still run the compiler for them on a laptop with 512MB of RAM.

In any case, I don't think you need a gig of RAM to allocate a 64MByte symbol table----unless you are running Vista, of course! ;-)

Which should have little problem if the OS allows you to specify non-swappable memory for that 64MBytes---or at least to not have the memory swapped out while the compiler is running.

Just as a test, I inserted the following into one of my PC host programs written with Borland C++ Builder:

// at the beginning #define BUFSIZE 64000000l static char bigbuf[BUFSIZE]; // static 64MByte buffer

// later, in startup code

for(i= 0l i

Yes, but how many of them are cross compilers for the embedded processors we are using today?

And new() and free() sound so Computer-Sciencey (?????) My experince with the CS community is that they like to abstract away such things as memory allocation and I/O and leave the details to someone else. (I got this impression when I taught CS at the university level back in the 80s).

Microsoft may be a large part of the reason new computers come with a lot of RAM, but I don't feel at all bad about taking some of that RAM away from the OS to do the jobs I need to do.

Mark Borgerson

Reply to
Mark Borgerson

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.