Benchmarks: STL's string vs. C string

Ok, this the std::string vs. strcat() benchmark.

g++-4.1.3 opts: -O3 -march=athlon-xp -Wall -pedantic

Results:

jyrki@pulteri:~$ ./a.out |grep seconds strcat() impl. took 0.149899 seconds. STL impl. took 0.000599 seconds.

The STL string seems to be a shitload of faster than strcat(). The program seems to be working ok, since with 10000 iterations:

jyrki@pulteri:~$ ./a.out |grep bla |wc -l

20000

The source:

#include #include #include #include #include #include

#define LOOP_TIMES 10000

static double time_Subtract(struct timeval *x, struct timeval *y) { /* Perform the carry for the later subtraction by updating y. */ if (x->tv_usec < y->tv_usec) { int nsec = (y->tv_usec - x->tv_usec) / 1000000 + 1; y->tv_usec -= 1000000 * nsec; y->tv_sec += nsec; } if (x->tv_usec - y->tv_usec > 1000000) { int nsec = (x->tv_usec - y->tv_usec) / 1000000; y->tv_usec += 1000000 * nsec; y->tv_sec -= nsec; } return (double)(x->tv_sec - y->tv_sec) + (x->tv_usec - y->tv_usec) / (double)1000000.0; }

int main() { struct timeval t1, t2; const char filler[] = "bla..\n"; gettimeofday(&t1, NULL); size_t bufferLength = LOOP_TIMES * (strlen(filler) + 1); printf("buffer length will be %d bytes.\n", bufferLength); char *buffer = (char *)malloc(bufferLength); if (buffer == NULL) perror("malloc() failed.");

for (int i = 0; i < LOOP_TIMES; i++) { strcat(buffer, filler); } printf("%s\n", buffer); free(buffer); gettimeofday(&t2, NULL); double dt = time_Subtract(&t2, &t1); printf("strcat() impl. took %f seconds.\n", dt); gettimeofday(&t1, NULL); std::string *stlBuffer = new std::string(); stlBuffer->reserve(bufferLength); std::string *stlFiller = new std::string(filler); for (int i = 0; i < LOOP_TIMES; i++) { *stlBuffer += *stlFiller;

} delete stlFiller; gettimeofday(&t2, NULL); printf("%s\n", stlBuffer->c_str()); delete stlBuffer;

dt = time_Subtract(&t2, &t1); printf("STL impl. took %f seconds.\n", dt);

return EXIT_SUCCESS; }

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen
Loading thread data ...

I really started to wonder about these figures, what might be the cause of this orders of magnitude difference in performance, but couldn't come with any reason. Anyone else?

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

My guess is that strcat() finds the place to start adding the new data by starting at the beginning of the string and incrementing a pointer into the string until it finds the 0x00 that marks the end of the string. It then starts placing the new data at that position. That is what is done in the implementation of strcat() in the library that comes with the IAR Embedded Workbench for ARM.

Being a hardware-software guy, I would set a scope output bit before the strcat() call and clear it afterward. If I then saw the strcat() time increase with the length of the string, I would have the evidence I needed.

The STL implementation may include either a pointer to the end of the string, or the length of the string as part of the structure. In that case, adding to the string doesn't involve stepping through the destination string. The CPU can compute the starting position for the concatenation in a few cycles.

Mark Borgerson

Reply to
Mark Borgerson

Seems to be the case with glibc too. Using strcpy() removed this orders of magnitude of difference, STL and strcpy() being in the same class, as one would expect:

strcat() impl. took 0.004748 seconds. STL impl. took 0.007236 seconds.

(this is for 100000 iterations)

Now the loop looks like:

char *tmp = buffer; int fillerLength = strlen(filler); for (int i = 0; i < LOOP_TIMES; i++) { strcpy(tmp, filler); tmp += fillerLength; }

Anyhow, this doesn't back up Dijkstra's claim that STL's string would be

50x slower than C-style string operations.
--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

This was the reason, indeed. Not being programmed C strings for a while, I didn't think of this at all. I had the intuition that strcat() has some wisdom about the terminating byte in the buffer, which is of course more than ridiculous.. :)

Anyway, the point of this little excercise was to prove Dijkstra's claim about STL being 50x slower in string concatenation, false.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Perhaps STL keeps track of the length of each string object, so that the catenation operator does not have to scan for the final null byte of the destination string, and can just copy the source string into the right position.

In your test, the strcat loop has a quadratic complexity in the number of iterations because the "buffer" string grows by a fixded amount on each iteration, so the scan for the final null grows longer in proportion to the iteration number.

If STL knows the length of "stlBuffer", each iteration of the STL loop takes the same time and the overall loop has linear complexity in the number of iterations.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

Anybody access to MS VC++ could try this benchmark to see, how MS's STL implementation compares to the libstdc++. I wouldn't expect a 50x decrease in performance still..

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Such a claim is meaningless, because the outcome of such a comparison depends very much on how the C and std:string version of the test are implemented, the compiler being used and last but not least which standard library implementation is being used (the C++ standard does not dictate how the standard library is to be implemented, multiple implementations do exist). It is easy to prove that the std:string is faster than C style string manipulation, or the other way around.

Reply to
Patrick de Zeester

I think you are testing something different than the original claim. The original claim IIRC was for concatenating a small number of small strings together (ie two or three strings into a single string), not concatenating a large number of strings into a single string. And then doing this for multiple independent destination strings. That's quite a different scenario.

Robert

Reply to
Robert Adsett

Well... your test program gives STL some help by using "reserve" to allocate a sufficiently large buffer for the string. It is true that you must do that also for the strcat solution, but perhaps it wasn't done in the application where STL was observed to be slower than strcat. The programmer perhaps expected STL to handle the buffer-size automatically -- and of course it can do that, but it takes some time.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

Well - I wanted to give both options an equal chance. That is why I did "reserve" for the C-style solution also. Besides, std::string:reserve cannot explain a 50x difference in run-time performance in it self.

You have a point: one must understand what the library calls do in the low level (which I clearly didn't understand when I first used strcat()..).

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Well, you can modify the code to use these parameters, shoudn't be that many lines to modify?

"then doing this for multiple independent destination string"? Either you didn't read the code, or you didn't not read the code..

There one buffer for 1) C-style string concatenation and 2) for STL string. Both were allocated to the maximum required, before starting the concatenation loop, and this allocation was by the way included in the CPU time used also.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

And for one of several million usage scenarios you have achieved your goal

- bravo.

Reply to
nospam

Not having a C++ compiler to hand it seems rather pointless.

I read the code, it repeatedly adds to a single destination string.

You would want something more like

strcat( string1, post); strcat( string2, post); strcat( string3, post); strcat( string4, post);

...

Thus the independent destination strings (string1 .. stringn). As I said a quite different scenario. I also wouldn't be surprised if the original C version did no dynamic memory allocation.

My point exactly

Robert

Reply to
Robert Adsett

VC++ 8.0 -Ox gives (using rtdsc):

LOOP_TIMES = 100000 buffer length will be 700000 bytes. strcat() impl. took 2369995 cycles. STL impl. took 9826962 cycles.

LOOP_TIMES = 1000000 buffer length will be 7000000 bytes. strcat() impl. took 25410537 cycles. STL impl. took 99395512 cycles.

So with VC++ this STL benchmark is about 4 times as slow.

Wilco

Reply to
Wilco Dijkstra

OK, here is my benchmark based on the original application. I'm using a different PC and a newer compiler so getting the exact same numbers would be hard. Nonetheless it shows a factor 33 difference between the obvious (but naive) STL string concatenation and a well written C version...

That's just the performance difference. There is also a big codesize difference...

Cycle times on Athlon64 2800+ for 100000 iterations:

Cstring1 time: 35314606 (20.7 times faster than STL1, 2.9 vs STL2) Cstring2 time: 22238073 (32.9 times faster than STL, 4.7 vs STL2) STL1 time: 732415651 STL2 time: 103694936

// Simple C vs STL string concatenation example (c) 2007 Wilco Dijkstra

#define _CRT_SECURE_NO_DEPRECATE #include #include #include

using namespace std;

typedef unsigned int int32; typedef unsigned long long uint64;

#define TIMES 100000

static inline uint64 __declspec(naked) CycleCount() { __asm { rdtsc // returns 64-bit count in edx(hi) and eax(lo) ret 0 } }

void TestSTLString(void) { uint64 tstart; const char *s1 = "this is"; const char *s2 = "an example of"; const char *s3 = "string"; const char *s4 = "concatenation";

string s; tstart = CycleCount(); for (int32 i = 0; i < TIMES; i++) { // STL written the obvious (but slow) way... s = (string)s1 + " " + s2 + " " + s3 + " " + s4; } printf("STL1 time: %llu\n", CycleCount() - tstart); printf("Output: %s\n", s.c_str());

string s0; tstart = CycleCount(); for (int32 i = 0; i < TIMES; i++) { // Optimized STL needing no temporaries or allocation s0.reserve(strlen(s1) + strlen(s2) + strlen(s3) + strlen(s4) + 3); s0 = s1; s0 += " "; s0 += s2; s0 += " "; s0 += s3; s0 += " "; s0 += s4; } printf("STL2 time: %llu\n", CycleCount() - tstart); printf("Output: %s\n", s0.c_str()); }

void TestCString(void) { uint64 tstart; const char *s1 = "this is"; const char *s2 = "an example of"; const char *s3 = "string"; const char *s4 = "concatenation";

char buf[1024]; char *s; tstart = CycleCount(); for (int32 i = 0; i < TIMES; i++) { // strcpy and strcat written the obvious way s = &buf[0]; strcpy(s, s1); strcat(s, " "); strcat(s, s2); strcat(s, " "); strcat(s, s3); strcat(s, " "); strcat(s, s4); } printf("Cstring1 time: %llu\n", CycleCount() - tstart); printf("Output: %s\n", buf);

tstart = CycleCount(); for (int32 i = 0; i < TIMES; i++) { // Optimized by expanding strcat into strlen and strcpy s = &buf[0]; strcpy(s, s1); s += strlen(s1); *s++ = ' '; strcpy(s, s2); s += strlen(s2); *s++ = ' '; strcpy(s, s3); s += strlen(s3); *s++ = ' '; strcpy(s, s4); s += strlen(s4); } printf("Cstring2 time: %llu\n", CycleCount() - tstart); printf("Output: %s\n", buf); }

int main(void) { TestCString(); TestSTLString(); return 0; }

Reply to
Wilco Dijkstra

The following is a quotation from strncpy.zip, on my page:

Why don't you repeat those timing tests using the routines in that file (purely standard C) which have been expressly designed to require no further support, and thus are very suitable for embedded use. Note that my routines have been put in the public domain.

--
 Chuck F (cbfalconer at maineline dot net)
   
   Try the download section.
Reply to
CBFalconer

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

(I managed to send an empty post - sorry for that - but the ISP's news server doesn't seem to support canceling)

Well - since you have to source - you can try to find a case where strcpy() solution would be 50x faster than std::string::append() (which the overloaded operator += uses).

I halved the string size what is concatenated to the buffer, and the results are quite similar what they used to be:

strcpy() impl. took 0.073709 seconds. STL impl. took 0.084334 seconds.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

How your implementation of, let's say strcpy(), is different from an obvious one with a for loop?

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.