Benchmarks: STL's string vs. C string

(the output is wrong by the way, it should say "strcpy() impl. took..", I didn't remember to change it when I changed the code to use strcpy() instead of strcat()).

Ok, this is great that you did this test. Still quite far from 50x.. :)

What optimizations does -Ox enable (too lazy to do a lookup to msdn.microsoft.com), does it include inlining? Anyhow, it seems that there's something wrong with MS's STL implementation.

Could you try two things:

1) profile 2) try with SGI's STL implementation, for example
--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen
Loading thread data ...

If you are having MS Windows on your desktop, you can download Visual Studio Express from MS for free:

formatting link

(don't know if it includes STL or optimizations, at least some years ago MS distributed a free version of VC++ without an IDE and optimizations disabled in the compiler)

Ah, ok. I misunderstod you, sorry.

What are you referring as "the original C version"?

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Which compiler did you use? I'll post results with g++ and libstdc++ when get off from work, now I have to go and earn some money.

Did you strip the binaries? How large size difference are we talking about here?

Now, you can calling std::string::reserve() in the loop, as strlen(). What happens if you move those out of the loop?

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

That's because you didn't benchmark the problem I stated.

-Ox is maximum optimization and typically gives best performance. It includes aggressive inlining of course, but STL code is usually too big to be fully inlined. Looking at the disassembly it can't inline the calls from the inner loop as they are quite large and make numerous calls themselves. Due to being generic, STL always produces large amounts of needless code that most compilers find hard to optimise.

Let's compare our timings in cycles on your benchmark (since we both have Athlons the cycles should be close):

GCC C: 10.1M cycles GCC STL: 15.4M cycles VC++ C: 2.4M cycles VC++ STL: 9.8M cycles

From this I conclude there is nothing wrong with VC++ STL - it's faster than GCC's STL. It looks like GCC has a problem with strcpy/strcat...

Wilco

Reply to
Wilco Dijkstra

Define "needless" code. Besides, the linker will usually remove unneeded object code anyway. When you instanciate a template, let's say std::list, while the whole code resulting from instancation is compiled, unneeded methods are removed in the linking phase.

An analogy from C world: I could say that using standard C library always produces large amounts of needless code, since the C library is linked to the executable, if you link statically. As in the C++ standard library case, code that is not needed is stripped out by the linker.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Here's output from your version (g++-4.1.3, -O3 -march=athlon-xp):

jyrki@pulteri:~$ ./str2 100000 Cstring1 time: 0.011409 seconds Output: this is an example of string concatenation Cstring1 time: 0.000192 seconds Output: this is an example of string concatenation STL1 time: 0.217633 seconds Output: this is an example of string concatenation STL2 time: 0.024909 seconds Output: this is an example of string concatenation

So, a new instance of std::string, "s" is created for each loop iteration, and so on.

Your benchmark is comparing apples and oranges, you should make C and STL versions with an assumptation, that the user knows both C string handling and STL string handling. Now your code assumes that the user only knows only C string handling properly.

By the way, let's take Java to this game:

jyrki@pulteri:~$ java str 2000000

2000000 iterations using StringBuilder took 0.127 seconds. 2000000 iterations using StringBuffer took 0.182 seconds. jyrki@pulteri:~$ ./a.out 2000000 |grep seconds strcpy() impl. took 0.081285 seconds. STL impl. took 0.071333 seconds.

Here's the straightforward Java implementation:

public class str { public static void main(String[] args) throws Exception { if (args.length == 0) { System.out.println("Usage: N iterations"); System.exit(1); } int n = Integer.parseInt(args[0]); String filler = "bl\n"; long t1 = System.currentTimeMillis(); StringBuilder stringBuilderBuffer = new StringBuilder(n * filler.length()); for (int i = 0; i < n; i++) { stringBuilderBuffer.append(filler); } long t2 = System.currentTimeMillis(); double dt = (t2 - t1) / 1000.0; System.out.println(n + " iterations using StringBuilder took " + dt + " seconds.");

t1 = System.currentTimeMillis(); StringBuffer stringBufferBuffer = new StringBuffer(n * filler.length()); for (int i = 0; i < n; i++) { stringBufferBuffer.append(filler); } t2 = System.currentTimeMillis(); dt = (t2 - t1) / 1000.0; System.out.println(n + " iterations using StringBuffer took " + dt + " seconds."); } }

Not that bad, considering that Java uses UTF-16 encoding of Unicode, which is 16 bits per character (simplified case). (The difference between StringBuilder and StringBuffer is that the first is not thread-safe, while the latter is.)

A naive way in Java would be using the immutable String objects, where each concatenation results in two pieces of garbage. This can be seen quite often. Your first STL version is implemented - sort of - in the same way.

Maybe the point of this discussion, is that one has to know his/hers libraries, no matter whether it is C standard library, C++ standard library or J2SE standard library. Very much like I was mislead when using strcat() instead of strcpy() by not remembering what is the difference between the two (not been using string.h functions for several years), neither did man pages of the functions describe this difference.

Can we agree on this? ;)

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

.. even the rhs of assignment to "s" is copied to the stack to replace the old instance of "s".

In your Cstring1, you allocate the destination buffer before the loop, so there's no overhead because of that.

In your optimized STL code, you again copy the "s1" instance of std::string to "s0" (*). But since "s0" is quite small of length, this probably doesn't matter that much.

I think you should have used pointers like you did in the C case, to have an equal comparison.

string s0; for (int32 i = 0; i < n; i++) { // Optimized STL needing no temporaries or allocation s0.reserve(strlen(s1) + strlen(s2) + strlen(s3) + strlen(s4) + 3);

  • s0 = s1; s0 += " "; s0 += s2; s0 += " "; s0 += s3; s0 += " "; s0 += s4; }

(calling s0.reserve() in the loop probably doesn't have that much effect and those strlen()s inside reserve() the compiler is probably moved outside the loop, since your s1, s2, s3 and s4 were const)

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

.. should have been "the compiler is probably _moving_".. (actually the erroneous sentence is quite funny.. :))

Reply to
Jyrki Saarinen

could be optimized further, cause strcpy already scans once through the string till end-of-string. If you use stpcpy (afaik its no standard C function, but in linux its part of the library) you could do it stpcpy(stpcpy(stpcpy(stpcpy(s, s1), s2), s3), s4)

Florian

Reply to
Florian Stock

Note that there is a small problem with your code:

strcat() is used on an uninitialized buffer, which may lead to unpredictable behavior since strcat() searches for the zero terminator, which may or may not be present in the buffer allocated by malloc(). The fix would be to insert buffer[0] = 0; before the for loop.

Reply to
Patrick de Zeester

No, s is declared outside the loop - remember objects are only constructed once. What happens is that temporaries are created for s1 and each of the

  • operators. Each of the temporaries allocates its own string buffer. The final temporary is then assigned to s via the copy constructor.

Indeed. That's how people normally write C. The optimized STL case does a reserve to get the same effect.

It's the copy constructor so it uses memcpy just like the += operator.

Both already use the same strings as inputs and compute the result in a local buffer/string object. Why do you think pointers make a difference?

The reserve doesn't do much except in the first iteration. It's there since the strings s1..s4 could potentially change every iteration (if they did change the STL code would slow down further as it may have to allocate more memory every iteration).

Wilco

Reply to
Wilco Dijkstra

Or simply allocate the buffer with calloc(); That would take a few more CPU cycles than your solution, though.

Mark Borgerson

Reply to
Mark Borgerson

Those results look very similar to mine. The optimized C version looks a bit too fast though - I presume GCC managed to move most of it outside the loop...

That's not true. The first C version is naive by using strcat. The second uses the optimized approach of keeping track of the end of the string. The STL code has a similar naive approach and an optimized one. The interesting thing is that the naive C approach is still faster than the optimized STL...

Sure. But knowing how to use strcat/strcpy is much easier to learn than C++ and STL. So the chances of someone using naive STL are much higher than naive C. As this benchmark proves, the effect of naive STL can be bad.

Wilco

Reply to
Wilco Dijkstra

... snip ...

I have great difficulties whenever I see this sort of reference to "Dijkstra's claims". :-) You might include his first name.

--
 Chuck F (cbfalconer at maineline dot net)
   
 Click to see the full signature
Reply to
CBFalconer

It's not strcpy(), it's strlcpy(), together with strlcat(). The URL I specified leads you to the complete release, complete with documentation, tests, etc. At any rate, here is a copy of the heart of it:

size_t strlcpy(char *dst, const char *src, size_t sz) { const char *start = src;

if (src && sz--) { while ((*dst++ = *src)) if (sz--) src++; else { *(--dst) = '\0'; break; } } if (src) { while (*src++) continue; return src - start - 1; } else if (sz) *dst = '\0'; return 0; } /* strlcpy */

/* ---------------------- */

size_t strlcat(char *dst, const char *src, size_t sz) { char *start = dst;

while (*dst++) /* assumes sz >= strlen(dst) */ if (sz) sz--; /* i.e. well formed string */ dst--; return dst - start + strlcpy(dst, src, sz); } /* strlcat */

Try

--
 Chuck F (cbfalconer at maineline dot net)
   
 Click to see the full signature
Reply to
CBFalconer

No problem.

The one that sparked this quest.

Robert

--
Posted via a free Usenet account from http://www.teranews.com
Reply to
Robert Adsett

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.