The run time in C is 13 seconds here on a 1GHz processor. Can you specify your 'old HP computer'?
I could maybe win 1 second by writing the code a bit differently. And a 3GHz processor would do it in 12 / 3 = 4 seconds... A bigger cache would perhaps help a bit. A Cray would be even better.
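The 4-second figure is plain clock-ratio scaling. Written out, under the optimistic assumption that run time is inversely proportional to clock frequency (a memory-bound loop like this one may well not follow that):

```latex
t_{\text{new}} \approx t_{\text{old}} \cdot \frac{f_{\text{old}}}{f_{\text{new}}}
  = 12\,\text{s} \cdot \frac{1\,\text{GHz}}{3\,\text{GHz}} = 4\,\text{s}
```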
What does your C code look like? Mine is in the other posting.
a = malloc(ARRAY_SIZE * sizeof(short));
s = malloc(ARRAY_SIZE * sizeof(int));
if (a == NULL || s == NULL) {
    printf("Memory allocation failed.\n");
    return -1;
}

/* note: malloc does not initialise a or s, so the sums are garbage;
   harmless for a timing test */
for (i = 0; i < ARRAY_SIZE; i++) {
    s[i] += a[i];
}

endTime = GetTickCount();

printf("Total time taken adding %i array entries: %f seconds.\n",
       ARRAY_SIZE, ((float)(endTime - startTime)) / 1000);

free(a);
free(s);

return 0;
}
-=-=-
Saving that as test.c and compiling it, I get typical output like this:
-=-=-
E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.453000 seconds.

E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.546000 seconds.
-=-=-
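GetTickCount() is a Windows API call, so the fragment above only builds there. As a hedged, portable sketch of the same timed pass, the loop can be wrapped in functions and timed with standard C's clock(). The function names add_pass and time_add_pass are made up for illustration:

```c
#include <stddef.h>
#include <time.h>

/* One pass of the benchmark: add each 16-bit entry of a into the
 * matching 32-bit sum in s. */
void add_pass(const short *a, int *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] += a[i];
}

/* Time one pass with standard clock() instead of the Windows-only
 * GetTickCount(). Returns the measured time in seconds. */
double time_add_pass(const short *a, int *s, size_t n)
{
    clock_t start = clock();
    add_pass(a, s, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would malloc the two 64,000,000-entry arrays, fill them before starting the timer (so page faults and uninitialised values stay out of the measurement), call time_add_pass(), and afterwards print or otherwise use some of s so an optimising compiler cannot treat the sums as dead. Note that clock() reports processor time on POSIX systems, while Microsoft's C runtime returns elapsed wall-clock time.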
My computer is an Athlon 2500 at 1.66GHz, 1.1GB PC133 RAM (currently 472MB free, so no problems allocating the test), running Windows XP SP2. Basically state-of-the-art way back in the year 2001. If your computers are taking more than a couple of seconds, either your compiler really sucks or your computers suck even more. :-)
On a sunny day (Fri, 15 May 2009 08:35:25 -0700 (PDT)) it happened "snipped-for-privacy@fonz.dk" wrote:
Yes, what I think happens is that those Core 2 Duos execute these instructions a lot faster than my Celeron (or whatever it is), so that would gain another 200%. So Larkin's 'old' HP must be a 3 GHz Core?
Maybe I should upgrade to a more recent processor, but luckily I do not need to add 64M integers :-)
So multiply your result by 10 and you get 15 seconds, even slower than me on the eeePC with 512 MB RAM and a 900 MHz Celeron, running Linux with gcc-4.0 :-) Sorry 'bout that ;-)
The only way I can get it to run that slowly on my old 3GHz P4 HT chip is with full debugging enabled and the optimiser completely disabled in the MS C++ Win32 console environment. I would hope for nearly an order of magnitude faster using SSE.
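The SSE speed-up above is only a hope. As a rough sketch of what an SSE2 version of the inner loop might look like, here is a version using the standard SSE2 intrinsics from <emmintrin.h>, handling eight 16-bit inputs per step. It assumes an x86 target with SSE2, detected through the __SSE2__ macro that GCC and Clang define (MSVC does not, so there it would take the scalar path unless the check is adjusted). The function name is hypothetical, and no particular speed-up is claimed; a loop this memory-bound may gain far less than an order of magnitude.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Add 16-bit values a[i] into 32-bit sums s[i], eight elements per
 * step with SSE2 when available, plain scalar code otherwise. */
void add_pass_sse2(const short *a, int *s, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        /* Put each short in both halves of a 32-bit lane; an arithmetic
         * right shift by 16 then leaves the sign-extended value. */
        __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(v, v), 16);
        __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(v, v), 16);
        __m128i s0 = _mm_loadu_si128((const __m128i *)(s + i));
        __m128i s1 = _mm_loadu_si128((const __m128i *)(s + i + 4));
        _mm_storeu_si128((__m128i *)(s + i), _mm_add_epi32(s0, lo));
        _mm_storeu_si128((__m128i *)(s + i + 4), _mm_add_epi32(s1, hi));
    }
#endif
    for (; i < n; i++)  /* leftover tail, or everything without SSE2 */
        s[i] += a[i];
}
```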
One of my guys did a C version (I refuse to program in C) to run on the Kontron under Linux: a slightly slower CPU, 2 GB RAM. I asked him for his source code, and he spent about half an hour cleaning it up to be presentable... which I asked him NOT to do. Anyhow, here it is:
/*
 * mathsmash.c - a VERY crude benchmark
 *
 * time the sum of 64-million 16-bit integers into 64-million
 * 32-bit integer sums.
 *
 * gcc -O3 mathsmash.c -o mathsmash.o
 *
 * NOTE: The loop is performed 10 times to make the measurement
 * duration more reasonable.
 *
 * Timing is done by observation or including the system("date") functions.
 *
 */

    for (multiply = 0; multiply < 10; multiply++)    // 10 x
    {
        for (index = 0; index < DATA_ARRAY_SIZE; ++index)
            sum_data[index] += inbound_data[index];
    }

    printf("\n END sum operation...\n");

    // system("date");
}
===================================================
He commented out the system("date") calls because they're buggy or something, and timed it with his wristwatch at about 0.25 seconds per 64M add, about the same as the PowerBasic version.
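Timing by wristwatch works for a 2.5-second run, but the measurement can also be taken in software. Below is a minimal sketch of the mathsmash loop with C11's timespec_get() around it, plus the nanoseconds-per-add conversion (1e9 × elapsed seconds / (passes × 64e6)). The loop variable names come from the posted code; the short/int element types, the function wrappers and the helper names are assumptions, since the declarations were not part of the posted excerpt.

```c
#include <stddef.h>
#include <time.h>

/* Seconds between two timespec_get() readings. */
static double elapsed_seconds(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec) + (t1->tv_nsec - t0->tv_nsec) / 1e9;
}

/* The mathsmash loop: 'repeats' passes adding 16-bit inbound_data into
 * 32-bit sum_data, timed in software instead of with a wristwatch. */
double mathsmash_seconds(const short *inbound_data, int *sum_data,
                         size_t n, int repeats)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int multiply = 0; multiply < repeats; multiply++)
        for (size_t index = 0; index < n; ++index)
            sum_data[index] += inbound_data[index];
    timespec_get(&t1, TIME_UTC);
    return elapsed_seconds(&t0, &t1);
}

/* Average nanoseconds per add: 1e9 * seconds / (repeats * n). */
double ns_per_add(double seconds, int repeats, size_t n)
{
    return 1e9 * seconds / ((double)repeats * (double)n);
}
```

At the roughly 0.25 seconds per 64M-add pass he measured, ns_per_add(0.25, 1, 64000000) comes to about 3.9 ns per add. timespec_get() with TIME_UTC reads the wall clock; on POSIX systems clock_gettime(CLOCK_MONOTONIC, ...) is the usual choice if the clock might be adjusted during a run.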
He used subscripts, not pointers, as I did. The inner loop compiles to five instructions.
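For readers comparing the two styles, here are the subscript and pointer forms of the inner loop side by side (the function names are illustrative only). With optimisation enabled, compilers often generate very similar code for both, so the choice is mostly one of taste; the five-instruction count above comes from his build and is not something this sketch reproduces.

```c
#include <stddef.h>

/* Subscript form, as in the posted loop. */
void add_subscript(const short *a, int *s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        s[i] += a[i];
}

/* Pointer-walking form of the same loop. */
void add_pointer(const short *a, int *s, size_t n)
{
    const short *end = a + n;
    while (a < end)
        *s++ += *a++;
}
```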
On a sunny day (Fri, 15 May 2009 11:27:55 -0700) it happened John Larkin wrote:
Thank you for that code, John, but unfortunately I do not have PowerBASIC. But you did mention you tried it in C. I wonder if compiling your C code with -O4 under Linux would make it faster than the PowerBASIC version, as it gains 2x speed here.
-=-=-
'$DYNAMIC
DIM a(64000000) AS SHORT
DIM s(64000000) AS INTEGER
DIM i AS INTEGER

Start! = TIMER
FOR i = 0 TO 63999999
    s(i) += a(i)
NEXT

PRINT USING "One pass in ##.### seconds."; TIMER - Start!;
-=-=-
Typical output:
E:\PROGRA~1\FreeBASIC>test
One pass in 1.439 seconds.
E:\PROGRA~1\FreeBASIC>test
One pass in 1.818 seconds.
So it offers fairly similar performance. I would suppose PowerBasic is also comparable.
Now, I could write the 16-bit assembly version and test that, too, but the 16-bit part would be kind of tricky given the dataset size. ;-) I could test a million loop iterations, but the whole thing would still fit inside the processor cache, so it's not a fair comparison.
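Whether a reduced test fits in cache comes down to its working set: n 16-bit inputs plus n 32-bit sums. A tiny helper to compute it, assuming the usual 2-byte short and 4-byte int (the function name is made up for illustration):

```c
#include <stddef.h>

/* Bytes touched by one pass: n shorts read plus n ints read and written. */
size_t working_set_bytes(size_t n)
{
    return n * (sizeof(short) + sizeof(int));
}
```

With those sizes, 1,000,000 iterations touch about 6 MB and the full 64,000,000 about 384 MB; how that compares with the cache of a given CPU depends on the part.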
On a sunny day (Fri, 15 May 2009 12:09:38 -0700 (PDT)) it happened Tim Williams wrote:
Ah, a free BASIC! I did a Google search and downloaded the Linux version. But:

test9.bas(1) error 135: Only valid in -lang deprecated or fblite or qb, found 'DYNAMIC' in ''$DYNAMIC'
test9.bas(2) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(3) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(6) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'Start! = TIMER'
test9.bas(7) error 12: Expected 'TO' in 'FOR i = 0 TO 63999999'
test9.bas(7) error 3: Expected End-of-Line, found 'TO' in 'FOR i = 0 TO 63999999'
test9.bas(8) error 3: Expected End-of-Line, found 'a' in 's(i) += a(i)'
test9.bas(11) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'PRINT USING "One pass in ##.### seconds."; TIMER - Start!;'
On a sunny day (Fri, 15 May 2009 13:02:18 -0700) it happened John Larkin wrote:
From 'man gcc' -O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.
-O4 seems to have disappeared from my man gcc, but gcc accepts it nevertheless. IIRC it was the most aggressive speed optimisation level known, but I could be wrong. I used gcc-4.0, but also have 2.95 and 3.3 on this system; some things may have changed without the manual being updated... It does exist, people talk about it too.
No, your code (or rather his code), the one with the 10x loop.
So that is on a Celeron 900 MHz with 512 MB RAM, but this Celeron is clocked down to 670 MHz or so on the eeePC to save power... So if you multiply 670 * (7 / 2.2) you get 2.13 GHz, if the processor worked the same. I think that is in about the same ballpark as yours. I must admit, pretty good for PowerBASIC.
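The comparison above is a clock-equivalence estimate: clock × (my time / your time). A small sketch of that arithmetic, taking 7 and 2.2 to be the two measured run times in seconds (an assumption, since the quoted timings are partly snipped) and assuming run time scales inversely with clock frequency; the function name is hypothetical:

```c
/* Clock a CPU would need to match a faster run, assuming run time
 * scales inversely with clock frequency. */
double equivalent_clock_mhz(double clock_mhz, double my_seconds, double their_seconds)
{
    return clock_mhz * my_seconds / their_seconds;
}
```

For example, equivalent_clock_mhz(670.0, 7.0, 2.2) gives roughly 2132 MHz, the 2.13 GHz quoted above.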
ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here.
All logos and trade names are the property of their respective owners.