Larkin, Power BASIC cannot be THAT good:

The run time in C is 13 seconds here on a 1GHz processor. Can you specify your 'old HP computer' ?

I can win maybe 1 second by writing the code a bit differently. And a 3GHz would do it in 12 / 3 = 4 seconds... A bigger cache would help a bit perhaps. A Cray would be even better.

What does your C code look like? Mine is in the other posting.

Otherwise you goofed by a factor of 10.

Seems to me anyways :-)

Reply to
Jan Panteltje

I just tried in Matlab, on a 2GHz core2-duo with 2GB

with 32bit signed ints: ~2.5 seconds
with 16bit signed ints: ~1.0 second
with 64bit floats: ~4.0 seconds

-Lasse

Reply to
langwadt

Typing the following into Open Watcom,

-=-=-

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>   /* GetTickCount() */

#define ARRAY_SIZE 64000000

int main(void)
{
    short *a;
    int *s;
    int i;
    int startTime, endTime;

    a = malloc(ARRAY_SIZE * sizeof(short));
    s = malloc(ARRAY_SIZE * sizeof(int));
    if (a == NULL || s == NULL) {
        printf("Memory allocation failed.\n");
        return -1;
    }

    printf("Starting...\n");
    startTime = GetTickCount();

    /* One pass: add each 16-bit input into its 32-bit sum. */
    for (i = 0; i < ARRAY_SIZE; i++) {
        s[i] += a[i];
    }

    endTime = GetTickCount();

    printf("Total time taken adding %i array entries: %f seconds.\n",
           ARRAY_SIZE, ((float)(endTime - startTime)) / 1000);

    free(a);
    free(s);

    return 0;
}

-=-=-

and saving as test.c and compiling, I get the typical output:

-=-=-
E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.453000 seconds.

E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.546000 seconds.

-=-=-

My computer is an Athlon 2500 at 1.66GHz, 1.1GB PC133 RAM (currently 472MB free, so no problems allocating the test), running Windows XP SP2. Basically state-of-the-art way back in the year 2001. If your computers are taking more than a couple of seconds, either your compiler really sucks or your computers suck even more. :-)

Tim

Reply to
Tim Williams

On a sunny day (Fri, 15 May 2009 08:35:25 -0700 (PDT)) it happened "snipped-for-privacy@fonz.dk" wrote:

Yes, what I think happens is that those Core 2 Duos execute those instructions a lot faster than my Celeron or whatever it is, so that would gain another 200%, so Larkin's 'old HP' must be a 3 GHz core?

Maybe I should upgrade to a more recent processor, but luckily I do not need to add 64M integers :-)

Reply to
Jan Panteltje

On a sunny day (Fri, 15 May 2009 09:04:43 -0700 (PDT)) it happened Tim Williams wrote:

Tim, you forgot that I was running >2 loops< inside each other, as Larkin's original post mentions:

for(j = 0; j < 10; j++)
{
    for(i = 0; i < BIG_SIZE; i++)
    {
        mem[i] += b[i];
    }
}

So multiply your result by 10, and you get 15 seconds, even slower than me on the eeePC with 512 MB RAM and a 900 MHz Celeron in Linux with gcc-4.0 :-) Sorry 'bout that ;-)

Reply to
Jan Panteltje

Just tried Tim's code (with another loop) compiled with Visual C++ 2008 Express Edition and run on a laptop with a 1.7GHz T2250 and 1GB RAM:

2.23s

Reply to
IanM

On a sunny day (Fri, 15 May 2009 17:37:47 +0100) it happened "IanM" wrote:

Tim's code does not have the 10 x outside loop, so that makes it 22.3 seconds.

Reply to
Jan Panteltje

The only way I can get it to run that slow on my 3GHz old P4 HT chip is with full debugging enabled and the optimiser completely disabled in MS C++ Win32 console environment. I would hope for nearly an order of magnitude faster using SSE.

NB You really should initialise the arrays first.
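A minimal sketch of what initialising first could look like, assuming the same random-ish fill pattern as the PowerBasic listing later in this thread (index AND 32767). The names `init_arrays` and `timed_sum` are made up for illustration, and `clock()` reports CPU time rather than wall-clock time. Memory fresh from `malloc()` holds indeterminate values, so without a fill the sums cannot be checked; whether the first touch of each page also shows up in the timed loop depends on the OS and is not measured here.

```c
#include <stddef.h>
#include <time.h>

/* Fill the input with index AND 32767 and zero the sums, so that every
   element has been written once before any timing starts. */
void init_arrays(short *a, int *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        a[i] = (short)(i & 32767);
        s[i] = 0;
    }
}

/* Add the input array into the sum array `passes` times.
   Returns the CPU time spent in the loops, in seconds. */
double timed_sum(const short *a, int *s, size_t n, int passes)
{
    clock_t start = clock();
    int p;
    size_t i;

    for (p = 0; p < passes; p++) {
        for (i = 0; i < n; i++) {
            s[i] += a[i];
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With the arrays filled this way, element 3 should hold 30 after ten passes, which gives a quick sanity check that the loop really ran.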

Regards, Martin Brown

Reply to
Martin Brown

"(with another loop)" meant I added the 10 * loop, so it is still 2.23s.

Reply to
IanM

On a sunny day (Fri, 15 May 2009 18:19:18 +0100) it happened Martin Brown wrote:

Cool, I just tried with gcc -O4 and it runs in 0.56 of the time :-)

Initialising the arrays will not affect the timing of the loop.

Reply to
Jan Panteltje

On a sunny day (Fri, 15 May 2009 18:33:11 +0100) it happened "IanM" wrote:

Yea, I am down to 7 seconds now compiling with -O4 on my eeePC.

Reply to
Jan Panteltje

Here's my PowerBasic code:

===================================================

#COMPILE EXE

' SUM.BAS
' TRY SUMMING A LOT OF INTS INTO AN ARRAY OF LONGS...

' JL MAY 14, 2009 PBCC4

FUNCTION PBMAIN () AS LONG

COLOR 15,9
CLS

DIM A(64000000) AS INTEGER ' INPUT ADC SAMPLES
DIM S(64000000) AS LONG ' SUMMING ARRAY
DIM X AS LONG
DIM Y AS LONG
DIM Z AS LONG

' INIT INPUT ARRAY TO RANDOM-ISH VALUES...

FOR X = 1 TO 64000000 ' THIS IS MUCH FASTER
A(X) = X AND 32767 ' THAN CALLING RND()!
NEXT

T! = TIMER

PRINT "Start... ";

FOR Y = 1 TO 10

FOR X = 1 TO 64000000
S(X) = S(X) + A(X)
NEXT

NEXT

PRINT "Done"

E! = TIMER - T!
PRINT USING$("Time per loop ##.### sec ##.## ns/add", E!/10, 1E9*E!/(10*64E6))
PRINT

' DISPLAY SOME RESULTS TO MAKE SURE IT REALLY WORKED...

FOR X = 1 TO 10
PRINT X, A(X), S(X)
NEXT

PRINT

FOR X = 63999001 TO 63999010
PRINT X, A(X), S(X)
NEXT

INPUT A$

END FUNCTION

===================================================

On my computer, a 1.9 GHz Xeon with 2G ram, winXP, I get this result...

Start... Done
Time per loop 0.231 sec 3.61 ns/add

1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90
10 10 100

63999001 3097 30970
63999002 3098 30980
63999003 3099 30990
63999004 3100 31000
63999005 3101 31010
63999006 3102 31020
63999007 3103 31030
63999008 3104 31040
63999009 3105 31050
63999010 3106 31060

===================================================
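The two figures on the "Time per loop" line both follow from the total elapsed time. A small C sketch of the same arithmetic; the function names are hypothetical, and the 2.31 s total used in the usage note is inferred from the reported 0.231 s per loop times 10 loops rather than being a separately reported number.

```c
/* Seconds taken by one sweep over the arrays, given the total elapsed
   time for `passes` sweeps. */
double seconds_per_pass(double elapsed_s, int passes)
{
    return elapsed_s / passes;
}

/* Nanoseconds per element-add, given the total elapsed time for
   `passes` sweeps over `n` elements. */
double ns_per_add(double elapsed_s, int passes, double n)
{
    return 1e9 * elapsed_s / (passes * n);
}
```

Calling `ns_per_add(2.31, 10, 64e6)` reproduces the roughly 3.61 ns/add shown in the output.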

One of my guys did a C version (I refuse to program in C) to run on the Kontron under Linux, a slightly slower CPU, 2G ram. I asked him for his source code, and he spent about a half hour cleaning it up to be presentable... which I asked him NOT to do. Anyhow, here it is:

/*
 * mathsmash.c - a VERY crude benchmark
 *
 * time the sum of 64-million 16-bit integers into 64-million 32-bit integer sums.
 *
 * gcc -O3 mathsmash.c -o mathsmash.o
 *
 * NOTE: The loop is performed 10 times to make the measurement duration more reasonable.
 *
 * Timing is done by observation or including the system("date") functions.
 *
 */

#define SIXTYFOURMILLION (0x100000 * 64)
#define DATA_ARRAY_SIZE SIXTYFOURMILLION

#include <stdio.h>
#include <stdlib.h>   /* malloc(), system() */

int main()
{
    unsigned short *inbound_data;
    unsigned int *sum_data;
    int multiply;
    unsigned long index = 0;

#if 0
    /* Initialize data */
    printf ("Zeroing data\n");
#endif

    inbound_data = (unsigned short *) malloc (sizeof ( short ) * DATA_ARRAY_SIZE);
    sum_data = (unsigned *) malloc ((sizeof ( int )) * DATA_ARRAY_SIZE);

    /* %p with a void * cast is the portable way to print a pointer */
    printf ("inb_ptr = %p, sum_ptr= %p\n", (void *) inbound_data, (void *) sum_data);

    printf ("\n START sum operation...\n");

    // system ("date");

    for (multiply = 0; multiply < 10; multiply ++) // 10 x
    {
        for ( index = 0; index < DATA_ARRAY_SIZE; ++index )
            sum_data[index] += inbound_data[index];
    }

    printf ("\n END sum operation...\n");

    // system ("date");

    return 0;
}

===================================================

He commented out the system date things because they're buggy or something, and timed it with his wristwatch at about 0.25 seconds per 64M add, about the same as the PowerBasic.

He used subscripts, not pointers, as I did. The inner loop compiles to five instructions.
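For anyone unsure of the distinction, here is a minimal sketch of the same accumulation written both ways in C. The function names are hypothetical. Whether a given compiler emits the same inner loop for both forms depends on the compiler and its flags, and that is not measured here.

```c
#include <stddef.h>

/* Subscript form: index both arrays with the same counter. */
void sum_indexed(unsigned int *sum, const unsigned short *in, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        sum[i] += in[i];
    }
}

/* Pointer form: walk both arrays with incrementing pointers. */
void sum_pointers(unsigned int *sum, const unsigned short *in, size_t n)
{
    const unsigned short *end = in + n;
    while (in != end) {
        *sum++ += *in++;
    }
}
```

Both functions leave identical results in the sum array, so either can be dropped into the timing loop for comparison.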

My program is prettier.

John

Reply to
John Larkin

On a sunny day (Fri, 15 May 2009 11:27:55 -0700) it happened John Larkin wrote:


Thank you for that code, John, but unfortunately I do not have Power BASIC. But you did mention you tried it in C. I wonder if compiling your C code with -O4 in Linux would make it faster than the Power BASIC version, as it gains 2x speed here.

Reply to
Jan Panteltje

Forget my remark about -O4, runs the same with your C code as -O3, about 7 seconds on my eeePC.

Reply to
Jan Panteltje

In other languages--

Compiled in FreeBASIC version 0.17b.

-=-=-
'$DYNAMIC
DIM a(64000000) AS SHORT
DIM s(64000000) AS INTEGER
DIM i AS INTEGER

Start! = TIMER
FOR i = 0 TO 63999999
s(i) += a(i)
NEXT

PRINT USING "One pass in ##.### seconds."; TIMER - Start!;

-=-=-

Typical output:

E:\PROGRA~1\FreeBASIC>test
One pass in 1.439 seconds.
E:\PROGRA~1\FreeBASIC>test
One pass in 1.818 seconds.

So it offers fairly similar performance. I would suppose PowerBasic is also comparable.

Now, I could write the 16 bit assembly version and test that, too, but the 16 bit part would be kind of tricky given the dataset size. ;-) I could test a million loop iterations, but the whole thing would still fit inside processor cache, so it's not a fair comparison.

Tim

Reply to
Tim Williams

On a sunny day (Fri, 15 May 2009 12:09:38 -0700 (PDT)) it happened Tim Williams wrote:

Ah, a free-basic! I did a google, and downloaded the Linux version. But:

test9.bas(1) error 135: Only valid in -lang deprecated or fblite or qb, found 'DYNAMIC' in ''$DYNAMIC'
test9.bas(2) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(3) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(6) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'Start! =3D TIMER'
test9.bas(7) error 12: Expected 'TO' in 'FOR i =3D 0 TO 63999999'
test9.bas(7) error 3: Expected End-of-Line, found 'TO' in 'FOR i =3D 0 TO 63999999'
test9.bas(8) error 3: Expected End-of-Line, found 'a' in 's(i) +=3D a(i)'
test9.bas(11) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'PRINT USING "One pass in ##.### seconds."; TIMER - Start!;'

Not enough memory (385 MB) on this PC.

Nice BASIC anyways.

DIM i AS INTEGER

FOR i = 0 TO 10
print "HELLO"
NEXT

grml: ~ # fbc test10.bas

grml: ~ # ./test10
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO

LOL I once had some BASIC programs for inductors and stuff... Maybe all gone, CP/M and Sinclair times..

WOW! Have not used BASIC in ages...

Reply to
Jan Panteltje


He used -O3, whatever that means.

Is that 7 seconds to add the array once, or for 10 times?

John

Reply to
John Larkin


I suspect this is all limited by the system memory bandwidth.

So that as long as the data sizes are the same, everything that is at all efficient will produce the same results.
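A rough way to put a number on that: each add reads one input element, then reads and writes one sum. The sketch below assumes 2-byte inputs and 4-byte sums, ignores caches and any extra traffic the hardware generates, and takes the 3.61 ns/add figure reported for the PowerBasic run earlier in the thread. The function names are hypothetical.

```c
/* Bytes moved per add: one read of the input element, plus one read
   and one write of the sum element. */
double bytes_per_add(double input_size, double sum_size)
{
    return input_size + 2.0 * sum_size;
}

/* Effective memory traffic in GB/s (1 GB = 1e9 bytes).
   Bytes per nanosecond is numerically the same as GB per second. */
double bandwidth_gb_per_s(double bytes, double ns)
{
    return bytes / ns;
}
```

Under those assumptions, `bandwidth_gb_per_s(bytes_per_add(2, 4), 3.61)` comes out at roughly 2.8 GB/s of sustained traffic, which is the figure to compare against a machine's memory bandwidth.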

--

John Devereux
Reply to
John Devereux

On a sunny day (Fri, 15 May 2009 13:02:18 -0700) it happened John Larkin wrote:


From 'man gcc':

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.

-O4 seems to have disappeared from my man gcc, but gcc accepts it nevertheless. IIRC that was the most severe speed optimisation known, but I could be wrong. I used gcc-4.0, but also have 2.95 and 3.3 on this system; some things may have changed and not the manual... It does exist, and people talk about it too.

No, that is your code (or rather his code), with the 10 x loop.

So, on a Celeron 900 MHz with 512 MB RAM, but this Celeron is clocked down to 670 MHz or something like that on the eeePC to save power.... So if you multiply, 670 MHz * (7 / 2.2) = 2.13 GHz if the processor worked the same. I think that is in about the same ballpark as you have. I must admit, pretty good for Power BASIC.
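The estimate above as a sketch in C, assuming run time scales inversely with clock speed and nothing else differs between the machines (an assumption that the memory-bandwidth comments in this thread call into question). `equivalent_clock_mhz` is a hypothetical name.

```c
/* Clock speed a processor of the same design would need in order to
   finish in t_fast_s what currently takes t_slow_s at clock_mhz. */
double equivalent_clock_mhz(double clock_mhz, double t_slow_s, double t_fast_s)
{
    return clock_mhz * t_slow_s / t_fast_s;
}
```

With 670 MHz, 7 seconds and 2.2 seconds this gives about 2130 MHz, the 2.13 GHz quoted above.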
Reply to
Jan Panteltje

For adding arrays, memory bandwidth will be the dominant factor. The ALU will spend most of its time idle, waiting upon memory I/O.

And if you don't have 512MB of RAM (64M * 2 * 4), then you're going to be swapping, which will totally kill performance.
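The footprint is easy to check for a given pair of element sizes. A sketch, assuming 64,000,000 elements as in the listings in this thread; `footprint_bytes` is a hypothetical helper. With a 2-byte input array and a 4-byte sum array it gives 384,000,000 bytes, while 512,000,000 bytes is what two 4-byte arrays would need.

```c
/* Total bytes needed for an input array and a sum array of n elements
   each, given the size in bytes of one element of each array. */
unsigned long long footprint_bytes(unsigned long long n,
                                   unsigned long long input_size,
                                   unsigned long long sum_size)
{
    return n * (input_size + sum_size);
}
```

Comparing the result against the free RAM reported by the OS shows whether a run is likely to start swapping.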

Reply to
Nobody
