Larkin, Power BASIC cannot be THAT good:

Jan Panteltje 17 years ago

The run time in C is 13 seconds here on a 1GHz processor. Can you specify your 'old HP computer'?

I can gain maybe 1 second by writing the code a bit differently. And a 3GHz would do it in 12 / 3 = 4 seconds... A bigger cache would help a bit, perhaps. A Cray would be even better.

What does your C code look like? Mine is in the other posting.

Otherwise you goofed by a factor of 10.

Seems to me anyway :-)

langwadt 17 years ago

I just tried it in Matlab, on a 2GHz Core 2 Duo with 2GB:

with 32-bit signed ints: ~2.5 seconds
with 16-bit signed ints: ~1.0 second
with 64-bit floats: ~4.0 seconds

-Lasse

Tim Williams 17 years ago

Typing the following into Open Watcom,

-=-=-

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>   /* GetTickCount() */

#define ARRAY_SIZE 64000000

int main(void) {
    short *a;
    int *s;
    int i;
    int startTime, endTime;

    a = malloc(ARRAY_SIZE * sizeof(short));
    s = malloc(ARRAY_SIZE * sizeof(int));
    if (a == NULL || s == NULL) {
        printf("Memory allocation failed.\n");
        return -1;
    }

    printf("Starting...\n");
    startTime = GetTickCount();

    for (i = 0; i < ARRAY_SIZE; i++) {
        s[i] += a[i];
    }

    endTime = GetTickCount();

    printf("Total time taken adding %i array entries: %f seconds.\n",
           ARRAY_SIZE, ((float)(endTime - startTime)) / 1000);

    free(a);
    free(s);

    return 0;
}

-=-=-

and saving as test.c and compiling, I get the typical output:

-=-=-
E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.453000 seconds.

E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.546000 seconds.
-=-=-

My computer is an Athlon 2500 at 1.66GHz with 1.1GB of PC133 RAM (currently 472MB free, so no problems allocating the test), running Windows XP SP2. Basically state of the art way back in 2001. If your computers are taking more than a couple of seconds, either your compiler really sucks or your computers suck even more. :-)

Tim

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 08:35:25 -0700 (PDT)) it happened "snipped-for-privacy@fonz.dk" wrote:

Yes, what I think happens is that those Core 2 Duos execute those instructions a lot faster than my Celeron, or whatever it is, so that would gain another 200%. So Larkin's 'old' HP must be a 3 GHz Core?

Maybe I should upgrade to a more recent processor, but luckily I do not need to add 64M integers :-)

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 09:04:43 -0700 (PDT)) it happened Tim Williams wrote:

Tim, you forgot that I was running two nested loops, as Larkin's original post mentions:

for (j = 0; j < 10; j++) {
    for (i = 0; i < BIG_SIZE; i++) {
        mem[i] += b[i];
    }
}

So multiply your result by 10 and you get 15 seconds, even slower than me on the eeePC with 512 MB RAM and a 900 MHz Celeron in Linux with gcc-4.0 :-) Sorry 'bout that ;-)

IanM 17 years ago

Just tried Tim's code (with another loop), compiled with Visual C++ 2008 Express Edition and run on a laptop with a 1.7GHz T2250 and 1GB RAM:

2.23s

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 17:37:47 +0100) it happened "IanM" wrote:

Tim's code does not have the 10x outer loop, so that makes it 22.3 seconds.

Martin Brown 17 years ago

The only way I can get it to run that slowly on my old 3GHz P4 HT chip is with full debugging enabled and the optimiser completely disabled in the MS C++ Win32 console environment. I would hope for nearly an order of magnitude faster using SSE.

NB You really should initialise the arrays first.

Regards, Martin Brown
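A minimal C sketch of both of Martin's suggestions, under stated assumptions: the input is initialised the same way Larkin's SUM.BAS does it (value = index AND 32767) with the sums zeroed, and the add loop uses SSE2 intrinsics to handle eight 16-bit samples per iteration, falling back to plain C when __SSE2__ is not defined. The function names init_arrays and add_i16_to_i32 are hypothetical. Whether SSE gets anywhere near an order of magnitude here depends on whether the loop is compute-bound or memory-bound, which later posts in the thread question.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: deterministic input (same pattern as SUM.BAS's
 * "X AND 32767") and zeroed sums, so nothing reads uninitialised memory. */
void init_arrays(int16_t *a, int32_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = (int16_t)(i & 32767);
        s[i] = 0;
    }
}

/* Hypothetical helper: s[i] += a[i], eight elements per iteration
 * when SSE2 is available. */
void add_i16_to_i32(int32_t *s, const int16_t *a, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        /* Sign-extend 16 -> 32 bits: pair each value with itself, then
         * shift every 32-bit lane right arithmetically by 16. */
        __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(v, v), 16);
        __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(v, v), 16);
        __m128i s0 = _mm_loadu_si128((const __m128i *)(s + i));
        __m128i s1 = _mm_loadu_si128((const __m128i *)(s + i + 4));
        _mm_storeu_si128((__m128i *)(s + i), _mm_add_epi32(s0, lo));
        _mm_storeu_si128((__m128i *)(s + i + 4), _mm_add_epi32(s1, hi));
    }
#endif
    /* Scalar tail, or the whole array when SSE2 is not available. */
    for (; i < n; i++) {
        s[i] += a[i];
    }
}
```

Call init_arrays once, outside the timed region, then time repeated calls to add_i16_to_i32. The SSE2 path uses unaligned loads and stores, so it makes no assumption about how malloc aligned the arrays.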

IanM 17 years ago

By "with another loop" I meant that I added the 10x loop, so it's still 2.23s.

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 18:19:18 +0100) it happened Martin Brown wrote:

Cool, I just tried gcc -O4 and it runs in 0.56 of the time :-)

That will not affect timing of the loop.

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 18:33:11 +0100) it happened "IanM" wrote:

Yeah, I am down to 7 seconds now, compiling with -O4 on my eeePC.

John Larkin 17 years ago

Here's my PowerBasic code:

===================================================
#COMPILE EXE

' SUM.BAS
' TRY SUMMING A LOT OF INTS INTO AN ARRAY OF LONGS...

' JL MAY 14, 2009  PBCC4

FUNCTION PBMAIN () AS LONG

    COLOR 15,9
    CLS

    DIM A(64000000) AS INTEGER      ' INPUT ADC SAMPLES
    DIM S(64000000) AS LONG         ' SUMMING ARRAY
    DIM X AS LONG
    DIM Y AS LONG
    DIM Z AS LONG

    ' INIT INPUT ARRAY TO RANDOM-ISH VALUES...

    FOR X = 1 TO 64000000           ' THIS IS MUCH FASTER
        A(X) = X AND 32767          ' THAN CALLING RND()!
    NEXT

    T! = TIMER

    PRINT "Start... ";

    FOR Y = 1 TO 10
        FOR X = 1 TO 64000000
            S(X) = S(X) + A(X)
        NEXT
    NEXT

    PRINT "Done"

    E! = TIMER - T!
    PRINT USING$("Time per loop ##.### sec ##.## ns/add", E!/10, 1E9*E!/(10*64E6))
    PRINT

    ' DISPLAY SOME RESULTS TO MAKE SURE IT REALLY WORKED...

    FOR X = 1 TO 10
        PRINT X, A(X), S(X)
    NEXT

    PRINT

    FOR X = 63999001 TO 63999010
        PRINT X, A(X), S(X)
    NEXT

    INPUT A$

END FUNCTION

===================================================

On my computer, a 1.9 GHz Xeon with 2G ram, winXP, I get this result...

Start... Done
Time per loop 0.231 sec 3.61 ns/add

1          1        10
2          2        20
3          3        30
4          4        40
5          5        50
6          6        60
7          7        70
8          8        80
9          9        90
10         10       100

63999001   3097     30970
63999002   3098     30980
63999003   3099     30990
63999004   3100     31000
63999005   3101     31010
63999006   3102     31020
63999007   3103     31030
63999008   3104     31040
63999009   3105     31050
63999010   3106     31060

===================================================
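The PRINT USING$ line above divides the total elapsed time E for all ten passes by the number of passes, and by passes times elements for the per-add figure. The same arithmetic as a small C sketch (function names hypothetical) makes results with and without the 10x outer loop directly comparable:

```c
/* Hypothetical helpers reproducing SUM.BAS's printout.
 * total_seconds is the elapsed time E for all passes. */
double seconds_per_pass(double total_seconds, double passes)
{
    return total_seconds / passes;
}

/* 1E9 * E / (passes * elements), as in SUM.BAS. */
double ns_per_add(double total_seconds, double passes, double elements)
{
    return 1e9 * total_seconds / (passes * elements);
}
```

With E = 2.31 s for 10 passes of 64,000,000 elements this gives 0.231 s per pass and about 3.61 ns/add, matching the printout. By the same formula, Tim's single 1.453 s pass over 64,000,000 elements works out to roughly 22.7 ns/add.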

One of my guys did a C version (I refuse to program in C) to run on the Kontron under Linux, a slightly slower CPU, 2G ram. I asked him for his source code, and he spent about a half hour cleaning it up to be presentable... which I asked him NOT to do. Anyhow, here it is:

/*
 * mathsmash.c - a VERY crude benchmark
 *
 * time the sum of 64-million 16-bit integers into 64-million
 * 32-bit integer sums.
 *
 * gcc -O3 mathsmash.c -o mathsmash.o
 *
 * NOTE: The loop is performed 10 times to make the measurement
 * duration more reasonable.
 *
 * Timing is done by observation or including the system("date")
 * functions.
 */

#define SIXTYFOURMILLION (0x100000 * 64)
#define DATA_ARRAY_SIZE SIXTYFOURMILLION

#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned short *inbound_data;
    unsigned int *sum_data;
    int multiply;
    unsigned long index = 0;

#if 0
    /* Initialize data */
    printf ("Zeroing data\n");
#endif

    inbound_data = (unsigned short *) malloc (sizeof ( short ) * DATA_ARRAY_SIZE);
    sum_data = (unsigned *) malloc ((sizeof ( int )) * DATA_ARRAY_SIZE);
    if (inbound_data == NULL || sum_data == NULL) {
        printf ("malloc failed\n");
        return 1;
    }

    /* %p is the portable way to print pointers */
    printf ("inb_ptr = %p, sum_ptr = %p\n", (void *) inbound_data, (void *) sum_data);

    printf ("\n START sum operation...\n");

    // system ("date");

    for (multiply = 0; multiply < 10; multiply ++) // 10 x
    {
        for ( index = 0; index < DATA_ARRAY_SIZE; ++index )
            sum_data[index] += inbound_data[index];
    }

    printf ("\n END sum operation...\n");

    // system ("date");

    free (inbound_data);
    free (sum_data);
    return 0;
}

===================================================

He commented out the system date things because they're buggy or something, and timed it with his wristwatch at about 0.25 seconds per 64M add, about the same as the PowerBasic.

He used subscripts, not pointers, as I did. The inner loop compiles to five instructions.

My program is prettier.

John
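For readers who haven't met the subscripts-versus-pointers distinction, here are the two loop forms side by side as a C sketch (hypothetical function names, unsigned types as in mathsmash.c). An optimising compiler will often generate much the same code for both, so neither form is assumed here to be faster.

```c
#include <stddef.h>
#include <stdint.h>

/* Subscript form, as used in mathsmash.c and SUM.BAS. */
void add_indexed(uint32_t *sum, const uint16_t *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sum[i] += in[i];
    }
}

/* Pointer-walking form of the same loop: advance both pointers
 * instead of indexing from the array bases. */
void add_pointers(uint32_t *sum, const uint16_t *in, size_t n)
{
    const uint16_t *end = in + n;
    while (in < end) {
        *sum++ += *in++;
    }
}
```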

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 11:27:55 -0700) it happened John Larkin wrote:

Thank you for that code, John, but unfortunately I do not have PowerBASIC. But you did mention you tried it in C. I wonder whether compiling your C code with -O4 in Linux would make it faster than the PowerBASIC version, as it gains 2x speed here.

Jan Panteltje 17 years ago

Forget my remark about -O4: your C code runs the same with -O4 as with -O3, about 7 seconds on my eeePC.

Tim Williams 17 years ago

In other languages:

Compiled in FreeBASIC version 0.17b.

-=-=-
'$DYNAMIC
DIM a(64000000) AS SHORT
DIM s(64000000) AS INTEGER
DIM i AS INTEGER

Start! = TIMER
FOR i = 0 TO 63999999
    s(i) += a(i)
NEXT

PRINT USING "One pass in ##.### seconds."; TIMER - Start!;
-=-=-

Typical output:

E:\PROGRA~1\FreeBASIC>test
One pass in 1.439 seconds.
E:\PROGRA~1\FreeBASIC>test
One pass in 1.818 seconds.

So it offers fairly similar performance. I would suppose PowerBasic is also comparable.

Now, I could write the 16-bit assembly version and test that too, but the 16-bit part would be kind of tricky given the dataset size. ;-) I could test a million loop iterations, but the whole thing would still fit inside the processor cache, so it's not a fair comparison.

Tim
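One way to separate the cache effect Tim mentions from main-memory speed is to time the same s[i] += a[i] loop at several array sizes while keeping the total number of additions roughly fixed. A minimal C sketch, assuming clock() from <time.h> is a good enough timer for this; the function name ns_per_add_for_size and its checksum out-parameter (there so the compiler cannot discard the work as dead code) are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: average nanoseconds per add for arrays of n
 * elements, repeating the pass so that about total_adds additions are
 * timed. *checksum receives the sum of all s[i] afterwards.
 * Keep passes * 32767 below INT32_MAX to avoid overflowing the sums.
 * Returns -1.0 on bad input or allocation failure. */
double ns_per_add_for_size(size_t n, size_t total_adds, long long *checksum)
{
    if (n == 0) {
        return -1.0;
    }
    int16_t *a = malloc(n * sizeof *a);
    int32_t *s = calloc(n, sizeof *s);     /* sums start at zero */
    if (a == NULL || s == NULL) {
        free(a);
        free(s);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = (int16_t)(i & 32767);       /* same pattern as SUM.BAS */
    }

    size_t passes = total_adds / n;
    if (passes == 0) {
        passes = 1;
    }

    clock_t start = clock();
    for (size_t p = 0; p < passes; p++) {
        for (size_t i = 0; i < n; i++) {
            s[i] += a[i];
        }
    }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += s[i];
    }
    *checksum = sum;

    free(a);
    free(s);
    return 1e9 * secs / ((double)passes * (double)n);
}
```

Comparing, say, ns_per_add_for_size(64000, 640000000, &chk) with ns_per_add_for_size(64000000, 640000000, &chk) should show whether the small case really runs from cache on a given machine; no particular numbers are implied here.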

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 12:09:38 -0700 (PDT)) it happened Tim Williams wrote:

Ah, a FreeBASIC! I did a google and downloaded the Linux version. But:

test9.bas(1) error 135: Only valid in -lang deprecated or fblite or qb, found 'DYNAMIC' in ''$DYNAMIC'
test9.bas(2) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(3) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(6) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'Start! = TIMER'
test9.bas(7) error 12: Expected 'TO' in 'FOR i = 0 TO 63999999'
test9.bas(7) error 3: Expected End-of-Line, found 'TO' in 'FOR i = 0 TO 63999999'
test9.bas(8) error 3: Expected End-of-Line, found 'a' in 's(i) += a(i)'
test9.bas(11) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'PRINT USING "One pass in ##.### seconds."; TIMER - Start!;'

Not enough memory (385 MB) on this PC.

Nice BASIC anyway.

DIM i AS INTEGER

FOR i = 0 TO 10
    print "HELLO"
NEXT

grml: ~ # fbc test10.bas

grml: ~ # ./test10
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO

LOL, I once had some BASIC programs for inductors and stuff... Maybe all gone, from the CP/M and Sinclair days.

WOW! Have not used BASIC in ages...

John Larkin 17 years ago

He used -O3, whatever that means.

Is that 7 seconds to add the array once, or for 10 times?

John

John Devereux 17 years ago

I suspect this is all limited by the system memory bandwidth.

So as long as the data sizes are the same, everything that is at all efficient will produce the same results.

John Devereux
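Putting a number on the memory-bandwidth argument: each s[i] += a[i] reads a 2-byte input and a 4-byte sum and writes the 4-byte sum back, so roughly 10 bytes cross the memory bus per element, assuming no cache reuse across arrays this large. A C sketch of that arithmetic (function names hypothetical):

```c
#include <stddef.h>

/* Bytes moved per element of s[i] += a[i]: read the input, read the
 * sum, write the sum back. Assumes each byte crosses the bus once. */
double bytes_per_element(size_t in_size, size_t sum_size)
{
    return (double)in_size + 2.0 * (double)sum_size;
}

/* Implied memory bandwidth, in bytes per second, for one pass. */
double implied_bandwidth(double elements, double seconds_per_pass,
                         size_t in_size, size_t sum_size)
{
    return elements * bytes_per_element(in_size, sum_size) / seconds_per_pass;
}
```

Larkin's 0.231 s per pass over 64,000,000 elements implies about 640 MB / 0.231 s, or roughly 2.8 GB/s, which is the figure to compare against the memory system's rating rather than against the CPU clock.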

Jan Panteltje 17 years ago

On a sunny day (Fri, 15 May 2009 13:02:18 -0700) it happened John Larkin wrote:

From 'man gcc': -O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.

-O4 seems to have disappeared from my man gcc, but gcc accepts it nevertheless. IIRC that was the most aggressive speed optimisation level, but I could be wrong. I used gcc-4.0, but also have 2.95 and 3.3 on this system; some things may have changed and not the manual... it does exist, people talk about it too.

No, that was your code, or rather his code, with the 10x loop.

So that's on a Celeron 900 MHz with 512 MB RAM, but this Celeron is clocked down to 670 MHz or so on the eeePC to save power... So if you multiply, 670 * (7 / 2.2) = 2.13 GHz, if the processor worked the same. I think that is about in the same ballpark as yours. I must admit, pretty good for PowerBASIC.
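The scaling estimate above, written out as a C sketch (hypothetical function name). It assumes run time is inversely proportional to clock frequency, which only holds if the loop is CPU-bound; the memory-bandwidth posts in this thread suggest it may not be, so treat the result as rough.

```c
/* Naive clock-scaling estimate: the clock another machine would need
 * to finish in other_seconds, if time scaled as 1 / frequency. */
double equivalent_clock_mhz(double my_mhz, double my_seconds,
                            double other_seconds)
{
    return my_mhz * (my_seconds / other_seconds);
}
```

equivalent_clock_mhz(670, 7, 2.2) gives about 2132 MHz, the 2.13 GHz quoted above.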

Nobody 17 years ago

For adding arrays, memory bandwidth will be the dominant factor. The ALU will spend most of its time idle, waiting upon memory I/O.

And if you don't have around 512MB of RAM (64M * (2 + 4) bytes is already 384MB for the two arrays, before the OS), then you're going to be swapping, which will totally kill performance.
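The footprint arithmetic as a C sketch (hypothetical function name): both arrays together need n * (2 + 4) bytes. Note that the programs in the thread size their arrays differently: SUM.BAS and Tim's code use 64,000,000 elements, while mathsmash.c uses 0x100000 * 64 = 67,108,864.

```c
#include <stdint.h>

/* Bytes needed for n 16-bit inputs plus n 32-bit sums. */
uint64_t footprint_bytes(uint64_t n)
{
    return n * (sizeof(int16_t) + sizeof(int32_t));
}
```

That gives 384,000,000 bytes for 64,000,000 elements and 402,653,184 bytes (exactly 384 MiB) for mathsmash.c, before counting the OS and other programs.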
