Larkin, Power BASIC cannot be THAT good:

On a sunny day (Wed, 20 May 2009 15:36:47 -0400) it happened Phil Hobbs wrote in :

Well, I seem to do that just fine in C. For example, the subtitle editor I was referring to has fonts, video display, an audio track and the sync between all of those, many forms, many output formats and many input formats, and I see C++ adding only more complexity. You can download the code and look at it if you like.

Sure, but if I have a function that does something, with a correctly specified interface, for example int draw_picture(struct screen *ps, struct image *pi), then I can change that function: make it faster, for example, or add more screen types, more image types, whatever.
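
A minimal C sketch of that idea, assuming hypothetical struct screen and struct image layouts (the post names the types but not their members). Callers only ever see draw_picture(), so a new screen type can be added inside the function without touching them. The grayscale conversion is a plain average, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical layouts: the post names these types but not their members. */
enum screen_kind { SCREEN_RGB24, SCREEN_GRAY8 };

struct screen {
    enum screen_kind kind;
    int width, height;
    unsigned char *pixels;      /* 3 bytes/pixel for RGB24, 1 for GRAY8 */
};

struct image {
    int width, height;
    const unsigned char *rgb;   /* packed RGB24 source pixels */
};

/* The interface callers depend on. The body can be rewritten (made
   faster, taught new screen types) without changing any caller.
   Returns 0 on success, -1 on a size mismatch or unknown screen kind. */
int draw_picture(struct screen *ps, struct image *pi)
{
    if (ps->width != pi->width || ps->height != pi->height)
        return -1;

    size_t n = (size_t)pi->width * (size_t)pi->height;

    switch (ps->kind) {
    case SCREEN_RGB24:              /* straight copy */
        for (size_t i = 0; i < 3 * n; i++)
            ps->pixels[i] = pi->rgb[i];
        return 0;
    case SCREEN_GRAY8:              /* added later; callers unaffected */
        for (size_t i = 0; i < n; i++) {
            const unsigned char *p = &pi->rgb[3 * i];
            ps->pixels[i] = (unsigned char)((p[0] + p[1] + p[2]) / 3);
        }
        return 0;
    }
    return -1;
}
```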

You talk about profiling; John Larkin would say 'solve it by adding more bloat'... Hell, if you know what you are doing, then WHERE is the problem? And if you do _not_ know what you are doing, then no extra tools will help you get the bugs out!

Exactly, same for C.

Scripts can be very powerful. I found out how powerful bash (scripting) can be when somebody wrote DVDwizard in bash... Of course he called every C program you can imagine.

Jan Panteltje

On a sunny day (Wed, 20 May 2009 21:49:22 GMT) it happened snipped-for-privacy@puntnl.niks (Nico Coesel) wrote in :

I am not sure what you mean here. What I often, if not always, do is use some global vars, namely the ones that are used all over the place, so I do not have to pass them to every function.

In the main header file, for example, int debug_flag; will make that variable available in all source files, without needing any further 'extern' declaration or whatever. In a source file you can use static int something; and it will only be known in that source file, etc.
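
A sketch of the conventional layout, with one caveat: a plain int debug_flag; in a header included by several files relies on the compiler merging those tentative definitions into a single "common" symbol. GCC 10 and later default to -fno-common, which turns that into a multiple-definition link error, so the portable pattern is an extern declaration in the header plus exactly one definition in one .c file. The file names and helper names below are hypothetical; because one code block is a single translation unit, the "files" are shown as commented sections.

```c
#include <stdio.h>

/* ---- globals.h (hypothetical shared header): declaration only ---- */
extern int debug_flag;

/* ---- main.c (exactly one source file): the one definition ---- */
int debug_flag = 0;

/* ---- graphics.c (any source file): names private to this file ---- */
static int circles_in_this_file;        /* invisible to other files */

/* Another file may define its own static draw_circle() without a
   link-time name collision. */
static void draw_circle(int radius)
{
    circles_in_this_file++;
    if (debug_flag)
        printf("draw_circle(%d)\n", radius);
}

/* Public entry points, so other files can use the private helpers. */
void draw_circles(int count, int radius)
{
    for (int i = 0; i < count; i++)
        draw_circle(radius);
}

int circles_drawn(void)
{
    return circles_in_this_file;
}
```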

Yes, that would be weird, because it would show there was no communication. But in your source file you could simply write static ... draw_circle(...), and he could do the same in his source file, with no collisions! But that communication thing I referred to is very important; it can make you laugh (or cry). That NASA spacecraft to Mars had software that checked a micro switch on the landing gear to see whether it had touched ground, so it would know when it could switch off the rocket engine. Too bad the spec did not mention that the micro switch was also activated when the gear went out.

So the gear came out, the engine stopped, and it smashed to pieces on Mars. So communicate: have meetings, discuss what you are doing. Otherwise it is a ticket to disaster.

Well, maybe I have read the wrong books?

But I do not NEED WxDinges :-)

Jan Panteltje

It gets very hard to keep everything straight at 10-100+ kloc this way, at least if your program needs to be maintained. Scars accumulate much faster than in C++.

But you have to change everything at all levels if you make a nontrivial change at the bottom--unless you have done all the abstraction correctly, which is what C++ automates. If you find you haven't, it's spaghetti city.

Don't you profile your code? There's really no other way to find out where the bottlenecks are in a nontrivial, competently coded program. If they're obvious without profiling, they shouldn't have been coded in in the first place.

There are no copy constructors in C.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal
Phil Hobbs

On a sunny day (Wed, 20 May 2009 23:45:53 +0200) it happened Frank Buss wrote in :

Ah, Sun, OK, thanks. But BusyBox is not LGPL, I think. They sued some companies for not releasing source.

Jan Panteltje

On a sunny day (Wed, 20 May 2009 18:22:22 -0400) it happened Phil Hobbs wrote in :

Like I said, this newsreader is 70k lines of C, no problem. As with any code, if I ever needed to change anything, it would take me several hours to really dig in and see what is connected to what. It is linked-list based. This digging in is important: some years ago somebody sent me a 'fix' for some function, a patch, and I applied it after a quick look, in full faith that the person had actually tested his code and understood mine; then it turned out that some things had stopped working... But this is not unique to C; it happens in any language. You can see from huge projects like the Linux kernel and its drivers that C works very well, in fact much better than C++, for projects that have many participants. I think C++ is unusable for such projects :-) You have been had.

That makes no sense to me. When I write a program (and I speak only for myself now), I start at the lowest level and test one piece at a time, working upward. But I do not start at all before I have a clear picture of what I want to do.

If you do not have a clear picture of what you want to do and start with the top-down approach, you will have to work backwards repeatedly and rewrite the code, with no end to the rewrites. An architect who does not know about bricks and concrete will be in big trouble. He will have to learn, and it is 99.999% certain that he will have to restart his dream project from scratch.

That is, with all respect (I am not picking on you), bull. When writing your program, and especially in AV applications, which are *always* time-bound and time-critical, you know very well where the processor spends most of its time. Usually it is in some codec. Often these specific parts of the codec are written in asm for maximum speed. When writing such a codec you count instruction cycles etc. on a piece of paper... work it out. Who needs a profiler if you know what you are doing? And if you need more info, a couple of print statements will tell you everything you need to know. I never, ever use a debugger either; I can read asm, and I use printf. Many years ago there was a paper from the University of Twente that argued against debuggers and for print statements in higher-level languages. For me C is a higher-level language, as is BASIC; I took that paper to heart, and it works. C++ is of course lower-level crap ;-) so nothing you can buy will help you with that :-)

It would be nice if people understood the power of linked lists. I am not even good at it; some people work miracles with them. That is worth more than the whole ++ load.
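
For readers who have not used them much, a minimal singly linked list sketch in C. The node type and function names are hypothetical and not taken from the newsreader's source. The pointer-to-pointer walk in remove_number() is one common idiom: it lets removing the head node use the same code path as removing any other node.

```c
#include <stdlib.h>

/* Hypothetical node: one article number per entry. */
struct article {
    long number;
    struct article *next;
};

/* Push at the head in O(1). Returns the new head, or the old head
   unchanged if malloc fails. */
struct article *push(struct article *head, long number)
{
    struct article *a = malloc(sizeof *a);
    if (a == NULL)
        return head;
    a->number = number;
    a->next = head;
    return a;
}

/* Unlink and free every node carrying `number`. pp always points at the
   link leading to the current node, so the head is not a special case. */
void remove_number(struct article **pp, long number)
{
    while (*pp != NULL) {
        if ((*pp)->number == number) {
            struct article *dead = *pp;
            *pp = dead->next;       /* bypass the node */
            free(dead);
        } else {
            pp = &(*pp)->next;      /* advance to the next link */
        }
    }
}

size_t list_length(const struct article *a)
{
    size_t n = 0;
    for (; a != NULL; a = a->next)
        n++;
    return n;
}

void free_all(struct article *a)
{
    while (a != NULL) {
        struct article *next = a->next;
        free(a);
        a = next;
    }
}
```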

Jan Panteltje

...

Ah, if you only knew the power of binary trees...

Michael

mrdarrett

:-)

It's easy enough to implement an OO-style class hierarchy in C. The real kicker is construction and destruction. C++ lets you ensure that an object's constructor is called before anything else uses it, and its destructor is called when it goes out of scope, whether by a return or by an exception.
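
A minimal sketch of what that looks like, with hypothetical names: a "base class" holding a pointer to a table of function pointers, a derived struct that embeds the base as its first member, and explicit constructor and destructor functions. Nothing in C forces callers to use rect_new() or to call shape_destroy(); that is exactly the bookkeeping the post says C++ automates.

```c
#include <stdlib.h>

struct shape;

/* Per-"class" table of function pointers (a hand-rolled vtable). */
struct shape_ops {
    double (*area)(const struct shape *self);
    void (*destroy)(struct shape *self);
};

struct shape {
    const struct shape_ops *ops;
};

struct rect {
    struct shape base;      /* first member, so a rect* can be used as a shape* */
    double w, h;
};

static double rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

static void rect_destroy(struct shape *self)
{
    free(self);             /* base is at offset 0, so this frees the rect */
}

static const struct shape_ops rect_ops = { rect_area, rect_destroy };

/* "Constructor": nothing guarantees it is called before the object is
   used, and nothing calls shape_destroy() when the pointer goes out of
   scope or an error path returns early. */
struct shape *rect_new(double w, double h)
{
    struct rect *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->base.ops = &rect_ops;
    r->w = w;
    r->h = h;
    return &r->base;
}

double shape_area(const struct shape *s)
{
    return s->ops->area(s);
}

void shape_destroy(struct shape *s)
{
    if (s != NULL)
        s->ops->destroy(s);
}
```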

Nobody

Clearly. C++ doesn't include any features specifically related to concurrency.

Nobody

What does the "packet" thing do? I can't find it in any of the C or C++ keyword lists.

Does it round up the struct size? The loop appears to be crunching 8-byte chunks, so two bytes are dead weight that wastes cache.

John

John Larkin

It's a user-defined data structure. Basically, it takes two integers and stores them side-by-side.

Let's pretend that we're using 32-bit ints: sizeof(int) = 4 bytes. (It takes 4 bytes to store a 32-bit value... 4x8 = 32.)

By defining a packet, we just have two ints (x and y). 8 bytes to a packet.

The idea was just to have them closer together, so that the CPU cache could access them more easily (they're side by side, vs. two arrays in random areas of memory). If the CPU wanted to fetch 256 bytes of data from memory into cache, it could get 256/8 = 32 packets at a time into the faster cache memory. But what surprises me is that this arrangement actually slows down the code. The assembly output isn't much different from Tim Williams' original; it still looks something like

    movsx ecx,word ptr [somewhere]
    add   dword ptr [somewhere_else],ecx

My assembly language experience ends somewhere around the 386, so cache hits and misses are beyond my area of expertise. Sorry. But I thought it might shine a light on a path worth exploring.
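
To make the two layouts concrete, here is a C sketch of both, assuming (from the remarks about padding in this thread) a 32-bit accumulator paired with a 16-bit signed sample. The names are hypothetical, and the code says nothing about which layout is faster on a given machine; in this thread the packed version was reported to be slower.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout 1: two separate arrays. Each element costs a sign-extending
   16-bit load plus a 32-bit add, similar to the movsx/add pair above. */
void accumulate_separate(int32_t *acc, const int16_t *sample, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += sample[i];
}

/* Layout 2: the "packet" idea, accumulator and sample side by side.
   4 + 2 bytes of data, padded to 8 on typical ABIs, so 2 bytes per
   element are dead weight in the cache. */
struct packet {
    int32_t acc;
    int16_t sample;
};

void accumulate_packed(struct packet *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i].acc += p[i].sample;
}
```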

Michael

mrdarrett

Over 20 years ago a coworker of mine wrote a comefrom static analysis program to analyze his programs. He was rather shocked to see how much he had been abusing goto. And this was with a version of BASIC that had subroutines with parameters.

JosephKK

That's odd... I'd expect the assembly output to use a double-word load, dword ptr [somewhere], instead of word ptr [somewhere].

Hmm...

Michael

mrdarrett

That is about right for a non-optimising naive native code compiler that saves every loop variable back to memory. When fully optimised to be all in registers the figure should come down to 2.2s or so.

There is a small difference between the BASIC and C. You are using signed INTEGERs he is using unsigned. It shouldn't affect the runtime though provided that the C compiler generates the right opcodes.

That is believable. Most compilers get between 0.22 and 0.3 depending on how fast the memory subsystem is under sustained sequential access.

Have you tested his code on your PC and your code on his embedded system? It could be that DMA transfer of raw data is robbing him of memory bandwidth. Or the Kontron board has other memory speed issues.

It is difficult to see how even the dumbest compiler could get this to take more than 0.5s per loop on modern hardware. It would be interesting to see the generated code for this loop. If it looks sensible, then we can establish that you are looking at a hardware problem.

Only to boneheaded BASIC hackers.

It is the memory subsystem that isn't performing. You could add additional computation to the loop and it should not affect the timing.

Regards, Martin Brown

Martin Brown

I regularly use QuickBasic, which is quite capable of OO-style subroutines (although calling QB OO, with its narrow "object" functionality, is a stretch), but which of course allows GOTOs. I find there are a number of places where GOTO is simply too useful to pass up. Here's a recent example:

    . . .
    FOR i = 1 TO LEN(In$)
        k = ASC(MID$(In$, i, 1))
        l = Ascii2Seven(k)
        m$ = Num2Hex$(l AND 255)
        GOSUB PrintByte
        IF l > 255 THEN
            m$ = Num2Hex$((l AND &HFF00&) \ &H100)
            GOSUB PrintByte
        END IF
    NEXT
    PRINT
    . . .

So it chews through In$ one byte at a time, doing whatever Ascii2Seven does (ooh, a user-defined function!). The rub is that, for expandability reasons, Ascii2Seven may return a one-digit number (i.e., values 0 to 255) or a two-digit number (I believe the sign bit is set in this case). Printing the other byte requires a separate call to PrintByte, for state reasons. A SUB would be inelegant here because a number of variables would either have to be passed as operands or SHAREd globally, neither of which really makes sense. Printing one or both bytes at the same time (which could, in principle, be done with one call to, say, PrintSeven instead of PrintByte) just moves things around and meets the same problem.

Other times, I've used GOTO or GOSUB to call a routine which has a lot of common variables (i.e., its scope includes main) or to branch to a single routine from multiple exit points out of a SELECT CASE or somesuch.

Most of the time, I do write SUBs and FUNCTIONs, which are great for compact bits of code that might be useful in other places, or which are fairly specialized but have little effect on other parts of the program (for instance, a game engine's menu, physics, AI and other systems can, for the most part, all be split into separate subroutines). On average I would suppose GOTO and GOSUB count as "rare".

Tim Williams

My first instinct was that it might have created an object where indexing was in multiples of 6 bytes, but it has correctly padded to 8.

The explanation is that with two distinct arrays you have something else to do inside the loop while waiting for the cache to load. The structure you have created hits the same block again far too quickly, and so it ends up waiting on both every time. Worse, when the write-through cache is active, your reads from the second chunk in the same dirty cache line are compromised.

Because the data lengths differ, partway through the old loop (with the original code) the 16-bit movsx fetch becomes available in cache. The 32-bit fetches and stores are always running ahead of the cache's ability to satisfy them.

The standard method for cache aware algorithms is to work on as many distinct cache blocks simultaneously as the architecture will allow.

That way you are using the delay time of the first fetch constructively. I have to say on the newest Pentiums there is almost no difference. None of the tricks I know will speed it up significantly past 0.22s on my box. There isn't enough work being done inside the inner loop.

Creates a user-defined type, 8 bytes long, containing a .x and a .y.

Although it might waste cache space, having matched-size operands in separate arrays might still be faster: SIMD vector instructions are better suited to that case. SSE 128-bit registers can do 4 parallel 32-bit adds, moving 16 bytes at a time. You would need to test it.
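
A sketch of the SIMD approach suggested above, using the SSE2 intrinsics from <emmintrin.h> to add four 32-bit integers per instruction. The function name is hypothetical. The code falls back to a plain scalar loop when the compiler does not define __SSE2__ (for example on non-x86 targets), and it only illustrates the technique; it is not a measured speedup.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* acc[i] += add[i] for n 32-bit ints. With SSE2, four lanes per
   instruction (16 bytes per load/store); a scalar loop handles the
   tail, or the whole array when SSE2 is unavailable. */
void add_i32(int32_t *acc, const int32_t *add, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *)(acc + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(add + i));
        _mm_storeu_si128((__m128i *)(acc + i), _mm_add_epi32(a, b));
    }
#endif
    for (; i < n; i++)
        acc[i] += add[i];
}
```

Unaligned loads and stores are used so the arrays need no special alignment; with 16-byte-aligned buffers, _mm_load_si128/_mm_store_si128 would also work.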

Regards, Martin Brown

Martin Brown

On a sunny day (Wed, 20 May 2009 16:14:39 -0700 (PDT)) it happened snipped-for-privacy@gmail.com wrote in :

Yes, right, and this newsreader uses one. When I started writing it, speed was an issue (think 60 MHz processor, sorting thousands of newsgroups), so for the sorting I used a binary tree. In the current version you can still disable sorting :-) (GROUPS->SORTED DISPLAY); that made it faster... These days, with so much processor power and binary trees, things become almost instantaneous.
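
A minimal binary search tree sketch in C for sorting group names: insert in strcmp() order, then an in-order walk yields the names sorted. The names are hypothetical and not from the newsreader's code. One caveat: this tree is unbalanced, so input that arrives already sorted degrades it into a linked list with O(n^2) total insert time; a balanced tree (red-black, AVL) avoids that.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: one newsgroup name per node. */
struct node {
    const char *name;           /* not owned; caller keeps the string alive */
    struct node *left, *right;
};

/* Insert by strcmp() order; equal names go to the right.
   Returns the (possibly new) subtree root. */
struct node *tree_insert(struct node *root, const char *name)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->name = name;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (strcmp(name, root->name) < 0)
        root->left = tree_insert(root->left, name);
    else
        root->right = tree_insert(root->right, name);
    return root;
}

/* In-order walk: writes names into out[] in sorted order and returns
   the next free index. */
size_t tree_to_array(const struct node *root, const char **out, size_t i)
{
    if (root == NULL)
        return i;
    i = tree_to_array(root->left, out, i);
    out[i++] = root->name;
    return tree_to_array(root->right, out, i);
}

void tree_free(struct node *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```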

Jan Panteltje

On a sunny day (Thu, 21 May 2009 01:19:01 +0100) it happened Nobody wrote in :

But then C++ is not really an OO language :-)

Jan Panteltje

Yes. I write thousands-of-lines programs, in various languages, that are almost entirely flat, with every variable global and static, a big GOTO state machine, with most subroutines done as JSR/RTS or GOSUB/RETURN. If you do this carefully, you get fast, tight, clean code, no possibility of memory or stack problems, and *zero* bugs. If you do any programming carelessly, you get an ugly mess.
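
As an illustration of the flat, all-static style described here, a byte-at-a-time receive state machine in C. The frame format (0x02, a length byte, the payload, an 8-bit additive checksum) is a hypothetical example, and a switch on a static state variable stands in for the GOTO dispatch. There is no heap use, and the stack depth is constant.

```c
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_MAX 16

/* Hypothetical frame: 0x02, length (1..PAYLOAD_MAX), payload,
   8-bit sum of the payload bytes. */
enum rx_state { S_IDLE, S_LEN, S_DATA, S_SUM };

/* All state is static: no heap, constant stack depth. */
static enum rx_state state = S_IDLE;
static uint8_t len, got, sum;
static uint8_t payload[PAYLOAD_MAX];
static int frames_ok, frames_bad;

/* Feed one received byte, e.g. from a UART interrupt. */
void rx_byte(uint8_t b)
{
    switch (state) {
    case S_IDLE:                    /* hunt for the start byte */
        if (b == 0x02)
            state = S_LEN;
        break;
    case S_LEN:
        if (b == 0 || b > PAYLOAD_MAX) {
            frames_bad++;
            state = S_IDLE;
        } else {
            len = b;
            got = 0;
            sum = 0;
            state = S_DATA;
        }
        break;
    case S_DATA:
        payload[got++] = b;
        sum = (uint8_t)(sum + b);
        if (got == len)
            state = S_SUM;
        break;
    case S_SUM:
        if (b == sum)
            frames_ok++;
        else
            frames_bad++;
        state = S_IDLE;
        break;
    }
}

int rx_frames_ok(void)            { return frames_ok; }
int rx_frames_bad(void)           { return frames_bad; }
const uint8_t *rx_payload(void)   { return payload; }  /* most recent bytes */
```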

Anybody who ships embedded systems that have bugs is a bad programmer, in any language. I'm on the mailing list for one analytical instrument software package from a big company that uses all the latest languages and version/bug control tricks. They average about ten new reported bugs per week.

John

John Larkin

Even linear searching can be almost instantaneous. In our material control system, at startup we read the entire database into a memory-resident array of records, and a parts search is brute-force linear string compares. It's so fast it doesn't matter.
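
A sketch of that brute-force approach in C, assuming a hypothetical fixed-size part record (the post does not give the real layout) and using strstr() substring matching as the "string compare".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical part record; the real layout is not given in the post. */
struct part {
    char number[16];
    char description[48];
};

/* Brute-force linear scan over a memory-resident array: returns the
   index of the first record whose description contains key
   (case-sensitive), or -1 if none does. */
long find_part(const struct part *parts, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(parts[i].description, key) != NULL)
            return (long)i;
    return -1;
}
```

To list every match rather than the first, a caller can rerun the scan starting from parts + index + 1.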

John

John Larkin

Since this is signed ADC data, my program is likely to produce better signal averaging.
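
Why signedness matters for averaging, in a small C sketch with hypothetical function names: raw 16-bit codes that hover around zero average to roughly zero when read as signed, but to roughly mid-scale (about 32768) when read as unsigned. Converting an out-of-range uint16_t to int16_t is implementation-defined in ISO C; mainstream compilers such as GCC and Clang give the two's-complement result that this relies on.

```c
#include <stdint.h>

/* Average n raw 16-bit ADC codes read as two's-complement signed
   values: 0xFFFF -> -1. */
double mean_signed(const uint16_t *raw, int n)
{
    int64_t s = 0;
    for (int i = 0; i < n; i++)
        s += (int16_t)raw[i];
    return (double)s / n;
}

/* The same codes read as unsigned: 0xFFFF -> 65535. */
double mean_unsigned(const uint16_t *raw, int n)
{
    int64_t s = 0;
    for (int i = 0; i < n; i++)
        s += raw[i];
    return (double)s / n;
}
```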

The original C program summed the data in 0.7 seconds on the Kontron, as against my 0.22 on my Windows PC. They have similar clock rates and both have 2G ram. After a few iterations and mucking with the compiler, he has got his speed down close to mine. It's amusing that a thing this simple, as coded and compiled the obvious way in C, was slow by over 3:1 from my Basic.

My boneheaded BASIC program is both fast and correct, and is a model for my C programmer to aspire to. I suppose some people prefer C++ code that is elegant, slow, and wrong. But I'm just a simple engineer, so perhaps slow and wrong is a more sophisticated programming methodology these days.

Switching to down-counting got me close to 0.2 seconds, so that's about as good as we're going to do. We can move the summing into the acquisition FPGA (which has local DRAM) if we have to, but that would be a real pain to code.
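
A C sketch of the down-counting form, plus a crude timing wrapper built on the standard clock() function, in the spirit of the simple measurements used in this thread. The names are hypothetical. An optimizing C compiler will often reverse or vectorize such a loop on its own, so any gain from counting down should be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Add one acquisition of n signed 16-bit samples into 32-bit
   accumulators, counting the index down to zero so the loop test is
   a compare against zero. */
void accumulate_down(int32_t *acc, const int16_t *data, size_t n)
{
    while (n-- > 0)
        acc[n] += data[n];
}

/* Run `passes` accumulations and return the processor time used, in
   seconds, as reported by clock(). */
double time_passes(int32_t *acc, const int16_t *data, size_t n, int passes)
{
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++)
        accumulate_down(acc, data, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```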

John

John Larkin
