Here is some new bad news, and I mean really bad news.

Interpreted languages generally compile to bytecode, while compiled languages compile to native machine code, which is a whole lot faster.

Well, there you are - isn't "user error" another word for performance bug?

The software world has periodic language wars.

I particularly recall Ada83 versus C. Both are compiled languages, but Ada is far more complex a language, as judged by the sizes of their respective compilers. We would read article after article where a world class Ada expert would produce Ada programs that ran circles around the C programs produced by some duffer C programmers, and declare that Ada was therefore the better language.

Ten or twenty years before, the spectacle was Fortran compiler vendors claiming that their compilers generated executable code that was faster than that produced by assembly programmers. Well, not if you get a real assembly programmer. But hardware got fast enough that we no longer had to care.

Joe Gwinn

Reply to
Joe Gwinn

PB is a real compiler. It lets you do inline assembly, too. In assembly, you can refer to Basic variables and labels by name, so linkages are easy.

--

John Larkin                  Highland Technology Inc 
www.highlandtechnology.com   jlarkin at highlandtechnology dot com    

Precision electronic instrumentation
Reply to
John Larkin

PowerBasic (and QuickBasic, TurboBasic, and probably several others) compile to machine code.

Compilable basics often don't have those features.

--
umop apisdn
Reply to
Jasen Betts

QuickBasic compiled to byte code, which is opcodes for a pseudo-machine. TurboBasic was the predecessor of PowerBasic, both true compilers.

I think Python is usually a byte-code compiler, roughly 1/3 as fast as a real machine-code compiler.

Reply to
John Larkin

"John Larkin" wrote in message news: snipped-for-privacy@4ax.com...

I've never heard that before; cite?

Perhaps you're thinking of compressed .BAS files, and presumably, QB's internal structure? Every time you write a keyword and switch lines, it parses the line and translates it into symbols, operations and keywords. Any changes in the string space (say you wrote xPOS instead of xPos) are caught by the case-insensitive lookup (all instances magically change to the latest capitalization). I'm guessing that same form is saved into their compressed format (possibly with some text compression as well, for the variable names).

From what little I've played with debugging QB programs, there doesn't appear to be some abstract instruction space it draws from. The output is clearly recognizable as the product of that sort of procedure, though: FAR CALLs *eeeeeverywhere*. It's like Minecraft for instructions. Before Minecraft was even a thing. (Minecraft, speaking of, is written in Java...)

Basic INTEGER arithmetic operations, I think, were at least done in "main", but woe unto you if you accidentally perform floating point arithmetic: it goes through something like 20 calls for one operation. If you had a math co, it was sort of tolerable. Amazingly, the typical savings (fixed over floating point) was something like double speed (including a math co). Needless to say, any kind of number crunching is

*slow*.

Tim

--
Seven Transistor Labs 
Electrical Engineering Consultation 
Website: http://seventransistorlabs.com
Reply to
Tim Williams

If you have an existing Fortran program and give it to an assembly programmer and ask them to rewrite it in assembly, the compiler could well produce a better result.

However, if the assembly programmer starts from scratch with only the functional requirements, the assembly code might be better. For instance, with global register assignment you can avoid much of the high-level-language parameter-passing overhead, or use specialized instructions that can't be expressed in an HLL.

Reply to
upsidedown

Although what you say is true of many interpreted-language compilers, it is not true of all of them. There are fully optimising Basic and Lisp compilers about that can do JIT compilation to native code and in some cases full global program optimisation.

This is a bit weird. I usually end up being rude about the "magical" claims that Larkin makes for his beloved PowerBasic but in this case he is right - it is a native code optimising compiler with a better grasp of optimising the sort of loops he needed than the C compiler that they were using as coded by their "senior C programmer".

Choice of compiler and pragmas as I recall.

That generally happens in these language wars, and they generate a lot more heat than light. C optimisers have to be very careful about what they do. Other languages lend themselves to easier global code optimisation.

Sometimes we do, when transforming large arrays in realtime. Optimising for the cache architecture and avoiding pipeline stalls can be absolutely critical to performance.

An assembler programmer today would have to work extremely hard to beat a modern optimising compiler at avoiding pipeline stalls on a modern CPU. I doubt if more than a handful of people on the planet could do it instinctively without using the internal chip diagnostics to get feedback on how and where the stalls and bottlenecks are occurring.

The exact fastest code depends critically on the CPU model number and cache structure. Certain programs like FFTW are self-tuning, optimising themselves for a given CPU architecture once they have been trained.

--
Regards, 
Martin Brown
Reply to
Martin Brown

Maybe you are misunderstanding me...

The AVR and the MSP430 are much "nicer" to work with than the 8051, the COP8, the HPC and the PIC devices (though the HPC wasn't too bad). The AVR and the MSP430 are RISC architectures, the others listed are CISC. And generally the AVR and the MSP430 have more compact code than the others - although I certainly don't think that's a fair meaning for "nice".

There is no fixed, absolute definition of what is RISC and what is CISC. There are a number of characteristics of processor design that are typical "RISC" characteristics, and a number that are typical "CISC" characteristics. Most processors have a mix from both groups, but often have a large enough proportion from one group to be able to classify it. There are some processors that are too mixed to be fairly called RISC or CISC.

Accumulators, special registers, and small register sets are CISC characteristics. Multiple identical registers with an orthogonal instruction set are RISC characteristics. So yes, having an accumulator and a small number of registers is one reason for classifying the 8051, COP8, PIC as "CISC". Other characteristics are instructions and addressing modes for operating directly on memory (rather than a load-store architecture common to most RISC cpus), varied instruction lengths, complex instructions (relative to the size of the core), instructions that do multiple tasks, and very varied instruction timing.

"RISC" is parsed as "(Reduced-Instruction) Set Computer", not "Reduced (Instruction-Set) Computer". In other words, a RISC cpu has a set of relatively simple "reduced" instructions. RISC does /not/ mean that the size of the instruction set is reduced.

Big RISC processors like the PowerPC have a very large instruction set, and while some of them appear quite complex they are actually almost all very simple. In a "pure RISC" architecture (to the extent that such a thing exists), all instructions have the same size, and operate with the same timing - usually 1 pipelined cycle.

It's been a while since I have worked with a Z80 (about 25 years), so I don't remember all the details. But the Z80 is CISC.

You can write working x86 code fairly easily, but writing good, fast x86 assembly code for modern x86 chips is a serious pain. ARM assembly takes a bit of getting used to as well. I would pick ARM, if given the choice, but on such devices you can normally write much faster C code than assembly code.

The ARMs these days have several different instruction sets ("old" ARM, Thumb, Thumb2) with their pros and cons. Conditionals per instruction are certainly nice, but they are costly in terms of instruction code bits for their usage - so the Thumb instruction sets have replaced them with a sort of if-then-else-endif construction.

Big CISC processors have traditionally used microcoding - the complex instructions are run as a series of very wide microcode instructions that are at a lower level. The translation of x86 instructions into RISC microops is not much different, except that these microops are scheduled and pipelined in a different way.

CISC instructions try to do a lot of different things within the same instruction - in particular, they often use multiple complicated addressing modes. So breaking them into separate RISC instructions that do one thing at a time makes a lot of sense.

There is seldom any reason to write assembly for "inner loops" any more; on most processors, a decent compiler will generate pretty close to ideal code for that sort of thing. And for complex processors, the compiler will generally do a better job than hand-written assembly, because there are often subtle issues with scheduling, instruction ordering, etc., that can make a big difference but are difficult to track by hand. This is particularly important if you want to target several cores: the ideal code can be significantly different between two x86 devices from different companies, or different generations. And on RISC devices you have lots of registers to keep track of as well.

Where hand assembly still makes a big difference in these kinds of chips is for vector and SIMD processing.

Reply to
David Brown

One thing to remember when taking algorithms from some old Fortran math library and rewriting them in, e.g., C is that Fortran stores two- (and higher-) dimensional arrays in a different order (column-major) than, for instance, C (row-major).

If the code had been optimized for Fortran's array storage order to minimize cache/virtual memory misses, the C code would have a huge number of cache/virtual memory misses, unless the array indexes are swapped :-).

Reply to
upsidedown

I'm pretty sure modern optimizing compilers fix that for you. It would be a pretty obvious thing to do.

Learning how to write loops so that your compiler can vectorize them is the big win. Intel C++ is the bomb at that, but gcc is learning.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC 
Optics, Electro-optics, Photonics, Analog Electronics 

160 North State Road #203 
Briarcliff Manor NY 10510 

hobbs at electrooptical dot net 
http://electrooptical.net
Reply to
Phil Hobbs

MS made a bunch of BASICs. GW BASIC was interpreted directly, QuickBasic I'm pretty sure was byte code, and the MS BASIC compiler was a real compiler. They did bundle QuickBasic along with the compiler, so it was fairly easy to confuse.

Cheers

Phil Hobbs

Reply to
Phil Hobbs

I think PDS Basic, the "pro" version of QuickBasic, was a machine code compiler.

Reply to
John Larkin

JIT compilation can work if the algorithm doesn't jump around too much. Realtime systems tended to violate this, quite floridly. In the old days, it was often faster to turn the cache hardware off.

Compilers differ in how well they optimize, and what assumptions they make about the typical program.

But they all have to get the right answer, so I don't understand this comment.

I have had to do such things in the past, but not in the last decade or two.

What does happen is that compiler writers will cut corners in areas they think are rarely used by their typical customer. We had a florid case of this in an Ada83 compiler. Use of a rep spec with some kind of record definition (an odd corner, but one that's essential for handling messages between different machines: the bits all gotta line up) caused the program to run something like a factor of one hundred slower than necessary. Staring at the Ada code was no help; the Ada code was correct, and did get the correct answer.

A profiler found the problem in a day: the generated code implemented a critical and widely used bit of the Ada via calls into a big subroutine library, rather than by spitting out a few lines of assembly. Ouch.

The solution was to use a different Ada compiler.

Joe Gwinn

Reply to
Joe Gwinn

No such thing as JIT compilation in an RT system. JTL would be closer to the mark. OK, it depends a bit on what you mean by 'RT'.

Jeroen Belleman

Reply to
jeroen Belleman

No-one who moved from FORTRAN to C and wanted optimum speed ever used C's poxy way of implementing 2D arrays as an array of pointers to 1D arrays. The 2D indexing was instead done explicitly with a macro,

IDX(i,j,N) which expanded to ((i)*(N)+(j))

All brackets here being essential or algorithms can get scrambled.

Most years some graduate student would bring the VAX to its knees by trying to transpose a (large for the time) 512x512 image with the naive doubly nested loop.

The only thing C had going for it over FORTRAN was that the indexing of FFTs was a lot more natural with 0 based arrays instead of 1 based.

--
Regards, 
Martin Brown
Reply to
Martin Brown

One other way of trashing performance is the bad habit some programmers have of using fseek/ftell to get the file size, then using malloc (or even calloc) to allocate a buffer, and then reading the whole file into the buffer in a single read.

A decade ago an image/video display utility did this. With physical memory sizes much smaller than today, imagine what happens when the simple load code (a few statements) encounters a big MPEG2 file (a gigabyte or more).

Everything goes well, until the physical memory/working set is full. Then the OS starts to write out dirty pages (the beginning of the video file) into the page file. This continues until the file is finally "loaded" into memory. Now, perhaps an hour later, the actual file processing starts by reading pages from the page file.

At least they should have used memory-mapped files (available in most virtual-memory operating systems), in which only the required parts of the read-only file are loaded into memory (no need to write dirty pages to a page file).
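A minimal POSIX sketch of that approach (on Windows the equivalents are CreateFileMapping/MapViewOfFile; the function and file names here are illustrative, not from the utility being described):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count newline bytes without ever reading the whole file into a
   malloc'd buffer. Pages are faulted in on demand as they are touched,
   and since the mapping is read-only there are never dirty pages to
   push out through the page file. */
long count_newlines(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap of length 0 is an error */
        close(fd);
        return 0;
    }
    const char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                         MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping keeps the pages valid */
    if (p == MAP_FAILED)
        return -1;
    long n = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n')
            n++;
    munmap((void *)p, (size_t)st.st_size);
    return n;
}
```

For a sequential pass like this, madvise(MADV_SEQUENTIAL) can hint the pager further, but the mapping alone already avoids the hour of thrashing described above.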

Thus, a few innocent looking statements can have extreme impact on the performance.

Reply to
upsidedown

And no COMMON blocks, DATA cards, computed GOTOs, arithmetic IFs, .....

I learned Fortran in the late 70s attempting to debug a radiative transfer astrophysics code that I didn't understand very well at all. I sure don't miss it.

Cheers

Phil Hobbs

Reply to
Phil Hobbs

What is wrong with COMMON blocks ?

These are just structs. Even if the compiler didn't support any kind of "include" statement, you could just make several copies of the COMMON cards (preferably on cards of a different color) and put them into the card deck for each subroutine; this worked nicely.

switch/case statements in more recent languages ?

Reply to
upsidedown

They're unstructured, is what's wrong with them. You can't figure out where they're visible without an exhaustive search.

Order sensitive, unreadable, and unstructured. Switch statements have none of those problems.

Cheers

Phil Hobbs

Reply to
Phil Hobbs

Hey, half a dozen COMMON blocks, with an EQUIVALENCE statement or two to make them overlap in memory in interesting ways, was a sure-fire recipe for excitement, thrills, and desperate 4 AM debugging sessions.

Computed GOTOs weren't really so bad - they're just sort of a switch() full of gotos, and you could see precisely which variable or expression was evaluated to control the jump.

It was "assigned" GOTOs (first cousin to the COBOL "ALTER" verb) which really made things fun, fun, fun. Self-modifying branch logic of the finest kind. (shudder!)

Reply to
David Platt
