Here is some new bad news, and I mean really bad news.

There is little doubt that a language that has no pointers (or strongly discourages them) and has run-time checks on array and buffer access will have far fewer problems with buffer overruns and similar issues than a language that allows free and rampant access.

But let me give you a few specific points here, to help avoid going round in circles:

  1. There is /nothing/ in C that stops you checking your arrays and buffers. People who are experienced in reliable and secure programming write their C code carefully in order to avoid any risk of overflows (see the sketch after this list).
  2. With C, there is a lot more specified at compile-time than with dynamic languages. So if you have written your C code well, and use appropriate static error checkers (there are many such tools for C and C++), a great many potential bugs are caught at compile time. With dynamic languages, bugs often do not appear until your code is running - and if you don't have good tests covering all code paths, you will not see the bugs until after the system is in use.
  3. High-level languages make it much easier to avoid memory leaks and issues due to unclear resource ownership. But they don't avoid such problems entirely, and they use far more resources in order to achieve this automation.
  4. High-level languages make it much easier and safer to work with strings. C is crap at strings.
  5. With C, when you get buffer overflows and similar problems, the result is usually a hard crash. With dynamic languages, the result is usually a run-time error. People often write error-handling code for the errors they expect, but fail to do so for errors they don't expect. So a run-time error or exception when you try to go out of bounds in your dynamic language will lead to improperly handled errors - you'll get weird error messages, program halts, silent incorrect operation, etc. It is unlikely that you will get the same kind of read or write of random memory that you can get with C, but injection attacks (popular with SQL) can be easier to exploit, and unexpected errors can easily lead to skipping security checks and other protection.
  6. Regardless of the language, you have to /think/ securely if you want to keep your system secure and reliable. You have to check /every/ assumption about the incoming data, and sanitise everything. No programming language does that for you - you always have to think about it. But /frameworks/ and libraries can help, and make sure that the data delivered to your code is safe. Choosing a good framework is far more important than choosing the programming language.
  7. Regardless of the language, you need to test /everything/. And you need a development process in place to ensure everything is tested, that code is reviewed, that test procedures are reviewed, etc. - all by different people.
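
To make point 1 concrete, here is a minimal sketch of the kind of explicit bounds check a careful C programmer writes (my own illustration, not code from anyone in this thread - the function and parameter names are invented):

    #include <stddef.h>
    #include <string.h>

    /* Append src to a fixed-size buffer, refusing (rather than overflowing)
     * when it does not fit.  Returns 0 on success, -1 if src was too big. */
    int append_field(char *buf, size_t buf_size, size_t *used, const char *src)
    {
        size_t len = strlen(src);

        if (len >= buf_size - *used)        /* leave room for the trailing '\0' */
            return -1;                      /* caller decides how to report it */

        memcpy(buf + *used, src, len + 1);  /* copy including the '\0' */
        *used += len;
        return 0;
    }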

As you can see, there are pros and cons to high- and low-level languages. You can write secure and insecure code in either. I don't know of any statistics showing some languages to be "safe" and others "unsafe" that take into account the number of times the code is run, the number of attempted attacks, the level of expertise of the people writing the code, and the amount of time and effort spent writing and testing the code.

Reply to
David Brown

And the language here was SQL, not C. Probably the underlying application was in Perl or Python - it's highly unlikely it was in C.

It turns out that C does not have a monopoly on insecure code.

Correct.

And the PICs are famous for their programming friendliness...

In the real world, separate memory spaces for data and code are not /too/ bad, as long as read-only data is in the same memory space as read-write data. (I don't mean you should be able to write over the read-only data - it's fine for it to be protected in some way.) Harvard architecture micros like the PICs and the AVRs are a serious pain to work with, and the separate memory spaces mean you have to jump through hoops to make read-only data work. Slow, inefficient, and error-prone.

Intel's segmented memory was widely considered to be crap. It was a painful hack on a limited architecture that was out of date before the first 8086 designs were made. Flat memory models are a /far/ more efficient design.

Note that the memory model here (segmented or flat) has /nothing/ to do with memory protection or virtual memory mapping. There are lots of advantages in having memory areas with different access rights (read-only, no-execute, etc.) and having flexible virtual-to-physical address mapping.

But there are /no/ advantages to a system where you have lots of real memory, but you can only access it in small bits (such as 64K lumps in older x86 chips).

There are many reasons why OS/2 was a good system - good memory management and process separation (especially compared to Windows at the time) were part of it. But segmentation, and the segmentation registers, were not an essential issue - they were only used because that was the only way the 80386 had of getting the protection needed. Alternative good processor designs (and later x86 chips) had proper memory management units that gave protection without needing messy segments.

There are all sorts of reasons why OS/2 lost out (including, but not limited to, a crap marketing department). Windows did not have a flat memory model at the time - Win9x had no proper memory model at all. It used Intel's segments but without any decent protection between processes.

Fortunately, the world has moved on and stabilised on flat memory models with protection handled by the MMU.
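
As an aside of my own (not something from the posts above): on a flat-model system the OS programs the MMU to set access rights per region, with no segment registers in sight. A minimal sketch, assuming a POSIX system with mmap()/mprotect():

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;                      /* assume one page */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;

        strcpy(p, "read-only from now on");     /* fill it while still writable */

        if (mprotect(p, len, PROT_READ) != 0)   /* drop write permission */
            return 1;

        printf("%s\n", p);                      /* reading is fine */
        /* p[0] = 'X'; */                       /* writing now would fault */

        munmap(p, len);
        return 0;
    }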

Reply to
David Brown

A mixture of home-grown tools built up over the years and compiler tools for Modula-2 (some translated/transmuted to work on C). I may yet get around to making a version of McCabe's CCI for C available publicly, either free or for a nominal charge. Trouble is a chronic shortage of roundtuits.

I find it is a very good heuristic for legacy code: if the CCI complexity of a procedure exceeds certain bounds, it will almost certainly contain bugs and it is just a case of finding them!

The latest version of PCLint apparently supports MISRA C restrictions (I don't have that version myself either). I do have an older copy.

This paper (sadly no longer online) describes some of the philosophy behind this new generation of static dataflow analysis checkers.

formatting link

(The link is to the Wayback Machine. The M2 code analysis stuff was integrated into the XDS M2 compiler - no idea how good their Java checker is.)

I don't actually have this tool, but if I were in the market today I think I would be looking at something like Red Lizards Goanna.

formatting link

A simple way to find out if it is for you is to take a fairly sizeable codebase you think is OK. Download the evaluation version and see how many things it can find. Mostly they will be fencepost errors at the extremes of possible input data, or paths where a variable manages not to be initialised (but might work most of the time anyway). Dataflow analysis across whole programs is one of the big steps forward.
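
For illustration, the two classic finds - an off-by-one at the boundary and a path that leaves a variable uninitialised - look something like this (my own made-up example, not output from any of the tools mentioned):

    #define N 8

    int table[N];

    void fill(void)
    {
        for (int i = 0; i <= N; i++)    /* fencepost: <= writes one past the end */
            table[i] = i;
    }

    int lookup(int key)
    {
        int result;                     /* only assigned on one path... */
        for (int i = 0; i < N; i++)
            if (table[i] == key)
                result = i;
        return result;                  /* ...but read unconditionally */
    }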

Any bug you can remove without running the software is worth killing. (It is very often in the seldom-traversed error recovery paths that serious flaws lurk - the routine, frequently-traversed code is mostly OK.)

I am also a great fan of linking production code so that it will save a traceback that shows the stack at the time of failure - who called whom with what. Actually it saves a bunch of hex numbers, and you need the right MAP files for the production build to get back to the code.

I grew up with tools that provided a fabulous post-mortem debugger that pretty much guaranteed you could find and fix any in-service bug after a single incident. These days it is a bit harder, since optimised production code gets reordered, so you have to hunt about a bit more.

Still worth doing if your compiler/linker provides such an option. It is less useful in embedded work, but you can still have the exception handler save the registers and trap address to read back later from an external computer.
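
A minimal sketch of that embedded variant (entirely my own illustration - it assumes a Cortex-M3 or later part, GCC, and a linker section called .noinit; the names fault_snapshot and save_fault_frame are invented):

    #include <stdint.h>

    /* RAM the startup code does not zero, so the snapshot survives long
     * enough to be read back over JTAG/SWD. */
    __attribute__((section(".noinit")))
    volatile uint32_t fault_snapshot[8];    /* r0-r3, r12, lr, pc, xPSR */

    void save_fault_frame(uint32_t *frame)
    {
        for (int i = 0; i < 8; i++)
            fault_snapshot[i] = frame[i];   /* copy the stacked register frame */
        for (;;)
            ;                               /* park here; inspect from outside */
    }

    __attribute__((naked)) void HardFault_Handler(void)
    {
        __asm volatile (
            "tst lr, #4             \n"     /* which stack holds the frame? */
            "ite eq                 \n"
            "mrseq r0, msp          \n"
            "mrsne r0, psp          \n"
            "b     save_fault_frame \n");
    }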

--
Regards, 
Martin Brown
Reply to
Martin Brown

Actually it did have pointers but FORTRAN programmers tended not to be aware of them. The lack of any strong typing meant that an inordinate amount of time was wasted by physicists calling NAGLIB routines that expected 8 byte DOUBLE PRECISION REAL arrays with 4 byte REAL ones.

      SUBROUTINE SWAP(I,J)
      K=I
      I=J
      J=K
      RETURN
      END

When called with arguments like SIN and COS, it could have very interesting side effects on subsequent use of the trig functions. A pointer to an array of unknown length was declared by convention with length 1, e.g.

INTEGER TRICKY(1)

It did have character arrays but only a handful of custom dialects allowed easy string manifest constants in quotes. 6HSTRING was always portable but heaven help you if you miscounted the string length.

It would, after F66, let you assign Hollerith character constants to arrays in a DATA statement. I think this illustrates my point perfectly.

      PROGRAM HELLO
C
      INTEGER IHWSTR(3)
      DATA IHWSTR/4HHELL,4HO WO,3HRLD/
C
      WRITE (6,100) IHWSTR
      STOP
  100 FORMAT (3A4)
      END

Believe it or not that was an improvement on what went before!

The lack of reserved words made the language interesting with the Chinese usage. It wasn't hard to break compilers back then. Indeed FORTRAN G was so unsure of itself that all successful compilations ended with the message "NO DIAGNOSTICS GENERATED?".

Ironically, they were integrated too late. C had its NUL-terminated strings, but they were just that, and so intrinsically dangerous.

Had there been a string length at the start (as occurred in some other languages of the era), the world would be a very different place. That was probably one of the most destructive peephole optimisations of all time.
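
Just to illustrate the alternative being described (my own sketch, nothing anyone in the thread proposed): a length-prefixed string carries its size with it, so a bounds check is a single comparison instead of a scan for the terminator:

    #include <stddef.h>
    #include <string.h>

    /* A length-prefixed ("Pascal-style") string: the size travels with the data. */
    struct lstring {
        size_t len;
        char   data[64];                /* fixed capacity, just for the example */
    };

    int lstring_set(struct lstring *s, const char *src, size_t n)
    {
        if (n > sizeof s->data)         /* one comparison, no scanning */
            return -1;
        memcpy(s->data, src, n);
        s->len = n;
        return 0;
    }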

--
Regards, 
Martin Brown
Reply to
Martin Brown

"'s a joke, son, jes' a joke."

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC 
Optics, Electro-optics, Photonics, Analog Electronics 

160 North State Road #203 
Briarcliff Manor NY 10510 

hobbs at electrooptical dot net 
http://electrooptical.net
Reply to
Phil Hobbs

+1

Though it was really all the land mines in the library, e.g. strncpy _sometimes_ failing to copy the trailing null (it never does when the source fills the whole destination). The only really awful thing about the null is having to remember that it's there, e.g. that sizeof(string) != strlen(string).
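
A small illustration of that particular land mine, plus the manual fix (my own example):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char dst[8];

        /* The source is 11 characters, so strncpy fills all 8 bytes of dst
         * and copies no terminating '\0' at all. */
        strncpy(dst, "hello world", sizeof dst);

        dst[sizeof dst - 1] = '\0';     /* the step everyone forgets */
        printf("%s\n", dst);            /* prints "hello w" */

        return 0;
    }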

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC 
Optics, Electro-optics, Photonics, Analog Electronics 

160 North State Road #203 
Briarcliff Manor NY 10510 

hobbs at electrooptical dot net 
http://electrooptical.net
Reply to
Phil Hobbs

Regarding AVR8s at least: if you've only worked with them through GCC, I understand that. The last time I tried to do something involved (a menu data structure stored in program memory), I found it was impossible to convince the compiler that A connects to B to C to A, not to mention trying to find what convoluted data type it wanted (is it a pointer to data to PGM to...??!).

The underlying instructions couldn't be simpler:

- Load Z with the address you want to look at

- LPM Z[+] to retrieve half a WORD (post-increment optional)

Hardly a rich operation (no immediate or indexed offset modes), but that's not inconsistent with that sort of thing anyway.

The main downside is, programming RISC in assembly is so boring because you need four instructions to get anything done.

Tim

--
Seven Transistor Labs 
Electrical Engineering Consultation 
Website: http://seventransistorlabs.com
Reply to
Tim Williams

It was in fact always *possible*, but a real PITA as you say, and it did make it impossible to write in a general-purpose way. The ARM Cortex parts are much nicer to work with.

What's that, $10 for a Cortex M4 development system with a general purpose JTAG/SWD debugger included? Add ARM GCC + openocd + eclipse and you get a modern C/C++ development system, absolutely suitable for professional use, for $10.

Kids of today...

[...]
--

John Devereux
Reply to
John Devereux

What exactly are you referring to?

Since all parameters were passed by reference, you could do a lot of pointer-like operations in the subroutine.

Nothing special compared to void pointers in C.

Without any other definitions, I, J and K are integers.

Isn't this done in C as well, if someone wants to avoid the pointer syntax for some reason?

The Hollerith notation might have been usable in a FORMAT statement for constant strings, but the portability in general is quite questionable.

You might be able to store some characters into a _single_ integer, but depending on the size of the integer, it could hold six 6-bit Hollerith characters on a 36-bit machine or four 7-8 bit characters in a 32-bit word. I do not remember if five 7-bit characters were stored in a 36-bit word (any DECsystem 10/20 specialists here?).

I have no idea what platform F66 was designed for, but at least on a 36-bit platform you could either store six 6-bit characters (uppercase letters, digits, punctuation) in A6 or 7-9 bit characters (including lower case etc.) in A4.

At least the DEC Fortran IV plus supported sensible string constants in FORMAT statements.

A far uglier thing for compiler writers was that you could insert spaces wherever you wanted. Thus

V A RIAB LE = 12. 3 4 5

is equivalent to

VARIABLE=12.345

Trying to detect pseudo "reserved words" is a bit nasty.

Reply to
upsidedown

Not to mention the increased SRAM. Allocate your arrays and let C figure it out, who needs Flash! :)

Tim

--
Seven Transistor Labs 
Electrical Engineering Consultation 
Website: http://seventransistorlabs.com
Reply to
Tim Williams

It is nearly always /possible/ to get the setup you want, but it is sometimes very difficult.

I've worked with AVRs with a few different compilers, and from assembler, though I use mainly gcc.

There is /no/ way to write good, clear code, with a standards-compliant compiler and no extra keywords, generating standard object files, and get proper control of read-only data in flash on an AVR or other microcontroller with separate memory spaces.

There are different tricks used, and different compromises, but you cannot write code as clearly, efficiently, and portably as you can with single address space processors.

Some of the methods I have seen are "progmem" attributes and related macros (older gcc), flash memory spaces (newer gcc), "__flash" keywords (IAR), and abuse of "const" (Imagecraft). Other possibilities are complex C++ classes and templates, accessor functions, "universal" pointers (also in newer gcc), and full-program optimisation with a particularly smart compiler.
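
To give a flavour of the first of those, a minimal sketch assuming avr-gcc with avr-libc (the string and function names are my own, purely illustrative):

    #include <avr/pgmspace.h>

    /* String data placed in flash rather than copied into scarce SRAM. */
    static const char menu_title[] PROGMEM = "Main menu";

    /* Reading it back needs the special accessor - an ordinary pointer
     * dereference would read from the data address space instead. */
    static char title_char(unsigned int i)
    {
        return (char)pgm_read_byte(&menu_title[i]);
    }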

Yes, it /could/ be simpler - it could use the same instructions that are used to access other data. And therein lies the problem.

No, there is no problem - I've done plenty of RISC assembly programming (on AVRs and MSP430 in particular). It's not hard, and you don't have to use more instructions. Typically AVR and MSP430 code is a lot more compact than on small CISC devices I have worked with in assembly, including 8051, COP8, HPC, PIC. The code/data space split doesn't cause much of an issue in assembly programming, but it is a pain for C and other high-level languages.

RISC assembly programming on big CPUs, such as PPC, is a pain because they are so complex. But so is assembly programming on big CISC CPUs.

Reply to
David Brown

Not necessarily a MFU.

In this TED talk (recorded in March), starting at ~18:00, Snowden describes a deliberate, concerted effort to compromise SSL:

SNOWDEN: "[Bullrun]... They're building in backdoors that not only the NSA can exploit, but anyone else who has time and research to find it [...] if we lose the trust of SSL--which was specifically targeted [...]"

formatting link

Since Google reported the "bug" on April 1, the talk appears to predate the Heartbleed story.

"Never attribute to malloc() ..." --Hacker's Razor

Cheers, James Arthur

Reply to
dagmargoodboat

I think it probably was. Too much of a blunderbuss to be spook stuff.

Although it is tempting to attribute this to deliberate intent, I doubt that it is sufficiently focussed to be spook stuff. They tend to make the cryptography weaker than it should be, so that there is some cunning way in, using knowledge of a built-in but not obvious weakness.

The objective being to be able to read specific intercepts from targets of interest. Snaffling random Mumsnet passwords isn't their style!

I think this was just a genuine SNAFU. And all the people who used the open source code without running it through a static vulnerability analysis are just as guilty of failing to exercise due diligence. This code was used in a very critical and sensitive security application.

It is when malloc fails or someone forgets to give back the allocated memory or retains and uses a pointer to it that things go haywire.
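
A tiny illustration of both failure modes (my own example, not the OpenSSL code): the allocation nobody checks, and the pointer that outlives the memory it refers to:

    #include <stdlib.h>
    #include <string.h>

    char *duplicate(const char *src)
    {
        char *copy = malloc(strlen(src) + 1);
        if (copy == NULL)               /* the check people skip until it matters */
            return NULL;
        strcpy(copy, src);
        return copy;
    }

    void demo(void)
    {
        char *p = duplicate("hello");
        if (p == NULL)
            return;
        /* ... use p ... */
        free(p);
        p = NULL;                       /* otherwise any later use of p is exactly
                                           the retained-pointer hazard above */
    }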

--
Regards, 
Martin Brown
Reply to
Martin Brown

Wat. You're shittin' me, right?..

None of those is close to being as nice. Taking 'nice' to mean, fewer assembly lines required to accomplish various tasks.

It's like you're calling them CISC just because they have no registers - which is why they take as many instructions: you're always pulling stuff through the accumulator or whatever.

I want to say AVR has more instructions (arguably, many of which could be called addressing modes, Atmel just doesn't enumerate them as such) than PIC. (But it's been a while since I looked at the PIC instruction set.) What's "CISC" about PIC if that's the case?

Or compare 8051 to Z80, though you still don't get read-modify-write instructions, so arithmetic in memory still isn't any better. Does that make Z80 RISC too?...

Can't argue with that. From what I've seen, I'd rather do assembly on x86 than full-on ARM (having written 8086 before, but only looked at the ARM instruction set).

I'd probably change my mind once I learned enough to work with. Conditionals per instruction though, that's a compiler's dream. I suppose it's about time something like that has caught on; I want to say IA432 was supposed to do that, but that ended up a major flop for a variety of reasons. PCs to this day are still x86, though they're RISC inside. Go figure. The kinds of code-heavy roles where, yeah you can optimize the inner loops -- and should, once you've exhausted other means -- but you've just got so damn much code that you'd be mental to do any small fraction of it in assembly.

Tim

--
Seven Transistor Labs 
Electrical Engineering Consultation 
Website: http://seventransistorlabs.com
Reply to
Tim Williams


Yes, I am already finding reports about this vulnerability that go back as far as 4 years. It is a particularly pernicious backdoor.

?-/

Reply to
josephkk

Something is fishy here. Basic is an interpreted language. If the program has high locality, aggressive caching of repeated bits can make it only one tenth as fast as the same algorithm coded in a compiled language like C. If the program has low locality (like lots of realtime stuff), interpreted code is more like one fiftieth of the speed of compiled code.

I'd look at the C code with a profiler, and find the bug.

Joe Gwinn

Reply to
Joe Gwinn

Not necessarily.

PowerBasic is a decent optimising native code compiler. And in some ways it has more freedom to optimise its loop code than a C compiler!

Basic and Lisp are usually interpreted languages but there are optimising native code compilers for both of them on some platforms.

formatting link

From memory the data was just about big enough and involved words and integers to go I/O bound and their C code was decidedly non-optimal.

On the current crop of optimising compilers there is seldom much to choose between different ways of implementing vector dot products.
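
For instance (my own sketch), these two common ways of writing a dot product in C - array indexing and pointer walking - normally compile to the same vectorised loop with a current optimiser:

    #include <stddef.h>

    double dot_index(const double *a, const double *b, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    double dot_pointer(const double *a, const double *b, size_t n)
    {
        double sum = 0.0;
        while (n--)
            sum += *a++ * *b++;
        return sum;
    }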

C code isn't quite as fast at some things as you might like to believe, but ISTR the slowness in this case was mostly down to user error.

--
Regards, 
Martin Brown
Reply to
Martin Brown

PowerBasic is a very good optimizing compiler. It can run useful FOR loops at hundreds of MHz.


It was, like, 20 lines of code. It didn't have a bug, it was just slow. As I noted, futzing with C compiler optimizations helped.

The PowerBasic compiler gives you one choice: size or speed. Compiled programs are so small that I always go for speed.

--

John Larkin                  Highland Technology Inc 
www.highlandtechnology.com   jlarkin at highlandtechnology dot com    

Precision electronic instrumentation
Reply to
John Larkin

Probably Perl or PHP; Python's database interface makes SQL injection hard to code.

--
umop apisdn 


--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
Reply to
Jasen Betts

Be careful with that word "compiler". Interpreted languages compile to byte code, not to native machine code. The byte code is executed by a bit of software. In Java, this machine is called the JVM (Java Virtual Machine).

While one can compile some originally interpreted languages to machine code, it isn't common because some of the nicest features of interpreted languages cannot be compiled to machine code in advance.

Sure it had a bug, a performance bug. All languages have easy directions and hard directions, and if one tries to move along a hard direction, one will get the correct answer, but very slowly.

The standard approach is to profile the code and find out what it's doing. Most likely you'll groan when you find out.

In all truth, these optimizer mode switches often make little difference.

Joe Gwinn

(who made his living as an embedded realtime programmer for almost 30 years)

Reply to
Joe Gwinn
