Microchip ICD4 ?

- M
- Martin Brown
  
posted
6 years ago

Sat, Aug 19, 2017 7:44 AM

But he is right that profiling the code is an essential step to figuring out what is actually wrong with it. Most code spends a disproportionate amount of time in a few very heavily executed sections.

This assumes there wasn't a silly error involving copying a large structure somewhere. You can easily get an order-of-magnitude slowdown by doing stupid things in an HLL.

It is quite likely that something unfortunate about the algorithm's implementation in the HLL made the speed difference quite so bad. That, or having learned from the initial approach, the assembler version used an entirely different way of simulating the decoding of the instructions.

For most of the scientific code I write, the compiler can sometimes do unexpected things that make it go faster, and hand optimisation of the resulting code using well-known heuristics may slow it down.

I think it is more likely that the representation of the CPU being simulated was suboptimal in the first attempt (a common problem). They learned from the initial coding what not to do and didn't make the same mistakes when coding it in assembler.
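As a rough illustration of the "copying a large structure" mistake, here is a minimal C sketch. The `cpu_state` type and both fetch functions are hypothetical names made up for this example; they are not taken from the simulator under discussion.

```c
#include <stdint.h>

/* Hypothetical simulator state: 64 KiB of memory plus a few registers. */
typedef struct {
    uint8_t  mem[65536];
    uint16_t pc;
    uint8_t  regs[16];
} cpu_state;

/* Slow: the whole structure (over 64 KiB) is copied on every call. */
uint8_t fetch_by_value(cpu_state cpu)
{
    return cpu.mem[cpu.pc];
}

/* Fast: only a pointer is passed, nothing is copied. */
uint8_t fetch_by_pointer(const cpu_state *cpu)
{
    return cpu->mem[cpu->pc];
}
```

Both functions return the same byte; called once per simulated instruction, the by-value version spends most of its time copying memory, which is the kind of thing a profiler shows up immediately.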

--
Regards, 
Martin Brown

- A
- Albert van der Horst
  
posted
6 years ago

Sat, Aug 19, 2017 11:03 AM

Or you combine things that have never been combined before.

If you're planning on using a safe subset of C++ you should use Ada in the first place. Please comment on the kind of C++ code you find on mbed.

And COBOL is obsolete... Do you think that declaring things obsolete means that large bodies of software are converted in an eye blink?

Some say that in view of Lisp, C/C++ was obsolete the day it was invented. That doesn't buy you much.

I've made some solid C++ code, but I shudder to think what remains of it after a few years of "maintenance" by a novice. See ssort.html on my site below as of 1993.

Groetjes Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS 
Economic growth -- being exponential -- ultimately falters. 
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

- B
- bitrex
  
posted
6 years ago

Sat, Aug 19, 2017 2:30 PM

It looks very 1994. ;-) Large classes, lots of methods which look like straight-C internally.

Nowadays if you wanted to use a custom sorting algorithm on a custom buffer you'd probably subclass or composite one of the STL containers that implemented most of the "concepts" you wanted, like say "SequenceContainer", "BackInsertable", etc.; if you wanted custom memory allocator behavior you'd write a class conforming to the "Allocator" template (you can hot-plug any allocator policy into just about any conformant Container), and your algorithm would follow the "Algorithm" template. Then apply the algorithm to the objects in the container.

- A
- Albert van der Horst
  
posted
6 years ago

Sat, Aug 19, 2017 7:03 PM

Sure.

It would be interesting to look at it converted to modern C++.

The main difficulties are:

After reading, there are raw bytes. A record and field structure must then be imposed on them, based on a regular expression given by the user. Then it must again be output as raw bytes that can be reinterpreted in the same way.
Reading must be in chunks; in MSDOS this was available memory (1 Mbyte), now it must be about an L2 cache's worth.

My guess is that it amounts to a lot of frustration before the compiler does your bidding.

It would be an academic exercise though. The choice of algorithm results in poor locality on modern machines. Disk merging would be replaced by mapped files, or straight use of virtual memory, maybe even swapping would not be too detrimental. But then, variable length records that must be output in sorted order to a file will result in a lot of cache misses even in qsort. Now records are copied once for each merge phase. With 5-10 way merge there are few merge phases.
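To put a number on "few merge phases", here is a small C sketch. The figures fed to it are an assumption for illustration only: a 1 Gbyte file read in 1 Mbyte chunks would give 1024 initial sorted runs. The function name `merge_phases` is made up for this example.

```c
/* Number of merge phases needed to reduce 'runs' sorted runs to a single
 * run when 'ways' runs are merged at a time (ways must be >= 2). */
unsigned merge_phases(unsigned long runs, unsigned ways)
{
    unsigned phases = 0;
    while (runs > 1) {
        runs = (runs + ways - 1) / ways;  /* ceiling division */
        phases++;
    }
    return phases;
}
```

Under that assumption, a 2-way merge of 1024 runs needs 10 phases, a 5-way merge needs 5, and a 10-way merge needs 4, so each record is copied only a handful of times.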

(On a 1 Gbyte file it loses a factor 10-40 compared to GNU sort. Yet, because of its capabilities, there is talk about adding it to Debian.)

Groetjes Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS 
Economic growth -- being exponential -- ultimately falters. 
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

- T
- Terry Porter
  
posted
6 years ago

Sun, Aug 20, 2017 7:22 AM

I don't use 'a Forth Compiler'; I use Mecrisp-Stellaris on Cortex-M0 and have never seen a way to do this. It's also not mentioned anywhere in the Mecrisp-Stellaris documentation, but I'd love to know how. I'd be grateful if you could show me how to do that with Mecrisp-Stellaris so I can add your technique to my documentation site.

I can clone the whole system Flash, but that's just a copy which includes the Forth Core as well.

I may do a DocSite for Mecrisp-Across once I understand how to use it, but I'm not interested in Forth on MSP430, as I told you before.

However I'm definitely interested in Assembly on MSP430 as I love the orthogonal instruction set.

Even a $20 MCU is cheap compared to a National PACE CPU, but it's still four times more expensive than the STM32F051 I use, which also has a lot more peripherals than the MSP430.

I will probably never use 128KB of Flash for my Mecrisp-Stellaris Forth projects.

For me, 64KB of Flash is plenty.

Neither do I but I have ordered some MSP430 gear so I can play with Mecrisp-Across.

You're welcome.

--
Mecrisp-Stellaris Unofficial User Doc: http://128.199.141.78/index.html

- T
- Terry Porter
  
posted
6 years ago

Sun, Aug 20, 2017 8:49 AM

53 inc.b r15

00 and #32,

;abs 0x2114 2110: d2 43

3f jmp $+0

Excellent, thanks for such a complete and easy-to-reproduce followup :)

I'm waiting for some MSP430 boards and chips I ordered last week (I'm in Australia and these are coming from Hong Kong), but when they arrive, I promise to run your code, and if I'm not waiting for anyone else, I'll publish the MSP430 assembly code that blinks in 14 bytes.

(It's not my code, I'm not an MSP430 Assembly Wizard)

P.S. I'm just curious, is "void __attribute__((naked)) xreset(void)" an assembly statement or a C statement? ;-)

Cheers, Terry

--
Mecrisp-Stellaris Unofficial User Doc: http://128.199.141.78/index.html

- D
- David Brown
  
posted
6 years ago

Mon, Aug 21, 2017 2:43 PM

You mean C compilers generate code that does not do what the programmer expected, when the programmer does not know how to write C code properly?

A programming language - any programming language - is a contract between the programmer and the compiler. The programmer promises to obey the rules of the language, and the compiler promises to give object code that implements the source code. If the C programmer accesses data through pointers of other types, without taking proper measures (using unions, memcpy, volatile, char* types, or implementation-specific techniques), then he is breaking his side of the bargain.

I don't think type-based alias analysis actually helps very much in optimisation, but on the other hand I don't see it as being a high risk of mistakes. Code that messes about accessing the same data as different types is usually wrong anyway - and if you really /do/ want to do it, there are perfectly good and efficient ways to do it correctly.
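One of the "perfectly good and efficient ways" mentioned above is `memcpy`. The sketch below is only an illustration: the function name `float_bits` is made up, and it assumes `float` is a 32-bit type (checked at compile time with C11 `_Static_assert`).

```c
#include <stdint.h>
#include <string.h>

/* Breaks the aliasing rules (a float object read through a uint32_t lvalue):
 *     uint32_t bits = *(uint32_t *)&f;
 */

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Well-defined: copy the object representation byte by byte. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Optimising compilers typically turn a fixed-size `memcpy` like this into a plain register move, so the correct version need not cost anything over the cast.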

And if you want to write code that gets this wrong - such as the Linux kernel - then compilers that implement strong optimisations typically have options to control that behaviour. gcc has "-fno-strict-aliasing". And since it does not do alias analysis unless you ask it for optimisation, you can just add that in as an option along with "-O2" or whatever optimisation options you pick. (That's what the Linux kernel does.)

Lie to your compiler, and it will bite you. I have sympathy for people who make mistakes in their coding - no programmer is perfect (not even Forth programmers). But I have little sympathy for people who knowingly write bad code because they feel it /should/ work.

If you have an object originally defined as "const", and cast away that constness, then try to write to the object - then you get undefined behaviour. The consequences are therefore unpredictable.

In C, if you use a "const" pointer, you are saying "I promise not to change the thing pointed at, via this pointer". No more, no less. You are not claiming you will /never/ change the object, just that you won't do it with that pointer.

But if the object is originally defined with "const", then you /are/ promising that the object will never be changed. (If it is "volatile const", you are saying that something else might change it, but you never will.) The compiler can optimise on that assumption - it can use the const value directly in code without creating an object in memory, or it can put the object in read-only memory, or it can do both.

So if the original object is const and you cast away the constness (which is allowed), and then try to write to it (reading is fine), you are breaking your contract and anything can happen. In particular, some bits of code might read the new value, other bits might continue to use the old value. And the write itself can fail if the memory is protected against writing in some way.
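The distinction between a "const" pointer and an object defined "const" can be sketched in C as follows. The function and variable names are hypothetical and exist only for this illustration.

```c
/* Writes through a pointer after casting away its constness.
 * Defined behaviour only if the pointed-to object was not defined const. */
void set_via_cast(const int *p, int value)
{
    *(int *)p = value;
}

int demo_defined(void)
{
    int counter = 1;              /* the object itself is not const */
    const int *view = &counter;   /* promise: no writes via 'view' */

    counter = 2;                  /* fine: not done through 'view' */
    set_via_cast(view, 3);        /* allowed: the object is non-const */
    return *view;                 /* reads the updated object */
}

/* Undefined behaviour, shown as a comment only:
 *     static const int limit = 10;
 *     set_via_cast(&limit, 11);  // 'limit' may live in read-only memory,
 *                                // or its value may be folded into code
 */
```

Here `demo_defined()` returns 3, because the compiler may not assume that an object seen through a pointer-to-const stays unchanged. The commented-out case is the one where anything can happen.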

- D
- David Brown
  
posted
6 years ago

Mon, Aug 21, 2017 2:49 PM

You can do that in Forth - it is not even a bug:

: 2 3 ; ok

2 2 + . 6 ok

- D
- David Brown
  
posted
6 years ago

Mon, Aug 21, 2017 3:03 PM

Just to be clear, I am not recommending the above code as a good example of C programming! Basing your timing on watchdog resets is rarely considered good practice - this is just a minimal example.

And if you want to try it out, you will have to change the port used to match the board as well as changing the "-mmcu" flag to match the real device.

It does not look like anyone else is taking your challenge.

It is a C statement, for the start of a function called "xreset" that takes no parameters and has no return value. The "__attribute__((naked))" part is a gcc extension, telling the compiler not to generate function prologue or epilogue code to keep it minimal.

- P
- Przemek Klosowski
  
posted
6 years ago

Tue, Aug 22, 2017 3:24 AM

Bryan Cantrill had a great line: "If you stick your hand into the lawnmower, it'll maim it. It doesn't hate you or anything---it's just being a lawnmower".

- R
- rickman
  
posted
6 years ago

Tue, Aug 22, 2017 4:08 AM

There *are* other Forths in the world than Mecrisp.

"Interested" in what sense? I don't program in assembly unless it is absolutely required. Much of the Mecrisp source code is assembly, so you can certainly look at that.

But you can't host the Mecrisp cross compiler on the STM32F051, can you? I thought it was only on the TI ARM eval board.

I don't understand. If you don't have the host board, what good will the target board do you for looking at the cross compiler?

I am not following the flow of this conversation *at all*.

--

Rick C 

Presently at Wintercrest Farms 
On the centerline of totality since 1998 
:)

- P
- Paul Rubin
  
posted
6 years ago

Tue, Aug 22, 2017 5:41 AM

Terry posted a few days ago:

Yes, I'm evaluating "Mecrisp-Across" at the moment. The binary runs on MSP430 targets, but "Mecrisp-Across" is written in Forth and runs on Mecrisp-Stellaris, hosted on ARM, be it a Raspberry Pi, STM32, TI Tiva Connected Launchpad or about 5 other manufacturers' ARM products.

I tried running Mecrisp-Stellaris on a remote ARM Linux server a while back and it had some problems at the time, but I might try again. Or maybe you could run it on an x86 emulating an ARM under QEMU.

- T
- Terry Porter
  
posted
6 years ago

Tue, Aug 22, 2017 7:28 AM

Yes, you can definitely run it under QEMU on Linux. I tried it with qemu-arm-static on FreeBSD, but while it ran, it did segfault under some conditions.

One advantage of the TI Tiva Connected Launchpad over the other alternatives is that Matthias has written a JTAG programmer Word for it, so programs generated by Mecrisp-Across can be flashed to an MSP430 target chip.

I have my TI Tiva Connected Launchpad now and will try Mecrisp-Across on it soon.

At the moment I'm having too much fun with Mspdebug on an MSP430 launchpad to try the TI Tiva Connected system ;-)

--
Mecrisp-Stellaris Unofficial User Doc: http://128.199.141.78/index.html

- R
- rickman
  
posted
6 years ago

Wed, Aug 23, 2017 8:52 PM

I thought you didn't mess with the MSP430?

--

Rick C 

Viewed the eclipse at Wintercrest Farms, 
on the centerline of totality since 1998

- T
- Terry Porter
  
posted
6 years ago

Sat, Aug 26, 2017 2:33 AM

I compiled your code, but it doesn't blink the LED, so you'll have to make it work and submit a binary if you're interested.

This is the working 14 byte binary Blinky written by Matthias Koch, which I verified on a MSP-EXP430G2 board with a MSP430G2553 chip.

=========================================================
  cpu msp430

  include "mspregister.asm"
  include "registers-classic.asm"

;-----------------------------------------------------------------------------
  org 0FFF2h ; 14 Bytes to go.
;-----------------------------------------------------------------------------

Blinky: ; Do not stop the Watchdog timer. Let the free running Watchdog do the timing :-)
  mov.b #1, &P1DIR    ; Set LED pin as output
  add.b #8, r4        ; Delay a bit more. Watchdog triggers too fast to see.
  addc.b #0, &P1OUT   ; Toggle LED on counter overflow by adding carry into: Led is on 1 :-)

- jmp -               ; Wait for next reset

;-----------------------------------------------------------------------------
; Vector table
;-----------------------------------------------------------------------------

  org 0FFFEh
  .word Blinky

  end

; How to assemble: asl tinyblinky.asm && p2hex tinyblinky.p -r 0x0000-0xFFFF
; How to flash: mspdebug rf2500 "prog tinyblinky.hex"

Programming...
Writing 14 bytes at fff2...
Done, 14 bytes total

===========================================================
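For readers who don't speak MSP430 assembly, the timing trick in the listing above can be modelled on a host machine in C. This is only a sketch of the logic, not code for the target: the struct and function names are invented, and it assumes (as the assembly relies on) that r4 and P1OUT keep their values across a watchdog reset.

```c
#include <stdint.h>

/* Host-side model of the state assumed to survive a watchdog reset. */
typedef struct {
    uint8_t r4;     /* delay counter (register r4 in the listing) */
    uint8_t p1dir;  /* models the P1DIR register */
    uint8_t p1out;  /* models the P1OUT register */
} blinky_state;

/* One pass from the reset vector to the final jmp, run after each reset. */
void blinky_after_reset(blinky_state *s)
{
    unsigned sum, carry;

    s->p1dir = 1;                              /* mov.b  #1, &P1DIR */
    sum = (unsigned)s->r4 + 8u;                /* add.b  #8, r4     */
    carry = sum >> 8;                          /* carry out of the byte add */
    s->r4 = (uint8_t)sum;
    s->p1out = (uint8_t)(s->p1out + carry);    /* addc.b #0, &P1OUT */
}

/* Simulate 'resets' watchdog resets and return the LED bit (P1OUT bit 0). */
unsigned led_after_resets(blinky_state *s, unsigned resets)
{
    unsigned i;
    for (i = 0; i < resets; i++)
        blinky_after_reset(s);
    return s->p1out & 1u;
}
```

In this model, starting from an all-zero state, the byte counter overflows on every 32nd reset (32 x 8 = 256), so the LED bit changes once per 32 watchdog periods.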

I think this shows that a hand-written assembly program can be smaller than a compiler-generated binary.

I know the OP said 'non trivial', but if a Compiler can't beat hand Assembly at the *simple things*, how will it beat hand Assembly at the complex things?

As a matter of interest, this no-tricks, standard Forth code Blinky, when compiled by the Mecrisp-Across Compiler, produced a 96 byte binary (verified on my MSP-EXP430G2 board with a MSP430G2553 chip).

: us 0 ?do i i + drop i i + drop loop inline ;
: ms 0 ?do 998 us loop ;

: blinky ( -- ) 8MHz 1 p1dir c! begin 1 p1out cxor! 1000 ms again ;

Programming...
Writing 80 bytes at f800...
Writing 16 bytes at fff0...
Done, 96 bytes total

--
Mecrisp-Stellaris Unofficial User Doc: http://128.199.141.78/index.html

- R
- rickman
  
posted
6 years ago

Sat, Aug 26, 2017 5:55 AM

I think the point is that a human is capable of doing some very optimal things when the code is small, as a lot of thought can be given to optimizing the small task relative to the total effort. But with a large program it is much harder for a person to optimize the total program, while it is still not such a hard task for a computer.

--

Rick C 

Viewed the eclipse at Wintercrest Farms, 
on the centerline of totality since 1998

- P
- Paul Rubin
  
posted
6 years ago

Mon, Aug 28, 2017 7:22 AM

Is the code from the book available, or anyway your transliterations? Thanks.

- D
- David Brown
  
posted
6 years ago

Mon, Aug 28, 2017 2:25 PM

As I said, I guessed at a pin for the LED because I do not have the board. I can't test the code! But based on the code below, you should replace:

P2DIR = 0x01;

with

P1DIR = 0x01; P1OUT = 0x01;

It looks like you also have to change the linker script to fit the reset vector for the chip in question. But there is no point in fiddling with that - it is the principle that we are discussing.

It is possible to use a gcc extension to put the counter value in a register rather than in RAM (as I did), but I would say that is cheating when writing in C. The "addc" trick here is nice for getting minimal assembly, and cannot be generated in C.

It does, but in the same way as you can "prove" that walking is faster than using a car by comparing how long it takes to visit your neighbour.

The compiler will do far better at complex things precisely /because/ they are complex. As I have already said, in theory a compiler can never be better than hand assembly because the assembly programmer can write the same stuff as the compiler generates. But in practice - when you add requirements such as clarity, maintainability, flexibility and reasonable development time - C and a good compiler will beat the assembly programmer for real-life programming tasks on an msp430 or other sensible processor. It is no problem for an assembly programmer to track a register or two in a trivial task - it is a different matter entirely when things get serious.

- A
- Anton Ertl
  
posted
6 years ago

Sat, Sep 2, 2017 4:35 PM

formatting link

- anton

--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html 
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html 
     New standard: http://www.forth200x.org/forth200x.html 
   EuroForth 2017: http://euro.theforth.net/