So: Anti software bloat

- Didi

Thu, May 20, 2010 2:29 PM

Can't fit that in 10k, but you don't need gigabytes to do that either. Just a little over 1 megabyte - and you have an OS, windowed environment, GUI, really low IRQ latency, tcp/ip, you name it.

Even the spectroscopy application software fits in somewhat under 2M. Most of the 2M of flash is used as a ROM "disk".

(The diskless version is sub-2M and has all functionality except disk storage: it uses a RAM disk and, well, has no photo for the background :-) ).

Written in VPA, of course. Try that in a HLL, C included.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

------------------------------------------------------

- Joel Koltner

Thu, May 20, 2010 4:38 PM

Hi Didi,

- Joel Koltner

Thu, May 20, 2010 4:39 PM

Oops, sorry, the last line there was supposed to be:

What is VPA, Didi?

I can't imagine it'd be Visual Python for Applications in this case. :-)

- Didi

Thu, May 20, 2010 4:59 PM

LOL, my fault. It is widely unknown, as it is developed and used only here, by me :-).

Virtual Processor Assembly. I did it when I moved from 68k to power (PPC) some years ago (well, almost 10 now). Apart from being able to turn 68k source into power object code, it has grown into something really powerful. Together with DPS (the OS running on my products) it allows one (well, me at least :-) ) to vary the level of programming as needed, e.g. one line is a bitfield instruction and the next is a "do" something with some object primitive. I have been wanting to make the environment commercially available as well, but writing usable documentation for it without opening all the sources will be some work, which I keep being unable to find time for :-). Hopefully this year :-). But it is restricted to power (PPC) processors so far, so I cannot expect a very wide audience.

Dimiter

- D Yuniskis

Thu, May 20, 2010 4:59 PM

Actually, I always thought the original PICs had the most godawful instruction sets (back in the "GI" (General Instrument) days, when they were used almost exclusively in STBs (set-top boxes), etc.).

The 99000's instruction set (and programming model) was a bit difficult to get used to.

The 8x300 *really* was crippled -- but *fast* (for its day).

And I could never wrap my head around the i432. Just too "radical" for my sensibilities (at the time), I guess... :<

All-time *favorite* was the NS32000 (née 16032) -- though it was riddled with bugs :<

It is amusing to look back and see which features became prevalent and which *essential* as the designs evolved. E.g., many early machines didn't support a stack. Nowadays, even Microchip has realized the benefit of this approach :-/ (some early machines didn't support IRQs!)

OTOH, it is disappointing to see how many peripherals have *lost* functionality -- functionality that really isn't costly to provide. :<

- Tim Williams

Thu, May 20, 2010 6:01 PM

RISC basically means, "we don't let you do anything in one instruction; you always have to load-modify-store manually, but at least it goes screaming fast and the operands are simple to remember".

CISC means "we give you these powerful-assed instructions that'll solve field equations, but they take forever to compute and we only let you use some operands some of the time, see reference for what".

The 8086 did 16-bit multiply and divide, but it took up to 140 cycles to do it (tons of microcode). Tons of addressing modes, but these were only available on a few instructions. SRC and DST operands were exclusive to some extent, so, for example, you can't MOV [mem2],[mem1]; you need to pull the value through a register to do memory-to-memory moves. Some instructions truly sucked, like OUT DX,AL, where DX and AL are fixed, with no choice of operands whatsoever (OK, there was OUT im8,AL, but that only works when you're, e.g., fiddling with keyboard lights on the PC). It takes months to learn all the opcodes and their quirks, and even then you'll still need the instruction set reference.

I've looked at Z80 and HC11, they have nice instruction sets, rich enough to do things with, not so burdensome that they are a pain to learn, and not so weak that you need more for the same work. The 8051 is nice, but original models took a whopping 12 cycles per instruction! Even the 8086 didn't go that slow. I understand there are streamlined versions available today, which get as low as 1 cycle. Z80 and HC11 are in the 4 cycle range, which isn't bad, but it's not single cycle, either.

One downside to RISC on the AVR is that the instructions are 16 bits, so by needing 2-3 times more of them, you need 4-6 times more program memory. That, however, makes the false assumption that a CISC averages one byte per instruction, which it doesn't. A more typical average is closer to 2-3 bytes, which balances things out more.
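A back-of-the-envelope sketch of the code-size arithmetic above, in C. The inputs are the rough averages from this post (2-3 times as many RISC instructions at 2 bytes each, against CISC averages of 1 byte or 2-3 bytes); the function name is hypothetical and the figures are estimates, not measurements of any compiler or chip.

```c
/* Relative code size of a RISC program versus a CISC program that does
   the same work, from rough per-instruction averages (estimates only). */
double code_size_ratio(double risc_instr_per_cisc_instr,
                       double risc_bytes_per_instr,
                       double cisc_bytes_per_instr)
{
    /* RISC bytes needed to replace one CISC instruction, divided by
       the bytes that CISC instruction itself occupies */
    return risc_instr_per_cisc_instr * risc_bytes_per_instr
           / cisc_bytes_per_instr;
}
```

With 1-byte CISC instructions this reproduces the 4-6x figure; with a 2-3 byte CISC average the same inputs give roughly 1.3x to 3x.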

It's well enough suited to that. My understanding of C is, eeeeverything that isn't static is stack referenced. Enter function, build stack frame, push a whole ton of shit (man, disassemble the main interrupt vector in a program some day -- there's about 30 pushes in it -- almost the whole register set!), then finally get to work. Variables are referenced by pointers rather than absolutes (since it has to be relative to SP). The provision of three pointer registers, flat address space and "unlimited" stack make C go fairly well.

I understand there's PIC C, but that must be ugly.

Tim

--
Deep Friar: a very philosophical monk.
Website: http://webpages.charter.net/dawill/tmoranwms

- Joel Koltner

Thu, May 20, 2010 6:25 PM

Many would argue that, when it comes to CISC, "only letting you use some operands some of the time" is limited to slightly brain-damaged implementations such as x86... ;-)

Yes, but it ran at 12MHz (1 MIPS for most instructions) in an era when plenty of other popular CPUs were comparable (e.g., a 6502 @ 1MHz gave you 1 MIPS as well for comparable instructions).

Although it wasn't much better: a 4.77MHz PC, back in the day, might have gotten you 2 MIPS or so... and often the real-world, non-number-crunching performance of, e.g., the IBM PC tended to be worse than that of, e.g., an Apple II, in that the latter's 6502 made all the peripherals memory-mapped, which was fast, whereas the IN/OUT instructions under x86 were much slower.

This is true of many C compilers, although some keep a local variable frame that's separate from the call stack.

---Joel

- Didi

Thu, May 20, 2010 6:31 PM

If we compare RISC to CISC we should perhaps compare the top architectures of both worlds. Choosing 68k for CISC and power (PPC) for RISC for the purpose would be reasonable, I guess.

Beautiful as the 68k instruction set and architecture were, power was a huge step forward. However, the power mnemonics as originally developed make assembly programming totally impractical; it would be insane to start even a small project with them. Once the 68k assembly had evolved into VPA, usable with power, it became evident that writing for the RISC architecture costs very little more memory. [Assembling original 68k (CPU32, actually) sources results in about 3.5 times more memory; however, that is without optimization, and more than half of the code does unnecessary maintenance of correct condition codes and similar compatibility stuff. If written with power in mind, the code size gets very close to the original 68k code size, certainly less than twice that. Not so bad for a 32-bit instruction word vs. a 16-bit minimum one.]

And for a comparison between different ages consider this: a 200MHz power (603e, 8240 precisely) emulating a 2 MHz 6809 system in a DPS window was measured to be about 40 times faster than the original. A penalty of just above 2 times for emulation, not so bad at all (IIRC I tried it assembling stuff which used to take a few minutes on the original system under MDOS09).

Dimiter

- Paul Keinanen

Thu, May 20, 2010 6:51 PM

Many claim that the VAX-11 was the ultimate CISC architecture; however, each instruction could use all the available addressing modes.

In character string and packed decimal (BCD) instructions, up to six operands (SrcAddr1:Len1, SrcAddr2:Len2, DstAddr:DstLen) were used, and each operand could occupy 6 bytes (indexing, addressing mode + offset/constant), so with the opcode a single instruction could be 37 bytes long :-).

- MooseFET

Fri, May 21, 2010 1:19 AM

RISC really does mean "reduced". The rarely used or way complex instructions are left out in favor of using those same transistors to get more of the useful stuff done in a second.

RISC machines often have a fixed instruction length and a large number of registers but this need not be the case.

Back when CISC machines were not yet called "Complex Instruction Set Computers", because there was no movement toward RISC yet, Intel did a fair amount of talking about "software on silicon", where common tasks were coded right into the CPU. About the most extreme case of this was the 432, which they made a few of but later withdrew, burning all the files about it. It was reported that they also shot all the people involved, had them buried 12 feet down and then shot the ones who did the shooting, just to make sure that no memory of that dreadful machine remained.

I have written a fair bit of 8086 assembly code over the years. There actually is a pattern to the instructions and which registers they must use: whatever register you already have some valuable value in is the one that the next instruction must mung.

Actually, the 8086 did go slower than that. A jump instruction on an 8086 took IIRC 21 clocks, and the clock rate was only about 3MHz max. Compared to a 12MHz 8051, this is really slow.
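The timing arithmetic behind that comparison, as a small C sketch. The inputs are the figures quoted in this thread (about 21 clocks for an 8086 jump at about 3MHz, and 12 clocks per original-8051 instruction at 12MHz), not datasheet values; the function names are hypothetical.

```c
#include <stdint.h>

/* Time for one instruction in nanoseconds, given the clock frequency
   and the number of clock cycles the instruction takes. */
uint64_t instr_time_ns(uint64_t clock_hz, uint64_t cycles)
{
    return cycles * 1000000000ULL / clock_hz;
}

/* Instructions per second if every instruction took 'cycles' clocks. */
uint64_t instr_per_sec(uint64_t clock_hz, uint64_t cycles)
{
    return clock_hz / cycles;
}
```

With those inputs the 8086 jump takes 7 µs against 1 µs for an 8051 instruction; at a 10MHz 8086 clock the same 21-clock jump still takes 2.1 µs.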

Newer versions of the 8051 do a machine cycle on every clock. When you are flinging bytes in a 100MIPS machine, you can get a lot of bytes flung.

Internally, the Z80 had a 4-bit ALU that took two passes to do each 8-bit operation. Since the fetch rate was clock/4, it wasn't a problem for speed.
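A behavioural sketch of the two-pass idea in C: an 8-bit add done as two 4-bit adds, low nibble first, with the carry handed from the first pass to the second. This models the principle only, not the Z80's actual ALU circuit, and the names are made up. Note that the carry between the passes is the carry out of bit 3, i.e. exactly what a half-carry flag such as the Z80's H flag reports.

```c
#include <stdint.h>

/* One pass through a hypothetical 4-bit adder: nibble + nibble + carry. */
static uint8_t alu4_add(uint8_t a, uint8_t b, int carry_in, int *carry_out)
{
    unsigned sum = (unsigned)(a & 0x0F) + (unsigned)(b & 0x0F)
                   + (carry_in ? 1u : 0u);
    *carry_out = sum > 0x0F;
    return (uint8_t)(sum & 0x0F);
}

/* 8-bit add as two passes through the 4-bit adder, low nibble first.
   The carry between passes is reported as the half-carry. */
uint8_t add8_two_pass(uint8_t a, uint8_t b, int *carry, int *half_carry)
{
    int mid;
    uint8_t lo = alu4_add(a, b, 0, &mid);                         /* pass 1 */
    uint8_t hi = alu4_add((uint8_t)(a >> 4), (uint8_t)(b >> 4),  /* pass 2 */
                          mid, carry);
    *half_carry = mid;
    return (uint8_t)((hi << 4) | lo);
}
```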

If you knew some of the less well documented features of the Z80, you could do lots of other things with it. There were 65536 I/O ports if you used the BC register pair version of the I/O instructions. The refresh register could be written to, so you could do some tricky DMA things that involved it. Interrupts were only checked when the refresh and interrupt vector registers were on the bus. This made it possible to use the refresh register to cause semi-periodic interrupts.
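Where the 65536 ports come from, as a sketch: for the (C) forms of IN and OUT the Z80 drives C onto address lines A0-A7 and B onto A8-A15, so a decoder that looks at all 16 lines sees a 16-bit port number. For the immediate forms, IN A,(n) and OUT (n),A, the accumulator appears on the upper eight lines instead. The function names are illustrative.

```c
#include <stdint.h>

/* Address on the bus during IN r,(C) / OUT (C),r:
   C on A0-A7 and B on A8-A15, i.e. the full BC pair. */
uint16_t z80_port_addr_bc(uint8_t b, uint8_t c)
{
    return (uint16_t)(((unsigned)b << 8) | c);
}

/* Address on the bus during IN A,(n) / OUT (n),A:
   the immediate n on A0-A7 and the accumulator A on A8-A15. */
uint16_t z80_port_addr_imm(uint8_t a, uint8_t n)
{
    return (uint16_t)(((unsigned)a << 8) | n);
}
```

Hardware that decodes only A0-A7 sees the usual 256 ports and simply ignores the upper byte.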

- MooseFET

Fri, May 21, 2010 1:25 AM

The 8052 has a MUL AB.

The 8051 has CJNE A,#5,Line500 and DJNZ R2,Line600.

For many operations, a test and jump is just one instruction.
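A C model of what these instructions do, which makes the flag behaviour easy to see. It follows the standard 8051 instruction set (MUL AB is part of the base 8051 set, so the 8052 inherits it); the struct and function names are hypothetical, and only the flags these instructions touch are modelled.

```c
#include <stdint.h>

/* The two PSW flags touched by the instructions modelled here. */
struct flags { int cy; int ov; };

/* MUL AB: unsigned A * B; low byte -> A, high byte -> B.
   OV is set if the product exceeds 0xFF; CY is always cleared. */
void mul_ab(uint8_t *a, uint8_t *b, struct flags *f)
{
    unsigned product = (unsigned)*a * (unsigned)*b;
    *a = (uint8_t)(product & 0xFF);
    *b = (uint8_t)(product >> 8);
    f->ov = product > 0xFF;
    f->cy = 0;
}

/* CJNE A,#imm,rel: compare without changing A; CY = (A < imm), unsigned.
   Returns nonzero when the branch is taken (A != imm). */
int cjne(uint8_t a, uint8_t imm, struct flags *f)
{
    f->cy = a < imm;
    return a != imm;
}

/* DJNZ Rn,rel: decrement Rn, then branch if the result is nonzero.
   Returns nonzero when the branch is taken. */
int djnz(uint8_t *rn)
{
    *rn = (uint8_t)(*rn - 1);
    return *rn != 0;
}

/* How many times the body of "body: ... DJNZ R2,body" executes. */
unsigned djnz_loop_count(uint8_t r2)
{
    unsigned count = 0;
    do {
        count++;                 /* loop body */
    } while (djnz(&r2));
    return count;
}
```

Note the DJNZ edge case: a counter that starts at 0 wraps to 255 on the first decrement, so the loop body runs 256 times.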

- MooseFET

Fri, May 21, 2010 1:31 AM

On May 20, 9:59 am, D Yuniskis wrote: [...]

The IBM 360 had no real stack, but it did have store-multiple and load-multiple instructions (STM/LM). If you wanted to, you could make a stack.
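A sketch of that idea in C rather than 360 assembler: keep a stack pointer in software and let one block store save a run of registers onto the stack, with a block load to restore them. The structure, sizes and names are hypothetical; also, the real STM/LM wrap around from R15 to R0 (as in STM 14,12,12(13)), which this simplified version does not support.

```c
#include <stdint.h>
#include <string.h>

#define NREGS 16
#define STACK_WORDS 256

/* Hypothetical machine state: 16 general registers plus a word memory
   used as a software stack that grows downward. Start with sp = STACK_WORDS. */
struct machine {
    uint32_t r[NREGS];
    uint32_t mem[STACK_WORDS];
    unsigned sp;                 /* index of the current top of stack */
};

/* STM-like: push registers first..last as one block. Returns -1 on a
   bad register range or stack overflow. */
int push_multiple(struct machine *m, unsigned first, unsigned last)
{
    if (first > last || last >= NREGS) return -1;
    unsigned n = last - first + 1;
    if (n > m->sp) return -1;
    m->sp -= n;
    memcpy(&m->mem[m->sp], &m->r[first], n * sizeof m->r[0]);
    return 0;
}

/* LM-like: reload registers first..last from the top of the stack.
   Returns -1 on a bad register range or stack underflow. */
int pop_multiple(struct machine *m, unsigned first, unsigned last)
{
    if (first > last || last >= NREGS) return -1;
    unsigned n = last - first + 1;
    if (m->sp + n > STACK_WORDS) return -1;
    memcpy(&m->r[first], &m->mem[m->sp], n * sizeof m->r[0]);
    m->sp += n;
    return 0;
}
```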

- Joel Koltner

Fri, May 21, 2010 1:45 AM

Stacks? Bah, that's way too fancy! You sure don't need them... in fact, you don't even need add instructions -- just use lookup tables, like the IBM CADETs ("Can't Add, Doesn't Even Try") did!

(...which had a *one* level stack...)
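For the curious, a sketch of adding without an adder, in C: each pair of decimal digits is summed by indexing a 10x10 table in memory, in the spirit of the CADET (the IBM 1620, which did its arithmetic by table lookup in core). The table layout and names are illustrative, not the 1620's actual format. The table is filled once at start-up with ordinary arithmetic, standing in for loading it into memory; after that, every digit sum, including the incoming carry, comes from a lookup.

```c
#include <stddef.h>

/* sum_digit[a][b] / carry_out[a][b]: units digit and carry of a + b. */
static unsigned char sum_digit[10][10];
static unsigned char carry_out[10][10];

/* Stands in for loading the addition table into memory at start-up. */
void init_add_table(void)
{
    for (int a = 0; a < 10; a++)
        for (int b = 0; b < 10; b++) {
            sum_digit[a][b] = (unsigned char)((a + b) % 10);
            carry_out[a][b] = (unsigned char)((a + b) / 10);
        }
}

/* Adds two n-digit decimal numbers (digits 0-9, least significant first)
   and writes n+1 digits to out. Digit sums come only from table lookups. */
void add_decimal(const unsigned char *x, const unsigned char *y,
                 unsigned char *out, size_t n)
{
    unsigned char carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char s = sum_digit[x[i]][y[i]];
        unsigned char c = carry_out[x[i]][y[i]];
        if (carry) {             /* add the incoming carry, also by lookup */
            c |= carry_out[s][1];
            s = sum_digit[s][1];
        }
        out[i] = s;
        carry = c;
    }
    out[n] = carry;
}
```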

I seem to recall that in early versions of FORTRAN you could redefine numeric literals as well, so that 2+2 didn't necessarily have to equal 4 anymore. :-)

---Joel

- D Yuniskis

Fri, May 21, 2010 3:10 AM

Yes, they tried to make everything an "object". That was too wacky a way of thinking for the time. I suspect it would still be too inefficient even with today's technology.

That was true of lots of machines, e.g., BAL on old PDPs.

If you look at *big* machines (like the 360 -- I was initially limiting my discussion to just MCU's), there were some really clever architectures "in days past" (e.g., some of the "self timed" modules in Burroughs machines). Amusing given the problems logic had "back then".

- Paul Keinanen

Fri, May 21, 2010 5:38 AM

Sounds just like an extension of the 80286 segment concept with a separate segment descriptor for each object :-).

However, IBM had success with the S/38 and later the AS/400, which also had a somewhat similar object protection model.

Who needs stacks, when the function return address is stored at the function label and the return from that function is just an indirect jump through that label :-).

Who needs recursion or re-entrant code :-).
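The recursion problem in C terms, as a minimal sketch: a variable kept at one fixed address (much like a return address stored at the function's label) is shared by every activation, so a recursive call clobbers it, while an automatic variable lives in each call's own stack frame. The function names are invented for illustration.

```c
/* Automatic local: each call gets its own copy in its own stack frame. */
int sum_down_auto(int depth)
{
    int local = depth;
    if (depth == 0)
        return local;
    int below = sum_down_auto(depth - 1);
    return below + local;        /* 'local' still holds this call's depth */
}

/* Static local: one fixed address shared by every activation. */
int sum_down_static(int depth)
{
    static int local;
    local = depth;
    if (depth == 0)
        return local;
    int below = sum_down_static(depth - 1);
    return below + local;        /* overwritten by the deeper calls: now 0 */
}
```

sum_down_auto(3) returns 3+2+1+0 = 6, while sum_down_static(3) returns 0, because every activation reads the value the innermost call left behind.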

- D Yuniskis

Fri, May 21, 2010 7:30 AM

I think the 432 was just *way* too far ahead of the curve. Sort of like the "registers in memory" that the 99K used. Clever idea but just didn't make the cut.

I'm particularly *fond* of recursive solutions! :>

And, in addition to re-entrant code, I often write self-modifying code! ;-)

Yum, yum!

- Martin Brown

Fri, May 21, 2010 8:41 AM

You needed expensive, fast local cache memory to make it fly, but I have to defend the honour of the 99k (and the older 9900 series, which had the same architecture). I didn't realise quite how good it was at context switching until later, when we ported the same functionality onto 68k hardware. Though on paper the 68k was quicker, the 99k could beat it.
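A C sketch of why the 9900 family could switch context so quickly: R0-R15 are just 16 memory words at the workspace pointer, so BLWP and RTWP swap the whole register set by changing WP and saving or restoring three words (old WP, PC and ST go into the new workspace's R13-R15). The names and memory size are hypothetical, memory is addressed in words rather than bytes, and there is no bounds checking.

```c
#include <stdint.h>

#define MEM_WORDS 1024

/* The CPU itself holds only PC, ST and WP; the registers live in memory. */
struct cpu9900 {
    uint16_t pc, st, wp;
    uint16_t mem[MEM_WORDS];     /* word-addressed for simplicity */
};

/* Register Rn of the current workspace. */
static uint16_t *reg(struct cpu9900 *c, unsigned n)
{
    return &c->mem[c->wp + n];
}

/* BLWP: load new WP and PC from a two-word vector, then save the old
   WP, PC and ST in R13, R14 and R15 of the new workspace. */
void blwp(struct cpu9900 *c, uint16_t vector)
{
    uint16_t old_wp = c->wp, old_pc = c->pc, old_st = c->st;
    c->wp = c->mem[vector];
    c->pc = c->mem[vector + 1];
    *reg(c, 13) = old_wp;
    *reg(c, 14) = old_pc;
    *reg(c, 15) = old_st;
}

/* RTWP: restore WP, PC and ST from R13-R15 of the current workspace. */
void rtwp(struct cpu9900 *c)
{
    uint16_t wp = *reg(c, 13), pc = *reg(c, 14), st = *reg(c, 15);
    c->wp = wp;
    c->pc = pc;
    c->st = st;
}
```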

And hardly anyone even remembers poor old Captain Zilog's Z8000 now. ISTR Olivetti were about the only ones to make PCs with it inside (the M20).

Generally a bad idea. For security I favour Harvard architecture. Complete separation of code and data spaces prevents a lot of common forms of attack against system integrity. Flat linear address spaces which are in vogue now may one day be seen as a very bad idea.

Regards, Martin Brown

- JW

Fri, May 21, 2010 9:33 AM

IIRC, the Intel 8086 topped out at 10MHz. AMD and NEC may have had faster versions.

- Martin Brown

Fri, May 21, 2010 11:43 AM

IIRC 18+EA for complex jumps and lots of other instructions.

Fastest was JMP SI at about 11 cycles. These days speculative execution and multiple arithmetic units makes cycle counting pointless.

NEC's V20 & V30 had microcode improved by peephole optimisation, ISTR done mostly by a couple of fairly talented interns: obvious reverse engineering infringements, but worthwhile for a 10% performance gain. I had one of the first Epson PCs in the UK, which was NEC V30 based.

Regards, Martin Brown

- MooseFET

Fri, May 21, 2010 12:57 PM

Yes, at least the later ones could go a lot faster than 3MHz. The complete system, even with a 10MHz CPU clock, was still extremely slow on a jump instruction. It had a prefetch buffer that would hold the next instruction or two. On a jump, it took a lot of clock cycles to change CS:IP, then a bunch more to compute (CS*16)+IP, and then about 1 µs to get the word from external memory.
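The (CS*16)+IP step in C, for reference: the standard 8086 real-mode address calculation, truncated to the chip's 20 address lines so that addresses past 1 MB wrap to zero (later CPUs with the A20 line enabled behave differently). The function name is illustrative.

```c
#include <stdint.h>

/* 8086 real-mode physical address: segment * 16 + offset,
   masked to 20 bits because the 8086 has only A0-A19. */
uint32_t phys_addr_8086(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```

For example, the reset location FFFF:0000 maps to physical address 0xFFFF0.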