So: Anti software bloat

- MooseFET

Fri, May 21, 2010 1:08 PM

Actually, you can still estimate how long a routine takes by cycle counting. Getting a very accurate figure now takes a lot more work, but for a rough one you can assume that both the cycles to fetch and the cycles to perform the operations have to happen, and take the larger of the two numbers over spans of about 10 instructions. If the processor executes instructions in parallel, just divide by the number it can do in parallel.
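A rough sketch of that estimate in Python, for illustration only. The function name, the `(fetch_cycles, execute_cycles)` tuple layout, and the `issue_width` parameter are assumptions made for this example, not anything from a real tool.

```python
def estimate_cycles(instructions, span=10, issue_width=1):
    """Rough cycle estimate for a routine.

    instructions: list of (fetch_cycles, execute_cycles) pairs, one per
    instruction (hypothetical layout chosen for this sketch).
    span: number of instructions over which fetch and execute are compared.
    issue_width: how many instructions the processor can do in parallel.
    """
    total = 0
    for start in range(0, len(instructions), span):
        window = instructions[start:start + span]
        fetch = sum(f for f, _ in window)
        execute = sum(e for _, e in window)
        # Both have to happen; the slower of the two bounds the span.
        total += max(fetch, execute)
    return total / issue_width


# Ten cheap instructions followed by ten that are fetch-bound.
routine = [(1, 1)] * 10 + [(4, 1)] * 10
scalar_estimate = estimate_cycles(routine)
dual_issue_estimate = estimate_cycles(routine, issue_width=2)
```

The point is only to rank candidate rewrites of a routine before measuring them, not to predict real timings.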

Why would anyone bother doing this, you may ask? Because you can quickly throw away the obviously bogus ideas for making a routine faster before actually testing them. These days, the table lookup almost always loses to the calculation methods on the basis of bus cycles.

I had a machine with, IIRC, the V20, the 8088 look-alike. I put it in the first "PC" I had. It did make the machine just a little faster. I wrote a program that took about an hour to finish. The NEC chip got it done in about 10% less time.

That machine had a 9Meg hard drive in it. It was supposed to be 10Meg but only about 90% of the tracks worked.

- Jan Panteltje

Fri, May 21, 2010 1:18 PM

On a sunny day (Thu, 20 May 2010 18:25:26 -0700 (PDT)) it happened MooseFET wrote:

I know, I wrote an 8052 assembler.

The original 8051 was powerful, but did not have a lot of special I/O, none actually. I think these days they have versions with ADC and stuff. Can be nice if you invested time in that architecture. Hope it has on board FLASH too.

- Jan Panteltje

Fri, May 21, 2010 1:23 PM

On a sunny day (Fri, 21 May 2010 12:43:13 +0100) it happened Martin Brown wrote:

I still have an unused V20 or V30 chip somewhere, cannot remember which one. In case anyone still runs one and needs one :-)

- MooseFET

Fri, May 21, 2010 1:25 PM

ki/IBM_1620 (...which had a *one* level stack...)

I programmed an ATT DSP which had a stack for one level and a built in "DO" loop instruction. There were interrupts but you'd better not use the "DO" inside them. In the main line code, the advantage was that the next instruction to be fetched would be the top of the loop again if you were looping.

Fortran was a "pass by reference" language. That made some serious optimizations possible, but it also meant that a literal constant was only an initial value. If you passed 1.0 to a subroutine that changed it, the changed value was what was in memory from then on.
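The effect can be imitated in Python by giving each literal a storage cell and passing the cell rather than the value. This is only a model of the behaviour described above; `Cell`, `literal` and `bump` are names invented for the sketch.

```python
class Cell:
    """One word of storage; passing a Cell models pass by reference."""

    def __init__(self, value):
        self.value = value


# Hypothetical literal pool: one shared storage cell per distinct constant.
LITERAL_POOL = {}


def literal(value):
    """Return the storage cell that holds the constant `value`."""
    return LITERAL_POOL.setdefault(value, Cell(value))


def bump(x):
    """A subroutine that assigns to its dummy argument."""
    x.value = x.value + 1.0


bump(literal(1.0))                    # "CALL BUMP(1.0)"
value_of_one = literal(1.0).value     # the constant 1.0 now reads as 2.0
```

Every later use of the literal 1.0 sees the changed cell, which is the hazard being described.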

In the IBM 360 case, Fortran passed the address of the list of argument addresses. Since the 360 could address up to 4K beyond a base register, one instruction could pick an address out of the list and the next could load the value.

There was a very interesting optimizing compiler for Fortran that could in some cases optimize your program completely out of existence. If you wrote a program that just computed the sqrt(2.0) and printed it, your program would be replaced with one that basically was:

print("1.414")

The optimizer knew that the sqrt(), sin() cos() etc of a constant was a constant and that the formatted printing of a constant was a fixed string.
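A toy version of that kind of folding, as a sketch only. The tuple-based expression format, the `fold` and `fold_print` names, and the `%.3f` format are assumptions made for this example; nothing here reflects how the actual compiler was built.

```python
import math

# Functions the folder treats as pure: a constant in gives a constant out.
PURE_FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}


def fold(expr):
    """Fold ("call", name, arg) into ("const", value) where possible."""
    if expr[0] == "call":
        _, name, arg = expr
        arg = fold(arg)
        if arg[0] == "const" and name in PURE_FUNCTIONS:
            return ("const", PURE_FUNCTIONS[name](arg[1]))
        return ("call", name, arg)
    return expr  # constants and variables are left alone


def fold_print(fmt, expr):
    """Formatted printing of a constant becomes printing a fixed string."""
    expr = fold(expr)
    if expr[0] == "const":
        return ("print_literal", fmt % expr[1])
    return ("print", fmt, expr)


program = fold_print("%.3f", ("call", "sqrt", ("const", 2.0)))
# program == ("print_literal", "1.414")
```

Anything that depends on a variable is left for run time, so only fully constant programs disappear.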

- MooseFET

Fri, May 21, 2010 1:29 PM

Since it involves a lookup, I would expect it to be worse with today's technology. ALUs have sped up a lot more than memories. The ALU would end up waiting on the table lookup.

[....]

The PDP-8 stored the return address into the word before the subroutine.
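A small model of that calling convention, for illustration. It assumes the PDP-8 JMS behaviour where the return address lands in the word at the subroutine's entry address and execution starts at the following word; the function names are invented for the sketch.

```python
WORD_MASK = 0o7777  # 12-bit words and addresses


def jms(memory, pc, target):
    """Call: store the return address at `target`, run from target + 1."""
    memory[target] = (pc + 1) & WORD_MASK
    return (target + 1) & WORD_MASK


def return_from(memory, target):
    """Return: jump indirectly through the word saved at `target`."""
    return memory[target]


memory = [0] * 4096
pc = jms(memory, 0o200, 0o400)       # call from 0200 to the routine at 0400
back = return_from(memory, 0o400)    # resumes at 0201
```

Because the return address lives in the code itself, the routine cannot call itself and cannot sit in ROM.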

- MooseFET

Fri, May 21, 2010 1:33 PM

It is part of the same sort of brain dead concept group. The 286 was faster than the 432 because it used less of it. If they had instead put all those transistors to good use, they could have got about twice the throughput.

The PDP-8 was in many ways a nice machine. It got a lot done per transistor.

- MooseFET

Fri, May 21, 2010 1:38 PM

No, the 432 would be worse today than back then. Memory accesses haven't kept up with ALUs, so adding all those extra table lookups is just a very bad idea.

[....]

I am usually coding for a machine where the code is stored in FLASH or the like so self modifying code is just not an option. Many people don't like self modifications but seem not to understand that the ability to write into a variable is just as dangerous as a GOTO or self modification.

- MooseFET

Fri, May 21, 2010 1:45 PM

On May 21, 1:41 am, Martin Brown wrote: [....]

A flat address space doesn't mean that you can't defend the code against attack. Harvard has the problem that you often end up with two versions of your subroutines.

Since a store into memory and a jump are moderately rare operations, it is not much of a problem to check the protection level of the location on those operations.

If you have address space to burn, you can put the code in the lower half and the variables in the upper and prevent the operations you don't want with some simple logic. This makes it Harvard like.
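The "simple logic" could look something like the following sketch. The 16-bit address width, the operation names, and the choice to allow loads from either half are all assumptions made for the example.

```python
def access_allowed(operation, address, address_bits=16):
    """Check an access against a code-low / data-high split.

    The top address bit selects the half: 0 = code, 1 = variables.
    """
    in_data_half = bool((address >> (address_bits - 1)) & 1)
    if operation in ("fetch", "jump"):
        return not in_data_half   # never execute out of the variables
    if operation == "store":
        return in_data_half       # never write into the code
    if operation == "load":
        return True               # assumed: constants in code are readable
    raise ValueError("unknown operation: %r" % (operation,))


ok_store = access_allowed("store", 0x9000)   # data half: allowed
bad_store = access_allowed("store", 0x1000)  # code half: refused
bad_jump = access_allowed("jump", 0x9000)    # data half: refused
```

In hardware this is one address bit gated with the store and jump signals, which is why it is cheap.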

- Tim Williams

Fri, May 21, 2010 2:55 PM

I wrote a tone generator for Z80 once. To generate rests (i.e., unvoiced notes), the best solution was simply:

         ld a,(outbit)
xorbits: xor 11b            ; invert output (warning: adaptive instruction)
         out (5),a
         ld (outbit),a

Down where it's fetching the note/rest it says..

         ld a,(outbit)
         and 1100b          ; test zero-bits
         jr z,makerest      ; if zero, don't play note

         ld a,11101110b     ; XOR A,n instruction (plays note)
         ld (xorbits),a
         jr outintr

makerest: ld a,11110110b    ; OR A,n instruction (output remains high)
         ld (xorbits),a
         jr outintr

'Course, the whole operation is fully commented.
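For anyone who doesn't read Z80, the patching trick can be modelled in Python as below. This is a simulation sketch, not the original routine: the two-byte `code` array stands in for the instruction at `xorbits`, and `step` and `set_rest` are invented names.

```python
XOR_N = 0b11101110  # Z80 opcode for XOR n, as in the listing
OR_N = 0b11110110   # Z80 opcode for OR n, as in the listing

# The two bytes at "xorbits": an opcode followed by its operand.
code = bytearray([XOR_N, 0b11])


def step(a):
    """Execute whatever instruction currently sits at xorbits."""
    opcode, operand = code
    if opcode == XOR_N:
        return a ^ operand   # toggles the output bits: a tone
    if opcode == OR_N:
        return a | operand   # holds the output bits high: a rest
    raise ValueError("unexpected opcode")


def set_rest(is_rest):
    """Patch the opcode byte, like 'ld (xorbits),a' in the listing."""
    code[0] = OR_N if is_rest else XOR_N


set_rest(False)
note = [step(0b00), step(0b11)]   # output flips each time
set_rest(True)
rest = [step(0b00), step(0b11)]   # output stays high
```

The output loop itself never branches on note versus rest; the decision is baked into the instruction.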

I also did this on an x86 raycaster routine. Saves having four copies of the thing (for the four cardinal directions).

It could be handy if C supported this type of construct, but then again the language is obfuscated well enough on its own. Plus, lots of architectures don't allow self-modifying code, or at least not quickly (cache lines, or worse yet, flash ROM for program space).

Tim

--
Deep Friar: a very philosophical monk.
Website: http://webpages.charter.net/dawill/tmoranwms

- D Yuniskis

Fri, May 21, 2010 4:18 PM

Correct. Since you want to be able to move the set of working registers *anywhere*, it has to be a real cache and not just "a region of fast memory". This adds to the complexity of the system (hardware) at a time when MCUs were getting *cheaper* to deploy.

The military liked the Z8000.

After the Z80, Zilog couldn't do *anything* right! (88000, Z800, Z280, etc.)

Write good code and you don't have to worry about those "attacks". Corrupted data leaves you just as vulnerable to "attack" as corrupting the code space. Check all your inputs. Know what your code really *is* doing (not what you *hope* it's doing).

- krw

Fri, May 21, 2010 11:12 PM

With a serial ALU (like the PDP-8/s).

1MIPS x 16bit with a *lot* more memory.

Very few operations used IN/OUT. The display was memory mapped.

- krw

Fri, May 21, 2010 11:13 PM

The ultimate CISC was the /360.

- Joel Koltner

Fri, May 21, 2010 11:26 PM

Oh yeah, my mistake -- I was just remembering that you had your choice of "really slow text output" on the old CGA cards or "fast but with snow." But that was due to there being no mechanism to deal with the CPU and video controller both accessing video memory at once (well, other than "CPU must win for the sake of data integrity, it's OK if it snows a bit on the screen"), not IN/OUT commands.

And I guess the BIOS text routines were slow as well, right? -- Everyone just bypassed them, writing directly to video memory, after a little bit?

I kinda suspect that for business users the Apple IIgs might have won out over PCs if it'd been released a few years earlier. Or maybe not -- perhaps the availability of clones is what really put PCs on top... and I suppose Apple was competing with itself as well, having released the Macintosh prior to the IIgs.

What computers did you have before PCs took over the world, Keith? :-)

---Joel

- Jamie

Fri, May 21, 2010 11:35 PM

I started with punch card mainframes, CBM's, Timex, CoCo, Vic 20, C-64, C&D-128 and finally, I got a Commodore IBM clone with turbo mode with a whopping 40 Meg Western Digital HD! :)

- Tim Williams

Sat, May 22, 2010 12:55 AM

Funny, the Paradise EGA chip in my Amstrad (w00t, 256kB EGA in an IBM compatible!) blocks memory access during scans. The processor only gets to play for a few microseconds during horizontal refresh, and a millisecond during vertical. But it has its own RAM (no parity, of course), attached to the peripheral bus, accessible by the processor, so it doesn't hold up programs or I/O unless addressed.

EGA is quite slow if you want to update a 640x350 screen; touching every pixel will easily take a second to complete. Longer if you try doing it in QBasic. :-p However, using the EGA hardware to emulate scrolling, while drawing sprites in 320x200, was the step that enabled the first live action video game on the PC.

There aren't all that many IO-intensive things to do. Switching graphics modes writes to every single register on the adapter once (sometimes twice). Time constants in the monitor's sweep limit any need of this going fast. Palette animation on VGA is a bit tedious, because you have to update 768 bytes of palette registers at once. But VGA machines are a bit faster, and a palette refresh might be done during vertical refresh only, so it's also not a big deal.

I don't know enough about HDD access to say if it involves IO. I suppose there are interrupts involved, and a buffer and DMA, at least on the DMA-supporting cards (IDE+?). I suppose an old MFM/RLL drive may not be fast enough to really require DMA anyway.

Oh, very much so. The graphics routines sucked, too. It's SOP to take the video hardware by the balls if you're doing anything speedy in DOS. Just for the hell of it, I once wrote a buffered mode 13h graphics routine, then decided to run it in EGA... so I made a buffer converter to strip 256 colors down to 16, divide into bitplanes and copy it to the video RAM. Actually runs pretty good on modern hardware, takes a second or two on the vintage hardware though!
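The bitplane part of such a converter can be sketched like this in Python. It is not the original routine: keeping only the low 4 bits of each colour is a placeholder for whatever 256-to-16 mapping was really used, and the function name is invented.

```python
def chunky_to_planar(pixels, planes=4):
    """Split one-byte-per-pixel data into EGA-style bitplanes.

    pixels: sequence of colour indices, length a multiple of 8.
    Returns one bytearray per plane, 8 pixels per byte, leftmost
    pixel in the most significant bit.
    """
    if len(pixels) % 8:
        raise ValueError("pixel count must be a multiple of 8")
    out = [bytearray(len(pixels) // 8) for _ in range(planes)]
    for i, color in enumerate(pixels):
        color &= 0x0F  # placeholder colour reduction: keep the low 4 bits
        byte_index, bit_index = divmod(i, 8)
        for plane in range(planes):
            if (color >> plane) & 1:
                out[plane][byte_index] |= 0x80 >> bit_index
    return out


planes = chunky_to_planar([1, 2, 4, 8, 0, 0, 0, 0])
```

Each plane would then be copied to video RAM with that plane selected for writing, which is where the time goes on vintage hardware.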

Tim

--
Deep Friar: a very philosophical monk.
Website: http://webpages.charter.net/dawill/tmoranwms

- krw

Sat, May 22, 2010 3:09 AM

Sure there was a mechanism; "don't do that!". Just wait for the retrace before you touch memory. Not a good mechanism, but there was a mechanism. ;-)

Sure. The BIOS routines were meant for console mode I/O, where speed didn't really matter much.

Not likely.

Absolutely. There was nothing special about PCs, other than everyone made them and they all worked, more or less. Good enough, anyway.

The first systems I used were /360s, a /75 in college and then a boatload of /85s at work, though the bubble gum and baling wire was getting tired on them by '78 or so. They were pretty much junk to begin with[*].

Interesting story about Intel memory on those things. IBM had never used Intel products but since the corporate purchasing group was in the Hudson Valley Intel placed their NE rep in Poughkeepsie. One day he gets a PO for $100M in memory. The memory was never seen again and all the industry rags were wondering where it went. Well, those /85s got brand new memory, their memory got ETNed and shipped with new machines. The power supplies were hacked in and if there was going to be a failure, our engineering people were going to be the ones to suck it up.

Oh, since they had never done any significant business with IBM, the Intel rep had a 1% commission deal as a small incentive. They wanted to renegotiate: "Fine, we can negotiate next year's, anytime you want."

[*] recalled from customers because they had unsolvable metastability problems.

- Robert Baer

Sat, May 22, 2010 3:09 AM

If you are to compete with M$, you _must_ have software bloat!!! The more bloat, the better the competition!

- Paul Keinanen

Sat, May 22, 2010 7:44 AM

Current virtual memory machines have dedicated translation lookaside buffers (TLBs), which work just like caches. In normal operation, most virtual to physical address translations are done in the TLB; only occasionally is there a miss, and then a new piece of the translation table has to be loaded from main memory.
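A minimal model of that behaviour, as a sketch. The 4 KiB page size, the four-entry capacity, the least-recently-used replacement, and the dictionary standing in for the in-memory page table are all assumptions for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size


class TLB:
    """Small cache of page translations with least-recently-used eviction."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # stands in for the table in main memory
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)
        else:
            self.misses += 1           # the slow path: walk the page table
            self.entries[page] = self.page_table[page]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)
        return self.entries[page] * PAGE_SIZE + offset


tlb = TLB({0: 7, 1: 3})
first = tlb.translate(0x010)    # miss, loads page 0
second = tlb.translate(0x020)   # hit, same page
third = tlb.translate(4096 + 5) # miss, loads page 1
```

As long as a program stays within a few pages at a time, nearly every translation is a hit and the table in memory is rarely touched.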

In the same way, the i432 addressing could work quite effectively these days.

With suitable caching, the TMS9900 style "register set in memory" would also be quite effective these days.

- Paul Keinanen

Sat, May 22, 2010 8:37 AM

The 80386 had read, write and execute bits in the segment descriptor register, so why didn't they use these to limit the access to different segments?

In 386 and later models, the segment registers are still there (usually mapped from 0..4 GiB in an overlapped way), before going through the virtual to physical translation.

Already in decent 70's virtual memory minicomputers, each virtual memory page had read, write and execute bits on a page by page basis.

A large complex program running in a single huge address space can be a problem, if something is malfunctioning.

However, now that people have learned how to write multithreaded programs, it should not be a huge leap for most programmers to write truly multitasking programs, with each task running in a separate, protected address space. The communication can be handled with OS services or with shared memory areas that hold only the data that is meant to be shared, not everything.

- Paul Keinanen

Sat, May 22, 2010 8:42 AM

One reason for the popularity of self modifying code in the old days was simply the lack of usable addressing modes.

Lack of index registers or bad indirect addressing support often forced the programmer to modify the effective address part of the instruction itself in order to go through each element in a table.
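A sketch of that idiom in Python, modelling the instruction as an opcode plus an address field that the loop rewrites. The "ADD" instruction and the function name are hypothetical; no particular machine is being emulated.

```python
def sum_table_self_modifying(memory, table_start, count):
    """Sum `count` table entries by patching an instruction's address field."""
    # Hypothetical "ADD address" instruction with no index register available.
    add_instruction = ["ADD", table_start]
    accumulator = 0
    for _ in range(count):
        _, address = add_instruction
        accumulator += memory[address]   # execute the ADD as it stands now
        add_instruction[1] += 1          # rewrite its address field in place
    return accumulator


memory = [0, 0, 5, 6, 7, 0]
total = sum_table_self_modifying(memory, table_start=2, count=3)
```

With an index register the instruction stays constant and only the register changes, which is why the trick faded once addressing modes improved.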