So: Anti software bloat

- Jan Panteltje

Sat, May 22, 2010 10:03 AM

On a sunny day (Fri, 21 May 2010 20:09:49 -0700) it happened Robert Baer wrote:

with the free code space?

Well, there is a trend towards sanity happening (at least I hope so), mainly driven by cellphones and other such devices. Now if they could only kill those Java snails...

I think MS realised that, or at least somebody there has noticed the writing on the wall. Before long the smallest device will be able to do anything Big Bloat Billy can, for less money, with more speed, fewer resources and more stability, and without those idiotic auto updates that refuse to let you switch off your PC because they are installing new bugs. All of the above is why I burned my XP disk.

- MooseFET

Sat, May 22, 2010 2:17 PM

The 432 would still need an extra cycle, or else a huge amount of extra hardware to hold the translated version of the address table.

The number of entries in the translation table would be far higher than the number of entries in the TLBs, because virtual memory works with pages much larger than a single floating point value. That would make the accesses slower, because memory access time tends to grow roughly as the sqrt() of the memory's size.
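
To put rough numbers on that argument, here is a small Python sketch. It assumes 1 MiB of data held as 8-byte floating point values, one translation entry per value (one object per value) versus one entry per 4 KiB page, and applies the square-root rule of thumb above. All three figures are illustrative assumptions, not measurements of the 432 or any other machine.

```python
import math

# Illustrative assumptions, not figures from the thread:
DATA_BYTES = 1 << 20   # 1 MiB of data
VALUE_BYTES = 8        # one double-precision value per "object"
PAGE_BYTES = 4096      # one conventional virtual-memory page


def table_entries(data_bytes: int, unit_bytes: int) -> int:
    """Translation entries needed when every unit gets its own entry."""
    return math.ceil(data_bytes / unit_bytes)


def relative_access_time(entries: int, baseline_entries: int) -> float:
    """Rule of thumb: access time grows roughly as sqrt(size)."""
    return math.sqrt(entries / baseline_entries)


per_object = table_entries(DATA_BYTES, VALUE_BYTES)  # 131072 entries
per_page = table_entries(DATA_BYTES, PAGE_BYTES)     # 256 entries
slowdown = relative_access_time(per_object, per_page)
print(f"{per_object} vs {per_page} entries, ~{slowdown:.1f}x slower lookups")
```

With these numbers the per-object table is 512 times larger, so under the square-root rule each lookup would be roughly 22 times slower than a page-table lookup built from the same technology.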

The ALU would end up waiting on all these extra actions and delays. There are good reasons that nobody has gone back to the idea.

Registers mapped to a linear chunk of memory could be made to work. For most operations the machine would actually behave like one that doesn't map the registers into RAM; only when an access happens to hit the part that is mapped to the registers does anything special need to happen.
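
A minimal Python model of that idea, assuming a byte-addressed RAM with a hypothetical 16-byte window at address 0 that aliases 8-bit registers. Every access pays one range check; only accesses that land in the window take the special path.

```python
class RegisterMappedMemory:
    """Flat byte-addressed RAM in which a small window aliases the register file.

    REG_BASE and NUM_REGS are hypothetical; registers are 8 bits wide here.
    """

    REG_BASE = 0x00
    NUM_REGS = 16

    def __init__(self, size: int) -> None:
        self.ram = bytearray(size)
        self.regs = [0] * self.NUM_REGS

    def _reg_index(self, addr: int):
        """Return the register number if addr falls in the window, else None."""
        if self.REG_BASE <= addr < self.REG_BASE + self.NUM_REGS:
            return addr - self.REG_BASE
        return None

    def load(self, addr: int) -> int:
        idx = self._reg_index(addr)
        return self.ram[addr] if idx is None else self.regs[idx]

    def store(self, addr: int, value: int) -> None:
        idx = self._reg_index(addr)
        if idx is None:
            self.ram[addr] = value & 0xFF    # common case: plain RAM
        else:
            self.regs[idx] = value & 0xFF    # special case: hits the register file


mem = RegisterMappedMemory(256)
mem.store(0x03, 42)   # a memory store that actually writes register 3
print(mem.regs[3])
```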

- Didi

Sat, May 22, 2010 4:08 PM

The TI 54xx DSPs have a memory-mapped set of registers (though not all of them; e.g. the 40-bit accumulators are not). It is quite efficient, as they are mapped into dual-access memory (two accesses per CPU cycle). It's been a while since I wrote the assembler for it and wrote code for it, but I have no memory of any limitations because of that. In fact I think I was at some point able to take advantage of the fact that the registers could also be accessed this way (with all the pipeline delay implications etc., but I think I did use that for something).

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

------------------------------------------------------

- D Yuniskis

Sat, May 22, 2010 9:15 PM

Yes, but even two-level TLBs are relatively few in number, and a single entry covers a significant amount of memory. I think the 432 dealt with much smaller "objects" than "physical pages".

I think it depends on how many register sets you end up using. One tends to think of a register set in the context of a thread's state. However, I could easily see compilers using this feature to implement small stack frames (I bet most blocks could benefit from a single "register context"!). If that's the case, you'll frequently have cache misses as you enter new blocks, etc. Of course, you will win *while* in that context... but I am not sure what the overall cost of those misses would be vs. having a register file *in* the processor.
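
One way to frame that trade-off is a per-block cycle count. The sketch below is only a back-of-envelope model: the miss penalty, hit cost, save/restore cost and register access cost are hypothetical placeholders, not figures for any real processor.

```python
def memory_context_cycles(accesses: int, miss_penalty: int, hit_cycles: int) -> int:
    """Registers live in RAM: entering a block costs one cache miss for its
    register context; every register access after that is a cache hit."""
    return miss_penalty + accesses * hit_cycles


def register_file_cycles(accesses: int, save_restore: int, reg_cycles: int) -> int:
    """Registers on chip: entering a block costs spilling/filling live
    registers; every access after that uses the on-chip file."""
    return save_restore + accesses * reg_cycles


# Hypothetical per-block parameters, for illustration only.
for n in (2, 10, 50):
    in_memory = memory_context_cycles(n, miss_penalty=10, hit_cycles=2)
    on_chip = register_file_cycles(n, save_restore=16, reg_cycles=1)
    print(f"{n:3d} accesses/block: memory contexts {in_memory:4d}, on-chip file {on_chip:4d}")
```

With placeholders like these, memory-resident contexts come out ahead only for short blocks with few register accesses; the crossover point moves entirely with the numbers you plug in.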

Sounds like a good (safe) "senior project" :>

- D Yuniskis

Sat, May 22, 2010 9:23 PM

Some OSes *do*. I rely on "write protection" to support copy-on-write (CoW) in my DSM implementation.
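
How write protection turns into copy-on-write can be sketched with a toy page table in Python. This is not the DSM code mentioned above. On real hardware the "writable" flag is an MMU permission (set e.g. with POSIX `mprotect`) and `_write_fault` is a fault handler, and the page size here is an assumption.

```python
PAGE_SIZE = 4096  # assumed page size


class Page:
    def __init__(self, data: bytearray) -> None:
        self.data = data
        self.refcount = 1          # how many address spaces map this page


class AddressSpace:
    """Toy page table: virtual page number -> [Page, writable]."""

    def __init__(self) -> None:
        self.table = {}

    def map(self, vpn: int, page: Page, writable: bool) -> None:
        self.table[vpn] = [page, writable]

    def share_with(self) -> "AddressSpace":
        """Create a second space sharing every page; both lose write access."""
        other = AddressSpace()
        for vpn, entry in self.table.items():
            entry[0].refcount += 1
            entry[1] = False                       # write-protect our mapping
            other.map(vpn, entry[0], writable=False)
        return other

    def read(self, vpn: int, offset: int) -> int:
        return self.table[vpn][0].data[offset]     # reads never fault

    def write(self, vpn: int, offset: int, value: int) -> None:
        if not self.table[vpn][1]:
            self._write_fault(vpn)                 # stands in for the MMU trap
        self.table[vpn][0].data[offset] = value

    def _write_fault(self, vpn: int) -> None:
        page = self.table[vpn][0]
        if page.refcount > 1:                      # still shared: copy, then write
            page.refcount -= 1
            self.map(vpn, Page(bytearray(page.data)), writable=True)
        else:                                      # sole owner: just unprotect
            self.table[vpn][1] = True


a = AddressSpace()
a.map(0, Page(bytearray(PAGE_SIZE)), writable=True)
b = a.share_with()
b.write(0, 0, 99)      # faults once; b gets its own private copy
print(a.read(0, 0), b.read(0, 0))
```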

Some newer processors cut corners on their "memory protection" schemes (basically, just giving you write protection and some "access protection"). Many also don't implement real virtual memory (restartable instructions, etc.).

There is a *huge* difference in "feel" when working on "cheap" systems vs. more "full fledged" platforms. You do things very differently and lean on the hardware a lot more to give you extra capabilities.

Yes. With some machines you can hack the (cheap) memory protection units *dynamically* -- but, that requires a lot of care in how you set up the linkage editor for the build.

It's *really* nice to be able to have the OS "ride herd" over "tasks" (let's not debate what those are) and bring errant tasks under control without jeopardizing the rest of the system.
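
A rough Python model of per-task MPU reprogramming. It reloads the region slots on each context switch and stops only the offending task on a violation. The four-slot MPU is hypothetical. The power-of-two, size-aligned region rule is modelled on ARMv7-M-style MPUs and is assumed here; constraints of that kind are what make the linker layout matter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    base: int
    size: int
    writable: bool

    def allows(self, addr: int, write: bool) -> bool:
        inside = self.base <= addr < self.base + self.size
        return inside and (self.writable or not write)

    def hw_legal(self) -> bool:
        # Assumed ARMv7-M-style rule: power-of-two size >= 32, base aligned to size.
        return (self.size >= 32 and (self.size & (self.size - 1)) == 0
                and self.base % self.size == 0)


@dataclass
class Task:
    name: str
    regions: List[Region]
    running: bool = True


class ToyMPU:
    """Hypothetical MPU with a few region slots, reloaded on every context switch."""

    SLOTS = 4

    def __init__(self) -> None:
        self.active: List[Region] = []

    def switch_to(self, task: Task) -> None:
        if len(task.regions) > self.SLOTS or not all(r.hw_legal() for r in task.regions):
            raise ValueError(f"{task.name}: regions do not fit this MPU")
        self.active = list(task.regions)

    def access(self, task: Task, addr: int, write: bool = False) -> bool:
        """Check one access by the current task; on a violation stop only that task."""
        if any(r.allows(addr, write) for r in self.active):
            return True
        task.running = False       # the OS reins in the errant task; others keep running
        return False
```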

It's also a nice way of doing things as you can move things out of the kernel which makes them easier to maintain *and* more powerful.

Ha! ----------------------------^^^^^^^^^^^ :>

I do almost everything via RPC/IPC. One big advantage is that it makes it relatively easy to move to true multiprocessor systems without relying on SMP/UMA. It gives you a lot more flexibility in applying horsepower to a problem -- move tasks to different processors to get true parallelism, etc.
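
The reason RPC/IPC makes relocation easy is that a client only ever sends a request and waits for a reply. The minimal Python sketch below uses an in-process `queue.Queue` and a thread as the transport; that transport is an assumption chosen for illustration. Python threads do not give true parallelism, so the sketch shows the interface, not the speedup.

```python
import queue
import threading


def start_server(handler):
    """Run handler in its own thread; callers only ever see the request queue.

    Swapping this queue for a network or inter-processor transport would move
    the server elsewhere without changing client code.
    """
    requests: queue.Queue = queue.Queue()

    def serve() -> None:
        while True:
            msg = requests.get()
            if msg is None:                 # shutdown message
                break
            args, reply_box = msg
            reply_box.put(handler(*args))

    threading.Thread(target=serve, daemon=True).start()
    return requests


def rpc(requests: queue.Queue, *args):
    """Synchronous call: send a request, block until the reply arrives."""
    reply_box: queue.Queue = queue.Queue(maxsize=1)
    requests.put((args, reply_box))
    return reply_box.get()


adder = start_server(lambda x, y: x + y)
print(rpc(adder, 2, 3))
adder.put(None)                             # stop the server
```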

- D Yuniskis

Sat, May 22, 2010 9:27 PM

I often use it as an expedient. I.e., rewrite the target of a "call" instead of having to call *through* a pointer or via a jump table, switch or condition tree.

It is especially useful in ISRs, where every cycle saved can represent a significant portion of the ISR itself. E.g., my ISRs are little RAM-based routines that typically invoke a set of (often unrelated) "handlers", each of which may change from one invocation to the next. Being able to rewrite the targets of these "handler" calls makes dispatch a lot slicker.
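
A Python simulation of that patching scheme, using 6502 `JSR`/`RTI` opcodes purely as an illustration. The RAM-resident ISR is a row of `JSR target` instructions, and changing a handler rewrites the 16-bit operand in place, so the ISR does no lookup at interrupt time. The handler addresses are hypothetical. On many modern CPUs with instruction caches, patched code would also need cache maintenance before it runs.

```python
JSR, RTI = 0x20, 0x40   # 6502 opcodes, used here only as an illustration


def build_isr(num_handlers: int, default_target: int) -> bytearray:
    """RAM-resident ISR: num_handlers 'JSR target' instructions, then RTI."""
    code = bytearray()
    for _ in range(num_handlers):
        code += bytes([JSR, default_target & 0xFF, default_target >> 8])
    code.append(RTI)
    return code


def patch_handler(code: bytearray, slot: int, target: int) -> None:
    """Rewrite the operand of the slot-th JSR (little-endian 16-bit address)."""
    op = slot * 3
    if code[op] != JSR:
        raise ValueError(f"slot {slot} does not hold a JSR")
    code[op + 1] = target & 0xFF
    code[op + 2] = target >> 8


def run_isr(code: bytearray, routines: dict) -> None:
    """Tiny interpreter: execute the JSRs in order until RTI."""
    pc = 0
    while code[pc] != RTI:
        target = code[pc + 1] | (code[pc + 2] << 8)
        routines[target]()      # stands in for jumping to that address
        pc += 3
```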

- krw

Sat, May 22, 2010 10:19 PM

IIRC, the '432 had a flat memory model, a single-level store, much like IBM's FS and AS/400. Every byte, no matter where it was (or on what device), had a flat address. So, an object was a byte (pretty sure a '432 byte = 8 bits). It's been too long...

- Paul Keinanen

Sun, May 23, 2010 6:27 AM

It depends how you define an "object".

If every array element is a separate object with its own object descriptor, the TLBs would be huge.

However, if some kind of AddressTable[object]+offset addressing is supported, it would make sense to make single objects of each array and structure and use the offset to access the individual element.

To save TLB entries, even a function's local variables could be treated as a structure and handled as a single object. The compiler could use discretion, however: it could assign a separate object to each individual variable, e.g. during debugging or when security requirements are high, or alternatively put all local elementary values into the same structure to speed up execution.

Such a mixed model would have quite good security, and dynamic memory fragmentation and garbage collection problems could be avoided: fragmentation can be removed by copying the data of a specific dynamic memory element and updating its address table entry.
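
The AddressTable[object]+offset scheme can be sketched in Python as a descriptor table over one flat RAM. Every access is bounds-checked against its object's descriptor, and compaction moves data while patching only the table, so object ids stay valid. The bump allocator and slide-down compaction policy are assumptions for this sketch.

```python
class ObjectMemory:
    """AddressTable[object] + offset addressing over one flat byte RAM."""

    def __init__(self, size: int) -> None:
        self.ram = bytearray(size)
        self.table = {}          # object id -> [base, length]
        self.next_free = 0
        self.next_id = 0

    def new(self, length: int) -> int:
        if self.next_free + length > len(self.ram):
            raise MemoryError("out of memory")
        oid = self.next_id
        self.table[oid] = [self.next_free, length]
        self.next_free += length
        self.next_id += 1
        return oid

    def _addr(self, oid: int, offset: int) -> int:
        base, length = self.table[oid]
        if not 0 <= offset < length:       # can't run off the end into a neighbour
            raise IndexError(f"offset {offset} outside object {oid}")
        return base + offset

    def load(self, oid: int, offset: int) -> int:
        return self.ram[self._addr(oid, offset)]

    def store(self, oid: int, offset: int, value: int) -> None:
        self.ram[self._addr(oid, offset)] = value

    def delete(self, oid: int) -> None:
        del self.table[oid]

    def compact(self) -> None:
        """Slide live objects down; only the table entries change."""
        dest = 0
        for _, entry in sorted(self.table.items(), key=lambda kv: kv[1][0]):
            base, length = entry
            self.ram[dest:dest + length] = self.ram[base:base + length]
            entry[0] = dest
            dest += length
        self.next_free = dest
```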

Of course, if you want a truly object oriented computer, you would need a few extra bits on each byte for the type tag :-).

- MooseFET

Sun, May 23, 2010 3:02 PM

The 432 did this. Every object needed a table entry and a single floating point number was an object.

That is what the 286 was sort of doing; the 432 wouldn't hear of it. If you put different things in different segments, then writing off the end of one thing can't spill into a different segment.

As soon as you start letting the compiler decide to combine things, you are trusting software for the security. Page-based protection also does this. There is no advantage to segmentation other than allowing things to be packed tighter than a page. Page-based protection wins out because the extra memory doesn't cost much and the simpler hardware runs faster.

If you just allow translation of page addresses, by translating the upper bits of the address, you get almost all of the fragmentation avoidance. This still makes for simple hardware, but means you can change the logical position of memory sections.
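
Translating only the upper address bits looks like this in a minimal Python sketch, assuming 4 KiB pages. Two scattered physical frames appear as one contiguous logical block, and moving a section is just a table update.

```python
PAGE_BITS = 12                       # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_BITS) - 1


def translate(vaddr: int, page_table: dict) -> int:
    """Map the upper bits (page number) through the table; keep the offset bits."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & PAGE_MASK
    if vpn not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return (page_table[vpn] << PAGE_BITS) | offset


# Logical pages 0 and 1 live in physical frames 7 and 3.
page_table = {0: 7, 1: 3}
print(hex(translate(0x0010, page_table)), hex(translate(0x1FFF, page_table)))
```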

If your version of malloc() uses a smart version of the first-fit and last-fit methods, you get very little memory fragmentation. Simple first fit does quite well. Adding the option to put some things as high in memory as possible gives a small improvement, if the things placed high are well chosen.
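
A minimal Python free-list allocator along those lines. Ordinary requests use first fit from the bottom, a `high=True` request is carved from the top of the highest block that fits (last fit), and freed blocks are coalesced. Which requests go high is left to the caller; choosing long-lived objects would be one plausible policy, but that is an assumption here.

```python
class FitAllocator:
    """Free list of (start, size) blocks kept sorted by address."""

    def __init__(self, size: int) -> None:
        self.free = [(0, size)]

    def alloc(self, size: int, high: bool = False) -> int:
        order = reversed(range(len(self.free))) if high else range(len(self.free))
        for i in order:
            start, avail = self.free[i]
            if avail < size:
                continue
            if high:                          # last fit: carve from the top
                addr = start + avail - size
                rest = (start, avail - size)
            else:                             # first fit: carve from the bottom
                addr = start
                rest = (start + size, avail - size)
            if rest[1]:
                self.free[i] = rest
            else:
                del self.free[i]
            return addr
        raise MemoryError(f"no free block of {size}")

    def release(self, addr: int, size: int) -> None:
        """Return a block and merge it with adjacent free neighbours."""
        self.free.append((addr, size))
        self.free.sort()
        merged = [self.free[0]]
        for start, sz in self.free[1:]:
            last_start, last_sz = merged[-1]
            if last_start + last_sz == start:
                merged[-1] = (last_start, last_sz + sz)
            else:
                merged.append((start, sz))
        self.free = merged
```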

- JosephKK

Sun, May 23, 2010 9:25 PM

The 6502 had a 1-cycle memory read or write, plus typically 2-3 bytes per instruction to read, plus up to two more cycles for an operand read or write. Average throughput was about 280 to 300 KIPS (thousand instructions per second).
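
Those throughput figures imply an average number of clock cycles per instruction. The clock rate is not stated here, so this assumes a 1 MHz clock, which was common for 6502 machines of that era:

```latex
\text{IPS} = \frac{f_{\text{clk}}}{\overline{\text{CPI}}}
\quad\Longrightarrow\quad
\overline{\text{CPI}} = \frac{1\ \text{MHz}}{280\text{--}300\ \text{KIPS}} \approx 3.3\text{--}3.6\ \text{cycles per instruction}
```

That average fits the breakdown above: two or three cycles to fetch the instruction bytes, plus an occasional operand access.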

- JosephKK

Sun, May 23, 2010 10:10 PM

The 88000 was a Motorola attempt at a RISC machine.

indirect