64-bit embedded computing is here and now - Page 2

Re: 64-bit embedded computing is here and now
On Tuesday, June 8, 2021 at 2:39:29 AM UTC-5, Don Y wrote:

|> I contend that a good many "32b" implementations are really glorified
|> 8/16b applications that exhausted their memory space.

The only things that will take more than 4GB are video or a day's worth of photos.
So there are likely to be some embedded apps that need a > 32-bit address space.
Cost, size and storage capacity are no longer limiting factors.

Am trying to puzzle out what a 64-bit embedded processor should look like.
At the low end, yeah, a simple RISC processor.  And support for complex arithmetic
using 32-bit floats?  And support for pixel alpha blending using quad 16-bit numbers?
32-bit pointers into the software?

Re: 64-bit embedded computing is here and now
On 08/06/2021 21:38, James Brakefield wrote:

Could you explain your background here, and what you are trying to get
at?  That would make it easier to give you better answers.


No, video is not the only thing that takes 4GB or more.  But it is,
perhaps, one of the more common cases.  Most embedded systems don't need
anything remotely like that much memory - to the nearest percent, 100%
of embedded devices don't even need close to 4MB of memory (ram and
flash put together).


Some, yes.  Many, no.


Cost and size (and power) are /always/ limiting factors in embedded systems.


There are plenty to look at.  There are ARMs, PowerPC, MIPS, RISC-V.
And of course there are some x86 processors used in embedded systems.


Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
CISC devices are RISC.  Not all are simple.


A 64-bit processor will certainly support 64-bit doubles as well as
32-bit floats.  Complex arithmetic is rarely needed, except perhaps for
FFT's, but is easily done using real arithmetic.  You can happily do
32-bit complex arithmetic on an 8-bit AVR, albeit taking significant
code space and run time.  I believe the latest gcc for the AVR will do
64-bit doubles as well - using exactly the same C code you would on any
other processor.


You would use a hardware 2D graphics accelerator for that, not the
processor.


With 64-bit processors you usually use 64-bit pointers.

Re: 64-bit embedded computing is here and now
On Tuesday, June 8, 2021 at 3:11:24 PM UTC-5, David Brown wrote:

|> Could you explain your background here, and what you are trying to get
at?

Am familiar with embedded systems, image processing and scientific applications.
Have used a number of 8, 16, 32 and ~64-bit processors.  Have also done work in
FPGAs.  Am semi-retired and when working was always trying to stay ahead of
new opportunities and challenges.

Some of my questions/comments belong over at comp.arch

Re: 64-bit embedded computing is here and now
On Tue, 8 Jun 2021 22:11:18 +0200, David Brown



It certainly is correct to say of the x86 that its legacy, programmer
visible, instruction set is CISC ... but it is no longer correct to
say that the chip design is CISC.

Since (at least) the Pentium 4, x86 chips really have been a CISC decoder
bolted onto the front of what is essentially a load/store RISC.

"Complex" x86 instructions (in RAM and/or $I cache) are dynamically
translated into equivalent short sequences[*] of RISC-like wide format
instructions which are what actually is executed.  Those sequences
also are stored into a special trace cache in case they will be used
again soon - e.g., in a loop - so they (hopefully) will not have to be
translated again.


[*] Actually, a great many x86 instructions map 1:1 to internal RISC
instructions - only a small percentage of complex x86 instructions
require "emulation" via a sequence of RISC instructions.



Correct.  Every successful RISC CPU has supported a suite of complex
instructions.  


Of course, YMMV.  
George

Re: 64-bit embedded computing is here and now
On 09/06/2021 06:16, George Neuner wrote:

Absolutely.  But from the user viewpoint, it is the ISA that matters -
it is a CISC ISA.  The implementation details are mostly hidden (though
sometimes it is useful to know about timings).


And also, some sequences of several x86 instructions map to single RISC
instructions, or to no instructions at all.

It is, of course, a horrendously complex mess - and is a major reason
for x86 cores taking more power and costing more than RISC cores for the
same performance.


Yes.  People often parse RISC as R(IS)C - i.e., they think it means the
ISA has a small instruction set.  It should be parsed (RI)SC - the
instructions are limited compared to those on a (CI)SC cpu.



Re: 64-bit embedded computing is here and now
Am 09.06.2021 um 10:40 schrieb David Brown:


... and at about that time they also abandoned the last traces of their
original von Neumann architecture.  The actual core is quite strictly
Harvard now, treating the external RAM banks more like mass storage
devices than an actual combined code+data memory.


That depends rather a lot on who gets to be called the "user".

x86 are quite strictly limited to the PC ecosystem these days: boxes and
laptops built for Mac OS or Windows, some of them running Linux instead.
There the "user" is somebody buying hardware and software from
completely unrelated suppliers.  I.e. unlike in the embedded world we
discuss here, the persons writing software for those things had no say
at all in what type of CPU is used.  They're thus not really the "user."
If they were, they probably wouldn't be using an x86. ;-)

The actual x86 users couldn't care less about the ISA --- the  
overwhelming majority of them haven't the slightest idea what an ISA  
even is.  Some of them used to have a vague idea that there was some  
32bit vs. a 64bit whatchamacallit somewhere in there, but even that has  
surely faded away by now, as users no longer even face the decision  
between them.

Re: 64-bit embedded computing is here and now


I meant "the person using the ISA" - i.e., the programmer.  And even
then, I meant low-level programmers who have to understand things like
memory models, cache thrashing, coding for vectors and SIMD, etc.  These
are the people who see the ISA.  I was not talking about the person
wiggling the mouse and watching youtube!


Re: 64-bit embedded computing is here and now
On 6/8/2021 22:38, James Brakefield wrote:

The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well I remember some
30 years ago 32 bits seemed sort of "endless" to me...).

Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.

A 32 bit FPU seems useless to me; 64 bit is OK.  Although 32-bit FP
*numbers* can be quite useful for storing/passing data.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/


Re: 64-bit embedded computing is here and now
On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:


That depends on what you expect from the OS.  If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.

But, consider how life was before Windows compartmentalized
applications (and the OS).  How easy it was for one "application"
(or subsystem) to cause a reboot -- unceremoniously.

The general direction (in software development, and, by
association, hardware) seems to be to move away from unrestrained
access to the underlying hardware in an attempt to limit the
amount of damage that a "misbehaving" application can cause.

You see this in languages designed to eliminate dereferencing
pointers, pointer arithmetic, etc.  Languages that claim to
ensure your code can't misbehave because it can only do
exactly what the language allows  (no more injecting ASM
into your HLL code).

I think that because you are the sole developer in your
application, you see a distorted vision of what the rest
of the development world encounters.  Imagine handing your
codebase to a third party.  And, *then* having to come
back to it and fix the things that "got broken".

Or, in my case, allowing a developer to install software
that I have to "tolerate" (for some definition of "tolerate")
without impacting the software that I've already got running.
(i.e., it's ok to kill off his application if it is broken; but
he can't cause *my* portion of the system to misbehave!)


32 bit numbers have appeal if your registers are 32b;
they "fit nicely".  Ditto 64b in 64b registers.

Re: 64-bit embedded computing is here and now
On 6/9/2021 4:29, Don Y wrote:

So how does the linear 64 bit address space get in the way of
any protection you want to implement? Pages are still 4 k and
each has its own protection attributes governed by the OS,
it is like that with 32 bit processors as well (I talk power, I am
not interested in half baked stuff like ARM, risc-v etc., I don't
know if there could be a problem like that with one of these).

There is *nothing* to gain on a 64 bit machine from segmentation,  
assigning overlapping address spaces to tasks etc.

Notice I am talking *logical* addresses, I was explicit about
that.

Dimiter


Re: 64-bit embedded computing is here and now
On 6/9/2021 10:56 AM, Dimiter_Popoff wrote:

With a linear address space, you typically have to link EVERYTHING
as a single image to place each thing in its own piece of memory
(or use segment based addressing).

I can share code between tasks without conflicting addressing;
the "data" for one instance of the app is isolated from other
instances while the code is untouched -- the code doesn't even
need to know that it is being invoked on different "data"
from one timeslice to the next.  In a flat address space,
you'd need the equivalent of a "context pointer" that you'd
have to pass to the "shared code".  And, have to hope that
all of your context could be represented in a single such
reference!  (I can rearrange physical pages so they each
appear "where expected" to a bit of const CODE).

Similarly, the data passed (or shared) from one task (process) to
another can "appear" at entirely different logical addresses
"at the same time" as befitting the needs of each task WITHOUT
CONCERN (or awareness) of the existence of the other task.
Again, I don't need to pass a pointer to the data; the address
space has been manipulated to make sure it's where it should be.

The needs of a task can be met by resources "harvested" from
some other task.  E.g., where is the stack for your TaskA?
How large is it?  How much of it is in-use *now*?  How much
can it GROW before it bumps into something (because that something
occupies space in "its" address space).

I start a task (thread) with a single page of stack.  And, a
limit on how much it is allowed to consume during its execution.
Then, when it pushes something "off the end" of that page,
I fault a new page in and map it at the faulting address.
This continues as the task's stack needs grow.

When I run out of available pages, I do a GC cycle to
reclaim pages from (other?) tasks that are no longer using
them.

In this way, I can effectively SHARE a stack (or heap)
between multiple tasks -- without having to give any
consideration for where, in memory, they (or the stacks!)
reside.

I can move a page from one task (full of data) to another
task at some place that the destination task finds "convenient".
I can import a page from another network device or export
one *to* another device.

Because each task's address space is effectively empty/sparse,
mapping a page doesn't require much effort to find a "free"
place for it.

I can put constraints on each such mapping -- and then runtime
checks to ensure "things are as I expect":  "Why is this NIC
buffer residing in this particular portion of the address space?"

With a task bound to a semicontiguous portion of memory, it can
deal with that region as if it was a smaller virtual region.
I can store 32b pointers to things if I know that my addresses
are based from 0x000 and the task never extends beyond a 4GB
region.  If available, I can exploit "shorter" addressing modes.


What do you gain by NOT using it?  You're still dicking with the MMU.
(if you aren't then what value the MMU in your "logical" space?  map
each physical page to a corresponding logical page and never talk to
the MMU again; store const page tables and let your OS just tweak the
base pointer for the TLBs to use for THIS task)

You still have to "position" physical resources in particular places
(and you have to deal with the constraints of all tasks, simultaneously,
instead of just those constraints imposed by the "current task")



Re: 64-bit embedded computing is here and now
On 6/10/2021 3:12, Don Y wrote:

Nothing could be further from the truth. What kind of crippled
environment can make you think that? Code can be position
independent on processors which are not dead by design nowadays.
When I started dps some 27 years ago I allowed program modules
to demand a fixed address on which they would reside. This exists
to this day and has been used 0 (zero) times. Same about object
descriptors, program library modules etc., the first system call
I wrote is called "allocm$", allocate memory. You request a number
of bytes and you get back an address and the actual number of
bytes you were given (it comes rounded up to the memory cluster
size, typically 4k, a page). This was the *first* thing I did.
And yes, all allocation is done using a worst-fit strategy, sometimes
enhanced worst fit - things the now-popular OSes have yet to get to;
they still have to defragment their disks, LOL.


So how do you pass the offset from the page beginning if you do
not pass an address?
And how is page manipulation simpler and/or safer than just passing
an address? It sounds like a recipe for quite a mess to me.
In a 64 bit address space there is nothing stopping you from
passing addresses or not, allowing access to the areas
you want to and disallowing it elsewhere.
Other than that there is nothing to be gained by a 64 bit architecture
really, on 32 bit machines you do have FPUs, vector units etc.
doing calculation probably faster than the integer unit of a
64 bit processor.
The *whole point* of a 64 bit core is the 64 bit address space.



This is the beauty of a 64 bit logical address space. You allocate
enough logical memory and then you allocate physical on demand,
this is what MMUs are there for. If you want to grow your stack
indefinitely - the messy C style - you can just allocate it
a few gigabytes of logical memory and use the first few kilobytes
of it, with no waste of resources. Of course there are much slicker
ways to deal with memory allocation.



This is called "allocate on demand" and has been around
for times immemorial, check my former paragraph.


This is called "memory swapping", also for times immemorial.
For the case when there is no physical memory to reclaim, that
is.
The first version of dps - some decades ago - ran on a CPU32
(a 68340). It had no MMU, so I implemented "memory blocks":
a task can declare a piece of memory a swap-able block and
allow/disallow its swapping. Those blocks would then be shared or
written to disk when more memory was needed etc. - memory swapping
without an MMU. Worked fine; it must still be working for code I have
not touched since, on my power machines, all those decades later.


You can do this in a linear address space, too - this is what
the MMU is for.



So instead of simply passing an address you have to switch page
translation entries, adjust them on each task switch, flush and
sync whatever it takes - does not sound very efficient to me.


This is the beauty of having the 64 bit address space, you always
have enough logical memory. The "64 bit address space per task"
buys you *nothing*.

Dimiter


Re: 64-bit embedded computing is here and now
On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

[attrs elided]


You missed my point -- possibly because this issue was raised
BEFORE pointing out how much DYNAMIC management of the MMU
(typically an OS-delegated activity) "buys you":
     "That depends on what you expect from the OS."

If you can ignore the MMU *completely*, then the OS is greatly
simplified.  YOU (developer) take on the responsibilities of remembering
what is where, etc.  EVERYTHING is visible to EVERYONE and at
EVERYTIME.  The OS doesn't have to get involved in the management
of objects/tasks/etc.  That's YOUR responsibility to ensure
your taskA doesn't go dicking around with taskB's resources.

Welcome to the 8/16b world!

The next step up is to statically deploy the MMU.  You build
a SINGLE logical address space to suit your liking.  Then, map
the underlying physical resources to it as best fits.  And,
this never needs to change -- memory doesn't "move around",
it doesn't change characteristics (readable, writeable,
executable, accessible-by-X, etc.)!

But, you can't then change permissions based on which task is
executing -- unless you want to dick with the MMU dynamically
(or swap between N discrete sets of STATIC page tables that
define the many different ways M tasks can share permissions)

So, you *just* use the MMU as a Memory Protection Unit; you mark
sections of memory that have CODE in them as no-write, you mark
regions with DATA as no-execute, and everything else as no-access.

And that's the way it stays for EVERY task!

This lets you convert RAM to ROM and prevents "fetches" from "DATA"
memory.  It ensures your code is never overwritten and that the
processor never tries to execute out of "data memory" and NOTHING
tries to access address regions that are "empty"!

You've implemented a 1980's vintage protection scheme (this is how
we designed arcade pieces, back then, as you wanted your CODE
and FRAME BUFFER to occupy the same limited range of addresses)

<yawn>

Once you start using the MMU to dynamically *manage* memory (which
includes altering protections and re-mapping), then the cost of the
OS increases -- because these are typically things that are delegated
*to* the OS.

Whether or not you have overlapping address spaces or a single
flat address space is immaterial -- you need to dynamically manage
separate page tables for each task in either scheme.  You can't
argue that the OS doesn't need to dick with the MMU "because it's
a flat address space" -- unless you forfeit those abilities
(that I illustrated in my post).

If you want to compare a less-able OS to one that is more featured,
then it's disingenuous to blame that on overlapping address spaces;
the real "blame" lies in the support of more advanced features.

The goal of an OS should be to make writing *correct* code easier
by providing features as enhancements.  It's why the OS typically
reads disk files instead of replicating that file system and driver
code into each task that needs to do so.  Or, why it implements
delays/timers -- so each task doesn't reinvent the wheel (with its
own unique set of bugs).

You can live without an OS.  But, typically only for a trivial
application.  And, you're not likely to use a 64b processor just
to count characters received on a serial port!  Or as an egg timer!


YOU pass an object to the OS and let the OS map it where *it*
wants, with possible hints from the targeted task (logical address
space).

I routinely pass multiple-page-sized objects around the system.

"Here's a 20MB telephone recording, memory mapped (to wherever YOU,
its recipient, want it).  Because it is memory mapped and has its
own pager, the actual amount of physical memory that is in use
at any given time can vary -- based on the resource allocation
you've been granted and the current resource availability in the
system.  E.g., there may be as little as one page of physical
data present at any given time -- and that page may "move" to
back a different logical address based on WHERE you are presently
looking!

Go through and sort out when Bob is speaking and when Tom is speaking.
"Return" an object of UNKNOWN length that lists each of these time
intervals along with the speaker assumed to be talking in each.  Tell
me where you (the OS) decided it would best fit into my logical address
space, after consulting the hint I provided (but that you may not have
been able to honor because the result ended up *bigger* than the "hole"
I had imagined it fitting into).  No need to tell me how big it really
is as I will be able to parse it (cuz I know how you will have built that
list) and the OS will track the memory that it uses so all I have to  do
is free() it (it may be built out of 1K pages, 4K pages, 16MB pages)!"

How is this HARDER to do when a single task has an entire 64b address
space instead of when it has to SHARE *a* single address space among
all tasks/objects?


The MMU has made that mapping a "permanent" part of THIS task's
address space.  It isn't visible to any other task -- why *should*
it be?  Why does the pointer need to indirectly reflect the fact
that portions of that SINGLE address space are ineligible to
contain said object because of OTHER unrelated (to this task) objects??


And I can't do that in N overlapping 64b address spaces?

The only "win" you get is by exposing everything to everyone.
That's not the way software is evolving.  Compartmentalization
(to protect from other actors), opacity (to hide implementation
details), accessors (instead of exposing actual data), etc.

This comes at a cost -- in performance as well as OS design.
But, *seems* to be worth the effort, given how "mainstream"
development is heading.


No, the whole point of a 64b core is the 64b registers.
You can package a 64b CPU so that only 20 (!) address lines
are bonded out.  This limits the physical address space
to 20b.  What value is there in making the logical address
space bigger -- so you can leave gaps for expansion
between objects??


Again, how is this any harder with "overlapping" 64b address spaces?
Or, how is it EASIER with nonoverlap?


I'm not trying to be "novel".  Rather, showing that these
features come from the MMU -- not a "nonoverlapping"
(or overlapping!) address space.

I.e., the take away from all this is the MMU is the win
AND the cost for the OS.  Without it, the OS gets simpler...
and less capable!


There's no disk involved.  The amount of physical memory
is limited to what's on-board (unless I try to move resources
to another node or -- *gack* -- use a scratch table in the RDBMS
as a backing store).

Recovering "no longer in use" portions of stack is "low hanging fruit";
look at the task's stack pointer and you know how much allocated stack
is no longer in use.  Try to recover it (of course, the task
may immediately fault another page back into play but that's
an optimization issue).

If there is no "low hanging fruit", then I ask tasks to voluntarily
relinquish memory.  Some tasks may have requested "extra" memory
in order to precompute results for future requests/activities.
If it was available -- and if the task wanted to "pay" for it -- then
the OS would grant the allocation (knowing that it could eventually
revoke it!)  They could relinquish those resources at the expense of
having to recompute those things at a later date ("on demand" *or* when
memory is again available).

If I can't recover enough resources "voluntarily", then I
*take* memory away from a (selected) task and inform it
(raise an exception that it will handle as soon as it gets
a timeslice) of that "theft".  It will either recover from
the loss (because it was being greedy and didn't elect
to forfeit excess memory that it had allocated when I asked,
earlier) *or* it will crash.  <shrug>  When you run out
of resources, SOMETHING has to give (and the OS is better
suited to determining WHAT than the individual tasks are...
they ALL think *they* are important!)

Again, "what do you expect from your OS?"


Yes, see?  There's nothing special about a flat address space!


It's not intended to be fast/efficient.  It's intended to ensure
that the recipient -- AND ONLY THE RECIPIENT -- is *now*
granted access to that page's contents.  Depending on semantics,
it can create a copy of an object or "move" the object, leaving
a "hole" in the original location.

[I.e., if move semantics, then the original owner shouldn't be
trying to access something that he's "given away"!  Any access,
by him, to that memory region should signal a fatal exception!]

If you don't care who sees what, then you don't need the MMU!
And we're back to my initial paragraph of this reply!  :>


If "always having enough logical memory" is such a great thing,
isn't having MORE logical memory (because you've moved other
things into OVERLAPPING portions of that memory space) an
EVEN BETTER thing?

Again, what does your flat addressing BUY the OS in terms of
complexity reduction?  (your initial assumption)
    "...a big difference to how the OS is done"

Re: 64-bit embedded computing is here and now
On 6/10/2021 16:55, Don Y wrote:
Don, this is becoming way too lengthy and repetitive.

You keep on saying that a linear 64 bit address space means exposing
everything to everybody after I explained this is not true at all.

You keep on claiming this or that about how I do things without
bothering to understand what I said - like your claim that I use the MMU
for "protection only".
NO, this is not true either. On 32 bit machines - as mine in
production are - mapping 4G logical space into say 128M of physical
memory goes all the way through page translation, block translation
for regions where page translation would be impractical etc.
You sound the way I would have sounded before I had written and
built on for years what is now dps. The devil is in the detail :-).

You pass "objects", pages etc. Well guess what, it *always* boils
down to an *address* for the CPU. The rest is generic talk.
And if you choose to have overlapping address spaces when you
pass a pointer from one task to another the OS has to deal with this
at a significant cost.
In a linear address space, you pass the pointer *as is* so the OS does
not have to deal with anything except access restrictions.
In dps, you can send a message to another task - the message being
data the OS will copy into that tasks memory, the data being
perfectly able to be an address of something in another task's
memory. If a task accesses an address it is not supposed to,
the user is notified and allowed to press CR to kill that task.
Then there are common data sections for groups of tasks etc.,
it is pretty huge really.

The concept "one entire address space to all tasks" is from the 60-s
if not earlier (I just don't know and don't care to check now) and it
has done a good job while it was necessary, mostly on 16 bit CPUs.
For today's processors this means just making them run with the
handbrake on, *nothing* is gained because of that - no more security
(please don't repeat that "expose everything" nonsense), just
burning more CPU power, constantly having to remap addresses etc.

Dimiter




Re: 64-bit embedded computing is here and now
On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:

Task A has built a structure -- a page worth of data residing
at 0x123456.  It wants to pass this to TaskB so that TaskB can perform
some operations on it.

Can TaskB access the data at 0x123456 *before* TaskA has told it
to do so?

Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?

Can TaskA alter the data at 0x123456 *after* it has "passed it along"
to TaskB -- possibly while TaskB is still using it?


I didn't say that YOU did that.  I said that to be able to ignore
the MMU after setting it up, you can ONLY use it to protect
code from alteration, data from execution, etc.  The "permissions"
that it applies have to be invariant over the execution time of
ALL of the code.

So, if you DON'T use it "for protection only", then you are admitting
to having to dynamically tweak it.

*THIS* is the cost that the OS incurs -- and having a flat address
space doesn't make it any easier!  If you aren't incurring that cost,
then you're not protecting something.


Yes, the question is "who manages the protocol for sharing".
Since forever, you could pass pointers around and let anyone
access anything they wanted.  You could impose -- but not
ENFORCE -- schemes that ensured data was shared properly
(e.g., so YOU wouldn't be altering data that *I* was using).

[Monitors can provide some structure to that sharing but
are costly when you consider the number of things that may
potentially need to be shared.  And, you can still poke
directly at the data being shared, bypassing the monitor,
if you want to (or have a bug)]

But, you had to rely on programming discipline to ensure this
worked.  Just like you have to rely on discipline to ensure
code is "bugfree" (how's that worked for the industry?)


How does your system handle the above example?  How do you "pass" the
pointer from TaskA to TaskB -- if not via the OS?  Do you expose a
shared memory region that both tasks can use to exchange data
and hope they follow some rules?  Always use synchronization
primitives for each data exchange?  RELY on the developer to
get it right?  ALWAYS?

Once you've passed the pointer, how does TaskB access that data
WITHOUT having to update the MMU?  Or, has TaskB had access to
the data all along?

What happens when B wants to pass the modified data to C?
Does the MMU have to be updated (C's tables) to grant that
access?  Or, like B, has C had access all along?  And, has
C had to remain disciplined enough not to go mucking around
with that region of memory until A *and* B have done modifying
it?

I don't allow anyone to see anything -- until the owner of that thing
explicitly grants access.  If you try to access something before it's
been made available for your access, the OS traps and aborts your
process -- you've violated the discipline and the OS is going to
enforce it!  In an orderly manner that doesn't penalize other
tasks that have behaved properly.
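The grant-then-access model described here can be sketched as follows (hypothetical names throughout; the real mechanism works at the MMU/page level, not with a Python set):

```python
class AccessViolation(Exception):
    """Raised where the real OS would trap and abort the offending process."""

class Kernel:
    """Nothing is visible until the owner explicitly grants access."""
    def __init__(self):
        self._grants = set()           # (task, page, right) triples

    def grant(self, task, page, right):
        self._grants.add((task, page, right))

    def revoke(self, task, page, right):
        self._grants.discard((task, page, right))

    def access(self, task, page, right):
        if (task, page, right) not in self._grants:
            raise AccessViolation(f"{task}: illegal {right} of page {page:#x}")
        return True
```

TaskB touching 0x123456 before TaskA has granted it anything simply traps; after a grant (and a later revoke) the window of access is exactly what the owner intended, with no reliance on TaskB's good manners.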

Quoted text here. Click to load it

So, you don't use the MMU to protect TaskA's resources from TaskB
(or TaskC!) access.  You expect LESS from your OS.

Quoted text here. Click to load it

What are the addresses "it's not supposed to?"  Some *subset* of
the addresses that "belong" to other tasks?  Perhaps I can
access a buffer that belongs to TaskB but not TaskB's code?
Or, some OTHER buffer that TaskB doesn't want me to see?  Do
you explicitly have to locate ("org") each buffer so that you
can place SOME in protected portions of the address space and
others in shared areas?  How do you change these distinctions
dynamically -- or, do you do a lot of data copying from
"protected" space to "shared" space?

Quoted text here. Click to load it

Again, you expose things by default -- even if only a subset
of things.  You create shared memory regions where there are
no protections and then rely on your application to behave and
not access data (that has been exposed for its access) until
it *should*.

Everybody does this.  And everyone has bugs as a result.  You
are relying on the developer to *repeatedly* implement the sharing
protocol -- instead of relying on the OS to enforce that for you.

It's like putting tons of globals in your application -- to
make data sharing easier (and, thus, more prone to bugs).

You expect less of your OS.

My tasks are free to do whatever they want in their own protection domain.
They KNOW that nothing can SEE the data they are manipulating *or*
observe HOW they are manipulating it or *influence* their manipulation
of it.

Until they want to expose that data.  And, then, only to those entities
that they think SHOULD see it.

They can give (hand-off) data to another entity -- much like call-by-value
semantics -- and have the other entity know that NOTHING that the
original "donor" can do AFTER that handoff will affect the data that
has been "passed" to them.

Yet, they can still manipulate that data -- update it or reuse that
memory region -- for the next "client".

The OS enforces these guarantees.  Much more than just passing along
a pointer to the data!  Trying to track down the donor's alteration
of data while the recipient is concurrently accessing it (multiple
tasks, multiple cores, multiple CPUs) is a nightmare proposition.
And, making an *unnecessary* copy of it is a waste of resources
(esp. if the two parties actually ARE well-behaved).

Quoted text here. Click to load it

Remapping is done in hardware.  The protection overhead is a
matter of updating page table entries.  *You* gain nothing by creating
a flat address space because *you* aren't trying to compartmentalize
different tasks and subsystems.  You likely protect the kernel's
code/data from direct interference from "userland" (U/S bit) but
want the costs of sharing between tasks to be low -- at the expense
of forfeiting protections between them.

*Most* of the world consists of imperfect coders.  *Most* of us have
to deal with colleagues (of varying abilities) before, after and
during our tenure running code on the same CPU as our applications.

     "The bug is (never!) in my code!  So, it MUST be in YOURS!"

You can either stare at each other, confident in the correctness
of your own code.  Or, find the bug IN THE OTHER GUY'S CODE
(you can't prove yours is correct any more than he can; so you have to
find the bug SOMEWHERE to make your point), effectively doing his
debugging *for* him.

Why do you think desktop OS's go to such lengths to compartmentalize
applications?  Aren't the coders of application A just as competent
as those who coded application B?  Why would you think application
A might stomp on some resource belonging to application B?  Wouldn't
that be a violation of DISCIPLINE (and outright RUDE)?

You've been isolated from this for far too long.  So, don't see
what it's like to have to deal with another(s)' code impacting
the same product that *you* are working on.

Encapsulation and opacity are the best ways to ensure all interactions
to your code/data are through permitted interfaces.
   "Who overwrote my location 0x123456?  I know *I* didn't..."
   "Who turned on power to the motor?  I'm the only one who should do so!"
   "Who deleted the log file?"
There's a reason we eschew globals!

I can ensure TaskB can't delete the log file -- by simply denying him
access to logfile.delete().  But, letting him use logfile.append()
as much as he wants!  At the same time, allowing TaskA to logfile.delete()
or logfile.rollover() as it sees fit -- because I've verified that
TaskA does this appropriately as part of its contract.  And, there's
no NEED for TaskB to ever do so -- it's not B's responsibility
(so why allow him the opportunity to ERRONEOUSLY do so -- and then
have to chase down how this happened?)

If TaskB *tries* to access logfile.delete(), I can trap to make his
violation obvious:  "Reason for process termination: illegal access"

And, I don't need to do this with pointers or hardware protection
of the pages in which logfile.delete() resides!  I just don't let
him invoke *that* method!  I *expect* my OS to provide these mechanisms
to the developer to make his job easier AND the resulting code more robust.
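The logfile example can be sketched with a per-method handle (hypothetical Python; `append` and `delete` come from the text above, everything else is an invented illustration of the mechanism, not the actual OS API):

```python
class Logfile:
    def __init__(self):
        self.lines = []
    def append(self, line):
        self.lines.append(line)
    def delete(self):
        self.lines.clear()

class Handle:
    """A handle exposes only the methods its holder was granted."""
    def __init__(self, obj, allowed):
        self._obj = obj
        self._allowed = frozenset(allowed)

    def call(self, method, *args):
        if method not in self._allowed:
            # "Reason for process termination: illegal access"
            raise PermissionError(f"illegal access: {method}")
        return getattr(self._obj, method)(*args)

log = Logfile()
task_a = Handle(log, {"append", "delete"})   # TaskA manages the log's lifetime
task_b = Handle(log, {"append"})             # TaskB may ONLY append
```

TaskB's attempt to delete fails at the handle, with no page-level hardware protection of the code that implements `delete()` required.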

There is a cost to all this.  But, *if* something misbehaves, it leaves
visible evidence of its DIRECT actions; you don't have to wonder WHEN
(in the past) some datum was corrupted that NOW manifests as an error
in some, possibly unrelated, manner.

Of course, you don't need any of this if you're a perfect coder.

You don't expose the internals of your OS to your tasks, do you?
Why?  Don't you TRUST them to observe proper discipline in their
interactions with it?  You trust them to observe those same
disciplines when interacting with each other...  Why can't TaskA
see the preserved state for TaskB?  Don't you TRUST it to
only modify it if it truly knows what it's doing?  Not the result
of resolving some errant pointer?

Welcome to the 70's!

Re: 64-bit embedded computing is here and now
On 6/11/2021 0:09, Don Y wrote:
Quoted text here. Click to load it

Quoted text here. Click to load it

Quoted text here. Click to load it

If task A does not want any of the above it just places them in a
page to which it only has access. Or it can allow read access only.
*Why* do you confuse this with linear address space? What does the
one have to do with the other?

Quoted text here. Click to load it

Quoted text here. Click to load it

Quoted text here. Click to load it

Of course dps is dealing with it, all the time. The purpose of the
linear *logical* address space is just orthogonality and simplicity,
like not having to remap passed addresses (which can have a lot
of further implications, like the inability to use addresses in
another task's structures).

Quoted text here. Click to load it

Quoted text here. Click to load it

Oh but it does - see my former paragraph.


Quoted text here. Click to load it

Quoted text here. Click to load it

Quoted text here. Click to load it

Quoted text here. Click to load it


Quoted text here. Click to load it




I already explained that. If task A wants to leave a message
in task B's memory it goes through a call (signd7$ or whatever,
there are variations) and the message is left there.
If task A does not want to receive messages the OS won't even
attempt delivery; the call returns a straight error (task does not
support... whatever). If the message is illegal the result is
similar. And if task A tries to directly access memory of task B
which it is not supposed to, it just gets the
"task A memory access violation. Press CR to kill it".

You have to rely on the developer to get it right only if they
write supervisor code. Otherwise you don't.
The signalling system works in user mode though you can
write supervisor level code which uses it, but if you
are allowed to write at that level you can mess up pretty
much everything, I hope you are not trying to wrestle
*that* one.

Quoted text here. Click to load it

Quoted text here. Click to load it


By just writing to the address task A has listed for the
purpose. It is not in a protected area so the only thing
the MMU may have to do is a tablewalk.

*THIS* demonstrates the advantage of the linear logical
address space very well.

Quoted text here. Click to load it

Quoted text here. Click to load it

Each of these has its area which allows messaging. I don't
see what you want to achieve by making it only more cumbersome
(but not less possible) to do.

Quoted text here. Click to load it

Quoted text here. Click to load it

Quoted text here. Click to load it


So well, how is the linear address space in your way of doing that?
It certainly is not in my way when I do it.

Quoted text here. Click to load it


Why on Earth do you think that? And what does the linear address space
have to do with *any* of it?
Pages can be as small as 4k; why not just have them properly
set up at task start (or at some later time), with the page which
can receive messages open to accesses and the rest closed?
And again, how on Earth do you see any relevance between a linear
logical address space and all this?

Quoted text here. Click to load it



Quoted text here. Click to load it

Quoted text here. Click to load it

This is up to the tasks, they can make system calls to mark
pages non-swappable, write protected etc., you name it.
And again, ***this has nothing to do with orthogonality
of the logical address space***.

Quoted text here. Click to load it

Quoted text here. Click to load it

Why would you want to protect regions you don't want protected?
The common data sections are quite useful when you write a largish
piece of software which runs as multiple tasks in multiple
windows - e.g. nuvi, the spectrometry software - it has
multiple "display" windows, a command window into which
one can also run dps scripts etc., why would you want to
deprive them of that common section? They are all part of the
same software package.
But I suppose you are not that far yet since you still wrestle
scheduling and memory protection.


Quoted text here. Click to load it

Quoted text here. Click to load it

Not at all. And for I don't know which time, this has 0% to do
with the linearity of the logical address space, which is what
you objected to.

Please let us just get back to it and just agree with the obvious,
which is that linear logical address space has *nothing* to do
with security.


Leave DPS alone. DPS is a large thing and even I could not
tell you everything about it even if I had the weeks it would
take simply because there are things I have to look at to
remember. Please don't try to tell the world how the OS you want
to write is better than what you simply do not know.
Tell me about the filesystem you have
implemented for it (I'd say you have none by the way you
sound), how you implemented your tcp/ip stack, how your
distributed file system works (in dps, I have dfs - a device
driver which allows access to remote files just as if they
are local provided the dfs server has allowed access to that
user/path etc.). Then tell me how you implemented windowing,
how do you deal with offscreen buffering, how do you refresh
which part and how do you manipulate which gets pulled where
etc. etc., it is a long way to go but once you have some
screenshots it will be interesting to compare this or that.
Mine are there to see and well, I have not stopped working
either.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/

Re: 64-bit embedded computing is here and now
On 6/10/2021 3:13 PM, Dimiter_Popoff wrote:
Quoted text here. Click to load it

As I tease more of your design out of you, it becomes apparent why
you "need" a flat address space.  You push much of the responsibility
for managing the environment into the developer's hands.  *He* decides
which regions of memory to share.  He talks to the MMU (even if through
an API).  He directly retrieves values from other tasks.  Etc.

So, he must be able to get anywhere and do anything at any time
(by altering permissions, if need be).

By contrast, I remove all of that from the developer's shoulders.
I only expect a developer to be able to read the IDL for the
objects that he "needs" to access and understand the syntax required
for each such access (RMI/RPC).  The machine LOOKS like it is
a simple uniprocessor with no synchronization issues that the
developer has to contend with, no network addressing, no cache
or memory management, etc.

EVERYTHING is done *indirectly* in my world.  Much like a file system
interface (your developer doesn't directly write bytes onto the disk
but, rather, lets the file system resolve a filename and create
a file handle which is then used to route bytes to the media).

The interface to EVERYTHING in my system is through such an
extra layer of indirection.  Because things exist in different
address spaces, on different processors, etc. the OS mediates
all such accesses.  ALL of them!  Yes, it's inefficient.  But,
the processor runs at 500MHz and I have 244 of them in my
(small!) alpha site -- I figure I can *afford* to be a little
inefficient (especially as you want to *minimize* interactions
between objects just as a general design goal)

Because of this, I can enforce fine-grained protection mechanisms;
I can let you increment a counter -- but not decrement it
(assuming a counter is an object).  Or, let you read its contents
but never alter them.  Meanwhile, some other client (task)
can reset it but never read it.

And, the OS can act as a bridge/proxy to an object residing on
a different node -- what "address" do you access to reference
Counter 34 on node 56?  Who tells you that it resides on 56
and hasn't been moved to node 29??

Because the OS can provide that proxy interface, I can *move*
an object between successive accesses -- without its clients
knowing this has happened.  As if the file server you
are accessing had suddenly been replaced by another machine
at a different IP address WHILE you were accessing files!

Likewise, because the access is indirect, I can interpose
an agency on selective objects to implement redundancy for that
object without the client using THAT interface ever knowing.

Or, support different versions of an interface simultaneously
(which address do you access to see the value of the
counter as an unsigned binary long?  which address to see
it as a floating point number?  which address to see it
as an ASCII string?)

Note that I can do all of these things with a flat *or* overlapping
address space.  Because a task doesn't need to be able to DIRECTLY
access anything -- other than the system trap!
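The proxy/indirection idea can be sketched like this (hypothetical Python; real RMI/RPC marshalling, node addressing, and failure handling are all elided, and the names are invented):

```python
class Counter:
    def __init__(self):
        self.n = 0
    def increment(self):
        self.n += 1
        return self.n

class Node:
    """A compute node hosting objects."""
    def __init__(self):
        self.objects = {}
    def dispatch(self, oid, method, *args):
        return getattr(self.objects[oid], method)(*args)

class Registry:
    """Maps each object id to its *current* node."""
    def __init__(self):
        self._where = {}
    def place(self, oid, node):
        self._where[oid] = node
    def locate(self, oid):
        return self._where[oid]

class Proxy:
    """Clients hold this; every call re-resolves the object's location,
    so the object may migrate between two successive invocations."""
    def __init__(self, registry, oid):
        self._registry, self._oid = registry, oid
    def invoke(self, method, *args):
        return self._registry.locate(self._oid).dispatch(self._oid, method, *args)
```

Because the client only ever touches the proxy, the object can move from node 56 to node 29 between calls and the client code never changes -- the cost is one extra lookup per invocation.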

You, on the other hand, have to build a different mechanism (e.g.,
your distributed filesystem) to access certain TYPES of objects
(e.g., files) without concern for their location.  That ability
comes "free" for EVERY type of object in my world.

It is essential as I expect to be interacting with other nodes
continuously -- and those nodes can be powered up or down
independent of my wishes.  Can I pull a board out of your MCA
and expect it to keep running?  Unplug one of my nodes (or
cut the cable, light it on fire, etc.) and there will be
a hiccup while I respawn the services/objects that were
running on that node to another node.  But, clients of
those services/objects will just see a prolonged RMI/RPC
(if one was in progress when the node was killed)

Note that I've not claimed it is "better".  What I have claimed
is that it "does more" (than <whatever>).  And, because it does
more (because I EXPECT it to), any perceived advantages of a
flat address space are just down in the "noise floor".  They
don't factor into the implementation decisions.  By the time I
"bolted on" these features OUTSIDE your OS onto your implementation,
I'd have a BIGGER solution to the same problem!

["Access this memory address directly -- unless the object you want
has been moved to another node.  In which case, access this OTHER
address to figure out where it's gone to; then access yet another
address to actually do what you initially set out to do, had the
object remained 'local'"]

This sums up our differences:

Quoted text here. Click to load it

Why WOULDN'T you want to protect EVERYTHING??

Sharing should be an exception.  It should be more expensive to
share than NOT to share.  You don't want things commingling unless
they absolutely MUST.  And, the more such interaction, the more
you should look at the parties involved to see if refactoring
may be warranted.  "Opaque" is the operative word.  The more
you expose, the more interdependencies you create.

Re: 64-bit embedded computing is here and now
On 6/11/2021 7:55, Don Y wrote:
Quoted text here. Click to load it





It is not true that the developer is in control of all that. Messaging
from one task to another goes through a system call.

Anyway, I am not interested in discussing dps here/now.

The *only* thing I would like you to answer me is why you think
a linear 64 bit address space can add vulnerability to a design.

Dimiter



Re: 64-bit embedded computing is here and now
On 6/11/2021 4:14 AM, Dimiter_Popoff wrote:
Quoted text here. Click to load it

But the client directly retrieves the values.  The OS doesn't provide
them (at least, that's what you said previously)

Quoted text here. Click to load it

Please tell me where I said it -- in and of itself -- makes a
design vulnerable?

HOW any aspect of an MCU is *used* is the cause of vulnerability;
to internal bugs, external threats, etc.  The more stuff that's exposed,
the more places fault can creep into a design.  It's why we litter code
with invariants, check for the validity of input parameters, etc.
Every interface is a potential for a fault; and an *opportunity*
to bolster your confidence in the design (by verifying the interfaces
are being used correctly!)

[Do you think all of these ransomware attacks we hear of are
the result of developers being INCREDIBLY stupid?  Or, just
"not paranoid enough"??]

Turning off an MMU (when you have one available) is obviously
putting you in a more "exposed" position than correctly
*using* it (all else being equal).  Unless, of course, you
don't have the skills to use it properly.

There are firewire implementations that actually let the external
peripheral DMA directly into the host's memory.  Any fault in the
implementation *inside* the host obviously exposes the internals
of the system to an external agent.  Can you be 100.0% sure that
the device you're plugging in (likely sold with your type of
computer in mind and, thus, aware of what's where, inside!) is
benign?

<https://en.wikipedia.org/wiki/DMA_attack>

Is there anything *inherently* wrong with DMA?  Or Firewire?  No.
Do they create the potential for a VULNERABILITY in a system?  Yes.
The vulnerability is a result of how they are *used*.

My protecting-everything-from-everything-else is intended to eliminate
unanticipated attack vectors before a hostile actor (third party
software or external agent) can discover an exploit.  Or, a latent
bug can compromise the proper operation of the system.  It's why I
*don't* have any global namespaces (if you can't NAME something,
then you can't ACCESS it -- even if you KNOW it exists, somewhere;
controlling the names you can see controls the things you can access)

It's why I require you to have a valid "Handle" to every object with
which you want to interact; if you don't have a handle to the
object, then you can't talk to it.  You can't consume its resources
or try to exploit vulnerabilities that may be present.  Or, just plain
ask it (mistakenly) to do something incorrect!

It's why I don't let you invoke EVERY method on a particular object,
even if you have a valid handle!  Because you don't need to be ABLE
to do something that you don't NEED to do!  Attempting to do so
is indicative of either a bug (because you didn't declare a need
to access that method when you were installed!) or an attempted
exploit.  In either case, there is no value to letting you continue
with a simple error message.

<https://en.wikipedia.org/wiki/Principle_of_least_privilege>

It's why each object can decide to *sever* your "legitimate" connection
to any of its Interfaces if it doesn't like what you are doing
or asking it to do.  "Too bad, so sad.  Take it up with Management!
And, no, we won't be letting you get restarted cuz we know there's
something unhealthy about you!"

It's why access controls are applied on the *client* side of
a transaction instead of requiring the server/object to make
that determination (like some other capability-based systems).
Because any server-side activities consume the server's
resources, even if it will ultimately deny your request
(move the denial into YOUR resources)

It's why I enforce quotas on the resources you can consume -- or
have others consume for your *benefit* -- so an application's
(task) "load" on the system can be constrained.
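A quota mechanism of that flavor might be sketched as (hypothetical names and units; the real accounting would cover memory, handles, CPU, etc.):

```python
class QuotaExceeded(Exception):
    pass

class QuotaManager:
    """Caps what a task may consume -- directly, or via work done on its
    behalf -- so one task can't starve the rest of the system."""
    def __init__(self):
        self._limit = {}
        self._used = {}

    def set_limit(self, task, amount):
        self._limit[task] = amount
        self._used.setdefault(task, 0)

    def charge(self, task, amount):
        # refuse the allocation BEFORE resources are committed
        if self._used[task] + amount > self._limit[task]:
            raise QuotaExceeded(f"{task} over quota")
        self._used[task] += amount

    def release(self, task, amount):
        self._used[task] -= amount
```

The important property is that the charge is applied when the resource is *requested*, so a buggy or hostile task hits its ceiling instead of degrading everyone else.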

If you want to put staff in place to vet each third party application
before "offering it in your store", then you have to assume that
overhead -- and HOPE you catch any malevolent/buggy actors before
folks install those apps.  I think that's the wrong approach as
it requires a sizeable effort to test/validate any submitted
application "thoroughly" (you end up doing the developer's work
FOR him!)

Note that bugs also exist, even in the absence of "bad intent".
Should they be allowed to bring down your product/system?  Or,
should their problems be constrained to THEIR demise??

[I'm assuming your MCA has the ability to "print" hardcopy
of <whatever>.  Would it be acceptable if a bug in your print
service brought down the instrument?  This *session*?
Silently corrupted the data that it was asked to print?]

ANYTHING (and EVERYTHING) that I can do to make my system more robust
is worth the effort.  Hardware is cheap (relatively speaking).
Debugging time is VERY costly.  And, "user inconvenience/aggravation"
is *outrageously* expensive!  I let the OS "emulate" features that
I wished existed in the silicon -- because, there, they would
likely be less expensive to utilize (time, resources)

This is especially true in my alpha site application.  Imagine being
blind, deaf, wheelchair confined, paralyzed/amputee, early onset
Alzheimer's, or "just plain old", etc. and having to deal with something
that is misbehaving ALL AROUND YOU (because it pervades your home
environment).  It was intended to *facilitate* your continued presence
in YOUR home, delaying your transfer to an a$$i$ted care facility.
Now, it's making life "very difficult"!

"Average Joes" get pissed off when their PC misbehaves.
   Imagine your garage door opening in the middle of the night.
   Or, the stereo turns on -- loud -- while you're on the phone.
   Or, the phone hangs up mid conversation.
   Or, the wrong audio stream accompanies a movie you're viewing.
   Or, a visitor is announced at the front door, but no one is there!
   Or, the coffee maker turned on too early and your morning coffee is mud.
   Or, the heat turns on midafternoon on a summer day.
   Or, the garage door closes on your vehicle as you are exiting.
   Or, your bedside alarm goes off at 3AM.
How long will you wait for "repair" in that sort of environment?
When are you overwhelmed by the technology (that is supposed to be
INVISIBLE) coupled with your current condition -- and just throw
in the towel?

YOU can sell a spare MCA to a customer who wants to minimize his
downtime "at any cost".  Should I make "spare houses" available?
Maybe deeply discounted??  :<

What about spare factories??

Re: 64-bit embedded computing is here and now
On 6/11/2021 15:10, Don Y wrote:
Quoted text here. Click to load it




Quoted text here. Click to load it

Quoted text here. Click to load it

I am not sure what this means. The recipient task has advertised a field
where messages can be queued, the sending task makes a system call
designating the message and which task is to receive it; during that
call execution the message is written into the memory of the recipient.
Then at some point later the recipient can see that and process the
message. What more do you need?
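Generically (and hypothetically -- `signd7$` is dps-specific and its real signature isn't given here, so the names below are invented), the mechanism just described reads like:

```python
class DeliveryError(Exception):
    """The 'task does not support messages' straight error mentioned above."""

class MessageKernel:
    def __init__(self):
        self._inboxes = {}             # recipient task -> queued messages

    def advertise_inbox(self, task):
        # the recipient advertises a field where messages can be queued
        self._inboxes[task] = []

    def send(self, sender, recipient, message):
        # the system call, not the sender, writes into the recipient's memory
        if recipient not in self._inboxes:
            raise DeliveryError(f"{recipient} does not accept messages")
        self._inboxes[recipient].append((sender, message))

    def receive(self, task):
        # at some later point the recipient sees and processes what was queued
        msgs, self._inboxes[task] = self._inboxes[task], []
        return msgs
```

The sender never touches the recipient's memory directly; the kernel performs the write during the call, and a recipient that never advertised an inbox gets the delivery refused outright.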

Quoted text here. Click to load it

This is how the exchange started:


Quoted text here. Click to load it

Now if you have missed the "logical" word in my post I can
understand why you went into all that. But I was quite explicit
about it.
Anyway, I am glad we agree that a 64 bit logical address space
is no obstacle to security. From there on it can only be something
to make programming life easier.

Dimiter



