Speaking of Multiprocessing... - Page 2

Re: Speaking of Multiprocessing...
On 3/24/2017 8:48 PM, David Brown wrote:

I guess I didn't explain the full context.  But as I mentioned, it is  
simpler I think to include the swap memory instruction that will allow a  
test and set operation to be implemented atomically without disrupting  
any other functions.  It will use up an opcode, but it would be a simple  
one with the only difference from a normal memory write being the use of  
the read path.
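
As a rough illustration of the idea (not the actual CPU design), here is a minimal sketch in portable C11 of how an atomic swap yields test-and-set. The names `swap_lock`, `swap_lock_try` and `swap_lock_release` are made up for this example, and `atomic_exchange` stands in for the proposed swap opcode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock word: 0 = free, 1 = taken (hypothetical encoding). */
typedef struct {
    atomic_int word;
} swap_lock;

/* Swap 1 into the lock word and inspect the old value.
 * The exchange is one indivisible read+write, so only one
 * caller can ever see the old value 0 and win the lock. */
static bool swap_lock_try(swap_lock *l)
{
    assert(l != NULL);
    return atomic_exchange(&l->word, 1) == 0;
}

/* Releasing is an ordinary store of 0. */
static void swap_lock_release(swap_lock *l)
{
    assert(l != NULL);
    atomic_store(&l->word, 0);
}
```

A caller would typically spin on `swap_lock_try()` until it returns true, do its work, then call `swap_lock_release()`.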

--  

Rick C

Re: Speaking of Multiprocessing...
On 25/03/17 03:10, rickman wrote:

Well, you are the one implementing this - so you have to figure out what  
solution makes most sense for you.  Here's another idea you could consider.

If you are dealing with just one CPU here, I have always thought a  
"disable interrupts for the next X instructions then restore interrupt  
status" instruction would be handy - with X being something like 4.  
That would let you do atomic reads, writes or read-modify-write  
instructions covering at least two memory addresses, without need for  
special memory or read-modify-write opcodes.



Re: Speaking of Multiprocessing...
On 3/25/2017 10:25 AM, David Brown wrote:

How is that different from instructions to enable and disable  
interrupts?  This doesn't really help me as there are N logical CPUs.  
They just share the same hardware.  But they all run concurrently in  
nearly every sense.  They just use different clock cycles so that memory  
accesses are not literally concurrent.  So interrupts aren't the (only)  
issue.

--  

Rick C

Re: Speaking of Multiprocessing...


I thought that SMP stands for _symmetric_ multiprocessing.

Disabling interrupts works well when dealing with one CPU and
peripherals or the case with one master supervisor CPU and a lot of
slave CPUs (AMP).


Re: Speaking of Multiprocessing...
On 3/25/2017 4:43 PM, snipped-for-privacy@downunder.com wrote:

I don't follow how disabling interrupts will help when multiple CPUs
access the same memory location.  Disabling interrupts only stops
other processes on the same CPU from accessing that location.
How would that prevent processes on other CPUs from accessing the
location during the multi-instruction access of the process in
question?

--  

Rick C

Re: Speaking of Multiprocessing...
On 25/03/17 20:27, rickman wrote:


An instruction like the one I propose would be safer than the normal  
"disable all interrupts" instruction, because it places a limit on the  
latency of interrupts.  Code that simply disables interrupts could do so  
for an arbitrary length of time - here it is specifically limited.

It is harder to make this work well for your SMT cpu.  Here you might  
change things to say that for these next 4 clock cycles, the current  
logical cpu runs on /every/ clock cycle - all other SMT threads are  
paused.  Depending on how you have organised things, that might be  
simple or it might be nearly impossible.  (On the XMOS, if only one  
thread is running it gets a maximum of 1 cycle out of every 5, with the  
rest wasted - this is due to the 5 stage pipeline of its cpu and that  
each logical cpu can only have one instruction in action at a time.)


Re: Speaking of Multiprocessing...
On 24.3.2017 19:36, rickman wrote:

I am not sure a bit per reservation unit will work in a multiprocessor
environment (it certainly will on your single core design, obviously).

Both the reserved address (range, typically) and the reserving processor
will need to know who reserved what. But this is just a "gut feeling",
I did not think seriously about it.

Test And Set, a classic 68k RMW opcode. Read a byte (just byte size
allowed), adjust the Z flag (Z IIRC...) to the state of bit 7 (MSB),
then write the byte back with bit 7 set. While doing this the processor
provides some means (holds AS asserted on the 68k IIRC) such that the
access to the address can not be granted to another processor for the
entire duration of the TAS opcode.

Looks like in your case a simple bset (again 68k lingo, bit test and
set, but on two separate memory accesses, potentially interruptible
by another bus master - not by a processor interrupt, it is a single
instruction) will do what you need.
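
To make the read-then-set-bit-7 behaviour concrete, here is a small C11 model of a TAS-style operation on one byte. It is only a sketch: `tas_byte` is a hypothetical name, and `atomic_fetch_or` stands in for the bus-locked read-modify-write that the 68k does in hardware. It reports whether bit 7 was already set rather than modelling the condition flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Atomically set bit 7 of *p and report whether it was
 * already set before the operation. Nothing else can touch
 * the byte between the read and the write-back. */
static bool tas_byte(atomic_uchar *p)
{
    assert(p != NULL);
    unsigned char old = atomic_fetch_or(p, 0x80u);
    return (old & 0x80u) != 0;
}
```

If `tas_byte()` returns false, the caller was the first to set the bit and owns whatever the byte guards; the low seven bits are left untouched.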

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/


Re: Speaking of Multiprocessing...
On 3/24/2017 6:42 PM, Dimiter_Popoff wrote:

Not interruptible in any way, shape or form in my approach.  The read  
and write happen in the same clock cycle - one of the beauties of having  
dual port memory in an FPGA.

--  

Rick C

Re: Speaking of Multiprocessing...
On 03/24/2017 04:22 PM, rickman wrote:

Sounds similar to the old ARM answer to the problem: SWP Rdst, Rsrc, [addr].



--  
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.

Re: Speaking of Multiprocessing...
wrote:



I'm pretty sure it predates S/360.

Re: Speaking of Multiprocessing...
On 25.3.2017 05:47, Robert Wessel wrote:

I would not know, my first encounter with that sort of thing was
on the 68k. The first processor I designed a board with - which
was the first computer I owned - was the 6809... here are the remnants
of this board (early '80s):
http://tgi-sci.com/misc/grany09.gif
Had yet to be exposed to the TAS concept when I was making this one :).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/




Re: Speaking of Multiprocessing...
wrote:



My point was merely that TAS is far, far older than you had indicated.
It was a standard instruction on S/360s when they were introduced to
the world in 1964, and I'm pretty sure it was not new then.

Re: Speaking of Multiprocessing...
On 23/03/17 23:38, rickman wrote:

There are many, many ways to implement synchronisation between threads,
processors, whatever.  In theory, they are mostly equivalent in that any
one can be used to implement the others.  In practice, they can differ a
lot in hardware implementation overhead and in speed.

Typical implementations are "compare-and-swap" instructions (x86 uses
these) and load-linked/store-conditional (common on RISC systems, where
instructions either load /or/ store, not both).  And of course, on single
processor systems there is always the "disable all interrupts" method.
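
For illustration, here is a minimal C11 sketch of the usual compare-and-swap retry loop. The function name `bounded_add` and its limit rule are invented for the example. On machines with load-linked/store-conditional, compilers typically lower `atomic_compare_exchange_weak` to an LL/SC loop, which is why the "weak" form is allowed to fail spuriously.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Add delta to *counter unless that would exceed limit.
 * Returns 1 if the add happened, 0 if it was refused. */
static int bounded_add(atomic_int *counter, int delta, int limit)
{
    assert(counter != NULL);
    int seen = atomic_load(counter);
    do {
        if (seen + delta > limit)
            return 0;               /* refuse, nothing written */
        /* On failure, seen is refreshed with the current value
         * and the loop re-checks the limit before retrying. */
    } while (!atomic_compare_exchange_weak(counter, &seen, seen + delta));
    return 1;
}
```

The point of the pattern is that the decision (the limit check) and the update are tied together: the write only lands if nobody changed the value since it was read.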

But if you can use dedicated hardware, there are many other methods.
The XMOS devices have hardware support for pipelines and message
passing.  On a dual-core PPC device I used, there is a hardware block of
semaphores.  Each semaphore is a pair (a 16-bit ID and a 16-bit value) that
you can only access as a single 32-bit read or write.  You can write to it if
the current ID is 0, or if the ID you are writing matches that of the
semaphore.  There is plenty of scope for variation based on that theme.
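
The write rule described above can be modelled in plain C. This is a software model only, not the register interface of any real part: the bit layout (ID in the upper half, value in the lower half), the function names, and the boolean return are all assumptions. Real hardware would most likely ignore a refused write silently, leaving the caller to read back and check. How a semaphore is released is not described above, so it is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bits 31..16 = owner ID, bits 15..0 = value. */
static uint16_t sem_id(uint32_t reg)    { return (uint16_t)(reg >> 16); }
static uint16_t sem_value(uint32_t reg) { return (uint16_t)(reg & 0xFFFFu); }

static uint32_t sem_pack(uint16_t id, uint16_t value)
{
    return ((uint32_t)id << 16) | value;
}

/* Apply a 32-bit write only if the semaphore is unowned (ID 0)
 * or the written ID matches the current owner's ID. */
static bool sem_write(uint32_t *reg, uint32_t word)
{
    assert(reg != NULL);
    uint16_t owner = sem_id(*reg);
    if (owner != 0 && owner != sem_id(word))
        return false;   /* refused: someone else owns it */
    *reg = word;
    return true;
}
```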



Re: Speaking of Multiprocessing...
On 24/03/17 08:17, David Brown wrote:

I received my first XMOS board from Digi-Key a couple of days
ago, and I'm looking forward to using it for some simple
experiments. I /feel/ that many low-level things will be
much simpler and with fewer potential nasties lurking in
the undergrowth. (I felt the same with the Transputer, for
obvious reasons, but never had a suitable problem at that
time)

With your experience, did you find any undocumented gotchas
and any pleasant or unpleasant surprises?


Re: Speaking of Multiprocessing...
On 24/03/17 10:28, Tom Gardner wrote:

Before saying anything else, I would first note that my work with XMOS
systems was about four years ago, when they first started getting
popular.  I believe many things that bugged me most have been improved
since then, both in the hardware and software, but some may remain.

I think the devices themselves are a really neat idea.  You have very
fast execution, very efficient hardware multi-threading, very
predictable timings, and a variety of inter-thread and inter-process
communication methods.

Their "XC" programming language was also a neat idea, based on C with
additional primitives to support the hardware features and
multi-threading stuff, and an attempt to make some aspects of C safer
(real arrays, control of when you can access variables, etc.).

However, IMHO the whole thing suffered from a number of serious flaws
that limit the possibilities for the chips.  Sure, they would work well
in some circumstances - but I was left with the feeling that "if only
they had done /this/, the devices would be so much better and could be
used for so many more purposes".  It is a little unfair to concentrate
on the shortcomings rather than the innovations and features, but that
is how I felt when using them.  And again, I know that at least some
issues here have been greatly improved since I last used them.


An obvious flaw with the chips is lack of memory.  The basic device with
one cpu and 8 threads had 64K ram that was for program memory and
run-time data.  There was no flash - you had to use an external SPI
flash which used valuable pins (messing up the use of blocks of 8, 16 or
32 pins), and used up a thread if you wanted to be able to access the
flash at run-time.  And while you could implement an Ethernet MAC or a
480 Mbps USB 2.0 interface on the chip, there was nowhere near enough
ram for buffering or to do anything useful with the interface.  Adding
external memory was ridiculously expensive in terms of pins, threads,
and run-time inefficiency.

The hardware threading is great, and provides a really easy model for
all sorts of things.  To make a UART transmitter, you have a thread that
waits for data coming in on a pipe.  To transmit a bit, you set a pin,
wait for a bit time (using hardware timers), then move on to the next
bit.  The code is simple and elegant.  A UART receiver is not much
harder.  There is lots of example code in this style.
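
As a sketch of the "set a pin, wait a bit time, next bit" loop, here is the sequence of pin levels for one byte, in plain C rather than XC. The 8N1 framing (one start bit, eight data bits LSB first, one stop bit) and the name `uart_frame_8n1` are assumptions for the example. The thread described above would simply walk this array, driving the pin and waiting one bit time per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_FRAME_BITS 10  /* start + 8 data + stop (8N1 assumed) */

/* Fill levels[] with the pin level for each bit time of one frame. */
static void uart_frame_8n1(uint8_t byte, uint8_t levels[UART_FRAME_BITS])
{
    assert(levels != NULL);
    levels[0] = 0;                          /* start bit: line low  */
    for (int i = 0; i < 8; i++)
        levels[1 + i] = (byte >> i) & 1u;   /* data bits, LSB first */
    levels[9] = 1;                          /* stop bit: line high  */
}
```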

Then you realise that to implement a UART, you have used a quarter of
the chip's resources.  Your elegant flashing light is another thread, as
is your PWM output.  Suddenly you find you are using a 500 MIPS chip to
do the work of a $0.50 microcontroller, and you only have a thread or
two left for the actual application.

And you end up trying to run FreeRTOS on one of your threads, or make
your own scheduler to multiplex several PWM channels in one thread.
Much of the elegance quickly disappears for real-world applications.
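
For a feel of what multiplexing several PWM channels in one thread involves, here is a minimal sketch in C. Everything here is hypothetical: the 8-bit phase counter, the `pwm_outputs` name, and the idea that the caller writes the returned mask to a port once per timer tick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the output bits for up to 32 channels at one tick.
 * Channel ch is high while phase < duty[ch], so duty 0 is
 * always off and duty 255 is high for 255 of 256 ticks. */
static uint32_t pwm_outputs(uint8_t phase, const uint8_t duty[], size_t n)
{
    assert(duty != NULL && n <= 32);
    uint32_t mask = 0;
    for (size_t ch = 0; ch < n; ch++)
        if (phase < duty[ch])
            mask |= (uint32_t)1 << ch;
    return mask;
}
```

The single thread would increment `phase` on every tick and write the mask out. The cost is that the PWM resolution and frequency are now bounded by how fast that one loop can run.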


Then there is the software.  The XC language lets you write code that
starts tasks in parallel, automatically allocates channels for
communication, lets you declare timers and wait on them.  That's all
great in theory - but it quickly gets confusing when you try to figure
out the details of when you can pass these around, when they get
allocated and deallocated, or when you can have a thread create new
threads.  XC carefully tracks threads and data accesses, spotting and
blocking all sorts of possible race conditions.  If a variable is
written by one thread, then it can't be accessed from another.  You can
work with arrays safely, but you can't take addresses.  Data gets passed
between threads using communication channels that are safe from race
conditions and nicely synchronised.

And then you realise that to actually make the thing work, you would
need far more channels than there are on the device, and they would need
to be far faster - all you really wanted was for two threads to share a
circular buffer, and you know in your application code when it is safe
to use it.  But you can't do that in XC - the language and the tools
won't let you.  So you have to write that code in C, with calls back and
forth with the XC code that handles the multi-threading stuff.
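
The kind of shared circular buffer meant here might look like the following single-producer, single-consumer ring in C11. This is a generic sketch, not XMOS code: the names, the buffer size and the use of `<stdatomic.h>` are assumptions, and it is only safe with exactly one thread calling `ring_put` and one calling `ring_get`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u  /* must be a power of two */

typedef struct {
    uint8_t data[RING_SIZE];
    atomic_uint head;  /* next slot to write; producer only */
    atomic_uint tail;  /* next slot to read; consumer only  */
} ring;

/* Producer side: returns false if the buffer is full. */
static bool ring_put(ring *r, uint8_t v)
{
    assert(r != NULL);
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->data[head % RING_SIZE] = v;
    /* Publish the slot only after the data is written. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the buffer is empty. */
static bool ring_get(ring *r, uint8_t *out)
{
    assert(r != NULL && out != NULL);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->data[tail % RING_SIZE];
    /* Free the slot only after the data is read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Each index has a single writer, so no lock is needed; the indices run freely and wrap with unsigned arithmetic.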

And then you realise that from within the C, you need to access some
hardware resources like timers, that can't be expressed properly in C,
and you can't get back to the XC code at the time.  So you end up with
inline assembly.


Then there are the libraries and examples.  These were written in such a
wide variety of styles that it was impossible to figure out what was
going on.  A typical example project would involve a USB interface and,
for example, SPDIF channels for a USB audio interface.  The
Eclipse-based IDE was fine, but the example did not come as a project -
it came as a collection of interdependent projects.  Some bits referred
to files in different projects.  Some bits merely required other
projects to be compiled.  Some bits of the code in one project would use
assembly for hardware resources, others would use XC, others would use C
intrinsic functions, and others would use a sort of XML file that
defines the setup for your chip resources.  If you change values in one
file in one project (say, the USB vendor ID), you have to figure out
which sub-projects need to be manually forced to re-build in order for
it to take effect consistently throughout the project.  Some parts use a
fairly obvious configuration file - a header with defines that let you
control things like IDs, number of channels, pins, etc.  Except they
don't - only /some/ of the sub-projects read and use the configuration
file, other parts are hard-coded or use values from elsewhere.  It was a
complete mess.


Now, I know that newer XMOS devices have more resources, built-in flash,
proper hardware peripherals for the devices that are most demanding or
popular, and so on.  And I can only hope that the language and tools
have been improved to the point where inline assembly is not required,
and that the examples and libraries have matured to the point that the
libraries are usable as-is, and the examples show practical ways to
develop code.

I really hope XMOS does well here - it is so good to see a company that
thinks in a very different way and brings in these new ideas.  So if
your experience with modern XMOS devices and tools is good, I would love
to hear about it.



Re: Speaking of Multiprocessing...
On 24/03/17 10:19, David Brown wrote:

Thanks for a speedy, comprehensive response. I'll re-read
and digest it properly later.

My initial gut feel is that many of your points were
valid and probably are still valid - because they
/ought/ to still be valid.

The issues that most interest me relate to where you found
it necessary to step outside the toolchain. Part of me thinks
(hopes, really) that it is merely because your problem
wasn't well suited to the devices' strengths (esp. guaranteed
timing), and/or was too big, and/or importing existing
code/thinking led to friction, and/or the tools were immature.

I expect I'll end up agreeing with many of your observations,
but I'll have fun finding that out :)

Re: Speaking of Multiprocessing...
On 24/03/17 12:06, Tom Gardner wrote:


I know that at least some of my points are no longer an issue, or at
least not as much of an issue - XMOS have devices with flash, USB
hardware, etc.  At least some of the toolchain issues should be fixable.
And the mess of the examples and libraries is certainly fixable - at
least, if one disregards the time and effort it would involve!


The existing code was mainly XMOS's own examples, libraries and
reference designs...

I do agree that much of their USB stuff was poorly suited to the devices
and too big for them, and that probably made things worse - but it was
XMOS's own code.  With newer devices with hardware USB peripherals, I
expect fewer such issues.

I will go along with your hope - expectation, even - that the tools have
matured and improved over time.



Re: Speaking of Multiprocessing...
On 24/03/17 10:19, David Brown wrote:

Yes, those are precisely the aspects that interest me. I'm
particularly interested in easy-to-implement hard realtime
systems.

As far as I am concerned, caches and interrupts make it
difficult to guarantee hard realtime performance, and C's
explicit avoidance of multiprocessing biases C away from
"easy-to-implement".

Yes, I know about libraries and modern compilers that
may or may not compile your code in the way you expect!

I'd far rather build on a solid foundation than have to
employ (language) lawyers to sort out the mess ;)


Besides, I want to use Occam++ :)



Yes, those did strike me as limitations to the extent I'm
skeptical about networking connectivity. But maybe an XMOS
device plus an ESP8266 would be worth considering for some
purposes.



Yes. However I don't care about wasting some resources if
it makes the design easier/faster, so long as it isn't too
expensive in terms of power and money.



Needing to run a separate RTOS would be a code smell.
That's where the CSP+multicore approach /ought/ to be
sufficient. For practicality, I exclude peripheral
libraries and networking code from that statement.




That's the kind of thing I'm interested in exploring.



That's the kind of thing I'm interested in exploring.



At which point many advantages would have been lost.



Irritating, but not fundamental, and as you point out, they could
be fixed with application of time and money.



I think we are largely in violent agreement.


StartKIT to echo anything it receives on the USB line,
with a second task flipping uppercase to lowercase. That
should give me a feel for a low-end resource usage.

Then I'd like to make a reciprocal frequency counter to
see how far I can push individual threads and their
SERDES-like IO primitives.
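
For reference, the arithmetic behind a reciprocal counter: instead of counting input edges in a fixed gate time, count reference-clock ticks over a whole number of input cycles, then compute f_in = cycles * f_ref / ticks. A minimal sketch in C, with a hypothetical function name and parameters:

```c
#include <assert.h>
#include <stdint.h>

/* input_cycles: whole input periods observed
 * ref_ticks:    reference clock ticks counted over those periods
 * ref_hz:       reference clock frequency in Hz */
static double reciprocal_freq_hz(uint32_t input_cycles,
                                 uint64_t ref_ticks,
                                 double ref_hz)
{
    assert(ref_ticks > 0);
    return (double)input_cycles * ref_hz / (double)ref_ticks;
}
```

With a 100 MHz reference, 1000 input cycles spanning 100,000,000 ticks gives 1000 Hz. The resolution is set by the reference clock and the measurement time rather than by the input frequency, which is the attraction for low-frequency signals.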

And I'll probably do some bitbashing to create analogue
outputs that draw pretty XY pictures on an analogue scope.


Re: Speaking of Multiprocessing...
On 24/03/17 17:09, Tom Gardner wrote:

The XMOS devices have a lot more control over timing than you get on a
normal processor (especially a superscalar one with caches).  The
development tools include a lot of timing analysis stuff too.


XC is still based on C !


It is a /long/ time since I used Occam (except for shaving away unlikely
explanations).  It was fun, as long as you were careful with the spacing
in your source code.

XMOS programming is more inspired by Hoare's CSP than Occam.  They
should probably implement Go for the devices.



There are XMOS devices with Ethernet hardware, I believe - that seems
the best idea.


The chip has 8 hardware threads (per core).  If you want more than 8
threads, you either have a bigger and more expensive chip, or you have
to do something in software.


Yes.  Hopefully, newer versions of the tools make this sort of hack
unnecessary.


You'll have fun, anyway - unless you get issues with dependency messes
from sub-projects under Eclipse.  If that happens, I recommend
re-arranging things so that you have one project with multiple
directories - it will save you a lot of hair-pulling.



Re: Speaking of Multiprocessing...
On 24/03/17 16:33, David Brown wrote:


Exactly.



Yes, but hopefully with little need to use the important
areas that keep language lawyers employed.

If the "awkward bits" still /have/ to be used, then
Opportunities Will Have Been Missed.



Indeed. I've toyed with Python and don't dislike its
ethos, but semantically significant spaces make me
shudder. For a start, how does an editor's
copy-and-paste work reliably? Then consider refactoring
browsers.

I still remember makefiles, where tabs were invisibly
different to spaces.



Oh, <waves arms>. I've always regarded Occam (and now XC)
as an instantiation of CSP in a language rather than a
library. To that extent I haven't needed to distinguish.



The hardware seems the least of the problem; the software
stack is more of an issue.




Yes. But I haven't thought through the ramifications
of that, yet. In particular, I haven't developed my
set of design patterns.



There's too much unreasoned prejudice against cut-n-paste,
and too much in favour of common code bases. Sometimes
cut-and-paste is optimum, e.g. when developing several
PCBs over several years, the best way of ensuring you can
regenerate a design is to copy all library components
into project-specific libraries.

I don't mind code duplication if it means that I can
find what needs to be modified, and then modify it in the
knowledge that I haven't affected other parts of a system.
I've seen companies get that dreadfully wrong, with
the result that mods took 6 months to get to a customer.

