Speaking of Multiprocessing...

rickman

Fri, Mar 24, 2017 11:22 PM

Not interruptible in any way, shape or form in my approach. The read and write happen in the same clock cycle - one of the beauties of having dual port memory in an FPGA.
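To make the single-cycle argument concrete, here is a toy Python model (not HDL, and not any vendor's RAM primitive). It assumes the RAM behaves read-before-write: a read returns the contents from before a write to the same address on the same clock edge. Under that assumption, reading the old value and writing the new one in one cycle is an atomic swap, because nothing else can touch the location in between. `DualPortRAM`, `clock` and `swap` are hypothetical names.

```python
from typing import List, Optional


class DualPortRAM:
    """Toy cycle model of a dual-port RAM; one call to clock() is one clock edge."""

    def __init__(self, depth: int) -> None:
        self.mem: List[int] = [0] * depth

    def clock(self, read_addr: int,
              write_addr: Optional[int] = None, write_data: int = 0) -> int:
        # Assumed read-before-write behaviour: the read port sees the old
        # contents even when the other port writes the same address this cycle.
        old = self.mem[read_addr]
        if write_addr is not None:
            self.mem[write_addr] = write_data
        return old

    def swap(self, addr: int, new_value: int) -> int:
        """Read the old value and write new_value to addr on the same edge."""
        return self.clock(addr, addr, new_value)


ram = DualPortRAM(16)
print(ram.swap(3, 1))  # 0: location was free, now holds 1
print(ram.swap(3, 1))  # 1: already taken
```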

--

Rick C

Rob Gaddi

Fri, Mar 24, 2017 11:33 PM

Sounds similar to the old ARM answer to the problem: SWP Rdst, Rsrc, [addr].

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.

David Brown

Sat, Mar 25, 2017 12:48 AM

Aha! That is some useful information to bring to the table. You told us earlier about some things that you are /not/ doing, but not what you /are/ doing. (Or if you did, I missed it.)

In that case, I would recommend making some dedicated hardware for your synchronisation primitives. A simple method is one I described earlier:

- have a set of memory locations where the upper half of each 32-bit entry holds a "thread id". You can only write to an entry if its current upper half is 0, or if it matches the thread id you are writing to it. It should be straightforward to implement in an FPGA.
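A Python model of that register bank may help pin down the rules. It is a sketch under stated assumptions, not a hardware description: the owner field is bits 31..16, owner 0 means free (so thread ids start at 1), and the bus tells the bank which logical CPU is writing (`writer_tid`), which is what lets the owner release an entry by writing 0. `HwSemaphoreBank` and its methods are hypothetical names.

```python
from typing import List

OWNER_SHIFT = 16
FIELD_MASK = 0xFFFF


class HwSemaphoreBank:
    """Owner-tagged semaphore registers (hypothetical model)."""

    def __init__(self, count: int) -> None:
        self.entries: List[int] = [0] * count

    def write(self, index: int, writer_tid: int, value: int) -> bool:
        current_owner = self.entries[index] >> OWNER_SHIFT
        new_owner = (value >> OWNER_SHIFT) & FIELD_MASK
        # Accept only if the entry is free or already ours, and the writer is
        # not stamping some other thread's id into it.
        if current_owner not in (0, writer_tid) or new_owner not in (0, writer_tid):
            return False
        self.entries[index] = value & 0xFFFFFFFF
        return True

    def try_acquire(self, index: int, tid: int, data: int = 0) -> bool:
        self.write(index, tid, (tid << OWNER_SHIFT) | (data & FIELD_MASK))
        # Read back: we own the entry only if our id is now in the upper half.
        return (self.entries[index] >> OWNER_SHIFT) == tid

    def release(self, index: int, tid: int) -> bool:
        # Writing owner 0 frees the entry; only the owner's write is accepted.
        return self.write(index, tid, 0)


bank = HwSemaphoreBank(64)
print(bank.try_acquire(0, tid=1))  # True
print(bank.try_acquire(0, tid=2))  # False: thread 1 holds it
```

The read-back after the write is safe even if it happens on a later cycle: once the owner field holds your id, no other thread's write to that entry is accepted.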

The disadvantage of this sort of solution is scaling - if your hardware supports 64 such semaphores, then that's all you've got. A solution utilizing normal memory (such as TAS, CAS, LL/SC) lets user code make as many semaphores as it wants. But you can always use the hardware ones to implement more software semaphores indirectly.
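One way to read the last sentence: keep one hardware entry as a guard and put any number of counting semaphores in ordinary memory, touching them only while holding the guard. The sketch below assumes a `HwLock` class with non-blocking `try_acquire`/`release` standing in for a single hardware entry; both classes are hypothetical, and the spin loop only terminates here because the demo is single-threaded and nothing else holds the guard.

```python
from typing import Dict


class HwLock:
    """Stand-in for one owner-tagged hardware semaphore entry (hypothetical)."""

    def __init__(self) -> None:
        self.owner = 0  # 0 means free

    def try_acquire(self, tid: int) -> bool:
        if self.owner in (0, tid):
            self.owner = tid
            return True
        return False

    def release(self, tid: int) -> None:
        if self.owner == tid:
            self.owner = 0


class SoftSemaphores:
    """Any number of counting semaphores in normal memory, guarded by one HwLock."""

    def __init__(self, guard: HwLock) -> None:
        self.guard = guard
        self.counts: Dict[str, int] = {}

    def _lock(self, tid: int) -> None:
        while not self.guard.try_acquire(tid):
            pass  # spin until the hardware guard is free

    def create(self, name: str, initial: int, tid: int) -> None:
        self._lock(tid)
        self.counts[name] = initial
        self.guard.release(tid)

    def try_wait(self, name: str, tid: int) -> bool:
        self._lock(tid)
        try:
            if self.counts[name] > 0:
                self.counts[name] -= 1
                return True
            return False
        finally:
            self.guard.release(tid)

    def post(self, name: str, tid: int) -> None:
        self._lock(tid)
        self.counts[name] += 1
        self.guard.release(tid)


sems = SoftSemaphores(HwLock())
sems.create("uart", 1, tid=1)
print(sems.try_wait("uart", tid=1))  # True
print(sems.try_wait("uart", tid=2))  # False: count is now 0
```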

rickman

Sat, Mar 25, 2017 2:10 AM

I guess I didn't explain the full context. But as I mentioned, I think it is simpler to include a swap-memory instruction, which allows a test-and-set operation to be implemented atomically without disrupting any other functions. It will use up an opcode, but it would be a simple one: the only difference from a normal memory write is that it also uses the read path.

--

Rick C

Robert Wessel

Sat, Mar 25, 2017 3:47 AM

I'm pretty sure it predates S/360.

Dimiter_Popoff

Sat, Mar 25, 2017 10:40 AM

I would not know; my first encounter with that sort of thing was on the 68k. The first processor I designed a board with - which was the first computer I owned - was the 6809... here are the remnants of this board (early '80s):

formatting link

Had yet to be exposed to the TAS concept when I was making this one :).

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

------------------------------------------------------

David Brown

Sat, Mar 25, 2017 2:25 PM

Well, you are the one implementing this - so you have to figure out what solution makes most sense for you. Here's another idea you could consider.

If you are dealing with just one CPU here, I have always thought a "disable interrupts for the next X instructions, then restore interrupt status" instruction would be handy, with X being something like 4. That would let you do atomic reads, writes or read-modify-write sequences covering at least two memory addresses, without the need for special memory or read-modify-write opcodes.
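A rough cycle model of the proposed instruction, for a single CPU issuing one instruction per cycle. `ATOMIC` is a hypothetical opcode that holds off interrupts for the next `window` instructions and then restores the previous state automatically. The model also ignores an `ATOMIC` issued from inside a window, which is one assumed way to keep the added interrupt latency bounded at `window` cycles. The example protects a two-address exchange (two loads, two stores).

```python
from typing import List, Optional, Tuple


def run(program: List[str], irq_cycle: int,
        window: int = 4) -> Tuple[Optional[int], List[str]]:
    """Return (cycle at which the IRQ is taken, execution trace)."""
    masked_for = 0          # instructions still protected by the last ATOMIC
    pending = False
    taken_at: Optional[int] = None
    trace: List[str] = []
    for cycle, instr in enumerate(program):
        if cycle == irq_cycle:
            pending = True
        if pending and masked_for == 0:
            taken_at, pending = cycle, False
            trace.append("IRQ")          # handler runs before this instruction
        inside_window = masked_for > 0
        trace.append(instr)
        if inside_window:
            masked_for -= 1              # count down; re-arming is ignored
        elif instr == "ATOMIC":
            masked_for = window          # protect the next `window` instructions
    return taken_at, trace


prog = ["ATOMIC", "LD", "LD", "ST", "ST", "NOP", "NOP", "NOP"]
taken, trace = run(prog, irq_cycle=1)
print(taken)  # 5: the IRQ raised at cycle 1 waits for both loads and both stores
print(trace)
```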

rickman

Sat, Mar 25, 2017 7:27 PM

How is that different from instructions to enable and disable interrupts? This doesn't really help me as there are N logical CPUs. They just share the same hardware. But they all run concurrently in nearly every sense. They just use different clock cycles so that memory accesses are not literally concurrent. So interrupts aren't the (only) issue.
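A small simulation of the situation described: logical CPUs sharing one pipeline, each getting one cycle in turn, with no interrupts anywhere. Each generator step stands for one cycle of one logical CPU (a rough, assumed granularity). A plain load/add/store increment still loses updates, which is why masking interrupts cannot help here; a lock taken with a one-cycle swap fixes it. All names are illustrative.

```python
from typing import Generator, List

Thread = Generator[None, None, None]


def barrel_run(threads: List[Thread]) -> None:
    """Round-robin issue: each live logical CPU executes one step per turn."""
    live = list(threads)
    while live:
        for t in list(live):
            try:
                next(t)
            except StopIteration:
                live.remove(t)


def plain_increment(mem: List[int], addr: int, times: int) -> Thread:
    for _ in range(times):
        v = mem[addr]
        yield                      # cycle: load
        v += 1
        yield                      # cycle: add
        mem[addr] = v
        yield                      # cycle: store


def locked_increment(mem: List[int], lock: int, addr: int, times: int) -> Thread:
    for _ in range(times):
        while True:
            old, mem[lock] = mem[lock], 1
            yield                  # cycle: atomic swap (read old, write 1)
            if old == 0:
                break              # lock acquired
        v = mem[addr]
        yield
        v += 1
        yield
        mem[addr] = v
        yield
        mem[lock] = 0
        yield                      # cycle: release


mem = [0, 0]                       # mem[0] = counter, mem[1] = lock
barrel_run([plain_increment(mem, 0, 100), plain_increment(mem, 0, 100)])
print(mem[0])                      # 100: half the increments were lost

mem = [0, 0]
barrel_run([locked_increment(mem, 1, 0, 100), locked_increment(mem, 1, 0, 100)])
print(mem[0])                      # 200
```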

--

Rick C

upsidedown

Sat, Mar 25, 2017 8:43 PM

I thought that SMP stands for _symmetric_ multiprocessing.

Disabling interrupts works well when dealing with one CPU and peripherals or the case with one master supervisor CPU and a lot of slave CPUs (AMP).

rickman

Sat, Mar 25, 2017 9:06 PM

I don't follow how disabling interrupts will help resolve multiple CPUs accessing the same memory location. Disabling interrupts only stops other processes on the same CPU from accessing the same memory location. How would that prevent processes on other CPUs from accessing that location during the multiple-instruction access of the process in question?

--

Rick C

Robert Wessel

Sun, Mar 26, 2017 5:42 AM

My point was merely that TAS is far, far older than you had indicated. It was a standard instruction on S/360s when they were introduced to the world in 1964, and I'm pretty sure it was not new then.

David Brown

Sun, Mar 26, 2017 3:28 PM

An instruction like the one I propose would be safer than the normal "disable all interrupts" instruction, because it places a limit on the latency of interrupts. Code that simply disables interrupts could do so for an arbitrary length of time - here it is specifically limited.

It is harder to make this work well for your SMT cpu. Here you might change things to say that for these next 4 clock cycles, the current logical cpu runs on /every/ clock cycle - all other SMT threads are paused. Depending on how you have organised things, that might be simple or it might be nearly impossible. (On the XMOS, if only one thread is running it gets a maximum of 1 cycle out of every 5, with the rest wasted - this is due to the 5 stage pipeline of its cpu and that each logical cpu can only have one instruction in action at a time.)
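To see what such a burst costs the other threads, here is a rough issue-slot model (not a pipeline model): `n_cpus` logical CPUs take turns, and one CPU may be granted a hypothetical burst of consecutive cycles starting on its own turn. It ignores the pipeline hazards that make back-to-back issue hard in the first place (the XMOS point above); it only shows how the burst stretches the other CPUs' worst-case gap between issue slots from `n_cpus` to `n_cpus + burst_len - 1` cycles.

```python
from typing import List


def issue_slots(n_cpus: int, n_cycles: int, burst_cpu: int,
                request_cycle: int, burst_len: int) -> List[int]:
    """Which logical CPU issues on each cycle; burst_len=1 means no burst."""
    slots: List[int] = []
    cpu = 0              # whose round-robin turn it is
    burst_left = 0
    granted = False      # model a single burst request
    for cycle in range(n_cycles):
        if not granted and cpu == burst_cpu and cycle >= request_cycle:
            burst_left, granted = burst_len, True
        slots.append(cpu)
        if burst_left > 0:
            burst_left -= 1
            if burst_left > 0:
                continue  # burst: the same CPU issues again next cycle
        cpu = (cpu + 1) % n_cpus
    return slots


def max_gap(slots: List[int], cpu: int) -> int:
    """Longest distance, in cycles, between consecutive issue slots of cpu."""
    times = [i for i, c in enumerate(slots) if c == cpu]
    return max(b - a for a, b in zip(times, times[1:]))


normal = issue_slots(4, 16, burst_cpu=1, request_cycle=0, burst_len=1)
burst = issue_slots(4, 16, burst_cpu=1, request_cycle=0, burst_len=4)
print(burst)                                  # [0, 1, 1, 1, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0]
print(max_gap(normal, 0), max_gap(burst, 0))  # 4 7
```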

rickman

Sun, Mar 26, 2017 5:27 PM

The other CPUs can be halted (hard perhaps, but not impossible), but the point of the multi-CPU idea is to use a pipeline to make the clock faster; rather than make it a single pipelined CPU with all its warts, make it N CPUs. No one CPU can hog all the clock cycles because of the pipeline, no different from the XMOS design.

But just as important is not to impact interrupt latency. Preventing execution on any other CPU would hurt interrupt latency, and low interrupt latency is a primary design goal. This is intended for hard real-time use: the sort of thing where a CPU may well be counting cycles for short delays or need to respond to an event on the next cycle.

The instruction architecture of the last CPU I built allowed literally 1 clock interrupt latency as it only took one cycle to push all needed info to the stacks. The clock speed won't increase linearly with the pipeline length, but otherwise the cost of adding CPUs up to 16 is trivial. So I have thought of using some of the CPUs for interrupt handling. Can't get much faster than zero cycles. :)

--

Rick C