atomic test_and_set on ADSP21020

Hi everyone,

I'm trying to implement a locking mechanism without disabling interrupts (and yes, I know I could simply disable them). My idea is a 'test-and-set' function which atomically sets the lock, which is then released at the end of the critical section.

Does anybody know of an implementation using the target's instruction set? I found in the C runtime library for the ADSP21060 (not the ADSP21020) a function called test_and_set_semaphore, which is intended for bus locking in a multiprocessor system where access to the address bus must be arbitrated. The function looks overly complicated, though, including a timeout I do not need. Maybe I can strip out the timeout part and use the rest.

In your opinion, is it still possible to implement this in C, keeping the atomicity? I have found that the 'volatile' type qualifier can be a tricky business and, considering that reordering can be an issue with these kinds of operations, I'm worried about what is written in the g21k documentation. Here is an excerpt:

4.3.4 Reordering & Optimization

> If an asm() has output operands, G21K assumes, for optimization
> purposes, the instruction lacks side effects except to change the output
> operands. This does not mean that instructions with a side effect
> cannot be used, but you must be careful. The compiler may eliminate
> them if the output operands are not used, or move them out of loops,
> or replace two with one if they constitute a common subexpression.
> Also, if your instruction does have a side effect on a variable that
> otherwise appears not to change, the old value of the variable may be
> reused later if it happens to be found in a register.
>
> You can prevent an asm() instruction from being deleted, moved
> significantly, or combined, by writing the keyword volatile after
> the asm. For example:
>
>   #define set_priority(x) \
>     asm volatile ("set_priority %0;" : /* no outputs */ : "d" (x))
>
> Note: Even a volatile asm() instruction can be moved in ways that
> appear insignificant to the compiler, such as across jump instructions.
> You cannot expect a sequence of volatile asm() instructions to
> remain perfectly consecutive. If you want consecutive output, use a
> single asm() construct. Another way to avoid reordering is to use
> the output of an asm() construct in a C instruction.
>
> Even with these precautions, be aware that the -O and -O2
> compiler switches may cause asm() instructions to be moved.
> Compile without optimization to minimize the chance of reordering.

The amount of haziness in the documentation surprises me: 'moved significantly', 'ways that appear insignificant', 'minimize the chance'.

Shall I make a probability distribution and measure the probability that my code gets reordered?

Any suggestion or pointer is appreciated.

Al

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Reply to
alb

Atomic access without having interrupts masked takes some particular hardware in the processor, like a "test and set" instruction on the 68k or like the lwarx/stwcx. pair on power.

You cannot do this in software, let alone trying to do it by mastering the programming language of the device at phrase-book level (in your case, C).

Even if you are fluent in the device's assembly, if there is no opcode for it, there is just no way of doing it. Notice that you do not need the complete atomicity of TAS or lwarx/stwcx., which are usable in multiprocessor systems as well; all you need is something equivalent to the 68k bset and bclr (bit test and set, bit test and clear), which do not lock the bus but cannot be interrupted by an IRQ.

Dimiter

------------------------------------------------------
Dimiter Popoff
Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Reply to
dp

Assuming that this locking mechanism is intended for the same 21020 system that has featured in your earlier threads, I'm wondering how it can be useful.

As I remember, your program schedules a number of ordinary processes or coroutines co-operatively (without preemption), while these processes (and the scheduler) can be interrupted for interrupt handling.

If you want the lock to provide mutual exclusion only between ordinary processes, you can use a simple boolean flag variable that is set by assignment statements in the normal way (non-atomically), because the ordinary processes do not preempt each other. (However, to make a process wait for a lock held by some other process, you must make the first process poll repeatedly for the lock, since your scheduler cannot -- as I recall -- suspend a process to wait for an event.)

If you want the lock to provide mutual exclusion between ordinary processes and interrupt handlers, you would indeed need a way to test and set a flag atomically. But if the interrupt occurs when the flag is set to "locked", the interrupt handler cannot wait for the lock to be opened, so it can do nothing. This seems very inconvenient to me; the interrupt is effectively "lost" since it cannot update the locked data structure.

The nice point about using interrupt disabling (or masking) to implement mutual exclusion between non-interrupt processes and interrupt processes is that this makes the interrupt wait (be pending) until the lock is opened (interrupts enabled or unmasked).

Why do you want to avoid interrupt disabling/masking? It seems to me that interrupt disabling/masking is the only usable solution for mutual exclusion in your environment.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

Actually mutual exclusion CAN be done in software ... at least on a uniprocessor with in-order execution. See Dekker's algorithm, or Peterson's algorithm, or Lamport's algorithm, or any of a dozen others.

And if you have atomic instructions you don't really need them to set condition codes ... an atomic register-memory swap will suffice.

We don't really know that the OP needs atomicity ... only that he needs mutual exclusion.

Maybe.

Almost. Even on a uniprocessor, without bus locking you need to guarantee in-order execution. That may mean using barrier instructions on an OoO processor.

Parallel ops notwithstanding, the 21020 executes in-order anyway so that is not an issue. However, I don't think it has any atomic instructions ... I don't have a manual handy to double check, but unlike the 21060, the 21020 wasn't intended for multiprocessor use.

George

Reply to
George Neuner

It is indeed intended for the same system, but in order to handle multiple processes (FSMs) trying to access the output resource, there have been quite a few modifications to the previous scheme.

First of all, we realized the output FIFO wasn't really needed. Each process will have its own memory buffer and will pass the pointer, together with the size of the memory, to the interrupt service routine, which will take care of writing to the output register (except for the first write, which will be done by the process).

In order to 'serve' multiple processes that need to transmit data, we thought about implementing a 'request queue' (which could eventually even be prioritized, maybe according to the link budget you suggested in an earlier thread).

When a process puts its request in the queue it also 'attempts' to start the transmission, but if the resource is locked it will not succeed. At the beginning the lock is free, and the first process which attempts to start the transmission will take it, while all the others which follow will only manage to put their requests in the queue.

Given this logic, the interrupt service routine, once it has completed transmitting the first packet, should also check the queue and keep the resource locked in case there is a pending packet.

Since the lock is set and reset by both ordinary processes and interrupt service routines, I am concerned about running into a race condition. But I should say that I have not yet found a way for this race to occur... so maybe you are right, a 'test_and_set' function might not be useful.

Indeed this is my first try and I will have it tested today. And yes, there's no way to suspend a process, so my processes always return to the main loop while waiting for the transmission to finish. The lock is needed only to prevent multiple processes from starting a new transmission while another one is going on; they will still be able to append their requests to the queue (if it is not full! In that case they must wait until their request can be appended to the queue).

The locking mechanism will not prevent the interrupt from running. The interrupt will continue to write bytes to the serial output until the packet is complete and will then free the lock. If there is another packet in the queue, the same interrupt service routine will pull the data from the buffer and transmit it, setting the lock again, in order to prevent another process from starting a transmission.

The reason why I would have liked to avoid disabling/masking interrupts is to understand what I'm doing :-). I currently have a 'model' - in my mind - of how the processor works, and according to that model I should be able to do what I want without disabling interrupts. I simply do not like doing things without needing to and, as you pointed out, it is possible that a simple flag will be sufficient.

Reply to
alb

On 7/12/2012 6:52 PM, dp wrote: [...]

I found an interesting article on the 80x86 which suggests using other kinds of instructions which are not intended for 'test and set' but can achieve the same functionality:

formatting link

the author shows how to use the /xchg/ operation or the /shr/ operation to achieve the same effect without needing a 'test and set' instruction.

I'm pretty sure you can do this with several CPUs at 'phrase-book' level, since it is just a matter of calling a built-in function of gcc:

formatting link

And I was indeed wondering whether the g21k, which is based on gcc, provides this possibility. I did not find anything in the documentation which points to this functionality, but you never know whether the documentation fully describes the compiler.

You can use different opcodes to achieve the same result.

indeed, and the article I referred to shows exactly how to do this without a 'test and set' instruction.

Reply to
alb
[...]

Thanks for the pointers. I did have a look at those, but it looks to me that if there is some 'reordering' going on, the logic of these algorithms will fail unless a memory barrier is introduced.

this is also what I understood.

I now tend to believe that I only need mutual exclusion, and considering that my processes are not running in a preemptive multithreading environment, I should be safe without an atomic instruction.

Quoting the manual:

But what makes things a little worrisome is that the g21k does attempt to reorder instructions.

Reply to
alb

No, you cannot. In order to execute more than one opcode in sequence without the sequence being interrupted, you need either to have interrupts masked or a guarantee that there will be no interrupts during that time span.

You may also want to read the links you post. It took me 2-3 seconds to spot the sentence saying the code there does not work on all processors.

Well you certainly are entitled to your beliefs. Show us the code when you invent the software uninterruptible test and set.

Dimiter


Reply to
dp

On 7/13/2012 2:26 PM, dp wrote: [...]

[...]

You may also want to read my post which, for your convenience, is copied here for the relevant part. I never said the code works on *all processors*.

[...]
[...]

as per the link I posted (on 80x86 architectures):

example #1:

example #2:

both xchg and shr are instructions which achieve test_and_set functionality with no need to have dedicated hardware.

On the ADSP21020 there are several (7) operations which are not interruptible. Quoting the UM:

So I was wondering if one can take advantage of these operations to implement a 'test and set' which is atomic in the sense that it cannot be interrupted.

Reply to
alb



Having xchg (and perhaps shr; I don't know what it does, I am not interested in x86) _is_ the hardware it takes. It is indeed a read-then-write opcode which cannot be interrupted, equivalent to bset/bclr in this context.


Most if not all processors have non-single-cycle opcodes (e.g. division is not single-cycle even on all the RISC cores I have seen). Unless you have a specific, implemented-in-hardware, read-and-write opcode so it cannot be interrupted (which obviously will take

While writing this I recalled that you had a lengthy thread on FIFOs; if this is meant for implementing a FIFO, well, you just do not need the masking etc. For the FIFO read and write pointers, typically one of them is written only by the interrupt service routine and the other only by the interruptible code.

Dimiter


Reply to
dp

shr is the 'shift right' operation. But then I do not understand your point; you *always* need some hardware that your software runs on (except for simulators). In the examples above a shift operation is used to achieve the test_and_set functionality.

Most of the instruction set of the ADSP21020 consists of single-cycle operations.

The ALU of the target I'm using has a series of status flags which are modified upon execution of an operation. I believe that checking these flags may lead to a result similar to that of the shift operation posted.

This post is about mutual exclusion and how to achieve it without masking interrupts. But even in the FIFO implementation, a 'write_to_fifo()' should prevent 'read_from_fifo()' from running, since both of them change the state of the EMPTY/FULL flag. Indeed, there's the possibility of not using the EMPTY/FULL flag at all, always leaving one element of the FIFO empty, and in that case there should be no race between the two functions.

Reply to
alb

On Jul 13, 4:12 pm, alb wrote:
> Indeed there's [...]

Now I understand why you wanted the flag. I always just use the one-element-empty strategy.

Reply to
Rocky

It always takes some hardware, yes. Sometimes you have the hardware to do read/write in one indivisible opcode and sometimes you do not. Whether it is a shift or some other operation is irrelevant; the key is that after the operation you know the state of at least one bit in a memory location prior to _and_ after the operation, the operation never having been interrupted. Just check the opcodes of your processor for such an instruction; it will take less time than writing a single usenet post :-). (Not that I mind your posts; the group needed some revival, so yours are welcome, by me at least.)

This is the "normal" way of doing it for me, has been for decades really.

Dimiter


Reply to
dp

Yes. Hence the "in-order" execution requirement.

The software algorithms also work under pre-emption as long as a single memory access is not interruptible. That may sound like an odd caveat, because it is true of all modern chips ... even with VMM, an access fault cancels and reschedules the instruction, and the resulting interrupt is then taken between instructions. However, this wasn't necessarily true for early processors, some of which had interruptible load and store operations. The early software exclusion algorithms were designed with this fact in mind.

Yes, but the compiler won't elide accesses to volatile variables - meaning it won't cache a volatile in a register but will store the value back immediately after changing it and will reload the value from memory every time it is needed. Nor will it reorder volatile accesses wrt other accesses ... C (and C++) considers a volatile variable access to be a program sequencing point.

I never played much with g21k ... back when I was doing DSP work, the company had VisualDSP for the 060, and since it covered the 020 as well I just used it for everything.

With respect to g21k being a GCC derivative, some of the early GCC versions had a *lot* of bugs (particularly some of the 3.x) ... so you might want to check out the bug list for the version your compiler is based on (separate from the provided g21k bug list which may not include all the GCC bugs).

George

Reply to
George Neuner

Assuming you can guarantee the visible ordering of certain memory accesses, Dekker's algorithm can be used to do mutual exclusion for a two processor system, without requiring any atomic update instruction. It's very old. There are more complicated algorithms that can handle more than two processors (Eisenberg and McGuire's, for example), again requiring nothing more than some control over memory ordering.

In either case, if you cannot force some memory ordering, you're not going to manage much with a shared memory multi-processor system, so those, with system specific extensions (if needed) to force memory ordering, should work on all useful SMP systems.

Mutual exclusion is not quite what test-and-set does, but once you have either, you can synthesize the other.

As for "non-interruptible updates", these can be adapted to provide that - basically the interrupt routine has to run its half of Dekker's, and then, if the other side has already acquired the lock, it has to continue running the original (interrupted) code until it releases the lock (and then resume the interrupt handler). This is usually practical only inside an OS kernel, where both the "other" routine that wanted to do the uninterruptible update and the interrupt handler run in the same address space (of course you're generally not going to be able to disable interrupts outside the kernel anyway). Ugly, but it can be made to work.

In general, all stuff to be avoided if you have proper hardware support for atomics and disabling interrupts.

Reply to
Robert Wessel

It's been 13 years since I last considered something of that sort; in fact, IIRC, what I reinvented - and discarded as unusable - was what seems to be called "Dekker's algorithm". It only lowers the probability of a conflict. The thing is, to read and update a flag takes two separate accesses; if one of the requestors interrupts the other between these two accesses, completes the transaction, then tests the address again and succeeds a second time - after which it gets interrupted by the first one, which writes its ID (not knowing an entire cycle has gone by, etc.) - we have a conflict. It can be practical in some cases, and perhaps the timing of some systems might guarantee success, but this is by no means obvious. The path I took back then was similar (for two processors), but I had a CPLD which allowed me to insert a timeout for one of the accesses, so the conflict cited above could never occur.

Basically, no indivisible read/write means no way to exclude the mess. On small systems this is done by masking the interrupt; on large ones - like power - by "check if successful and rerun if not", perhaps the simplest of all as it does not require locking the bus (but it does need hardware detection of a write to the shared location; just write detection is also OK).

Dimiter


Reply to
dp

Just out of curiosity, what was the dividing line between early and modern architectures? Of the early architectures, VAX/VMS could in the worst case generate about a dozen page fault interrupts for a single instruction (an instruction crossing a page boundary, with 1-6 operands each causing a page fault), and of these, any indirection could cause one or two (unaligned across a page boundary) access interrupts.

The string and decimal instructions could cause multiple additional page fault interrupts and were definitely restartable, rather than interruptible instructions, but of course irrelevant to the mutual exclusion discussion.

Reply to
upsidedown

That sounds like a sensible simplification for your system. But it does mean that each process must keep track of whether its buffer has been transmitted, and must not reuse the buffer for a new message until the preceding message has been transmitted.

Aha, so the "lock" is not actually protecting access to some independent data structure, and making processes wait for access, but is more of a flag that shows whether or not a buffer is being transmitted. This should be an easier problem to solve.

Yes, I think it can be solved without disabling interrupts (although I would still use interrupt disabling, as the simpler solution). Below is a sketch of a solution that I believe should work, when the request queue is a non-prioritized FIFO queue. A prioritized queue would be rather more difficult, and interrupt disabling would probably be the only practical solution in that case. Apologies for a long piece of code, but here it is:

typedef struct buffer_t {
   data_t data[MAX_BUFFER_SIZE];
   int length;
   int data_index;
   bool transmitted;
   struct buffer_t *next;
} buffer_t;
// The data in the buffer are data[0 .. length - 1].
// The "data_index" is the index of the next data[]
// to transmit.
// The "transmitted" flag is just what its name implies.
// The "next" field points to the next buffer in the
// request queue, or is NULL if this buffer is the
// last one.

volatile buffer_t * q_head = NULL;
// Not NULL when and only when a buffer is being
// transmitted, and then points to the first (i.e.
// currently transmitting) buffer in the request queue.
//
// If a non-interrupt process reads q_head as NULL,
// an output interrupt will NOT happen again, until
// transmission is again started by some process.
//
// If a non-interrupt process reads q_head as not NULL,
// an output interrupt WILL happen after that read of
// q_head (and perhaps immediately after the read).

volatile buffer_t * volatile q_tail = NULL;
// NULL when and only when no buffer has ever been
// transmitted or put in the queue, and otherwise
// points to the last buffer in the queue (when
// transmission is going on) or to the last
// buffer transmitted (otherwise).

// Assume that all buffers in the queue are statically
// allocated, or at least never deallocated, and that
// the "next" field is irrelevant after the buffer has
// been transmitted, until the buffer is again enqueued.
//
// Also assume that all buffers entered in the queue, or
// for which transmission is started, have "length" > 0.

void transmit (buffer_t * volatile buf)
// Starts transmitting the buf, or enqueues it for
// later transmission. Assume that this can be
// interrupted at any point by an interrupt from an
// ongoing transmission, but cannot be preempted by
// another non-interrupt process.
{
   // Assume that buf->data and buf->length are set
   // by caller and that buf->length > 0.

   // Prepare the buffer for enqueuing or transmitting:
   buf->transmitted = FALSE;
   buf->next = NULL;
   buf->data_index = 0;

   // Enqueue the buffer, if there is something
   // ahead of it in the queue:
   if (q_head != NULL) {
      // Transmission is going on (or was going on
      // when we read q_head).
      q_tail->next = buf;
      // If transmission was still going on when
      // the above assignment to q_tail was executed,
      // the interrupt handler will eventually transmit
      // this buf. Otherwise transmission stopped
      // before reaching this buf.
      q_tail = buf;
      // This update of q_tail is for the benefit of
      // the next caller of transmit().
   }

   // If transmission is now going on, either it is
   // still transmitting some earlier buffer in the
   // queue, or has already started on transmitting
   // this buf. In either case, we have nothing more
   // to do.

   // If transmission has now stopped, either it had
   // already stopped before we enqueued (or did not
   // enqueue) this buf, or it has had time to transmit
   // this buf, and then stop.

   // Start transmission, if it stopped before
   // transmitting this buf:
   if (q_head == NULL) {
      // No transmission going on now.
      if (!(buf->transmitted)) {
         // All of the earlier buffers in the queue
         // have been transmitted, but not this buf.
         // This buf is now the first in the queue.
         q_head = buf;
         q_tail = buf;
         buf->data_index = 1;
         // Start transmission:
         write_to_output_port (buf->data[0]);
      }
   }
}

void output_isr (void)
// Interrupt handler for the interrupt that reports
// "transmission complete" of the last data_t written
// to the output port.
// That data_t was q_head->data[q_head->data_index - 1],
// so the data_t at q_head->data_index is next to be
// transmitted, unless the whole q_head has been
// transmitted already.
// Assume that this can interrupt any non-interrupt
// process at any point, but then itself runs atomically
// (i.e. cannot be interrupted by anything else that
// uses the same variables).
{
   buffer_t * buf = q_head;
   // This initialization may trigger a warning about
   // qualifiers being discarded (because "q_head" is
   // qualified as volatile, but "buf" is not), but
   // this warning can be ignored. Using "buf" instead of
   // q_head is an optimization (perhaps premature)
   // to avoid unnecessary repeated reads of q_head
   // and its fields. In principle, buf should be
   // declared as "buffer_t * volatile buf", but this
   // is unnecessary since this function will not be
   // in-lined, and instruction reordering within this
   // function does not matter, because the function is
   // executed atomically with respect to the non-
   // interrupt processes.

   if (buf->data_index >= buf->length) {
      // This buf was completely transmitted.
      // Go on to the next buffer in the queue,
      // if there is one:
      buf = buf->next;
      q_head = buf;
   }

   if (buf != NULL) {
      // Some data are left to transmit from this
      // buf (= q_head).
      write_to_output_port (buf->data[buf->data_index]);
      buf->data_index ++;
   }
}

I have not tested the above code, only thought about it.

Reply to
Niklas Holsti

Add, in output_isr, when this buf has been completely transmitted (just before advancing to the next buffer):

buf->transmitted = TRUE;

Reply to
Niklas Holsti

I'm not quite sure what you mean, but Dekker's is well understood (and has been for half a century), and does, in fact, provide mutual exclusion while avoiding deadlock and starvation, and it requires neither atomic updates nor disabled interrupts. The Wikipedia article on it is OK, but doesn't really go into much depth. Dijkstra's paper describing Dekker's is linked from it, and is thorough, but not really easy going.

Reply to
Robert Wessel
