VME Auto system controller ID issue

We are using Motorola MVME5100 boards and VxWorks 6.3. We modified the delivered BSP to allow us to have shared memory windows across all processor cards. Our problem appears to be that the auto-syscon feature of the board is not working properly. That is, if any controller not in slot 0 is present and has its auto-config jumper set to AUTO, we see the following behavior:

The system controller in slot 0 can read/write into slave card N with no problems. Slave N can then read/write into the slot 0 shared memory. So far so good. However, after a slave accesses the slot 0 shared memory, any further access across the VME bus results in a hang, i.e. the system controller in slot 0 can no longer read/write to slot N.

If the jumpers on all cards other than slot 0 are set to NO SYSCON, then all accesses across the VME bus appear to function properly and there are no hangs. This indicates a hardware issue to us, but we are not 100% certain.

We verified the same behavior across multiple 5100 cards (with various RAM amounts) and got the same result. We also verified the same behavior with slot 0 being an MVME6100 card and slot 1 being an MVME5100 card.

So... the questions are:

1) Is this a known HW issue with MVME5100 cards?
2) If not, is there any possibility that the VxWorks BSP could cause the behavior?
3) Can we conclude that both/all boards think they are the system controller?
4) Is there a SW fix to make the auto-config jumper work as intended?

Thanks,

Bo

Reply to
Bo

Good point, Bill. Do you know how I can test/change VxWorks to confirm whether or not it is an RMW issue?

This is what I thought as well. I do recall that at a previous employer we had similar issues with the same Tundra chip, and the workaround was extra crap that the BSP had to perform during initialization... but I really don't want to go that route again if avoidable.

Thanks,

Bo

Reply to
Bo

On Mon, 15 Jan 2007 10:54:28 -0600, Bo queried:

I wish I did; still don't have a good handle on how shared memory is _really_ implemented in spite of mucking with it on and off for a number of years. The point is that if the memory spaces weren't shared the transactions would succeed, otherwise Tundra wouldn't be able to sell chip one. I suspect your implementation is drawing out a latent defect in the implementation of shared memory; I'm aware of another who encountered a similar hang using a more standard configuration (but totally weird in other ways). That too is unresolved as far as I know.

Regards

--
>@
Reply to
William Dennen

... snip ...

I have no idea whether this is applicable to the OP's problem, but in general memory is shared as long as it is not written. If a process wants to write in it, the page table for that process is modified to remap that portion, a copy of the original made, and the write then proceeds. That portion of the memory is then no longer shared.
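As an illustration of the "private after write" behavior described above, here is a minimal sketch for a POSIX system with fork(). It is not VxWorks code and says nothing about the MVME5100; the names cow_demo and struct cow_result are made up for the example. Whether the kernel really uses copy-on-write underneath is an implementation detail; the sketch only shows the observable effect, namely that the child's write does not appear in the parent.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record: what each process saw in its own copy. */
struct cow_result {
    int parent_value; /* parent's copy after the child has written */
    int child_value;  /* value the child wrote, reported via exit status */
};

static int value = 1; /* identical in both processes until one writes */

/* Returns 0 on success, -1 if fork() or waitpid() failed. */
int cow_demo(struct cow_result *out)
{
    assert(out != NULL);
    value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 2;    /* the write lands in the child's private copy */
        _exit(value); /* report what the child sees */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    out->parent_value = value;              /* parent's copy is untouched */
    out->child_value = WEXITSTATUS(status); /* child saw its own write */
    return 0;
}
```

Calling cow_demo() and comparing the two fields shows the two processes ending up with different values for what started out as the same memory.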

If the memory is truly shared, so that one process's writes show up in other processes' memory space, then various synchronization protocols must be used. This can involve semaphores, monitors, critical sections, etc.

Threads are generally lightweight processes, using memory shared with other threads in the same process, and will need the synchronization primitives to access it.
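For the truly shared case, the simplest synchronization primitive is a test-and-set (TAS) lock. The sketch below uses C11 atomics; struct shared_region and the region_* functions are hypothetical names, not part of any BSP. It assumes all parties run on one cache-coherent machine. Across a VME backplane the same idea only holds if the bus bridge performs the test-and-set as an indivisible read-modify-write cycle, which C11 atomics alone do not guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout of a lock-protected region in shared memory. */
struct shared_region {
    atomic_flag lock; /* set while one party owns the region */
    int counter;      /* data protected by the lock */
};

void region_init(struct shared_region *r)
{
    assert(r != NULL);
    atomic_flag_clear(&r->lock);
    r->counter = 0;
}

/* Test-and-set: returns true only if the caller acquired the lock. */
bool region_try_lock(struct shared_region *r)
{
    /* atomic_flag_test_and_set returns the PREVIOUS state of the flag,
     * so "previously clear" means we are the one who set it. */
    return !atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire);
}

void region_unlock(struct shared_region *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Spin until the lock is ours, update the shared data, release. */
void region_increment(struct shared_region *r)
{
    while (!region_try_lock(r)) {
        /* busy-wait; a real system would back off or yield here */
    }
    r->counter++;
    region_unlock(r);
}
```

A second region_try_lock() on a held lock returns false, which is the property a TAS-based shared memory scheme relies on.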

--
Chuck F (cbfalconer at maineline dot net)
   Available for consulting/temporary embedded and systems.
Reply to
CBFalconer

Yes, the memory is truly shared. However, it is my understanding that RMW protection across a VME backplane is generally implemented in hardware, and that on any hardware that does not support RMW the protection must be emulated in software, resulting in much slower transactions. In my particular case, the physical option jumper causes the hardware to work or not work depending on its position, which seems divorced from software in my view. That is, if it were a software issue, the problem would exist regardless of the jumper position.
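To make concrete why the read and the write of an RMW have to be indivisible, whether the hardware locks the bus or software emulates it, here is a small single-threaded simulation. It is only a sketch: the two "parties" are plain structs, the interleaving is written out by hand, and none of the names (split_tas, split_read, split_write) come from VxWorks or the Tundra documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One party's view of a test-and-set done as two separate bus cycles. */
struct split_tas {
    int observed; /* flag value seen by the read cycle */
};

/* Cycle 1: read the flag. */
void split_read(struct split_tas *t, const volatile int *flag)
{
    assert(t != NULL && flag != NULL);
    t->observed = *flag;
}

/* Cycle 2: if the flag looked free, set it. Returns true if this
 * party now believes it owns the flag. */
bool split_write(const struct split_tas *t, volatile int *flag)
{
    if (t->observed != 0)
        return false;
    *flag = 1;
    return true;
}

/* Cycles interleave: A reads, B reads, A writes, B writes.
 * Returns how many parties believe they own the flag. */
int owners_when_interleaved(void)
{
    volatile int flag = 0;
    struct split_tas a, b;
    int owners = 0;

    split_read(&a, &flag);
    split_read(&b, &flag); /* B still sees 0: A has not written yet */
    owners += split_write(&a, &flag) ? 1 : 0;
    owners += split_write(&b, &flag) ? 1 : 0;
    return owners;
}

/* Each read+write pair completes before the other party starts,
 * which is what an indivisible RMW cycle guarantees. */
int owners_when_indivisible(void)
{
    volatile int flag = 0;
    struct split_tas a, b;
    int owners = 0;

    split_read(&a, &flag);
    owners += split_write(&a, &flag) ? 1 : 0;
    split_read(&b, &flag); /* B sees 1 and backs off */
    owners += split_write(&b, &flag) ? 1 : 0;
    return owners;
}
```

With the interleaved ordering both parties end up believing they hold the flag; with the indivisible ordering only one does.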

I do find it odd that earlier board models (using the same Tundra chip) do not exhibit the problem.

Bo

Reply to
Bo

1) C&D cannot read/write. 2) no 3) no

ie it 'appears' to be an honest-to-God hardware lock-up--from which only a power cycle will recover. Scary, huh?

I don't think that TAS has been changed---at least not by me.

Thanks for the suggestions and help,

Bo

Reply to
Bo

Indeed it's scary and smells of an erratum; it appears that the system controller has left the scene. I would recommend getting Tundra to look at the issue. I'm sure they'll want a dump of the Universe and a trace if you've got the capability. You can initiate the dialog at

formatting link
Hopefully they can simulate the sequence ...

Regards

--
>@
Reply to
William Dennen
