Just venting...

Wow! I haven't had a circuit kick my ass like this one in a LONG time. I have two boards that need to exchange (4) bytes of information from Board-A to Board-B only.

Unfortunately, the original designer (me) didn't really think ahead, and so there really is no provision for the two boards to talk to each other.

So (and I'm in a rush, as usual), I thought I could take an easy way out (which is now turning into a mini-nightmare).

Board-A powers up the regulators on Board-B. It takes several milliseconds for them to ramp up and for the Board-B processor to come out of reset.

OK, no problem so far; it works exactly as expected. Board-A "knows" that timing, so Board-B's state can be inferred (sloppily), since the range of startup times across Board-B samples isn't significant.

So I figured all I would have to do is time the data exchange (i.e., send out a byte every 100 ms or so, hold it on the port, and have Board-A sample it somewhere near the middle of each window).
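Below is a minimal sketch of the sampling arithmetic behind this scheme. It is written in portable C so it can be checked on a PC.

It assumes Board-B starts its first slot as soon as it comes out of reset. That happens about 6 ms after power-up, the figure given later in the thread. It also assumes Board-B holds each byte for 100 ms. The constant and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed timing (hypothetical names). The 6 ms startup and 100 ms slot
   follow the thread, but should be measured on the real boards. */
#define STARTUP_MS 6u    /* Board-B regulator ramp + reset release, approx. */
#define SLOT_MS    100u  /* each byte is held on the port for one slot */

/* Milliseconds after Board-A switches on Board-B's power at which Board-A
   should sample byte `index` (0-based): the middle of that byte's slot. */
uint32_t sample_time_ms(uint8_t index)
{
    return STARTUP_MS + (uint32_t)index * SLOT_MS + SLOT_MS / 2u;
}
```

Sampling mid-slot leaves roughly 50 ms of margin on either side. A few milliseconds of unit-to-unit startup variation should therefore not matter.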

Speed is not important.

And I can't get this damn thing to work!

The timing is perfect (verified on a scope to within 20 µs) in a window 100 ms wide. And I still can't get the damn thing to work.

And by "work" I mean this: of the (4) bytes I'm trying to send, three always transfer and work just fine. The fourth one (which also always arrives on time) always has (2) bits forced LOW.

And for the life of me, I don't know why. ??! It is frustrating beyond belief.

I have done the "divide and conquer" thing and have it narrowed down to the interface (which is just 8 optos, good to 100 kHz, so no problem there). AND, did I mention, it works just fine with the other bytes, and also with some test code that just rotates bits.

But hook it up to the real Board-B and everything goes to shit. Well, not everything, just the two bits of the aforementioned byte. So I go into the code and FORCE those values, and the same two bits are bad. It's like having a bad connector that selectively ohms out OK depending on what data you send. It is driving me batty!!
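The same two bits are always low no matter what is forced. One way to pin down exactly which lines are affected is to log (sent, received) pairs and reduce them to a mask. Below is a minimal host-side sketch; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a mask of bit positions that were sent as 1 at least once but
   never arrived as 1 in any sample where they were sent as 1.
   Bits never sent as 1 are excluded, since they cannot reveal a stuck-low. */
uint8_t stuck_low_mask(const uint8_t *sent, const uint8_t *received, size_t n)
{
    uint8_t ever_sent_high = 0; /* bits driven high at least once */
    uint8_t ever_got_high = 0;  /* bits that arrived high while driven high */

    for (size_t i = 0; i < n; i++) {
        ever_sent_high |= sent[i];
        ever_got_high |= (uint8_t)(sent[i] & received[i]);
    }
    return (uint8_t)(ever_sent_high & (uint8_t)~ever_got_high);
}
```

Sometimes the result is non-zero and never changes, whatever data is forced. That points at something position-specific (a pin, a port setting, or something overwriting those bits) rather than at the data itself.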

I am honestly out of ideas. Well, one thing I haven't checked (and it's super-highly-unlikely to be the problem) is whether the compiler is screwing me over. But that's a reach.

OK. My mini-rant is over. At this point, I could have redesigned both boards and been ahead of the game. (And of course, I'm under some pressure to get the boards released.)

Right.....

mpm

You narrowed it down to the opto-isolators? Then how could a compiler be the issue? Can you post any of this schematic, at least the opto stage?

sea moss

You start out saying you did not have comms between the boards, then you talk about using comms through optos. Should I assume you added them somehow???

Anyway, I'm not understanding the details that are making this so hard. Should I assume you have had Board-A transmit square waves (a Johnson ring counter pattern seems appropriate) and you scoped the signals on Board-B to verify the hardware is working?
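The Johnson ring counter pattern suggested here is easy to generate in software. Below is a minimal sketch; the function name is hypothetical. Each step changes exactly one line, and every line spends half of the 16-step cycle high and half low. That makes a stuck or sluggish bit stand out on a scope or logic analyzer.

```c
#include <stdint.h>

/* One step of an 8-bit Johnson (twisted-ring) counter: shift left and feed
   the inverted MSB back into the LSB. Starting from 0x00 the sequence is
   00 01 03 07 0F 1F 3F 7F FF FE FC F8 F0 E0 C0 80, then it repeats. */
uint8_t johnson_next(uint8_t s)
{
    uint8_t feedback = (uint8_t)(((s >> 7) & 1u) ^ 1u); /* inverted MSB */
    return (uint8_t)((s << 1) | feedback);
}
```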

I get that this doesn't have critical timing and that you thought it should be easy to transfer slow data. Maybe use one of the optos as a clock and have the software monitor it to know exactly when to take in the data? You would be one bit down, but you could still send 4 bits at a time and reassemble the bytes. Use one opto as the high/low flag and only two bits are idle... the two that aren't working properly, maybe?
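Below is a minimal sketch of that nibble-plus-strobe idea. The bit assignment and all names are assumptions for illustration, not anything specified in the thread:

- data on bits 0-3
- the high/low flag on bit 4
- the strobe on bit 5
- bits 6-7 idle

The sender drives data and flag first, then raises the strobe. The receiver reads the port only after it sees the strobe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignment on the 8-line opto bus (an assumption). */
#define NIBBLE_MASK 0x0Fu  /* bits 0-3: data nibble */
#define HIGH_FLAG   0x10u  /* bit 4: set when the nibble is the high half */
#define STROBE      0x20u  /* bit 5: asserted once data and flag are stable */

/* Port value the sender drives for one half of `byte`, strobe asserted.
   The sender would first drive the same value without STROBE so the data
   lines settle before the strobe edge. */
uint8_t frame_for(uint8_t byte, bool high_half)
{
    uint8_t nibble = high_half ? (uint8_t)(byte >> 4)
                               : (uint8_t)(byte & NIBBLE_MASK);
    return (uint8_t)(nibble | (high_half ? HIGH_FLAG : 0u) | STROBE);
}

/* Receiver side: rebuild a byte from two frames read after the strobe,
   high half first. Returns 0..255, or -1 if a frame lacks the strobe or
   the halves are missing or out of order. */
int decode_pair(uint8_t hi_frame, uint8_t lo_frame)
{
    if (!(hi_frame & STROBE) || !(lo_frame & STROBE)) {
        return -1;
    }
    if (!(hi_frame & HIGH_FLAG) || (lo_frame & HIGH_FLAG)) {
        return -1;
    }
    return (int)(((hi_frame & NIBBLE_MASK) << 4) | (lo_frame & NIBBLE_MASK));
}
```

Because the receiver waits for the strobe instead of a timer, the scheme no longer depends on knowing Board-B's startup time.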

Would that be the quickest path to getting it out the door?

I know I hate schedules.

--
  Rick C. 

  - Get 1,000 miles of free Supercharging 
Ricketty C

Sorry. I butchered my description a little. It's actually Board-B that transmits the 4 bytes to Board-A over an 8-bit "bus". Board-A powers up Board-B, so that is the "synch", for lack of a better term: synch plus the time it takes for the Board-B regulator to come up to speed (about 6 ms). Regardless, I really don't think it's a timing issue at this point.

Board-B is 3.3 V logic and Board-A is 5 V, so the interface is through optocouplers (LTV847s) on Board-A. Each board also uses a different ground, but that's not a problem for the optos.

I must be doing something super-silly, but I can't find it. This really "ought" to work, and doesn't.

If I can't get it going tomorrow (which will be Day-4, including my wasted weekend!!), I'll see if I can post the schematic here.

I don't have a hardware debugger for this, but I do have a software simulator. It can't find the problem either. (Although I wouldn't put too much faith in that, as simulators can introduce their own problems when you start having a lot of I/O going on, even though that's the case here. But just sayin'.)

mpm

...maybe use one of the optos as a clock and have the software monitor it to know exactly when to take in the data? You would be one bit down but you could still send 4 bits and reassemble the bytes. Use one opto as the high/low flag and only two bits are idle... the two that aren't working properly maybe?

I think this idea crossed my mind earlier today, or maybe over the weekend. (And weirder approaches, trust me.)

I can already tell we're going to have to re-spin this product anyway if it's this difficult to "upgrade" or add features to it, etc., which is what's driving the present dilemma. (Like, put it all on one board, for starters!)

But to your point: I elected not to "go weird", because honestly, the timed approach really "should" work. And it actually does... well, mostly.

Maybe a good night's sleep will help?

mpm

You don't say what language or what kind of processor this is. Is it C?

It sounds like a software issue. Do you have enough storage allocated for the data? Is it an endianness issue? Are you overrunning a buffer? Are you invoking some kind of undefined behavior?

I doubt it's a compiler issue.
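Consider the endianness question for the case where any of the four values are wider than a byte. Serializing them with explicit shifts avoids depending on the compiler's memory layout (for example, casting a pointer to an int and sending its bytes one at a time).

Below is a minimal sketch with hypothetical helper names. Keil C51 stores multi-byte types big-endian, while most PC compilers are little-endian. The shift form gives the same result on both.

```c
#include <stdint.h>

/* Write a 16-bit value MSB first, regardless of the compiler's byte order. */
void put_u16_be(uint8_t out[2], uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xFFu);
}

/* Read back a 16-bit value that was written MSB first. */
uint16_t get_u16_be(const uint8_t in[2])
{
    return (uint16_t)(((uint16_t)in[0] << 8) | in[1]);
}
```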

bitrex

...the timed approach really "should" work. And it actually does... well, mostly.

Yeah, sometimes simple things can be hard to find, like glasses on your forehead.

That is one of the reasons why I like doing things my way. I can always tear it apart into pieces and figure out if each piece works. I use Forth on my MCUs, so at any time I can go interactive and twiddle bits and test things without worrying about software problems. Then, once you have working test code, that is the foundation of your production code.

I'm thinking this is software as well. I'm not real clear on the nature of the failures, but it just doesn't sound like a hardware issue. Can you put a scope on any line that tells you when the bits are being read? Or do they go directly to MCU pins?

--
  Rick C. 

Ricketty C

Have you got everything declared volatile that needs it?
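Here is what "declared volatile" means in practice, as a minimal host-side sketch. The names are hypothetical, and the stub function stands in for a real interrupt handler. On Keil C51 that handler would be a function declared with the `interrupt` keyword.

```c
#include <stdint.h>

/* Data shared between an interrupt handler and the main loop. Without
   `volatile`, the compiler may keep a copy in a register and never
   re-read memory in a polling loop. volatile does NOT make the 4-byte
   copy atomic; a real system would also mask the interrupt around it. */
static volatile uint8_t adc_bytes[4];
static volatile uint8_t adc_ready;

/* Stand-in for an ADC-complete interrupt handler (hypothetical). */
void adc_isr_stub(uint8_t channel, uint8_t value)
{
    adc_bytes[channel & 3u] = value;
    if (channel == 3u) {
        adc_ready = 1u;  /* all four channels stored */
    }
}

/* Main-loop side: copy the four bytes out once they are all present.
   Returns 1 if data was copied, 0 if nothing new was ready. */
uint8_t take_adc_bytes(uint8_t dest[4])
{
    if (!adc_ready) {
        return 0u;
    }
    for (uint8_t i = 0; i < 4u; i++) {
        dest[i] = adc_bytes[i];
    }
    adc_ready = 0u;
    return 1u;
}
```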

huh? Is the bad data present on only one side of the optos?

Are the optocoupler LEDs all terminated to the same supply rail? And the receivers?

That was my first thought.

--
  Jasen.
Jasen Betts

I don't understand how it can be narrowed down to the interface and yet it might still be software. The first thing I would do is determine which side of the software/hardware divide it is on. In other words, use other, simpler software, or no software at all if possible.

Not really enough info to know what's happening for sure.

--
  Rick C. 

Ricketty C

The portion of the code that is responsible for the action at hand is now written in inline Assembler (as part of my efforts to nail down the problem). The rest of the code is in C.

I will verify, again, that the memory model is correct. In total, Board-B only uses about 30 bytes of RAM, including the stack. It has 256 bytes of RAM total (and 4K of extra RAM, which I'm not using). The on-board ADC uses a two's-complement arrangement with a reference of Vcc/2, and I'm using the internal 2 MHz ADC peripheral clock. But again, it's all working perfectly (as far as I can tell; verified on Board-B independently of Board-A). It's (4) bytes from the ADC channels that I need to send to Board-A.
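With a Vcc/2 reference and two's-complement output, readings below mid-scale presumably come back as negative codes. Whatever receives the bytes therefore has to rebuild the sign.

Below is a minimal sketch, assuming the result is an N-bit two's-complement field. The width is a parameter because it is not stated here, and the function name is hypothetical.

```c
#include <stdint.h>

/* Sign-extend an N-bit two's-complement field (1 <= bits <= 15) into a
   native int16_t. XOR-ing with the sign bit and then subtracting it maps
   codes with the sign bit set onto negative values. */
int16_t sign_extend(uint16_t raw, uint8_t bits)
{
    uint16_t mask = (uint16_t)((1u << bits) - 1u);
    uint16_t sign = (uint16_t)(1u << (bits - 1u));

    raw &= mask;  /* ignore anything above the field */
    return (int16_t)((raw ^ sign) - sign);
}
```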

Both boards are 8051 varieties. Board-A is an AT89LP51ED2, and Board-B is an AT89LP3240. (A lot of our legacy products run on 8051s.)

I agree. It's Keil uVision5.

I remember when I first started playing with microcontrollers back in my teens. For what seems like forever, I simply could not get a 1200 baud RS-232 circuit working. It would send, but not receive. I tried everything. Then I discovered that there was an "REN" (Receive Enable) bit tucked away inside one of the control registers for the UART. Damn.
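For reference, on a standard 8051, REN is bit 4 (0x10) of SCON. Below is a host-testable sketch that builds the mode-1 value; the names are hypothetical. With Keil's reg51.h header, the target-side equivalent is a single assignment such as SCON = 0x50.

```c
#include <stdint.h>

/* Standard 8051 SCON bits used here. */
#define SCON_SM1 0x40u  /* with SM0 = 0: mode 1, 8-bit UART, variable baud */
#define SCON_REN 0x10u  /* Receive Enable: without it, transmit works, receive never does */

/* SCON value for mode 1, with or without the receiver enabled. */
uint8_t scon_mode1(int receive)
{
    return (uint8_t)(SCON_SM1 | (receive ? SCON_REN : 0u));
}
```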

Sort of sounds like a deja-vu event going on here, but I know these two particular microcontrollers like the back of my hand (save the recent bad batch of 'ED2s we got from Atmel that I discussed in a prior post). That was a different product than this one, though, and this one does not use EEPROM.

Well, I have to get ready for work... I'm honestly not looking forward to today.

Oh, I forgot to mention: yesterday, we tied the two different grounds together and it made no difference. That's not something we would want to allow in the field, but it's OK to do in a lab setting. I wasn't expecting it to matter, and it didn't.

mpm

Well, it seems you have looked at the transfer of data that 'should' be happening. How about some unrelated process/event that may be stepping on the final byte of data before it gets transferred? Going slow can sometimes make the window for external influence bigger... Good luck! You really need a real-time debugger.

-bill m.

Bill Martin

Do you have a logic analyzer, or at least a DSO? Can you see the bits at the MCU correctly? What about on the other side of the opto?

Follow the bits! They either start bad or go bad somewhere along the chain.
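Below is the same idea as a minimal sketch: capture the byte at each probe point in order and report where it first goes wrong. The probe points listed in the comment and the function name are hypothetical. Any stage that inverts should have its capture inverted back before comparing; a phototransistor output with a collector pull-up is one example.

```c
#include <stddef.h>
#include <stdint.h>

/* `captured` holds the byte seen at each probe point along the chain, in
   order (e.g. Board-B port pins, opto LED side, opto transistor side,
   Board-A port read), already corrected for any inverting stage.
   Returns the index of the first point that differs from `expected`,
   or -1 if every point matches. */
int first_bad_stage(uint8_t expected, const uint8_t *captured, size_t stages)
{
    for (size_t i = 0; i < stages; i++) {
        if (captured[i] != expected) {
            return (int)i;
        }
    }
    return -1;
}
```

A result of 0 means the byte is already wrong at the source, which points back at Board-B's firmware rather than the interface.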

DemonicTubes

Yeah, it's definitely the kind of situation you'd want to use a logic analyzer or a scope that can decode digital, or failing that even an Arduino connected in lieu of the 8051 board to the optocouplers as a data logger over USB to a PC, and figure out exactly where the bytes go bad.

bitrex

Right, the priority is to eliminate suspects, I'd start by doing whatever I needed to eliminate the physical interface component, even rig up an Arduino in place of the 8051 receiver running a bare minimum receive-and-transmit-over-USB program to log data to a PC.

Find who has an airtight alibi, and whoever is left without one....

bitrex

...like glasses on your forehead.

This reply wins the prize!

I finally got it fixed yesterday. The root cause was Stack Pointer corruption. (And now don't I feel like an idiot?)

To make matters worse, in the Keil uVision5 compiler there's a dropdown window on the left-hand side that not only shows the status of the Stack Pointer, but also shows you the "Max Stack Pointer". Obviously, if the max is greater than what you set, you're going to run into memory overruns.

Maybe they could make it into a flashing red box upon violation? Or, maybe I could get in the habit of checking the obvious things first?!!

The thing is, I have never had to worry about this, because in my general template for new projects I have a line that sets the SP to 80h (i.e., halfway up the available RAM). I've been doing that for at least the past 20 years or so; it's like an automatic default for me by now. (And I would only rarely have to modify it for projects that needed more than the 128 bytes of RAM allocated.)

Well, somewhere, somehow, on this particular project, that line got erased, and I honestly never thought about it, mostly because of the weird, repeatable way the problem showed up.
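A common run-time complement to watching Keil's "Max Stack Pointer" is stack painting:

1. Fill the stack area with a known byte at startup.
2. Exercise the worst-case code paths.
3. Check how far the pattern has been overwritten.

Below is a minimal host-side sketch of the idea, assuming a 0xA5 fill value; the function names are hypothetical. On the target, the region would be the internal RAM above the initial SP value. On a standard 8051 the stack grows upward: PUSH increments SP before writing, and SP resets to 07h.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary paint value (an assumption) */

/* Fill the region reserved for the stack with the paint value at startup. */
void paint_stack(uint8_t *region, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        region[i] = STACK_FILL;
    }
}

/* The stack grows upward from the start of the region, so the high-water
   mark is the highest byte that no longer holds the paint value. Returns
   the number of bytes used (0 if untouched). A pushed byte that happens to
   equal STACK_FILL at the very top can make this under-report slightly. */
size_t stack_bytes_used(const uint8_t *region, size_t len)
{
    size_t used = len;
    while (used > 0 && region[used - 1] == STACK_FILL) {
        used--;
    }
    return used;
}
```

Sometimes the count creeps toward the size of the region after a worst-case run. In that case the stack needs more room, or the SP starting point needs to move.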

Anyway, it's off my bench now, and good riddance.

mpm

Glad you found it! Don't feel too bad... you're not even the millionth person to have that happen. That sort of problem has been bedeviling us since shortly after they stopped using mercury-filled tubes as memory elements. :-)

Years ago ('74 or so, when I was in college), after installing an operating system upgrade, the mainframe system started crashing or aborting jobs sporadically with all sorts of weird symptoms that suggested memory corruption. The system had had a persistent problem with its core memory and for weeks everyone thought that this was happening again, but the diagnostics couldn't find anything.

Eventually, we figured out that the OS upgrade had added a couple of system ghost-jobs ("daemons" in Unix terms), and that this had bumped up the total number of simultaneous tasks from 128 to 130. A site-specific performance-monitoring patch was in use, which had maintained some data on each task in a single page of memory (reserved at boot time)... and, yup, one page was enough for no more than 128 tasks. When tasks #128 and #129 started up, the monitoring patch gleefully stored their data in the _next_ page of memory, which was going to end up belonging to something else entirely... BOOM.

One of these days I really do need to try writing something in Rust or a similar "Yes, I'll let you do interesting things, but no, I won't let you do so unsafely" language. It'll be interesting to see if that approach actually works in a near-bare-metal embedded application where you're dealing with hardware quite directly.

Dave Platt

Ya:

bitrex

...like glasses on your forehead.

Cool! I haven't won anything in a long time. What do I win??? :)

How did you figure this out? I know the first things to check when debugging hardware are the PSU and the clocks. I think that is mostly because problems there are very hard to diagnose from symptoms. Likewise, there are certain things that should always be checked first in software. Not that we always follow those principles. After all, we are experts, we know what we are doing, and it can't be something that simple!

Glad you found it.

--
  Rick C. 

Ricketty C

probably this

formatting link

--
  Jasen.
Jasen Betts

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.