Just venting...

Wow!
I haven't had a circuit kick my ass like this one in a LONG time.
I have two boards that need to exchange (4) bytes of information from
Board-A to Board-B only.

Unfortunately, the original designer (me) didn't really think ahead, and
so there really is no provision for the two boards to talk to each other.

So, (and I'm in a rush, as usual), I thought I could take an easy way out
(which is now turning into a mini-nightmare.)

Board-A powers up the regulators on Board-B, which take several
milliseconds to ramp up, and then the Board-B processor comes out of reset.

OK - no problem so far -- works exactly as expected.
Board-A "knows" that timing, and so Board-B's readiness can be inferred
(sloppily), since the range of startup time across Board-B samples isn't
significant.

So, I figured all I would have to do is to time the data exchange.
(i.e., send out a byte every 100 ms or so, hold it on the port, and have
Board-A sample it somewhere near the middle of the window.)
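
A minimal sketch of the timed scheme described above, assuming the sender holds each byte for a fixed 100 ms window and the receiver samples at the midpoint of each window. The function name, the `startup_ms` parameter, and the constants are illustrative and not taken from the actual firmware:

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_MS   100u  /* hold time per byte, from the post */
#define BYTE_COUNT  4u    /* number of bytes exchanged, from the post */

/* Return the time (in ms after the receiver powers up the sender) at which
   byte `index` should be sampled: the midpoint of that byte's hold window.
   `startup_ms` is the assumed delay before the first byte is driven. */
uint32_t sample_time_ms(uint8_t index, uint32_t startup_ms)
{
    assert(index < BYTE_COUNT);  /* only four windows exist */
    return startup_ms + (uint32_t)index * WINDOW_MS + WINDOW_MS / 2u;
}
```

With a start-up delay of 0 ms the first byte would be sampled at 50 ms. Sampling mid-window leaves roughly 50 ms of margin on either side, which is why a few milliseconds of start-up spread should not matter.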

Speed is not important.

And I can't get this damn thing to work!

The timing is perfect (verified on a scope to within 20 µs), in a
window 100 ms wide.  And I still can't get the damn thing to work.

And by work:  I mean....
Of the (4) bytes I'm trying to send -- three of them always transfer and
work just fine.  The fourth one (which also always arrives on time), always
has (2) bits forced LOW.

And for the life of me, I don't know why. ??!
It is frustrating beyond belief.

I have done the "divide and conquer" thing, and have it narrowed down to
the interface (which is just 8 optos).  Good to 100 kHz, no problem there.
AND - did I mention, they work just fine with the other bytes, and also
with some test code just rotating bits.

But hook it up to the real Board-B and everything goes to shit.  Well, not
everything, just the two bits of the aforementioned byte.  So, I go in the
code and FORCE those values, and the same two bits are bad.  It's like
having a bad connector that selectively ohms out OK, depending on what data
you send.  It is driving me batty!!

I am honestly out of ideas.
Well - one thing I haven't checked (and is super-highly-unlikely to be the
problem): is that maybe the Compiler is screwing me over.  But that's a
reach.

OK.  My mini-rant is over.
At this point, I could have redesigned both boards and been ahead of the
game.
(And of course, I'm under some pressure to get the boards released.)

Right.....

Re: Just venting...
You narrowed it down to the opto-isolators?  Then how could a compiler be the issue?  Can you post any of this schematic, at least the opto stage?

Re: Just venting...
On Monday, June 1, 2020 at 8:38:44 PM UTC-4, mpm wrote:

You start out saying you did not have comms between the boards, then you
talk about using comms through optos.  Should I assume you added them
somehow???

Anyway, I'm not understanding the details that are making this so hard.
Should I assume you have had board A transmit square waves (a Johnson ring
counter pattern seems appropriate) and you scoped the signals on board B to
verify the hardware is working?

I get that this doesn't have critical timing and that you thought it should
be easy to transfer slow data.  Maybe use one of the optos as a clock and
have the software monitor it to know exactly when to take in the data?  You
would be one bit down but you could still send 4 bits and reassemble the
bytes.  Use one opto as the high/low flag and only two bits are idle... the
two that aren't working properly maybe?
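
A rough sketch of the clocked, nibble-at-a-time idea above: four lines carry data, one line flags high versus low nibble, one line is the clock, and the remaining two lines are left idle. The bit assignments and function names are hypothetical, chosen only for illustration:

```c
#include <stdint.h>

#define NIBBLE_MASK 0x0Fu  /* bits 0-3: data (assumed assignment) */
#define FLAG_HIGH   0x10u  /* bit 4: set when the word carries the high nibble */
#define CLOCK_BIT   0x20u  /* bit 5: driven by the sender to mark a new word */
                           /* bits 6-7: idle, ignored by the receiver */

/* Build the port word for one nibble of `value`. */
uint8_t encode_nibble(uint8_t value, int high, int clock)
{
    uint8_t word = high ? (uint8_t)(value >> 4)
                        : (uint8_t)(value & NIBBLE_MASK);
    if (high)  word |= FLAG_HIGH;
    if (clock) word |= CLOCK_BIT;
    return word;
}

/* Merge one received port word into the partially assembled byte. */
uint8_t merge_nibble(uint8_t partial, uint8_t word)
{
    uint8_t nib = (uint8_t)(word & NIBBLE_MASK);
    if (word & FLAG_HIGH)
        return (uint8_t)((partial & 0x0Fu) | (uint8_t)(nib << 4));
    return (uint8_t)((partial & 0xF0u) | nib);
}
```

Because the receiver masks off everything except the data and flag bits, whatever happens on the two idle lines cannot affect the reassembled byte.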

Would that be the quickest path to getting it out the door?

I know I hate schedules.  

--  

  Rick C.

  - Get 1,000 miles of free Supercharging
Re: Just venting...
On Monday, June 1, 2020 at 9:12:57 PM UTC-4, Ricketty C wrote:

Sorry.  I butchered my description a little.
It's actually Board-B that transmits the 4 bytes to Board-A over an 8-bit
"bus".
Board-A powers-up Board-B, so that is the "synch" for lack of a better
term.
Synch + the time it takes for the Board-B regulator to come up to speed.
(About 6 ms).  Regardless, I really don't think it's a timing issue at this
point.

Board-B is 3.3V logic, and Board-A is 5V.
So, the interface is through optocouplers on Board-A.  (LTV847's)
Each board also uses different grounds, but that's not a problem for the
optos.

I must be doing something super-silly, but I can't find it.
This really "ought" to work, and doesn't.

If I can't get it going tomorrow (which will be Day-4, including my wasted  
weekend!!), I'll see if I can post the schematic here.

I don't have a hardware debugger for this, but do have a software
simulator.
It can't find the problem either.  (Although I wouldn't put too much faith
in that as simulators can introduce their own problems when you start
having a lot of I/O going on - even though that's the case here.  But just
sayin'.)

Re: Just venting...
On Monday, June 1, 2020 at 9:12:57 PM UTC-4, Ricketty C wrote:

I think this idea crossed my mind earlier today, or maybe over the weekend.
(And weirder approaches, trust me.)

I can already tell we're going to have to re-spin this product anyway if
it's this difficult to "upgrade" or add features to it, etc., which is
what's driving the present dilemma.  (Like, put it all on one board, for
starters!)

But to your point:  I elected not to "go weird", because honestly -- the
timed approach really "should" work.  And it actually does... well, mostly.

Maybe a good night's sleep will help?

Re: Just venting...
On Monday, June 1, 2020 at 9:51:41 PM UTC-4, mpm wrote:

Yeah, sometimes simple things can be hard to find, like glasses on your
forehead.

That is one of the reasons why I like doing things my way.  I can always
tear it apart into pieces and figure out if each piece works.  I use Forth
on my MCUs so at any time I can go interactive and twiddle bits and test
things without worrying about software problems.  Then once you have
working test code, that is the foundation of your production code.

I'm thinking this is software as well.  I'm not real clear on the nature of
the failures, but it just doesn't sound like a hardware issue.  Can you put
a scope on any line that tells you when the bits are being read?  Or do
they go directly to MCU pins?

--  

  Rick C.

  + Get 1,000 miles of free Supercharging
Re: Just venting...
On Monday, June 1, 2020 at 11:08:27 PM UTC-4, Ricketty C wrote:

This reply wins the prize!

I finally got it fixed yesterday.
Root cause was Stack Pointer corruption.  (And now don't I feel like an
idiot?)

To make matters worse, in the Keil uVision5 compiler, there's a dropdown
window on the left hand side that not only shows the status of the Stack
Pointer, but also shows you the "Max Stack Pointer".  Obviously if the max
is greater than what you set, you're going to run into memory overruns.
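
The "Max Stack Pointer" readout is essentially a high-water mark. A minimal host-side sketch of that idea, assuming an upward-growing stack (as on the 8051) inside a 256-byte RAM image. The sentinel value, the function names, and the RAM array are illustrative only and do not come from the Keil tools or from the project code:

```c
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 256u   /* size of the simulated internal RAM */
#define SENTINEL 0xA5u  /* arbitrary fill pattern (assumption) */

/* Paint everything from `stack_base` to the top of RAM with the sentinel. */
void paint_stack(uint8_t *ram, uint16_t stack_base)
{
    memset(ram + stack_base, SENTINEL, RAM_SIZE - stack_base);
}

/* Return how many bytes above `stack_base` have been overwritten, found by
   scanning down from the top of RAM until a non-sentinel byte appears. */
uint16_t stack_high_water(const uint8_t *ram, uint16_t stack_base)
{
    uint16_t addr = RAM_SIZE;
    while (addr > stack_base && ram[addr - 1u] == SENTINEL)
        addr--;
    return (uint16_t)(addr - stack_base);
}
```

One caveat with this approach: a stacked byte that happens to equal the sentinel at the very top of the used region would make the result read slightly low, so it is a lower bound rather than an exact figure.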

Maybe they could make it into a flashing red box upon violation?
Or, maybe I could get in the habit of checking the obvious things first?!!

The thing is, I never (have had) to worry about this because in my general
template for new projects, I have a line that sets the SP to 80h (i.e.,
half-way up the available RAM).  Been doing that now for at least the past
20 years or so.  It's like an automatic default for me, by now.  (And I
would only rarely have to modify it for projects that needed more than the
128 Bytes of RAM allocated.)

Well, somewhere, somehow, on this particular project, that line got erased
- and I honestly never thought about it, mostly because of the weird way it
was a repeatable problem.

Anyway, it's off my bench now, and good riddance.

Re: Just venting...

Glad you found it!  Don't feel too bad... you're not even the
millionth person to have that happen.  That sort of problem has
been bedeviling us all the way back until shortly after they stopped
using mercury-filled tubes as memory elements.  :-)

Years ago ('74 or so, when I was in college), after installing an
operating system upgrade, the mainframe system started crashing or
aborting jobs sporadically with all sorts of weird symptoms that
suggested memory corruption.  The system had had a persistent problem
with its core memory and for weeks everyone thought that this was
happening again, but the diagnostics couldn't find anything.

Eventually, we figured out that the OS upgrade had added a couple of
system ghost-jobs ("daemons" in Unix terms), and that this had bumped
up the total number of simultaneous tasks from 128 to 130.  A
site-specific performance-monitoring patch was in use, which had
maintained some data on each task in a single page of memory (reserved
at boot time)... and, yup, one page was enough for no more than 128
tasks.  When tasks #128 and #129 started up, the monitoring patch
gleefully stored their data in the _next_ page of memory, which was
going to end up belonging to something else entirely... BOOM.
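
The failure in this story comes down to a missing bounds check on a fixed-size table. A small sketch of the guarded version, assuming a page that holds exactly 128 per-task records. The record layout and all names here are hypothetical and are not the original patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 128u  /* capacity of the single reserved page */

typedef struct {
    uint32_t cpu_ticks;  /* hypothetical per-task statistic */
} task_stats;

typedef struct {
    task_stats slot[MAX_TASKS];
} stats_page;

/* Store stats for `task_id`, refusing any id that would land past the
   end of the page instead of writing into whatever memory follows. */
bool record_task(stats_page *page, size_t task_id, uint32_t ticks)
{
    if (task_id >= MAX_TASKS)
        return false;
    page->slot[task_id].cpu_ticks = ticks;
    return true;
}
```

With the check in place, tasks #128 and #129 would simply go unmonitored rather than corrupting a neighbouring page.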

One of these days I really do need to try writing something in Rust
or a similar "Yes, I'll let you do interesting things, but no, I won't
let you do so unsafely" language.  It'll be interesting to see if that
approach actually works in a near-bare-metal embedded application where
you're dealing with hardware quite directly.


Re: Just venting...
On 6/3/2020 7:14 AM, mpm wrote:

Ya:

<https://www.us-cert.gov/bsi/articles/knowledge/coding-practices/ensure-bounds-no-memory-region-are-violated>


Re: Just venting...
On Wednesday, June 3, 2020 at 7:14:44 AM UTC-4, mpm wrote:

Cool!  I haven't won anything in a long time.  What do I win???  :)  

How did you figure this out?  I know the first things to check when
debugging hardware are the PSU and the clocks.  I think that is mostly
because they are very hard to diagnose from symptoms.  Likewise there are
certain things that should always be checked first in software.  Not that
we always follow those principles.  After all, we are experts, we know what
we are doing and it can't be something that simple!

Glad you found it.  

--  

  Rick C.

  -+ Get 1,000 miles of free Supercharging
Re: Just venting...

probably this  
https://i.kym-cdn.com/photos/images/newsfeed/000/034/706/winternet.jpg

--  
  Jasen.

Re: Just venting...
On 6/1/2020 8:38 PM, mpm wrote:

You don't say what language or what flavor of processor this is.  Is it C?

It sounds like a software issue.  Do you have enough storage allocated
for the data?  Is it an endianness issue?  Are you overrunning a buffer?
Are you invoking some kind of undefined behavior?

I doubt it's a compiler issue.


Re: Just venting...
On Monday, June 1, 2020 at 10:30:29 PM UTC-4, bitrex wrote:

The portion of the code that is responsible for the action at hand is now
written in inline Assembler (as part of my efforts to nail down the
problem).  The rest of the code is in C.

I will verify, again, that the memory model is correct.
All total, this Board-B only uses about 30 bytes of RAM, including the
stack.  It has 256 Bytes of RAM total (and 4K extra RAM, which I'm not
using).  The on-board ADC uses a two's-complement arrangement with a
reference of Vcc/2 and I'm using the internal 2 MHz ADC peripheral clock.
But again, all working perfectly (as far as I can tell - verified on
Board-B independently of Board-A).  It's (4) bytes from the ADC channels
that I need to send to Board-A.

Both boards are 8051 varieties.  Board-A is AT89LP51ED2, and Board-B is
AT89LP3240.  (A lot of our legacy products run on 8051's.)

I agree.  It's Keil uVision5.

I remember when I first started playing with microcontrollers back in my
teens.  For what seems like forever, I simply could not get a 1200 baud,
RS-232 circuit working.  It would send, but not receive.  I tried
everything.  Then, I discovered that there was an "REN" (Receive Enable)
bit tucked away inside one of the control registers for the UART.  Damn.

Sort of sounds like a deja-vu event going on here, but I know these two
particular microcontrollers like the back of my hand.  (save the recent bad
batch of 'ED2s we got from Atmel I discussed in a prior post.)  That was a
different product than this one, though.  And this one does not use EEPROM.

Well, I have to get ready for work...
I'm honestly not looking forward to today.

Oh, I forgot to mention:
Yesterday, we tied the two different grounds together and it made no
difference.  That's not something we would want to allow in the field, but
it's OK to do in a lab setting.  I wasn't expecting it to matter.  And it
didn't.



Re: Just venting...
On 6/2/20 4:44 AM, mpm wrote:

Well, it seems you have looked at the transfer of data that 'should' be  
happening, how about some unrelated process/event that may be stepping  
on the final byte of data before it gets transferred? Going slow can  
sometimes make the window for external influence bigger...
Good luck! You really need a real time debugger.
-bill m.

Re: Just venting...
On Tuesday, June 2, 2020 at 5:44:26 AM UTC-6, mpm wrote:

Do you have a logic analyzer, or at least a DSO?  Can you see the bits at
the MCU correctly?  What about on the other side of the opto?

Follow the bits!  They either start bad or go bad somewhere along the
chain.
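
One way to "follow the bits" in software is to log what was sent and what arrived at each stage, then compute which bits were high on the way in but low on the way out. A small sketch under that assumption; the function name and buffers are hypothetical:

```c
#include <stdint.h>

/* Return a mask of bit positions that were 1 in some sent byte but 0 in
   the matching received byte, i.e. bits that look forced LOW. */
uint8_t forced_low_mask(const uint8_t *sent, const uint8_t *received,
                        uint8_t count)
{
    uint8_t mask = 0;
    uint8_t i;
    for (i = 0; i < count; i++)
        mask |= (uint8_t)(sent[i] & (uint8_t)~received[i]);
    return mask;
}
```

Running the same comparison at the sender's port, after the optos, and in the receiver's RAM would show at which stage the mask first becomes non-zero.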

Re: Just venting...
On 6/2/2020 10:20 AM, DemonicTubes wrote:

Right, the priority is to eliminate suspects, I'd start by doing  
whatever I needed to eliminate the physical interface component, even  
rig up an Arduino in place of the 8051 receiver running a bare minimum  
receive-and-transmit-over-USB program to log data to a PC.

Find who has an airtight alibi and whoever is left without one....

Re: Just venting...

Have you got everything declared volatile that needs it?

huh? Is the bad data present on only one side of the optos?

Are the optocoupler LEDs all terminated to the same supply rail?
And the receivers?

That was my first thought.

--  
  Jasen.

Re: Just venting...
On Tuesday, June 2, 2020 at 12:02:37 AM UTC-4, Jasen Betts wrote:

I don't understand how it can be narrowed down to the interface and yet it
might still be software.  The first thing I would do is determine which
side of the software/hardware divide it is on.  In other words, use other
software that is more simple.  Or no software at all if possible.

Not really enough info to know what's happening for sure.  

--  

  Rick C.

  -- Get 1,000 miles of free Supercharging
Re: Just venting...
On 6/2/2020 1:14 AM, Ricketty C wrote:

Yeah, it's definitely the kind of situation you'd want to use a logic  
analyzer or a scope that can decode digital, or failing that even an  
Arduino connected in lieu of the 8051 board to the optocouplers as a  
data logger over USB to a PC, and figure out exactly where the bytes go bad.

