Unusual experiences you have encountered while debugging ?

[or the day I looked into the face of Hell. :-)]

And now for something a little different. :-)

What experiences have you had during debugging an embedded system that
make you really wonder out loud what the hell was going on ?

I've been building a homebrew programmer for the PIC32MX and have hit
some major issues due to the really lousy Microchip programming
specification.

At one point, just to see what happened, I decided to read virtual
address 0x00000000 which the datasheet says is unmapped. This is what
I got back:

[snip]

Identifying device attached to programmer
Device reports id = 04a00053
prog_read_block: base_address = 00000000, length = 16
0000: 48 65 6c 6c 48 65 6c 6c 48 65 6c 6c 48 65 6c 6c HellHellHellHell

After a minute or so I realised that a mundane set of circumstances
(PIC32MX mapping in an alleged unused address space, problems with the
specification supplied read memory sequence after the first longword,
realising the burner part of my programmer was actually working, etc)
had combined to create the above illusion.

However, for a minute or so I started to think Microchip had filled
the unused address space with an "interesting" pattern. Oops. :-)

Simon.

--  
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Re: Unusual experiences you have encountered while debugging ?
On 5/28/2015 6:18 PM, Simon Clubley wrote:

Most memorable was late 70's developing an 8085-based device (we didn't
call it "embedded systems" back then).

We had exactly one prototype.  Plastic case was built from milled pieces
of lexan, bonded together and painted.  Mechanisms were all "one off"
hand made.  Ditto electronics, etc. (I think we pilfered a power supply
from one of our existing products).

Burned a set of EPROMs (Yippee!  2K byte devices!  No more 1702's!!).
Closed the lid -- carefully.  Hit the power switch...

<flash>, *Bang!*

"WTF???"

Technician had placed a Black Cat with nichrome wire across the power
supply for the "Bang!" -- and a flashbulb for the "<flash>".

He took great pleasure in commenting about how shook up I was!

Then, I drew his attention to the fact that the machine wasn't
powering up: "Ooops!"

Suddenly, *he* was the one who was shook up!  (How to explain to the
boss that his practical joke had cost us THE prototype!  :> )

<grin>

When I worked on The Reading Machine, one of the basic tests we would do while
bringing the system up was to push phonemes at the speech synthesizer to
verify the data path was intact, synthesizer functional, amplifier, etc.
These were all incredibly short bits of code because you had to "bit switch"
them into core (minicomputer-based).  So, you just had a crib sheet of
octal codes that you'd quickly toggle into the machine, hit RUN and watch
(listen) what happens.

The "stock" test just pushed a single phoneme code in at four different
inflection levels.  Sort of like:  "ah, Ah, AH, *AH*" in an endless loop.

That, of course, is boring.

One day, we had the bankers coming in to appraise our assets (loan, I guess).
A working machine (end product) is worth a helluvalot more than a bunch of
components!  So, big push to get all the "inventory" into a salable state!

Banker guy (?) wandered into our building to look things over.  Machines
all over the place (these are minis so they are pretty sizable... roughly
as big as a dishwasher, etc.).  Hallways, offices, lab, workshop, etc.

Then, the inevitable question:  "Do these all work?"  (obvious reason behind
that!).  Boss kinda cringed a bit and said "Yes".  "Can I see one?"  (beads
of sweat on the boss's brow...)  "Sure".

Boss of course had no idea what the *actual* state of each individual machine
happened to be.  We were freely swapping parts from machines to get as many
units "up" as possible.

And, if he had steered the banker to a *particular* machine, that would have
looked suspicious! (i.e., "Why can't you show me THIS machine, RIGHT HERE?")

So, he reached down and flipped the power switch.  The core-resident code
immediately started to run (none of this "boot delay" you see with modern
machines).  Chance had it that I had been working on that machine some
time previous.  And, the last test I had apparently performed was the
synthesizer test.  So, the machine immediately began pushing the phoneme
codes:  F UH2 K Y1 IU U1 F UH2 K Y1 IU U1 F ...

Of course, the banker had very little experience with synthetic speech
(recall, this is late 70's) so he was "squinting" (why do people squint
when trying to HEAR something??) trying to understand what this "noise"
meant.  Damn near everyone else in the building had no trouble sorting
it out!!  Boss was cherry red.

When he finally caught on, he laughed *so* hard... and forgot all about
the fact that he had NOT seen the machine demonstrated as "operational"!

Re: Unusual experiences you have encountered while debugging ?

Hello Don,

If I had done that, I would have expected to have been fired. :-)

I'm young enough to have missed the machines which needed a full bootstrap
routinely keyed into them, but old enough to have run across (as a student)
machines with a full console front panel.

So yes, I understand the _strong_ desire to have kept this stuff short. :-)

BTW, I think it also makes you reflect that you have knowledge and
experience of a way of doing things that today's newcomers will never
experience - at least it does for me.

This even shows up in silly little ways; for example, I sometimes miss
the ability to physically write protect a drive in certain situations.

I also suspect that my code is tighter as a result of growing up on more
resource limited machines.

Simon.

--  
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Re: Unusual experiences you have encountered while debugging ?

The classic (and safer) one was to have a long length of pneumatic
tubing leading into the back of a rack. You blew cigarette smoke into
the far end of the tube while your esteemed colleague was working on the
rack...


The other funny one was when we finally got our controller prototype
working. It had an 8748 microcontroller sequencing a pneumatic machine,
motor etc. We set up a camera to take a picture of it running, the
camera flash goes off and *bang*, the machine locks up.

We put the silver sticker over the EPROM window after that.

--  

John Devereux

Re: Unusual experiences you have encountered while debugging ?
On 29/05/15 17:58, John Devereux wrote:

You can still do that with the RPi, as noted just about
everywhere including
http://www.theguardian.com/technology/2015/feb/09/raspberry-pi-2-camera-flash-power-off



Re: Unusual experiences you have encountered while debugging ?
Hi Simon,

On 5/29/2015 9:15 AM, Simon Clubley wrote:

<shrug>  For the most part, I've been fortunate to work with people
who "didn't take themselves too seriously".  This, IME, makes a huge
difference in how "creative" people can get in their solutions...
less worried about failing or "doing something that, in hindsight,
was obviously pretty 'stupid'".  OTOH, ripe for coming up with really
*clever* approaches to problems that "less inspired" designs would
stumble on.  Not the sort of environment for folks with big egos.

The "normal" application was obviously too long to bit-switch in like this.
A tiny bipolar ROM (I think 16x16 -- or maybe 32x16?) did the normal
bootstrap... which loaded the image from a "data cassette" (the
"Compact Cassette" format that was popular for music, at the time).
Once loaded (into *core*), it was persistent, of course.  So, subsequent
power-ups just caused the code to start running immediately (cassette load
was pretty slow).

The biggest take-away is learning to *think* about a problem before
just flailing away at it:  "Let me try this, recompile... nope, that
wasn't it!"  I think a lot of bugs creep into code because people
only partially think through their proposed remedies -- it's too
easy to just make a change, recompile and see the code (*appear*!)
to work... then, move on as if that problem was solved.  As if each
problem was nothing more than a "typo".

At one point, I was working for a firm that had subcontracted some
defense work from big blue.  I was responsible for debugging the
"processor" in the device.

We got a new device and their engineer came to help get the first
machine up and running.  A "Series 1" minicomputer was used to drive
the test harness.  The comms path (hardware) between the S1 and UUT
was physically long (30 or 40 feet) and had to go through various
gyrations to get to the proper logic levels, etc.

*LOTS* of one-shots (though they don't like calling them that!)
to account for delays in various level translators, etc.  This one
triggers that one which, in turn, triggers this OTHER one, etc.

At one point, we couldn't get the two devices to communicate.  I
was convinced the problem was an insufficient delay in one stage of
the "one-shot chain".  Their engineer sat down, did the math and
convinced himself that this was NOT the problem.  So, dismissed the
idea and went chasing other possible problems "on paper".

Never one to blindly "defer to my elders", I just walked off, grabbed
a honking BIG capacitor that was lying on a nearby bench (without
concern for its actual *rating*), held it across the timing capacitor
for the one-shot that I suspected and, voila!  Everything started
working!

"What did you just do??"

I showed him the cap.  His eyes went wide when he saw that it
was like 1000 times larger than the circuit required...

"Well, that's way too big!"

"Yes, I know.  But, obviously, the one that's *in* there isn't
big enough!  Now, we can sort out why that's the case!  (wrong
component installed?  tolerances?  some other issue that the
design failed to take into consideration?)"

The current approach to much debugging seems to be "slap the
big capacitor in the circuit and, if it works, LEAVE IT THERE!"

Possible with most of my SCSI drives (via a jumper).  The issue
is then whether the OS will gag when it encounters this restriction!

We used to code with the KNOWLEDGE/ASSURANCE that the executable would
be installed in R/O memory.  E.g., using 16rFF as a terminator because
it could easily be checked (with an "increment the byte that this register
is pointing at" opcode).

When we started building SRAM modules to *emulate* EPROMs, we had to
include a "write protect" switch to ensure the SRAM behaved *like*
an EPROM once the software image was installed.  You quickly learned
that failing to flip the switch caused your code to get clobbered
really quickly!  ("Hmmm... why are the data in all of these memory
locations exactly +1 from what they *should* be?")
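
To make the terminator trick and the "+1" symptom concrete, here is a rough
C model (not the original code). It assumes an "increment memory" opcode
that sets the zero flag from the computed result whether or not the
write-back sticks; region_t, inc_mem_is_zero and scan_to_terminator are
names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a memory region; writable is false for ROM/EPROM. */
typedef struct {
    uint8_t *bytes;
    bool     writable;
} region_t;

/*
 * Models an "increment the byte this register points at" opcode.
 * The ALU computes byte + 1 and derives the zero flag from that result;
 * the write-back only takes effect if the memory is writable.
 * Returns the zero flag, which is set only when the byte was 0xFF.
 */
static bool inc_mem_is_zero(region_t *r, size_t i)
{
    assert(r != NULL && r->bytes != NULL);
    uint8_t result = (uint8_t)(r->bytes[i] + 1u);
    if (r->writable) {
        r->bytes[i] = result;   /* in RAM the data is silently altered */
    }
    return result == 0;
}

/* Counts the bytes that precede the 0xFF terminator. */
static size_t scan_to_terminator(region_t *r)
{
    size_t n = 0;
    while (!inc_mem_is_zero(r, n)) {
        n++;
    }
    return n;
}
```

Scanning a read-only region leaves it intact. Scanning the same bytes in
unprotected SRAM still finds the terminator, but every byte touched ends up
one higher than it was (and the 0xFF itself wraps to 0x00).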

The "attitude" also extends to other aspects of design, beyond "software".

E.g., a medical device I designed many years ago had to maintain an
internal database that would be served up via a pair of serial ports
and a query language that I had designed.  At the time, DRAM was
small (16Kx1, 64Kx1) and EXPENSIVE!  Stuffing 64K devices would
add considerably to the cost.  Yet, restricting the design to 16K
devices could later require a redesign of the PCB and/or software.

My solution was to stuff 16K parts -- but, allow any or all of them to
be replaced with 64K parts.  And, the software treated the first 16K
of that address space as "complete words"; but, all addresses beyond
that were treated as "N-(possibly non-contiguous)-bit wide".

During POST, the system would clarify the types of memory devices
present in each "bit position" -- in effect, creating a mask that
indicated where the bits were valid in this "beyond 16K space".
All accesses to the "data store" would occur through:
    result_t get_word(addr_t address, word_t &word)
    result_t put_word(addr_t address, word_t &word)
accessors.  Of course, much slower than doing a memory cycle on
a specific address!  But, infinitely faster than the data rate
that the query interface encountered (serial ports).
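
One possible reading of that scheme, as a C sketch rather than the original
implementation. Assumptions: an 8-bit word (the post does not give the
width), a scaled-down address space (16 and 64 cells standing in for 16K
and 64K), a simulated memory in which missing bits simply read back as 0,
and pointer parameters in place of the references shown above. All names
other than get_word, put_word, result_t, addr_t and word_t are invented.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_CELLS 16u   /* region every part covers: full-width words */
#define PHYS_CELLS 64u   /* region only the larger parts cover         */
#define WORD_BITS   8    /* assumed word width                         */

typedef enum { RESULT_OK, RESULT_BAD_ADDRESS } result_t;
typedef uint16_t addr_t;
typedef uint8_t  word_t;

static word_t phys[PHYS_CELLS];
static word_t installed = 0x2D;  /* simulation only: bits with big parts */
static word_t valid_mask;        /* what POST discovers                  */

/* Simulated hardware: beyond the base region only "installed" bits hold. */
static void raw_write(addr_t a, word_t v)
{
    assert(a < PHYS_CELLS);
    phys[a] = (a < BASE_CELLS) ? v : (word_t)(v & installed);
}

static word_t raw_read(addr_t a)
{
    assert(a < PHYS_CELLS);
    return phys[a];
}

/* POST: a bit position is usable if it can hold both a 1 and a 0. */
static void post_probe(void)
{
    raw_write(BASE_CELLS, 0xFF);
    word_t ones = raw_read(BASE_CELLS);
    raw_write(BASE_CELLS, 0x00);
    word_t zeros = raw_read(BASE_CELLS);
    valid_mask = (word_t)(ones & ~zeros);
}

/* Map a logical address to its first cell, cell count and bit mask. */
static result_t locate(addr_t address, addr_t *cell, unsigned *cells,
                       word_t *mask)
{
    if (address < BASE_CELLS) {
        *cell = address; *cells = 1; *mask = 0xFF;
        return RESULT_OK;
    }
    unsigned width = 0;                       /* usable bits per cell */
    for (word_t m = valid_mask; m != 0; m >>= 1) width += m & 1u;
    if (width == 0) return RESULT_BAD_ADDRESS;
    *cells = (WORD_BITS + width - 1) / width; /* cells per logical word */
    unsigned long first =
        BASE_CELLS + (unsigned long)(address - BASE_CELLS) * *cells;
    if (first + *cells > PHYS_CELLS) return RESULT_BAD_ADDRESS;
    *cell = (addr_t)first; *mask = valid_mask;
    return RESULT_OK;
}

/* Scatter the word's bits, low bit first, into the usable positions. */
result_t put_word(addr_t address, const word_t *word)
{
    addr_t cell; unsigned cells; word_t mask;
    result_t r = locate(address, &cell, &cells, &mask);
    if (r != RESULT_OK) return r;
    unsigned bits = *word;
    for (unsigned c = 0; c < cells; c++) {
        word_t out = 0;
        for (int b = 0; b < WORD_BITS; b++) {
            if (mask & (1u << b)) {
                if (bits & 1u) out |= (word_t)(1u << b);
                bits >>= 1;
            }
        }
        raw_write((addr_t)(cell + c), out);
    }
    return RESULT_OK;
}

/* Gather the bits back from the usable positions in the same order. */
result_t get_word(addr_t address, word_t *word)
{
    addr_t cell; unsigned cells; word_t mask;
    result_t r = locate(address, &cell, &cells, &mask);
    if (r != RESULT_OK) return r;
    unsigned value = 0, pos = 0;
    for (unsigned c = 0; c < cells; c++) {
        word_t in = raw_read((addr_t)(cell + c));
        for (int b = 0; b < WORD_BITS; b++) {
            if (mask & (1u << b)) {
                if (in & (1u << b)) value |= 1u << pos;
                pos++;
            }
        }
    }
    *word = (word_t)value;
    return RESULT_OK;
}
```

With four of the eight bit positions populated, each logical word beyond
the base region occupies two physical cells, so callers see a smaller but
still word-addressed store and never need to know which parts were stuffed.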

OTOH, I can recall another early design where the hardware guy had
opted to save the cost of a shift register -- forcing the software
to do shift-store cycles in a tight loop.  It was *embarrassing*
to see how much that "savings" COST the design!

[hardware and software folks tend not to overlap, IME]

The problem with this sort of mindset is that it is REALLY hard
to shake!  I recall designing an interface to a PROM programmer
and found myself INSTINCTIVELY writing (in C) things like:
    put_nybble(...) {put value & 0x0F}
    put_byte(...) {put_nybble; put_nybble}
    put_word(...) {put_byte; put_byte}
    put_long(...) {put_word; put_word}
without thinking about whether this was *necessary* or *clear*!
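
For anyone who hasn't seen the pattern, the shorthand above might expand to
something like the following C. This is only a sketch: it assumes the
programmer wants uppercase ASCII hex digits, high-order digit first, and it
collects them in a small buffer (out, out_len and put_raw are invented
names) instead of driving a real port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink: stands in for the link to the PROM programmer. */
static char   out[32];
static size_t out_len;

static void put_raw(char c)
{
    assert(out_len < sizeof out - 1);
    out[out_len++] = c;
    out[out_len] = '\0';
}

/* Emit the low four bits of v as one hex digit. */
static void put_nybble(unsigned v)
{
    put_raw("0123456789ABCDEF"[v & 0x0F]);
}

/* Each wider routine is just two calls to the next narrower one. */
static void put_byte(uint8_t v)
{
    put_nybble(v >> 4);
    put_nybble(v);
}

static void put_word(uint16_t v)
{
    put_byte((uint8_t)(v >> 8));
    put_byte((uint8_t)(v & 0xFF));
}

static void put_long(uint32_t v)
{
    put_word((uint16_t)(v >> 16));
    put_word((uint16_t)(v & 0xFFFF));
}
```

Only put_nybble does any real work, which keeps the code small. On a
machine with resources to spare, a single formatted-print call would
arguably be clearer.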

I'm now working in a resource rich (more like "resource gluttony!")
environment and it is REALLY hard to discipline myself not to micro-manage
aspects of the design:  "Burn a few million cycles, who cares!  Use
them to improve reliability and ease maintenance efforts!"

I think going from scarce to plentiful is considerably easier than
trying to do it the other way around.  I suspect most folks who write code
for desktops haven't a clue as to maximum stack penetration, etc.  They
just tweak things until they *appear* to work -- and hope that they have
encountered (purely by CHANCE!) the worst case scenario at some point
"on the bench" (instead of designing *for* it!).  So, "flukes" just get
shrugged off -- instead of explored in detail:  "That SHOULDN'T happen!
So, why *did* it?  (you saw it, too, didn't you??)"
