Error/Beep/Blink codes


D Yuniskis 15 years ago

Hi,

I have three layers of "degradation" in the event of various degrees of system failure:

1) if just some *aspect* of the system fails (e.g., an application crashes, etc.) then I provide high level error reporting with diagnostic assistance. E.g., a "window" can appear explaining the nature of the problem and suggested remedies, etc. [note that I only offer that by way of an example to which you might relate; that's *not* what I actually do...]

2) if significant portions of the system are in an unreliable state, I present an unadorned error that, at least, *says* that there is an error and what it is (accompanied by a unique identifier that helps pinpoint where the error is raised). E.g., this is the equivalent of the "blue screen of death".

3) if *most* of the system appears unreliable, I need to fall back to some sort of "if all else fails" mechanism that is *guaranteed* to be able to convey information (unless the processor is toast).

This last level is roughly the equivalent of BIOS "beep codes" -- where the hardware and software required to drive the *display* can't be relied upon, etc.

Obviously, beep/blink codes can't say much in a manner that the casual user will be able to comprehend. So, the goal is to pare down the information presented to a careful balance that addresses three classes of people:

- the casual user who knows little more than "it's broke"

- the motivated user who will look up an error code for more information

- service personnel who can pinpoint the exact cause for the signaled error "code"

[note that I also have a means of passing "diagnostic data" out to service personnel -- but, that requires them to have possession of the device]

So, I want a scheme that addresses all of the above without requiring their intervention -- since that would require more of the system to be functional than an "output only" mechanism as well as requiring some confidence in their ability to correctly interact with a failed device (which is an exceptional condition so not the sort of thing they are likely to have "practice" doing!).

Taking audible annunciators, first...

You can vary only a few characteristics of an audio signal:

- volume

- frequency

- duration/interval/rate

- count

[note that amplitude modulation and/or frequency modulation fall into these categories]

You can't control "absolute" volume -- there are no controls in the device that you can count on as functioning (it's broke, remember??). So, in practical terms, you pick *a* volume level that you expect to be audible without being alarmingly so. Too loud is worse than too soft: a user who realizes the device is broken will pay closer attention and listen for "soft" sounds, whereas a *loud* device report may be inappropriate for the current environment (e.g., a business meeting).

And, a user probably can't resolve more than two relative volume levels. And, since they are *relative*, EVERY error code mapping would have to employ both levels for the user to judge between them.

The same sorts of arguments apply to frequency and duration et al. (HIGH/LOW, LONG/SHORT, FAST/SLOW, etc.)

All of these "relative" indicators require the user to remember past sounds in order to qualify them based on *future* sounds. I.e., "Was that LOUD LOUD LOUD or SOFT SOFT SOFT?" The answer isn't known until the user has been exposed to all variations.

With that in mind, any error code format should ensure that these variations are in expected places. For example, beginning each error with a LOUD SOFT preamble lets the user *know* what loud and soft levels are *before* he has to "listen to the DATA".

IMO, the format should be fixed so the user knows what to expect and when to expect it. E.g., preamble, value, postamble. So, this implies all error codes should be the same number of "digits".

Further, the number of different *types* of digits should be small. So, perhaps three groups of 1 to 4 beeps. Or, a series of three different beeps (LONG, SHORT, SHORT vs. LONG, SHORT, LONG).

Obviously, there are lots of ways of encoding data with these different attributes.

I'm pretty convinced that varying amplitude (two levels) is The Wrong Way to go about encoding things.

In my opinion, the classic "beep COUNT" approach takes a fair bit of time to emit a complete "error code" (e.g., imagine three groups of 1-4 beeps at 1 second intervals with a 2 second pause between groups -- 64 possibilities taking as much as 15+ seconds to issue). This requires the user to "remember" what has come before while concentrating on what is happening *now* (i.e., "There were 3 beeps, followed by 2 beeps and now this is the first beep in the third group of beeps...").
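The arithmetic behind that estimate can be checked with a small sketch. The function names, and the reading of "1 second intervals" as one second per beep slot, are assumptions made for illustration only.

```python
BEEP_SLOT_S = 1.0    # assumed: each beep occupies a 1 second slot
GROUP_PAUSE_S = 2.0  # pause between groups, as described above


def encode_groups(code, groups=3, max_beeps=4):
    """Map a code in 0..max_beeps**groups - 1 to a tuple of beep counts."""
    if not 0 <= code < max_beeps ** groups:
        raise ValueError("code out of range")
    counts = []
    for _ in range(groups):
        code, digit = divmod(code, max_beeps)
        counts.append(digit + 1)  # every group has at least one beep
    return tuple(reversed(counts))


def duration_s(counts):
    """Time to emit all groups, including the pauses between them."""
    return sum(counts) * BEEP_SLOT_S + (len(counts) - 1) * GROUP_PAUSE_S


all_codes = [encode_groups(c) for c in range(64)]
worst_case_s = max(duration_s(c) for c in all_codes)
```

Under these assumptions the longest code (4, 4, 4) takes 16 seconds and the shortest (1, 1, 1) takes 7.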

I *think* a better approach may be to alternate between a set of frequencies. E.g., 64 codes can be expressed as a sequence of six tones, each at one of two frequencies (2^6 = 64) -- ~8 seconds for the entire code to be emitted. I think even those folks who need a bucket to carry a tune could "remember" something like "dah dee dee dee dah dee".

OTOH, it's not intuitive to map dah's and dee's to "symbols" so a non-techy would never think of translating this to the more memorable ABBBAB or 011101 or ...
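A minimal sketch of that mapping, assuming six two-valued tones per code and the convention implied by the example above (dah = A = 0, dee = B = 1). The function names are hypothetical.

```python
TONE_NAMES = {"0": "dah", "1": "dee"}   # assumed: 0 -> one pitch, 1 -> the other
LETTERS = {"0": "A", "1": "B"}


def code_bits(code, length=6):
    """Fixed-width binary string for a code, e.g. 29 -> '011101'."""
    if not 0 <= code < 2 ** length:
        raise ValueError("code out of range")
    return format(code, "0{}b".format(length))


def tones_for(code, length=6):
    """Sequence of tone names the device would emit for this code."""
    return [TONE_NAMES[b] for b in code_bits(code, length)]


def as_letters(code, length=6):
    """The same code written as A/B symbols for a lookup table."""
    return "".join(LETTERS[b] for b in code_bits(code, length))


example = " ".join(tones_for(29))
```

Because every code has exactly six tones, a listener who misses one knows the code was misheard rather than silently reading it as a different, shorter code.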

Argh... tired of typing. Hopefully, this is enough to get some initial comments on various approaches. Let's avoid the visual analog of all this for the time being -- unless your argument relates to a duality that is or is not present in different encoding schemes...

Thanks,

--don


Rocky 15 years ago

What about Morse code? Nokia did it on the old cellphones for SMS.

dit-dit-dit dah-dah-dah dit-dit-dit

Our microwave would give the sound of a heart monitor flatlining when it finished.

blip --- blip --- blip --- beeeeeeep


upsidedown 15 years ago

At least make sure the modulation goes nicely through various cellular codecs. For instance, some kinds of amplitude modulation might be defeated by the AGC (automatic gain control) in the telephone signal chain.

For repeatable errors (such as power up errors), it would be handy to tell the customer to put the cellular phone close to the device and repeat the error. The support person could then listen for the sequence him/herself and not rely on the end user interpretation of the error code.

Do you really have that many error situations in which a more expressive error method can't be used?

If the problem is not repairable by the end user or local representative, is there much point in going into too much detail in the error codes?


Paul E. Bennett 15 years ago

Morse code is a nice idea, and those interested in what the error actually is should get used to reading the code easily enough, provided it is kept to a rate of about 6 to 7 words per minute at most. Faster rates need more concentration on the part of the listener.
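For a rough sense of how long a Morse error code would take at those speeds, here is a minimal sketch using the conventional "PARIS" timing (one unit = 1.2 / WPM seconds; dit = 1 unit, dah = 3, gap inside a character = 1, gap between characters = 3). Sending the code as plain digits, and the helper names, are assumptions for illustration.

```python
MORSE_DIGITS = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def unit_seconds(wpm):
    """Length of one Morse timing unit under the PARIS convention."""
    return 1.2 / wpm


def message_units(digits):
    """Total timing units needed to send a string of decimal digits."""
    total = 0
    for i, ch in enumerate(digits):
        pattern = MORSE_DIGITS[ch]
        total += sum(1 if element == "." else 3 for element in pattern)
        total += len(pattern) - 1          # 1-unit gaps inside the character
        if i < len(digits) - 1:
            total += 3                     # 3-unit gap between characters
    return total


def message_seconds(digits, wpm):
    return message_units(digits) * unit_seconds(wpm)
```

At 6 WPM a three-digit code such as 142 comes to 49 units, a little under 10 seconds.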

Tiered error-state fall-backs are sensible depending on the remaining capability of the system to function. Less easy to arrange in single processor systems but in the latest multi-core devices there is no reason why one of the cores couldn't be programmed to monitor and give the last ditch provision of useful debug information. Having used several processors in a system (each programmed for their own specific tasks) it nearly always made sense to have a robust system status monitor processor to communicate what was going on in the rest of the system.

********************************************************************
Paul E. Bennett...............
Forth based HIDECS Consultancy
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-510979
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************


D Yuniskis 15 years ago

Wow! What an *obvious* idea -- and one that I would have *completely* overlooked! This makes even more sense than the other "more expressive" mechanisms that I use *before* things deteriorate to this point!!

I.e., use something like the phone company uses for status messages -- preceding each spoken message with a series of tones that tell you what the message really contains (in abbreviated form). Much like prefacing a text message with a numeric code (that can be easily parsed by a piece of software).

Hmmm... I will need to rethink this approach with this in mind!

The problem is that the "remedy" beyond this point is very expensive -- factory service. This is 24 hour turnaround (!). Just the shipping costs alone (forget the labor to troubleshoot and repair) can eat you alive!

So, you want to be able to get as much information from the device *without* having it in your possession.

E.g., consider a laptop and a "casual user": "It doesn't work".

- is the battery discharged?

- is the battery incapable of holding a charge?

- is the device *actually* running off AC power?

- is the disk spinning up?

- is something in the BIOS configuration hosed?

- is the backlight dead?

- has the disk image been corrupted?

- is the CPU/GPU overheated (fan failure)? etc.

You *could* try to talk the user through this sort of diagnosis over-the-phone. *Maybe* he can hear a spinning disk. *Maybe* he can feel airflow from a cooling fan. Maybe he is telling the truth when he says the battery has been charging for 6 hours. etc.

But, that effort costs you time/money. And, it will aggravate the user when he has invested EVEN MORE TIME "doing YOUR work for you" (over the phone) only to face the possibility of being told, "Gee, we don't know what the problem is... send the unit in to us..."

All of these are worthwhile bits of information to have *before* resorting to "ship the 'defective' unit to us *while* we are 'overnighting' a replacement to you". Having a way to get information from the device itself without relying on the user's "interpretation" of symptoms, etc. is a big win.

The device *may* be "fixable" (not necessarily "repairable") by the user. E.g., if you can verify that the problem is a faulty battery, then you can just ship a replacement battery to the user for far less cost and EFFORT (yours as well as the user's) than replacing the entire device.

A "local representative" requires you to *have* folks in the field to support the devices. That imposes a big fixed cost "just in case" things *might* break. Easier if *you* can just give the user an answer and a remedy in short order.

E.g., apparently, a *huge* percentage of "returned" disk drives test as "No Defect Found". So, the manufacturer absorbs a big cost for testing and replacing those devices needlessly. And, runs the risk that the "replacement" device may be regarded as "similarly defective" (i.e., the first error was apparently user related or a consequence of some other aspect of the application that used the drive -- "software bug"?) which reflects badly on the disk manufacturer. (people remember returning the drive and might not remember that the real problem was proven as "elsewhere")

If, OTOH, the drive manufacturer could talk to the drive *without* having to rely on all the other bits of the "system" in which it was employed, then many of these problems could be avoided.


jacko 15 years ago

Emitting DTMF may be a good one, so that an automated "route to the right department" switchboard thing could happen.
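A minimal sketch of generating DTMF for an error-code digit, using the standard keypad frequency pairs (rows 697/770/852/941 Hz, columns 1209/1336/1477/1633 Hz). The sample rate, tone length, amplitude scaling and function names are assumptions; a real device would feed the samples to whatever DAC or PWM output it has.

```python
import math

DTMF_ROWS_HZ = (697, 770, 852, 941)
DTMF_COLS_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair in Hz for one keypad symbol."""
    if len(key) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(key)
            if c >= 0:
                return DTMF_ROWS_HZ[r], DTMF_COLS_HZ[c]
    raise ValueError("not a DTMF symbol: {!r}".format(key))


def dtmf_samples(key, duration_s=0.1, rate_hz=8000):
    """Sum of the two sine tones, scaled to stay within -1.0..1.0."""
    low, high = dtmf_pair(key)
    n = round(duration_s * rate_hz)
    return [
        0.5 * (math.sin(2 * math.pi * low * i / rate_hz)
               + math.sin(2 * math.pi * high * i / rate_hz))
        for i in range(n)
    ]
```

An error code such as "142" would then be emitted as three such bursts separated by short silences.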


robertwessel2 15 years ago

If you can do amplitude and frequency modulation, why not just speak the error codes? Of course that means you have to pick a language.


1 Lucky Texan 15 years ago

Consider a freq. SWEEP.

Perhaps a sweep UP - followed by beeps for the 'tens'.

then a sweep DOWN - followed by the unit beeps (0 or 1-9)

then pause or steady tone, then repeat.
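That scheme can be sketched as a sequence of symbolic events for a two-digit code. The event names and the decoder are hypothetical, added only to show that the two sweeps make the code unambiguous even when a digit is zero.

```python
def sweep_sequence(code):
    """Events for a code 0..99: sweep up, tens beeps, sweep down, unit beeps."""
    if not 0 <= code <= 99:
        raise ValueError("code out of range")
    tens, units = divmod(code, 10)
    return (["SWEEP_UP"] + ["BEEP"] * tens
            + ["SWEEP_DOWN"] + ["BEEP"] * units
            + ["PAUSE"])


def decode_sequence(events):
    """Recover the code by counting beeps after each sweep marker."""
    tens = units = 0
    section = None
    for event in events:
        if event in ("SWEEP_UP", "SWEEP_DOWN"):
            section = event
        elif event == "BEEP" and section == "SWEEP_UP":
            tens += 1
        elif event == "BEEP" and section == "SWEEP_DOWN":
            units += 1
    return tens * 10 + units
```

Here the sweeps act as the fixed markers: the listener always knows which digit the following beeps belong to.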

Not dissimilar to the problem automakers had with the pre-OBDII systems in cars, which would have a light flash long/short/very-long to yield trouble codes. (maybe check troublecodes.net or .org w'ever for examples)


upsidedown 15 years ago

When multi megapixel pocket cameras are common and cellular phone cameras reach at least VGA or even megapixel resolution, why not generate static error code displays and ask the customer to use a camera or cell phone to send a picture of that situation. We have been able to detect missing terminators and other wiring problems this way.

I mostly work for a company with offices in quite a few countries all over the world; unfortunately, in most of those countries (including the HQ) English is not the native language.

If you have time (at least a few days) to become familiar with the English dialect spoken by the other person in a different country, the communication starts to be productive.

However, in problem solving situations, people usually meet for the first time and the problem has to be solved in an hour or so, so there is no time to learn the dialect of the other partner.

For this reason, we try to encourage e-mail (and other written communication) to avoid the problems with dialects. Asking to take pictures is often more productive than trying to ask someone to do complex sequences, if we do not have a common language.


Rob Gaddi 15 years ago

I don't remember who, but at one point a PC motherboard manufacturer was, in addition to putting the BIOS beeps, putting down a single 7-segment LED. It was the easiest diagnostic tool I've ever used. Must have been some fairly high-end motherboard; I can't see most of them parting with those 29 cents lightly.

If you've got room and pins, it's not a bad answer. Flash each digit for a second, then blank the display for two seconds. Define your status codes so as to not use any numbers with two consecutive digits the same.
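A minimal sketch of that rule: enumerate the status codes that never show the same digit twice in a row (so each flash is visibly a new digit on a single 7-segment display), and build the flash schedule. The function names are hypothetical; the timings are the ones suggested above.

```python
def valid_codes(digits=2):
    """All fixed-width codes with no two consecutive digits the same."""
    codes = []
    for n in range(10 ** digits):
        s = str(n).zfill(digits)
        if all(a != b for a, b in zip(s, s[1:])):
            codes.append(s)
    return codes


def display_schedule(code, on_s=1.0, blank_s=2.0):
    """(glyph, seconds) pairs: each digit in turn, then a blank display."""
    return [(ch, on_s) for ch in code] + [(" ", blank_s)]
```

The restriction costs little: 90 of the 100 two-digit codes and 810 of the 1000 three-digit codes remain usable.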

Rob Gaddi, Highland Technology Email address is currently out of order


Jon Kirwan 15 years ago

BIOS 'post' codes were actually displayed on the front panel of some machines I worked on, starting nearer the end of the 80286 days and just before the 80386, memory serving. These responded to port 0x80 and 0x84, something like that? Two 7-seg digits wide. (I don't recall single digit cases.) They could be installed on the motherboard, extended out to case exterior, or added into an ISA slot, 8 or 16 bit, if available, or simply "across" the south bridge in some fashion. I still have a baggy of these post cards I could add into any machine with an 8-bit ISA slot. (Yes, I still have a few operating machines I use with ISA slots, too.)

I've got the same two-digit display on my newly purchased Gigabyte motherboard, though there is no south bridge and no ISA capability. So I'm not sure how it is accessed from software, anymore. It seems strange to imagine a PCI interface for it, so I'm guessing some of the chipsets include pins for the purpose or otherwise internally still catch those I/O addresses, after power-on configuration sets up the chipset.

Jon


D Yuniskis 15 years ago

Fails big time when a phone is *not* involved:

"Could you please tell me what the error tone was?"

"Well, it kind of made a bunch of beepy-boppy noises..."


D Yuniskis 15 years ago

Wrong audience. :> I doubt "GrandMa" is going to know Code! And, any implementation like that would be too "rich" for the information content it carried (unless you restricted yourself to all symbols of, e.g., 3 tones -- in which case, being able to map those tone sequences to "letters" doesn't really buy you much)

I think it is important that the format of the "error code" be rigidly constrained. If, like Morse, a symbol can have different numbers of elements (e.g., dit vs. dah-dah-dah), then you run the risk of a user failing to perceive/remember a portion of the code and effectively morphing one code into another. E.g., there is a reason you say "code 001, code 142, etc." and not "code 1, code 142, etc."


D Yuniskis 15 years ago

Gack! I, for one, don't have the "auditory acuity" (?) for such tasks (one reason I never took the Ham exam when I was younger)!

The same sorts of mechanisms can be applied in single processor devices *if* you plan for it from the start. Much harder to try to retrofit that capability into an existing system ex post factum -- unless you get lucky! :> It also is a lot easier if you design the hardware with this issue in mind.

E.g., to return to the laptop analogy I mentioned elsewhere...

If you can't control power to the disk drive, then a "low battery" will cause the entire system to "refuse to start" (because of the load placed on the battery by the disk trying to spin up dragging the power supply down *as* the processor is trying to report a failure, etc.)

"Reliability" is a wee bit harder to fit into designs, eh? ;-)


robertwessel2 15 years ago


It's still port 80h, with either word or byte writes to that port. That was one of the DMA ports on the PC/XT, but that changed with the AT. AFAIK, the PCI POST cards are handled as legacy devices and get the fixed assignment to port 80h. PCIe versions appear to exist.


Jon Kirwan 15 years ago

The problem I still have with accepting that is that there is no ISA bus, anymore, and not even a south bridge to emulate one. How are frontside bus I/O transactions moved through the chipset to such a display, these days, without the south bridge? I know it works. I can see it. I just know that it doesn't happen like it used to and am curious.

CPU hits an I/O instruction. Start there. Which frontside bus transaction takes place? If the chipset picks it up, how does it handle it? (Used to be a south bridge and sideband channel wires between the north and south bridge to mediate. Now there is no south bridge, no sideband.) There certainly are no ISA bus transactions, anymore, so the old logic methods don't apply.

Jon


robertwessel2 15 years ago


Not quite sure I understand the question - it's just a PCI (or PCIe) transaction to an I/O mapped BAR on a particular device. Now as to the specifics of a PCI POST card, I don't know, but once it's configured, it should be no different than (say) a legacy serial port or basic VGA support. The configuration question is somewhat interesting, but I assume most POST cards just supply a default configuration (or perhaps even a fixed one). Most bridges should come up with I/O address filters turned off, so I/O transactions should bounce around everywhere as soon as the bridges are enabled.


Thad Smith 15 years ago

I have used a variable number of beeps: beeeeeep, (pause) beep, beep, beep. The number of short beeps is the code. I think I used about 2 beeps / sec.

If you have many possible codes, then separate them into digits with a space in between.
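A minimal sketch of that scheme as a list of (state, seconds) events: one long attention beep, then each digit as a run of short beeps at about 2 per second. The long-beep length, the gap between digits, and sending a zero digit as ten beeps are all assumptions; the description above does not say how a zero is handled.

```python
LONG_BEEP_S = 1.0    # assumed length of the leading "beeeeeep"
SHORT_BEEP_S = 0.25  # 0.25 s on + 0.25 s off gives about 2 beeps per second
DIGIT_GAP_S = 1.0    # assumed pause after the long beep and between digits


def beep_events(code):
    """Build the on/off schedule for a non-negative integer code."""
    if code < 0:
        raise ValueError("code must be non-negative")
    events = [("on", LONG_BEEP_S), ("off", DIGIT_GAP_S)]
    digits = str(code)
    for i, ch in enumerate(digits):
        count = int(ch) or 10   # assumption: a zero digit is sent as ten beeps
        for _ in range(count):
            events.append(("on", SHORT_BEEP_S))
            events.append(("off", SHORT_BEEP_S))
        if i < len(digits) - 1:
            events.append(("off", DIGIT_GAP_S))
    return events


def total_seconds(events):
    return sum(seconds for _, seconds in events)
```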

Thad


D Yuniskis 15 years ago

Speech (crude) takes about 40KB (or are you just suggesting something like canned saydigit(digit)?)


D Yuniskis 15 years ago

I've seen similar front panel displays. One even showed which disk block was being accessed (um, isn't this as useful as the myriad "status lights" on old mainframes? :> )

I was hoping to find a solution that had an audio counterpart (i.e., "blink" in contexts where visual indicators make sense; "beep" where they don't)
