Referencing 42 bits

Base 36? Or, if you want some symbols not to appear, base 36-x.

--
Reinhardt
Reply to
Reinhardt Behm

It's early AM here, no coffee in the house, so please forgive the following: :)

I want to take 42 bits of information and encode, or condense, them into a small string of text so that I can easily, and even verbally, communicate a whole bunch of information at once. The receiving end would have an appropriate decoder.

Printable characters only. And ideally, restricted to 0-9 and A-Z (uppercase, avoiding lowercase), although a dash could be used if it ended up reducing the character count in the encoded message. Maybe a couple of others, but I want to stay away from things like "?", ";", "%", etc.

Even more ideally, eliminate easily confused characters such as "O" and zero, I, and L. Although if it's uppercase only, the I and L can be kept, since the confusion arises with their lowercase equivalents.

Maybe I'm just overthinking this?

Got any ideas? Maybe I should just see what 2^42 comes out to be in decimal or hex. But it seems I could get it shorter.

Reply to
mpm


I don't know what rules someone like Microsoft uses when they create one of those "6 groups of 5 characters each" software license IDs, but for maximum safety I think that, as a minimum, you should leave out zero, one, five, and the letters I, J, L, O, and S. Someone may write the code down in lowercase without knowing which characters you excluded, and could then try to "guess" a 1 for an l, for example, so don't give them the chance. You don't want to confuse zero and O, 1 and L, 5 and S, i and l, or I and J. There may be other pairs, but these are the biggies I can think of off the top of my head.

2^42 is 4.4e12. Using 2-4, 6-9 and A-Z minus I, J, L, O, and S leaves 28 characters. 28^8 is only 3.8e11, but 28^9 is 1.1e13, so you will need 9 digits for your data. Write it as xxxx-xxxxx or xxx-xxx-xxx, since smaller groups are easier to copy and remember. Just my layman's thoughts, anyway :-).
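The 28-character scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the symbol order in `ALPHABET`, the function names `encode42` / `decode42`, and the xxx-xxx-xxx grouping are choices made here, not something the thread fixes.

```python
# Digits 2-4 and 6-9, plus A-Z without I, J, L, O, S (28 symbols).
# The ordering of the symbols is an arbitrary assumption.
ALPHABET = "2346789ABCDEFGHKMNPQRTUVWXYZ"
BASE = len(ALPHABET)   # 28
CODE_LEN = 9           # 28**9 >= 2**42, while 28**8 is too small


def encode42(value: int) -> str:
    """Encode a 42-bit integer as xxx-xxx-xxx using the 28-symbol alphabet."""
    if not 0 <= value < 2**42:
        raise ValueError("value does not fit in 42 bits")
    symbols = []
    for _ in range(CODE_LEN):
        value, digit = divmod(value, BASE)
        symbols.append(ALPHABET[digit])
    text = "".join(reversed(symbols))       # most significant symbol first
    return "-".join(text[i:i + 3] for i in range(0, CODE_LEN, 3))


def decode42(text: str) -> int:
    """Decode a code written by encode42; dashes and letter case are ignored."""
    symbols = text.replace("-", "").upper()
    if len(symbols) != CODE_LEN:
        raise ValueError("expected 9 symbols")
    value = 0
    for ch in symbols:
        value = value * BASE + ALPHABET.index(ch)  # ValueError on excluded symbols
    if value >= 2**42:
        raise ValueError("code is outside the 42-bit range")
    return value
```

Because 28^9 is more than twice 2^42, a fair share of random 9-symbol strings decode to out-of-range values and get rejected, which gives a small amount of error detection for free.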

----- Regards, Carl Ijames carl.ijames aat deletethis verizon dott net

Reply to
Carl Ijames

2^42 = 32^8.4

so with a 32 symbol set it would take 9 symbols.

Even with a 37 symbol set it would still take 9.

however: 26^9 is also more than 2^42, so with only 9 letters you can still cover 2^42.

2^42 = 4398046511104
37^8 = 3512479453921 - not enough
26^9 = 5429503678976 - sufficient

Perhaps drop the vowels and substitute in the digits 3, 4, 6, 7, 9 to avoid accidentally spelling offensive words.
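The symbol counts quoted in this thread are easy to check with exact integer arithmetic. A small sketch (the helper name `min_symbols` is made up for illustration):

```python
def min_symbols(bits: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size**n >= 2**bits (exact, no floats)."""
    target = 2**bits
    n, capacity = 0, 1
    while capacity < target:
        capacity *= alphabet_size
        n += 1
    return n


if __name__ == "__main__":
    for size in (16, 26, 28, 32, 36, 37, 39, 64):
        print(f"{size:2d} symbols -> {min_symbols(42, size)} characters")
```

Integer multiplication is used instead of logarithms so that borderline cases are not decided by floating-point rounding.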

--
umop apisdn
Reply to
Jasen Betts

For 42 bits, an alphabet of length
64 = 2^6 would take 7 characters (exactly),
32 = 2^5 would take 8.4 characters,
16 = 2^4 would take 10.5 characters.
2^5.25 = 38.05, so you need 39 symbols to get it into 8 characters (a pleasing length).

Beware of 0OQ, I1l, 5S, and 8B. My pet hate is when secure registration codes include one or more of these in some fancy, hard-to-read 6pt font.

My instinct would be to use all 26 letters plus 234679 (= 32) and then seven others, such as + - / * @ #, to make up 39.

In hex it will be 11 characters long (and it has the 8 B ambiguity).

--
Regards, 
Martin Brown
Reply to
Martin Brown

Ah! That makes it easy! 2^42 is *about* 4*1000^4 (2^10 ~= 1000). So, build a dictionary of ~1500 words (or, ideally, 4 *different* dictionaries, each of ~1500 words). Assign a unique 11 bit number to each of the words in the first (second, third and fourth) dictionaries. Each word then carries about 10.5 bits (1500^4 is just over 2^42), so treat the value as a four-digit number in base ~1500: use the first dictionary to encode the first digit, the second for the next digit, etc. You end up with a four word phrase:

yellow dog house runs

Or, make the dictionaries smaller (encode just *6* bits -- so 64 words in each dictionary) and use a *seven* word phrase.

By keeping each dictionary "disjoint" from the others, you also eliminate the possibility of word transposition errors creeping into the phrase:

dog yellow runs house

would not be valid -- because "dog" is only present in the *second* dictionary, "yellow" only present in the *first*, etc. So, you have a sort of check algorithm built into the encoding.

Alternatively, you can view it as providing more flexibility to the user (he can mix up the words and you can *still* recover the original data!). Or, more flexibility in how you rearrange those words to make a more memorable "phrase":

yellow dog runs house

is probably more memorable than

house yellow dog runs

This is practical because you claim the receiving end will have an "appropriate decoder" (yet haven't specified how complex that decoder will be -- so, a little piece of software or a list of words, etc.)
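A rough sketch of the four-dictionary idea, to show how disjoint dictionaries let the decoder undo shuffled word order. Everything named here is an assumption for illustration: the word lists are synthetic placeholders (a real deployment would use about 1449 carefully chosen real words per slot), and `encode_phrase` / `decode_phrase` are made-up names.

```python
BASE = 1449  # smallest dictionary size with BASE**4 >= 2**42
assert (BASE - 1) ** 4 < 2**42 <= BASE**4

# Placeholder vocabularies: four disjoint lists, one per word position.
DICTS = [[f"{prefix}{i:04d}" for i in range(BASE)]
         for prefix in ("adj", "noun", "place", "verb")]

# Because the lists are disjoint, a word alone tells us its slot and its index.
LOOKUP = {word: (slot, idx)
          for slot, words in enumerate(DICTS)
          for idx, word in enumerate(words)}


def encode_phrase(value: int) -> str:
    """Write a 42-bit value as four words, one from each dictionary."""
    if not 0 <= value < 2**42:
        raise ValueError("value does not fit in 42 bits")
    digits = []
    for _ in range(4):
        value, digit = divmod(value, BASE)
        digits.append(digit)
    digits.reverse()  # most significant digit goes with the first dictionary
    return " ".join(DICTS[slot][d] for slot, d in enumerate(digits))


def decode_phrase(phrase: str) -> int:
    """Recover the value; the words may arrive in any order."""
    digits = [None] * 4
    for word in phrase.lower().split():
        slot, idx = LOOKUP[word]          # KeyError for an unknown word
        if digits[slot] is not None:
            raise ValueError("two words from the same dictionary")
        digits[slot] = idx
    if None in digits:
        raise ValueError("a word is missing")
    value = 0
    for d in digits:
        value = value * BASE + d
    if value >= 2**42:
        raise ValueError("phrase is outside the 42-bit range")
    return value
```

A stricter decoder could instead reject any phrase whose words are out of order, which is the "check algorithm" reading of the same property.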

You can add words and shorten dictionaries. Then, by carefully choosing the words in each dictionary (i.e., word position), you can create pseudo-meaningful sentences:

- first word is a 2 digit number between one and thirty-two (it encodes 5 *obvious* bits)

- second word is an adjective from a list of eight colors to encode three bits

- third word is one of 32 nouns (cow, dog, horse, bottle, etc.) to encode 5 more bits

- fourth word is a verb from a list of 16 verbs to encode 4 bits

- fifth is a preposition from a list of...

"Twelve blue artichokes slid down..."
"Eight green pigeons ran along..."

Sure, they're nonsense. But, they have the same benefits of the dictionary approach listed above ("Blue artichokes twelve down slid..." is obviously not a valid encoding!) and are more memorable to a casual user. Can you recall the "authorization code" from *any* of your software licenses FIVE SECONDS after having READ IT?!
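When each word position has a different list size, the value is split as a mixed-radix number. A minimal sketch using only the four list sizes given above (32 numbers, 8 colors, 32 nouns, 16 verbs); the remaining positions are left out because the post does not specify them, and the function names are hypothetical.

```python
# List sizes for number, color, noun, verb, as given in the post.
# Together they hold 32 * 8 * 32 * 16 = 2**17 combinations (17 bits).
RADICES = (32, 8, 32, 16)


def to_slots(value: int, radices=RADICES) -> list:
    """Split value into one index per word position (first position first)."""
    capacity = 1
    for r in radices:
        capacity *= r
    if not 0 <= value < capacity:
        raise ValueError("value does not fit in these word lists")
    slots = []
    for r in reversed(radices):
        value, index = divmod(value, r)
        slots.append(index)
    return slots[::-1]


def from_slots(slots, radices=RADICES) -> int:
    """Inverse of to_slots."""
    if len(slots) != len(radices):
        raise ValueError("wrong number of words")
    value = 0
    for index, r in zip(slots, radices):
        if not 0 <= index < r:
            raise ValueError("index outside its word list")
        value = value * r + index
    return value
```

Each index would then select a word from the list for its position. The same functions work unchanged if a list size is not a power of two.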

If you want a *simpler* decoder, you could adopt something similar; encode data in consonants and allow vowels to be inserted at will. So, B=0000, C=0001, D=0010, F=0011, etc. Then, "make up" pronounceable strings by inserting vowels as convenient:

FoD = 0011 0010

This is harder to *create* encodings but trivial to decode them (just elide the vowels and convert the consonants).
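The decoder for the consonant scheme really is tiny. A sketch, assuming the "etc." continues through the consonants in alphabetical order (B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T for 0000 through 1111); that continuation and the function name are assumptions, not something the post spells out.

```python
# Assumed mapping: the first 16 consonants in alphabetical order, 4 bits each.
CONSONANTS = "BCDFGHJKLMNPQRST"
VOWELS = set("AEIOU")


def decode_consonants(text: str) -> str:
    """Drop the vowels and return the 4-bit groups, e.g. 'FoD' -> '0011 0010'."""
    groups = []
    for ch in text.upper():
        if ch in VOWELS:
            continue                         # vowels carry no data
        index = CONSONANTS.index(ch)         # ValueError for anything unmapped
        groups.append(format(index, "04b"))
    return " ".join(groups)
```

For 42 bits this needs 11 consonants (44 bits), so two bits are spare.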

Reply to
Don Y


This looks promising... working on it now. Thanks to everyone here. -mpm

Reply to
mpm

Another consideration, error detection and correction, argues that you WANT some redundancy: some extra symbols that can act as a test of the transmission. There are many examples of these: parity bits, casting out nines, and ISBN (the numbers on books' barcodes have a final check symbol) for numeric info come to mind. Hashes and MD5 checksums get more elaborate, and you can go all the way to ECC (error-correcting codes).
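As one concrete instance of a check symbol, here is a sketch of the ISBN-10 check digit (weights 10 down to 2 over the first nine digits, total taken modulo 11, with "X" standing for 10). The function names are made up here; the same weighted-sum idea could be adapted to whatever alphabet the 42-bit code ends up using.

```python
def isbn10_check_digit(first9: str) -> str:
    """Compute the ISBN-10 check symbol for the first nine digits."""
    if len(first9) != 9 or not (first9.isascii() and first9.isdigit()):
        raise ValueError("expected exactly nine digits")
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn10_is_valid(isbn: str) -> bool:
    """True if the ten-character ISBN (dashes allowed) passes the check."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10 or not (chars[:9].isascii() and chars[:9].isdigit()):
        return False
    return isbn10_check_digit(chars[:9]) == chars[9]
```

Because 11 is prime and the weights are all different, this check catches any single wrong digit and any swap of two adjacent digits, which are the usual copying mistakes.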

Reply to
whit3rd

uuencode is an early example of a program that does exactly this; it translates 8-bit data (usually from a file) into 7-bit ASCII that would survive early email and file-transfer protocols. It uses the whole ASCII set from space to _, so it includes a lot of punctuation that you may not want. xxencode uses a more restricted character set which may be closer to what you want. Base64 is a newer development of the same idea.


An advantage to these is that code already exists in lots of programming languages to handle them.

One knock on all of these is that they use both uppercase and lowercase letters. If you always transmit this data with a computer, this doesn't matter so much, but humans might not know that the case is significant.
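For an off-the-shelf encoding that avoids the case problem, Base32 is a close relative of Base64 that uses only A-Z and 2-7, and Python's standard `base64` module supports it. A minimal sketch (the wrapper names are made up; packing the 42 bits into 6 bytes is a choice made here):

```python
import base64


def encode42_base32(value: int) -> str:
    """Encode a 42-bit integer as 10 uppercase Base32 characters."""
    if not 0 <= value < 2**42:
        raise ValueError("value does not fit in 42 bits")
    raw = value.to_bytes(6, "big")                 # 42 bits padded to 48
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def decode42_base32(text: str) -> int:
    """Inverse of encode42_base32; lowercase input is accepted."""
    padded = text.upper() + "=" * (-len(text) % 8)  # restore stripped padding
    value = int.from_bytes(base64.b32decode(padded), "big")
    if value >= 2**42:
        raise ValueError("code is outside the 42-bit range")
    return value
```

The price of reusing library code is length and alphabet: byte alignment pushes the result to 10 characters instead of 9, and the standard Base32 alphabet still contains I, L, O and S.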

Matt Roberds

Reply to
mroberds

And, the issue of O0Q, 1l|, 8BE (E being a B that has "faded" over time or due to mechanical damage to a label, etc).

I *despise* MS's CoA's as, invariably, there are one or two characters that require a judgement call -- get it wrong and you risk the machine "tattling" on your "misdeeds".

I think it is important to remember that people have reasonably short (as in "number of items") memories -- the "magic seven" concept. So, beyond 7 ARBITRARY characters, most people will have to resort to writing things down. Witness how few folks commit LENGTHY passwords to memory.

I prefer the "encoded phrase" approach as it is usually easier for folks to remember such -- even if it is nonsensical. And, the inherent structure of the phrase helps reinforce that memory: "It was something about 25 ponies and a swan... and something was BLUE!" (i.e., "blue" only is allowed as the 3rd word in the phrase so even though the user forgot the phrase, they've remembered enough of it that you can reconstruct it -- far easier than "one of the digits was a '3'...")

I also like the "read it to me over the phone" test; if you have to ask "was that 'capital B' or just 'b'?" then you've compounded the memory and transfer actions. "Was that an 'eff' or an 'ess'?"

[Of course, you also have to consider *how* the data will be exchanged; punching digits on a DTMF keypad places different restrictions on the data format than "reading words to a human operator". Even then, the choice of words matters: "was that 'buck' or 'but'?"; "'safe' or 'save'?"; etc -- esp over a noisy or bandwidth limited channel!]

Regardless, the fact that he's *thinking* about the issue instead of just blindly opting for an "obvious" solution is 2 steps in the right direction!
Reply to
Don Y

That's neat, Don!

Might be a good way to encode passwords into something memorable.

--
Regards, 

Adrian Jansen           adrianjansen at internode dot on dot net 
Note reply address is invalid, convert address above to machine form.
Reply to
Adrian Jansen

When I was a "pre-teen" (is there such a thing?), I was involved in some research projects that did things like test cognition, etc.

One of the tasks was "random digit string recall" -- a string of digits would be recited (of course, length kept increasing -- 9, 12, 15, 18, etc. digits long) and you were tasked with remembering the digit sequence for recall.

Initially, just remember it long enough for it to be *recited* to you. But, over time, the experiment changed so you'd have to recall it "later in the day" (meanwhile, being challenged with other tasks). Then, given multiple strings and asked to recall a *particular* one (unknown at the time they are initially recited to you).

From this, it was apparent that people can't "hang onto" this much CONTEXT-FREE detail very reliably. And, that an "association" greatly improves retention. Anything that helps provide a framework against/upon which you can build neural pathways, etc. to reinforce those "arbitrary" facts.

[I keep lots of passwords, account numbers, PINs, phone numbers, etc. "in my head" because it seems easy for me. Other folks can't seem to keep track of a 4 digit PIN without writing it down! OTOH, I have to "visualize" what I hear -- my "auditory memory" sucks!]

Spoken words carry lots of side-channel information that helps reinforce their associations. E.g., rules of grammar remind you that it has to "make sense" (even if it is nonsense!) from a parts of speech perspective. You also make note of cadence as you "hear it" -- even if you are hearing it in your mind. Etc.

Places like IKEA use a similar idea -- don't have model numbers (which NO ONE will remember or bother to jot down); instead, give things *names* -- even if they have no relationship to the item's "nature": "Ah, that's a Jerome!"

Or, design the password *generator* to create "memorable pass phrases" for you (this has been done) instead of silly "license plate style" passwords (JYQ634L, etc.)

I've been looking into something similar for "error codes"; a numeric code is about as useful as t*ts on a bull. Too easy to ignore or "fail to remember". And, textual messages are too lengthy and easy to MIS remember ("Oh, it was something about an underline..." "Do you mean *newline*??").

OTOH, a couple of words might be odd enough that they would stick in the user's memory.

Or not.

Reply to
Don Y

Ooooh, I hope you name them after hurricanes (or use the same naming convention). ;-)

"The thing stopped working and just said 'Bertha'! What the hell?" "Ah, you had an IP configuration error..." "A what?!" ...

Tim

--
Seven Transistor Labs, LLC 
Electrical Engineering Consultation and Contract Design 
Website: http://seventransistorlabs.com
Reply to
Tim Williams

Base 36 (0..9, A..Z) is an obvious choice, but note that it takes 9 characters to cover 42 bits (36^8 is about 2.8e12, less than 2^42); 7 groups of 6 bits would need a 64-symbol set. The only problem I see with this is that you will not be able to compress it so easily. If you were to use one character per bit, you could then have variable-length blocks, because you simply don't send a character for a bit that isn't on. This gives you the choice of sending, let's say, nothing but a marker character if all bits are off, or just "0" if only the first bit is on, "5" if bit 5 is on, several characters grouped together, and so on.

Jamie

Reply to
M Philbrook

Ha! No, not enough *names* to choose from! :< "Lawanda Rae" "Bubba Leroy" "Bernard Thomas" etc.

Actually, it's not quite that humorous. A short message describes the problem (so folks familiar with the situation needn't be bothered with a *lengthy* message). If the user doesn't understand ("needs help"), a hot link to a diagnostic/explanatory section of the online manual is offered up.

Originally, I was just going to let users report the "manual page number" (this is an oversimplification) and, from that, understand the nature of *their* problem. Or, a "section number" (6.1.5a, etc.).

But, I figured it would be easier for them to reference some "catch phrase" than to remember a string of digits, etc. E.g., folks recall a "blue screen of death" without really recounting the *contents* of that screen! With time/experience, users will learn that the "catch phrase" is the key to quicker problem resolution!

Reply to
Don Y

Nice one Don! Could limit it to three word sentences to make it easy for each sentence combination to make proper sense. Say 10 words in each of subject, verb and object data sets, or 4 sentences total, with one lone word at the end to polish off the last few bits.

--

Best Regards, 

ChesterW 
+++ 
Dr Chester Wildey 
Founder MRRA Inc. 
Electronic and Optoelectronic Instruments 
MRI Motion, fNIRS Brain Scanners, Counterfeit and Covert Marker Detection 
Fort Worth, Texas, USA 
www.mrrainc.com 
wildey at mrrainc dot com
Reply to
ChesterW

There's always a tradeoff between sizes of each "dictionary" and the possibility of a misremembered *word* "colliding" with another *good* word.

"Er, I think it was 'ten blue gulls'..." "Are you sure it wasn't 'ten blue penguins'?" (because "gulls" isn't present in the smaller dictionary -- while it may well be, of necessity, in a larger one)

It really isn't hard to get a 5 or 6 word phrase together without resorting to huge dictionaries. But, you *do* need to exercise some care in how you choose the words for those dictionaries so folks don't *confidently* misremember. Ideally, any faults on the user's part would result in things like "something about ten blue birds of some sort" (leaving *you* to conclude "gulls" -- or penguins, etc. -- as the other candidates were: snake, dog, lion, goldfish, mouse, ...)

Keep in mind the role the "key" (code) plays and the *mood* of the user when they encounter it. E.g., in my case (error code), they are *probably* frustrated and angry when they *need* to remember the "code". "Damn thing doesn't work!" (no, damn thing doesn't work the way you *expect* it to work in these circumstances; the *code* explains why!) Having them recite an N digit error code -- or worse, an ambiguous error message that may be issued in 5 different places for 10 different reasons! -- is probably NOT going to be greeted with cooperation or understanding.

[Consider what it must be like for a Support person to ask an irate caller: "Are you sure it is PLUGGED IN?" I suspect most callers would hear that as an insult -- even those who HADN'T verified that it was plugged in! (no one wants to admit they started on a tirade only to discover THEY had done something stewpit). And, an "advanced" user surely doesn't want to wait for Support to lead him through a checklist of all the silly "check this first" stuff on the script!]

OTOH, if it is an odd enough phrase, there's a chance that it may have stuck in their mind -- especially if they *knew* to look for it! "I don't know... something about a green walrus..."

Even if they only remember part of it, as each "word" can represent more than a single "digit" (in an error code), the number of possible cases is greatly reduced.

Dunno how it *will* work. But, I *do* know that too many things leave you at cliff's edge when something unexpected happens. It's as if the designers just threw up their hands -- they were conscientious enough to *detect* the "condition"... but hadn't thought through what they were going to *do* about it! "Out of memory" or "Out of memory in Foobledydo() constructor". Wunnerful.

Reply to
Don Y

Uh-humm. Don.... Next time, I think I'm going to grab that cup of coffee first!! :)

Blue artichokes?

Nice idea, though. I will definitely tuck that idea away for possible use down the road....

Thanks again, everyone. Problem solved.

Reply to
mpm

Make sure it's *IRISH*! ;-)

Actually, if you ever get the chance to see an artichoke in bloom, they are spectacular! Bluish-purple.

(scroll through the photos -- worth the 60 seconds to do so!)

Makes you really reluctant to eat them if you grow them! (I've had no luck in my two attempts at growing them :< )

Reply to
Don Y

For verbally transmitted bits, see the word encoding in RFC 1760. It is not the most efficient, but you can encode 42 bits into 4 English words.
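RFC 1760 uses a 2048-word dictionary, so each word carries 11 bits and four words carry 44. A sketch of just the bit slicing (the word list itself is in the RFC and is not reproduced here; the function names are made up for illustration):

```python
WORD_BITS = 11                     # 2048-word dictionary -> 11 bits per word
WORD_MASK = (1 << WORD_BITS) - 1


def to_word_indices(value: int, words: int = 4) -> list:
    """Split a 42-bit value into dictionary indices, most significant first."""
    if not 0 <= value < 2**42:
        raise ValueError("value does not fit in 42 bits")
    return [(value >> (WORD_BITS * i)) & WORD_MASK
            for i in reversed(range(words))]


def from_word_indices(indices) -> int:
    """Rebuild the value from the dictionary indices."""
    value = 0
    for idx in indices:
        if not 0 <= idx <= WORD_MASK:
            raise ValueError("index outside the 2048-word dictionary")
        value = (value << WORD_BITS) | idx
    return value
```

Each index would then be looked up in the 2048-word list. Four words leave two spare bits, which could hold a small checksum.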


--
mikko OH2HVJ
Reply to
Mikko OH2HVJ
