Skybuck's Universal Code 4 (memory efficient)

- S
- Skybuck Flying
  
  Contact options for registered users
posted
16 years ago

Sat, May 26, 2007 7:01 AM

Skybuck's Universal Code 4:

The first version of Skybuck's Universal Code describes the basic idea but it's memory inefficient for large ammounts of data/information.

For each data bit a marker bit was necessary.

Also the interleaving of bits could be cumbersome for software and maybe even hardware ;)

Thus I present to you a somewhat more efficient universal code, which is still variable bit-based but more efficient, though I am not sure if I like the bitness of it... maybe I'll think up a byte version for intel system which are octal/byte based :)

For now however the basic concept for version 4 is:

Length1 + Length2 + Data

This represent 3 fields stuck together in the information stream.

The first field describes the length of the second field. The second field describes the length of the data.

The first field is like a marker for the second field, and the second field is length for the data in binary encoding.

So I could also be described as:

Marker + Length + Data

But this could confuse people with version 1... where the term marker ment a full marker...

So maybe describe it better as:

LengthMarker + Length + Data

or

LengthOfLength + Length + Data.

The first field consists out of zero's with a 1 terminator to describe the length of the second field, the second field uses binary encoding, and the data can be anything.

An example:

Suppose the following data has to be encoded/stored:

1234567890123 Hello World !

13 characters.

Length = 13 + 1 for null terminator = 14.

Bits needed to encoded 14 in binary: 0x1 + 1x2 + 1x4 + 1x8 = 4 bits.

Thus also 4 bits needed for the marker.

The information stream as stored as follows:

length marker:

0001

length:

0111

data: xxxxxxetc.

00010111xxxxxxetc

the length field describes the data length in bits to make it bit flexible.

So now software can read the information stream as follows:

// read length marker:

LengthMarkerCount := 0; repeat if ReadBit( Bit ) then begin LengthMarkerCount := LengthMarkerCount + 1; end else begin break; // error. end; until Bit = 1;

// allocate length field.

SetLengthInBits( Length, LengthMarkerCount );

// read length: LengthCount := 0; repeat if ReadBit( Bit ) then begin SetBit( Length, LengthCount, Bit ); LengthCount := LengthCount + 1; end else begin break; // error, or retry, inform, etc. end; until LengthCount = LengthMarkerCount;

// allocate data field in bits.

SetLengthInBits( Data, Length );

// read data DataCount := 0; repeat if ReadBit( Bit ) then begin SetBit( Data, DataCount, Bit ); DataCount := DataCount + 1; end else begin break; // error or retry. end; until DataCount = Length;

// ofcourse the example would not be complete without a write example how to encode information so let's do that too:

For example take the string as example.

DataCountInBits := Length( 'Hello World !'+#0 ) * 8; LengthCountInBits := BitNeededFor( DataCountInBits ); // use log functions to determine bits needed to encode the value in binary. LengthMarkerCountInBits := LengthCountInBits; // same

// now first write the marker:

if LengthMarkerCountInBits = 0 then exit; // strange error which should not happen.

LengthMarkerCount := 0;

repeat if LengthMarkerCount = LengthMarkerCountInBits then begin // write terminating bit. if WriteBit( 1 ) then begin LengthMarkerCount := LengthMarkerCount + 1; end else begin // error, or retry, inform, etc. end; end else begin // write leading bit(s) if WriteBit( 0 ) then begin LengthMarkerCount := LengthMarkerCount + 1; end else begin // error, or retry, inform, etc. end; end;

until LengthMarkerCount = LengthMarkerCountInBits;

// now write the binary length field.

if LengthCountInBits = 0 then exit; // strange error which should not happen.

LengthCount := 0;

repeat if LengthCount = LengthCountInBits then begin if GetBit( Length, LengthCount, Bit ) then begin // write binary length bit. if WriteBit( Bit ) then begin LengthCount := LengthCount + 1; end else begin // error, or retry, inform, etc. end; end else begin break; // memory error. end; end;

until LengthCount = LengthCountInBits;

// now write the data field:

if DataCountInBits = 0 then exit; // strange error which should not happen.

DataCount := 0;

repeat if DataCount = DataCountInBits then begin if GetBit( Data, DataCount, Bit ) then begin // write data bit. if WriteBit( Bit ) then begin DataCount := DataCount + 1; end else begin // error, or retry, inform, etc. end; end else begin break; // memory error. end; end;

until DataCount = DataCountInBits;

There, that should give you an idea of how to do it. Ofcourse this is only concept code... not really tested it... but should work nicely.

Some random and some crappy thoughts about using it in software development source codes (can be skipped for reading):

(* I am not sure if I like working with bits like this.. so as I wrote before... I might think up a byte version of the universal code... which might be more easy to program and handle...

not sure yet... working with bits can be straight forward to... but then again sometimes not... depends on the source/classes etc... don't know really, haven't tried yet...

Extra classess, and methods definetly needed... would be handy to have some code which can simply encoded any type of data really... like a pointer and a size or so... probably easy to make... but also everything has to be "serialized" and "deserialized" which is a bit weird and such... also offsets go out the window ;)

- M
- MooseFET
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sat, May 26, 2007 1:49 PM

On May 26, 12:01 am, "Skybuck Flying" wrote: [...]

If you go back and look in the responces to the earlier posts you made on this subject, you will find a couple far more efficient and less limiting suggestions were made back then. They would have allowed you to transmit 8 bit data in far less than the 14 bits your method is suggesting.

This method you are now posting is nothing new. I was something like it implemented in the 1980s.

BTW: You can do better if you leave the LSBs off the numbers.

- S
- Skybuck Flying
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sat, May 26, 2007 2:57 PM

I read all reactions and I can not remember seeing such a suggestion.

It's funny to see you did not get the method correct or maybe you just made a sloppy mistake.

For 8 bits the encoding will be:

8 bit for data.

Prefixed with:

The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

Prefixed with:

The marker for the length field which is: 4 bits as well.

For a total of 8 + 8 = 16 bits :)

Maybe, maybe not.

You made same claims without backing them up with evidence :)

I do not understand what you are hinting at.

Leaving out the least significant bit leads to inprecision ?

I don't want that !

Thanks anyway, funny to see america wake up and see my post around this time :)

Bye, Skybuck.

- M
- MooseFET
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sat, May 26, 2007 7:40 PM

No, it appears I gave your credit for understanding something you didn't.

That is in effiecient. You never need to code for a zero length so the length field would only need 3 bits.

Again it is not needed to allow for the zero length case so this is only 3 bits.

So it is worse than I thought.

Oh well, you can decide not to believe me if you like. Its up to you.

There is no need to have an LSB on the length specifier. The number can have an extra MSB in it for the case when needed to round it up to the next length that can be specified. This reduces the length field and the length of the length field.

It is still quite an inefficient way to do it. It is also limited in length. There are far better options. Take a look at Huffman codes you will see how they handle using differing numbers of bits. Any of several systems based on those ideas would work better for you.

- S
- Skybuck Flying
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 2:18 AM

Ok,

Now I understand what you're saying.

Marker:

1

Length

0

Data

Missing.

Encoding an empty universal code could be handy for software which uses classess to implement universal codes and simply uses this encoding to describe "no data".

You mention other systems, are they universal codes, meaning they describe their own length and scalable indefinetly ?

I know static huffman does not scale, it's static afterall ;)

Nor will dynamic huffman probably. How will you encode the new character ?

Again you do not seem to crash the powerfull concept of a universal code =D or you mistakenly think other systems are universal codes ;)

Nor will I modify the universal code just to save one bit and create a mismatch in binary system... cause that what your suggestion would mean.

0 would no longer represent zero... but 1

- M
- MooseFET
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 3:07 AM

You've used more than 2 bits to say "no data" and 3 bits to say 1 or zero.

Better coding based on the idea behind Huffman :

00 - Nodata 01 - True 10 - False 11 - All other cases

Now we have the three simple cases done now we can decide how most efficient path goes from here. I'm lazy so I will only state a possible path that stays more efficent than your suggestion:

At 2 bits, your method would take over 4 bits so we can step directly to the 4 bit groups and always be better than your way:

1100 - 0 plus the following bit as a pair 1101 - 1 plus the following bit as a pair 1110 - Three bits follow 1111 - All other cases

This method can be extended without limit and will always be better.

The last time you posted this, someone pointed out a better pattern than this but this one is good enough to prove the point.

- S
- Skybuck Flying
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 5:19 AM

Also,

Your reasoning must be inheritly flawed.

Huffman compression is based on the idea that some characters or data will occur more often then others.

Binary encoding is the most efficient "general" encoding where no occurence statistics are known before hand.

Then there is also dynamic huffman compression which learns the occurence as it views the data... however this requires describing new characters...

And a very important question is how do you describe/encode the new characters ?

Fixed bits for new characters/data is ofcourse flawed, because it's not variable bit, thus not an universal code :)

Some without completely understand what your little tiny encoding thingy was about.. it's probably based on flawed reasoning in the first place :)

:::

Bye, Skybuck ;)

P.S.: I like my universal encod>> Ok,

- J
- Jamie
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 3:55 PM

You're out of your league boy!. too many old timers here that's been around the barn more times than you can count!.. Read up on Huffman.

--
"I'm never wrong, once i thought i was, but was mistaken"
Real Programmers Do things like this.
http://webpages.charter.net/jamie_5

- J
- Jim Thompson
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 4:08 PM

I'm so old I had Dave Huffman for an instructor ;-)

...Jim Thompson

--
|  James E.Thompson, P.E.                           |    mens     |
|  Analog Innovations, Inc.                         |     et      |
|  Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    |
|  Phoenix, Arizona            Voice:(480)460-2350  |             |
|  E-mail Address at Website     Fax:(480)460-2142  |  Brass Rat  |
|       http://www.analog-innovations.com           |    1962     |
             
         America: Land of the Free, Because of the Brave

- J
- Jim Thompson
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 5:02 PM

[snip]

I remember Huffman as a truly outstanding instructor.

...Jim Thompson

--
|  James E.Thompson, P.E.                           |    mens     |
|  Analog Innovations, Inc.                         |     et      |
|  Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    |
|  Phoenix, Arizona            Voice:(480)460-2350  |             |
|  E-mail Address at Website     Fax:(480)460-2142  |  Brass Rat  |
|       http://www.analog-innovations.com           |    1962     |
             
         America: Land of the Free, Because of the Brave

- J
- Jamie
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 5:11 PM

Oh lucky you, he most likely got most of his idea's from students coming up with their idea's! :) an easy way of collecting idea's ! Not all student idea's are fantasies.

--
"I'm never wrong, once i thought i was, but was mistaken"
Real Programmers Do things like this.
http://webpages.charter.net/jamie_5

- K
- krw
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 9:45 PM

Made everything as short as possible without loss of information?

--
  Keith

- R
- Rudy Velthuis
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 9:51 PM

LOL!

-- Rudy Velthuis

formatting link

"Maybe there is no actual place called hell. Maybe hell is just having to listen to our grandparents breathe through their noses when they're eating sandwiches." -- Jim Carrey.

- M
- Michael A. Terrell
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 10:00 PM

To hear you tell the story, you helped him invent dirt! ;-)

--
Service to my country? Been there, Done that, and I\'ve got my DD214 to
prove it.
Member of DAV #85.

Michael A. Terrell
Central Florida

- J
- Jim Thompson
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 10:09 PM

I did! Read my patents ;-)

...Jim Thompson

--
|  James E.Thompson, P.E.                           |    mens     |
|  Analog Innovations, Inc.                         |     et      |
|  Analog/Mixed-Signal ASIC\'s and Discrete Systems  |    manus    |
|  Phoenix, Arizona            Voice:(480)460-2350  |             |
|  E-mail Address at Website     Fax:(480)460-2142  |  Brass Rat  |
|       http://www.analog-innovations.com           |    1962     |
             
         America: Land of the Free, Because of the Brave

- M
- Michael A. Terrell
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Sun, May 27, 2007 10:30 PM

I can't. :( There's some brown, grity stuff all over them, and its rubbed off too much to make out anything. ;-)

--
Service to my country? Been there, Done that, and I\'ve got my DD214 to
prove it.
Member of DAV #85.

Michael A. Terrell
Central Florida

- R
- Rudy Velthuis
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Mon, May 28, 2007 12:03 PM

But you have apparently not understood it.

-- Rudy Velthuis

formatting link

"We have art to save ourselves from the truth." - Friedrich Nietzsche (1844-1900)

- S
- Skybuck Flying
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Mon, May 28, 2007 12:05 PM

I have studied the huffman tree's and it's algorithms, those not really suited ?

Maybe there is something else ?

Bye, Skybuck.

- R
- Rudy Velthuis
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Mon, May 28, 2007 12:07 PM

Depends. Only you can decide what is suited for your application.

--
Rudy Velthuis        http://rvelthuis.de

"Never raise your hands to your kids. It leaves your
 groin unprotected." -- Red Button

- R
- Rudy Velthuis
  
  Contact options for registered users
Vote on answer
posted
16 years ago

Mon, May 28, 2007 12:16 PM

If you want to compress very short pieces of data, huffman is not suited, since it requires a translation table to be sent with the compressed data. If the table, compared to the size of the data, is negligible, it is a useful way to compress data, and very well suited.

But if you know that your data is pretty similar al the time, you can use a default translation table, and then it doesn't have to be sent along. Principle is that bytes (or words) that appear very often get a very low number of bits, while bytes that hardly ever appear, get a larger number of bits. Even then, your data sample should not be too short. Your example only contained a few bits. That can hardly be compressed usefully.

--
Rudy Velthuis        http://rvelthuis.de

"Multitasking /adj./ 3 PCs and a chair with wheels !" -- unknown