Where are Huffman encoding applications?

Hi, I want to know in how many fields Huffman encoding theory is used.

As far as I can count:

  1. In the Windows operating system, a similar encoding is used to compress text files;
  2. Text compression in ZIP files;
  3. JPEG frame compression;
  4. MPEG format compression? Which one uses it?

Wireless communication?

Any other applications?

Thank you.

Weng

Reply to
Weng Tianxiang

I am sure it is used in JPEG and MPEG.

Reply to
quickwayne

On 1 Aug 2006 04:02:31 -0700, "Weng Tianxiang" wrote in comp.programming:

FAX transmission, although the Huffman codes are predefined and not generated dynamically from the data.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
Reply to
Jack Klein

Hi Jack, the FAX application cannot be counted as full Huffman encoding. The reason is that the character statistics are not dynamically collected.

Thank you.

Weng

Reply to
Weng Tianxiang

Also for JPEG, etc., in most cases the predefined default Huffman tables are used (although the file format would support custom tables). I think the term "Huffman" is also widely used for such "static" applications.

Regards,

Thomas

P.S.: I am wondering what the reason for your question is?

Reply to
Thomas Entner

Hi Thomas, when the Huffman frequency table is available, the decoding procedure is straightforward.

I know many electronic books use the Huffman encoding method to compress the book's full text.

Usually, in the first pass over the full text, the Huffman frequency table is established; in the second pass, a hash function is used to locate each word's entry, which is then encoded, based on the Huffman frequency table, and output.

I want to know how the hash function is used in the second pass.

Any books or tips?

Thank you.

Weng

Reply to
Weng Tianxiang

I think you are asking: how do I generate the Huffman codes given the frequency of each symbol? If so, look at:

formatting link

in the section "basic technique".
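
That section describes the greedy construction; a rough Python sketch of it (illustrative only, not anyone's production code) might look like:

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a {symbol: bitstring} Huffman table from symbol frequencies."""
    # Heap entries: (frequency, tiebreaker, partial code table).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        # Greedy step: merge the two lowest-frequency subtrees,
        # prefixing '0' to one side's codes and '1' to the other's.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes(Counter("abracadabra"))  # 'a' gets the shortest code
```

The resulting code is prefix-free, so frequent symbols get short codes and no code is a prefix of another.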

Cheers,

Andy.

Reply to
Andy Ray

Andy, no. I want to know about the hash function.

For example, I have a Huffman encoding table as follows:

name   Freq   encode
...
Andy   15     "011"
Ray    20     "110"
...

Now a text string "Andy Ray wrote" is incoming, and I want to know how to match the incoming "Andy" with the table entry "Andy".

One must establish a 1-to-1 relationship between the incoming word and the table entry.

What I can imagine doing in software is to create a hash function; then, if several entries hash to the same slot, a further string match must be made.

I am wondering whether or not a hardware design can do it.
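
In software, the scheme I imagine might be sketched roughly like this (the bucket count and hash function are arbitrary illustrative choices, and the codes are the made-up ones from the table above):

```python
def word_hash(word, num_buckets=64):
    # Toy polynomial string hash; any well-mixing function would do.
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % num_buckets
    return h

class CodeTable:
    """Word -> Huffman code lookup via a bucketed hash table."""
    def __init__(self, num_buckets=64):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, word, code):
        self.buckets[word_hash(word, self.num_buckets)].append((word, code))

    def lookup(self, word):
        # Several words may land in one bucket; a final string
        # comparison resolves the conflict.
        for entry, code in self.buckets[word_hash(word, self.num_buckets)]:
            if entry == word:
                return code
        return None

table = CodeTable()
table.insert("Andy", "011")
table.insert("Ray", "110")
```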

Thank you.

Weng

Reply to
Weng Tianxiang

A trie might be an option. I once had a conversation with someone who worked at a company that needed to match strings quickly in hardware and I believe they might've used a trie for that, although I could be remembering wrong since it was a few years ago.

If you haven't encountered tries, think of them as deterministic finite-state automata shaped like a tree, where the root is the start state, and transitions (edges) from one node to the next correspond to symbols on your input.

For example, if you want to recognize the strings "car", "cat", and "cup", your trie would look like this:

(start)--->( )-+--->( )----->(end)
        c        u       p
               |
               +--->( )-+--->(end)
                 a      | r
                        |
                        +--->(end)
                          t

Basically, you start at the start state, absorb a token and follow the edge associated with it, and repeat until you either don't have a matching edge or hit an end state.

The nice thing about a trie is that you don't have to worry about collisions or worst-case behavior. The time to match is a linear function of the length of the string you're matching.

A straightforward trie where every input symbol is a character (hence, probably 8 bits or maybe even larger) might be a little wasteful of space. But there are ways around that. The most obvious one is to say that the input is a string of bits and each bit is an input symbol. This makes your tree 8 times as deep as with 8-bit characters as input symbols, but it has the advantage that each node can have no more than 2 children (as compared to 256). A nice middle ground might be to make your input symbols pairs of bits or sequences of 4 bits.
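
In software terms, the character-per-edge trie above might be sketched like this (illustrative Python, before any of the bit-slicing just described):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # input symbol -> next state
        self.is_end = False  # marks an "(end)" state

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def trie_match(root, word):
    """Absorb one symbol at a time; fail when no matching edge exists."""
    node = root
    for ch in word:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_end

root = TrieNode()
for w in ("car", "cat", "cup"):
    trie_insert(root, w)
```

Matching time is linear in the length of the string, as noted above.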

- Logan

Reply to
Logan Shaw

Hi Logan, your method is interesting, but it is far from what I am looking for.

It is too slow. My target is 64-bit input on every clock, and the design must find the entry within 1-2 clocks and resolve any entry conflicts at the same time.

This design, if it succeeds, can be widely used for any dynamic Huffman encoding.

I think there is no such hardware design so far in the world, and that is why I found that almost all Huffman encoding applications are static, as in FAX, electronic books, ZIP files, JPEG, MPEG, and so on.

Thank you.

Weng

Reply to
Weng Tianxiang

Ok - you're asking about decoding, not encoding.

I don't think there's a hash function for Huffman codes, though I'd love to be proven wrong.

You can, however, decode Huffman codes with a binary search tree encoded into a ROM/RAM. This works because no code is a prefix of another. You can alter the tree to take more than one input bit at a time, at the expense of hardware.
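
A software analogue of that tree walk, with nested dicts standing in for the ROM/RAM contents and a made-up three-symbol code table:

```python
def build_decode_tree(codes):
    """Turn a {symbol: bitstring} table into a binary decode tree.

    Valid because the code is prefix-free: no path to one symbol
    passes through another symbol's leaf.
    """
    root = {}
    for sym, bits in codes.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})
        node[bits[-1]] = sym
    return root

def decode(bitstream, root):
    out, node = [], root
    for b in bitstream:
        node = node[b]
        if not isinstance(node, dict):  # reached a leaf: emit, restart at root
            out.append(node)
            node = root
    return out

tree = build_decode_tree({"a": "0", "b": "10", "c": "11"})
decoded = decode("010110", tree)  # -> ['a', 'b', 'c', 'a']
```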

A fully parallel hardware solution needs CAM (with don't cares), but that's really expensive to implement.

Cheers,

Andy.

Reply to
Andy Ray

Hi Andy, "Ok - you're asking about decoding, not encoding."

No, it is encoding!

Weng

Reply to
Weng Tianxiang

You are wrong there. ZIP files and most EBooks use some variant of LZ encoding using a suffix tree (which might be feasible in hardware), followed by Huffman coding - static *or* dynamic - on the resultant symbols. Zip in particular uses the deflate algorithm, which can emit either static or dynamically-generated Huffman data on each block. But AFAIK, the Huffman table is *not* updated with each symbol. There was a kid on sci.crypt a while back banging on about a per-symbol dynamic Huffman technique he'd invented, perhaps look there.

JPEG and MPEG use a DCT (and, optionally in MPEG, motion compensation) before applying further compression. The DCT is *not* Huffman by another name; it's a discrete cosine transform.

Also, look for information on suffix trees (used in Lempel-Ziv encoding), and you might find it can be done in a limited way in hardware. The tree is built incrementally and looks kinda like Logan's trie, except that the first symbol of a sequence is at the leaf, with the last symbol at the root. Kinda...

In general, though Huffman coding was thought to be "optimal" two decades ago, newer techniques have been shown to be better: for example, arithmetic coding (which needs two passes over the data, though) and the Burrows-Wheeler transform (used in bzip).
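
Both families are available off the shelf if you want to experiment; for instance, Python's standard library wraps deflate (LZ77 plus Huffman) and bzip2 (Burrows-Wheeler based):

```python
import bz2
import zlib

# Highly repetitive sample data so both compressors have something to find.
data = b"the quick brown fox jumps over the lazy dog " * 100

deflated = zlib.compress(data)  # deflate: LZ77 matching + Huffman coding
bzipped = bz2.compress(data)    # bzip2: Burrows-Wheeler transform + Huffman

# Both are lossless round trips, and both shrink the repetitive input.
assert zlib.decompress(deflated) == data
assert bz2.decompress(bzipped) == data
assert len(deflated) < len(data) and len(bzipped) < len(data)
```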

There should be enough terms for you to go googling and perhaps find a more suitable algorithm than Huffman. Don't forget - compression is simply the process of removing whatever is predictable from a data stream. The better you understand your data, the better an algorithm can predict it, and the better your compressor can be.

Good luck!

Clifford Heath.

Reply to
Clifford Heath

Hi Clifford, I appreciate your response very much; you are an experienced expert in the field.

I am a newbie in the field and really would like to learn more. I am determined to learn all those things you have mentioned.

What books can you suggest for me to buy and read on these fields?

I have the book "The Data Compression Book" by Mark Nelson.

I want to learn the arithmetic coding algorithm too.

Thank you.

Weng

Reply to
Weng Tianxiang

Hmm. Not sure I'd recommend books - they tend to be too theoretical. If I were you, I'd enter some of the search words into Wikipedia and Google and read any technical papers you find that look authoritative. Code can also be useful, though often very cryptic - people who write compressors are often very bad at writing readable code.

What's the source of your data? Are the 64 bits broken into fields, say 4x16-bit fields? What patterns might you expect? Do the values tend to cluster around a certain range, or tend to be close to previous readings? All these things affect the predictability of your data and so are intimately connected with the compression approach you must choose.

Reply to
Clifford Heath

Please do not top-post. Your reply belongs after, or intermixed with, the *snipped* material to which you reply.

Huffman encoding is rarely used today; adaptive Huffman is more common. For a complete review see "The Data Compression Book" by Mark Nelson and Jean-Loup Gailly, M&T Books, ISBN 1-55851-434-1.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@maineline.net)
   Available for consulting/temporary embedded and systems.
    USE maineline address!
Reply to
CBFalconer

Hi Chuck, Thank you for your advice.

I have bought the book "The Data Compression Book".

Weng

Reply to
Weng Tianxiang
