fast universal compression scheme and its implementation in VHDL

Jens Mander · 2005-06-10T06:24:00+00:00

Hi folks,my name is Jens and I am student of the Technical University Berlin. Throughmy course of study in microelectronics VHDL design is becoming my favoritehobby. My other interests are signal processing and compression in general.Lately I purchased an FPGA Evalution board second-hand (guess where?) and Iam now starting my first "private" implementations. Just to give you a shortintro... ;-)I am interested in implementing compression algorithms using VHDL on anFPGA. I want to build a data transmission system that compresses portions ofthe incoming data (not the whole data but "frames" of like 800 bytes)on-the-fly. In my search for a fast (i.e. real-time capable at a "desired"data rate of - let's say - 300 MHZ?) "universal" compression scheme I cameacross the following stepping stones:- is there any free example code for compression algorithms available inVHDL to get an overview and a first impression of implementation complexity?- what would you think are the most promising algorithms for my purpose(i.e. when statistics and semantics of the input data are unknown), first ofall I thought of delta encoding, sorted RLE, LZ, ....?- as the input data is unknown the álgorithm must be lossless, reducingredundancy (if possible), not irrelevancy. what are the theoretical limitsof "universal" compression, not emphazizing one particular statisticalproperty (like similar by values in succession)?- what is meant by the keyword "systolic implementations" and "pipeling" inthat particular context? I came across that very often lately- what if my code gains different compression ratios for consecutive dataportions? surely a FIFO can decouple input and output rate but eventuallythe FIFO will underflow?Thanks for you help + support in advance, any comments, hints and help isappreciated!Bye JensP.S.: I'm looking for the standard works "Sayood, Khalid: Introduction todata compression, Academic Press, 199x or 200x" and/or ". Salomon: DataCompression, Springer-Verlag, New York,...

A

Axel Schwenke vor 21 Jahren

...

Dein Score ist gerade ins Bodenlose gefallen.

XL

Vote

A

Arie de Muynck vor 21 Jahren

"Jens Mander" ...

AFAIK compression on such small blocks is often inefficient. The rest strongly depends on the type of data.

Try to find a good and simple compression for the bitstreams used to program FPGA's. Then publish a link to your thesis and the algorithm here please.

Arie de Muynck

Vote

D

Dr Engelbert Buxbaum vor 21 Jahren

Für Huffman-Codierung mit festen Tabellen ist das unmittelbar einsichtig. Gilt das aber auch für adaptive Huffman-Kodierung? Die könnte sich (zumindest bei hinreichend großen Datenmengen wenn sich der Datentyp hinreichend langsam ändert) recht gut auf die Daten einstellen. Und Platz für Codierungstabellen wird ja keiner verbraucht.

Dieser Beweis scheint mir eine Lücke zu haben: Falls die Ausgabe gerade so groß ist wie die Eingabe.

Vote

A

Axel Schwenke vor 21 Jahren

Da bekommt man ein Optimierungsproblem. Je kleiner die Blöcke sind, desto eher kann man auf Ungleichverteilungen in den Daten hoffen, desto größer ist aber auch der Codierungs-Overhead. Ein Symbol mit rechnerisch 1.7 Bit Informationsgehalt kann man praktisch nicht mit weniger als 2 Bit codieren. Deswegen betrachtet man besser längere Symbole, was in der Konsequenz zu arithmetischer Codierung (z.B. ARJ) führt.

Eine ziemlich starke Einschränkung für einen universellen Algorithmus. Praktisch führt eine Vergrößerung der Blöcke meist zur Nivellierung der Häufigkeiten der Symbole. IIRC sind typische Blockgrößen für Huffmann mit Byte-Symbolen so 4-8KB.

Klar. Das mußte ja kommen :-)

Darauf habe ich zwei Antworten:

Ein Kompressionsverfahren mit Kompressionsfaktor 1 ist a) trivial: es ist unter dem Namen 'cp' bzw. 'copy' implementiert Insofern ist ein Beweis für die Nichtexistenz *dieses* Verfahrens von vornherein zum Scheitern verurteilt ;-) b) nicht wirklich ein Kompressionsverfahren, also per Definition gar nicht in der Klasse der betrachteten Verfahren

In der Praxis verwenden alle Kompressionsverfahren *zusätzliche* Daten (Signatur, "Magic"), um den komprimierten Datenstrom überhaupt erst mal als solchen erkennbar zu machen. Selbst der cleverste Algorithmus müßte wenigstens 1 Bit (komprimiert/unkomprimiert) pro Block verwenden. 'cp' kommt nur deswegen ohne dieses Bit aus, weil das Wissen "die Daten sind nicht komprimiert" implizit ist.

XL

Vote

T

T. Beiermann vor 21 Jahren

s=2E

Mir scheint dass Du keine Ahnung von denen in der Graphentheorie, bzw. Kombinatorik =FCblichen Beweissf=FChrungen hast? Dort ist n=E4mlich die Verlustlose Datenkompression angesiedelt. Das "Pigeon-hole principle" ist dort ein durchaus =FCblicher Ansatz.

Ausf=FChrlicher gibt es das ganze in der comp.compression FAQ

Theorem: No program can compress without loss *all* files of size >=3D N bits, for any given integer N >=3D 0.

Proof: Assume that the program can compress without loss all files of size

bits. Compress with this program all the 2^N files which have exactly N bits. All compressed files have at most N-1 bits, so there are at most (2^N)-1 different compressed files [2^(N-1) files of size N-1,

2^(N-2) of size N-2, and so on, down to 1 file of size 0]. So at least two different input files must compress to the same output file. Hence the compression program cannot be lossless.

The proof is called the "counting argument". It uses the so-called pigeon-hole principle: you can't put 16 pigeons into 15 holes without using one of the holes twice.

Much stronger results about the number of incompressible files can be obtained, but the proofs are a little more complex. (The MINC page

formatting link

uses one file of strictly negative size to obtain 2^N instead of (2^N)-1 distinct files of size

Vote

C

colin vor 21 Jahren

Through

general.

I

short

of

complexity?

of

in

interesting project, it depends what your emphasis is, ie if its useful to have very high savings when its posible even if it doesnt hapen very often or wether its just necesary to make moderate savings as much of the time as possible. you might make very big savings on certain types of data, but the greater extent to wich u can recognise reduntant patterns etc increases overhead so that it becomes worse for data that is more random (or even already compressed).

if you have fixed diferent rate fifos you have to evelauate what needs to happen if or when your data is uncompresable.

the obvious thing to my mind is things like reducing strings of all zero (or al ones) bits or bytes, or alternate such bytes as seem to apear a lot in wave files etc, even strings of any identical bytes etc, if you take this further you end up with the usual huffman encoding. (wich many of the other schemes are based on).

it depends how much you can do in hardware, but you could try and do many different such simple proceses in parallel on the data and see wich ends up with most compresion. this is probably the key to taking advantage of the hardware aproach ?

many file storage systems use compresion to reduce blocks of data (i think ntfs uses 16k blocks). I was once interested in high level compresion, even going as far as finding identical files and eliminating the redundancy.

Colin =^.^=

Vote

fast universal compression scheme and its implementation in VHDL

Join the Discussion

Didn't find your answer?