data compression algorithm

In the vein of suggestions made by Martin Brown, David Eather, et al a while back, this is idea for a lossy compressor/decompressor designed to compress 10 bits of greyscale image data down to 8 in a semi-logarithmic way.

To avoid taking a square root on the output side the lossy "decompressor", such as it is, is two iterations of a modified Newton's method to compute the inverse.

Since the compression function is nearly linear when its input is small-valued using the 8 bit compressed value as the starting guess for the inverse helps prevent Newton from hunting over across the discontinuity in the input function, on the wrong side of the Y axis

#include

//family of curves of the form x = int( 0.5+k*y*(A + By)/(A + Cy))

constexpr auto compress(uint16_t y) -> uint8_t { //tweak "k" to taste to avoid overflow return 0.5 + 0.997 * y * (4096 + y) / (4096 + 32 * y); }

auto decompress(uint8_t x) -> uint16_t { auto f_y = [x](uint16_t y) { return compress(y) - x; };

auto f_y_prime = [](uint16_t y) { return (16351.2 + 7.984 * y + 0.0311875 * y * y) / ((128 + y) * (128 + y)); };

const uint16_t y_init = x;

auto y = y_init - f_y(y_init) / f_y_prime(y_init); y = y - f_y(y) / f_y_prime(0.5*(y + y_init)); auto y_prime = y - f_y(y) / f_y_prime(y); return y - f_y(y) / f_y_prime(0.5*(y + y_prime)); }

Reply to
bitrex
Loading thread data ...

Lookup tables would be easy.

We do FPGA and uP based functions with lookups, and sometimes a bit of interpolation. Like for DDS sine waves.

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin

Plenty of compute cycles available on both sides, it's the link that's the bottleneck, so the decode doesn't have to be especially fast. I do avoid pulling in a floating point square root library function this way. Can tweak the scaling pretty easy without making a new table

Reply to
bitrex

The New C++ can do a lot of work at compile time, and can even generate lookup-tables on the fly at compile-time. It's easier to do with a C++14-compliant compiler than C++11, it can be done with the latter but it's obtuse and messy. Most (open source at least) embedded target compilers (avr-gcc, arm-gcc) don't fully support C++14, yet.

Reply to
bitrex

Mutual admiration society?

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin

I do make real-world $$$ doing real-world engineering.

there's no evidence to support "John Doe" does anything. why is he here, again?

Reply to
bitrex

You really ought to do it in scaled integer arithmetic! Or lookup table.

0.5 converted to a fraction over A+Cy gives

x = int( (A/2 + y*(A+C/2 + By))/(A' + Cy)

k can be absorbed into A' ~ 4108

+1

I had always assumed that on the decode side the mapping for the 256 values transmitted would be converted back using a lookup table computed locally or sent by the data source as a header on the image info.

Lookup tables are just so much easier. Also remember that you want to return the mid value of the band of source data that mapped to a particular 8 bit index or you will introduce bias.

--
Regards, 
Martin Brown
Reply to
Martin Brown

Actually, several people who post to sed do that too. Five maybe.

Post some links. We like to see what people actually do.

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin

Hardware floating point is liberating too. Scaled integers take too much thinking.

I wrote one math library that used signed 32.32 format, for a CPU without floats. That was fast and handles most any real problems.

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics
Reply to
John Larkin

This is what I make and sell. Look up Multiservice Interface module.

formatting link

Sales are sporadic, but not insignificant. The hardware design is basic, but condensed with the module size only 4.5 x 0.85 inches. With the important stuff in an FPGA, a lot of functionality can be packed into a small module.

The only tricky part of the hardware design was getting the maximum voltage swing from a single +12 volt supply. I managed to get an 8 volt swing on each of the differential outputs driving a 50 ohm load.

Oh yeah, the FPGA also contains uLaw and Alaw compression.

--

  Rick C. 

  - Get 1,000 miles of free Supercharging 
  - Tesla referral code - https://ts.la/richard11209
Reply to
Rick C

I kinda just did, I mostly do contract design work for the entertainment/music/visual and performing arts sector. this project is for controlling custom LED arrays for the "performing arts." Good clients though they tend to have either limited funds or a huge excess of funds, depending. I like the latter type the best from a financial perspective.

"Well we all just wanna be big rock stars, live in hilltop houses, driving fifteen cars, the girls come easy and the drugs come cheap, and we'll all stay skinny cuz we just won't eat..."

Reply to
bitrex

I'm also a multi-instrumentalist (guitar, piano, drums when I'm feeling brave) and worked in the recording industry/music biz for a while as a younger man and had a small taste of that lifestyle but it's a younger man's game but it's way easier selling pick-axes than trying to mine gold yourself as you get older.

Reply to
bitrex

The decoding routine I posted runs in about 60 clock cycles even on an 8 bitter with no hardware FPU it's plenty fast as compared to a serial baud rate. the benefit of the hassle of a lookup table are marginal. the actual target device has a FPU.

I use lookup tables for the same use cases you do, like direct digital synthesis.

There are C++ libraries for fixed point, using templates, so that all the necessary operations for shifting and conversion get hardcoded at compile time under-the-hood without having to think too hard.

Reply to
bitrex

The encoding side is running on a desktop-class PC don't need any lookup table there.

The "decoder" algorithm runs so fast as compared to the serial link speed bottleneck that a lookup table is hardly worth the trouble, the uP spends most of its time sitting around doing nothing anyway.

Reply to
bitrex

If you have plenty of time on the encode side you might want to consider another more costly form of data adaptive 12 bit to 8 bit encoding which is based on greedy histogram equalisation. It requires an array to count the frequency of every pixel value in the image and then multiple sweeps of that to determine a histogram equalisation with ever increasing bin widths to allocate the bins.

Initially you start with binsize=1 and Nbins= 256

target = Npixels/Nbins

It works very well on images with a lot of baseline values and a few features. You have to transmit a lookup table with the byte values.

It is very good for looking for systematic errors on flat fields.

BTW You haven't said if the 8 bit comms was able to keep up.

--
Regards, 
Martin Brown
Reply to
Martin Brown

Yes, for reasonably-sized amounts of greyscale video data the serial link works extremely well. my test case was sendiing 256 pixels of 10 bit greyscale data over serial @ 57600 baud, 10 bits packed to 8 as described, and then writing out to a small LCD display on the fly.

It's written out line-by-line with COBS encoding and a two-byte Fletcher16 checksum appended to each line of data beforehand, with a "vertical sync" byte sent at the start of each frame, so the uP can sync up to the datastream and knows when to start filling the frame buffer.

There are two "threads" on the uP, one that handles writing lines into frame buffer 1, and one that handles writing out a full frame buffer to the display from frame buffer 2. when the input frame buffer is full and the output is empty the pointers are swapped. The data-reception thread is given priority to momentarily pause the write-out to the display while it grabs incoming data

The checksum-per-line system was necessary because at least with the hardware I have, at 57600 baud, maybe 5-8% of the packets show up with burst errors that make them unusable and have to be discarded. Trying to write out a whole frame of 256 bytes at once with a 4 byte checksum for the whole frame almost guaranteed there'd be a bungled bit somewhere and the checksum would be incorrect, and never got a correct frame.

With the system as I have it now, no lookup table just the code I posted, I can get about 20-25 fps of 256 pixel decoded 10 bit greyscale image data received, decoded, and put onto the LCD display on an 8 bit

16 MHz Arduino-type board no problem at all, running for several days hooked up to a PC encoding the data no stability issues at all.

The client's project is running on a 72 MHz 32 bit ARM, so I think the bottleneck for higher resolution will primarily be how fast the serial link can be driven without too many errors.

It was about the simplest system for stable transmission of this amount of data over a serial link I could come up with without going the full monte and writing a "real" mpeg-ish image compression/decompression algorithm for each frame of data, with proper ECC to deal with the (almost guaranteed) errors like Huffman coding or something

Reply to
bitrex

In summary, yeah works great man, got paid and client very happy :)

Reply to
bitrex

His true love is himself.

--

John Larkin Highland Technology, Inc picosecond timing precision measurement

jlarkin att highlandtechnology dott com

formatting link

Reply to
John Larkin

The image is only 8x8, pretty small!

--
 Thanks, 
    - Win
Reply to
Winfield Hill

16x16 but yeah, without some kind of true predictive coding compression and decompression scheme don't expect miracles over a 57600 or 115200 serial link. Remember watching video clips using Realplayer over a 56k modem?

Fortunately the application doesn't require anything near even standard def. KISS

Reply to
bitrex

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.