Reverse-engineering an LZSS compression routine (on a Hitachi H8)

[Xposted to comp.arch.embedded - hopefully there are a few H8 asm gurus
there who can help]

Hi,
   I'm currently trying to reverse-engineer the LZSS decompression
engine used in the Cybiko PDA to compress firmware updates (and small
ASM and C programs) that are sent over a serial link. I've disassembled
the bootloader and found the decompress() routine (in Hitachi H8
assembler), but I haven't managed to work out what I need to do to write
a C program to decompress the images.
   From the disassembly, I guessed that the code used is a variant of
Haruhiko Okumura's LZSS.C. I've also found the value of THRESHOLD, but I
don't know what the values of F and N (related to ring buffer sizing) are.
   All I know about the output file is that it should be 1288 bytes in
size. Using { THRESHOLD=2, N=1024, F=18 } produces a file of that size,
but my disassembler reports that the decompressed data is not valid code.
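
For reference, here is a minimal sketch of an Okumura-style decoder in C, with N, F and THRESHOLD passed in at run time so that different guesses can be tried. It assumes the stock LZSS.C token layout (flag byte consumed LSB first, 1 = literal byte, 0 = two-byte match holding a 12-bit ring-buffer position and a 4-bit length) and a ring buffer pre-filled with spaces. The Cybiko routine may differ on any of these points. The name `lzss_decode` and its buffer-based signature are made up for this example. Stock LZSS.C itself uses N=4096, F=18, THRESHOLD=2.

```c
#include <stddef.h>
#include <string.h>

#define LZSS_MAX_N 4096u  /* largest window a 12-bit position can address */

/* Hypothetical helper: decode an Okumura-style LZSS stream from memory.
 * Assumes the stock LZSS.C token layout; returns the bytes written to out. */
size_t lzss_decode(const unsigned char *in, size_t in_len,
                   unsigned char *out, size_t out_cap,
                   unsigned n, unsigned f, unsigned threshold)
{
    unsigned char ring[LZSS_MAX_N];
    size_t ip = 0, op = 0;
    unsigned r, flags = 0;

    /* n must be a power of two so that (n - 1) works as a wrap mask */
    if (n == 0 || n > LZSS_MAX_N || (n & (n - 1)) != 0 || f >= n)
        return 0;

    memset(ring, ' ', n);       /* LZSS.C pre-fills the window with spaces */
    r = n - f;                  /* initial write position in the window */

    while (op < out_cap) {
        flags >>= 1;
        if ((flags & 0x100u) == 0) {        /* all 8 flag bits used up */
            if (ip >= in_len) break;
            flags = in[ip++] | 0xFF00u;     /* high byte counts the 8 bits */
        }
        if (flags & 1u) {                   /* literal byte */
            if (ip >= in_len) break;
            out[op++] = ring[r] = in[ip++];
            r = (r + 1) & (n - 1);
        } else {                            /* (position, length) pair */
            unsigned pos, len, k;
            if (ip + 1 >= in_len) break;
            pos = in[ip] | ((in[ip + 1] & 0xF0u) << 4);
            len = (in[ip + 1] & 0x0Fu) + threshold + 1;
            ip += 2;
            for (k = 0; k < len && op < out_cap; k++) {
                unsigned char c = ring[(pos + k) & (n - 1)];
                out[op++] = ring[r] = c;
                r = (r + 1) & (n - 1);
            }
        }
    }
    return op;
}
```

A call such as `lzss_decode(in, in_len, out, 0x0508, 4096, 18, 2)` would try the stock parameters against the expected output size. Note that in this layout F only sets the starting write position (N - F), so a wrong F shifts every match reference without changing the output length.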

Here's the disassembly:
[disassembly listing not preserved; only these wrapped comment fragments remain]
Output pointer -> ER0
pointer -> ER1
pointer -> var_C
decompress+86j
on R6
branch
(counter)
decoded everything
input pointer
ER5)
pointer = ER5
decompress+2Cj
decompress+4Ej
next data byte (R2 = i)
pointer
[counter]
decompress done
pointer
0xF00)
THRESHOLD
decompress+10Aj
decompress+32j

And a hex dump of the input data:
[hex dump not preserved]

According to the header, this file should be 0x0508 bytes in size when
expanded.

I can upload a binary image of this data (and a copy of the full
disassembly) if it would help. Unfortunately I don't have any output
data - just the input to the compression algorithm and the length. The
first four bytes of the file should be 0x1234ABCD; I'm not sure about
the rest of the file.
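
Those two known facts (expanded length 0x0508 and a leading 0x1234ABCD) are enough for a quick sanity check on any candidate output. A small sketch in C follows. It assumes the 32-bit value is stored big-endian, which is the H8's native byte order but is not confirmed for this file format, and the function name `output_looks_plausible` is made up for this example.

```c
#include <stddef.h>

#define EXPECTED_LEN 0x0508u   /* 1288 bytes, the size given in the header */

/* Hypothetical check: returns 1 if a candidate decompressed image has the
 * expected length and starts with 0x1234ABCD (big-endian order assumed). */
int output_looks_plausible(const unsigned char *out, size_t len)
{
    static const unsigned char magic[4] = { 0x12, 0x34, 0xAB, 0xCD };
    size_t i;

    if (len != EXPECTED_LEN)
        return 0;
    for (i = 0; i < 4; i++) {
        if (out[i] != magic[i])
            return 0;
    }
    return 1;
}
```

Running this over the output of each parameter guess gives a stronger test than the size alone, since four matching bytes are unlikely to appear by accident.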

Thanks.
Phil.
snipped-for-privacy@despammed.com  (valid address)
http://www.philpem.me.uk/

Re: Reverse-engineering an LZSS compression routine (on a Hitachi H8)

I think you are attacking it in the wrong manner.  Decompression
is much easier than compression: you don't have to worry about
forming trees of phrases and revising them, etc.  I suggest you
start by getting "The Data Compression Book", by Mark Nelson and
Jean-Loup Gailly, M&T Books, ISBN 1-55851-434-1 (paperback, don't
know about hardback) and reading up on the data format.  Then the
first few bytes of the compressed data should give you clues about
where to go next.  Your disassembly probably doesn't matter; the
data does.
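
As an illustration of reading the data directly, here is a rough sketch in C that walks a compressed stream and counts literals, matches and the number of bytes it would expand to. It assumes the token layout of Okumura's LZSS.C (one flag byte per 8 tokens, LSB first, 1 = literal, 0 = two-byte match whose low nibble of the second byte is the length minus THRESHOLD minus 1), which may not hold for the Cybiko format. The struct and function names are invented for this example.

```c
#include <stddef.h>

/* Hypothetical summary of a token stream. */
struct lzss_token_stats {
    size_t literals;    /* number of literal tokens */
    size_t matches;     /* number of (position, length) tokens */
    size_t out_bytes;   /* bytes the stream would expand to */
};

/* Walk the stream without a ring buffer; assumes the LZSS.C layout. */
struct lzss_token_stats lzss_scan(const unsigned char *in, size_t in_len,
                                  unsigned threshold)
{
    struct lzss_token_stats s = { 0, 0, 0 };
    size_t ip = 0;

    while (ip < in_len) {
        unsigned flags = in[ip++];          /* 8 flag bits, LSB first */
        int bit;
        for (bit = 0; bit < 8; bit++, flags >>= 1) {
            if (flags & 1u) {               /* literal: one byte */
                if (ip >= in_len) return s;
                ip += 1;
                s.literals++;
                s.out_bytes++;
            } else {                        /* match: two bytes */
                if (ip + 1 >= in_len) return s;
                s.out_bytes += (in[ip + 1] & 0x0Fu) + threshold + 1;
                ip += 2;
                s.matches++;
            }
        }
    }
    return s;
}
```

Under that assumed layout the expanded size depends only on the flag bits and THRESHOLD, not on N or F. That would be consistent with a decoder producing the right length but wrong contents when only N or F is off.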

This assumes the compressed data is not deliberately obscured, for
example by encoding it with a pseudo-random generator.  If it is,
the disassembly comes back into play.

You are lucky it is not LZ78 or LZW compression, which requires
intimate knowledge of the compressor to decompress.

answered in c.a.e

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.