Implementing the LZO decompression algorithm

Bring your beer, leave with a working implementation of the LZO decompression algorithm.

Perhaps you want to use it on your BornHack badge to cram more stuff into its tiny flash, showcase the superiority of your ( favoritest || fastest || dependently-typed-est) programming language, or merely go to Mars on your own terms.

In any case it's a fun little exercise with a somewhat usable end result.

I recently had to implement it in order to read packets sent to Robur's OpenVPN reimplementation from servers that insist on using it.

What is LZO

LZO is a sliding-window compression algorithm based on - well, Markus Oberhumer's take on Lempel-Ziv '77.

If that sounds a tad academical to you, have no fear - the algorithm is simple; it consists of two instructions:

  • copy X bytes from the input string, on the right of this instruction
  • copy X bytes from offset Y in the output buffer, looping each time we reach the end

There are eight ways to encode these single-byte instructions, some of them requiring additional operands to be read from the input string. Everything can be implemented with addition, subtraction, multiplication, bit shifts, and bit masks. No fancy math required.

Unfortunately it is pretty hard to find a description of how that works in practice, and most people end up using either the official C implementation or a directly translated, unreadable port, leading to gems like this (source):

There is no version of LZO in pure Java. The obvious solution is to take the C source code, 
and feed it to the Java compiler, modifying the Java compiler as necessary to make it compile.

We can do better, and we can do so in an afternoon.

What should you bring:

  • A working tool chain for whatever programming language you want to use
  • A skeleton project directory that contains the following:
    • decompress(compressed_string) function that returns the decompressed string
    • decompress_hex(compressed_string_in_hexadecimal) function that decodes the input before calling decompress() and returns the result in hexadecimal as well
    • read_file(filename) function that calls decompress() on the file contents and outputs it either on the terminal or to a file
  • (not required) Bonus points awarded for basic familiarity with the Test Anything Protocol (TAP) tooling for your language and a TAP producer for use with decompress_hex() so that we can do interoperability testing between our implementations

What will be provided:

  • A presentation about / walkthrough of how LZO decompression works
    • Slides for the presentation provided in A4-shaped compact dead trees.
  • Magic constants in pseudo-code, and my OCaml implementation for reference
  • Test vectors for your unit tests, both in binary files and as hex-encoded TAP documents
  • Q&A and encouraging words of comfort throughout the workshop

Will we be implementing a compression algorithm too?

If we have time! It is very easy to write a compressor that can compress repeated strings of a single character, so that we can definitely do; more advanced strategies will be left to the imagination of the participants. While the decompression algorithm is set in stone, compression depends on a strategy for achieving compression using chains of the aforementioned instructions and operands, so there are many ways to go about it.


Metadata

To be recorded?: No

URLs for Implementing the LZO decompression algorithm

No URLs found.


Instances

  • Tuesday Aug. 13 13:00 - 16:00 at Workshop Room 1

Host(s):

Joe