gzip lies #23

astralchan · 2021-05-15T01:21:46Z

astralchan
May 15, 2021
Maintainer

Topics of Discussion

For the benefit of contributors, this discussion covers everything related to parsing and extracting the gzip format, including the handling of deflate data streams. That includes strategies for dealing with the Huffman tables and the LZ77 compression that the deflate algorithm combines.

CheetahPixie · 2021-05-15T01:23:04Z

CheetahPixie
May 15, 2021
Collaborator

And of course to outline quirks, weirdness and other stuff we encounter along the way. .tar has already proven complicated enough for its own lies discussion later, actually.

0 replies

astralchan · 2021-05-17T07:04:09Z

astralchan
May 17, 2021
Maintainer Author

I've been reading through RFC 1951 to understand how the deflate compression method uses Huffman coding to compress the data blocks. Here's an example from page 7:

         / \
        0   1
       /      \
     / \        B
    0   1
  /       \
A        / \
        0   1
       /      \
      D        C

Following the branches from the root down to each symbol produces these Huffman codes:

Symbol  Code
------  ----
A       00
B       1
C       011
D       010

Within a deflate data block specifically, the Huffman codes follow two additional rules:

* All codes of a given bit length have lexicographically consecutive values, in the same order as the symbols they represent;
* Shorter codes lexicographically precede longer codes.

The RFC then goes on to say that, applying these rules and assuming the order of the alphabet is ABCD, the example can be recoded to the following Huffman codes:

Symbol  Code
------  ----
A       10
B       0
C       110
D       111

I guess I'm failing to see how the first example breaks the two rules? The alphabetical order in the example is, I assume, ABCD for symbols A, B, C, and D. In the first example, C and D have Huffman code lengths of 3 and are presented in order. The shorter code for B is after the longer code for A. Any thoughts?
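
One way to look at the question is to check both rules mechanically. Below is a minimal Python sketch, not an authoritative implementation: `canonical_codes` follows the code-assignment steps described in RFC 1951 section 3.2.2, while `check_rules` and all the names here are hypothetical helpers chosen for illustration. The first code set is read off the tree above, assuming the alphabet order is ABCD.

```python
def canonical_codes(lengths):
    """Assign codes from bit lengths, following the steps in RFC 1951 section 3.2.2."""
    max_len = max(lengths.values())
    # Step 1: count how many codes exist for each bit length.
    bl_count = [0] * (max_len + 1)
    for n in lengths.values():
        if n:
            bl_count[n] += 1
    # Step 2: find the numerically smallest code for each bit length.
    code = 0
    next_code = [0] * (max_len + 1)
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: hand out consecutive codes in symbol order.
    codes = {}
    for sym in sorted(lengths):
        n = lengths[sym]
        if n:
            codes[sym] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes


def check_rules(codes):
    """Return (rule1_ok, rule2_ok) for a dict of symbol -> bit string."""
    # Rule 1: codes of equal length are consecutive values, in symbol order.
    by_len = {}
    for sym in sorted(codes):
        by_len.setdefault(len(codes[sym]), []).append(codes[sym])
    rule1 = all(
        int(b, 2) == int(a, 2) + 1
        for group in by_len.values()
        for a, b in zip(group, group[1:])
    )
    # Rule 2: every shorter code sorts lexicographically before every longer one.
    rule2 = all(
        short < long
        for short in codes.values()
        for long in codes.values()
        if len(short) < len(long)
    )
    return rule1, rule2


tree_codes = {"A": "00", "B": "1", "C": "011", "D": "010"}  # read off the tree
recoded = canonical_codes({"A": 2, "B": 1, "C": 3, "D": 3})

print(check_rules(tree_codes))  # (False, False)
print(recoded)                  # {'A': '10', 'B': '0', 'C': '110', 'D': '111'}
print(check_rules(recoded))     # (True, True)
```

Under this reading, the tree's codes break both rules: C = 011 is followed by D = 010, which is lower rather than the next consecutive value, and the shorter code B = 1 sorts after the longer code A = 00 instead of before it. The code lengths stay the same after recoding; only the bit patterns change.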

0 replies

astralchan · 2021-05-17T09:09:18Z

astralchan
May 17, 2021
Maintainer Author

While looking for info on Huffman encoding, I came across two videos that helped greatly:
https://www.youtube.com/watch?v=JsTptu56GM8
https://www.youtube.com/watch?v=iiGZ947Tcck
I know that the implementation of Huffman coding in deflate is a little special, but understanding the basic Huffman coding process feels nice, and I think it will help.
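
As a rough illustration of the basic process, here is a minimal Python sketch of textbook Huffman construction that returns only the code length of each symbol. The function name `huffman_code_lengths` and the frequencies are made up for this example (they are not from the RFC or the videos); the frequencies were picked so that the lengths come out the same as in the RFC 1951 example with symbols A to D.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return symbol -> code length using the classic merge-two-smallest process."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), [sym]) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node ends up one level deeper in the tree.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), syms1 + syms2))
    return lengths


# Hypothetical frequencies, chosen only for illustration.
print(huffman_code_lengths({"A": 2, "B": 4, "C": 1, "D": 1}))
# {'A': 2, 'B': 1, 'C': 3, 'D': 3}
```

The part that seems special in deflate is that only these lengths are needed: the RFC notes that the Huffman code for an alphabet can be defined just by giving the bit length of the code for each symbol, with the actual bit patterns derived from the lengths.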

0 replies
