Poor compression & quality for difficult-to-compress data #87

@lindstro

I am doing some compression studies that involve difficult-to-compress (even incompressible) data. Consider the chaotic data generated by the logistic map xᵢ₊₁ = 4 xᵢ (1 - xᵢ):

#include <cstdio>

int main()
{
  double x = 1. / 3;
  for (int i = 0; i < 256 * 256 * 256; i++) {
    fwrite(&x, sizeof(x), 1, stdout);
    x = 4 * x * (1 - x);
  }
  return 0;
}

We wouldn't expect this data to compress at all, but the inherent randomness at least suggests a predictable relationship between (RMS) error, E, and rate, R. Let σ = 1/√8 denote the standard deviation of the input data and define the accuracy gain as

α = log₂(σ / E) - R.

Then each increment in storage, R, by one bit should result in a halving of E, so that α is essentially constant. The limit behavior is slightly different as R → 0 or E → 0, but over a large range α ought to be constant.
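To make the definition concrete, here is a minimal sketch of how α could be computed from the original data, the decompressed data, and the compressed size. The helper names (`rms_error`, `rate`, `accuracy_gain`) are hypothetical, and the rate is assumed to be compressed bits divided by the number of values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square error E between original and reconstructed values.
double rms_error(const std::vector<double>& original,
                 const std::vector<double>& reconstructed)
{
  assert(!original.empty() && original.size() == reconstructed.size());
  double sum = 0;
  for (std::size_t i = 0; i < original.size(); i++) {
    const double d = original[i] - reconstructed[i];
    sum += d * d;
  }
  return std::sqrt(sum / original.size());
}

// Rate R in bits per value (assumption: compressed bits / number of values).
double rate(std::size_t compressed_bytes, std::size_t num_values)
{
  assert(num_values > 0);
  return 8.0 * compressed_bytes / num_values;
}

// Accuracy gain: alpha = log2(sigma / E) - R.
double accuracy_gain(double sigma, double E, double R)
{
  return std::log2(sigma / E) - R;
}
```

With these helpers, halving E while adding one bit to R leaves `accuracy_gain` unchanged, which is the constant-α behavior described above.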

Below is a plot of α(R) for SZ 2.1.12.3 and other compressors applied to the above data interpreted as a 3D array of size 256 × 256 × 256. Here SZ's absolute error tolerance mode was used: sz -d -3 256 256 256 -M ABS -A tolerance -i input.bin -z output.sz. The tolerance was halved for each subsequent data point, starting with tolerance = 1.
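For reproducibility, the tolerance sweep can be sketched as a small helper that builds the command line for step k, where the tolerance is 2⁻ᵏ. The function names are hypothetical, and only the flags and file names from the command quoted above are used; each step writes the same output.sz, so the file size would have to be recorded before the next run.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

// Absolute error tolerance for step k of the sweep: 1, 1/2, 1/4, ...
double tolerance(int k)
{
  assert(k >= 0);  // the sweep starts at tolerance = 1
  return std::ldexp(1.0, -k);
}

// Command line for step k, built only from the flags quoted in the issue.
std::string sz_command(int k)
{
  std::ostringstream cmd;
  cmd.precision(17);  // enough digits to print any power of two exactly
  cmd << "sz -d -3 256 256 256 -M ABS -A " << tolerance(k)
      << " -i input.bin -z output.sz";
  return cmd.str();
}
```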

The plot suggests an odd relationship between R and E, with very poor compression observed for small tolerances. For instance, when the tolerance is in {2⁻¹³, 2⁻¹⁴, 2⁻¹⁵, 2⁻¹⁶}, the corresponding rate is {13.9, 15.3, 18.2, 30.8}, while we would expect R to increase by one bit in each case. Is this perhaps a bug in SZ? Similar behavior is observed for other difficult-to-compress data sets (see rballester/tthresh#7).
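The discrepancy is easy to quantify from the four rates quoted above. The sketch below (the helper name `rate_increments` is hypothetical) computes the rate increase per halving of the tolerance, which would be about one bit if R tracked the tolerance as expected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Increase in rate (bits per value) between consecutive tolerance halvings.
std::vector<double> rate_increments(const std::vector<double>& rates)
{
  assert(!rates.empty());
  std::vector<double> increments;
  for (std::size_t i = 1; i < rates.size(); i++)
    increments.push_back(rates[i] - rates[i - 1]);
  return increments;
}
```

Applied to {13.9, 15.3, 18.2, 30.8}, this gives increments of roughly 1.4, 2.9, and 12.6 bits, compared with the expected 1 bit per step.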

[Figure: logistic (plot of α(R) for SZ and other compressors)]
