Skip to content

Compress data files further #56

@fingolfin

Description

@fingolfin

The data files bundled in primgrp take up 39 MB. The new data files for degrees up to 8191 take up ~1GB while .gz compressed.

I think we can reduce this by quite a bit, perhaps even by an order of magnitude or two, by changing how we store the groups: what takes up a lot of space is to write down permutations of degree, say, 5000 -- they are long.

But most of these groups admit faithful permutation representations of much smaller degree. So one could instead store generators for such a small degree representation. To recover the perm rep we actually want, we just need to encode a point stabilizer, by describing generators for it as words in the small degree generator.

I'll illustrate this with an example:

gap> G:=PrimitiveGroup(3600,2);
<permutation group of size 1296000 with 2 generators>
gap> Length(GeneratorsOfGroup(G));
2
gap> Length(String(G));
34503

So storing this group by printing its generators (as we currently do) it takes ~34 kilobytes.

gap> iso:=SmallerDegreePermutationRepresentation(G);
<action epimorphism>
gap> NrMovedPoints(Image(iso));
30
gap> H:=Image(iso);;
gap> Length(String(H));
192

So the group has a degree 30 perm rep which can be described in just 192 bytes.
But we also need a point stabilizer:

gap> S:=Image(iso,Stabilizer(G,1));;
gap> Length(String(S));
277

So that's another 277 bytes, for a total of 469, or 1.4% of the original size; a reducation by a factor 73. But we can do better by expressing S via words:

gap> hom:=EpimorphismFromFreeGroup(H:names:=["x","y"]);;
gap> PreImagesRepresentative(hom, S.1);
x^-4*y^-1*x*y^3*x^-1*y^-1*x^4*y^-2*(x*y)^2*y*x*y^-2*x*y*x^-1*y^-1
gap> PreImagesRepresentative(hom, S.2);
x^-2*(x^-1*y)^2*x*y^-1*x^-1*y^-4*x*y*x^3*y^-2*(x*y)^2*y*x*y^-2*x^-1*y
gap> PreImagesRepresentative(hom, S.3);
x^5*y*x^-1*y*x*y^-1*x^-1*y^-4*x*y*x^3*y^-2*(x*y)^2*y^3*x^-1*y^-1*x^-1
gap> Length(String(List(GeneratorsOfGroup(S), x -> PreImagesRepresentative(hom, x))));
211

But those words can be encoded much better... We probably have a function for that already int he meantime, but it's also not hard to code one, e.g. this:

EncodeWord:=function(w)
  local genexp, i, res, c, e;
  genexp := ExtRepOfObj(w);
  res:="";
  for i in [1, 3 .. Length(genexp)-1] do
    c := CharInt(IntChar('a') + genexp[i] - 1);
    e := genexp[i+1];
    if e < 0 then
      c := UppercaseChar(c);
      e := -e;
    fi;
    Append(res, ListWithIdenticalEntries(e, c));
  od;
  return res;
end;

And then:

gap> EncodeWord(PreImagesRepresentative(hom, S.1));
"AAAABabbbABaaaaBBababbaBBabAB"
gap> Length(String(List(GeneratorsOfGroup(S), x -> EncodeWord(PreImagesRepresentative(hom, x)))));
105

And with that we save a factor $34503 / (192+105) \approx 116$.

That said, such dramatic savings are certainly not always possible. E.g. for PrimitiveGroup(4095, 1) I only get down to a rep of degree 2016, which is still too big.

gap> G:=PrimitiveGroup(4095, 1);
<permutation group of size 208114637736580743168000 with 2 generators>
gap> Length(String(GeneratorsOfGroup(G)));
38923

# for the next one, it takes a while and I had to repeat it to get even to 2016
gap> iso:=SmallerDegreePermutationRepresentation(G);
<action epimorphism>
gap> H:=Image(iso);
gap> NrMovedPoints(H);
2016
gap> Length(String(GeneratorsOfGroup(H)));
18039

So that "only" saves a factor ~2

But this group is actually well-known, it is SO(13,2):

gap> IsSimpleGroup(G);
true
gap> IsomorphismTypeInfoFiniteSimpleGroup(G);
rec( name := "B(6,2) = O(13,2) ~ C(6,2) = S(12,2)", parameter := [ 6, 2 ],
  series := "B", shortname := "S12(2)" )

This is prime candidate for group recognition! Alas for now we can just brute force (slowly) an isomorphism:

gap> iso:=IsomorphismGroups(SO(13,2), H);; time;
7927

With that we can find words for the generators of a point stabilizer. The group itself can be stored as SO(13,2) so with just 8 bytes...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions