[idea] train on progressive graphics file format #3

@fenn


when an LLM generates bytes or tokens, it must condition on the ones that have already been generated, and it can't go back to make changes. this means that once an image-generating model decides on the dimensions of an image, it is committed to generating the entire image, whether or not the user really wants to wait for such a long sequence of bytes. for large images this could take a long time, and it's analogous to downloading an image over a slow phone line. this used to be a big problem on the dial-up internet (and still is in many places): you would see the image start loading from the top, one line at a time, and if the connection was interrupted or you got impatient then you never got to see the entire image. one solution to this problem is "progressive" downloading, which starts with a low-resolution image and gradually fills in the details at higher and higher resolutions. there have been several attempts at file formats that support this mode of viewing.

ideally, the image format that a byte-level LLM outputs could be stopped at any arbitrary point in the sequence during generation, leaving a usable and viewable image in a valid image file format, not an incomplete partial image. you would see the entire image fill in with detail as more bytes are added. with fill-in-the-middle, the LLM could even go back and add detail to images while waiting for the user to respond. also, the output format should be robust to byte errors - invalid bytes should not render all subsequent information in the image file unreadable - while not requiring complex forward error correction codes like Reed-Solomon that an LLM can't calculate in a single forward pass.
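
to make the "stop anywhere" property concrete, here is a minimal sketch in plain python. it is not PGF and not any real file format: the function names `encode_progressive` and `decode_prefix` are made up for illustration, the image is assumed to be a square grayscale grid whose side is a power of two, and the image size is assumed to be known to the decoder instead of being stored in a header. samples are sent on a coarse grid first and each one is painted over the block it stands for, so any prefix of the stream decodes to a full-size image.

```python
def encode_progressive(img):
    """Emit one byte per pixel, coarsest sampling grid first.

    img: square list of lists of ints in 0..255, side length a power of two.
    """
    n = len(img)
    out = bytearray()
    seen = set()
    step = n
    while step >= 1:
        for y in range(0, n, step):
            for x in range(0, n, step):
                if (y, x) not in seen:  # skip pixels already sent at a coarser level
                    seen.add((y, x))
                    out.append(img[y][x])
        step //= 2
    return bytes(out)


def decode_prefix(data, n):
    """Decode any prefix of the stream into a full n x n image."""
    canvas = [[0] * n for _ in range(n)]
    seen = set()
    it = iter(data)
    step = n
    while step >= 1:
        for y in range(0, n, step):
            for x in range(0, n, step):
                if (y, x) in seen:
                    continue
                seen.add((y, x))
                value = next(it, None)
                if value is None:  # stream ended early: return what we have
                    return canvas
                # paint the whole block this sample represents;
                # finer samples arriving later overwrite parts of it
                for yy in range(y, y + step):
                    for xx in range(x, x + step):
                        canvas[yy][xx] = value
        step //= 2
    return canvas


img = [[(y * 4 + x) * 10 for x in range(4)] for y in range(4)]
stream = encode_progressive(img)
preview = decode_prefix(stream[:4], 4)   # blocky 2x2-resolution preview
full = decode_prefix(stream, 4)          # identical to img
```

because every byte in this toy stream is an independent raw sample, a corrupted byte only damages the block it paints. a real codec that adds entropy coding on top would lose that property unless the format is designed to resynchronize.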

i'd like to draw your attention to one progressive format in particular, the aptly named Progressive Graphics File format (PGF). its compression algorithm is based on wavelets, so it has no blocky pixel artifacts at low levels of detail. it supports multiple color models and transparency, and you can zoom in on a selected region. its compression is equivalent to or better than JPEG's. it may not be perfect as the native file format for an image-generating byte-level LLM, but it has the advantage of already existing.
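
as a rough illustration of why a wavelet representation is naturally progressive, here is one level of a 2D Haar transform in plain python. Haar is chosen only because it is the shortest wavelet to write down; it is not claimed to be the filter PGF uses, and the names `haar2d` and `inverse_haar2d` are hypothetical. the low-pass band `ll` is a half-resolution version of the image that can be shown first, and the three detail bands restore the full resolution when they arrive.

```python
def haar2d(img):
    """One level of a 2D Haar transform on a grayscale grid with even sides.

    Returns (ll, lh, hl, hh): the low-pass band and three detail bands,
    each half the width and half the height of img.
    """
    h, w = len(img), len(img[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            i, j = y // 2, x // 2
            ll[i][j] = (a + b + c + d) / 4  # block mean: the preview pixel
            lh[i][j] = (a - b + c - d) / 4  # left column minus right column
            hl[i][j] = (a + b - c - d) / 4  # top row minus bottom row
            hh[i][j] = (a - b - c + d) / 4  # diagonal difference
    return ll, lh, hl, hh


def inverse_haar2d(ll, lh, hl, hh):
    """Rebuild the full-resolution grid from the four bands."""
    h, w = len(ll) * 2, len(ll[0]) * 2
    img = [[0.0] * w for _ in range(h)]
    for i in range(h // 2):
        for j in range(w // 2):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = s + p + q + r
            img[2 * i][2 * j + 1] = s - p + q - r
            img[2 * i + 1][2 * j] = s + p - q - r
            img[2 * i + 1][2 * j + 1] = s - p - q + r
    return img


img = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [15, 25, 35, 45],
       [55, 65, 75, 85]]
ll, lh, hl, hh = haar2d(img)
restored = inverse_haar2d(ll, lh, hl, hh)

# decoding with the detail bands zeroed gives the coarse preview
zeros = [[0.0] * 2 for _ in range(2)]
coarse = inverse_haar2d(ll, zeros, zeros, zeros)
```

applying `haar2d` again to `ll` gives a quarter-resolution band, and so on, which is how a wavelet codec ends up with a pyramid of levels that can be transmitted coarse to fine.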

unfortunately, PGF never really caught on, so there isn't a lot of training data in the wild. in the next version of EvaByte, please consider converting some fraction of your image training data to this format, so that an instruction-tuned model could output partial images in this format and still be useful and responsive during an interactive chat session.
