ggml.ai can quantize a model to int4 or int8, which can speed up inference.
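
To make "quantize to int8" concrete, here is a minimal sketch of symmetric block-wise quantization in plain Python. It is an illustration of the general idea only, not ggml's actual code or file format: the function names, the block size of 32, and the sample weights are all assumptions made for this example. Each block of floats is stored as one scale plus small integer codes, so the weights take less memory, which is typically where the inference speedup comes from.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block size for this sketch, not taken from the source


def quantize_block_int8(values: List[float]) -> Tuple[float, List[int]]:
    """Symmetric int8 quantization of one block: returns (scale, integer codes)."""
    amax = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero block.
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block_int8(scale: float, codes: List[int]) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [scale * c for c in codes]


def quantize(values: List[float], block_size: int = BLOCK_SIZE):
    """Split the values into blocks and quantize each block independently."""
    return [
        quantize_block_int8(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def dequantize(blocks) -> List[float]:
    """Concatenate the dequantized blocks back into one flat list."""
    out: List[float] = []
    for scale, codes in blocks:
        out.extend(dequantize_block_int8(scale, codes))
    return out


if __name__ == "__main__":
    # Hypothetical "weights" in [-1, 1], generated deterministically.
    weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"blocks: {len(blocks)}, max abs error: {max_err:.6f}")
```

The rounding error per value is bounded by half the block's scale, which is why using a separate scale per small block tends to lose less precision than one scale for the whole tensor. An int4 variant would follow the same pattern with a smaller code range and two codes packed per byte.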