
Reuse of quantized intermediate results #4

@Jzjerry

Similar to #3.

Currently, the forward pass is generated so that every quantized layer quantizes its input activation right before the MatMul, regardless of whether that activation has already been quantized.

As a result, layers that share an input quantize the same activation repeatedly, which is unnecessary.

During codegen, we should also check whether an already-quantized activation can be reused instead of emitting another quantization.
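A minimal sketch of what such a reuse check could look like, not the project's actual codegen. `QuantLayer`, `generate_forward`, and the emitted `quantize(...)` / `matmul(...)` statements are all hypothetical names chosen for illustration, and the sketch assumes two quantizations are interchangeable whenever the input tensor and bit width match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantLayer:
    """Hypothetical description of one quantized layer seen by the codegen."""
    name: str      # layer name, e.g. "q_proj"
    input: str     # name of the activation tensor feeding the layer
    bits: int = 8  # activation bit width (assumed quantization config)


def generate_forward(layers):
    """Emit pseudo-statements for a forward pass.

    Each distinct (input tensor, bit width) pair is quantized only once;
    later layers with the same pair reuse the earlier quantized tensor.
    """
    lines = []
    quantized = {}  # (input tensor, bits) -> name of the quantized tensor
    for layer in layers:
        key = (layer.input, layer.bits)
        if key not in quantized:
            q_name = f"{layer.input}_q{layer.bits}"
            lines.append(f"{q_name} = quantize({layer.input}, bits={layer.bits})")
            quantized[key] = q_name
        lines.append(
            f"{layer.name}_out = matmul({quantized[key]}, {layer.name}_weight)"
        )
    return lines


# Three projections share the activation "x"; only one quantize is emitted for it.
layers = [
    QuantLayer("q_proj", "x"),
    QuantLayer("k_proj", "x"),
    QuantLayer("v_proj", "x"),
    QuantLayer("o_proj", "attn"),
]
code = generate_forward(layers)
print("\n".join(code))
```

The cache is keyed on the tensor name, so it is only valid while that name keeps referring to the same value. If the generated code overwrites an activation in place, the corresponding cache entry would have to be dropped.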

Labels

enhancement (New feature or request)
