Similar to #3.
Currently, the forward pass is generated so that every quantized layer quantizes its input activation right before the MatMul, regardless of whether that activation has already been quantized.
As a result, layers that share an input quantize the same activation repeatedly, which is unnecessary work.
The codegen should also check whether an already-quantized activation can be reused before emitting a new quantization step.
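A minimal sketch of what such a reuse check could look like during codegen. This is not the project's actual generator: `QuantLayer`, `emit_forward`, the `act_scheme` label, and the emitted `quantize`/`matmul` pseudo-calls are all hypothetical names used for illustration. It also assumes activation names are never reassigned (SSA-like), so a name always refers to the same tensor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantLayer:
    """Hypothetical description of one quantized layer seen by the codegen."""
    name: str         # layer name, e.g. "q_proj"
    input_name: str   # name of the activation tensor the layer consumes
    act_scheme: str   # activation quantization scheme label (assumed)


def emit_forward(layers):
    """Emit pseudo-instructions, quantizing each (activation, scheme) pair once."""
    quantized = {}  # (input_name, act_scheme) -> name of the quantized tensor
    lines = []
    for layer in layers:
        key = (layer.input_name, layer.act_scheme)
        if key not in quantized:
            # First consumer of this activation under this scheme: quantize it.
            q_name = f"{layer.input_name}_q{len(quantized)}"
            lines.append(
                f"{q_name} = quantize({layer.input_name}, scheme='{layer.act_scheme}')"
            )
            quantized[key] = q_name
        # Every consumer reuses the cached quantized tensor.
        lines.append(f"{layer.name}_out = matmul({quantized[key]}, {layer.name}_w)")
    return lines


if __name__ == "__main__":
    layers = [
        QuantLayer("q_proj", "x", "int8"),
        QuantLayer("k_proj", "x", "int8"),
        QuantLayer("v_proj", "x", "int8"),
        QuantLayer("o_proj", "attn", "int8"),
    ]
    print("\n".join(emit_forward(layers)))
```

Here the three projections that share `x` trigger a single quantization. The cache key includes the scheme because two layers can only share a quantized activation if they expect the same quantization parameters; a real implementation would likely need a stricter compatibility check.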