Releases: ermig1979/Simd
Simd v7.1.161
Algorithms
New features
- SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgrToHsv.
- SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv444pToHsl.
- SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv444pToHsv.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetPoolingMax16b.
- AVX-512BW optimizations of function AlphaBlending2x.
- AVX2 optimizations of function BgrToBayer.
- AVX2 optimizations of function BgraToBayer.
- Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV2.
- NEON optimizations of function BgrToHsl.
- NEON optimizations of function BgrToLab.
- NEON optimizations of function GrayToY.
- NEON optimizations of function YToGray.
- NEON optimizations of function Yuv444pToRgbaV2.
- NEON optimizations of function SynetTiledScale2D32f.
- NEON optimizations of function SynetNormalizeLayerForward.
- NEON optimizations of function SynetNormalizeLayerForwardV2.
- NEON optimizations of function SynetNormalizeLayerForwardV3.
- NEON optimizations of function SynetNormalizeLayerForwardV4.
- NEON optimizations of function SynetAdd8i.
- NEON optimizations of function SynetConvert8uTo32f.
- NEON optimizations of function SynetDequantizeLinear.
- NEON optimizations of function SynetQuantizeLinear.
- NEON optimizations of class SynetQuantizedAdd.
- NEON optimizations of function SynetQuantizedConcatLayerForward.
- NEON optimizations of function SynetQuantizedPreluLayerForward.
- NEON optimizations of function SynetQuantizedScaleLayerForward.
- NEON optimizations of function SynetQuantizedShuffleLayerForward.
- NEON optimizations of class SynetScale8i.
- SSE4.1 optimizations of class SynetScale16b.
- Support of SVE extension (ARM/ARM64 platform).
- SVE optimizations of function AbsDifference.
- SimdCpuInfoSveSize member in SimdCpuInfoType enumeration.
- SVE optimizations of function OperationBinary8u.
- SVE optimizations of function OperationBinary16i.
- SVE optimizations of function AbsGradientSaturatedSum.
- SVE optimizations of function DeinterleaveUv.
- SVE optimizations of function InterleaveUv.
- SVE optimizations of function BgrToRgb.
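Several entries above concern linear quantization of tensors (SynetQuantizeLinear, SynetDequantizeLinear). As a rough illustration of what such a pair computes, here is a minimal pure-Python sketch of the conventional uint8 scheme. It does not call the Simd library: the function names, the `scale` and `zero` parameters, the clamping range, and the rounding mode are assumptions made for illustration and may differ from the library's actual behavior.

```python
def quantize_linear(values, scale, zero):
    """Map floats to uint8 codes: q = clamp(round(x / scale) + zero, 0, 255)."""
    codes = []
    for x in values:
        # Python's round() rounds halves to the nearest even integer (assumed mode).
        q = round(x / scale) + zero
        codes.append(max(0, min(255, q)))
    return codes


def dequantize_linear(codes, scale, zero):
    """Map uint8 codes back to floats: x = (q - zero) * scale."""
    return [(q - zero) * scale for q in codes]


codes = quantize_linear([-1.0, 0.0, 0.5, 100.0], scale=0.5, zero=128)
restored = dequantize_linear(codes, scale=0.5, zero=128)
```

Values outside the representable range saturate, so the last input does not survive the round trip; in-range values are recovered up to the quantization step.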
Improving
- SSE4.1, AVX2, AVX-512BW optimizations of class SynetScale16b.
Bug fixing
- Error in NEON optimization of function TransformImage (BGR, T0 transform).
- Error in AVX-512BW optimizations of function SynetSoftmax16b (MSVS, Win32).
- Possible aligned load of unaligned memory in AVX2 optimizations of function AbsDifferenceSums3 (Windows 7, x64, gcc 8.1.0).
- Possible aligned store in file SimdExtract.h for SSE4.1, AVX2, AVX-512BW.
- Wrong assert condition in function Simd::Uyvy422ToBgr.
- Wrong assert condition in function Simd::Uyvy422ToYuv420p.
- Wrong assert condition in function Simd::Yuv420pToUyvy422.
- Error in function Base::CpuCacheSize on ARM64 platform.
- Error in NEON optimizations of function DescrInt::Encode32f.
- Error in SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgraToRgba (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, AVX2, NEON optimizations of function RgbaToGray (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, AVX2, NEON optimizations of function BgrToRgb (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, NEON optimizations of function Yuv444pToBgrV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, NEON optimizations of function Yuv444pToBgraV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, NEON optimizations of function Yuv444pToRgbV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, NEON optimizations of function Yuv444pToRgbaV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, AVX2, NEON optimizations of function BgraToBgr (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
- Error in SSE4.1, AVX2, NEON optimizations of function BgraToRgb (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
Test framework
New features
- GitHub action 'Test Python' step in build_and_test_gcc_new in cmake.yml.
- Tests for verifying functionality of function SynetPoolingMax16b.
Bug fixing
- Error in Python test ShiftDetectorFunctionsTest.
- Error in Python test ShiftDetectorClassTest.
- Error in test SquaredDifferenceSum32f.
- Error in test NeuralAddConvolution2x2ForwardAutoTest.
- Error in parsing of 'testThreads' command line options.
- Error in performance report (duplication of the same columns).
Infrastructure
New features
- Job build_and_test_arm64 in GitHub actions script for CMake.
- GitHub action test.yml to test the dev branch.
Documentation
Bug fixing
- Syntax and lexical errors in description of Python wrapper.
- Wrong description of build process for Microsoft Visual Studio.
Simd v7.0.160
Algorithms
New features
- Possibility to use non constant B matrix in framework SynetInnerProduct32f.
- Function SimdSynetInnerProduct32fExternalBufferSize.
- Parameter 'activation' to function SimdSynetInnerProduct16bInit.
- Parameter 'params' to function SimdSynetInnerProduct16bSetParams.
- Base implementation of class SynetGatherElements.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetNormalizeLayerForward16bV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetSoftmax16b.
- Support of HVX extension (Hexagon platform).
- HVX optimizations of function AbsDifference.
- HVX optimizations of function AbsDifferenceSum.
- HVX optimizations of function AbsGradientSaturatedSum.
- HVX optimizations of function AddFeatureDifference.
- HVX optimizations of function BgrToGray.
- HVX optimizations of function BgrToRgb.
- HVX optimizations of function FillBgra.
- HVX optimizations of function FillPixel.
- HVX optimizations of function AbsSecondDerivativeHistogram.
- HVX optimizations of function HistogramMasked.
- HVX optimizations of function HistogramConditional.
- HVX optimizations of function OperationBinary8u.
- HVX optimizations of function GetStatistic.
- HVX optimizations of function GetRowSums.
- HVX optimizations of function GetColSums.
- HVX optimizations of function GetAbsDyRowSums.
- HVX optimizations of function GetAbsDxColSums.
- HVX optimizations of function ValueSum.
- HVX optimizations of function SquareSum.
- HVX optimizations of function ValueSquareSum.
- HVX optimizations of function ValueSquareSums.
- HVX optimizations of function CorrelationSum.
- SSE4.1, AVX2, AVX-512BW optimizations of function BgrToHsl.
Improving
- AMX-BF16 optimizations of class SynetMergedConvolution16bCdc.
- AMX-BF16 optimizations of class SynetMergedConvolution16bCd.
- AMX-BF16 optimizations of class SynetMergedConvolution16bDc.
- AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
- Error in SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedPreluLayerForward (possible aligned read of unaligned memory).
- Error in SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedScaleLayerForward (possible aligned read of unaligned memory).
- Error in SSE4.1 optimizations of class ResizerFloatBilinear (possible aligned read of unaligned memory).
- Error in SSE4.1 optimizations of class ResizerBf16Bilinear (possible aligned read of unaligned memory).
- Error in SSE4.1, AVX2 optimizations of class ResizerByteBilinear (possible aligned read of unaligned memory).
- Error in SSE4.1 optimizations of class ResizerByteBilinear (possible aligned write to unaligned memory).
- Error in SSE4.1 optimizations of class ResizerFloatBilinear (possible aligned write to unaligned memory).
- Memory leak in function Simd::Detection::LoadStringXml.
- Possible crash in function Simd::ImageLoadFromFile.
- Possible crash in function Simd::Base64Decode.
- Possible crash in AVX-512BW optimization of function TransformImageRotate270.
- Performance bug in AVX-512BW optimization of class SynetMergedConvolution32fCdc.
- Performance bug in AMX-BF16 optimization of class SynetMergedConvolution16bCdc.
- Compiler error in assert conditions in function Simd::DeinterleaveBgra.
- Compiler error in assert conditions in function Simd::DeinterleaveRgb.
- Compiler error in assert conditions in function Simd::DeinterleaveRgba.
- Compiler error in assert conditions in function Simd::GetObjectMoments.
- MSVS compiler bug in Base implementation of function BgrToHsl (Release, x64).
- MSVS compiler bug in Base implementation of function SynetQuantizedScaleLayerForward (Release, Win32).
Renaming
- Function SynetSoftmaxLayerForward to SynetSoftmax32f.
Test framework
New features
- Tests for verifying functionality of class SynetGatherElements.
- Tests for verifying functionality of function SynetNormalizeLayerForward16bV2.
- Tests for verifying functionality of function SynetSoftmax16b.
Improving
- Add thread-safe state to functions Test::Rand and Test::Srand.
- Add smoothing to function Test::CreateTestImage.
Bug fixing
- Wrong parsing of 'testStatistics' command line option.
- Wrong parsing of 'testRepeats' command line option.
- Overly long GitHub action test step in msbuild.yml.
- Overly long GitHub action test step in cmake.yml.
Documentation
Bug fixing
- Syntax and lexical errors in project documentation.
Simd v6.2.159
Algorithms
New features
- Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcSpecV2.
- Support of SimdSynetUnaryOperation32fRound in function SynetUnaryOperation32f.
- Support of SimdSynetUnaryOperation32fSign in function SynetUnaryOperation32f.
Bug fixing
- Error in AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1 (kernel Convolution16bNhwcGemm_Macro32x32).
- Error in Base implementation of class SynetQuantizedConvolutionNhwcDepthwiseV2 (multithreaded use of SimdSynetQuantizedConvolutionForward).
- Error in Base implementation of class SynetQuantizedConvolutionNhwcDepthwiseV3 (multithreaded use of SimdSynetQuantizedConvolutionForward).
- Error in AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1 (Inv2x2, kernel Convolution16bNhwcGemm_MacroNx32, unaligned dstH*dstW).
- Error in AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1 (Inv2x2, kernel Convolution16bNhwcGemm_MacroNx32, unaligned dstC).
- Error in SSE4.1, AVX2, AVX-512BW, NEON optimizations of function AbsDifference (wrong alignment checking).
- Error in AVX-512BW optimizations of class SynetConvolution32fGemmNN (case of extra large padding).
- Error in AVX-512BW optimizations of class ResizerNearest.
- Error in AMX-BF16 optimizations of class SynetMergedConvolution16bCdc (batch > 1, small input size).
- Error in functions BodyH, BodyW (file SimdSynetConvParam.h).
Simd v6.2.158
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MidpointFilterSquare3x3.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MidpointFilterSquare5x5.
- Base implementation of class SynetConvolution16bNhwcSpecV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MinFilterSquare3x3.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MinFilterSquare5x5.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MaxFilterSquare3x3.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MaxFilterSquare5x5.
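The square min, max, and midpoint filters listed above can be described with one small reference routine. The following is a pure-Python sketch of the general technique, not the library's code: the helper names, the clamped border handling, and the rounding of the midpoint are assumptions for illustration.

```python
def square_filter(image, radius, reduce_fn):
    """Apply reduce_fn to every (2*radius+1)^2 window; borders are clamped."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            result[y][x] = reduce_fn(window)
    return result


def min_filter(image, radius=1):
    return square_filter(image, radius, min)


def max_filter(image, radius=1):
    return square_filter(image, radius, max)


def midpoint_filter(image, radius=1):
    # Midpoint is the average of the window minimum and maximum;
    # rounding halves up is an assumption of this sketch.
    return square_filter(image, radius, lambda w: (min(w) + max(w) + 1) // 2)


gray = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
```

With `radius=1` the window is 3x3 and with `radius=2` it is 5x5, which mirrors the two sizes named in the entries above.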
Improving
- AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1.
Test framework
New features
- Tests for verifying functionality of function MidpointFilterSquare3x3.
- Tests for verifying functionality of function MidpointFilterSquare5x5.
- Tests for verifying functionality of function MinFilterSquare3x3.
- Tests for verifying functionality of function MinFilterSquare5x5.
- Tests for verifying functionality of function MaxFilterSquare3x3.
- Tests for verifying functionality of function MaxFilterSquare5x5.
Simd v6.2.157
Algorithms
New features
- Function Simd::Resize for Simd::Frame.
- Base implementation of function DrawLine.
- Base implementation of function DrawRectangle.
- Base implementation of function FontInit.
- Base implementation of function FontResize.
- Base implementation of function FontHeight.
- Base implementation of function FontMeasure.
- Base implementation of function FontDraw.
Improving
- Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1.
- AVX-512BW optimizations of function SynetPoolingMax32f (case of SynetPoolingMax32f2DNhwcSolid2x2).
- AVX-512BW optimizations of function SynetMergedConvolution32f (InputConvolution1x1).
- AVX-512BW optimizations of function SynetMergedConvolution32f (DepthwiseConvolution_k3p1d1s1w6).
- Simd::DrawLine uses SimdDrawLine instead of its own implementation.
- Simd::DrawRectangle uses SimdDrawRectangle instead of its own implementation.
- Simd::Font uses functions SimdFontInit, SimdFontResize, SimdFontHeight, SimdFontMeasure, SimdFontDraw instead of its own implementation.
Python wrapper
New features
- Function Simd.ResizeFrame.
- Function Simd.ResizedFrame.
- Yuv444p member to Simd.FrameFormat enumeration.
- Method Simd.ImageFrame.Save.
- Method Simd.ImageFrame.Load.
- Function Simd.Lib.StretchGray2x2.
- Function Simd.StretchGray2x2.
- Function Simd.Lib.BgraToYuv444p.
- Function Simd.Lib.Yuv444pToRgb.
- Function Simd.Lib.ReduceGray2x2.
- Function Simd.ReduceGray2x2.
- Function Simd.Lib.BgrToYuv444p.
- Function Simd.Lib.Yuv444pToBgr.
- Function Simd.Lib.Yuv444pToRgba.
- Function Simd.Lib.DrawLine.
- Method Simd.Image.DrawLine.
- Function Simd.Lib.DrawRectangle.
- Method Simd.Image.DrawRectangle.
- Function Simd.Lib.FontInit.
- Function Simd.Lib.FontResize.
- Function Simd.Lib.FontHeight.
- Function Simd.Lib.FontMeasure.
- Function Simd.Lib.FontDraw.
- Class Simd.TextFont.
- Method Simd.Image.DrawFilledRectangle.
Improving
- Support of Simd.FrameFormat.Yuv444p in method Simd.ImageFrame.Recreate.
- Support of Simd.FrameFormat.Yuv444p in method Simd.ImageFrame.Convert.
Bug fixing
- Error in method Simd.Frame.Convert.
Renaming
- Function Simd.Resize to Simd.ResizeImage.
- Function Simd.Resized to Simd.ResizedImage.
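Code that has to run against wrapper versions from both before and after this renaming can look the function up by name at runtime. The sketch below shows one way to do that with plain attribute lookup. `resolve_renamed` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for an older and a newer wrapper module, so the example runs without the Simd package installed.

```python
from types import SimpleNamespace


def resolve_renamed(module, new_name, old_name):
    """Return the attribute under its new name, falling back to the old one."""
    if hasattr(module, new_name):
        return getattr(module, new_name)
    return getattr(module, old_name)


# Stand-ins for two wrapper versions (hypothetical objects, not the real module).
old_wrapper = SimpleNamespace(Resize=lambda: "old")
new_wrapper = SimpleNamespace(ResizeImage=lambda: "new")

resize_old = resolve_renamed(old_wrapper, "ResizeImage", "Resize")
resize_new = resolve_renamed(new_wrapper, "ResizeImage", "Resize")
```

In real use the first argument would be the imported wrapper module, and the same helper would cover the Resized to ResizedImage renaming.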
Test framework
New features
- Tests for verifying functionality of function DrawLine.
- Tests for verifying functionality of function DrawRectangle.
Bug fixing
- Error in method Test::PerformanceMeasurerStorage::Clear.
Simd v6.2.156
Algorithms
New features
- Enumeration SimdShiftDetectorTextureType (C API of Simd::ShiftDetector).
- Enumeration SimdShiftDetectorDifferenceType (C API of Simd::ShiftDetector).
- Base implementation of function SimdShiftDetectorInitBuffers (C API of Simd::ShiftDetector).
- Base implementation of function SimdShiftDetectorSetBackground (C API of Simd::ShiftDetector).
- Base implementation of function SimdShiftDetectorEstimate (C API of Simd::ShiftDetector).
- Base implementation of function SimdShiftDetectorGetShift (C API of Simd::ShiftDetector).
- Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1.
Improve
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VBMI, NEON optimizations of function DeinterleaveUv (some outputs can be NULL).
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VBMI, NEON optimizations of function DeinterleaveBgr (some outputs can be NULL).
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VBMI, NEON optimizations of function DeinterleaveBgra (some outputs can be NULL).
- C++ wrapper Simd::DeinterleaveUv (support of empty outputs).
- C++ wrapper Simd::DeinterleaveBgr (support of empty outputs).
- C++ wrapper Simd::DeinterleaveBgra (support of empty outputs).
- C++ wrapper Simd::DeinterleaveRgb (support of empty outputs).
- C++ wrapper Simd::DeinterleaveRgba (support of empty outputs).
- Parallelization in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class ResizerNearest.
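The "some outputs can be NULL" change to the deinterleave functions means a caller may request only the planes it needs. A minimal pure-Python sketch of that idea for interleaved UV data follows. It is not the library's implementation: the function name, the buffer types, and the use of `None` in place of a NULL pointer are assumptions for illustration.

```python
def deinterleave_uv(uv, u=None, v=None):
    """Split interleaved [U0, V0, U1, V1, ...] into the provided output buffers.

    An output passed as None is skipped, so no work is done for that plane.
    """
    if len(uv) % 2 != 0:
        raise ValueError("interleaved UV data must have even length")
    count = len(uv) // 2
    if u is not None:
        u[:count] = uv[0::2]  # even positions hold U samples
    if v is not None:
        v[:count] = uv[1::2]  # odd positions hold V samples


uv = bytes([1, 101, 2, 102, 3, 103])
u_only = bytearray(3)
deinterleave_uv(uv, u=u_only)  # V plane is not requested
```

Skipping an unrequested plane saves both the output allocation and the write traffic for it.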
Removing
- C++ wrapper Simd::DeinterleaveBgra with 4 arguments.
- C++ wrapper Simd::DeinterleaveRgba with 4 arguments.
Renaming
- Class SynetConvolution16bNhwcGemm to SynetConvolution16bNhwcGemmV0.
Test framework
Improve
- Tests for verifying functionality of function DeinterleaveUv (some outputs can be NULL).
- Tests for verifying functionality of function DeinterleaveBgr (some outputs can be NULL).
- Tests for verifying functionality of function DeinterleaveBgra (some outputs can be NULL).
Python wrapper
New features
- CurrentFrequency member to Simd.CpuInfo enumeration.
- Bf16 member to Simd.ResizeChannel enumeration.
- Function Simd.ShiftBilinear.
- Enumeration Simd.ShiftDetectorTexture.
- Enumeration Simd.ShiftDetectorDifference.
- Function Simd.Lib.ShiftDetectorInitBuffers.
- Function Simd.Lib.ShiftDetectorSetBackground.
- Function Simd.Lib.ShiftDetectorEstimate.
- Function Simd.Lib.ShiftDetectorGetShift.
- Function Simd.Lib.ShiftDetectorGetRefinedShift.
- Function Simd.Lib.ShiftDetectorGetStability.
- Function Simd.Lib.ShiftDetectorGetCorrelation.
- Class Simd.ShiftingDetector.
Improve
- Function Simd.Lib.SysInfo.
Simd v6.2.155
Algorithms
New features
- SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedScaleLayerForward.
- SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedPreluLayerForward.
- Arbitrary activation function in Base implementation of class SynetQuantizedConvolutionGemm.
- Arbitrary activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
- Arbitrary activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0.
- Arbitrary activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV2.
- Arbitrary activation function in Base implementation, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV3.
Improve
- AMX-BF16 optimizations of class SynetConvolution16bNhwcGemm (case of small srcC).
Bug fixing
- Performance bug in AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
- Error in SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedInnerProductGemmNN.
- Error in SSE4.1 optimizations of class SynetQuantizedConvolutionNhwcSpecV0.
- Error in Base implementation of class SynetQuantizedConvolutionNhwcSpecV0.
- Error in Base implementation of class SynetQuantizedConvolutionNhwcGemm.
Simd v6.2.154
Algorithms
New features
- SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedMergedConvolutionCdc.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedMergedConvolutionCd.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedMergedConvolutionDc.
- Base implementation of function SynetQuantizedScaleLayerForward.
- Base implementation of function SynetQuantizedPreluLayerForward.
Improve
- AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV3.
- Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm (case of batch > 1).
- Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0 (case of batch > 1).
- Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm (case of small srcC).
- Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0 (case of small srcC).
Bug fixing
- Error in AVX-512BW optimizations of function SynetQuantizedConcatLayerForward.
- Error in function Base::CpuModel (Windows Server 2025).
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetQuantizedAddUniform.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function QuantizedMergedConvolutionAddInputToOutput.
- Error in AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm (case of batch > 1).
Test framework
New features
- Tests for verifying functionality of function SynetQuantizedScaleLayerForward.
- Tests for verifying functionality of function SynetQuantizedPreluLayerForward.
Infrastructure
Bug fixing
- Bug in step 'Host Properties' in GitHub actions script for MSBuild.
- Bug in step 'Host Properties' in GitHub actions script for CMake.
Removing
- Support of Microsoft Visual Studio 2019.
Simd v6.2.153
Algorithms
New features
- AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV0.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV1.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV2.
- Base implementation, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV3.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedShuffleLayerForward.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedConcatLayerForward.
- Base implementation of class SynetQuantizedMergedConvolutionRef.
- Base implementation of class SynetQuantizedMergedConvolutionCdc.
Improve
- SSE4.1, AVX2 optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV0.
Bug fixing
- Error in Base implementation of class SynetQuantizedConvolutionNhwcGemm.
Test framework
New features
- Tests for verifying functionality of function SynetQuantizedShuffleLayerForward.
- Tests for verifying functionality of function SynetQuantizedConcatLayerForward.
- Tests for verifying functionality of function SynetQuantizedMergedConvolutionForward.
Infrastructure
Improve
- Performance of Test step in Github actions script for MSBuild.
Simd v6.2.152
Algorithms
New features
- AVX2, AVX-512BW optimizations of class SynetQuantizedAddUniform.
- Base implementation of class SynetQuantizedInnerProductRef.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedInnerProductGemmNN.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0.
- Base implementation, SSE4.1, AVX2 optimizations of class SynetQuantizedConvolutionNhwcDepthwise.
Improve
- AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
Bug fixing
- Error in NEON optimization of function Float32ToBFloat16.
- Error in Base implementation of class SynetQuantizedConvolutionNhwcGemm.
- Error in Base implementation of class SynetQuantizedConvolutionGemm.
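For context on the Float32ToBFloat16 fix above: bfloat16 keeps the upper 16 bits of an IEEE 754 float32, so the conversion is a rounded truncation. The sketch below shows one common way to do it in pure Python. It is not taken from the library: the function names are hypothetical, and round-to-nearest-even is an assumed rounding mode that may differ from what Simd uses.

```python
import struct


def float32_to_bfloat16(value):
    """Return the 16-bit bfloat16 pattern of a float, rounding to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: truncate and force a mantissa bit so the result stays NaN.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF


def bfloat16_to_float32(pattern):
    """Expand a bfloat16 bit pattern to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", pattern << 16))[0]


one = float32_to_bfloat16(1.0)
round_trip = bfloat16_to_float32(float32_to_bfloat16(3.140625))
```

Values that need more than 8 significant bits lose precision in the round trip; 3.140625 is chosen here because it is exactly representable in bfloat16.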
Test framework
New features
- Tests for verifying functionality of SynetQuantizedInnerProduct framework.