Releases: ermig1979/Simd

Simd v7.1.161

04 May 11:53

Algorithms

New features
  • SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgrToHsv.
  • SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv444pToHsl.
  • SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv444pToHsv.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetPoolingMax16b.
  • AVX-512BW optimizations of function AlphaBlending2x.
  • AVX2 optimizations of function BgrToBayer.
  • AVX2 optimizations of function BgraToBayer.
  • Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV2.
  • NEON optimizations of function BgrToHsl.
  • NEON optimizations of function BgrToLab.
  • NEON optimizations of function GrayToY.
  • NEON optimizations of function YToGray.
  • NEON optimizations of function Yuv444pToRgbaV2.
  • NEON optimizations of function SynetTiledScale2D32f.
  • NEON optimizations of function SynetNormalizeLayerForward.
  • NEON optimizations of function SynetNormalizeLayerForwardV2.
  • NEON optimizations of function SynetNormalizeLayerForwardV3.
  • NEON optimizations of function SynetNormalizeLayerForwardV4.
  • NEON optimizations of function SynetAdd8i.
  • NEON optimizations of function SynetConvert8uTo32f.
  • NEON optimizations of function SynetDequantizeLinear.
  • NEON optimizations of function SynetQuantizeLinear.
  • NEON optimizations of class SynetQuantizedAdd.
  • NEON optimizations of function SynetQuantizedConcatLayerForward.
  • NEON optimizations of function SynetQuantizedPreluLayerForward.
  • NEON optimizations of function SynetQuantizedScaleLayerForward.
  • NEON optimizations of function SynetQuantizedShuffleLayerForward.
  • NEON optimizations of class SynetScale8i.
  • SSE4.1 optimizations of class SynetScale16b.
  • Support of SVE extension (ARM/ARM64 platform).
  • SVE optimizations of function AbsDifference.
  • SimdCpuInfoSveSize member in SimdCpuInfoType enumeration.
  • SVE optimizations of function OperationBinary8u.
  • SVE optimizations of function OperationBinary16i.
  • SVE optimizations of function AbsGradientSaturatedSum.
  • SVE optimizations of function DeinterleaveUv.
  • SVE optimizations of function InterleaveUv.
  • SVE optimizations of function BgrToRgb.
Improving
  • SSE4.1, AVX2, AVX-512BW optimizations of class SynetScale16b.
Bug fixing
  • Error in NEON optimization of function TransformImage (BGR, T0 transform).
  • Error in AVX-512BW optimizations of function SynetSoftmax16b (MSVS, Win32).
  • Possible aligned load of unaligned memory in AVX2 optimizations of function AbsDifferenceSums3 (Windows 7, x64, gcc 8.1.0).
  • Possible aligned store in file SimdExtract.h for SSE4.1, AVX2, AVX-512BW.
  • Wrong assert condition in function Simd::Uyvy422ToBgr.
  • Wrong assert condition in function Simd::Uyvy422ToYuv420p.
  • Wrong assert condition in function Simd::Yuv420pToUyvy422.
  • Error in function Base::CpuCacheSize on ARM64 platform.
  • Error in NEON optimizations of function DescrInt::Encode32f.
  • Error in SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgraToRgba (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, AVX2, NEON optimizations of function RgbaToGray (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, AVX2, NEON optimizations of function BgrToRgb (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, NEON optimizations of function Yuv444pToBgrV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, NEON optimizations of function Yuv444pToBgraV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, NEON optimizations of function Yuv444pToRgbV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, NEON optimizations of function Yuv444pToRgbaV2 (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, AVX2, NEON optimizations of function BgraToBgr (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).
  • Error in SSE4.1, AVX2, NEON optimizations of function BgraToRgb (small width, internal call from ImageSaveToMemory, ImageLoadFromMemory).

Test framework

New features
  • Github action 'Test Python' step in build_and_test_gcc_new in cmake.yml.
  • Tests for verifying functionality of function SynetPoolingMax16b.
Bug fixing
  • Error in Python test ShiftDetectorFunctionsTest.
  • Error in Python test ShiftDetectorClassTest.
  • Error in test SquaredDifferenceSum32f.
  • Error in test NeuralAddConvolution2x2ForwardAutoTest.
  • Error in parsing of 'testThreads' command line options.
  • Error in performance report (duplication of the same columns).

Infrastructure

New features
  • Job build_and_test_arm64 in Github actions script for CMake.
  • Github action test.yml to test the dev branch.

Documentation

Bug fixing
  • Syntax and lexical errors in description of Python wrapper.
  • Wrong description of build process for Microsoft Visual Studio.

Simd v7.0.160

01 Apr 14:31

Algorithms

New features
  • Possibility to use a non-constant B matrix in framework SynetInnerProduct32f.
  • Function SimdSynetInnerProduct32fExternalBufferSize.
  • Parameter 'activation' to function SimdSynetInnerProduct16bInit.
  • Parameter 'params' to function SimdSynetInnerProduct16bSetParams.
  • Base implementation of class SynetGatherElements.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetNormalizeLayerForward16bV2.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetSoftmax16b.
  • Support of HVX extension (Hexagon platform).
  • HVX optimizations of function AbsDifference.
  • HVX optimizations of function AbsDifferenceSum.
  • HVX optimizations of function AbsGradientSaturatedSum.
  • HVX optimizations of function AddFeatureDifference.
  • HVX optimizations of function BgrToGray.
  • HVX optimizations of function BgrToRgb.
  • HVX optimizations of function FillBgra.
  • HVX optimizations of function FillPixel.
  • HVX optimizations of function AbsSecondDerivativeHistogram.
  • HVX optimizations of function HistogramMasked.
  • HVX optimizations of function HistogramConditional.
  • HVX optimizations of function OperationBinary8u.
  • HVX optimizations of function GetStatistic.
  • HVX optimizations of function GetRowSums.
  • HVX optimizations of function GetColSums.
  • HVX optimizations of function GetAbsDyRowSums.
  • HVX optimizations of function GetAbsDxColSums.
  • HVX optimizations of function ValueSum.
  • HVX optimizations of function SquareSum.
  • HVX optimizations of function ValueSquareSum.
  • HVX optimizations of function ValueSquareSums.
  • HVX optimizations of function CorrelationSum.
  • SSE4.1, AVX2, AVX-512BW optimizations of function BgrToHsl.
Improving
  • AMX-BF16 optimizations of class SynetMergedConvolution16bCdc.
  • AMX-BF16 optimizations of class SynetMergedConvolution16bCd.
  • AMX-BF16 optimizations of class SynetMergedConvolution16bDc.
  • AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
  • Error in SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedPreluLayerForward (possible aligned read of unaligned memory).
  • Error in SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedScaleLayerForward (possible aligned read of unaligned memory).
  • Error in SSE4.1 optimizations of class ResizerFloatBilinear (possible aligned read of unaligned memory).
  • Error in SSE4.1 optimizations of class ResizerBf16Bilinear (possible aligned read of unaligned memory).
  • Error in SSE4.1, AVX2 optimizations of class ResizerByteBilinear (possible aligned read of unaligned memory).
  • Error in SSE4.1 optimizations of class ResizerByteBilinear (possible aligned write to unaligned memory).
  • Error in SSE4.1 optimizations of class ResizerFloatBilinear (possible aligned write to unaligned memory).
  • Memory leak in function Simd::Detection::LoadStringXml.
  • Possible crash in function Simd::ImageLoadFromFile.
  • Possible crash in function Simd::Base64Decode.
  • Possible crash in AVX-512BW optimization of function TransformImageRotate270.
  • Performance bug in AVX-512BW optimization of class SynetMergedConvolution32fCdc.
  • Performance bug in AMX-BF16 optimization of class SynetMergedConvolution16bCdc.
  • Compiler error in assert conditions in function Simd::DeinterleaveBgra.
  • Compiler error in assert conditions in function Simd::DeinterleaveRgb.
  • Compiler error in assert conditions in function Simd::DeinterleaveRgba.
  • Compiler error in assert conditions in function Simd::GetObjectMoments.
  • MSVS compiler bug in Base implementation of function BgrToHsl (Release, x64).
  • MSVS compiler bug in Base implementation of function SynetQuantizedScaleLayerForward (Release, Win32).
Renaming
  • Function SynetSoftmaxLayerForward to SynetSoftmax32f.

Test framework

New features
  • Tests for verifying functionality of class SynetGatherElements.
  • Tests for verifying functionality of function SynetNormalizeLayerForward16bV2.
  • Tests for verifying functionality of function SynetSoftmax16b.
Improving
  • Add thread-safe state to functions Test::Rand and Test::Srand.
  • Add smoothing to function Test::CreateTestImage.
Bug fixing
  • Wrong parsing of 'testStatistics' command line option.
  • Wrong parsing of 'testRepeats' command line option.
  • Overly long Github action test step in msbuild.yml.
  • Overly long Github action test step in cmake.yml.

Documentation

Bug fixing
  • Syntax and lexical errors in project documentation.

Simd v6.2.159

03 Mar 09:15

Algorithms

New features
  • Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcSpecV2.
  • Support of SimdSynetUnaryOperation32fRound in function SynetUnaryOperation32f.
  • Support of SimdSynetUnaryOperation32fSign in function SynetUnaryOperation32f.
Bug fixing
  • Error in AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1 (kernel Convolution16bNhwcGemm_Macro32x32).
  • Error in Base implementation of class SynetQuantizedConvolutionNhwcDepthwiseV2 (multithreaded use of SimdSynetQuantizedConvolutionForward).
  • Error in Base implementation of class SynetQuantizedConvolutionNhwcDepthwiseV3 (multithreaded use of SimdSynetQuantizedConvolutionForward).
  • Error in AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1 (Inv2x2, kernel Convolution16bNhwcGemm_MacroNx32, unaligned dstH*dstW).
  • Error in AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1 (Inv2x2, kernel Convolution16bNhwcGemm_MacroNx32, unaligned dstC).
  • Error in SSE4.1, AVX2, AVX-512BW, NEON optimizations of function AbsDifference (wrong alignment checking).
  • Error in AVX-512BW optimizations of class SynetConvolution32fGemmNN (case of extra large padding).
  • Error in AVX-512BW optimizations of class ResizerNearest.
  • Error in AMX-BF16 optimizations of class SynetMergedConvolution16bCdc (batch > 1, small input size).
  • Error in functions BodyH, BodyW (file SimdSynetConvParam.h).

Simd v6.2.158

03 Feb 12:07

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MidpointFilterSquare3x3.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MidpointFilterSquare5x5.
  • Base implementation of class SynetConvolution16bNhwcSpecV2.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MinFilterSquare3x3.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MinFilterSquare5x5.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MaxFilterSquare3x3.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function MaxFilterSquare5x5.
Improving
  • AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1.

Test framework

New features
  • Tests for verifying functionality of function MidpointFilterSquare3x3.
  • Tests for verifying functionality of function MidpointFilterSquare5x5.
  • Tests for verifying functionality of function MinFilterSquare3x3.
  • Tests for verifying functionality of function MinFilterSquare5x5.
  • Tests for verifying functionality of function MaxFilterSquare3x3.
  • Tests for verifying functionality of function MaxFilterSquare5x5.

Simd v6.2.157

02 Jan 14:23

Algorithms

New features
  • Function Simd::Resize for Simd::Frame.
  • Base implementation of function DrawLine.
  • Base implementation of function DrawRectangle.
  • Base implementation of function FontInit.
  • Base implementation of function FontResize.
  • Base implementation of function FontHeight.
  • Base implementation of function FontMeasure.
  • Base implementation of function FontDraw.
Improving
  • Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1.
  • AVX-512BW optimizations of function SynetPoolingMax32f (case of SynetPoolingMax32f2DNhwcSolid2x2).
  • AVX-512BW optimizations of function SynetMergedConvolution32f (InputConvolution1x1).
  • AVX-512BW optimizations of function SynetMergedConvolution32f (DepthwiseConvolution_k3p1d1s1w6).
  • Simd::DrawLine uses SimdDrawLine instead of its own implementation.
  • Simd::DrawRectangle uses SimdDrawRectangle instead of its own implementation.
  • Simd::Font uses functions SimdFontInit, SimdFontResize, SimdFontHeight, SimdFontMeasure, SimdFontDraw instead of its own implementation.

Python wrapper

New features
  • Function Simd.ResizeFrame.
  • Function Simd.ResizedFrame.
  • Yuv444p member to Simd.FrameFormat enumeration.
  • Method Simd.ImageFrame.Save.
  • Method Simd.ImageFrame.Load.
  • Function Simd.Lib.StretchGray2x2.
  • Function Simd.StretchGray2x2.
  • Function Simd.Lib.BgraToYuv444p.
  • Function Simd.Lib.Yuv444pToRgb.
  • Function Simd.Lib.ReduceGray2x2.
  • Function Simd.ReduceGray2x2.
  • Function Simd.Lib.BgrToYuv444p.
  • Function Simd.Lib.Yuv444pToBgr.
  • Function Simd.Lib.Yuv444pToRgba.
  • Function Simd.Lib.DrawLine.
  • Method Simd.Image.DrawLine.
  • Function Simd.Lib.DrawRectangle.
  • Method Simd.Image.DrawRectangle.
  • Function Simd.Lib.FontInit.
  • Function Simd.Lib.FontResize.
  • Function Simd.Lib.FontHeight.
  • Function Simd.Lib.FontMeasure.
  • Function Simd.Lib.FontDraw.
  • Class Simd.TextFont.
  • Method Simd.Image.DrawFilledRectangle.
Improving
  • Support of Simd.FrameFormat.Yuv444p in method Simd.ImageFrame.Recreate.
  • Support of Simd.FrameFormat.Yuv444p in method Simd.ImageFrame.Convert.
Bug fixing
  • Error in method Simd.Frame.Convert.
Renaming
  • Function Simd.Resize to Simd.ResizeImage.
  • Function Simd.Resized to Simd.ResizedImage.

Test framework

New features
  • Tests for verifying functionality of function DrawLine.
  • Tests for verifying functionality of function DrawRectangle.
Bug fixing
  • Error in method Test::PerformanceMeasurerStorage::Clear.

Simd v6.2.156

01 Dec 06:38

Algorithms

New features
  • Enumeration SimdShiftDetectorTextureType (C API of Simd::ShiftDetector).
  • Enumeration SimdShiftDetectorDifferenceType (C API of Simd::ShiftDetector).
  • Base implementation of function SimdShiftDetectorInitBuffers (C API of Simd::ShiftDetector).
  • Base implementation of function SimdShiftDetectorSetBackground (C API of Simd::ShiftDetector).
  • Base implementation of function SimdShiftDetectorEstimate (C API of Simd::ShiftDetector).
  • Base implementation of function SimdShiftDetectorGetShift (C API of Simd::ShiftDetector).
  • Base implementation, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemmV1.
Improve
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VBMI, NEON optimizations of function DeinterleaveUv (some outputs can be NULL).
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VBMI, NEON optimizations of function DeinterleaveBgr (some outputs can be NULL).
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VBMI, NEON optimizations of function DeinterleaveBgra (some outputs can be NULL).
  • C++ wrapper Simd::DeinterleaveUv (support of empty outputs).
  • C++ wrapper Simd::DeinterleaveBgr (support of empty outputs).
  • C++ wrapper Simd::DeinterleaveBgra (support of empty outputs).
  • C++ wrapper Simd::DeinterleaveRgb (support of empty outputs).
  • C++ wrapper Simd::DeinterleaveRgba (support of empty outputs).
  • Parallelization in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class ResizerNearest.
Removing
  • C++ wrapper Simd::DeinterleaveBgra with 4 arguments.
  • C++ wrapper Simd::DeinterleaveRgba with 4 arguments.
Renaming
  • Class SynetConvolution16bNhwcGemm to SynetConvolution16bNhwcGemmV0.

Test framework

Improve
  • Tests for verifying functionality of function DeinterleaveUv (some outputs can be NULL).
  • Tests for verifying functionality of function DeinterleaveBgr (some outputs can be NULL).
  • Tests for verifying functionality of function DeinterleaveBgra (some outputs can be NULL).

Python wrapper

New features
  • CurrentFrequency member to Simd.CpuInfo enumeration.
  • Bf16 member to Simd.ResizeChannel enumeration.
  • Function Simd.ShiftBilinear.
  • Enumeration Simd.ShiftDetectorTexture.
  • Enumeration Simd.ShiftDetectorDifference.
  • Function Simd.Lib.ShiftDetectorInitBuffers.
  • Function Simd.Lib.ShiftDetectorSetBackground.
  • Function Simd.Lib.ShiftDetectorEstimate.
  • Function Simd.Lib.ShiftDetectorGetShift.
  • Function Simd.Lib.ShiftDetectorGetRefinedShift.
  • Function Simd.Lib.ShiftDetectorGetStability.
  • Function Simd.Lib.ShiftDetectorGetCorrelation.
  • Class Simd.ShiftingDetector.
Improve
  • Function Simd.Lib.SysInfo.

Simd v6.2.155

10 Nov 08:24

Algorithms

New features
  • SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedScaleLayerForward.
  • SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedPreluLayerForward.
  • Arbitrary activation function in Base implementation of class SynetQuantizedConvolutionGemm.
  • Arbitrary activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
  • Arbitrary activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0.
  • Arbitrary activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV2.
  • Arbitrary activation function in Base implementation, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV3.
Improve
  • AMX-BF16 optimizations of class SynetConvolution16bNhwcGemm (case of small srcC).
Bug fixing
  • Performance bug in AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
  • Error in SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedInnerProductGemmNN.
  • Error in SSE4.1 optimizations of class SynetQuantizedConvolutionNhwcSpecV0.
  • Error in Base implementation of class SynetQuantizedConvolutionNhwcSpecV0.
  • Error in Base implementation of class SynetQuantizedConvolutionNhwcGemm.

Simd v6.2.154

01 Oct 08:30

Algorithms

New features
  • SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedMergedConvolutionCdc.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedMergedConvolutionCd.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedMergedConvolutionDc.
  • Base implementation of function SynetQuantizedScaleLayerForward.
  • Base implementation of function SynetQuantizedPreluLayerForward.
Improve
  • AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV3.
  • Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm (case of batch > 1).
  • Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0 (case of batch > 1).
  • Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm (case of small srcC).
  • Performance of AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0 (case of small srcC).
Bug fixing
  • Error in AVX-512BW optimizations of function SynetQuantizedConcatLayerForward.
  • Error in function Base::CpuModel (Windows Server 2025).
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetQuantizedAddUniform.
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function QuantizedMergedConvolutionAddInputToOutput.
  • Error in AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm (case of batch > 1).

Test framework

New features
  • Tests for verifying functionality of function SynetQuantizedScaleLayerForward.
  • Tests for verifying functionality of function SynetQuantizedPreluLayerForward.

Infrastructure

Bug fixing
  • Error in step 'Host Properties' in Github actions script for MSBuild.
  • Error in step 'Host Properties' in Github actions script for CMake.
Removing
  • Support of Microsoft Visual Studio 2019.

Simd v6.2.153

01 Sep 12:05

Algorithms

New features
  • AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV0.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV1.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV2.
  • Base implementation, AVX-512VNNI optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV3.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedShuffleLayerForward.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizedConcatLayerForward.
  • Base implementation of class SynetQuantizedMergedConvolutionRef.
  • Base implementation of class SynetQuantizedMergedConvolutionCdc.
Improve
  • SSE4.1, AVX2 optimizations of class SynetQuantizedConvolutionNhwcDepthwiseV0.
Bug fixing
  • Error in Base implementation of class SynetQuantizedConvolutionNhwcGemm.

Test framework

New features
  • Tests for verifying functionality of function SynetQuantizedShuffleLayerForward.
  • Tests for verifying functionality of function SynetQuantizedConcatLayerForward.
  • Tests for verifying functionality of function SynetQuantizedMergedConvolutionForward.

Infrastructure

Improve
  • Performance of Test step in Github actions script for MSBuild.

Simd v6.2.152

01 Aug 08:24

Algorithms

New features
  • AVX2, AVX-512BW optimizations of class SynetQuantizedAddUniform.
  • Base implementation of class SynetQuantizedInnerProductRef.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedInnerProductGemmNN.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0.
  • Base implementation, SSE4.1, AVX2 optimizations of class SynetQuantizedConvolutionNhwcDepthwise.
Improve
  • AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
Bug fixing
  • Error in NEON optimization of function Float32ToBFloat16.
  • Error in Base implementation of class SynetQuantizedConvolutionNhwcGemm.
  • Error in Base implementation of class SynetQuantizedConvolutionGemm.

Test framework

New features
  • Tests for verifying functionality of SynetQuantizedInnerProduct framework.