diff --git a/GCM-for-SM4.md b/GCM-for-SM4.md
index 25c36bb..c9ab87c 100644
--- a/GCM-for-SM4.md
+++ b/GCM-for-SM4.md
@@ -58,4 +58,35 @@ gcmSm4Enc method: the AMD64 non-AVX(2) version is preliminarily complete, now carrying out more
 
 **January 18, 2022**
 
-gcmSm4Enc and gcmSm4Dec: both the non-AVX(2) and AVX(2) versions for AMD64 are complete, though the code is somewhat bloated; the ARM64 version is also complete, and the remaining optimization target is the matrix row/column transposition.
\ No newline at end of file
+gcmSm4Enc and gcmSm4Dec: both the non-AVX(2) and AVX(2) versions for AMD64 are complete, though the code is somewhat bloated; the ARM64 version is also complete, and the remaining optimization target is the matrix row/column transposition.
+
+**ARM64 matrix transformations**
+* [Zip vectors](https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ZIP1--Zip-vectors--primary--)
+* [Add SM4 ARMv8/AArch64 assembly implementation](https://lists.gnupg.org/pipermail/gcrypt-devel/2022-February/005257.html)
+```asm
+// Lanes are listed from high to low
+// s0 = s0.S3, s0.S2, s0.S1, s0.S0
+// s1 = s1.S3, s1.S2, s1.S1, s1.S0
+// s2 = s2.S3, s2.S2, s2.S1, s2.S0
+// s3 = s3.S3, s3.S2, s3.S1, s3.S0
+#define transpose_4x4(s0, s1, s2, s3) \
+    zip1 RTMP0.4s, s0.4s, s1.4s; \ // RTMP0 = s1.S1, s0.S1, s1.S0, s0.S0
+    zip1 RTMP1.4s, s2.4s, s3.4s; \ // RTMP1 = s3.S1, s2.S1, s3.S0, s2.S0
+    zip2 RTMP2.4s, s0.4s, s1.4s; \ // RTMP2 = s1.S3, s0.S3, s1.S2, s0.S2
+    zip2 RTMP3.4s, s2.4s, s3.4s; \ // RTMP3 = s3.S3, s2.S3, s3.S2, s2.S2
+    zip1 s0.2d, RTMP0.2d, RTMP1.2d; \ // s0 = s3.S0, s2.S0, s1.S0, s0.S0
+    zip2 s1.2d, RTMP0.2d, RTMP1.2d; \ // s1 = s3.S1, s2.S1, s1.S1, s0.S1
+    zip1 s2.2d, RTMP2.2d, RTMP3.2d; \ // s2 = s3.S2, s2.S2, s1.S2, s0.S2
+    zip2 s3.2d, RTMP2.2d, RTMP3.2d; // s3 = s3.S3, s2.S3, s1.S3, s0.S3
+
+#define rotate_clockwise_90(s0, s1, s2, s3) \
+    zip1 RTMP0.4s, s1.4s, s0.4s; \ // RTMP0 = s0.S1, s1.S1, s0.S0, s1.S0
+    zip2 RTMP1.4s, s1.4s, s0.4s; \ // RTMP1 = s0.S3, s1.S3, s0.S2, s1.S2
+    zip1 RTMP2.4s, s3.4s, s2.4s; \ // RTMP2 = s2.S1, s3.S1, s2.S0, s3.S0
+    zip2 RTMP3.4s, s3.4s, s2.4s; \ // RTMP3 = s2.S3, s3.S3, s2.S2, s3.S2
+    zip1 s0.2d, RTMP2.2d, RTMP0.2d; \ // s0 = s0.S0, s1.S0, s2.S0, s3.S0
+    zip2 s1.2d, RTMP2.2d, RTMP0.2d; \ // s1 = s0.S1, s1.S1, s2.S1, s3.S1
+    zip1 s2.2d, RTMP3.2d, RTMP1.2d; \ // s2 = s0.S2, s1.S2, s2.S2, s3.S2
+    zip2 s3.2d, RTMP3.2d, RTMP1.2d; // s3 = s0.S3, s1.S3, s2.S3, s3.S3
+```
+However, the Go assembler does not yet seem to support VZIP1/VZIP2.
\ No newline at end of file
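
Review note: the lane comments on the two macros can be checked with a small scalar model. The following is only a sketch in plain Go, not the project's code; `Vec`, `zip1S`, `zip2S`, `zip1D`, `zip2D`, `Transpose4x4` and `RotateClockwise90` are hypothetical names introduced here, and the model assumes index 0 is the lowest lane (S0).

```go
package main

import "fmt"

// Vec models one 128-bit register as four 32-bit lanes.
// Index 0 is the lowest lane (S0), index 3 the highest (S3). (Assumed layout.)
type Vec [4]uint32

// zip1S / zip2S model ZIP1 / ZIP2 with the .4s arrangement:
// interleave the low (zip1) or high (zip2) halves of a and b, lane by lane.
func zip1S(a, b Vec) Vec { return Vec{a[0], b[0], a[1], b[1]} }
func zip2S(a, b Vec) Vec { return Vec{a[2], b[2], a[3], b[3]} }

// zip1D / zip2D model ZIP1 / ZIP2 with the .2d arrangement:
// each 64-bit element is a pair of adjacent 32-bit lanes.
func zip1D(a, b Vec) Vec { return Vec{a[0], a[1], b[0], b[1]} }
func zip2D(a, b Vec) Vec { return Vec{a[2], a[3], b[2], b[3]} }

// Transpose4x4 follows the instruction order of the transpose_4x4 macro.
func Transpose4x4(s [4]Vec) [4]Vec {
	t0 := zip1S(s[0], s[1])
	t1 := zip1S(s[2], s[3])
	t2 := zip2S(s[0], s[1])
	t3 := zip2S(s[2], s[3])
	return [4]Vec{zip1D(t0, t1), zip2D(t0, t1), zip1D(t2, t3), zip2D(t2, t3)}
}

// RotateClockwise90 follows the instruction order of the rotate_clockwise_90 macro.
func RotateClockwise90(s [4]Vec) [4]Vec {
	t0 := zip1S(s[1], s[0])
	t1 := zip2S(s[1], s[0])
	t2 := zip1S(s[3], s[2])
	t3 := zip2S(s[3], s[2])
	return [4]Vec{zip1D(t2, t0), zip2D(t2, t0), zip1D(t3, t1), zip2D(t3, t1)}
}

func main() {
	// Row i holds the values 4*i .. 4*i+3, listed from lane 0 upward.
	m := [4]Vec{{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}, {12, 13, 14, 15}}
	fmt.Println(Transpose4x4(m))      // [[0 4 8 12] [1 5 9 13] [2 6 10 14] [3 7 11 15]]
	fmt.Println(RotateClockwise90(m)) // [[12 8 4 0] [13 9 5 1] [14 10 6 2] [15 11 7 3]]
}
```

Note that the model prints lanes from low to high, while the comments in the macros list them from high to low, so each printed row reads in the reverse order of the corresponding comment.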