Updated GCM for SM4 (markdown)

Sun Yimin 2022-07-22 16:42:00 +08:00
parent 2e7463ece5
commit 971759d84c

@ -58,4 +58,35 @@ The gcmSm4Enc method is initially complete for the non-AVX(2) AMD64 version; currently doing more
**January 18, 2022**
gcmSm4Enc and gcmSm4Dec are complete on AMD64 in both the non-AVX(2) and AVX(2) versions, although the code is somewhat bloated. The ARM64 version is also complete. The direction for further optimization is matrix row/column transposition.
**ARM64 matrix transformations**
* [Zip vectors](https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ZIP1--Zip-vectors--primary--)
* [Add SM4 ARMv8/AArch64 assembly implementation](https://lists.gnupg.org/pipermail/gcrypt-devel/2022-February/005257.html)
```asm
// Lanes are listed from the highest to the lowest
// s0 = s0.S3, s0.S2, s0.S1, s0.S0
// s1 = s1.S3, s1.S2, s1.S1, s1.S0
// s2 = s2.S3, s2.S2, s2.S1, s2.S0
// s3 = s3.S3, s3.S2, s3.S1, s3.S0
#define transpose_4x4(s0, s1, s2, s3) \
zip1 RTMP0.4s, s0.4s, s1.4s;     /* RTMP0 = s1.S1, s0.S1, s1.S0, s0.S0 */ \
zip1 RTMP1.4s, s2.4s, s3.4s;     /* RTMP1 = s3.S1, s2.S1, s3.S0, s2.S0 */ \
zip2 RTMP2.4s, s0.4s, s1.4s;     /* RTMP2 = s1.S3, s0.S3, s1.S2, s0.S2 */ \
zip2 RTMP3.4s, s2.4s, s3.4s;     /* RTMP3 = s3.S3, s2.S3, s3.S2, s2.S2 */ \
zip1 s0.2d, RTMP0.2d, RTMP1.2d;  /* s0 = s3.S0, s2.S0, s1.S0, s0.S0 */ \
zip2 s1.2d, RTMP0.2d, RTMP1.2d;  /* s1 = s3.S1, s2.S1, s1.S1, s0.S1 */ \
zip1 s2.2d, RTMP2.2d, RTMP3.2d;  /* s2 = s3.S2, s2.S2, s1.S2, s0.S2 */ \
zip2 s3.2d, RTMP2.2d, RTMP3.2d;  /* s3 = s3.S3, s2.S3, s1.S3, s0.S3 */
#define rotate_clockwise_90(s0, s1, s2, s3) \
zip1 RTMP0.4s, s1.4s, s0.4s;     /* RTMP0 = s0.S1, s1.S1, s0.S0, s1.S0 */ \
zip2 RTMP1.4s, s1.4s, s0.4s;     /* RTMP1 = s0.S3, s1.S3, s0.S2, s1.S2 */ \
zip1 RTMP2.4s, s3.4s, s2.4s;     /* RTMP2 = s2.S1, s3.S1, s2.S0, s3.S0 */ \
zip2 RTMP3.4s, s3.4s, s2.4s;     /* RTMP3 = s2.S3, s3.S3, s2.S2, s3.S2 */ \
zip1 s0.2d, RTMP2.2d, RTMP0.2d;  /* s0 = s0.S0, s1.S0, s2.S0, s3.S0 */ \
zip2 s1.2d, RTMP2.2d, RTMP0.2d;  /* s1 = s0.S1, s1.S1, s2.S1, s3.S1 */ \
zip1 s2.2d, RTMP3.2d, RTMP1.2d;  /* s2 = s0.S2, s1.S2, s2.S2, s3.S2 */ \
zip2 s3.2d, RTMP3.2d, RTMP1.2d;  /* s3 = s0.S3, s1.S3, s2.S3, s3.S3 */
```
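The lane shuffles in the two macros above can be checked without ARM64 hardware by modelling each 128-bit register as four 32-bit lanes in plain Go. The following is only a sketch for reasoning about the lane order: the `vec` type and the helper names (`zip1s`, `zip2s`, `zip1d`, `zip2d`, `transpose4x4`, `rotateClockwise90`) are hypothetical and are not part of the SM4 implementation. Index 0 of `vec` stands for lane S0, so values print from the lowest lane to the highest, the reverse of the order used in the assembly comments.

```go
package main

import "fmt"

// vec models one 128-bit register as four 32-bit lanes; index 0 is lane S0.
type vec [4]uint32

// zip1s models ZIP1 Vd.4S, Vn.4S, Vm.4S: interleave the low halves of n and m.
func zip1s(n, m vec) vec { return vec{n[0], m[0], n[1], m[1]} }

// zip2s models ZIP2 Vd.4S, Vn.4S, Vm.4S: interleave the high halves of n and m.
func zip2s(n, m vec) vec { return vec{n[2], m[2], n[3], m[3]} }

// zip1d models ZIP1 Vd.2D, Vn.2D, Vm.2D: low 64 bits of n, then low 64 bits of m.
func zip1d(n, m vec) vec { return vec{n[0], n[1], m[0], m[1]} }

// zip2d models ZIP2 Vd.2D, Vn.2D, Vm.2D: high 64 bits of n, then high 64 bits of m.
func zip2d(n, m vec) vec { return vec{n[2], n[3], m[2], m[3]} }

// transpose4x4 follows the instruction sequence of the transpose_4x4 macro.
func transpose4x4(s0, s1, s2, s3 vec) (vec, vec, vec, vec) {
	t0 := zip1s(s0, s1)
	t1 := zip1s(s2, s3)
	t2 := zip2s(s0, s1)
	t3 := zip2s(s2, s3)
	return zip1d(t0, t1), zip2d(t0, t1), zip1d(t2, t3), zip2d(t2, t3)
}

// rotateClockwise90 follows the instruction sequence of the rotate_clockwise_90 macro.
func rotateClockwise90(s0, s1, s2, s3 vec) (vec, vec, vec, vec) {
	t0 := zip1s(s1, s0)
	t1 := zip2s(s1, s0)
	t2 := zip1s(s3, s2)
	t3 := zip2s(s3, s2)
	return zip1d(t2, t0), zip2d(t2, t0), zip1d(t3, t1), zip2d(t3, t1)
}

func main() {
	// Each element is 10*row + lane, so the source of every lane is easy to read.
	s0 := vec{0, 1, 2, 3}
	s1 := vec{10, 11, 12, 13}
	s2 := vec{20, 21, 22, 23}
	s3 := vec{30, 31, 32, 33}

	a0, a1, a2, a3 := transpose4x4(s0, s1, s2, s3)
	fmt.Println("transpose:", a0, a1, a2, a3)

	b0, b1, b2, b3 := rotateClockwise90(s0, s1, s2, s3)
	fmt.Println("rotate:   ", b0, b1, b2, b3)
}
```

In this model the transpose puts lane `i` of every input row into output row `i`, while the rotation produces the same rows with the lane order reversed.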
The Go assembler does not yet seem to support VZIP1/VZIP2, however.
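If the Go toolchain in use does reject these mnemonics, one common fallback is to emit the raw instruction word with the assembler's `WORD` directive. The sketch below computes such words for the full-width (Q=1) forms of ZIP1/ZIP2. It assumes the A64 field layout as I understand it from the Arm documentation (base opcode `0x0E003800`, Q in bit 30, size in bits 23:22, Rm in bits 20:16, bit 14 selecting ZIP2, Rn in bits 9:5, Rd in bits 4:0). The helper name `encodeZip` and the constants are hypothetical, and any generated word should be verified with a disassembler before use.

```go
package main

import "fmt"

// Values of the size field (bits 23:22) for the arrangements used in the macros.
const (
	sizeS = 2 // .4S arrangement
	sizeD = 3 // .2D arrangement
)

// encodeZip returns the assumed 32-bit encoding of
//   ZIP1 Vd.T, Vn.T, Vm.T  (second == false)
//   ZIP2 Vd.T, Vn.T, Vm.T  (second == true)
// for 128-bit arrangements (Q = 1). Register numbers must be in 0..31.
func encodeZip(second bool, size, rd, rn, rm uint32) uint32 {
	word := uint32(0x4E003800) | size<<22 | rm<<16 | rn<<5 | rd
	if second {
		word |= 1 << 14 // bit 14 distinguishes ZIP2 from ZIP1
	}
	return word
}

func main() {
	// zip1 v0.4s, v1.4s, v2.4s
	fmt.Printf("WORD $0x%08X\n", encodeZip(false, sizeS, 0, 1, 2))
	// zip2 v3.2d, v4.2d, v5.2d
	fmt.Printf("WORD $0x%08X\n", encodeZip(true, sizeD, 3, 4, 5))
}
```

Raw words bypass the assembler's operand checking, so this approach is best kept behind small, well-commented macros and dropped once the mnemonics are available.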