mirror of
https://github.com/emmansun/gmsm.git
synced 2025-05-10 19:16:18 +08:00
Page:
GCM for SM4
Pages
Armv8.2 SM3和SM4
Efficient Software Implementations of ZUC
GCM for SM4
Golang ppc64x asm Reference
Golang s390x asm Reference
High‐Throughput Elliptic Curve Cryptography using AVX2 Vector Instructions
High‐assurance field inversion for curve‐based cryptography
Home
Intel CPU supports SM3 SM4
PQC: ML‐DSA
PQC: ML‐KEM
PQC: SLH‐DSA
SM2 WWMM (2)
SM2 WWMM
SM2加解密性能
SM2性能优化
SM2性能优化(续)
SM3中的FF2和GG2函数
SM3性能优化
SM4 with AESENCLAST
SM4 with GFNI
SM4性能优化
SM9实现及优化
go crypto and BoringCrypto
is my code constant time?
sbox generation
stealth addresses (隐身地址)
关于CTR模式
关于证书和TLS支持
实现Kyber所需的多项式和线性代数知识
实现ML‐DSA所需的多项式和线性代数知识
无进位乘法和GHASH
门限签名
Clone
Table of Contents
This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
为sm4实现的GCM汇编代码是从AES GCM实现中摘抄的。主要为以下三个函数:
//go:noescape
func precomputeTableAsm(productTable *[256]byte, src *[16]byte)
//go:noescape
func gcmSm4Data(productTable *[256]byte, data []byte, T *[16]byte)
//go:noescape
func gcmSm4Finish(productTable *[256]byte, tagMask, T *[16]byte, pLen, dLen uint64)
- gcmSm4Finish没有改变,和gcmAesFinish一模一样;
- precomputeTableAsm和gcmAesInit的区别在于前者没有加密部分,输入参数就是加密结果;
- gcmSm4Data和gcmAesData的差别在于前者那个T参数同时作为输入输出,而后者只作为输出。
加密和GHASH结合ASM优化
主要困难:
- 底层的CTR加密数据不是block对齐的,更不是4、8blocks对齐的,所以尾加密及异或运算处理比较麻烦;
- AMD64同时支持AVX2和NON-AVX2,代码量比较大,比较复杂;
- 和GHASH的混合处理,提高性能; 计划先把下面方法转成ASM:
// counterCrypt crypts in to out using g.cipher in counter mode.
func (g *gcm) counterCrypt(out, in []byte, counter *[gcmBlockSize]byte) {
mask := make([]byte, g.cipher.blocksSize)
counters := make([]byte, g.cipher.blocksSize)
for len(in) >= g.cipher.blocksSize {
for i := 0; i < g.cipher.batchBlocks; i++ {
copy(counters[i*gcmBlockSize:(i+1)*gcmBlockSize], counter[:])
gcmInc32(counter)
}
g.cipher.EncryptBlocks(mask, counters)
xor.XorWords(out, in, mask[:])
out = out[g.cipher.blocksSize:]
in = in[g.cipher.blocksSize:]
}
if len(in) > 0 {
blocks := (len(in) + gcmBlockSize - 1) / gcmBlockSize
for i := 0; i < blocks; i++ {
copy(counters[i*gcmBlockSize:], counter[:])
gcmInc32(counter)
}
g.cipher.EncryptBlocks(mask, counters)
xor.XorBytes(out, in, mask[:blocks*gcmBlockSize])
}
}
最后再处理和GHASH混合处理。
2022年1月14日
gcmSm4Init方法,已完成AMD64和ARM64开发,为了支持golang1.15.x,不得不放弃使用VMOVQ指令。
gcmSm4Enc方法,初步完成AMD64架构非AVX(2)版本开发,正进行更多测试和优化。
2022年1月18日
gcmSm4Enc, gcmSm4Dec, 已完成AMD64架构下非AVX(2)版本及AVX(2)版本,代码有点臃肿;ARM64版本也已完成,优化的方向为矩阵行列转换。
ARM64矩阵转换
// 从高位到低位
// s0 = s0.S3, s0.S2, s0.S1, s0.S0
// s1 = s1.S3, s1.S2, s1.S1, s1.S0
// s2 = s2.S3, s2.S2, s2.S1, s2.S0
// s3 = s3.S3, s3.S2, s3.S1, s3.S0
#define transpose_4x4(s0, s1, s2, s3) \
zip1 RTMP0.4s, s0.4s, s1.4s; \ // RTMP0 = s1.S1, s0.S1, s1.S0, s0.S0
zip1 RTMP1.4s, s2.4s, s3.4s; \ // RTMP1 = s3.S1, s2.S1, s3.S0, s2.S0
zip2 RTMP2.4s, s0.4s, s1.4s; \ // RTMP2 = s1.S3, s0.S3, s1.S2, s0.S2
zip2 RTMP3.4s, s2.4s, s3.4s; \ // RTMP3 = s3.S3, s2.S3, s3.S2, s2.S2
zip1 s0.2d, RTMP0.2d, RTMP1.2d; \ // s0 = s3.S0, s2.S0, s1.S0, s0.S0
zip2 s1.2d, RTMP0.2d, RTMP1.2d; \ // s1 = s3.S1, s2.S1, s1.S1, s0.S1
zip1 s2.2d, RTMP2.2d, RTMP3.2d; \ // s2 = s3.S2, s2.S2, s1.S2, s0.S2
zip2 s3.2d, RTMP2.2d, RTMP3.2d; // s3 = s3.S3, s2.S3, s1.S3, s0.S3
#define rotate_clockwise_90(s0, s1, s2, s3) \
zip1 RTMP0.4s, s1.4s, s0.4s; \ // RTMP0 = s0.S1, s1.S1, s0.S0, s1.S0
zip2 RTMP1.4s, s1.4s, s0.4s; \ // RTMP1 = s0.S3, s1.S3, s0.S2, s1.S2
zip1 RTMP2.4s, s3.4s, s2.4s; \ // RTMP2 = s2.S1, s3.S1, s2.S0, s3.S0
zip2 RTMP3.4s, s3.4s, s2.4s; \ // RTMP3 = s2.S3, s3.S3, s2.S2, s3.S2
zip1 s0.2d, RTMP2.2d, RTMP0.2d; \ // s0 = s0.S0, s1.S0, s2.S0, s3.S0
zip2 s1.2d, RTMP2.2d, RTMP0.2d; \ // s1 = s0.S1, s1.S1, s2.S1, s3.S1
zip1 s2.2d, RTMP3.2d, RTMP1.2d; \ // s2 = s0.S2, s1.S2, s2.S2, s3.S2
zip2 s3.2d, RTMP3.2d, RTMP1.2d; // s3 = s0.S3, s1.S3, s2.S3, s3.S3
Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode