Updated SM4 performance optimization (markdown)

Sun Yimin 2021-03-22 09:38:14 +08:00
parent 7177a07d38
commit 96a9696ad0

@ -12,6 +12,7 @@ Go's symmetric encryption implementation separates the cipher mode from Block-level encryption
# Before optimization
CPU: i5-9500
goos: windows
goarch: amd64
pkg: github.com/emmansun/gmsm/sm4
@ -34,6 +35,7 @@ Go's symmetric encryption implementation separates the cipher mode from Block-level encryption
# Using AES-NI at the Block level
CPU: i5-9500
goos: windows
goarch: amd64
pkg: github.com/emmansun/gmsm/sm4_test
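@ -59,16 +61,19 @@ Go's symmetric encryption implementation separates the cipher mode from Block-level encryption
# Parallel optimization of CBC mode decryption
I took a shortcut here and did not write a separate asm function.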
CPU: i5-9500
BenchmarkSM4CBCDecrypt1K-6 292531 4103 ns/op 249.56 MB/s 0 B/op 0 allocs/op
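CBC decryption can be parallelized because each plaintext block depends only on two ciphertext blocks, both of which are already known. A minimal sketch of that idea follows. It is not the gmsm implementation: `crypto/aes` stands in for SM4 (both use 16-byte blocks), and `cbcDecryptBatched` and `cbcRoundTripOK` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// cbcDecryptBatched decrypts CBC ciphertext in two passes.
// Pass 1 runs the block cipher over every ciphertext block; the iterations
// do not depend on each other, so a multi-block routine could process
// several blocks at once. Pass 2 XORs each result with the previous
// ciphertext block (or the IV for the first block).
func cbcDecryptBatched(b cipher.Block, iv, ct []byte) []byte {
	bs := b.BlockSize()
	pt := make([]byte, len(ct))
	for i := 0; i < len(ct); i += bs {
		b.Decrypt(pt[i:i+bs], ct[i:i+bs])
	}
	prev := iv
	for i := 0; i < len(ct); i += bs {
		for j := 0; j < bs; j++ {
			pt[i+j] ^= prev[j]
		}
		prev = ct[i : i+bs]
	}
	return pt
}

// cbcRoundTripOK encrypts 1 KiB with the standard library's CBC encrypter
// and checks that the batched decryption recovers the plaintext.
// AES is an assumption here, used only as a stand-in block cipher.
func cbcRoundTripOK() bool {
	key := make([]byte, 16)
	iv := make([]byte, 16)
	for i := range key {
		key[i] = byte(i)
		iv[i] = byte(0xA0 + i)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return false
	}
	plain := bytes.Repeat([]byte("0123456789abcdef"), 64)
	ct := make([]byte, len(plain))
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(ct, plain)
	return bytes.Equal(cbcDecryptBatched(block, iv, ct), plain)
}

func main() {
	fmt.Println("batched CBC decryption matches:", cbcRoundTripOK())
}
```

CBC encryption has no such shortcut, since each block's input depends on the previous block's output. That is consistent with the CBC encrypt figures on this page staying near the unoptimized mode throughput.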
# Parallel optimization of CTR mode
CPU: i5-9500
BenchmarkSM4CTR1K-6 292522 4121 ns/op 247.30 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CTR8K-6 36483 33203 ns/op 246.57 MB/s 0 B/op 0 allocs/op
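CTR mode parallelizes in both directions because every keystream block is the encryption of an independent counter value. The sketch below illustrates this under stated assumptions: `crypto/aes` stands in for SM4, the block size is taken to be 16 bytes, and `ctrBatched` and `ctrMatchesStdlib` are hypothetical helper names rather than gmsm functions.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

// ctrBatched builds all counter blocks first, encrypts them as one batch,
// and then XORs the keystream into src. It assumes a 16-byte block size and
// treats the IV as a 128-bit big-endian counter, as cipher.NewCTR does.
func ctrBatched(b cipher.Block, iv, src []byte) []byte {
	const bs = 16
	n := (len(src) + bs - 1) / bs
	ks := make([]byte, n*bs)
	hi := binary.BigEndian.Uint64(iv[:8])
	lo := binary.BigEndian.Uint64(iv[8:16])
	for i := 0; i < n; i++ {
		binary.BigEndian.PutUint64(ks[i*bs:], hi)
		binary.BigEndian.PutUint64(ks[i*bs+8:], lo)
		lo++
		if lo == 0 { // carry into the upper 64 bits
			hi++
		}
	}
	// Each iteration is independent of the others, so a multi-block
	// routine could encrypt several counter blocks at once.
	for i := 0; i < n; i++ {
		b.Encrypt(ks[i*bs:(i+1)*bs], ks[i*bs:(i+1)*bs])
	}
	dst := make([]byte, len(src))
	for i := range src {
		dst[i] = src[i] ^ ks[i]
	}
	return dst
}

// ctrMatchesStdlib compares the batched version with cipher.NewCTR on a
// 1000-byte input, which ends in a partial block. AES is a stand-in cipher.
func ctrMatchesStdlib() bool {
	key := []byte("0123456789abcdef")
	iv := bytes.Repeat([]byte{0xFF}, 16) // forces the counter to wrap
	block, err := aes.NewCipher(key)
	if err != nil {
		return false
	}
	src := bytes.Repeat([]byte{0x5A}, 1000)
	want := make([]byte, len(src))
	cipher.NewCTR(block, iv).XORKeyStream(want, src)
	return bytes.Equal(ctrBatched(block, iv, src), want)
}

func main() {
	fmt.Println("batched CTR matches cipher.NewCTR:", ctrMatchesStdlib())
}
```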
# GCM mode optimization
For now only the encryption part is parallelized; optimizing the GHASH part will have to be done gradually.
CPU: i5-9500
BenchmarkSM4GCMSeal1K-6 153688 7904 ns/op 129.56 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMOpen1K-6 149971 7896 ns/op 129.69 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMSign1K-6 315027 3753 ns/op 272.85 MB/s 0 B/op 0 allocs/op
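@ -77,6 +82,7 @@ Go's symmetric encryption implementation separates the cipher mode from Block-level encryption
# GCM mode GHASH ASM optimization
The asm part is adapted from the aes implementation, and the optimization result is striking.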
CPU: i5-9500
BenchmarkSM4GCMSeal1K-6 273218 4491 ns/op 228.00 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMOpen1K-6 250770 4516 ns/op 226.73 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMSign1K-6 3321482 359 ns/op 2853.54 MB/s 0 B/op 0 allocs/op
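The `GCMSign` figures respond far more strongly to the GHASH change than `Seal` and `Open` do, which fits a benchmark that authenticates data without encrypting it. The sketch below assumes the Sign benchmarks follow the Go standard library's pattern of calling `Seal` with an empty plaintext and the buffer as additional data. `crypto/aes` stands in for SM4, and `newDemoAEAD`, `gcmSign` and `gcmVerify` are hypothetical helper names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// newDemoAEAD builds a GCM instance over AES-128 with a fixed demo key.
// AES is only a stand-in for SM4 here; never use a fixed key in real code.
func newDemoAEAD() cipher.AEAD {
	block, err := aes.NewCipher([]byte("0123456789abcdef"))
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

// gcmSign authenticates data without encrypting it: the plaintext is empty
// and data is passed as additional authenticated data, so the work is
// dominated by GHASH. The result is just the authentication tag.
func gcmSign(aead cipher.AEAD, nonce, data []byte) []byte {
	return aead.Seal(nil, nonce, nil, data)
}

// gcmVerify checks a tag produced by gcmSign over the same nonce and data.
func gcmVerify(aead cipher.AEAD, nonce, data, tag []byte) bool {
	_, err := aead.Open(nil, nonce, tag, data)
	return err == nil
}

func main() {
	aead := newDemoAEAD()
	nonce := make([]byte, aead.NonceSize()) // all-zero nonce, demo only
	data := make([]byte, 1024)

	tag := gcmSign(aead, nonce, data)
	fmt.Println("tag length:", len(tag))
	fmt.Println("verifies:", gcmVerify(aead, nonce, data, tag))

	data[0] ^= 1 // tamper with the authenticated data
	fmt.Println("verifies after tampering:", gcmVerify(aead, nonce, data, tag))
}
```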
@ -127,3 +133,35 @@ Golang does not provide optimization interfaces for these two modes, possibly because these two modes are not very
BenchmarkSM4GCMOpen8K-8 26265 53390 ns/op 153.44 MB/s 0 B/op 0 allocs/op
PASS
ok github.com/emmansun/gmsm/sm4_test 47.862s
CPU: i5-9500
goos: windows
goarch: amd64
BenchmarkAESCBCEncrypt1K-6 1000000 1006 ns/op 1017.63 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CBCEncrypt1K-6 87804 13595 ns/op 75.32 MB/s 0 B/op 0 allocs/op
BenchmarkAESCBCDecrypt1K-6 1240671 964 ns/op 1061.74 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CBCDecrypt1K-6 300069 4037 ns/op 253.68 MB/s 0 B/op 0 allocs/op
BenchmarkAESCFBEncrypt1K-6 876500 1425 ns/op 714.92 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CFBEncrypt1K-6 86581 13843 ns/op 73.61 MB/s 0 B/op 0 allocs/op
BenchmarkAESCFBDecrypt1K-6 878245 1338 ns/op 761.56 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CFBDecrypt1K-6 86564 13823 ns/op 73.72 MB/s 0 B/op 0 allocs/op
BenchmarkAESCFBDecrypt8K-6 112794 10522 ns/op 778.09 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CFBDecrypt8K-6 10000 110776 ns/op 73.91 MB/s 0 B/op 0 allocs/op
BenchmarkAESOFB1K-6 1343679 892 ns/op 1142.41 MB/s 0 B/op 0 allocs/op
BenchmarkSM4OFB1K-6 89094 13409 ns/op 76.00 MB/s 0 B/op 0 allocs/op
BenchmarkAESCTR1K-6 1000000 1036 ns/op 984.00 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CTR1K-6 292957 4098 ns/op 248.66 MB/s 0 B/op 0 allocs/op
BenchmarkAESCTR8K-6 149863 8200 ns/op 998.46 MB/s 0 B/op 0 allocs/op
BenchmarkSM4CTR8K-6 36595 32699 ns/op 250.38 MB/s 0 B/op 0 allocs/op
BenchmarkAESGCMSeal1K-6 4802740 249 ns/op 4113.39 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMSeal1K-6 267092 4385 ns/op 233.52 MB/s 0 B/op 0 allocs/op
BenchmarkAESGCMOpen1K-6 5665056 212 ns/op 4836.19 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMOpen1K-6 273200 4380 ns/op 233.80 MB/s 0 B/op 0 allocs/op
BenchmarkAESGCMSign1K-6 9603033 124 ns/op 8258.43 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMSign1K-6 3725722 322 ns/op 3183.11 MB/s 0 B/op 0 allocs/op
BenchmarkAESGCMSign8K-6 1570182 764 ns/op 10723.55 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMSign8K-6 1244473 964 ns/op 8498.90 MB/s 0 B/op 0 allocs/op
BenchmarkAESGCMSeal8K-6 768501 1619 ns/op 5058.99 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMSeal8K-6 36162 33197 ns/op 246.77 MB/s 0 B/op 0 allocs/op
BenchmarkAESGCMOpen8K-6 944479 1325 ns/op 6183.50 MB/s 0 B/op 0 allocs/op
BenchmarkSM4GCMOpen8K-6 36162 33197 ns/op 246.77 MB/s 0 B/op 0 allocs/op