diff --git a/SM9实现及优化.md b/SM9实现及优化.md
index fb1bc1f..6091934 100644
--- a/SM9实现及优化.md
+++ b/SM9实现及优化.md
@@ -41,6 +41,37 @@ Go is a relatively simple language, but to keep it simple the compiler performs many extra operations
 ## Copying values with SIMD
 That is, an assembly implementation of the Set operation, combined with using Set as rarely as possible (this "optimization" complicates the implementation and hurts maintainability, so it may not be worth it).
 
+## Implementing Neg with Sub
+I noticed by accident that the Neg method performs worse than the Sub method implemented later, which is odd. Benchmarked in isolation, gfpNeg (BenchmarkGfPNeg-6) is actually faster than gfpSub() (BenchmarkGfPNeg2-6):
+```
+goos: windows
+goarch: amd64
+pkg: github.com/emmansun/gmsm/sm9/bn256
+cpu: Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
+BenchmarkGfPNeg-6     349538827    3.399 ns/op    0 B/op    0 allocs/op
+BenchmarkGfPNeg2-6    282038318    4.208 ns/op    0 B/op    0 allocs/op
+```
+
+However, when applied to the MulUNC method of gfP2, the result is reversed:
+With gfpNeg (one 64-byte allocation per operation):
+```
+goos: windows
+goarch: amd64
+pkg: github.com/emmansun/gmsm/sm9/bn256
+cpu: Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
+BenchmarkGfP2MulU-6        8290990    141.1 ns/op    64 B/op    1 allocs/op
+BenchmarkGfP2SquareU-6    10009350    117.0 ns/op    64 B/op    1 allocs/op
+```
+With gfpSub (no allocations):
+```
+goos: windows
+goarch: amd64
+pkg: github.com/emmansun/gmsm/sm9/bn256
+cpu: Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
+BenchmarkGfP2MulU-6       12727611    92.70 ns/op    0 B/op    0 allocs/op
+BenchmarkGfP2SquareU-6    17728008    66.35 ns/op    0 B/op    0 allocs/op
+```
+
 ## Next steps
 * Following "New software speed records for cryptographic pairings", implement with floating-point arithmetic and SIMD?
 * [High-Speed Software Implementation of the Optimal Ate Pairing over Barreto–Naehrig Curves](https://eprint.iacr.org/2010/354.pdf): optimizations for arithmetic over the quadratic extension field, although they depend on that paper's particular choice of p.