mirror of
https://github.com/emmansun/gmsm.git
synced 2025-05-10 19:16:18 +08:00
Page:
无进位乘法和GHASH
Pages
Armv8.2 SM3和SM4
Efficient Software Implementations of ZUC
GCM for SM4
Golang ppc64x asm Reference
Golang s390x asm Reference
High‐Throughput Elliptic Curve Cryptography using AVX2 Vector Instructions
High‐assurance field inversion for curve‐based cryptography
Home
Intel CPU supports SM3 SM4
PQC: ML‐DSA
PQC: ML‐KEM
PQC: SLH‐DSA
SM2 WWMM (2)
SM2 WWMM
SM2加解密性能
SM2性能优化
SM2性能优化(续)
SM3中的FF2和GG2函数
SM3性能优化
SM4 with AESENCLAST
SM4 with GFNI
SM4性能优化
SM9实现及优化
go crypto and BoringCrypto
is my code constant time?
sbox generation
stealth addresses (隐身地址)
关于CTR模式
关于证书和TLS支持
实现Kyber所需的多项式和线性代数知识
实现ML‐DSA所需的多项式和线性代数知识
无进位乘法和GHASH
门限签名
Clone
This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
概述
参考Reference 1,Page 35 - 39
- The PCLMULQDQ + AES-NI combination for AES-GCM
- AES-NI facilitate high performance AES encryption and decryption
- PCLMULQDQ
64 \times 64 \rightarrow 128
(carry-less)- Binary polynomial multiplication; speed up computations in binary fields
- Using it for AES-GCM:
- To use it for GHASH computations:
GF(2^{128})
multiplication:- Compute
128 \times 128 \rightarrow 256
via carry-less multiplication (of 64-bit operands) - Reduction:
{256 \rightarrow 128} \ modulo \ {x^{128} + x^7 + x^2 + x + 1}
(done efficiently via software)
- Compute
- 128-bit Carry-less Multiplication using PCLMULQDQ
(Gueron Kounavis, 2009) Multiply128 \times 128 \rightarrow 256 \ [A_1 : A_0]\cdot[B_1 : B_0]
- Schoolbook (4 PCLMULQDQ invocations)
A_0 \cdot B_0 = [C_1 : C_0], \ A_1 \cdot B_1 = [D_1 : D_0]
A_0 \cdot B_1 = [E_1 : E_0], \ A_1 \cdot B_0 = [F_1 : F_0]
[A_1 : A_0] \cdot [B_1 : B_0] = [D_1:D_0 \oplus E_1 \oplus F_1:C_1 \oplus E_0 \oplus F_0 : C_0]
- Carry-less Karatsuba (3 PCLMULQDQ invocations)
A_1 \cdot B_1 = [C_1 : C_0], \ A_0 \cdot B_0 = [D_1 : D_0]
(A_1 \oplus A_0) \cdot (B_1 \oplus B_0) = [E_1 : E_0]
[A_1 : A_0] \cdot [B_1 : B_0] = [C_1:C_0 \oplus C_1 \oplus D_1 \oplus E_1 : D_1 \oplus C_0 \oplus D_0 \oplus E_0 : D_0]
- Schoolbook (4 PCLMULQDQ invocations)
- A new interpretation to GHASH operations
- GHASH does not use
GF(2^{128})
computations "as expected"- Not in the usual polynomial representation convention
- The bits inside the 128-bit operands are reflected
- Actually - it is an operation on a permutation of elements of
GF(2^{128})
T1 = reflect(A)
T2 = reflect(B)
T3 \times T2 \ modulo \ {x^{128} + x^7 + x^2 + x + 1}
(aGF(2^{128})
multiplication)reflect(T3)
- It can be proved that this operation is:
A \times B \times x^{-127} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1
- A weird Montgomery Multiplication in
GF(2^{128})
modulo a reversed poly
- A weird Montgomery Multiplication in
- Better written as
A \times (B \times x) \times x^{-128} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1
- GHASH does not use
- Fast reduction modulo
x^{128} + x^{127} + x^{126} + x^{121} + 1
- Input 256-bit operand
[X_3:X_2:X_1:X_0]
[A_1:A_0] = X_0 \cdot 0xc200000000000000
[B_1:B_0] = [X_0 \oplus A_1 : X_1 \oplus A_0]
[C_1:C_0] = B_0 \cdot 0xc200000000000000
[D_1:D_0] = [B_0 \oplus C_1 : B_1 \oplus C_0]
- Output:
[D_1 \oplus X_3 : D_0 \oplus X_2]
- Input 256-bit operand
; Input is in T1:T0
vmodqa T3, [W] ; poly
vpclmulqda T2, T3, T0, 0x01
vpshufd T4, T0, 78
vpxor T4, T4, T2
vpclmulqda T2, T3, T4, 0x01
vpshufd T4, T4, 78
vpxor T4, T4, T2
vpxor T1, T1, T4 ; result in T1
- Aggregated Reduction
- In a Horner form (iterative computation)
Y_i = MM[(X_i \oplus Y_{i-1}), Hx]
... everytingmod \ x^{128} + x^{127} + x^{126} + x^{121} + 1
- 4-way expanded Horner form (aggregate results to defer the reduction)
MM[X_i , Hx] \oplus MM[X_{i-1} , {(Hx)}^2] \oplus MM[X_{i-2} , {(Hx)}^3] \oplus MM[(X_{i-3} \oplus Y_{i-4}, {(Hx)}^4]
- Can be expanded to N > 4 blocks, we use 8 blocks now.
- Overhead: pre-calculate the powers of
H \cdot x
(amortized for reasonably long buffer) - The gain: reduction deferred to once per "N" blocks
- In a Horner form (iterative computation)
参考
- Cryptographic Hardware and Software and useful architectures
- The Galois/Counter Mode of Operation (GCM)
- Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC
- Intel® Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode
- Optimized Galois-Counter Mode implementation on Intel® Architecture Processors
- Enabling High Performance Galois-Counter Mode on Intel® Architecture Processors