Updated Carry-less Multiplication and GHASH (markdown)

Sun Yimin 2023-08-21 14:39:05 +08:00
parent d9d7d2722c
commit 12d43bee8a

@ -5,7 +5,7 @@
* PCLMULQDQ $64 \times 64 \rightarrow 128$ (carry-less)
* Binary polynomial multiplication; speed up computations in binary fields
* Using it for AES-GCM:
* To use it for GHASH computations: GF($2^{128}$) multiplication:
* To use it for GHASH computations: $GF(2^{128})$ multiplication:
1. Compute $128 \times 128 \rightarrow 256$ via carry-less multiplication (of 64-bit operands)
2. Reduction: ${256 \rightarrow 128} \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (done efficiently via software)
* 128-bit Carry-less Multiplication using PCLMULQDQ
@ -18,7 +18,38 @@ $[A_1 : A_0] \cdot [B_1 : B_0] = [D_1:D_0 \oplus E_1 \oplus F_1:C_1 \oplus E_0 \
$A_1 \cdot B_1 = [C_1 : C_0], \ A_0 \cdot B_0 = [D_1 : D_0]$
$(A_1 \oplus A_0) \cdot (B_1 \oplus B_0) = [E_1 : E_0]$
$[A_1 : A_0] \cdot [B_1 : B_0] = [C_1:C_0 \oplus C_1 \oplus D_1 \oplus E_1 : D_1 \oplus C_0 \oplus D_0 \oplus E_0 : D_0]$
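The Karatsuba combination above can be checked with a small Python sketch. The names `clmul` and `clmul128_karatsuba` are illustrative and not from the original notes; `clmul` is a bit-by-bit software stand-in for what PCLMULQDQ computes on 64-bit operands.

```python
MASK64 = (1 << 64) - 1

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def clmul128_karatsuba(a: int, b: int) -> int:
    """128 x 128 -> 256 carry-less product from three 64 x 64 products."""
    a1, a0 = a >> 64, a & MASK64
    b1, b0 = b >> 64, b & MASK64
    c = clmul(a1, b1)            # [C1 : C0]
    d = clmul(a0, b0)            # [D1 : D0]
    e = clmul(a1 ^ a0, b1 ^ b0)  # [E1 : E0]
    c1, c0 = c >> 64, c & MASK64
    d1, d0 = d >> 64, d & MASK64
    e1, e0 = e >> 64, e & MASK64
    # Words from most to least significant, as in the formula above.
    words = [c1, c0 ^ c1 ^ d1 ^ e1, d1 ^ c0 ^ d0 ^ e0, d0]
    out = 0
    for w in words:
        out = (out << 64) | w
    return out

# Usage: the three-multiplication result matches the direct product.
a = 0x0123456789ABCDEFFEDCBA9876543210
b = 0xDEADBEEFCAFEBABE0123456789ABCDEF
assert clmul128_karatsuba(a, b) == clmul(a, b)
```

The sketch trades speed for clarity; a real implementation would issue three PCLMULQDQ instructions instead of the Python loop.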
* A new interpretation of GHASH operations
* GHASH does not use $GF(2^{128})$ computations "as expected"
* Not in the usual polynomial representation convention
* The bits inside the 128-bit operands are reflected
* Actually - it is an operation on a permutation of elements of $GF(2^{128})$
* $T1 = reflect(A)$
* $T2 = reflect(B)$
* $T3 = T1 \times T2 \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (a $GF(2^{128})$ multiplication)
* $reflect(T3)$
* It can be proved that this operation is:
* $A \times B \times x^{-127} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$
* An unusual Montgomery multiplication in $GF(2^{128})$, modulo the bit-reversed polynomial
* Better written as
* $A \times (B \times x) \times x^{-128} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$
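The identity between the reflected GHASH product and the $x^{-127}$ form can be checked numerically. The following is a minimal sketch, assuming each 128-bit block is held as a Python integer whose bit $i$ is the coefficient of $x^i$; the helper names (`reflect128`, `polymod`, `ghash_mul_reflected`) are made up for illustration. It avoids computing $x^{-127}$ directly and instead verifies that the result times $x^{127}$ is congruent to $A \times B$ modulo the reversed polynomial.

```python
G = (1 << 128) | 0x87                   # x^128 + x^7 + x^2 + x + 1
G_REV = (1 << 128) | (0xC2 << 120) | 1  # x^128 + x^127 + x^126 + x^121 + 1

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def polymod(a: int, m: int) -> int:
    """Remainder of polynomial a modulo polynomial m over GF(2)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a

def reflect128(v: int) -> int:
    """Reverse the order of the 128 bits of v."""
    return int(format(v, "0128b")[::-1], 2)

def ghash_mul_reflected(a: int, b: int) -> int:
    """The four steps listed above: reflect, multiply mod G, reflect back."""
    t1 = reflect128(a)
    t2 = reflect128(b)
    t3 = polymod(clmul(t1, t2), G)
    return reflect128(t3)

a = 0x0123456789ABCDEFFEDCBA9876543210
b = 0xDEADBEEFCAFEBABE0123456789ABCDEF
r = ghash_mul_reflected(a, b)
# r * x^127 == a * b  (mod G_REV), i.e. r == a * b * x^-127 (mod G_REV)
assert polymod(clmul(r, 1 << 127), G_REV) == polymod(clmul(a, b), G_REV)
```

In this representation $x^{127}$ (the integer `1 << 127`) acts as the multiplicative identity, which is consistent with the $x^{-127}$ factor in the formula.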
* Fast reduction modulo $x^{128} + x^{127} + x^{126} + x^{121} + 1$
* Input 256-bit operand $[X_3:X_2:X_1:X_0]$
* $[A_1:A_0] = X_0 \cdot 0xc200000000000000$
* $[B_1:B_0] = [X_0 \oplus A_1 : X_1 \oplus A_0]$
* $[C_1:C_0] = B_0 \cdot 0xc200000000000000$
* $[D_1:D_0] = [B_0 \oplus C_1 : B_1 \oplus C_0]$
* Output: $[D_1 \oplus X_3 : D_0 \oplus X_2]$
```asm
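; Input is in T1:T7 (T1 = [X3:X2], T7 = [X1:X0])
vmovdqa T3, [W]                ; W is assumed to hold 0xc200000000000000 in its high qword
vpclmulqdq T2, T3, T7, 0x01    ; T2 = [A1:A0] = X0 * constant
vpshufd T4, T7, 78             ; swap qwords: T4 = [X0:X1]
vpxor T4, T4, T2               ; T4 = [B1:B0]
vpclmulqdq T2, T3, T4, 0x01    ; T2 = [C1:C0] = B0 * constant
vpshufd T4, T4, 78             ; swap qwords: T4 = [B0:B1]
vpxor T4, T4, T2               ; T4 = [D1:D0]
vpxor T1, T1, T4               ; result in T1 = [D1 ^ X3 : D0 ^ X2]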
```
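The reduction steps can also be modelled in Python to confirm that they compute $X \times x^{-128}$ modulo $x^{128} + x^{127} + x^{126} + x^{121} + 1$. This is a sketch under the assumption that bit $i$ of each integer is the coefficient of $x^i$; `mont_reduce`, `clmul` and `polymod` are illustrative names, not part of the original notes.

```python
MASK64 = (1 << 64) - 1
K = 0xC200000000000000                  # x^63 + x^62 + x^57
G_REV = (1 << 128) | (0xC2 << 120) | 1  # x^128 + x^127 + x^126 + x^121 + 1

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def polymod(a: int, m: int) -> int:
    """Remainder of polynomial a modulo polynomial m over GF(2)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a

def mont_reduce(x: int) -> int:
    """Fold a 256-bit input [X3:X2:X1:X0] to 128 bits, 64 bits per step."""
    x0, x1, x2, x3 = ((x >> (64 * i)) & MASK64 for i in range(4))
    a = clmul(x0, K)
    a1, a0 = a >> 64, a & MASK64
    b1, b0 = x0 ^ a1, x1 ^ a0
    c = clmul(b0, K)
    c1, c0 = c >> 64, c & MASK64
    d1, d0 = b0 ^ c1, b1 ^ c0
    return ((d1 ^ x3) << 64) | (d0 ^ x2)

# Usage: reduce a 256-bit carry-less product and check it against the definition.
x = clmul(0x0123456789ABCDEFFEDCBA9876543210, 0xDEADBEEFCAFEBABE0123456789ABCDEF)
r = mont_reduce(x)
# r * x^128 == x  (mod G_REV), i.e. r == x * x^-128 (mod G_REV)
assert polymod(clmul(r, 1 << 128), G_REV) == polymod(x, G_REV)
```

Each of the two folding steps clears the lowest 64-bit word by adding a multiple of the modulus. Because the low 64 bits of the modulus equal 1, the multiplier is simply the low word itself, which is why only the constant `0xc200000000000000` appears in the multiplications.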
# References
* [Cryptographic Hardware and Software and useful architectures](https://www.esat.kuleuven.be/cosic/events/ecrypt-net-school-2018/wp-content/uploads/sites/23/2018/10/kos-school-gueron-2.pdf)