diff --git a/无进位乘法和GHASH.md b/无进位乘法和GHASH.md
index 4064b19..cf62c2d 100644
--- a/无进位乘法和GHASH.md
+++ b/无进位乘法和GHASH.md
@@ -5,7 +5,7 @@
 * PCLMULQDQ $64 \times 64 \rightarrow 128$ (carry-less)
   * Binary polynomial multiplication; speed up computations in binary fields
 * Using it for AES-GCM:
-  * To use it for GHASH computations: GF($2^{128}$) multiplication:
+  * To use it for GHASH computations: $GF(2^{128})$ multiplication:
     1. Compute $128 \times 128 \rightarrow 256$ via carry-less multiplication (of 64-bit operands)
     2. Reduction: ${256 \rightarrow 128} \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (done efficiently via software)
 * 128-bit Carry-less Multiplication using PCLMULQDQ
@@ -18,7 +18,38 @@ $[A_1 : A_0] \cdot [B_1 : B_0] = [D_1:D_0 \oplus E_1 \oplus F_1:C_1 \oplus E_0 \
 $A_1 \cdot B_1 = [C_1 : C_0], \ A_0 \cdot B_0 = [D_1 : D_0]$
 $(A_1 \oplus A_0) \cdot (B_1 \oplus B_0) = [E_1 : E_0]$
 $[A_1 : A_0] \cdot [B_1 : B_0] = [C_1:C_0 \oplus C_1 \oplus D_1 \oplus E_1 : D_1 \oplus C_0 \oplus D_0 \oplus E_0 : D_0]$
-
+* A new interpretation of GHASH operations
+  * GHASH does not use $GF(2^{128})$ computations "as expected"
+    * Not in the usual polynomial representation convention
+    * The bits inside the 128-bit operands are reflected
+  * Actually, it is an operation on a permutation of the elements of $GF(2^{128})$
+    * $T1 = reflect(A)$
+    * $T2 = reflect(B)$
+    * $T3 = T1 \times T2 \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (a $GF(2^{128})$ multiplication)
+    * Result: $reflect(T3)$
+  * It can be proved that this operation is:
+    * $A \times B \times x^{-127} \ mod \ (x^{128} + x^{127} + x^{126} + x^{121} + 1)$
+    * A weird Montgomery multiplication in $GF(2^{128})$ modulo the reversed polynomial
+  * Better written as
+    * $A \times (B \times x) \times x^{-128} \ mod \ (x^{128} + x^{127} + x^{126} + x^{121} + 1)$
+* Fast reduction modulo $x^{128} + x^{127} + x^{126} + x^{121} + 1$
+  * Input: 256-bit operand $[X_3:X_2:X_1:X_0]$
+  * $[A_1:A_0] = X_0 \cdot 0xc200000000000000$
+  * $[B_1:B_0] = [X_0 \oplus A_1 : X_1 \oplus A_0]$
+  * $[C_1:C_0] = B_0 \cdot 0xc200000000000000$
+  * $[D_1:D_0] = [B_0 \oplus C_1 : B_1 \oplus C_0]$
+  * Output: $[D_1 \oplus X_3 : D_0 \oplus X_2]$
+```asm
+; Input is in T1:T7 (T1 = [X3:X2], T7 = [X1:X0])
+vmovdqa T3, [W]               ; reduction constant; imm8 = 0x01 below selects its high qword
+vpclmulqdq T2, T3, T7, 0x01   ; T2 = [A1:A0] = X0 * constant
+vpshufd T4, T7, 78            ; T4 = [X0:X1] (swap the two qwords)
+vpxor T4, T4, T2              ; T4 = [B1:B0]
+vpclmulqdq T2, T3, T4, 0x01   ; T2 = [C1:C0] = B0 * constant
+vpshufd T4, T4, 78            ; T4 = [B0:B1]
+vpxor T4, T4, T2              ; T4 = [D1:D0]
+vpxor T1, T1, T4              ; result in T1 = [D1 ^ X3 : D0 ^ X2]
+```
 # References
 * [Cryptographic Hardware and Software and useful architectures](https://www.esat.kuleuven.be/cosic/events/ecrypt-net-school-2018/wp-content/uploads/sites/23/2018/10/kos-school-gueron-2.pdf)
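
The Karatsuba step and the fast reduction added in this diff can be checked with a small software model. The following is a minimal Python sketch, not the author's implementation: the helper names (`clmul`, `polymod`, `reflect128`, `clmul128_karatsuba`, `fast_reduce`) are hypothetical, and Python integers stand in for the XMM registers. It assumes the 256-bit input is $A \times (B \times x)$, so that the two-step reduction yields the same value as the reflected $GF(2^{128})$ multiplication.

```python
MASK64 = (1 << 64) - 1
K = 0xC200000000000000                                   # x^63 + x^62 + x^57
G = (1 << 128) | 0x87                                    # x^128 + x^7 + x^2 + x + 1
P = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1  # reversed polynomial


def clmul(a, b):
    """Carry-less (GF(2) polynomial) product of two non-negative integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polymod(a, m):
    """Remainder of polynomial a modulo polynomial m over GF(2) (slow reference)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def reflect128(v):
    """Reverse the bit order of a 128-bit value."""
    return int(format(v, "0128b")[::-1], 2)


def clmul128_karatsuba(a, b):
    """128 x 128 -> 256 carry-less product using three 64 x 64 products."""
    a1, a0 = a >> 64, a & MASK64
    b1, b0 = b >> 64, b & MASK64
    c = clmul(a1, b1)              # [C1:C0]
    d = clmul(a0, b0)              # [D1:D0]
    e = clmul(a1 ^ a0, b1 ^ b0)    # [E1:E0]
    return (c << 128) ^ ((c ^ d ^ e) << 64) ^ d


def fast_reduce(x):
    """Map a 256-bit [X3:X2:X1:X0] to x * x^-128 mod P, following the listed steps."""
    x0 = x & MASK64
    x1 = (x >> 64) & MASK64
    hi = x >> 128                  # [X3:X2]
    a = clmul(x0, K)
    a1, a0 = a >> 64, a & MASK64
    b1, b0 = x0 ^ a1, x1 ^ a0
    c = clmul(b0, K)
    c1, c0 = c >> 64, c & MASK64
    d1, d0 = b0 ^ c1, b1 ^ c0
    return ((d1 << 64) | d0) ^ hi
```

For 128-bit `a` and `b`, `fast_reduce(clmul(a, b << 1))` should agree with `reflect128(polymod(clmul(reflect128(a), reflect128(b)), G))`, which is the "reflect, multiply, reflect" definition of the GHASH multiplication stated in the diff.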