Updated 无进位乘法和GHASH (markdown)

2025-09-18 04:43:49 +08:00 · 2023-08-21 14:39:05 +08:00 · 2023-08-21 14:39:05 +08:00 · 12d43bee8a
commit 12d43bee8a
parent d9d7d2722c
1 changed files with 33 additions and 2 deletions
--- a/无进位乘法和GHASH.md
+++ b/无进位乘法和GHASH.md
@@ -5,7 +5,7 @@
  * PCLMULQDQ            $64 \times 64 \rightarrow 128$ (carry-less)
    * Binary polynomial multiplication; speed up computations in binary fields
  * Using it for AES-GCM:
-  * To use it for GHASH computations: GF($2^{128}$) multiplication:
+  * To use it for GHASH computations: $GF(2^{128})$ multiplication:
    1.  Compute $128 \times 128 \rightarrow 256$ via carry-less multiplication (of 64-bit operands)
    2.  Reduction: ${256 \rightarrow 128} \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (done efficiently via software)
 * 128-bit Carry-less Multiplication using PCLMULQDQ  
@@ -18,7 +18,38 @@ $[A_1 : A_0] \cdot [B_1 : B_0] = [D_1:D_0 \oplus E_1 \oplus F_1:C_1 \oplus E_0 \
 $A_1 \cdot B_1 = [C_1 : C_0], \ A_0 \cdot B_0 = [D_1 : D_0]$  
 $(A_1 \oplus A_0) \cdot (B_1 \oplus B_0) = [E_1 : E_0]$  
 $[A_1 : A_0] \cdot [B_1 : B_0] = [C_1:C_0 \oplus C_1 \oplus D_1 \oplus E_1 : D_1 \oplus C_0 \oplus D_0 \oplus E_0 : D_0]$
-
+* A new interpretation of GHASH operations
+  * GHASH does not use $GF(2^{128})$ computations "as expected"
+    * Not in the usual polynomial representation convention
+    * The bits inside the 128-bit operands are reflected
+    * In effect, it is an operation on a permutation of the elements of $GF(2^{128})$:
+      * $T1 = reflect(A)$
+      * $T2 = reflect(B)$
+      * $T3 = T1 \times T2 \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (a $GF(2^{128})$ multiplication)
+      * Result: $reflect(T3)$
+  * It can be proved that this operation is:
+    * $A \times B \times x^{-127} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$
+      * An unusual Montgomery multiplication in $GF(2^{128})$, modulo the bit-reversed polynomial
+    * Better written as  
+      * $A \times (B \times x) \times x^{-128} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$
+* Fast reduction modulo $x^{128} + x^{127} + x^{126} + x^{121} + 1$
+  * Input 256-bit operand $[X_3:X_2:X_1:X_0]$
+  * $[A_1:A_0] = X_0 \cdot 0xc200000000000000$
+  * $[B_1:B_0] = [X_0 \oplus A_1 : X_1 \oplus A_0]$
+  * $[C_1:C_0] = B_0 \cdot 0xc200000000000000$
+  * $[D_1:D_0] = [B_0 \oplus C_1 : B_1 \oplus C_0]$
+  * Output: $[D_1 \oplus X_3 : D_0 \oplus X_2]$
+```asm
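+; Input: 256-bit value in T1:T7 (T1 = [X3:X2], T7 = [X1:X0]);
+; the high qword of [W] is assumed to hold 0xc200000000000000
+vmovdqa    T3, [W]
+vpclmulqdq T2, T3, T7, 0x01  ; T2 = [A1:A0] = X0 * W_hi
+vpshufd    T4, T7, 78        ; swap qwords: T4 = [X0:X1]
+vpxor      T4, T4, T2        ; T4 = [B1:B0]
+vpclmulqdq T2, T3, T4, 0x01  ; T2 = [C1:C0] = B0 * W_hi
+vpshufd    T4, T4, 78        ; swap qwords: T4 = [B0:B1]
+vpxor      T4, T4, T2        ; T4 = [D1:D0]
+vpxor      T1, T1, T4        ; result in T1 = [D1^X3 : D0^X2]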
+```

 # References
 * [Cryptographic Hardware and Software and useful architectures](https://www.esat.kuleuven.be/cosic/events/ecrypt-net-school-2018/wp-content/uploads/sites/23/2018/10/kos-school-gueron-2.pdf)
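
The identities in this change can be sanity-checked in software. The following is a minimal Python sketch, not the page author's code: it models polynomials over GF(2) as Python integers (bit i is the coefficient of $x^i$), and every helper name (`clmul`, `polymod`, `reflect128`, `clmul128_karatsuba`, `mont_reduce`, `ghash_mul_reflected`, `ghash_mul_montgomery`) is hypothetical. It also assumes that $B \times x$ is reduced modulo the reversed polynomial before the multiplication so that it fits in 128 bits. It compares the Karatsuba formula with a plain carry-less product, and the reflect-multiply-reflect definition with the Montgomery form that uses the two-step fast reduction.

```python
import random

# Illustrative sketch only: all names below are hypothetical helpers.
MASK64 = (1 << 64) - 1
Q = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1
P = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1  # bit-reversed Q
C = 0xC200000000000000  # x^63 + x^62 + x^57, the middle 64 bits of P


def clmul(a, b):
    """Carry-less multiply; a software stand-in for PCLMULQDQ at any width."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polymod(a, m):
    """Remainder of polynomial a modulo polynomial m (schoolbook reduction)."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def reflect128(a):
    """Reverse the bit order of a 128-bit value."""
    return int(format(a, "0128b")[::-1], 2)


def clmul128_karatsuba(a, b):
    """128 x 128 -> 256 carry-less product from three 64 x 64 products."""
    a1, a0 = a >> 64, a & MASK64
    b1, b0 = b >> 64, b & MASK64
    c, d = clmul(a1, b1), clmul(a0, b0)
    e = clmul(a1 ^ a0, b1 ^ b0)
    return (c << 128) ^ ((c ^ d ^ e) << 64) ^ d


def mont_reduce(x):
    """Two-step fast reduction: x * x^-128 mod P for a 256-bit input x."""
    x0, x1, x2, x3 = ((x >> (64 * i)) & MASK64 for i in range(4))
    a = clmul(x0, C)
    b1, b0 = x0 ^ (a >> 64), x1 ^ (a & MASK64)
    c = clmul(b0, C)
    d1, d0 = b0 ^ (c >> 64), b1 ^ (c & MASK64)
    return ((d1 ^ x3) << 64) | (d0 ^ x2)


def ghash_mul_reflected(a, b):
    """Definition: reflect both inputs, multiply modulo Q, reflect back."""
    return reflect128(polymod(clmul(reflect128(a), reflect128(b)), Q))


def ghash_mul_montgomery(a, b):
    """A * (B * x) * x^-128 mod P; B * x is pre-reduced mod P (assumption)."""
    bx = polymod(b << 1, P)
    return mont_reduce(clmul128_karatsuba(a, bx))


# Quick self-check on random 128-bit operands.
rng = random.Random(0)
for _ in range(50):
    a, b = rng.getrandbits(128), rng.getrandbits(128)
    assert clmul128_karatsuba(a, b) == clmul(a, b)
    assert ghash_mul_montgomery(a, b) == ghash_mul_reflected(a, b)
```

In this representation the value `1 << 127` reflects to the polynomial $1$, so it acts as the multiplicative identity for both functions. The bit-by-bit `clmul` is meant only for checking the algebra; it is not constant-time and is unsuitable for real cryptographic use.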