From 77a16990ec3c8eda4d6ab4863a69618c55bc4737 Mon Sep 17 00:00:00 2001
From: Sun Yimin
Date: Mon, 21 Aug 2023 15:09:54 +0800
Subject: [PATCH] =?UTF-8?q?Updated=20=E6=97=A0=E8=BF=9B=E4=BD=8D=E4=B9=98?=
 =?UTF-8?q?=E6=B3=95=E5=92=8CGHASH=20(markdown)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 无进位乘法和GHASH.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/无进位乘法和GHASH.md b/无进位乘法和GHASH.md
index 55b7e6c..70be01b 100644
--- a/无进位乘法和GHASH.md
+++ b/无进位乘法和GHASH.md
@@ -50,6 +50,14 @@ vpshufd T4, T4, 78
 vpxor T4, T4, T2
 vpxor T1, T1, T4 ; result in T1
 ```
+* Aggregated Reduction
+  * In [Horner form (iterative computation)](https://en.wikipedia.org/wiki/Horner%27s_method)
+    * $Y_i = MM[(X_i \oplus Y_{i-1}), Hx]$, everything $\bmod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$
+  * 4-way expanded Horner form (aggregate the partial products to defer the reduction)
+    * $Y_i = MM[X_i , Hx] \oplus MM[X_{i-1} , {(Hx)}^2] \oplus MM[X_{i-2} , {(Hx)}^3] \oplus MM[(X_{i-3} \oplus Y_{i-4}), {(Hx)}^4]$
+  * Can be expanded to N > 4 blocks; we use 8 blocks now.
+  * Overhead: pre-calculate the powers of Hx (amortized over a reasonably long buffer)
+  * The gain: reduction deferred to once per N blocks
 
 # 参考
 * [Cryptographic Hardware and Software and useful architectures](https://www.esat.kuleuven.be/cosic/events/ecrypt-net-school-2018/wp-content/uploads/sites/23/2018/10/kos-school-gueron-2.pdf)
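
The aggregated-reduction bullets added in this patch can be checked with a small model. The following is a minimal Python sketch, not the assembly implementation: it assumes plain polynomial arithmetic over GF(2) in natural bit order (bit `i` is the coefficient of $x^i$), uses the modulus quoted in the notes, and ignores GCM's reflected bit order and the `Hx` pre-shift. The names `clmul`, `reduce_poly`, `mm`, `ghash_horner` and `ghash_aggregated` are hypothetical helpers introduced only for illustration.

```python
# Modulus from the notes: x^128 + x^127 + x^126 + x^121 + 1
POLY = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1


def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b as GF(2) polynomials, without reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_poly(p: int) -> int:
    """Reduce a polynomial of any degree modulo POLY (result has degree < 128)."""
    while p.bit_length() > 128:
        p ^= POLY << (p.bit_length() - 129)
    return p


def mm(a: int, b: int) -> int:
    """Modular multiply: carry-less product followed by one reduction."""
    return reduce_poly(clmul(a, b))


def ghash_horner(h: int, blocks: list, y: int = 0) -> int:
    """Iterative form: Y_i = MM[(X_i xor Y_{i-1}), H], one reduction per block."""
    for x in blocks:
        y = mm(x ^ y, h)
    return y


def ghash_aggregated(h: int, blocks: list, n: int = 4, y: int = 0) -> int:
    """N-way expanded Horner form: one reduction per group of n blocks."""
    # Overhead: precompute powers[k] = H^(k+1) for k = 0 .. n-1.
    powers = [h]
    for _ in range(n - 1):
        powers.append(mm(powers[-1], h))

    full = len(blocks) - len(blocks) % n
    for i in range(0, full, n):
        acc = 0
        for j in range(n):
            x = blocks[i + j]
            if j == 0:
                x ^= y  # fold the running value into the oldest block
            # Oldest block pairs with H^n, newest block with H^1.
            acc ^= clmul(x, powers[n - 1 - j])
        y = reduce_poly(acc)  # the single deferred reduction for this group

    # Any leftover blocks (fewer than n) fall back to the iterative form.
    return ghash_horner(h, blocks[full:], y)


if __name__ == "__main__":
    h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E  # arbitrary 128-bit test value
    blocks = [(i * 0x9E3779B97F4A7C15F39CC0605CEDC834 + 1) % (1 << 128)
              for i in range(19)]
    assert ghash_aggregated(h, blocks, n=4) == ghash_horner(h, blocks)
    assert ghash_aggregated(h, blocks, n=8) == ghash_horner(h, blocks)
```

Because reduction is linear over XOR, summing the unreduced products and reducing once gives the same value as reducing each product separately, which is why the two functions agree for any group size `n`.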