From e0cde326bd1d7d30caddb011567af1a7e95e7cd0 Mon Sep 17 00:00:00 2001
From: Sun Yimin
Date: Thu, 23 Dec 2021 16:37:52 +0800
Subject: [PATCH] Updated MFMM (markdown)

---
 MFMM.md | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/MFMM.md b/MFMM.md
index 3b5081a..cf307d2 100644
--- a/MFMM.md
+++ b/MFMM.md
@@ -148,9 +148,41 @@ acc0, acc1, acc2, acc3, acc4, acc5 are 64-bit registers
     (carry6, acc4) = acc4 + t1
     acc5 = carry6
 
-======
+Consider the following algorithm (essentially one round of additions followed by one round of subtractions):
+
+      acc4,                    acc3,          acc2,          acc1
+    + acc0,                    0,             0,             (acc0 - L(acc0*2^32))
+    - H(acc0*2^32)             L(acc0*2^32)   H(acc0*2^32)
+
+    => optimizing further
+      acc4,                    acc3,          acc2,          acc1
+    + (acc0 - H(acc0*2^32)),   0,             0,             (acc0 - L(acc0*2^32))
+    -                          L(acc0*2^32)   H(acc0*2^32)
+
+acc0 - H(acc0 * 2^32) >= 0 always holds, since H(acc0 * 2^32) = acc0 >> 32. acc0 - L(acc0 * 2^32) >= 0 holds only when acc0 >= (acc0 << 32) mod 2^64; acc0 = 1 gives L(acc0 * 2^32) = 2^32, and in that case the borrow from `SUBQ AX, t0` below still has to be subtracted from acc2.
+
+    MOVQ acc0, AX
+    MOVQ acc0, DX
+    SHLQ $32, AX        // AX = L(acc0*2^32)
+    SHRQ $32, DX        // DX = H(acc0*2^32)
+    MOVQ acc0, t0
+    SUBQ AX, t0         // t0 = acc0 - L(acc0*2^32); wraps when acc0 < L(acc0*2^32)
+    MOVQ acc0, t1
+    SUBQ DX, t1         // t1 = acc0 - H(acc0*2^32); never wraps
+
+    ADDQ t0, acc1       // addition round
+    ADCQ $0, acc2
+    ADCQ $0, acc3
+    ADCQ t1, acc4
+    ADCQ $0, acc5
+
+    SUBQ DX, acc2       // subtraction round
+    SBBQ AX, acc3
+    SBBQ $0, acc4
+    SBBQ $0, acc5
+
 ### Step 3: compute X * Y1 and add it to tmp
 
 tmp = tmp + X * Y1, adding one 64-bit word at a time: