Updated MFMM (markdown)

Sun Yimin 2021-12-23 16:37:52 +08:00
parent 874056a862
commit e0cde326bd

34
MFMM.md

@@ -148,9 +148,41 @@ acc0, acc1, acc2, acc3, acc4, acc5 are 64-bit registers
(carry6, acc4) = acc4 + t1
acc5 = carry6
======
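Consider the following algorithm (essentially one round of addition followed by one round of subtraction). L(x) is the low 64 bits of x and H(x) is the bits above them.
  acc4,           acc3,           acc2,           acc1
+ acc0,           0,              0,              acc0
- H(acc0*2^32),   L(acc0*2^32),   H(acc0*2^32),   L(acc0*2^32)
=> Optimizing further, fold the top-word subtraction into the addition row:
  acc4,                     acc3,           acc2,           acc1
+ (acc0 - H(acc0*2^32)),    0,              0,              acc0
-                           L(acc0*2^32),   H(acc0*2^32),   L(acc0*2^32)
acc0 - H(acc0*2^32) >= 0 always holds, because H(acc0*2^32) = acc0 >> 32. Folding the lowest word the same way, as (acc0 - L(acc0*2^32)), is not safe: L(acc0*2^32) is the low 32 bits of acc0 shifted left by 32, so acc0 - L(acc0*2^32) can be negative (for example, acc0 = 1 gives L = 2^32). That borrow has to travel through the subtraction chain.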
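The rows can be checked numerically against big-integer arithmetic. The sketch below is an illustration only. It assumes the rows describe one Montgomery reduction step for the SM2 prime p = 2^256 - 2^224 - 2^96 + 2^64 - 1, that is (T + acc0*p) / 2^64 with acc0 the lowest word of T; the prime is not stated in this section, and the helper names `reference` and `rows` are hypothetical.

```python
import random

MASK = (1 << 64) - 1
# Assumption: the SM2 prime, which is not stated in this section.
P = 2**256 - 2**224 - 2**96 + 2**64 - 1

def reference(t):
    """One reduction step with big integers: (t + acc0*P) / 2^64."""
    acc0 = t & MASK
    total = t + acc0 * P
    assert total & MASK == 0      # low word cancels because P = -1 mod 2^64
    return total >> 64

def rows(t):
    """The same step written as one addition row and one subtraction row."""
    acc0 = t & MASK
    lo = (acc0 << 32) & MASK      # L(acc0*2^32)
    hi = acc0 >> 32               # H(acc0*2^32)
    upper = t >> 64               # the words above acc0 as one integer
    add = ((acc0 - hi) << 192) + acc0
    sub = (lo << 128) + (hi << 64) + lo
    return upper + add - sub

random.seed(1)
for _ in range(1000):
    t = random.getrandbits(320)
    acc0 = t & MASK
    assert rows(t) == reference(t)
    assert acc0 - (acc0 >> 32) >= 0   # folding H into the top word never borrows

# For acc0 = 1 the low part L = 2^32 is larger than acc0 itself.
low_part_exceeds = ((1 << 32) & MASK) > 1
```

The check treats each row as a signed integer, so it confirms the identity itself; whether a single 64-bit word can hold a folded term without borrowing is a separate question.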
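MOVQ acc0, AX
MOVQ acc0, DX
SHLQ $32, AX        // AX = L(acc0*2^32)
SHRQ $32, DX        // DX = H(acc0*2^32)
MOVQ acc0, t1
SUBQ DX, t1         // t1 = acc0 - H(acc0*2^32), never borrows
// Addition round: + (t1, 0, 0, acc0)
ADDQ acc0, acc1
ADCQ $0, acc2
ADCQ $0, acc3
ADCQ t1, acc4
ADCQ $0, acc5
// Subtraction round: - (L, H, L). L is not pre-folded into acc0 because acc0 - L can borrow.
SUBQ AX, acc1
SBBQ DX, acc2
SBBQ AX, acc3
SBBQ $0, acc4
SBBQ $0, acc5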
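A register sequence like this can be modelled in Python to see how the carry and borrow chains behave. This is a sketch, not the project's test code: `add64`, `sub64`, `step`, `to_int` and `expected` are hypothetical names, and the target value (acc5..acc1) + acc0 - acc0*2^32 - acc0*2^160 + acc0*2^192 is read off the rows above. The `fold_low` flag switches between pre-computing t0 = acc0 - L(acc0*2^32) and subtracting L inside the borrow chain. The pre-computed form drops a borrow whenever acc0 < L(acc0*2^32).

```python
import random

MASK = (1 << 64) - 1

def add64(a, b, carry=0):
    """64-bit add with carry-in; returns (result, carry-out), like ADDQ/ADCQ."""
    s = a + b + carry
    return s & MASK, s >> 64

def sub64(a, b, borrow=0):
    """64-bit subtract with borrow-in; returns (result, borrow-out), like SUBQ/SBBQ."""
    d = a - b - borrow
    return d & MASK, 1 if d < 0 else 0

def step(acc, fold_low):
    """acc = [acc0, ..., acc5]; returns [acc1, ..., acc5] after both rounds."""
    acc0, acc1, acc2, acc3, acc4, acc5 = acc
    ax = (acc0 << 32) & MASK             # SHLQ $32, AX
    dx = acc0 >> 32                      # SHRQ $32, DX
    t1, _ = sub64(acc0, dx)              # never borrows
    # With fold_low the borrow of acc0 - L is silently discarded.
    t0 = sub64(acc0, ax)[0] if fold_low else acc0
    acc1, c = add64(acc1, t0)
    acc2, c = add64(acc2, 0, c)
    acc3, c = add64(acc3, 0, c)
    acc4, c = add64(acc4, t1, c)
    acc5, c = add64(acc5, 0, c)
    b = 0
    if not fold_low:
        acc1, b = sub64(acc1, ax)        # L enters the borrow chain here
    acc2, b = sub64(acc2, dx, b)
    acc3, b = sub64(acc3, ax, b)
    acc4, b = sub64(acc4, 0, b)
    acc5, b = sub64(acc5, 0, b)
    return [acc1, acc2, acc3, acc4, acc5]

def to_int(words):
    """Little-endian list of 64-bit words to one integer."""
    return sum(w << (64 * i) for i, w in enumerate(words))

def expected(acc):
    """Target value of the five result words, modulo 2^320."""
    acc0 = acc[0]
    value = to_int(acc[1:]) + acc0 - (acc0 << 32) - (acc0 << 160) + (acc0 << 192)
    return value % (1 << 320)

random.seed(2)
for _ in range(1000):
    acc = [random.getrandbits(64) for _ in range(6)]
    assert to_int(step(acc, fold_low=False)) == expected(acc)
```

With acc0 = 1 and all other words zero, the `fold_low=True` path ends up exactly 2^64 above the expected value, which is the lost borrow.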
### Step 3: compute X * Y1 and add it to tmp
tmp = tmp + X * Y1, following the rule of adding one 64-bit word at a time