Veltkamp/Dekker algorithm
Veltkamp/Dekker algorithm to compute the exact error of the multiplication with only floating-point operations.
Authors: Sylvie Boldo
Topics: Floating-Point Computations
Tools: Frama-C / Jessie / Gappa
References: NSV-3 Benchmarks
see also the index (by topic, by tool, by reference, by year)
The following C function indeed computes the exact error of the multiplication [4, 3] with only floating-point operations (and no FMA). This requires 16 operations and each of them has a specific role to play to get the exact error. This algorithm was already proved with a generic radix and precise underflow restrictions and overflow restrictions so that no infinity will be created [1]. The question was how to specify it as a program to get the full benefit of the proof.
See also [2]
References
- [1]
- Sylvie Boldo. Pitfalls of a full floating-point proof: example on the formal proof of the Veltkamp/Dekker algorithms. In Ulrich Furbach and Natarajan Shankar, editors, Third International Joint Conference on Automated Reasoning, volume 4130 of Lecture Notes in Computer Science, pages 52–66, Seattle, USA, August 2006. Springer.
- [2]
- Sylvie Boldo and Claude Marché. Formal verification of numerical programs: from C annotated programs to mechanical proofs. Mathematics in Computer Science, 2011.
- [3]
- Theodorus J. Dekker. A floating point technique for extending the available precision. Numerische Mathematik, 18(3):224–242, 1971.
- [4]
- Gerhard W. Veltkamp. Algolprocedures voor het berekenen van een inwendig product in dubbele precisie. RC-Informatie 22, Technische Hogeschool Eindhoven, 1968.
/*@ requires xy == \round_double(\NearestEven,x*y) && @ \abs(x) <= 0x1.p995 && @ \abs(y) <= 0x1.p995 && @ \abs(x*y) <= 0x1.p1021; @ ensures ((x*y == 0 || 0x1.p-969 <= \abs(x*y)) @ ==> x*y == xy+\result); @*/ double Dekker(double x, double y, double xy) { double C,px,qx,hx,py,qy,hy,tx,ty,r2; C=0x8000001p0; /*@ assert C == 0x1p27+1; */ px=x*C; qx=x-px; hx=px+qx; tx=x-hx; py=y*C; qy=y-py; hy=py+qy; ty=y-hy; r2=-xy+hx*hy; r2+=hx*ty; r2+=hy*tx; r2+=tx*ty; return r2; }