Quadruple 128 bit Floating Point Library 1.0

License: Free
File size: 83.97 KB
Users Rating: 2.3/5 - 9 votes

A signed 128-bit floating-point data type library with 64 effective bits of precision (vs. 53 for the built-in Double type) and a 64-bit exponent (vs. 11 bits for Doubles). With greater precision and far greater range, Quads are especially useful when dealing with very large or very small values, such as those in probabilistic models. Because Quad uses a larger fixed precision rather than an arbitrary-precision type (such as Java's BigDecimal), it is still slower than built-in arithmetic, but the penalty is an order of magnitude or less, which remains feasible in many math-heavy applications. For example, on an Intel Core i5-2410M laptop, a billion multiplications take 17 seconds with Double values, 135 seconds with Quad values using the overloaded * operator, and just 76 seconds using the Multiply() method (the higher overhead of * is due to the poor inlining logic of the .NET compiler/JIT optimizer). By comparison, summing logarithms, the commonly used workaround for multiplication underflow and overflow, takes 130 seconds. In addition to being faster (when using Multiply()) and more precise than log arithmetic, Quads also simplify code by eliminating the need to remember which variables hold logarithms and to convert back and forth between plain and log values. The Quadruple library is written in C# (source code included) and targets .NET 4.0; it should also be easily portable to .NET 2.0 and to similar languages (such as Java) with straightforward modifications.
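The underflow problem and the log-sum workaround mentioned above can be illustrated with a short, language-neutral sketch. It is written in Python rather than C# and does not use the Quadruple library or its API; the function names (naive_product, log_sum, wide_product) are hypothetical and exist only for this illustration. The last function is a toy version of the general idea behind a wide-exponent type: it keeps the significand and the binary exponent in separate variables. It uses a 53-bit double significand and an unbounded integer exponent, so it is not a model of the library's actual 64-bit layout.

```python
import math


def naive_product(probs):
    """Multiply with built-in doubles; long chains of small values underflow."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def log_sum(probs):
    """Common workaround: add logarithms instead of multiplying values."""
    return sum(math.log(p) for p in probs)


def wide_product(probs):
    """Toy wide-exponent product (an illustration, not the library's method).

    The running value is mantissa * 2**exponent. The exponent lives in a
    separate integer, so the product cannot underflow the way a double does.
    """
    mantissa, exponent = 1.0, 0
    for p in probs:
        m, e = math.frexp(p)                        # p == m * 2**e, 0.5 <= m < 1
        mantissa, shift = math.frexp(mantissa * m)  # renormalize the significand
        exponent += e + shift
    return mantissa, exponent


probs = [1e-5] * 1000            # true product is 1e-5000, far below double range

print(naive_product(probs))      # 0.0 (underflow)
print(log_sum(probs))            # roughly -11512.9, the natural log of the product

m, e = wide_product(probs)
print(m, e)                      # significand in [0.5, 1) and a large negative exponent
```

With the separate exponent the result stays an ordinary value that can be multiplied further, whereas the log-sum version requires every later operation to remember that it is working with logarithms.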

VERSION HISTORY

  • Version 1.0 posted on 2011-06-15
    Initial release

Program Details