Math in V8 is Broken
Athan Reines / @kgryte
Hi! Thank you for coming to this talk.
I am here today to talk about math. I should note that, while the title of this talk explicitly mentions V8, much of the content applies to other JavaScript engines, as well. And further, the concept that math is broken extends beyond any particular engine and also includes JavaScript at the specification level.
Without further ado...
Standard Library
The JavaScript standard library is often a developer's first introduction to math in JavaScript and its math functions are commonly used in JavaScript applications.
To review...
ES5
acos
asin
atan
atan2
cos
sin
tan
abs
exp
log (ln)
pow
sqrt
In ES5, the standard math library consisted of 18 functions and 8 constants. These functions can be categorized as trigonometric, other special functions, including rounding, and miscellaneous, including a PRNG.
ES2015/ES6
acosh
asinh
atanh
cosh
sinh
tanh
sign
cbrt
expm1
log10
log1p
log2
ES2015/ES6 added 17 more functions to the standard math library. Again, these functions are broken down into trigonometric, other special functions, including rounding, and miscellaneous.
The addition of these 17 functions means the JavaScript standard math library has 35 functions.
Comparison
For comparison, let us compare the JavaScript standard Math library to another language.
Golang
Abs
Acosh
Asin
Asinh
Atan
Atan2
Atanh
Cbrt
Ceil
Copysign
Cos
Cosh
Dim
Erf
Erfc
Exp
Exp2
Expm1
Float32bits
Float64bits
Floor
Frexp
Gamma
Hypot
Ilogb
J0
J1
Jn
Ldexp
Lgamma
Log
Log10
Log1p
Log2
Logb
Max
Min
Mod
Modf
Nextafter
Once again, we find trigonometric functions, various other special functions, and lower level functions for manipulating and generating floating-point numbers.
Golang (cont.)
NextAfter32
Pow
Pow10
Signbit
Sin
Sincos
Sinh
Sqrt
Tan
Tanh
Trunc
Y0
Y1
Yn
ExpFloat64
Float64
Int
Int63
NormFloat64
Perm
Uint32
Zipf
(Complex)
(Big)
In total, the Golang standard math library has 54 functions (JS has 35), 8+ PRNGs, as well as support for 64-bit and 128-bit complex numbers. Notably, Golang has equivalents for most C/C++ standard math library functions.
So what?
You may be asking why you should care that some other language has more built-in math functions than JavaScript. The reasons are two-fold.
First, just to point out that JavaScript, a 20+ year old language, is pretty deficient in terms of basic math functionality.
Second, these type of basic math functions are critical, fundamental building blocks for other higher order mathematical functions. Many numeric computing libraries, especially for computing statistics and machine learning, rely extensively on these basic math functions. Thus, their availability is an important factor in enabling more complex numeric computing applications. The more tools you have at your disposal, the fewer your limitations and the more things you can create.
And because of their importance and widespread use, ensuring that basic math functions are performant and accurate is critical. If these functions are not accurate, their errors propagate through downstream consumers and inevitably lead to significant deviations from expected results.
And speaking of errors...
V8 and Node
Node
EOL
V8
Release
0.10.44
10-2016
3.14.5.9
05-2013
0.12.17
01-2017
3.28.71.19
11-2014
4.6.2
04-2018
4.5.103.42
08-2015
5.12.0
06-2016
4.6.85.32
11-2015
6.9.1
04-2019
5.1.281.84
07-2016
7.2.0
06-2017
5.4.500.43
11-2016
To set the stage, the provided table shows, for each Node version, the underlying V8 engine. Note that this table is not fixed. Node versions having the same major version may include different versions of V8, as Node.js maintainers pull in updated V8 versions.
Of particular interest is the fact that a V8 version has a relatively long life due to its association with Node. For instance, by the time Node v4 reaches its end-of-life in 2018, the associated V8 engine will be more than two years old. Accordingly, any bug fixes and enhancements which happen during the interim (e.g., in V8 versions 5 and later) will not be present in Node v4. And given that Node versions find use long after they have been EOL'd, bugs may linger for many years to come.
In short: old bugs are still bugs.
With this in mind...
Math.sin/Math.cos
var x = Math.pow( 2, 120 );
// returns 1.329227995784916e+36
var y = Math.sin( x );
// returns: -0.8783788442551665
// expected: 0.377820109360752
y = Math.cos( x );
// returns: 0.47796506772457525
// expected: -0.9258790228548378
For certain values, Math.sin returns inaccurate results.
Who does it affect? Node version 0.10.
See V8 issues 3006 and 3089 .
NaN
var nan1 = 0.0 / 0.0;
var nan2 = Number.POSITIVE_INFINITY / Number.POSITIVE_INFINITY;
var arr = [ nan1, nan2 ];
var FLOAT64_VIEW = new Float64Array( arr );
var INT32_VIEW = new Int32Array( FLOAT64_VIEW.buffer );
var isnan1 = ( FLOAT64_VIEW[0] !== FLOAT64_VIEW[0] );
// returns true
var isnan2 = ( FLOAT64_VIEW[1] !== FLOAT64_VIEW[1] );
// returns true
var bool = ( nan1 !== nan2 );
// returns true
var int11 = INT32_VIEW[ 0 ];
var int12 = INT32_VIEW[ 1 ];
var int21 = INT32_VIEW[ 2 ];
var int22 = INT32_VIEW[ 3 ];
console.log( '%d|%d and %d|%d', int11, int12, int21, int22 );
// => 0|2146959360 and 0|-524288
The existence of distinguishable NaNs is not necessarily a bad thing. The IEEE 754 specification discusses use cases, and, if you review older numeric computing libraries, you will see prevalent use of signaling and non-signaling NaNs. However, most developers are more than likely not aware that NaN values are not canonicalized on write (for reasons of performance). And that lack of awareness is a potential source of bugs.
Note, also, that, on some Windows systems and Node v0.10, you can manipulate the bit sequence within typed arrays to generate a distinguishable NaN, where `NaN == NaN`.
See also "Canonical NaNs" and get-nans .
Math.pow
var x = Math.pow( 10, 308 );
// returns: 1.0000000000000006e+308
// expected: 1.0e+308
Admittedly, the error is only 3 ulp; however, those 3 ulp matter. Inaccurate results for integer inputs a) is counterintuitive because 10**308 is representable as a double (so not the same counterintuitive as 0.1 + 0.2 != 0.3) and b) ends up causing downstream accuracy issues for higher order mathematical functions as we discovered while writing such functions.
See V8 issues 3599 and 5157 .
Algorithm
function pow( x, y ) {
var m = x;
var n = y;
var p = 1;
while ( n !== 0 ) {
if ( ( n & 1 ) !== 0 ) {
p *= m;
}
m *= m;
if ( ( n & 2 ) !== 0 ) {
p *= m;
}
m *= m;
n >>= 2;
}
return p;
}
If we examine the algorithm used to perform exponentiation for integer inputs, the reason for the imprecision becomes evident. If you were to log each iteration of the while loop, you would discover that results start going awry by the third iteration due to error accumulation.
So why did implementers opt for this algorithm over a more accurate pow algorithm? The reason is because a rigorous pow algorithm is complex, involving a fair amount of special cases and low-level floating-point bit manipulation.
So why does a more accurate pow algorithm matter? The complexity is not for the sake of complexity. Considerable effort is needed to compute accurate results. And those accurate results are needed by downstream consumers. Otherwise, anyone who builds on top of Math.pow must sacrifice accuracy and perform various adjustments to ensure results "match" reference implementations.
Math.atanh
var y = Math.atanh( 1.0e-10 );
// returns: 1.000000082640371e-10
// expected: 1.0e-10
For certain values, Math.atanh returns inaccurate results. Fixed in Node v7.
See V8 issue 3511 .
Math.acosh
var y = Math.acosh( 1.0 + 1.0e-10 );
// returns: 0.000014142136208733941
// expected: 0.000014142136208675862
For certain values, Math.acosh returns inaccurate results. Fixed in Node v7.
See V8 issue 3509 .
Math.asinh
var y = Math.asinh( 1.0e-50 );
// returns: 0.0
// expected: 1.0e-50
y = Math.asinh( 1.0e200 );
// returns +infinity
// expected: 461.2101657793691
For certain values, Math.asinh returns inaccurate results. Fixed in Node v7.
See V8 issue 3496 .
Algorithm
function asinh( x ) {
if ( x === 0 || !isFinite( x ) ) {
return x;
}
if ( x > 0 ) {
return Math.log( x + Math.sqrt( x*x + 1 ) );
}
return -Math.log( -x + Math.sqrt( x*x + 1 ) );
}
If we examine the algorithm for Math.asinh, we see that the implementation consists of a single approximation for ALL input values. For small values, becomes log(1) which is 0. For large values, squaring x overflows.
For other ES2015/ES6 implementations in V8 prior to Node v7, see the source .
Algorithm
function asinh( x ) {
var sgn;
var xx;
var t;
if ( isnan( x ) || isinfinite( x ) ) {
return x;
}
if ( x < 0.0 ) {
x = -x;
sgn = true;
}
// Case: |x| < 2**-28
if ( x < NEAR_ZERO ) {
t = x;
}
// Case: |x| > 2**28
else if ( x > HUGE ) {
t = ln( x ) + LN2;
}
// Case: 2**28 > |x| > 2.0
else if ( x > 2.0 ) {
t = ln( 2.0*x + 1.0/(sqrt(x*x + 1.0) + x) );
}
// Case: 2.0 > |x| > 2**-28
else {
xx = x * x;
t = log1p( x + xx/(1.0 + sqrt(1.0 + xx)) );
}
return sgn ? -t : t;
}
Compare to FDLIBM. In this algorithm, we do not treat all input values the same; instead, we apply different approximations based on magnitude, leading to more accurate results.
Math.exp
var y = Math.exp( 100.0 );
// returns: 2.6881171418161485e+43
// expected: 2.6881171418161356e+43
Error of 26 ulp. For many values, Math.exp returns inaccurate results. Regression in Node v0.12. Fixed in Node v7.
See V8 issue 3468 .
Math.random
uint32_t state0 = 1;
uint32_t state1 = 2;
uint32_t mwc1616() {
state0 = 18030 * (state0 & 0xffff) + (state0 >> 16);
state1 = 30903 * (state1 & 0xffff) + (state1 >> 16);
return state0 << 16 + (state1 & 0xffff);
}
MWC1616: multiply-with-carry combining two 16-bit parts.
Issues:
Outdated version of a low-quality PRNG.
Limited number of distinct values: 2**32.
The more significant bits are almost entirely dependent on state0
Vulnerable to a poorly chosen initial state.
Fails many of the statistical tests used to evaluate PRNGs
See also Betable post and V8 follow-up .
Example
var ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';
function randomString( length ) {
var randint;
var str;
var i;
str = '';
for ( i = 0; i < length; i++ ) {
randint = Math.floor( Math.random() * ALPHABET.length );
str += ALPHABET.substring( randint, randint+1 );
}
return str;
}
Why does having a good PRNG matter? Consider this function which is designed to generate a random string. If you do the math, the function should be capable of generating 2**132 different 22 character strings, which means the possibility of a collision is approximately nil.
However, note how the random integer is generated. We floor the result of multiplying a pseudorandom number on the interval [0,1) by the length, thus throwing away the less significant bits.
Consequently, we end up relying on ONLY the top 6 bits of the PRNG output. In which case, the number of possible 22 character strings is less than 2**30, and the possibility of a collision is consequently much greater.
Note, if you need a robust PRNG for generating pseudorandom integers, DO NOT do this. While common, this approach more often than not leads to biased results.
See Stack Overflow for another example of how Math.random is used: generating UUIDs. Also, note that Math.random is not cryptographically secure; see Hacking the JavaScript Lottery .
Just to re-emphasize the non-randomness of Math.random, here is the before and after when the V8 team switched from MWC1616 to xorshift. Note the considerable structure apparent in the before image compared to the after image. One of the hallmarks of a good PRNG is the appearance of randomness. The Math.random implementation in Node versions 4 and earlier clearly fails in this regard.
JavaScript
Now that we have discussed the particulars, let us turn our attention to how math in JavaScript is broken at the specification level.
Underspecification
Cross-browser variability
No single codebase
Versioning
Required shims
Globals
Testing
No golden algorithms
Timescale
Trust
Polyfills
var isFinite = require( 'is-finite' );
module.exports = function asinh( x ) {
if ( x === 0 || !isFinite( x ) ) {
return x;
}
return x < 0 ? -asinh( -x ) : Math.log( x + Math.sqrt( x*x + 1 ) );
}
Maybe you are thinking you can just polyfill all the problematic math functions. Wrong.
If this implementation looks familiar, you are correct. The implementation is the same as we saw earlier.
Unfortunately, many of the "polyfills" (and "ponyfills") are written by people who, quite frankly, have no business publishing such packages. Some of these polyfills are even written by "highly regarded" members of the community, giving the implementations an air of legitimacy, when, in fact, they are low quality implementations.
Libraries
function asinh( x ) {
return Math.log( Math.sqrt( x*x + 1 ) + x );
}
Okay. So maybe you are thinking you can use a popular math library. Surely a highly popular dedicated math library will have good implementations! Wrong.
Look familiar? This implementation is lifted directly from a highly popular (nearly 4000 stars, >150 watching, >50 contributors) JavaScript math library.
Lesson: social indicators are poor proxies for evaluating numeric computing libraries. Most JavaScript math libraries that you can find on npm are low or very low quality. The only reliable way to evaluate code quality is to READ THE SOURCE! Do your research: always know what you are buying!
:(
So if you cannot trust built-ins, polyfills, and most libraries, what are you to do?
stdlib is a project that myself and collaborators having been working on for the past year to provide robust, accurate, and performant numeric computing libraries in JavaScript.
See repository (in particular base special (asinh, riemann-zeta, ldexp), random, dist, utils).
Still very much a work in progress.
Get Involved!
Subscribe to the repo, follow on Twitter, and reach out.
Advocate
Int64/Uint64
Int128/Uint128
Typed Objects (complex)
SIMD (long)
Parallelism
GPGPU
BigFloat/BigInt
Lobby the standards body. Advocate for language features which can only be implemented at the specification level.
Intentionally left blank.
What can be done at the specification level?
Counting things, 64bit IDs which are not strings, money,...
Efficient serialization
GPGPU
Current support is for short SIMD, which is good for games and image processing, but less so for large vectors.
Lightweight threading model (similar to Go), rather than IPC
Allows for more compact syntax (MATLAB)
While most of the time, coding algos in JS should suffice, a compilation target for performance critical code will be awesome to have.