Node.js Add-ons for High Performance Numeric Computing
Survey
Who here has heard of Node.js native add-ons?
Anyone here hearing about Node.js native add-ons for the first time?
Who here has written a native add-on?
For those who have written native add-ons, why have you done so?
How has your experience been writing native add-ons? Negative? Positive? Somewhere in between?
Who here has used a Node.js native add-on for numeric computing?
Overview
Intro
Toolchain
Numeric Computing
Basic Example
BLAS
Performance
Challenges
N-API
Conclusions
This talk will be technical and contains many slides displaying source code. I won't spend much time on those slides, pausing only long enough to highlight key points. If I move too quickly, this talk is available online with notes, so you can revisit it during and after the conference.
The talk overview is as follows...
First, I will provide an overview of Node.js native add-ons.
Next, I will introduce the current toolchain for authoring add-ons.
Then, I will discuss why native add-ons are important for numeric computing.
I'll follow by showing a basic native add-on example.
After the basic example, I'll show a more complex example where we need to write an add-on which links a BLAS library written in Fortran to the JavaScript runtime.
Next, I will show performance comparisons.
Then, I'll discuss some of the challenges we have faced writing native add-ons for numeric computing and how we have addressed them.
Before concluding, I will mention N-API, an application binary interface, or ABI, which will provide a stable abstraction layer over JavaScript engines.
And finally, I will offer some conclusions and additional resources you can use to get started using Node.js native add-ons for high-performance numeric computing.
Interface between JS running in Node.js and C/C++ libraries
A Node.js native add-on provides an interface between JavaScript running in Node.js and, primarily, C/C++ libraries.
From the perspective of Node.js applications, an add-on is just another module which an application can require.
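For example, loading a compiled add-on looks no different from loading a plain JavaScript module. The path and exported method below are hypothetical, purely for illustration:

/* app.js */
// Load a compiled add-on just like any other module (path is illustrative):
var addon = require( './build/Release/addon.node' );

// Invoke an exported method as if it were implemented in JavaScript:
var v = addon.someMethod( 1.0, 2.0 );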
APIs
V8
libuv
Internal Libraries
Dependencies
Why?
Why would you choose a native add-on over plain JavaScript?
Leverage existing codebases
Access lower-level APIs
Non-JavaScript features
Performance
Four primary reasons...
One reason is that you want to link to existing C/C++ libraries. This allows you to avoid having to port and re-implement functionality in JavaScript, which, for larger libraries, can require significant time and investment.
Next, you may want to access lower-level APIs, such as worker threads.
Third, you may need language features not available in JavaScript, such as 64-bit integers or SIMD.
Last, you may need a performance boost, including the ability to leverage hardware optimizations.
Now that we have motivated why we may want to use add-ons, how do we go about doing so?
Toolchain
This brings us to the native add-on toolchain.
node-gyp
We begin with node-gyp, which is a cross-platform command-line tool written primarily in JavaScript and is used for compiling native add-ons for Node.js.
node-gyp bundles GYP and automatically downloads necessary development files and headers for target Node.js versions.
GYP
GYP, which stands for "Generate Your Projects", is a meta-build system: a build system which generates other build systems, depending on the target platform.
The aim of GYP is to replicate, as closely as possible, the native build setup of a target platform IDE. For example, on macOS, that means generating Xcode projects; on Windows, Visual Studio projects.
And once GYP generates the build system, we can compile our add-on.
Historically, developing native add-ons has been a difficult process.
Challenges
Here have been some of the challenges.
The foremost challenge has been handling breaking changes in V8.
Each major Node.js release shipped a new version of V8. In the past, the V8 team was not concerned with backward compatibility and would often introduce significant changes, removing, replacing, and adding interfaces and functionality. These changes forced add-on authors to rewrite their packages and publish new major versions, and they made providing backward compatibility extremely difficult.
To alleviate some of the "pain" of native add-ons, members of the Node.js community created a package called NAN, which stands for Native Abstractions for Node.js.
NAN attempts to provide a stable abstraction layer that native add-on authors can target. Internally, NAN handles the complex logic required to maintain functionality from one V8 version to the next.
And while NAN has been beneficial, even it has had breaking changes in its API.
Another issue is GYP. GYP was designed with a particular use case in mind: Chrome. It was not explicitly designed for Node.js add-ons.
Further, GYP documentation is either poor or incomplete, presenting significant challenges whenever you want to do something beyond simple "hello world" type examples.
Because of poor documentation, you spend considerable time searching the Internet and looking for other projects using GYP to see how those projects handle special configurations. And in particular, anytime you want to use GYP to compile languages other than C/C++, e.g., Fortran, CUDA, Rust, or Go, good luck.
Resources are few and far between.
A more forward-looking concern is that node-gyp is biased toward V8; the toolchain is not engine neutral. As a result, compiling Node.js and Node.js native add-ons with alternative engines, such as Chakra, is less straightforward, requiring shims like Chakrashim.
Despite these challenges, native add-ons are highly important for numeric computing.
Numeric Computing
Native add-ons are important for numeric computing because they allow us to interface with high-performance numeric computing libraries written in Fortran/C/C++.
What you find when reading the source code of Julia, R, and Python libraries like NumPy and SciPy is that a substantial amount of the functionality they expose relies on providing wrappers for existing numeric computing code bases written in C/C++ and Fortran.
For example, for high-performance linear algebra, these platforms wrap BLAS and LAPACK. For fast Fourier transforms, they wrap FFTW. For BigInt, Julia wraps GMP. For BigFloat, Julia wraps MPFR.
Node.js native add-ons allow us to do something similar; namely, we can expose high-performance numeric computing functionality to Node.js and to JavaScript.
This means we can leverage highly optimized libraries which have been used with great success for decades and not spend time rewriting implementations.
In summary, native add-ons allow us to do in Node.js what other environments used for numeric computing can do.
At this point, we have discussed, at a high-level, what native add-ons are, their toolchain, some challenges, and motivated why they are important for numeric computing. Let's now discuss a basic example...
/* hypot.h */
#ifndef C_HYPOT_H
#define C_HYPOT_H
#ifdef __cplusplus
extern "C" {
#endif
double c_hypot( const double x, const double y );
#ifdef __cplusplus
}
#endif
#endif
The example I am going to use is a simple function to compute the hypotenuse, avoiding underflow and overflow.
We first define a basic header file declaring the interface of the function exported to the JavaScript runtime, taking care to guard against C++ name mangling so that the exported symbol behaves the same as it would when compiled with a standard C compiler.
/* hypot.c */
#include <math.h>
#include "hypot.h"
double c_hypot( const double x, const double y ) {
double tmp;
double a;
double b;
if ( isnan( x ) || isnan( y ) ) {
return NAN;
}
if ( isinf( x ) || isinf( y ) ) {
return INFINITY;
}
a = x;
b = y;
if ( a < 0.0 ) {
a = -a;
}
if ( b < 0.0 ) {
b = -b;
}
if ( a < b ) {
tmp = b;
b = a;
a = tmp;
}
if ( a == 0.0 ) {
return 0.0;
}
b /= a;
return a * sqrt( 1.0 + (b*b) );
}
Next, we write our implementation. This is a standard C implementation which includes the standard math header and defines a function which accepts two arguments, x and y, and returns a numeric result.
/* addon.cpp */
#include <nan.h>
#include "hypot.h"
namespace addon_hypot {
using Nan::FunctionCallbackInfo;
using Nan::ThrowTypeError;
using Nan::ThrowError;
using v8::Number;
using v8::Local;
using v8::Value;
void node_hypot( const FunctionCallbackInfo<Value>& info ) {
if ( info.Length() != 2 ) {
ThrowError( "invalid invocation. Must provide 2 arguments." );
return;
}
if ( !info[ 0 ]->IsNumber() ) {
ThrowTypeError( "invalid input argument. First argument must be a number." );
return;
}
if ( !info[ 1 ]->IsNumber() ) {
ThrowTypeError( "invalid input argument. Second argument must be a number." );
return;
}
const double x = info[ 0 ]->NumberValue();
const double y = info[ 1 ]->NumberValue();
Local<Number> h = Nan::New( c_hypot( x, y ) );
info.GetReturnValue().Set( h );
}
NAN_MODULE_INIT( Init ) {
Nan::Export( target, "hypot", node_hypot );
}
NODE_MODULE( addon, Init )
}
Once our implementation is finished, we create a wrapper written in C++ which calls the C function.
Note the inclusion of NAN and recall that NAN provides a stable API across V8 versions.
Most of the C++ is unwrapping and wrapping object values. The function wrapper takes a single argument: the arguments object. We perform basic input value sanity checks and then proceed to unwrap the individual arguments x and y. Once we have x and y, we call our C function and set the return value.
And finally, we end by exporting an initialization function, which is required of all Node.js native add-ons.
I should note that we did not have to write the implementation in a separate C file. We could have written it directly in our add-on file; however, using a separate file a) facilitates re-use of source files in non-add-on contexts and b) is the more common scenario when working with existing codebases.
# binding.gyp
{
'targets': [
{
'target_name': 'addon',
'sources': [
'addon.cpp',
'hypot.c'
],
'include_dirs': [
'<!(node -e "require(\'nan\')")',
'./'
]
}
]
}
We then create a minimal binding.gyp file, which GYP uses to generate the build project for the target platform.
In GYP terminology, we define a target--here, the target name is "addon".
The sources field indicates each source file which should be compiled.
And finally, we list the directories which contain header files. Here, we use GYP command expansion to instruct Node to evaluate the string containing the require statement. When NAN is required, it prints the location of its header files to stdout, thus allowing us to dynamically resolve the NAN header file location.
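This technique works because NAN's entry point, when required, prints the directory containing its header files. Conceptually, the evaluated expression amounts to something like the following sketch (for illustration only; this is not NAN's actual source):

/* conceptual equivalent of: node -e "require('nan')" */
var path = require( 'path' );

// Resolve the installed package's main file and print the directory containing
// its headers to stdout, so a build tool can capture it as an include path:
console.log( path.dirname( require.resolve( 'nan' ) ) );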
# Navigate to add-on directory:
$ cd path/to/hypot
# Generate build files:
$ node-gyp configure
# On Windows:
# node-gyp configure --msvs_version=2015
# Build add-on:
$ node-gyp build
Once we have created the four files, we can now build our add-on.
To do so, we navigate to the add-on directory containing the binding.gyp file.
Then we generate the build files using the configure subcommand. On Unix platforms, this will be a Makefile; on Windows, a vcxproj file.
Finally, we run build to actually compile the add-on.
/* hypot.js */
var hypot = require( './path/to/build/Release/addon.node' ).hypot;
var h = hypot( 5.0, 12.0 );
// returns 13.0
After compiling an add-on, we can use the exported function in JavaScript.
As may be observed in the require statement, the add-on exports an object with a method whose name matches the export we defined when we wrote our addon.cpp file.
When invoked, this function behaves just like a comparable function implemented in JavaScript, accepting numeric arguments and returning a number.
            ops/sec     perf
Builtin     3,954,799   1x
Native      4,732,108   1.2x
JavaScript  7,337,790   1.85x
Okay, so we have implemented our add-on and we're ready to use it, but we should ask ourselves: how does the performance compare to an implementation written purely in JavaScript?
Here are benchmark results run on my laptop running Node.js version 8, which uses one of the latest versions of V8.
In the first row, I am showing the results for the builtin hypot function provided by the JavaScript standard math library. We see that, on my machine, we can compute the hypotenuse around 4 million times per second.
On the next row, I am showing the results of our native add-on. We see that we get a slight performance boost of around 800,000 operations per second.
Lastly, on the third row, I am showing the results of an equivalent implementation written purely in JavaScript. We can see that, when compared to add-on performance, we can perform roughly 2.6 million more operations per second, which is a significant performance boost.
Two comments. First, simply because we can write code in C, this does not mean we will achieve better performance by doing so, due to overhead when calling into an add-on. Second, simply because something is a standard, this does not mean the function is fast, in an absolute sense. You can often achieve better performance in JavaScript via userland implementations by restricting the domain of input argument types and choosing your algorithms wisely.
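For reference, a pure JavaScript implementation along the lines of the one benchmarked above might look as follows. This is a sketch mirroring the C implementation, not necessarily the exact benchmark code:

/* hypot.js (plain JavaScript implementation) */
function hypot( x, y ) {
    var tmp;
    var a;
    var b;
    // NaN check (NaN is the only value not equal to itself):
    if ( x !== x || y !== y ) {
        return NaN;
    }
    if ( x === Infinity || x === -Infinity || y === Infinity || y === -Infinity ) {
        return Infinity;
    }
    a = ( x < 0.0 ) ? -x : x;
    b = ( y < 0.0 ) ? -y : y;
    // Ensure a >= b so squaring the ratio avoids overflow and underflow:
    if ( a < b ) {
        tmp = b;
        b = a;
        a = tmp;
    }
    if ( a === 0.0 ) {
        return 0.0;
    }
    b /= a;
    return a * Math.sqrt( 1.0 + (b*b) );
}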
BLAS, or Basic Linear Algebra Subprograms, are routines that provide standard implementations for performing basic vector and matrix operations.
BLAS routines are split into three levels. Level 1 BLAS routines perform scalar, vector and vector-vector operations. Level 2 BLAS routines perform matrix-vector operations. Level 3 BLAS routines perform matrix-matrix operations.
BLAS routines are commonly used in the development of high quality linear algebra software (e.g., LAPACK) due to their efficiency, portability, and wide availability. And they are foundational for most modern numeric computing environments.
! dasum.f
! Computes the sum of absolute values.
double precision function dasum( N, dx, stride )
implicit none
integer :: stride, N
double precision :: dx(*)
double precision :: dtemp
integer :: nstride, mp1, m, i
intrinsic dabs, mod
! ..
dasum = 0.0d0
dtemp = 0.0d0
! ..
if ( N <= 0 .OR. stride <= 0 ) then
return
end if
! ..
if ( stride == 1 ) then
m = mod( N, 6 )
if ( m /= 0 ) then
do i = 1, m
dtemp = dtemp + dabs( dx( i ) )
end do
if ( N < 6 ) then
dasum = dtemp
return
end if
end if
mp1 = m + 1
do i = mp1, N, 6
dtemp = dtemp + &
dabs( dx( i ) ) + dabs( dx( i+1 ) ) + &
dabs( dx( i+2 ) ) + dabs( dx( i+3 ) ) + &
dabs( dx( i+4 ) ) + dabs( dx( i+5 ) )
end do
else
nstride = N * stride
do i = 1, nstride, stride
dtemp = dtemp + dabs( dx( i ) )
end do
end if
dasum = dtemp
return
end function dasum
An example BLAS routine is the Level 1 function dasum, which stands for double absolute value sum. As suggested by the name, the function computes the sum of absolute values for a vector whose values are of type double.
The algorithm is not particularly interesting, but should provide an illustrative example as to how we might expose this or a similar function to JavaScript.
! dasumsub.f
! Wraps dasum as a subroutine.
subroutine dasumsub( N, dx, stride, sum )
implicit none
! ..
interface
double precision function dasum( N, dx, stride )
integer :: stride, N
double precision :: dx(*)
end function dasum
end interface
! ..
integer :: stride, N
double precision :: sum
double precision :: dx(*)
! ..
sum = dasum( N, dx, stride )
return
end subroutine dasumsub
Recall that Node.js native add-ons are intended to provide C/C++ bindings. In which case, the first thing we need to do is provide a C interface to the Fortran function.
The first obstacle in doing so is that we cannot use dasum directly from C because Fortran expects arguments to be passed by reference rather than by value. Furthermore, while not applicable here, Fortran functions can only return scalar values, not arrays. Thus, the general best practice is to wrap Fortran functions as subroutines (the equivalent of a C function returning void), where we can pass a pointer for storing the output value.
Accordingly, the first step is to wrap our Fortran function as a Fortran subroutine.
/* dasum_fortran.h */
#ifndef DASUM_FORTRAN_H
#define DASUM_FORTRAN_H
#ifdef __cplusplus
extern "C" {
#endif
void dasumsub( const int *, const double *, const int *, double * );
#ifdef __cplusplus
}
#endif
#endif
Similar to our hypot example, we create a header file defining the subroutine prototype.
/* dasum.h */
#ifndef DASUM_H
#define DASUM_H
#ifdef __cplusplus
extern "C" {
#endif
double c_dasum( const int N, const double *X, const int stride );
#ifdef __cplusplus
}
#endif
#endif
We can now create the header file containing the prototype for our C wrapper, using the naming convention in which we attach c_ as a prefix to the function name.
/* dasum_f.c */
#include "dasum.h"
#include "dasum_fortran.h"
double c_dasum( const int N, const double *X, const int stride ) {
double sum;
dasumsub( &N, X, &stride, &sum );
return sum;
}
Our C wrapper is fairly straightforward, passing all arguments by reference to the Fortran subroutine.
/* addon.cpp */
#include <nan.h>
#include "dasum.h"
namespace addon_dasum {
using Nan::FunctionCallbackInfo;
using Nan::TypedArrayContents;
using Nan::ThrowTypeError;
using Nan::ThrowError;
using v8::Number;
using v8::Local;
using v8::Value;
void node_dasum( const FunctionCallbackInfo<Value>& info ) {
if ( info.Length() != 3 ) {
ThrowError( "invalid invocation. Must provide 3 arguments." );
return;
}
if ( !info[ 0 ]->IsNumber() ) {
ThrowTypeError( "invalid input argument. First argument must be a number." );
return;
}
if ( !info[ 2 ]->IsNumber() ) {
ThrowTypeError( "invalid input argument. Third argument must be a number." );
return;
}
const int N = info[ 0 ]->Uint32Value();
const int stride = info[ 2 ]->Uint32Value();
TypedArrayContents<double> X( info[ 1 ] );
Local<Number> sum = Nan::New( c_dasum( N, *X, stride ) );
info.GetReturnValue().Set( sum );
}
NAN_MODULE_INIT( Init ) {
Nan::Export( target, "dasum", node_dasum );
}
NODE_MODULE( addon, Init )
}
Now that we have our C interface, we can create our add-on wrapper.
Similar to before, we use NAN.
And similar to before, we perform some basic input argument sanity checks before unwrapping input values. One thing to note is that we need to re-purpose the underlying TypedArray buffer as a C vector. This can be a relatively expensive operation, especially for small vectors.
Once we have unwrapped our input arguments, we pass them to our C function and set the return value.
$ gfortran \
-std=f95 \
-ffree-form \
-O3 \
-Wall \
-Wextra \
-Wimplicit-interface \
-fno-underscoring \
-pedantic \
-fPIC \
-c \
-o dasum.o \
dasum.f
$ gfortran \
-std=f95 \
-ffree-form \
-O3 \
-Wall \
-Wextra \
-Wimplicit-interface \
-fno-underscoring \
-pedantic \
-fPIC \
-c \
-o dasumsub.o \
dasumsub.f
$ gcc \
-std=c99 \
-O3 \
-Wall \
-pedantic \
-fPIC \
-I ../include \
-c \
-o dasum_f.o \
dasum_f.c
$ gcc -shared -o libdasum.so dasum.o dasumsub.o dasum_f.o -lgfortran
Compiling our add-on is not as straightforward as before. Recall that I mentioned that GYP is oriented toward C/C++, and, here, we have to compile Fortran. Accordingly, we'll need to teach GYP how to compile Fortran, and our configuration will become considerably more complex.
Forgetting the add-on for a second, if we were going to compile just the C and Fortran, we would do something like the following.
First, we would need to compile our Fortran files, specifying various command-line options.
Next, we would compile our C files, once again specifying various command-line options.
After compiling our source files, we would link them together into a single library, making sure to include the standard Fortran libraries.
To compile our add-on, we will need to translate this sequence, or something similar, to a GYP configuration file.
# binding.gyp
{
'variables': {
'addon_target_name%': 'addon',
'addon_output_dir': './src',
'fortran_compiler%': 'gfortran',
'fflags': [
'-std=f95',
'-ffree-form',
'-O3',
'-Wall',
'-Wextra',
'-Wimplicit-interface',
'-fno-underscoring',
'-pedantic',
'-c',
],
'conditions': [
[
'OS=="win"',
{
'obj': 'obj',
},
{
'obj': 'o',
}
],
],
},
We begin by defining variables.
While GYP automatically sets C/C++ compiler flags, we must explicitly list Fortran compiler flags and explicitly define the Fortran compiler we want to use.
# binding.gyp (cont.)
'targets': [
{
'target_name': '<(addon_target_name)',
'dependencies': [],
'include_dirs': [
'<!(node -e "require(\'nan\')")',
'../include',
],
'sources': [
'dasum.f',
'dasumsub.f',
'dasum_f.c',
'addon.cpp'
],
'link_settings': {
'libraries': [
'-lgfortran',
],
'library_dirs': [],
},
'cflags': [
'-Wall',
'-O3',
],
'cflags_c': [
'-std=c99',
],
'cflags_cc': [
'-std=c++11',
],
'ldflags': [],
'conditions': [
[
'OS=="mac"',
{
'ldflags': [
'-undefined dynamic_lookup',
'-Wl,-no-pie',
'-Wl,-search_paths_first',
],
},
],
[
'OS!="win"',
{
'cflags': [
'-fPIC',
],
},
],
],
After defining variables, we can begin defining targets.
Similar to before, we define the target name, this time using variable expansion, and list the source files to compile.
We then define various command-line flags depending on the host platform.
# binding.gyp (cont.)
'rules': [
{
'extension': 'f',
'inputs': [
'<(RULE_INPUT_PATH)'
],
'outputs': [
'<(INTERMEDIATE_DIR)/<(RULE_INPUT_ROOT).<(obj)'
],
'conditions': [
[
'OS=="win"',
{
'rule_name': 'compile_fortran_windows',
'process_outputs_as_sources': 0,
'action': [
'<(fortran_compiler)',
'<@(fflags)',
'<@(_inputs)',
'-o',
'<@(_outputs)',
],
},
{
'rule_name': 'compile_fortran_linux',
'process_outputs_as_sources': 1,
'action': [
'<(fortran_compiler)',
'<@(fflags)',
'-fPIC',
'<@(_inputs)',
'-o',
'<@(_outputs)',
],
}
],
],
},
],
},
In order to compile the Fortran files, we have to tell GYP how to process them, and we do so by defining a rule which is triggered based on a file's filename extension.
We explicitly specify the input and output arguments which will be used in command execution using GYP defined variables.
Next, we define the action to take (i.e., the compile command to invoke) based on the target platform.
Now, when GYP creates the add-on target, it will compile Fortran files using the specified Fortran compiler and flags.
# binding.gyp (cont.)
{
'target_name': 'copy_addon',
'type': 'none',
'dependencies': [
'<(addon_target_name)',
],
'actions': [
{
'action_name': 'copy_addon',
'inputs': [],
'outputs': [
'<(addon_output_dir)/<(addon_target_name).node',
],
'action': [
'cp',
'<(PRODUCT_DIR)/<(addon_target_name).node',
'<(addon_output_dir)/<(addon_target_name).node',
],
},
],
},
],
}
Lastly, we add one more target to our binding.gyp file; the purpose of this target is to move the compiled add-on to a standard location.
The main takeaway here is that GYP supports target dependencies. Here, the copy_addon target will not run until after the add-on has been compiled. For those familiar with make, this is similar to Makefile prerequisites.
$ cd path/to/dasum
$ node-gyp configure
# node-gyp configure --msvs_version=2015
$ node-gyp build
Similar to hypot, to build the add-on, we navigate to the add-on directory containing the binding.gyp file, generate the build files using the configure subcommand, and run build to compile the add-on.
/* dasum.js */
var dasum = require( './path/to/src/addon.node' ).dasum;
var x = new Float64Array( [ 1.0, -2.0, 3.0, -4.0, 5.0 ] );
var s = dasum( x.length, x, 1 );
// returns 15.0
To use the add-on, we require the add-on and invoke the exported method.
Length      JavaScript    Native       Perf
10          22,438,020    7,435,590    0.33x
100         4,350,384     4,594,292    1.05x
1,000       481,417       827,513      1.71x
10,000      28,186        97,695       3.46x
100,000     1,617         9,471        5.85x
1,000,000   153           873          5.7x
To measure add-on performance, we benchmark against an equivalent implementation written in plain JavaScript. Each row in the table corresponds to an input array length. The two middle columns correspond to operations per second. And the last column is the relative performance of the native add-on to the JavaScript implementation.
As we can see, for small arrays, JavaScript is significantly faster, but that advantage disappears as soon as an input array has 100 elements.
As I mentioned earlier, array unwrapping and reinterpretation as a C vector can have a significant impact on performance for small arrays. However, that cost is largely constant, becoming negligible as array length increases.
For large input arrays, the add-on is significantly more performant, nearly 6 times more performant than the equivalent JavaScript implementation.
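For context, the plain JavaScript implementation used in this comparison would resemble the following sketch (not necessarily the exact benchmark code):

/* dasum.js (plain JavaScript implementation) */
function dasum( N, x, stride ) {
    var sum;
    var ix;
    var i;

    sum = 0.0;
    if ( N <= 0 || stride <= 0 ) {
        return sum;
    }
    ix = 0;
    // Accumulate the sum of absolute values, stepping through the array by `stride`:
    for ( i = 0; i < N; i++ ) {
        sum += ( x[ ix ] < 0.0 ) ? -x[ ix ] : x[ ix ];
        ix += stride;
    }
    return sum;
}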
/* dasum_cblas.h */
#ifndef DASUM_CBLAS_H
#define DASUM_CBLAS_H
#ifdef __cplusplus
extern "C" {
#endif
double cblas_dasum( const int N, const double *X, const int stride );
#ifdef __cplusplus
}
#endif
#endif
Our BLAS journey is not, however, over. The Fortran reference implementation does not take into account hardware capabilities or chip architecture and, thus, is not the most performant.
For optimal performance, we would rather use hardware optimized BLAS libraries, if available. For instance, on MacOS, we could use the Apple Accelerate Framework. On Intel chips, we could use Intel's Math Kernel Library (MKL). For a cross-platform hardware optimized library, we could use OpenBLAS.
As an example, if we wanted to use the Apple Accelerate Framework, we could proceed as follows.
First, we need to create a header file defining the prototype of the function we want to use. The function signature is the same as before, but now we are using the CBLAS naming convention.
/* dasum_cblas.c */
#include "dasum.h"
#include "dasum_cblas.h"
double c_dasum( const int N, const double *X, const int stride ) {
return cblas_dasum( N, X, stride );
}
Next, to avoid having to create multiple add-on files, we create a wrapper having the same name, c_dasum, as before.
# binding.gyp
{
'variables': {
'addon_target_name%': 'addon',
'addon_output_dir': './src',
},
'targets': [
{
'target_name': '<(addon_target_name)',
'dependencies': [],
'include_dirs': [
'<!(node -e "require(\'nan\')")',
'./../include',
],
'sources': [
'dasum_cblas.c',
'addon.cpp'
],
'link_settings': {
'libraries': [
'-lblas',
],
'library_dirs': [],
},
'cflags': [
'-Wall',
'-O3',
],
'cflags_c': [
'-std=c99',
],
'cflags_cc': [
'-std=c++11',
],
'ldflags': [
'-undefined dynamic_lookup',
'-Wl,-no-pie',
'-Wl,-search_paths_first'
],
},
{
'target_name': 'copy_addon',
'type': 'none',
'dependencies': [
'<(addon_target_name)',
],
'actions': [
{
'action_name': 'copy_addon',
'inputs': [],
'outputs': [
'<(addon_output_dir)/<(addon_target_name).node',
],
'action': [
'cp',
'<(PRODUCT_DIR)/<(addon_target_name).node',
'<(addon_output_dir)/<(addon_target_name).node',
],
},
],
},
],
}
We can modify the binding.gyp file to no longer include configuration settings and rules for compiling Fortran files.
Instead, we specify the library we want to link to and update the source file list.
Building and compiling the add-on follows the same procedure as before.
Length      JavaScript    wasm          Native       Perf
10          22,438,020    18,226,375    7,084,870    0.31x
100         4,350,384     6,428,586     6,428,626    1.47x
1,000       481,417       997,234       3,289,090    6.83x
10,000      28,186        110,540       355,172      12.60x
100,000     1,617         11,157        30,058       18.58x
1,000,000   153           979           1,850        12.09x
When we benchmark the hardware optimized BLAS libraries against equivalent implementations in JavaScript, we get the following results.
As with the reference implementation, the add-on is slower for short array lengths.
However, as we increase the array length, the add-on achieves significantly better performance, even for an array length of 100, and it also outperforms the reference implementation add-on.
Note that I have also included WebAssembly benchmarks. For those hoping that WebAssembly will remove the need for native add-ons and provide equivalent performance, you are mistaken.
The main conclusion of these results is to use a hardware optimized library when available. These results are simply not possible otherwise.
Challenges
Bugs
Standards
Proprietary
Windows
Portability
Complexity
At this point, you may be excited seeing a nearly 20x improvement. One small problem, however: detecting and/or installing hardware optimized libraries is hard.
The first problem is that some hardware optimized libraries contain bugs, so you need to provide patches; e.g., Apple Accelerate Framework.
Next, resolving library installation locations in a robust cross-platform way is difficult, as no standard locations or naming conventions exist.
Third, some hardware optimized libraries are proprietary and cannot be guaranteed to exist on a target platform.
Fourth, hardware optimized BLAS on Windows is especially painful. And in fact, in general, Fortran BLAS is painful on Windows, and node-gyp cannot compile Fortran on Windows due to node-gyp's dependency on Microsoft Visual Studio, which does not include a Fortran compiler.
Fifth, while OpenBLAS is close, there is no fully robust and fully cross-platform hardware optimized BLAS library that you can install alongside your add-on.
...which means that you always need to ship a reference implementation fallback, and, for those environments where you cannot compile your native add-on, you also need to ship a pure JavaScript fallback.
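A common way to provide these fallbacks is to attempt loading the compiled add-on and, on failure, fall back to the JavaScript implementation. The following is a sketch; the paths are illustrative:

/* index.js */
var dasum;
try {
    // Prefer the compiled add-on when available:
    dasum = require( './src/addon.node' ).dasum;
} catch ( err ) {
    // Otherwise, fall back to a pure JavaScript implementation:
    dasum = require( './lib/dasum.js' );
}
module.exports = dasum;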
In short, to handle cross-platform complexity, your binding.gyp files become complex very quickly.
There is one other issue, one perhaps a bit peculiar to the world of JavaScript. And that issue is modularity.
Modularity
In short, issues arise when you want to use source files from other packages, similar to "requiring" a module dependency.
For example, in stdlib, we have BLAS implementations, which are often used in other BLAS implementations. In a single library like BLAS, resolving individual implementations is straightforward, as dependencies often reside in the same directory. In our case, dependencies are not co-located, and, in fact, in the general case, we cannot assume dependency locations due to variability in the package dependency tree (e.g., a dependency could be a sibling or a descendant or even reside in a global package directory).
And further, dependency source files can change based on the environment (e.g., whether a system library exists, or a third party library, or a reference implementation fallback, or lack of a Fortran compiler, etc.).
If we were only concerned with BLAS, one solution to this problem might be to simply include all of BLAS with each individual package and only expose the desired functionality. After all, a linker is the original tree shaker.
Just two problems. First, we don't want to have to download all of BLAS in order to build a package exporting a single function. And we certainly don't want to ship all of BLAS with each individual package, as that could lead to a massive amount of duplicated code being sent over the network.
Second, this solution does not scale. If your add-on depends on 10 functions, each from a different monolithic library, then you would have to download and install 10 different monolithic libraries for each build (intelligent caching aside).
Some might argue that this is an argument against hypermodularity and in favor of kitchen-sink type libraries.
This retort is, however, incorrect. The principle of modularity is an integral part of good software: only ship what you need when you need it, nothing more, end of story.
In this case, what is needed is a set of tools to help us better think about modularity in the context of native add-ons.
{
"options": {
"os": "linux",
"blas": "",
"wasm": false
},
"fields": [
{
"field": "src",
"resolve": true,
"relative": true
},
{
"field": "include",
"resolve": true,
"relative": true
},
{
"field": "libraries",
"resolve": false,
"relative": false
},
{
"field": "libpath",
"resolve": true,
"relative": false
}
],
"confs": [
{
"os": "linux",
"blas": "",
"wasm": false,
"src": [
"./src/dasum.f",
"./src/dasumsub.f",
"./src/dasum_f.c"
],
"include": [
"./include"
],
"libraries": [],
"libpath": [],
"dependencies": []
},
{
"os": "linux",
"blas": "openblas",
"wasm": false,
"src": [
"./src/dasum_cblas.c"
],
"include": [
"./include"
],
"libraries": [
"-lopenblas",
"-lpthread"
],
"libpath": [],
"dependencies": []
},
{
"os": "mac",
"blas": "",
"wasm": false,
"src": [
"./src/dasum.f",
"./src/dasumsub.f",
"./src/dasum_f.c"
],
"include": [
"./include"
],
"libraries": [],
"libpath": [],
"dependencies": []
},
{
"os": "mac",
"blas": "apple_accelerate_framework",
"wasm": false,
"src": [
"./src/dasum_cblas.c"
],
"include": [
"./include"
],
"libraries": [
"-lblas"
],
"libpath": [],
"dependencies": []
},
{
"os": "mac",
"blas": "openblas",
"wasm": false,
"src": [
"./src/dasum_cblas.c"
],
"include": [
"./include"
],
"libraries": [
"-lopenblas",
"-lpthread"
],
"libpath": [],
"dependencies": []
},
{
"os": "win",
"blas": "",
"wasm": false,
"src": [
"./src/dasum.c"
],
"include": [
"./include"
],
"libraries": [],
"libpath": [],
"dependencies": []
},
{
"os": "",
"blas": "",
"wasm": true,
"src": [
"./src/dasum.c"
],
"include": [
"./include"
],
"libraries": [],
"libpath": [],
"dependencies": []
}
]
}
To address this challenge in stdlib, we leverage the same algorithm used by `require` to resolve dependencies.
Namely, we create a manifest.json file for each package which lists source files based on environment conditions, as well as any package dependencies containing source files we want to use.
When compiling a package, we load the manifest.json, walk the dependency tree, resolve source files tailored to both configuration and environment, and then dynamically populate GYP variables before compiling add-ons.
This allows us to decompose traditionally monolithic libraries into separate components, while maintaining dependency resolution.
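To make this concrete, selecting a configuration from such a manifest might look conceptually like the following. This is a simplified sketch, not stdlib's actual tooling; the helper name and options are assumptions:

/* resolve.js (illustrative) */
var manifest = require( './manifest.json' );

// Return the first configuration matching the provided environment options:
function resolve( opts ) {
    var confs = manifest.confs;
    var i;
    for ( i = 0; i < confs.length; i++ ) {
        if (
            confs[ i ].os === opts.os &&
            confs[ i ].blas === opts.blas &&
            confs[ i ].wasm === opts.wasm
        ) {
            return confs[ i ];
        }
    }
    return null;
}

// e.g., resolve source files and libraries for Linux with OpenBLAS:
var conf = resolve({ 'os': 'linux', 'blas': 'openblas', 'wasm': false });
// conf.src => [ './src/dasum_cblas.c' ]
// conf.libraries => [ '-lopenblas', '-lpthread' ]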
I should note that the approach outlined is applicable more generally to all native add-ons, including those outside of stdlib.
If a third party add-on were to include a `manifest.json` which advertised source files, a stdlib add-on would be able to use the functionality contained therein in its implementation.
I should also mention, the approach I just outlined is not wholly a new idea--several people have tried their hands at building a C/C++ package manager, often inspired by npm--but I have yet to see an approach which allows explicitly resolving add-on dependencies within a node_modules dependency tree.
If you are interested in learning more about how we do things, see stdlib.
N-API
A Node API for Node.js native add-ons.
Features
Stability
Compatibility
VM Neutrality
A stable API abstraction. Similar in its goals to NAN.
Compatibility across Node versions.
Key differentiator: same API across Node VMs; e.g., V8, Chakra, etc.
In short, N-API promises native add-ons which simply work. :)
/* addon.cpp */
#include <node_api.h>
#include <assert.h>
#include "hypot.h"
namespace addon_hypot {
napi_value node_hypot( napi_env env, napi_callback_info info ) {
napi_status status;
size_t argc = 2;
napi_value args[ 2 ];
status = napi_get_cb_info( env, info, &argc, args, nullptr, nullptr );
assert( status == napi_ok );
if ( argc < 2 ) {
napi_throw_type_error( env, "invalid invocation. Must provide 2 arguments." );
return nullptr;
}
napi_valuetype vtype0;
status = napi_typeof( env, args[ 0 ], &vtype0 );
assert( status == napi_ok );
if ( vtype0 != napi_number ) {
napi_throw_type_error( env, "invalid input argument. First argument must be a number." );
return nullptr;
}
napi_valuetype vtype1;
status = napi_typeof( env, args[ 1 ], &vtype1 );
assert( status == napi_ok );
if ( vtype1 != napi_number ) {
napi_throw_type_error( env, "invalid input argument. Second argument must be a number." );
return nullptr;
}
double x;
status = napi_get_value_double( env, args[ 0 ], &x );
assert( status == napi_ok );
double y;
status = napi_get_value_double( env, args[ 1 ], &y );
assert( status == napi_ok );
napi_value h;
status = napi_create_number( env, c_hypot( x, y ), &h );
assert( status == napi_ok );
return h;
}
#define DECLARE_NAPI_METHOD( name, func ) { name, 0, func, 0, 0, 0, napi_default, 0 }
void Init( napi_env env, napi_value exports, napi_value module, void* priv ) {
napi_status status;
napi_property_descriptor addDescriptor = DECLARE_NAPI_METHOD( "hypot", node_hypot );
status = napi_define_properties( env, exports, 1, &addDescriptor );
assert( status == napi_ok );
}
NAPI_MODULE( addon, Init )
}
As an example of what an N-API add-on might look like, and I say might because the implementation is still experimental, here is the hypot add-on refactored from NAN to N-API.
The first notable difference is that we no longer directly call V8 methods, and, instead, everything goes through N-API.
The second notable difference is the usage of return value references and the returning of status values.
Otherwise, we still need to export an initialization function and an add-on still follows the same general structure.
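Importantly, from the consumer's perspective nothing changes: the refactored add-on is loaded and invoked exactly as before. A sketch, with an illustrative build path:

/* hypot.js */
// The N-API add-on is consumed exactly like the NAN version:
var hypot = require( './path/to/build/Release/addon.node' ).hypot;

var h = hypot( 5.0, 12.0 );
// returns 13.0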
Conclusions
Parity
Performance
Progress
Intentionally left blank.