Node.js Add-ons for High Performance Numeric Computing 
				
			 
			
			
			
				Survey 
				
					
						Who here has heard of Node.js native add-ons? 
						Anyone here hearing about Node.js native add-ons for the first time? 
						Who here has written a native add-on? 
						For those who have written native add-ons, why have you done so? 
						How has your experience been writing native add-ons? Negative? Positive? Somewhere in between? 
						Who here has used a Node.js native add-on for numeric computing? 
					 
				 
			 
			
				Overview 
				
					
					
						Intro 
						Toolchain 
						Numeric Computing 
						Basic Example 
						BLAS 
						Performance 
						Challenges 
						N-API 
						Conclusions 
					 
				 
				
					
This talk will be technical and contains many slides displaying source code. I won't spend much time on those slides, only pausing long enough to highlight key points. If I move too quickly, this talk is online with notes, so you can revisit it during and after the conference.
					
					
						The talk overview is as follows...
					
					
   						
   							First, I will provide an overview of Node.js native add-ons.
   						 
   						
   							Next, I will introduce the current toolchain for authoring add-ons.
   						 
   						
   							Then, I will discuss why native add-ons are important for numeric computing.
   						 
   						
  							I'll follow by showing a basic native add-on example.
  						 
   						
   							After the basic example, I'll show a more complex example where we need to write an add-on which links a BLAS library written in Fortran to the JavaScript runtime.
   						 
   						
   							Next, I will show performance comparisons.
   						 
   						
   							Then, I'll discuss some of the challenges we have faced writing native add-ons for numeric computing and how we have addressed them.
   						 
   						
   							Before concluding, I will mention N-API, an application binary interface, or ABI, which will provide a stable abstraction layer over JavaScript engines.
   						 
   						
   							And finally, I will offer some conclusions and additional resources you can use to get started using Node.js native add-ons for high-performance numeric computing.
   						 
   					 
				 
			 
			
			
			
			
				
					Interface between JS running in Node.js and C/C++ libraries
				
				
					
						A Node.js native add-on provides an interface between JavaScript running in Node.js and, primarily, C/C++ libraries.
					
					
						From the perspective of Node.js applications, an add-on is just another module which an application can require.
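For example, loading a compiled add-on looks just like loading any other module. (A sketch; the build path is the typical node-gyp output location and is hypothetical here.)

/* A sketch; the path to the compiled binary is hypothetical. */
var addon = require( './build/Release/addon.node' );
console.log( typeof addon );
// => 'object'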
					
				 
			 
			
			
				APIs 
				
					
					
						V8 
						libuv 
						Internal Libraries 
						Dependencies 
					 
				 
				
			 
			
			
			
			
				Why? 
				
					Why would you choose a native add-on over plain JavaScript?
				 
			 
			
				
					
					
						Leverage existing codebases 
						Access lower-level APIs 
						Non-JavaScript features 
						Performance 
					 
				 
				
					
						Four primary reasons...
					
					
						One reason is that you want to link to existing C/C++ libraries. This allows you to avoid having to port and re-implement functionality in JavaScript, which, for larger libraries, can require significant time and investment.
					
					
     					Next, you may want to access lower-level APIs, such as worker threads.
     				
     				
     					Third, you may need language features not available in JavaScript, such as 64-bit integers or SIMD.
     				
Last, you may need a performance boost, including the ability to leverage hardware optimizations.
     				
				 
			 
			
				
					
						Now that we have motivated why we may want to use add-ons, how do we go about doing so?
					
				 
			 
			
				Toolchain 
				
					This brings us to the native add-on toolchain.
				 
			 
			
			
				node-gyp 
				
					
						We begin with node-gyp, which is a cross-platform command-line tool written primarily in JavaScript and is used for compiling native add-ons for Node.js.
					
					
						node-gyp bundles GYP and automatically downloads necessary development files and headers for target Node.js versions.
					
				 
			 
			
				GYP 
				
					
GYP, which stands for "Generate Your Projects", is a meta-build system: a build system which generates other build systems, depending on the target platform.
     				
     				
The aim of GYP is to replicate, as closely as possible, the native build setup of a target platform's IDE. For example, on macOS, that means generating Xcode projects. On Windows, Visual Studio projects.
     				
     				
						And once GYP generates the build system, we can compile our add-on.
     				
				 
			 
			
				
					Historically, developing native add-ons has been a difficult process.
				 
			 
			
				Challenges 
				
				
					
Here are some of the challenges.
					
					
						The foremost challenge has been handling breaking changes in V8.
					
    				
Each Node.js major release entailed a new version of V8. In the past, the V8 team was not concerned with backward compatibility and would often introduce significant changes, removing, replacing, and adding interfaces and functionality. These changes forced add-on authors to rewrite their packages and publish new major versions, and made providing backward compatibility extremely difficult.
    				
					
To alleviate some of the "pain" of native add-ons, members of the Node.js community created a package called NAN, which stands for Native Abstractions for Node.js.
     				
     				
     					NAN attempts to provide a stable abstraction layer that native add-on authors can target. Internally, NAN handles the complex logic required to maintain functionality from one V8 version to the next.
     				
     				
     					And while NAN has been beneficial, even it has had breaking changes in its API.
     				
					
    					Another issue is GYP. GYP was designed with a particular use case in mind: Chrome. It was not explicitly designed for Node.js add-ons.
    				
    				
    					Further, GYP documentation is either poor or incomplete, presenting significant challenges whenever you want to do something beyond simple "hello world" type examples.
    				
    				
Because of poor documentation, you spend considerable time searching the Internet and looking at other projects using GYP to see how they handle special configurations. And anytime you want to use GYP to compile languages other than C/C++ (e.g., Fortran, CUDA, Rust, and Go), good luck.
    				
    				
    					Resources are few and far between.
    				
					
A more forward-looking concern is that node-gyp is biased toward V8; the toolchain is not engine neutral. This means compiling Node.js and Node.js native add-ons with alternative engines, such as Chakra, is less straightforward, requiring shims like chakrashim.
					
				 
			 
			
				
					Despite these challenges, native add-ons are highly important for numeric computing.
				 
			 
			
				Numeric Computing 
				
					
						Native add-ons are important for numeric computing because they allow us to interface with high-performance numeric computing libraries written in Fortran/C/C++.
					
   					
When reading the source code of Julia, R, and Python libraries like NumPy and SciPy, what you find is that a substantial amount of the functionality they expose consists of wrappers around existing numeric computing codebases written in C/C++ and Fortran.
   					
   					
   						For example, for high-performance linear algebra, these platforms wrap BLAS and LAPACK. For fast Fourier transforms, they wrap FFTW. For BigInt, Julia wraps GMP. For BigFloat, Julia wraps MPFR.
   					
   					
   						Node.js native add-ons allow us to do something similar; namely, we can expose high-performance numeric computing functionality to Node.js and to JavaScript.
   					
   					
   						This means we can leverage highly optimized libraries which have been used with great success for decades and not spend time rewriting implementations.
   					
   					
						In summary, native add-ons allow us to do in Node.js what other environments used for numeric computing can do.
					
				 
			 
			
				
At this point, we have discussed, at a high level, what native add-ons are, their toolchain, and some of their challenges, and motivated why they are important for numeric computing. Let's now discuss a basic example...
				 
			 
			
			
				
/* hypot.h */
#ifndef C_HYPOT_H
#define C_HYPOT_H
#ifdef __cplusplus
extern "C" {
#endif
double c_hypot( const double x, const double y );
#ifdef __cplusplus
}
#endif
#endif
				 
				
					
						The example I am going to use is a simple function to compute the hypotenuse, avoiding underflow and overflow.
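To see why a naive evaluation is problematic, consider the following sketch: squaring large (or tiny) values overflows (or underflows) in double-precision floating-point, even when the true result is representable. Rescaling by the larger magnitude, as the implementation on the next slide does, avoids this.

/* naive vs. rescaled (sketch) */
var x = 1.0e200;
var y = 1.0e200;

// Naive evaluation overflows, as x*x is not representable:
Math.sqrt( (x*x) + (y*y) );
// => Infinity

// Rescaling by the larger magnitude stays in range:
var a = x;     // max( |x|, |y| )
var b = y / a; // => 1.0
a * Math.sqrt( 1.0 + (b*b) );
// => ~1.4142e+200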
					
					
We first define a basic header file declaring the interface of the function exported to the JavaScript runtime, taking care to guard against C++ name mangling so that the exported symbol behaves as it would when compiled with a standard C compiler.
					
				 
			 
			
				
/* hypot.c */
#include <math.h>
#include "hypot.h"
double c_hypot( const double x, const double y ) {
    double tmp;
    double a;
    double b;
    if ( isnan( x ) || isnan( y ) ) {
        return NAN;
    }
    if ( isinf( x ) || isinf( y ) ) {
        return INFINITY;
    }
    a = x;
    b = y;
    if ( a < 0.0 ) {
        a = -a;
    }
    if ( b < 0.0 ) {
        b = -b;
    }
    if ( a < b ) {
        tmp = b;
        b = a;
        a = tmp;
    }
    if ( a == 0.0 ) {
        return 0.0;
    }
    b /= a;
    return a * sqrt( 1.0 + (b*b) );
}
				 
				
					
Next, we write our implementation. This is a standard C implementation which includes the standard math header and defines a function which accepts two arguments, x and y, and returns a numeric result.
					
				 
			 
			
				
/* addon.cpp */
#include <nan.h>
#include "hypot.h"
namespace addon_hypot {
    using Nan::FunctionCallbackInfo;
    using Nan::ThrowTypeError;
    using Nan::ThrowError;
    using v8::Number;
    using v8::Local;
    using v8::Value;
    void node_hypot( const FunctionCallbackInfo<Value>& info ) {
        if ( info.Length() != 2 ) {
            ThrowError( "invalid invocation. Must provide 2 arguments." );
            return;
        }
        if ( !info[ 0 ]->IsNumber() ) {
            ThrowTypeError( "invalid input argument. First argument must be a number." );
            return;
        }
        if ( !info[ 1 ]->IsNumber() ) {
            ThrowTypeError( "invalid input argument. Second argument must be a number." );
            return;
        }
        const double x = info[ 0 ]->NumberValue();
        const double y = info[ 1 ]->NumberValue();
        Local<Number> h = Nan::New( c_hypot( x, y ) );
        info.GetReturnValue().Set( h );
    }
    NAN_MODULE_INIT( Init ) {
        Nan::Export( target, "hypot", node_hypot );
    }
    NODE_MODULE( addon, Init )
}
				 
				
					
						Once our implementation is finished, we create a wrapper written in C++ which calls the C function.
					
					
						Note the inclusion of NAN and recall that NAN provides a stable API across V8 versions.
					
					
						Most of the C++ is unwrapping and wrapping object values. The function wrapper takes a single argument: the arguments object. We perform basic input value sanity checks and then proceed to unwrap the individual arguments x and y. Once we have x and y, we call our C function and set the return value.
					
					
						And finally, we end by exporting an initialization function, which is required of all Node.js native add-ons.
					
					
I should note that we did not have to write the implementation in a separate C file. We could have written it directly in our add-on file; however, using a separate file a) facilitates re-use of source files in non-add-on contexts and b) better reflects the common scenario of working with existing codebases.
					
				 
			 
			
				
# binding.gyp
{
  'targets': [
    {
      'target_name': 'addon',
      'sources': [
        'addon.cpp',
        'hypot.c'
      ],
      'include_dirs': [
        '<!(node -e "require(\'nan\')")',
        './'
      ]
    }
  ]
}
				 
				
					
						We then create a minimal binding.gyp  file, which GYP uses to generate the build project for the target platform.
					
					
						In GYP terminology, we define a target--here, the target name is "addon".
					
					
						The sources  field indicates each source file which should be compiled.
					
					
						And finally, we list the directories which contain header files. Here, we use GYP command expansion to instruct Node to evaluate the string containing the require statement. When NAN is required, it prints the location of its header files to stdout, thus allowing us to dynamically resolve the NAN header file location.
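To illustrate what that expansion evaluates, here is a sketch which paraphrases the nan package's entry point (not the exact nan source): requiring nan prints the directory containing nan.h, and GYP captures that output.

/* sketch: roughly what `node -e "require('nan')"` does */
var path = require( 'path' );

// The nan package's main module prints the path to the directory
// containing its headers (paraphrased for illustration):
console.log( path.relative( '.', path.dirname( require.resolve( 'nan' ) ) ) );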
					
				 
			 
			
				
# Navigate to the add-on directory containing binding.gyp:
$ cd path/to/hypot
# Generate build files:
$ node-gyp configure
# On Windows:
# node-gyp configure --msvs_version=2015
# Build add-on:
$ node-gyp build
				 
				
					
						Once we have created the four files, we can now build our add-on.
					
					
						To do so, we navigate to the add-on directory containing the binding.gyp file.
					
					
Then we generate the build files using the configure subcommand. On Unix platforms, this will be a Makefile. On Windows, this will be a vcxproj file.
					
					
						Finally, we run build  to actually compile the add-on.
					
				 
			 
			
				
/* hypot.js */
var hypot = require( './path/to/build/Release/addon.node' ).hypot;
var h = hypot( 5.0, 12.0 );
// returns 13.0
				 
				
					
						After compiling an add-on, we can use the exported function in JavaScript.
					
					
						As may be observed in the require statement, the add-on exports an object with a method, whose name matches the export we defined when we wrote our addon.cpp file.
					
					
						When invoked, this function behaves just like a comparable function implemented in JavaScript, accepting numeric arguments and returning a number.
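And because the wrapper validates its inputs, invalid invocations throw, just as an idiomatic JavaScript implementation might. (A sketch; the path is hypothetical.)

/* sketch: input validation behaves as in plain JavaScript */
var hypot = require( './path/to/build/Release/addon.node' ).hypot;
hypot( '5', 12.0 );
// throws TypeError: invalid input argument. First argument must be a number.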
					
				 
			 
			
				
					
						
							 
              ops/sec     perf
Builtin       3,954,799   1x
Native        4,732,108   1.2x
JavaScript    7,337,790   1.85x
						 
					 
				
				
					
						Okay, so we have implemented our add-on and we're ready to use it, but we should ask ourselves: how does the performance compare to an implementation written purely in JavaScript?
					
					
Here are benchmark results run on my laptop running Node.js version 8, which uses a recent version of V8.
					
					
						In the first row, I am showing the results for the builtin hypot function provided by the JavaScript standard math library. We see that, on my machine, we can compute the hypotenuse around 4 million times per second.
					
					
						On the next row, I am showing the results of our native add-on. We see that we get a slight performance boost of around 800,000 operations per second.
					
					
Lastly, on the third row, I am showing the results of an equivalent implementation written purely in JavaScript. We can see that, compared to the add-on, we can perform roughly 2.6 million more operations per second, which is a significant performance boost.
					
					
Two comments. First, simply because we can write code in C does not mean we will achieve better performance by doing so: calling into an add-on incurs overhead. Second, simply because something is built into the standard library does not mean the function is fast in an absolute sense. You can often achieve better performance in JavaScript via userland implementations by restricting the domain of input argument types and choosing your algorithms wisely.
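For reference, here is a sketch of the kind of plain JavaScript implementation benchmarked above. It mirrors the C implementation shown earlier; it is not the exact source used for the benchmark.

/* sketch: plain JavaScript hypot mirroring the C implementation */
function hypot( x, y ) {
    var tmp;
    var a;
    var b;
    if ( isNaN( x ) || isNaN( y ) ) {
        return NaN;
    }
    if ( !isFinite( x ) || !isFinite( y ) ) {
        return Infinity;
    }
    a = ( x < 0.0 ) ? -x : x;
    b = ( y < 0.0 ) ? -y : y;
    if ( a < b ) {
        tmp = b;
        b = a;
        a = tmp;
    }
    if ( a === 0.0 ) {
        return 0.0;
    }
    b /= a;
    return a * Math.sqrt( 1.0 + (b*b) );
}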
					
				 
			 
			
			
				
				
					
BLAS, or Basic Linear Algebra Subprograms, are routines that provide standard implementations for performing basic vector and matrix operations.
					
					
						BLAS routines are split into three levels. Level 1 BLAS routines perform scalar, vector and vector-vector operations. Level 2 BLAS routines perform matrix-vector operations. Level 3 BLAS routines perform matrix-matrix operations.
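To make Level 1 concrete, here is a simplified JavaScript sketch of a daxpy-style routine (y := alpha*x + y). Real BLAS interfaces are Fortran/C and also accept stride arguments; unit strides are assumed here for brevity.

/* sketch: simplified Level 1 BLAS-style routine (daxpy) */
function daxpy( N, alpha, x, y ) {
    for ( var i = 0; i < N; i++ ) {
        y[ i ] += alpha * x[ i ];
    }
    return y;
}

var x = new Float64Array( [ 1.0, 2.0, 3.0 ] );
var y = new Float64Array( [ 1.0, 1.0, 1.0 ] );
daxpy( x.length, 5.0, x, y );
// y => [ 6.0, 11.0, 16.0 ]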
					
					
						BLAS routines are commonly used in the development of high quality linear algebra software (e.g., LAPACK) due to their efficiency, portability, and wide availability. And they are foundational for most modern numeric computing environments.
					
				 
			 
			
				
! dasum.f
! Computes the sum of absolute values.
double precision function dasum( N, dx, stride )
  implicit none
  integer :: stride, N
  double precision :: dx(*)
  double precision :: dtemp
  integer :: nstride, mp1, m, i
  intrinsic dabs, mod
  ! ..
  dasum = 0.0d0
  dtemp = 0.0d0
  ! ..
  if ( N <= 0 .OR. stride <= 0 ) then
    return
  end if
  ! ..
  if ( stride == 1 ) then
    m = mod( N, 6 )
    if ( m /= 0 ) then
      do i = 1, m
        dtemp = dtemp + dabs( dx( i ) )
      end do
      if ( N < 6 ) then
        dasum = dtemp
        return
      end if
    end if
    mp1 = m + 1
    do i = mp1, N, 6
      dtemp = dtemp + &
        dabs( dx( i ) ) + dabs( dx( i+1 ) ) + &
        dabs( dx( i+2 ) ) + dabs( dx( i+3 ) ) + &
        dabs( dx( i+4 ) ) + dabs( dx( i+5 ) )
    end do
  else
    nstride = N * stride
    do i = 1, nstride, stride
      dtemp = dtemp + dabs( dx( i ) )
    end do
  end if
  dasum = dtemp
  return
end function dasum
				 
				
					
						An example BLAS routine is the Level 1 function dasum, which stands for double absolute value sum. As suggested by the name, the function computes the sum of absolute values for a vector whose values are of type double.
					
					
						The algorithm is not particularly interesting, but should provide an illustrative example as to how we might expose this or a similar function to JavaScript.
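For comparison, and for the benchmarks shown later, here is a sketch of an equivalent implementation in plain JavaScript. It mirrors the reference Fortran (without the loop unrolling); it is not the exact source used.

/* sketch: plain JavaScript dasum mirroring the reference Fortran */
function dasum( N, dx, stride ) {
    var sum = 0.0;
    if ( N <= 0 || stride <= 0 ) {
        return sum;
    }
    // Visit N elements, stepping by `stride`:
    var M = N * stride;
    for ( var i = 0; i < M; i += stride ) {
        sum += Math.abs( dx[ i ] );
    }
    return sum;
}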
					
				 
			 
			
				
! dasumsub.f
! Wraps dasum as a subroutine.
subroutine dasumsub( N, dx, stride, sum )
  implicit none
  ! ..
  interface
    double precision function dasum( N, dx, stride )
      integer :: stride, N
      double precision :: dx(*)
    end function dasum
  end interface
  ! ..
  integer :: stride, N
  double precision :: sum
  double precision :: dx(*)
  ! ..
  sum = dasum( N, dx, stride )
  return
end subroutine dasumsub
				 
				
					
						Recall that Node.js native add-ons are intended to provide C/C++ bindings. In which case, the first thing we need to do is provide a C interface to the Fortran function.
					
					
The first obstacle is that we cannot call dasum directly from C because Fortran expects arguments to be passed by reference rather than by value. Furthermore, while not applicable here, Fortran functions can only return scalar values, not arrays. Thus, the general best practice is to wrap Fortran functions as subroutines (the equivalent of a C function returning void), to which we can pass a pointer for storing the output value.
					
					
Hence, the first step is to wrap our Fortran function as a Fortran subroutine.
					
				 
			 
			
				
/* dasum_fortran.h */
#ifndef DASUM_FORTRAN_H
#define DASUM_FORTRAN_H
#ifdef __cplusplus
extern "C" {
#endif
void dasumsub( const int *, const double *, const int *, double * );
#ifdef __cplusplus
}
#endif
#endif
				 
				
					
						Similar to our hypot example, we create a header file defining the subroutine prototype.
					
				 
			 
			
				
/* dasum.h */
#ifndef DASUM_H
#define DASUM_H
#ifdef __cplusplus
extern "C" {
#endif
double c_dasum( const int N, const double *X, const int stride );
#ifdef __cplusplus
}
#endif
#endif
				 
				
					
						We can now create the header file containing the prototype for our C wrapper, using the naming convention in which we attach c_ as a prefix to the function name.
					
				 
			 
			
				
/* dasum_f.c */
#include "dasum.h"
#include "dasum_fortran.h"
double c_dasum( const int N, const double *X, const int stride ) {
    double sum;
    dasumsub( &N, X, &stride, &sum );
    return sum;
}
				 
				
					
						Our C wrapper is fairly straightforward, passing all arguments by reference to the Fortran subroutine.
					
				 
			 
			
				
/* addon.cpp */
#include <nan.h>
#include "dasum.h"
namespace addon_dasum {
    using Nan::FunctionCallbackInfo;
    using Nan::TypedArrayContents;
    using Nan::ThrowTypeError;
    using Nan::ThrowError;
    using v8::Number;
    using v8::Local;
    using v8::Value;
    void node_dasum( const FunctionCallbackInfo<Value>& info ) {
        if ( info.Length() != 3 ) {
            ThrowError( "invalid invocation. Must provide 3 arguments." );
            return;
        }
        if ( !info[ 0 ]->IsNumber() ) {
            ThrowTypeError( "invalid input argument. First argument must be a number." );
            return;
        }
        if ( !info[ 2 ]->IsNumber() ) {
            ThrowTypeError( "invalid input argument. Third argument must be a number." );
            return;
        }
        const int N = info[ 0 ]->Uint32Value();
        const int stride = info[ 2 ]->Uint32Value();
        TypedArrayContents<double> X( info[ 1 ] );
        Local<Number> sum = Nan::New( c_dasum( N, *X, stride ) );
        info.GetReturnValue().Set( sum );
    }
    NAN_MODULE_INIT( Init ) {
        Nan::Export( target, "dasum", node_dasum );
    }
    NODE_MODULE( addon, Init )
}
				 
				
					
						Now that we have our C interface, we can create our add-on wrapper.
					
					
						Similar to before, we use NAN.
					
					
						And similar to before, we perform some basic input argument sanity checks before unwrapping input values. One thing to note is that we need to re-purpose the underlying TypedArray buffer as a C vector. This can be a relatively expensive operation, especially for small vectors.
					
					
						Once we have unwrapped our input arguments, we pass them to our C function and set the return value.
					
				 
			 
			
				
$ gfortran \
    -std=f95 \
    -ffree-form \
    -O3 \
    -Wall \
    -Wextra \
    -Wimplicit-interface \
    -fno-underscoring \
    -pedantic \
    -fPIC \
    -c \
    -o dasum.o \
    dasum.f
$ gfortran \
    -std=f95 \
    -ffree-form \
    -O3 \
    -Wall \
    -Wextra \
    -Wimplicit-interface \
    -fno-underscoring \
    -pedantic \
    -fPIC \
    -c \
    -o dasumsub.o \
    dasumsub.f
$ gcc \
    -std=c99 \
    -O3 \
    -Wall \
    -pedantic \
    -fPIC \
    -I ../include \
    -c \
    -o dasum_f.o \
    dasum_f.c
$ gcc -shared -o libdasum.so dasum.o dasumsub.o dasum_f.o -lgfortran
				 
				
					
						Compiling our add-on is not  as straightforward as before. Recall that I mentioned that GYP is oriented toward C/C++, and, here, we have to compile Fortran. Accordingly, we'll need to teach GYP how to compile Fortran, and our configuration will become considerably more complex.
					
					
						Forgetting the add-on for a second, if we were going to compile just the C and Fortran, we would do something like the following.
					
					
						First, we would need to compile our Fortran files, specifying various command-line options.
					
					
						Next, we would compile our C files, once again specifying various command-line options.
					
					
						After compiling our source files, we would link them together into a single library, making sure to include the standard Fortran libraries.
					
					
						To compile our add-on, we will need to translate this sequence, or something similar, to a GYP configuration file.
					
				 
			 
			
				
# binding.gyp
{
  'variables': {
    'addon_target_name%': 'addon',
    'addon_output_dir': './src',
    'fortran_compiler%': 'gfortran',
    'fflags': [
      '-std=f95',
      '-ffree-form',
      '-O3',
      '-Wall',
      '-Wextra',
      '-Wimplicit-interface',
      '-fno-underscoring',
      '-pedantic',
      '-c',
    ],
    'conditions': [
      [
        'OS=="win"',
        {
          'obj': 'obj',
        },
        {
          'obj': 'o',
        }
      ],
    ],
  },
				 
				
					
						We begin by defining variables.
					
					
						While GYP automatically sets C/C++ compiler flags, we must explicitly list Fortran compiler flags and explicitly define the Fortran compiler we want to use.
					
				 
			 
			
				
# binding.gyp (cont.)
  'targets': [
    {
      'target_name': '<(addon_target_name)',
      'dependencies': [],
      'include_dirs': [
        '<!(node -e "require(\'nan\')")',
        '../include',
      ],
      'sources': [
        'dasum.f',
        'dasumsub.f',
        'dasum_f.c',
        'addon.cpp'
      ],
      'link_settings': {
        'libraries': [
          '-lgfortran',
        ],
        'library_dirs': [],
      },
      'cflags': [
        '-Wall',
        '-O3',
      ],
      'cflags_c': [
        '-std=c99',
      ],
      'cflags_cc': [
        '-std=c++11',
      ],
      'ldflags': [],
      'conditions': [
        [
          'OS=="mac"',
          {
            'ldflags': [
              '-undefined dynamic_lookup',
              '-Wl,-no-pie',
              '-Wl,-search_paths_first',
            ],
          },
        ],
        [
          'OS!="win"',
          {
            'cflags': [
              '-fPIC',
            ],
          },
        ],
      ],
				 
				
					
						After defining variables, we can begin defining targets.
					
					
						Similar to before, we define the target name, this time using variable expansion, and list the source files to compile.
					
					
						We then define various command-line flags depending on the host platform.
					
				 
			 
			
				
# binding.gyp (cont.)
      'rules': [
        {
          'extension': 'f',
          'inputs': [
            '<(RULE_INPUT_PATH)'
          ],
          'outputs': [
            '<(INTERMEDIATE_DIR)/<(RULE_INPUT_ROOT).<(obj)'
          ],
          'conditions': [
            [
              'OS=="win"',
              {
                'rule_name': 'compile_fortran_windows',
                'process_outputs_as_sources': 0,
                'action': [
                  '<(fortran_compiler)',
                  '<@(fflags)',
                  '<@(_inputs)',
                  '-o',
                  '<@(_outputs)',
                ],
              },
              {
                'rule_name': 'compile_fortran_linux',
                'process_outputs_as_sources': 1,
                'action': [
                  '<(fortran_compiler)',
                  '<@(fflags)',
                  '-fPIC',
                  '<@(_inputs)',
                  '-o',
                  '<@(_outputs)',
                ],
              }
            ],
          ],
        },
      ],
    },
				 
				
					
						In order to compile the Fortran files, we have to tell GYP how to process them, and we do so by defining a rule  which is triggered based on a file's filename extension.
					
					
						We explicitly specify the input and output arguments which will be used in command execution using GYP defined variables.
					
					
						Next, we define the action  to take (i.e., the compile command to invoke) based on the target platform.
					
					
						Now, when GYP creates the add-on target, it will compile Fortran files using the specified Fortran compiler and flags.
					
				 
			 
			
				
# binding.gyp (cont.)
    {
      'target_name': 'copy_addon',
      'type': 'none',
      'dependencies': [
        '<(addon_target_name)',
      ],
      'actions': [
        {
          'action_name': 'copy_addon',
          'inputs': [],
          'outputs': [
            '<(addon_output_dir)/<(addon_target_name).node',
          ],
          'action': [
            'cp',
            '<(PRODUCT_DIR)/<(addon_target_name).node',
            '<(addon_output_dir)/<(addon_target_name).node',
          ],
        },
      ],
    },
  ],
}
				 
				
					
						Lastly, we add one more target to our binding.gyp file, and the purpose of this target is to move the compiled add-on to a standard location.
					
					
The main takeaway here is that GYP supports target dependencies. Here, the copy_addon target will not run until after the add-on has been compiled. For those familiar with make, this is similar to Makefile prerequisites.
					
				 
			 
			
				
$ cd path/to/dasum
$ node-gyp configure
# node-gyp configure --msvs_version=2015
$ node-gyp build
				 
				
					
						Similar to hypot, to build the add-on, we navigate to the add-on directory containing the binding.gyp file, generate the build files using the configure  subcommand, and run build  to compile the add-on.
					
				 
			 
			
				
/* dasum.js */
var dasum = require( './path/to/src/addon.node' ).dasum;
var x = new Float64Array( [ 1.0, -2.0, 3.0, -4.0, 5.0 ] );
var s = dasum( x.length, x, 1 );
// returns 15.0
				 
				
					
						To use the add-on, we require the add-on and invoke the exported method.
					
				 
			 
			
				
					
						
Length      JavaScript   Native      Perf
10          22,438,020   7,435,590   0.33x
100         4,350,384    4,594,292   1.05x
1,000       481,417      827,513     1.71x
10,000      28,186       97,695      3.46x
100,000     1,617        9,471       5.85x
1,000,000   153          873         5.7x
						 
					 
				
				
					
						To measure add-on performance, we benchmark against an equivalent implementation written in plain JavaScript. Each row in the table corresponds to an input array length. The two middle columns correspond to operations per second. And the last column is the relative performance of the native add-on to the JavaScript implementation.
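As a sketch of the methodology (function names, array lengths, and iteration counts are illustrative; the actual harness differs):

/* sketch: measuring operations per second for a given array length */
function benchmark( dasum, len, iterations ) {
    var x = new Float64Array( len );
    for ( var i = 0; i < len; i++ ) {
        x[ i ] = ( Math.random() * 20.0 ) - 10.0;
    }
    var t0 = process.hrtime();
    for ( var j = 0; j < iterations; j++ ) {
        dasum( len, x, 1 );
    }
    var dt = process.hrtime( t0 );
    return iterations / ( dt[ 0 ] + ( dt[ 1 ] / 1.0e9 ) );
}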
					
					
						As we can see, for small arrays, JavaScript is significantly faster, but that advantage disappears as soon as an input array has 100 elements.
					
					
						As I mentioned earlier, array unwrapping and reinterpretation as a C vector can have a significant impact on performance for small arrays. However, that cost is largely constant, becoming negligible as array length increases.
					
					
For large input arrays, the add-on is significantly faster: nearly 6 times more performant than the equivalent JavaScript implementation.
					
				 
			 
			
				
/* dasum_cblas.h */
#ifndef DASUM_CBLAS_H
#define DASUM_CBLAS_H
#ifdef __cplusplus
extern "C" {
#endif
double cblas_dasum( const int N, const double *X, const int stride );
#ifdef __cplusplus
}
#endif
#endif
				 
				
					
						Our BLAS journey is not, however, over. The Fortran reference implementation does not take into account hardware capabilities or chip architecture and, thus, is not the most performant.
					
					
For optimal performance, we would rather use hardware optimized BLAS libraries, if available. For instance, on macOS, we could use the Apple Accelerate Framework. On Intel chips, we could use Intel's Math Kernel Library (MKL). For a cross-platform hardware optimized library, we could use OpenBLAS.
					
					
						As an example, if we wanted to use the Apple Accelerate Framework, we could proceed as follows.
					
					
First, we need to create a header file defining the prototype of the function we want to use. The function signature is the same as before, but now we are using the CBLAS naming convention.
					
				 
			 
			
				
/* dasum_cblas.c */
#include "dasum.h"
#include "dasum_cblas.h"
double c_dasum( const int N, const double *X, const int stride ) {
    return cblas_dasum( N, X, stride );
}
				 
				
					
						Next, to prevent having to create multiple add-on files, we create a wrapper having the same name c_dasum as before.
					
				 
			 
			
				
# binding.gyp
{
  'variables': {
    'addon_target_name%': 'addon',
    'addon_output_dir': './src',
  },
  'targets': [
    {
      'target_name': '<(addon_target_name)',
      'dependencies': [],
      'include_dirs': [
        '<!(node -e "require(\'nan\')")',
        './../include',
      ],
      'sources': [
        'dasum_cblas.c',
        'addon.cpp'
      ],
      'link_settings': {
        'libraries': [
          '-lblas',
        ],
        'library_dirs': [],
      },
      'cflags': [
        '-Wall',
        '-O3',
      ],
      'cflags_c': [
        '-std=c99',
      ],
      'cflags_cc': [
        '-std=c++11',
      ],
      'ldflags': [
        '-undefined dynamic_lookup',
        '-Wl,-no-pie',
        '-Wl,-search_paths_first'
      ],
    },
    {
      'target_name': 'copy_addon',
      'type': 'none',
      'dependencies': [
        '<(addon_target_name)',
      ],
      'actions': [
        {
          'action_name': 'copy_addon',
          'inputs': [],
          'outputs': [
            '<(addon_output_dir)/<(addon_target_name).node',
          ],
          'action': [
            'cp',
            '<(PRODUCT_DIR)/<(addon_target_name).node',
            '<(addon_output_dir)/<(addon_target_name).node',
          ],
        },
      ],
    },
  ],
}
				 
				
					
						We can modify the binding.gyp file to no longer include configuration settings and rules for compiling Fortran files.
					
					
						Instead, we specify the library we want to link to and update the source file list.
					
					
						Building and compiling the add-on follows the same procedure as before.
					
				 
			 
			
				
				
					
						
Length      JavaScript   wasm         Native      Perf
10          22,438,020   18,226,375   7,084,870   0.31x
100         4,350,384    6,428,586    6,428,626   1.47x
1,000       481,417      997,234      3,289,090   6.83x
10,000      28,186       110,540      355,172     12.60x
100,000     1,617        11,157       30,058      18.58x
1,000,000   153          979          1,850       12.09x
						 
					 
				
				
					
						When we benchmark the hardware optimized BLAS libraries against equivalent implementations in JavaScript, we get the following results.
					
					
						As with the reference implementation, the add-on is slower for short array lengths.
					
					
However, as we increase the array length, the add-on achieves significantly better performance, even at an array length of 100, and outperforms the reference implementation.
					
					
						Note that I have also included WebAssembly benchmarks. For those hoping that WebAssembly will remove the need for native add-ons and provide equivalent performance, you are mistaken.
					
					
						The main conclusion of these results is to use a hardware optimized library when available. These results are simply not possible otherwise.
					
				 
			 
			
			
				Challenges 
				
					
					
						Bugs 
						Standards 
						Proprietary 
						Windows 
						Portability 
						Complexity 
					 
				 
				
					
At this point, you may be excited after seeing a nearly 20x improvement. One small problem, however: detecting and/or installing hardware optimized libraries is hard.
					
					
						The first problem is that some hardware optimized libraries contain bugs, so you need to provide patches; e.g., Apple Accelerate Framework.
					
					
						Next, resolving library installation locations in a robust cross-platform way is difficult, as no standard locations or naming conventions exist.
					
					
						Third, some hardware optimized libraries are proprietary and cannot be guaranteed to exist on a target platform.
					
					
						Fourth, hardware optimized BLAS on Windows is especially painful. And in fact, in general, Fortran BLAS is painful on Windows, and node-gyp cannot compile Fortran on Windows due to node-gyp's dependency on Microsoft Visual Studio, which does not include a Fortran compiler.
					
					
						Fifth, while OpenBLAS is close, there is no fully robust and fully cross-platform hardware optimized BLAS library that you can install alongside your add-on.
					
					
						...which means that you always need to ship a reference implementation fallback, and, for those environments where you cannot compile your native add-on, you also need to ship a pure JavaScript fallback.
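In practice, that fallback chain often looks something like the following sketch (paths and module names are hypothetical):

/* sketch: prefer the compiled add-on; fall back to pure JavaScript */
var dasum;
try {
    // Attempt to load the native binary (may not have compiled):
    dasum = require( './src/addon.node' ).dasum;
} catch ( err ) {
    // Pure JavaScript fallback:
    dasum = require( './lib/dasum.js' );
}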
					
					
						In short, to handle cross-platform complexity, your binding.gyp files become complex very quickly.
					
				 
			 
			
				
There is one other issue, and it is an issue perhaps a bit peculiar to the world of JavaScript. And that issue is modularity.
				 
			 
			
				Modularity 
				
					
						In short, issues arise when you want to use source files from other packages, similar to "requiring" a module dependency.
					
   					
For example, in stdlib, we have BLAS implementations, which are often used within other BLAS implementations. In a single library like BLAS, resolving individual implementations is straightforward, as dependencies often reside in the same directory. In our case, dependencies are not co-located, and, in fact, in the general case, we cannot assume dependency locations due to variability in the package dependency tree (e.g., a dependency could be a sibling or a descendant or even reside in a global package directory).
   					
   					
   						And further, dependency source files can change based on the environment (e.g., whether a system library exists, or a third party library, or a reference implementation fallback, or lack of a Fortran compiler, etc.).
   					
   					
   						If we were only concerned with BLAS, one solution to this problem might be to simply include all of BLAS with each individual package and only expose the desired functionality. After all, a linker is the original tree shaker.
   					
   					
   						Just two problems. First, we don't want to have to download all of BLAS in order to build a package exporting a single function. And we certainly don't want to ship all of BLAS with each individual package, as that could lead to a massive amount of duplicated code being sent over the network.
   					
   					
   						Second, this solution does not scale. If your add-on depends on 10 functions, each from a different monolithic library, then you would have to download and install 10 different monolithic libraries for each build (intelligent caching aside).
   					
   					
   						Some might argue that this is an argument against hypermodularity and in favor of kitchen-sink type libraries.
   					
   					
   						This retort is, however, incorrect. The principle of modularity is an integral part of good software: only ship what you need when you need it, nothing more, end of story.
   					
   					
						In this case, what is needed is a set of tools to help us better think about modularity in the context of native add-ons.
					
				 
			 
			
				
{
    "options": {
        "os": "linux",
        "blas": "",
        "wasm": false
    },
    "fields": [
        {
            "field": "src",
            "resolve": true,
            "relative": true
        },
        {
            "field": "include",
            "resolve": true,
            "relative": true
        },
        {
            "field": "libraries",
            "resolve": false,
            "relative": false
        },
        {
            "field": "libpath",
            "resolve": true,
            "relative": false
        }
    ],
    "confs": [
        {
            "os": "linux",
            "blas": "",
            "wasm": false,
            "src": [
                "./src/dasum.f",
                "./src/dasumsub.f",
                "./src/dasum_f.c"
            ],
            "include": [
                "./include"
            ],
            "libraries": [],
            "libpath": [],
            "dependencies": []
        },
        {
            "os": "linux",
            "blas": "openblas",
            "wasm": false,
            "src": [
                "./src/dasum_cblas.c"
            ],
            "include": [
                "./include"
            ],
            "libraries": [
                "-lopenblas",
                "-lpthread"
            ],
            "libpath": [],
            "dependencies": []
        },
        {
            "os": "mac",
            "blas": "",
            "wasm": false,
            "src": [
                "./src/dasum.f",
                "./src/dasumsub.f",
                "./src/dasum_f.c"
            ],
            "include": [
                "./include"
            ],
            "libraries": [],
            "libpath": [],
            "dependencies": []
        },
        {
            "os": "mac",
            "blas": "apple_accelerate_framework",
            "wasm": false,
            "src": [
                "./src/dasum_cblas.c"
            ],
            "include": [
                "./include"
            ],
            "libraries": [
                "-lblas"
            ],
            "libpath": [],
            "dependencies": []
        },
        {
            "os": "mac",
            "blas": "openblas",
            "wasm": false,
            "src": [
                "./src/dasum_cblas.c"
            ],
            "include": [
                "./include"
            ],
            "libraries": [
                "-lopenblas",
                "-lpthread"
            ],
            "libpath": [],
            "dependencies": []
        },
        {
            "os": "win",
            "blas": "",
            "wasm": false,
            "src": [
                "./src/dasum.c"
            ],
            "include": [
                "./include"
            ],
            "libraries": [],
            "libpath": [],
            "dependencies": []
        },
        {
            "os": "",
            "blas": "",
            "wasm": true,
            "src": [
                "./src/dasum.c"
            ],
            "include": [
                "./include"
            ],
            "libraries": [],
            "libpath": [],
            "dependencies": []
        }
    ]
}
				 
				
					
   						To address this challenge in stdlib, we leverage the same algorithm used by `require` to resolve dependencies.
   					
   					
   						Namely, we create a manifest.json file for each package which lists source files based on environment conditions as well as any package dependencies containing source files we want to use.
   					
   					
   						When compiling a package, we load the manifest.json, walk the dependency tree, resolve source files tailored to both configuration and environment, and then dynamically populate GYP variables before compiling add-ons.
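A sketch of that resolution step follows. The helper names and logic are hypothetical (this is not the actual stdlib tooling), but the key idea is that dependency manifests are located via `require`'s own resolution algorithm.

/* sketch: manifest-based source resolution (hypothetical logic) */
var path = require( 'path' );

function sources( pkgDir, opts ) {
    var manifest = require( path.join( pkgDir, 'manifest.json' ) );

    // Select the first configuration matching the environment options:
    var conf = manifest.confs.filter( function onConf( c ) {
        return c.os === opts.os && c.blas === opts.blas && c.wasm === opts.wasm;
    } )[ 0 ];

    // Resolve this package's source files to absolute paths:
    var out = conf.src.map( function onFile( f ) {
        return path.resolve( pkgDir, f );
    } );

    // Recurse into dependency packages, locating each dependency's
    // manifest via `require`'s resolution algorithm:
    conf.dependencies.forEach( function onDep( dep ) {
        var dir = path.dirname( require.resolve( dep + '/manifest.json' ) );
        out = out.concat( sources( dir, opts ) );
    } );
    return out;
}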
   					
   					
   						This allows us to decompose traditionally monolithic libraries into separate components, while maintaining dependency resolution.
   					
   					
   						I should note that the approach outlined is applicable more generally to all native add-ons, including those outside of stdlib.
   					
   					
   						If a third party add-on were to include a `manifest.json` which advertised source files, a stdlib add-on would be able to use the functionality contained therein in its implementation.
   					
   					
I should also mention that the approach I just outlined is not a wholly new idea; several people have tried their hand at building a C/C++ package manager, often inspired by npm. But I have yet to see an approach which allows explicitly resolving add-on dependencies within a node_modules dependency tree.
   					
   					
If you are interested in learning more about how we do things, see stdlib.
   					
				 
			 
			
			
				
				
					A Node API for Node.js native add-ons.
				 
			 
			
				Features 
				
					
					
						Stability 
						Compatibility 
						VM Neutrality 
					 
				 
				
					
						
A stable API abstraction. Similar in its goals to NAN.
						 
						
							Compatibility across Node versions.
						 
						
							Key differentiator: same API across Node VMs; e.g., V8, Chakra, etc.
						 
					 
					
						In short, N-API promises native add-ons which simply work. :)
					
				 
			 
			
				
/* addon.cpp */
#include <node_api.h>
#include <assert.h>
#include "hypot.h"
namespace addon_hypot {
    napi_value node_hypot( napi_env env, napi_callback_info info ) {
        napi_status status;
        size_t argc = 2;
        napi_value args[ 2 ];
        status = napi_get_cb_info( env, info, &argc, args, nullptr, nullptr );
        assert( status == napi_ok );
        if ( argc < 2 ) {
            napi_throw_type_error( env, "invalid invocation. Must provide 2 arguments." );
            return nullptr;
        }
        napi_valuetype vtype0;
        status = napi_typeof( env, args[ 0 ], &vtype0 );
        assert( status == napi_ok );
        if ( vtype0 != napi_number ) {
            napi_throw_type_error( env, "invalid input argument. First argument must be a number." );
            return nullptr;
        }
        napi_valuetype vtype1;
        status = napi_typeof( env, args[ 1 ], &vtype1 );
        assert( status == napi_ok );
        if ( vtype1 != napi_number ) {
            napi_throw_type_error( env, "invalid input argument. Second argument must be a number." );
            return nullptr;
        }
        double x;
        status = napi_get_value_double( env, args[ 0 ], &x );
        assert( status == napi_ok );
        double y;
        status = napi_get_value_double( env, args[ 1 ], &y );
        assert( status == napi_ok );
        napi_value h;
        status = napi_create_number( env, c_hypot( x, y ), &h );
        assert( status == napi_ok );
        return h;
    }
    #define DECLARE_NAPI_METHOD( name, func ) { name, 0, func, 0, 0, 0, napi_default, 0 }
    void Init( napi_env env, napi_value exports, napi_value module, void* priv ) {
        napi_status status;
        napi_property_descriptor addDescriptor = DECLARE_NAPI_METHOD( "hypot", node_hypot );
        status = napi_define_properties( env, exports, 1, &addDescriptor );
        assert( status == napi_ok );
    }
    NAPI_MODULE( addon, Init )
}
				 
				
					
						As an example of what an N-API add-on might  look like, and I say might  because the implementation is still experimental, here is the hypot add-on refactored from NAN to N-API.
					
					
						The first notable difference is that we no longer directly call V8 methods, and, instead, everything goes through N-API.
					
					
						The second notable difference is the usage of return value references and the returning of status values.
					
					
						Otherwise, we still need to export an initialization function and an add-on still follows the same general structure.
					
				 
			 
			
			
				Conclusions 
				
					
					
						Parity 
						Performance 
						Progress 
					 
				 
				
			 
			
			
			
			
				
				
					Intentionally left blank.