Application Benchmarks with bSIMD

The following benchmarks were run on Numscale's physical hardware, including Intel, ARM, AMD and IBM machines. Results for more architectures will follow. For further information or specific analysis requests, please contact us.


Black and Scholes

Black and Scholes is a well-known model from the financial world used to calculate the price of European-style options.

In this test, we benchmark an implementation of this algorithm using bSIMD against an implementation using the standard library. bSIMD includes highly optimized SIMD versions of many mathematical functions, such as exp, enabling us to easily vectorize complex functions such as this one. This test was performed using single-precision floating point (float).
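For orientation, a scalar standard-library baseline of the kind described above might look like the following. This is a minimal sketch, not the benchmarked code: the function names (`normal_cdf`, `black_scholes_call`, `price_calls`) and the parameter layout are assumptions made for illustration, and only the call price is computed.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal cumulative distribution function, expressed with std::erfc.
inline float normal_cdf(float x) {
    return 0.5f * std::erfc(-x * 0.70710678f);  // 0.70710678 ~ 1/sqrt(2)
}

// Price of a European call under the Black-Scholes model.
// s: spot price, k: strike, r: risk-free rate, v: volatility, t: years to expiry.
inline float black_scholes_call(float s, float k, float r, float v, float t) {
    const float sqrt_t = std::sqrt(t);
    const float d1 = (std::log(s / k) + (r + 0.5f * v * v) * t) / (v * sqrt_t);
    const float d2 = d1 - v * sqrt_t;
    return s * normal_cdf(d1) - k * std::exp(-r * t) * normal_cdf(d2);
}

// Scalar loop over a batch of options (hypothetical layout: one array per input).
void price_calls(const std::vector<float>& s, const std::vector<float>& k,
                 const std::vector<float>& t, float r, float v,
                 std::vector<float>& out) {
    out.resize(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        out[i] = black_scholes_call(s[i], k[i], r, v, t[i]);
    }
}
```

Every iteration of the loop is independent and dominated by calls to log, exp, sqrt and erfc, which is why SIMD versions of these functions matter for this kernel.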

[Figure: Black & Scholes benchmark results, one panel per target: ARM aarch64 (NEON); Intel x86_64 (SSE4.2, AVX2); Intel KNL (AVX512); IBM Power8 (VSX).]


Evaluation of a Neural Network Activation Function

The activation function of a neural network is typically a sigmoid function of the form:

$\sigma(z) = \frac{1}{1+e^{-z}}$

In this test, we benchmark an implementation of this function using bSIMD against an implementation using the standard library. As with the previous benchmark, we use the vectorized versions of mathematical functions provided by bSIMD. This test was performed using single-precision floating point (float). The code for this test may be seen here.
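The scalar side of such a comparison can be sketched in a few lines. The names `sigmoid` and `sigmoid_inplace` are hypothetical and chosen for this illustration; the benchmarked code may be organized differently.

```cpp
#include <cmath>
#include <vector>

// Logistic sigmoid for a single value, using the standard library exp.
inline float sigmoid(float z) {
    return 1.0f / (1.0f + std::exp(-z));
}

// Apply the activation to every element of a layer's pre-activations.
void sigmoid_inplace(std::vector<float>& values) {
    for (float& v : values) {
        v = sigmoid(v);
    }
}
```

The loop body is one exp, one addition and one division per element, so its cost is dominated by the exp call.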

[Figure: Neural Network benchmark results, one panel per target: ARM aarch64 (NEON); Intel x86_64 (SSE4.2, AVX2); Intel KNL (AVX512); IBM Power8 (VSX).]


Sigma Delta

Motion detection is a fundamental task in robotics and computer vision. Ensuring its correctness and speed is paramount to calculating precise results. Motion may be trivially computed by performing background subtraction between frames, but such a simple algorithm fails in many situations. Sigma Delta is a much more robust and efficient algorithm which uses a Gaussian model of the variation of each pixel's brightness to detect movement. In this test, the Sigma Delta motion detection algorithm is implemented using both the C++ standard library and bSIMD. A sequence of 8-bit grayscale images of size 1024x1024 is used.
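As an illustration of the per-pixel work involved, here is a scalar sketch of one commonly described formulation of the Sigma Delta background estimator. It is not the benchmarked code: the names `SigmaDeltaState`, `make_state` and `sigma_delta_step`, the amplification factor `n`, and the clamping bounds 1 and 254 are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Per-pixel state carried from frame to frame.
struct SigmaDeltaState {
    std::vector<std::uint8_t> mean;      // running background estimate
    std::vector<std::uint8_t> variance;  // running estimate of temporal variation
};

// Initialize the background with the first frame and the variation with 1.
SigmaDeltaState make_state(const std::vector<std::uint8_t>& first_frame) {
    return {first_frame, std::vector<std::uint8_t>(first_frame.size(), 1)};
}

// Process one frame; motion[i] is 255 where movement is detected, 0 elsewhere.
// n is an assumed amplification factor applied to the absolute difference.
void sigma_delta_step(SigmaDeltaState& st, const std::vector<std::uint8_t>& frame,
                      std::vector<std::uint8_t>& motion, int n = 2) {
    motion.resize(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        int m = st.mean[i];
        int v = st.variance[i];
        const int p = frame[i];

        // Move the background estimate one step toward the current pixel.
        if (m < p) ++m; else if (m > p) --m;

        // Absolute difference between background and current pixel.
        const int diff = std::abs(m - p);

        // Move the variation estimate one step toward n * diff (non-zero diff only).
        if (diff != 0) {
            const int target = std::min(n * diff, 255);
            if (v < target) ++v; else if (v > target) --v;
        }
        v = std::max(1, std::min(v, 254));  // assumed bounds

        st.mean[i] = static_cast<std::uint8_t>(m);
        st.variance[i] = static_cast<std::uint8_t>(v);
        motion[i] = (diff < v) ? 0 : 255;
    }
}
```

All operations are increments, comparisons and absolute differences on 8-bit values, so many pixels fit in a single SIMD register, provided the branches are expressed as selects.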

[Figure: Sigma Delta benchmark results, one panel per target: ARM aarch64 (NEON); Intel x86_64 (SSE4.2, AVX2); IBM Power8 (VSX).]


N-Body

The well-known, computationally intensive N-Body problem is implemented using bSIMD and the standard library. The code for this test may be seen here. This test simulates 4096 bodies.
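To make the workload concrete, the following is a minimal scalar sketch of one all-pairs time step. It is an assumption-laden illustration rather than the benchmarked code: the `Bodies` structure-of-arrays layout, the `nbody_step` name, the softening term, the simple Euler update and the choice of gravitational constant G = 1 are all made up for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Structure-of-arrays layout: one contiguous array per component.
struct Bodies {
    std::vector<float> x, y, z;     // positions
    std::vector<float> vx, vy, vz;  // velocities
    std::vector<float> m;           // masses
};

// Advance the system by one time step dt using all-pairs gravity (G = 1).
// The softening term avoids a division by zero when two bodies coincide,
// and makes the i == j contribution exactly zero.
void nbody_step(Bodies& b, float dt, float softening = 1e-3f) {
    const std::size_t n = b.x.size();
    const float eps2 = softening * softening;

    // Accumulate the acceleration on body i from every body j: O(n^2) work.
    for (std::size_t i = 0; i < n; ++i) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (std::size_t j = 0; j < n; ++j) {
            const float dx = b.x[j] - b.x[i];
            const float dy = b.y[j] - b.y[i];
            const float dz = b.z[j] - b.z[i];
            const float d2 = dx * dx + dy * dy + dz * dz + eps2;
            const float inv = 1.0f / std::sqrt(d2);
            const float s = b.m[j] * inv * inv * inv;
            ax += dx * s;
            ay += dy * s;
            az += dz * s;
        }
        b.vx[i] += ax * dt;
        b.vy[i] += ay * dt;
        b.vz[i] += az * dt;
    }

    // Move every body with its updated velocity.
    for (std::size_t i = 0; i < n; ++i) {
        b.x[i] += b.vx[i] * dt;
        b.y[i] += b.vy[i] * dt;
        b.z[i] += b.vz[i] * dt;
    }
}
```

With 4096 bodies, each step evaluates roughly 16.8 million pair interactions, each involving a square root and a division.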

[Figure: N-Body benchmark results, one panel per target: ARM aarch64 (NEON); Intel x86_64 (SSE4.2, AVX2); Intel KNL (AVX512); IBM Power8 (VSX).]


Mandelbrot Set

Mandelbrot set computation is a data-parallel task that generates the famous fractal images by testing, for each point of the complex plane, whether the iteration of a given complex function remains bounded. The Mandelbrot set is computed using a scalar algorithm and an algorithm vectorized using bSIMD. The code for this test may be seen here.
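A scalar version of this computation can be sketched with the classic escape-time iteration of z -> z^2 + c. The function names, the viewport bounds and the row-major image layout below are assumptions for illustration, not details taken from the benchmarked code.

```cpp
#include <cstddef>
#include <vector>

// Number of iterations of z -> z^2 + c (starting from z = 0) before |z| exceeds 2,
// capped at max_iter. Points that reach max_iter are treated as inside the set.
int mandelbrot_iterations(float cr, float ci, int max_iter) {
    float zr = 0.0f, zi = 0.0f;
    int i = 0;
    while (i < max_iter && zr * zr + zi * zi <= 4.0f) {
        const float next_zr = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = next_zr;
        ++i;
    }
    return i;
}

// Iteration counts for a width x height image, stored row by row.
// Assumed viewport: real part in [-2, 1), imaginary part in [-1, 1).
std::vector<int> mandelbrot_image(int width, int height, int max_iter) {
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            const float cr = -2.0f + 3.0f * px / width;
            const float ci = -1.0f + 2.0f * py / height;
            out[static_cast<std::size_t>(py) * width + px] =
                mandelbrot_iterations(cr, ci, max_iter);
        }
    }
    return out;
}
```

Each pixel is independent of its neighbours, which is what makes the problem data-parallel. The difficulty for a vectorized version is that neighbouring pixels may need different iteration counts, so lanes of a SIMD register finish at different times.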

[Figure: Mandelbrot Set benchmark results, one panel per target: ARM aarch64 (NEON); Intel x86_64 (SSE4.2, AVX2); Intel KNL (AVX512); IBM Power8 (VSX).]