`Arch-R`

The following benchmarks were performed on Numscale's physical hardware, including `Intel`, `ARM`, `AMD` and `IBM`. Results on more architectures will follow.
For further information or specific analysis requests, please contact us.

`Arch-R`

Descriptive statistics are mathematical functions that give a quantitative summary of a data set's
properties, including measures of central tendency like the mean or the median, and measures of variability
like the variance, kurtosis, skewness and extrema. `Arch-R` provides a sensible selection of such functions.

The first family of measures is the central tendency measures. These functions include the `sum`, the
`mean`, the `weighted mean`, the sum of absolute values (`asum`) and the sum of squared values (`asum2`).
All of these functions are provided by `Arch-R` for both single and double precision data sets. Benchmarks
were run on arrays of 2048 32-bit floating-point values and compared to a C++ implementation using the
Standard Library.
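As an illustration of what a Standard Library baseline for these measures might look like, here is a minimal sketch. The `ref_*` names are hypothetical and chosen for this example only; they are neither the `Arch-R` API nor the exact code used in the benchmarks.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical reference helpers; these are NOT the Arch-R API.

// Sum of all values.
float ref_sum(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}

// Arithmetic mean; the input must not be empty.
float ref_mean(const std::vector<float>& v) {
    assert(!v.empty());
    return ref_sum(v) / static_cast<float>(v.size());
}

// Weighted mean: sum(w[i] * v[i]) / sum(w[i]).
float ref_weighted_mean(const std::vector<float>& v,
                        const std::vector<float>& w) {
    assert(!v.empty() && v.size() == w.size());
    const float num = std::inner_product(v.begin(), v.end(), w.begin(), 0.0f);
    return num / ref_sum(w);
}

// Sum of absolute values.
float ref_asum(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f,
                           [](float acc, float x) { return acc + std::fabs(x); });
}

// Sum of squared values (dot product of the data with itself).
float ref_asum2(const std::vector<float>& v) {
    return std::inner_product(v.begin(), v.end(), v.begin(), 0.0f);
}
```

Each helper is a single pass over the data. That is the kind of loop a SIMD implementation can accelerate by processing several values per instruction.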

In some cases, the data being analyzed are corrupted or invalid. A sensor may fail, or a person may
have filled in a questionnaire incorrectly. In these cases, a common practice is to use the `NaN`
value to indicate the error or the missing data. Ordinary functions cannot be run on such data, as the
results would be invalid. `Arch-R` provides functions both to count the number of invalid values
(`invalid_count`) and to perform central tendency measures while ignoring them (`filtered_mean`).
Benchmarks of the `Arch-R` implementation of these functions were run on arrays of 2048 32-bit
floating-point values and compared to a C++ implementation using the Standard Library.
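A scalar baseline for NaN-aware statistics could be sketched as follows. The `ref_` prefix marks these as hypothetical helpers written for this example. They borrow the names `invalid_count` and `filtered_mean` from the text, but their signatures and their behaviour on all-NaN input are assumptions, not the `Arch-R` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Number of NaN entries in the data set.
std::size_t ref_invalid_count(const std::vector<float>& v) {
    return static_cast<std::size_t>(
        std::count_if(v.begin(), v.end(),
                      [](float x) { return std::isnan(x); }));
}

// Mean of the non-NaN entries.
// Assumption: returns NaN when the data set holds no valid entry.
float ref_filtered_mean(const std::vector<float>& v) {
    float total = 0.0f;
    std::size_t valid = 0;
    for (float x : v) {
        if (!std::isnan(x)) {
            total += x;
            ++valid;
        }
    }
    if (valid == 0) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    return total / static_cast<float>(valid);
}
```

Note that aggressive floating-point compiler options such as `-ffast-math` allow the compiler to assume that no NaN occurs, which can make `std::isnan` checks unreliable. A baseline like this one should be built without them.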

The `variance` measures how far a set of values is spread out from its average value. Along with the
standard deviation, it is used in various applications such as statistical inference or Monte Carlo sampling.
Benchmarks of the `Arch-R` implementation of these functions were run on arrays of 2048 32-bit
floating-point values and compared to a C++ implementation using the Standard Library.
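For reference, a two-pass Standard Library computation of the variance and the standard deviation might look like the sketch below. It assumes the population definition (division by n). Whether `Arch-R` uses this or the sample definition (division by n - 1) is not stated here, and the `ref_*` names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Population variance: mean of the squared deviations from the mean.
// Assumption: division by n, not by n - 1.
float ref_variance(const std::vector<float>& v) {
    assert(!v.empty());
    const float n = static_cast<float>(v.size());
    const float mean = std::accumulate(v.begin(), v.end(), 0.0f) / n;
    float acc = 0.0f;
    for (float x : v) {
        const float d = x - mean;  // deviation from the mean
        acc += d * d;
    }
    return acc / n;
}

// Standard deviation: square root of the variance.
float ref_stddev(const std::vector<float>& v) {
    return std::sqrt(ref_variance(v));
}
```

For the data set {2, 4, 4, 4, 5, 5, 7, 9}, the mean is 5, the variance is 4 and the standard deviation is 2.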

`skewness` is a measure of the asymmetry of the distribution of values around their mean. Two different
measures of `skewness` are commonly used: Pearson's and the sample version, both of which are provided by `Arch-R`.
Benchmarks of the `Arch-R` implementation of these functions were run on arrays of 2048 32-bit
floating-point values and compared to a C++ implementation using the Standard Library.
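The sketch below shows one common reading of these two measures. It assumes that "Pearson's" refers to the moment coefficient of skewness, g1 = m3 / m2^(3/2), and that the sample version is the usual bias-adjusted form, G1 = g1 * sqrt(n(n - 1)) / (n - 2). Both choices are assumptions, as are the `ref_*` names; the definitions used by `Arch-R` may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// k-th central moment: mean of (x - mean)^k, accumulated in double.
double ref_central_moment(const std::vector<float>& v, int k) {
    assert(!v.empty());
    const double n = static_cast<double>(v.size());
    double mean = 0.0;
    for (float x : v) mean += x;
    mean /= n;
    double acc = 0.0;
    for (float x : v) acc += std::pow(x - mean, k);
    return acc / n;
}

// Moment coefficient of skewness: g1 = m3 / m2^(3/2).
double ref_skewness(const std::vector<float>& v) {
    const double m2 = ref_central_moment(v, 2);
    assert(m2 > 0.0);  // undefined for constant data
    return ref_central_moment(v, 3) / std::pow(m2, 1.5);
}

// Bias-adjusted sample skewness: G1 = g1 * sqrt(n (n - 1)) / (n - 2).
double ref_sample_skewness(const std::vector<float>& v) {
    assert(v.size() >= 3);
    const double n = static_cast<double>(v.size());
    return ref_skewness(v) * std::sqrt(n * (n - 1.0)) / (n - 2.0);
}
```

A symmetric data set such as {1, 2, 3, 4, 5} has a skewness of zero under both definitions, while a data set with a long right tail yields a positive value.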

`kurtosis` is the last family of dispersion measures provided by `Arch-R`, in both its classic
form and its `excess` version. Benchmarks of the `Arch-R` implementation of these functions
were run on arrays of 2048 32-bit floating-point values and compared to a C++ implementation using the Standard Library.
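As a hedged illustration, the classic kurtosis is taken here to be the fourth standardized moment, m4 / m2^2, and the excess version to be that value minus 3, so that a normal distribution scores zero. The `ref_*` helpers are hypothetical and population moments are assumed; this is not the `Arch-R` implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// k-th central moment: mean of (x - mean)^k, accumulated in double.
double ref_central_moment(const std::vector<float>& v, int k) {
    assert(!v.empty());
    const double n = static_cast<double>(v.size());
    double mean = 0.0;
    for (float x : v) mean += x;
    mean /= n;
    double acc = 0.0;
    for (float x : v) acc += std::pow(x - mean, k);
    return acc / n;
}

// Classic kurtosis: fourth central moment over the squared variance.
double ref_kurtosis(const std::vector<float>& v) {
    const double m2 = ref_central_moment(v, 2);
    assert(m2 > 0.0);  // undefined for constant data
    return ref_central_moment(v, 4) / (m2 * m2);
}

// Excess kurtosis: classic kurtosis minus 3 (the value for a normal law).
double ref_excess_kurtosis(const std::vector<float>& v) {
    return ref_kurtosis(v) - 3.0;
}
```

For the two-point data set {-1, 1}, the classic kurtosis is 1 and the excess kurtosis is -2, the lowest value the measure can take.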

`Arch-R`

Numscale specialises in portable, high-performance software. Many of our clients require an FFT with the best possible performance on x86_64, ARM and PowerPC. In this benchmark, the performance of a 2D Fourier transform of size 2048x2048 in single precision using Arch-R is compared to that of the well-known FFTW library, which is distributed under the highly restrictive GPL licence.

Arch-R's 2D FFT is almost 50% faster than the FFTW on Intel x86, ARM and PowerPC, without sacrificing any numerical precision!

We are faster in 1D too! Arch-R FFT was used to calculate a single precision 1D FFT of size 2048, and this was benchmarked against the FFTW on ARM, PowerPC and Intel x86_64. Arch-R FFT is again faster than the FFTW on all architectures! Arch-R FFT is lightweight, easy to use and easy to integrate into your existing projects, while still being high-performance and cross-platform. It's time to use Arch-R in all of your critical projects!

`Arch-R`

In the following benchmarks, several common image processing operations are implemented in Arch-R using `bSIMD` so that the maximum performance possible is realised across all architectures.
These are benchmarked against `OpenCV 3.1.0` on `ARM` and `x86`.

Binary image morphology is **widely** used in image processing and computer vision to locate objects of a certain size or to eliminate noise of a certain form. Image morphology can be very expensive,
as the value of each pixel is computed as a function of its neighbouring pixels. In the following benchmarks, we apply various structuring elements to input images and benchmark the
time taken by a version implemented using `Arch-R` against `OpenCV 3.1.0`, the most popular image processing library, on `ARM` and `x86`. In each benchmark, `OpenCV` is compiled with all optimizations enabled to
ensure that an accurate comparison of the performance of `Arch-R` and `OpenCV` is obtained. As the image morphology code is implemented using `Arch-R`, the exact same code is used on both `ARM` and `x86`.

In the first test, we compare the time taken to perform a **"closing"** operation. This involves performing a dilation followed by an erosion with the same structuring element. In this test, a
circular structuring element of radius one is used on an `8 bit` image of size `2048x2048`.
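To make the operation concrete, here is a minimal scalar sketch of a closing (a dilation followed by an erosion with the same structuring element). It is a plain reference, not the `Arch-R` or `OpenCV` implementation. The `Image` type and the `ref_*` functions are hypothetical. The sketch assumes that a circular element of radius one covers the centre pixel and its four direct neighbours, and that neighbours falling outside the image are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit image type used only for this sketch.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // row-major, width * height entries
};

// Offsets of a radius-1 disc (dx*dx + dy*dy <= 1): centre plus 4 neighbours.
constexpr int kOffsets[5][2] = {{0, 0}, {-1, 0}, {1, 0}, {0, -1}, {0, 1}};

// One morphological pass: neighbourhood maximum (dilation) or minimum (erosion).
// Assumption: neighbours outside the image are skipped.
Image ref_morph(const Image& in, bool dilate) {
    assert(in.pixels.size() ==
           static_cast<std::size_t>(in.width) * static_cast<std::size_t>(in.height));
    Image out{in.width, in.height, std::vector<std::uint8_t>(in.pixels.size())};
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            std::uint8_t acc = in.pixels[y * in.width + x];
            for (const auto& o : kOffsets) {
                const int nx = x + o[0];
                const int ny = y + o[1];
                if (nx < 0 || ny < 0 || nx >= in.width || ny >= in.height) continue;
                const std::uint8_t p = in.pixels[ny * in.width + nx];
                acc = dilate ? std::max(acc, p) : std::min(acc, p);
            }
            out.pixels[y * in.width + x] = acc;
        }
    }
    return out;
}

// Closing: dilation followed by erosion with the same structuring element.
Image ref_closing(const Image& in) {
    return ref_morph(ref_morph(in, true), false);
}
```

Applied to a white image with a single black pixel, this closing fills the hole, which is the noise-removal behaviour described above. Every output pixel reads five input pixels, so a scalar version does a lot of redundant memory traffic that a vectorised version can avoid.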

It is clear from the results here that interesting speed-ups are obtained on both `ARM` and `x86`. However, the results also show that `OpenCV` has highly optimized image morphology functionality
on `x86`. These same optimizations are clearly not implemented on `ARM`, which explains why `Arch-R` is so much more efficient there. The speed-up of the `Arch-R` implementation
is due to the use of vector intrinsics and efficient memory accesses. Versus a scalar implementation, a speed-up of 16 is expected for 8-bit data on SSE and NEON, and a speed-up of 32 on AVX2.
However, the actual speed-ups observed lead us to conclude that `OpenCV` uses Intel intrinsics in its `x86` version. Nevertheless, the `Arch-R` version of this algorithm is still significantly quicker.
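The expected speed-ups quoted above follow from simple arithmetic: a SIMD register holds its width in bits divided by the element width in bits, and each instruction processes that many elements at once. The helper below is a hypothetical illustration of that calculation. It gives an ideal upper bound and ignores memory bandwidth and other overheads.

```cpp
#include <cassert>

// Elements processed per SIMD instruction: register width / element width.
constexpr int simd_lanes(int register_bits, int element_bits) {
    return register_bits / element_bits;
}

// SSE and NEON registers are 128 bits wide; AVX2 registers are 256 bits wide.
static_assert(simd_lanes(128, 8) == 16, "SSE / NEON: 16 lanes of 8-bit data");
static_assert(simd_lanes(256, 8) == 32, "AVX2: 32 lanes of 8-bit data");
static_assert(simd_lanes(128, 32) == 4, "SSE / NEON: 4 lanes of 32-bit floats");
```

The last line shows why the gain depends on the data type: the same 128-bit register holds only four single precision values.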

In the second test, we compare a binary dilation performed using a circular structuring element of radius 9 implemented using `Arch-R` against `OpenCV 3.1.0`, again on `ARM` and `x86`.

In this test, we observe very impressive speed-ups for `Arch-R` versus `OpenCV`. These speed-ups are again explained by the use of SIMD instructions via `bSIMD`, combined with efficient memory accesses and a highly optimized algorithm.
As the structuring element in this example is significantly larger than that in the previous example, the effect is multiplied. Again, we note that the `Arch-R` code in this example is identical on both `ARM` and `x86`.

In the final image morphology test, we compare an erosion performed using a square structuring element of size 25.

Again, `Arch-R` performs extremely well on all architectures.
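To put the three structuring elements in perspective, the sketch below counts how many pixels each one covers, which is the number of neighbours a naive implementation reads for every output pixel. It assumes that a circular element of radius r contains every offset with dx² + dy² ≤ r², and that "a square of size 25" means a 25x25 square. The exact shapes used in the benchmarks are not given here, and the function names are hypothetical.

```cpp
#include <cassert>

// Pixels covered by a disc of the given radius (dx*dx + dy*dy <= radius^2).
long disc_pixels(int radius) {
    assert(radius >= 0);
    long count = 0;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dx * dx + dy * dy <= radius * radius) ++count;
        }
    }
    return count;
}

// Pixels covered by a square with the given side length.
long square_pixels(int side) {
    assert(side > 0);
    return static_cast<long>(side) * side;
}
```

Under these assumptions, the radius-one disc covers 5 pixels, the radius-9 disc covers 253 and the 25x25 square covers 625. A naive per-pixel loop therefore does roughly 50 to 125 times more work in the second and third tests than in the first, which is why algorithmic optimizations matter as much as vectorisation for large structuring elements.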