High Performance Array Operations with Cython | Set 2

The code developed in the first part is fast. In this article, we compare its performance with the clip() function from the NumPy library.

Surprisingly, our program is faster than the NumPy version, even though NumPy's clip() is written in C.

Code # 1: Performance comparison.

from timeit import timeit

# arr2 (input), arr3 (output), numpy and sample are assumed to be
# defined in __main__, as in the first part
a = timeit('numpy.clip(arr2, -5, 5, arr3)',
           'from __main__ import arr2, arr3, numpy', number=1000)
print("Time for NumPy clip program:", a)

b = timeit('sample.clip(arr2, -5, 5, arr3)',
           'from __main__ import arr2, arr3, sample', number=1000)
print("Time for our program:", b)

Output:

Time for NumPy clip program: 8.093049556000551
Time for our program: 3.760528204000366
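As a quick sanity check on these numbers, the ratio of the two timings gives the speedup. The snippet below only reuses the two values reported in the output; timings on other machines will differ.

```python
numpy_time = 8.093049556000551   # seconds for 1000 calls of numpy.clip
cython_time = 3.760528204000366  # seconds for 1000 calls of sample.clip

# Ratio of the two timings: how many times faster the Cython version ran
speedup = numpy_time / cython_time
print(f"Cython version is about {speedup:.2f}x faster")
```

For this run, the Cython version is roughly 2.15 times faster than numpy.clip().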

The code in this article relies on Cython's typed memoryviews, which simplify code that works with arrays. The cpdef clip() declaration declares clip() as both a C-level and a Python-level function. This means that calls to it from other Cython functions are more efficient (for example, if you want to call clip() from another Cython function).

The code uses two decorators: @cython.boundscheck(False) and @cython.wraparound(False). These are a few additional performance optimizations.

@cython.boundscheck(False): Disables all array bounds checking. Use it only when you know that indexing will never go out of range.
@cython.wraparound(False): Disables the handling of negative array indices as wrapping around to the end of the array (as with Python lists).

Including these decorators can make the code run significantly faster (almost 2.5x faster in this example when tested).
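To see what these two checks normally provide, here is a minimal plain-Python sketch (not Cython) of the index handling that the decorators switch off. The helper name `checked_get` is hypothetical and exists only for illustration.

```python
def checked_get(seq, i):
    """Mimic the index handling Python performs on every list access."""
    n = len(seq)
    if i < 0:            # wraparound: negative indices count from the end
        i += n
    if not 0 <= i < n:   # bounds check: reject anything outside the sequence
        raise IndexError("index out of range")
    return seq[i]

data = [10.0, 20.0, 30.0]
last = checked_get(data, -1)   # wraparound maps -1 to index 2

try:
    checked_get(data, 3)       # one past the end
    out_of_range_caught = False
except IndexError:
    out_of_range_caught = True
```

With both directives set to False, Cython skips this work on every access, so a negative or out-of-range index is no longer caught and may touch invalid memory. This is why clip() validates the array sizes once, before the loop starts.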

Code # 2: A variant of the clip() function using conditional expressions

# cython is needed for the decorators below
cimport cython

# decorators
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef clip(double[:] a, double min, double max, double[:] out):
    if min > max:
        raise ValueError("min must be <= max")
    if a.shape[0] != out.shape[0]:
        raise ValueError("input and output arrays must be the same size")
    for i in range(a.shape[0]):
        # Cap at max first, then fall back to min when a[i] is not above min
        out[i] = (a[i] if a[i] < max else max) if a[i] > min else min
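The nested conditional expression can be hard to read at first. As a rough check of its logic, the plain-Python sketch below applies the same expression to single values. It is an illustration rather than the Cython code itself, and `clip_value` is a hypothetical helper name.

```python
def clip_value(x, lo, hi):
    """Clamp x to [lo, hi] using the same nested conditional expression."""
    # The inner expression caps x at hi; the outer one falls back to lo
    # whenever x is not above lo.
    return (x if x < hi else hi) if x > lo else lo

samples = [-7.5, -5.0, 0.0, 4.9, 5.0, 12.0]
clipped = [clip_value(x, -5.0, 5.0) for x in samples]
print(clipped)  # [-5.0, -5.0, 0.0, 4.9, 5.0, 5.0]
```

Values below the lower bound become the lower bound, values above the upper bound become the upper bound, and everything in between passes through unchanged.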

In testing, this version of the code runs about 50% faster than the earlier version. But how does it stack up against a handwritten C version? After some experimentation, you can verify that a handcrafted C extension runs more than 10% slower than the Cython version.