
This post gives a concise example of how to use OpenMP in Cython on macOS.


Install OpenMP.

brew install libomp

Install NumPy (used in the benchmark below) and Cython. The benchmark also imports SciPy for comparison, so install it as well if it is missing.

conda install numpy cython

My Cython version is 3.0.0.


In test.pyx, we implement the log-sum-exp trick in Cython.

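from cython.parallel cimport prange
from libc.math cimport exp, log, fmax
cimport cython

# Largest element of a[0..N-1]; used to shift the exponents.
cdef double c_max(
    int N,
    double *a,
) noexcept nogil:
    cdef int i
    cdef double b = a[0]
    for i in range(1, N):
        b = fmax(b, a[i])
    return b

# log(sum(exp(a[i]))), computed as b + log(sum(exp(a[i] - b))) with b = max(a).
cdef double c_logsumexp(
    int N,
    double *a,
) noexcept nogil:
    cdef int i
    cdef double b = c_max(N, a)
    cdef double x = 0.0
    # prange splits the loop across OpenMP threads; the in-place += makes
    # Cython treat x as a sum reduction.
    for i in prange(N):
        x += exp(a[i] - b)
    x = b + log(x)
    return x

# Python entry point; expects a non-empty, C-contiguous array of doubles.
def logsumexp(double [::1] a):
    return c_logsumexp(a.shape[0], &a[0])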

Note how to write the setup.py:

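from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    Extension(
        'test', ['test.pyx'],
        extra_compile_args=['-Xpreprocessor', '-fopenmp'],
        extra_link_args=['-lomp'],  # assumption: link the libomp runtime
    ),
]

setup(
    ext_modules=cythonize(extensions, language_level='3'),
)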

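Apple's clang does not accept -fopenmp on its own. Prefixing it with -Xpreprocessor passes the flag to the preprocessor, which is required for the OpenMP pragmas to be processed.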
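
If the build cannot find omp.h, or linking fails with missing OpenMP symbols, the compiler may need to be told where Homebrew put libomp. Recent Homebrew versions install it keg-only, outside the default search paths. The helper below is a hedged sketch, not part of the original setup.py: openmp_flags and its libomp_prefix parameter are hypothetical names, and the prefix shown is only the usual Apple Silicon location. Run brew --prefix libomp to get the real one on your machine.

```python
def openmp_flags(libomp_prefix=None):
    """Return (compile_args, link_args) for Apple clang with libomp.

    libomp_prefix is a hypothetical parameter: the directory printed by
    `brew --prefix libomp`. Leave it as None if the compiler already
    finds omp.h and libomp on its default search paths.
    """
    compile_args = ['-Xpreprocessor', '-fopenmp']
    link_args = ['-lomp']
    if libomp_prefix is not None:
        # Point the compiler at omp.h and the linker at libomp.dylib.
        compile_args.append('-I' + libomp_prefix + '/include')
        link_args.insert(0, '-L' + libomp_prefix + '/lib')
    return compile_args, link_args

# Example prefix only; substitute the output of `brew --prefix libomp`.
compile_args, link_args = openmp_flags('/opt/homebrew/opt/libomp')
```

The two lists can then be passed as extra_compile_args and extra_link_args to the Extension in setup.py.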


python3 setup.py build_ext --inplace

After the build, ls -F shows the following on my Mac:

build/  setup.py  test.c  test.cpython-39-darwin.so*  test.pyx


python3 -m timeit -s 'from scipy.special import logsumexp; import numpy as np; a = np.random.randn(1000)' 'logsumexp(a)'
python3 -m timeit -s 'from test import logsumexp; import numpy as np; a = np.random.randn(1000)' 'logsumexp(a)'

The output (SciPy first, then the Cython version):

10000 loops, best of 5: 32.1 usec per loop
50000 loops, best of 5: 6.66 usec per loop