The Rust and the Python

Efficient numerical Python extensions
… written in Rust

Agenda

  • Goals
    1. Quick intro to Rust
    2. Machine Learning (ML) theory
    3. A modern, clean and efficient Python package
    4. Optimization with a Rust extension
  • All sources (including these slides) are on GitHub
  • Open source license (MIT). Feel free to use as a template!

Rust: A quick introduction

Rust is in its seventh year as the most loved language, with 87% of developers saying they want to continue using it. Rust also ties with Python as the most wanted technology, with TypeScript running a close second. (Stack Overflow Developer Survey 2022)
  • Developed by Graydon Hoare at Mozilla since 2010; first stable version 2015
  • Learning Rust impacts how you program in Python (highly recommended blog post)
  • But … it takes time to learn; a C++ background helps
  • A very active, committed community
  • Common headline on Hacker News: "… but written in Rust"

Structs and traits


                            pub struct NewsArticle {
                                pub headline: String,
                                pub location: String,
                                pub author: String,
                                pub content: String,
                            }
                            pub trait Summary {
                                fn summarize(&self) -> String;
                            }
                            impl Summary for NewsArticle {
                                fn summarize(&self) -> String {
                                    format!("{}, by {} ({})", self.headline, self.author, self.location)
                                }
                            }

                        
  • No inheritance. Use aggregation instead
  • Traits (interfaces) similar to Protocols in Python
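
For comparison, a rough Python counterpart of the Summary trait using typing.Protocol (an illustrative sketch, not taken from the tutorial code):

                            from typing import Protocol

                            class Summary(Protocol):
                                def summarize(self) -> str: ...

                            class NewsArticle:
                                def __init__(self, headline: str, location: str, author: str, content: str) -> None:
                                    self.headline = headline
                                    self.location = location
                                    self.author = author
                                    self.content = content

                                # Satisfies the Summary protocol structurally; no inheritance needed
                                def summarize(self) -> str:
                                    return f"{self.headline}, by {self.author} ({self.location})"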

Borrowing & Lifetimes


                            let mut s = String::from("hello");

                            {
                                let r1 = &mut s;
                            } // r1 goes out of scope here, so we can make a new reference with no problems.

                            let r2 = &mut s;

                        
  • Rust is memory safe (like Python, unlike C/C++)
  • No garbage collection as in Python / Java
  • No reference counting as with C++ smart pointers
  • The borrow checker enforces rules that prevent memory errors

Enums


                            enum Message {
                                Quit,
                                Move { x: i32, y: i32 },
                                Write(String),
                                ChangeColor(i32, i32, i32),
                            }
                        
  • Algebraic data types
  • The foundation for elegant error handling and optional types

… and much more

  • Powerful pattern matching
  • Functional programming
  • Excellent tooling

Which ML algorithm to choose?

  • Easy to understand
  • Educational yet useful / versatile
  • Easy / fast to implement (~50 LoC)
  • Easy / fast to train and evaluate
  • Bonus: probabilistic

Gaussian Mixture Models

$$ \begin{aligned} p(\bm x | M ) &= \sum_{m\in M} \pi_m \cdot p(\bm x | m) \\ &= \sum_{m\in M} \pi_m \cdot \mathcal N(\bm x; \bm \mu_m, \Sigma_m) \\ &= \sum_{m\in M} \pi_m \cdot \frac{1}{ \sqrt{(2\pi)^d \det({\Sigma_m})} } \exp \left(-\frac{1}{2}({\bm x}-{\bm\mu_m})^{\top}{{\Sigma_m}}^{-1}({\bm x}-{\bm\mu_m}) \right) \end{aligned} $$
$$ X = \left\{ \begin{pmatrix} \bm x_i \\ z_i \end{pmatrix}\right\}_i, \quad \bm x \in \mathbb R^d, z \in M $$
  • 1 Multinomial/Categorical distribution
  • $k$ Normal distributions
  • Each draw $i$ first determines the Normal distribution $z_i$ and then the sample $\bm x_i$ (see the sketch after this list)
  • Example: Weight / height distribution in an animal population
  • see my blog post (no formulas)
  • Data distributed by combination of $k$ Normal distributions
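
A minimal numpy sketch of this generative process (the parameter values below are made up for illustration):

                            import numpy as np

                            rng = np.random.default_rng(0)

                            # Hypothetical parameters: k=2 components in d=2 dimensions
                            weights = np.array([0.3, 0.7])                 # pi_m
                            means = np.array([[0.0, 0.0], [5.0, 5.0]])     # mu_m
                            covs = np.array([np.eye(2), 2.0 * np.eye(2)])  # Sigma_m

                            # Each draw first picks a component z_i, then samples x_i from that Gaussian
                            z = rng.choice(len(weights), size=1000, p=weights)
                            x = np.array([rng.multivariate_normal(means[m], covs[m]) for m in z])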

Expectation Maximization (EM)

Gaussian mixture models can be trained with EM

$$ X = \left\{ \bm x_i\right\}_i, \quad \bm x \in \mathbb R^d $$
  • Algorithm for unsupervised learning / Clustering
  • Missing data: We don't know which Gaussian $z_i$ produced $\bm x_i$
  • The unobservable variable $z_i$ is called the latent state
  • Alternating pattern: E-Step and M-Step (see the loop sketch after this list)
  • Guaranteed to converge, but prone to local optima
  • Many models can be trained with EM:
    • $k$-Means
    • Self-organizing neural networks
    • Other mixture models
    • Hidden Markov Models
    • Kalman filters
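
The alternating pattern as a loop over the functions implemented later in this tutorial (the fixed iteration count is arbitrary; a real stopping criterion would monitor the likelihood):

                            # EM sketch: initialize, then alternate E-Step and M-Step
                            model = initialize(data, 20)
                            for _ in range(100):
                                responsibilities = expect(model, data)   # E-Step: guess the latent states
                                maximize(model, responsibilities, data)  # M-Step: refit the parameters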

E(xpectation)-step

$$ \begin{aligned} \forall m \in M, \bm x \in X&: \\ \bar \gamma_{\bm x,m} &= \pi_m \cdot p(\bm x | m) \\ &= \pi_m \cdot \mathcal N( \bm x; \bm\mu_m, \Sigma_m) \\ &= \pi_m \cdot \frac{1}{ \sqrt{(2\pi)^d \det({\Sigma_m})} } \exp \left(-\frac{1}{2}({\bm x}-{\bm\mu_m})^{\top}{{\Sigma_m}}^{-1}({\bm x}-{\bm\mu_m}) \right)\\ \gamma_{\bm x,m} &= \frac{\bar \gamma_{\bm x, m}}{ \sum_{m' \in M} \bar \gamma_{\bm x, m'} } \end{aligned} $$
  • "Guessing the hidden states"
  • For each sample and each component, the probability
    that the sample has been generated by the component

M(aximization)-step

$$ \begin{aligned} \bm\mu_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot \bm x }{\sum_{\bm x \in X} \gamma_{\bm x,m}} \\ \Sigma_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot (\bm x - \bm\mu_m)^{\top}\cdot (\bm x - \bm \mu_m) }{\sum_{\bm x \in X} \gamma_{\bm x,m}}\\ \pi_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m}}{|X|} \end{aligned} $$
  • Maximize the likelihood of the distribution
  • Compute the new parameters of each Gaussian

EM in action

Applications

  • High-dimensional clustering (Warning: $\mathcal O(n^3)$!)
  • Regression
  • Image recognition

GMM/EM in Python


                            $ # Python packaging and dependency management
                            $ curl -sSL https://install.python-poetry.org | python3 -
                            $ # Get the code
                            $ git clone git@github.com:StefanUlbrich/numeric-rust-python-tutorial.git tutorial
                            $ cd tutorial && git checkout python-skeleton
                            tutorial$ # Create virtual environment and install dependencies
                            tutorial$ poetry env use python3.11 # Optional
                            tutorial$ poetry install
                        
  • System setup (Python installation, IDE setup, etc.) is out of scope
  • All steps of the tutorial are available as git tags (e.g., python-skeleton and python-version)
  • Tested on macOS and Linux only (WSL should work)
  • I use VSCode and JupyterLab for development and prototyping

Design decisions

  • ADT / Data class represents the GMM
    • Numpy arrays for parameters
    • Additional dimensions instead of lists
  • No OOP (easier to replace functions)
  • EM implemented as three functions:
    • E-Step
    • M-Step variants: 2 Python + 1 Rust
    • Simple initialization (random $\gamma_{\bm x,m}$ + M-Step)
  • Benchmark (timeit) and data generation
                                    
                                        from dataclasses import dataclass

                                        import numpy as np
                                        from numpy.typing import NDArray

                                        @dataclass
                                        class GMM:
                                            means: NDArray[np.float64]   # k x d
                                            covs: NDArray[np.float64]    # k x d x d
                                            weights: NDArray[np.float64] # k

                                        def expect(
                                            gmm: GMM,
                                            data: NDArray[np.float64]
                                        ) -> NDArray[np.float64]:
                                            ...

                                        def maximize(
                                            gmm: GMM,
                                            responsibilities: NDArray[np.float64],
                                            data: NDArray[np.float64]
                                        ) -> None:
                                            ...
                                

The E-Step

$$ \begin{aligned} \forall m \in M, \bm x \in X&: \\ \bar \gamma_{\bm x,m} &= \pi_m \cdot p(\bm x | m) \\ &= \pi_m \cdot \mathcal N( \bm x; \bm\mu_m, \Sigma_m) \\ &= \pi_m \cdot \frac{1}{ \sqrt{(2\pi)^d \det({\Sigma_m})} } \exp \left(-\frac{1}{2}({\bm x}-{\bm\mu_m})^{\top}{{\Sigma_m}}^{-1}({\bm x}-{\bm\mu_m}) \right)\\ \gamma_{\bm x,m} &= \frac{\bar \gamma_{\bm x, m}}{ \sum_{m' \in M} \bar \gamma_{\bm x, m'} } \end{aligned} $$

Intimidating, but …

  • Only $n\times k$ Gaussian densities to evaluate
  • Plus a normalization (to obtain probabilities)
  • Try it out (a sketch follows below)
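
One possible E-Step implementation using scipy.stats for the Gaussian densities (scipy is an assumption here; the tutorial code may well compute the densities directly):

                            import numpy as np
                            from scipy.stats import multivariate_normal

                            def expect(gmm, data):
                                """E-Step sketch: responsibilities of shape (k, n) for data of shape (n, d)."""
                                gamma = np.stack([
                                    gmm.weights[m] * multivariate_normal.pdf(data, gmm.means[m], gmm.covs[m])
                                    for m in range(gmm.means.shape[0])
                                ])
                                return gamma / gamma.sum(axis=0)  # normalize over the k components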

The M-Step — Naïve

$$ \begin{aligned} \bm\mu_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot \bm x }{\sum_{\bm x \in X} \gamma_{\bm x,m}} \\ \Sigma_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot (\bm x - \bm\mu_m)^{\top}\cdot (\bm x - \bm \mu_m) }{\sum_{\bm x \in X} \gamma_{\bm x,m}} \\ \pi_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m}}{|X|} \end{aligned} $$
  • We will focus on the M-Step
  • Loops in Python are slow, so we want to avoid as many of them as possible
  • First, a naïve implementation with only one loop (sketched below)
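
A sketch of such a single-loop M-Step, following the formulas above (not necessarily identical to the tutorial code):

                            import numpy as np

                            def maximize(gmm, responsibilities, data):
                                """Naive M-Step sketch: one explicit loop over the k components."""
                                n = data.shape[0]
                                for m in range(gmm.means.shape[0]):
                                    gamma = responsibilities[m]         # shape (n,)
                                    norm = gamma.sum()
                                    gmm.means[m] = gamma @ data / norm  # weighted mean
                                    centered = data - gmm.means[m]
                                    gmm.covs[m] = (gamma[:, None] * centered).T @ centered / norm  # weighted covariance
                                    gmm.weights[m] = norm / n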

The M-Step — The physicist's way

$$ \Sigma_m \leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot (\bm x - \bm\mu_m)^{\top}\cdot (\bm x - \bm \mu_m) } {\sum_{\bm x \in X} \gamma_{\bm x,m}}, \qquad \forall m \in M $$
$$ \Sigma_m \leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot \bar{\bm x}_m^{\top}\cdot \bar{\bm x}_m } {\sum_{\bm x \in X} \gamma_{\bm x,m}}, \qquad \bar{\bm x}_m = \bm x - \bm \mu_m, \quad \forall m \in M $$
$$ \bar X_{k,n,d} \cdot \Gamma_{k,n} \cdot \bar X_{k,n,d} = \bm \Sigma_{k,d,d} $$
                                    
                                        knd, kn, knd -> kdd

                                        # The data must first be centered per component: shape (k, n, d)
                                        centered = data[np.newaxis, :, :] - model.means[:, np.newaxis, :]
                                        # numpy rejects a repeated output label, so the second d becomes e
                                        covs = np.einsum('knd, kn, kne -> kde', centered, responsibilities, centered)
                                    
                                
  • Think in tensors to avoid loops
  • This is called Einstein notation
  • Included in NumPy!
  • Must be faster, right? Let's find out!

Benchmark


                                data, _ = gmm.make_blobs(n_samples=10000, centers=20, n_features=2, random_state=7)

                                model = gmm.initialize(data, 20)
                                print(",",model.means)
                                r = gmm.expect(model, data)
                            
Pure Python with einsum

                                    13 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
Pure Python with loops

                                    7.37 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
  • The Python implementation is performant!
  • Surprisingly, the loop version is faster (likely because the einsum expression allocates large intermediates instead of mapping onto a single optimized BLAS call)
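
The numbers above are IPython %timeit output; a comparable measurement with the standard library (assuming the variant under test is exposed as gmm.maximize, continuing the snippet above):

                            from timeit import timeit

                            # Time 100 M-Step calls and report the mean duration per call
                            seconds = timeit(lambda: gmm.maximize(model, r, data), number=100)
                            print(f"{1000 * seconds / 100:.2f} ms per loop")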

Translating the M-Step to Rust


                            $ # Installation
                            $ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
                            $ rustup update # Optional: update the toolchain
                            $ cd tutorial && git checkout rust-examples
                            tutorial$ # git checkout rust-implementation # spoiler alert!
                            tutorial$ (cd data; poetry run data) # we need data for the experiments
                            tutorial$ cargo run --example read_data
                            tutorial$ # cargo bench # run benchmarks later
                        
  • Install instructions
  • There is a bug in the read_data example, sorry!
  • Tested on macOS and Linux only (WSL should work)

Loading numpy data

                            
                                use ndarray::Array2;
                                use ndarray_npy::read_npy;

                                fn main() {
                                    // The type annotations tell read_npy what to deserialize
                                    let data: Array2<f64> = read_npy("data/data.npy").unwrap();
                                    println!("{}", data);

                                    let responsibilities: Array2<f64> = read_npy("data/responsibilities.npy").unwrap();
                                    println!("{}", responsibilities);

                                    let means: Array2<f64> = read_npy("data/means.npy").unwrap();
                                    println!("{}", means);
                                }
                            
                        
  • No unit tests (due to Python binding). Using examples instead
  • Wrong folder in the template

Array passing

                                
                                    use ndarray::prelude::*;

                                    // Pass by value: the function takes ownership of the array
                                    pub fn foo(data: Array2<f64>) -> Array2<f64> { Array2::<f64>::zeros((0, 0)) }


                                    use ndarray::prelude::*;

                                    // Pass by immutable reference: the caller keeps ownership
                                    pub fn foo(data: &Array2<f64>) -> Array2<f64> { Array2::<f64>::zeros((0, 0)) }


                                    use ndarray::prelude::*;

                                    // Mutable reference plus a read-only view
                                    pub fn foo(data: &mut Array2<f64>, other: ArrayView2<f64>) {
                                        data.assign(&other);
                                    }


                                    use ndarray::prelude::*;
                                    use ndarray::Data;

                                    // Reconstructed sketch (this slide fragment was empty in the export):
                                    // generic over the storage type, so owned arrays and views are accepted
                                    pub fn foo<S: Data<Elem = f64>>(data: &ArrayBase<S, Ix2>) -> Array2<f64> {
                                        data.to_owned()
                                    }
                                
                            
  • Ownership of an array can be passed to a function
  • Arrays can be passed as a mutable or immutable reference
  • A view of an array can be created that holds a reference
  • A function can be written generically to accept all of the above
  • In Python, you don't even need the type

Array handling

Creating sums
                                    
                                        let sum_responsibilities = responsibilities.sum_axis(Axis(0));
                                    
                                
                                    
                                        sum_responsibilities = responsibilities.sum(axis=1)
                                    
                                
Broadcasting
                                    
                                        let x = (&responsibilities.slice(s![.., .., NewAxis]) * &data.slice(s![.., NewAxis, ..]))
                                    
                                
                                    
                                        x = np.sum(data[np.newaxis, :, :] * responsibilities[:, :, np.newaxis], axis=1)
                                    
                                
Dot product
                                    
                                        let cov = &x.t().dot(&y)
                                    
                                
                                    
                                        covs = x.T @ y
                                    
                                

The M-Step

  • The C-way
  • Iterators
  • Parallelization
  • Benchmarking with criterion.rs

Creating the extension


                            tutorial$ git checkout extension-skeleton
                            tutorial$ # git checkout extension-final # spoiler alert!
                            tutorial$ maturin develop -r --strip # Builds the extension and adds it to the venv
                            tutorial$ maturin build -r --strip # Creates a binary wheel
                        
  • PyO3 is a crate for the binding boilerplate
  • Maturin is a build tool; it shares pyproject.toml with poetry
  • We lose unit testing unless we create another crate
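
After maturin develop, the compiled module can be imported from the active environment. A hypothetical smoke test (maximize_rust is an assumed name; the actual symbol depends on the PyO3 module definition):

                            import gmm  # the tutorial's package, now including the compiled extension

                            data, _ = gmm.make_blobs(n_samples=1000, centers=5, n_features=2, random_state=7)
                            model = gmm.initialize(data, 5)
                            r = gmm.expect(model, data)
                            gmm.maximize_rust(model, r, data)  # hypothetical name of the Rust-backed M-Step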

Benchmark


                                data, _ = gmm.make_blobs(n_samples=10000, centers=20, n_features=2, random_state=7)

                                model = gmm.initialize(data, 20)
                                print(",",model.means)
                                r = gmm.expect(model, data)
                            
Pure Python with einsum

                                    13 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
Pure Python with loops

                                    7.37 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
Parallel Rust

                                    3.49 ms ± 23.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
  • MacBook, 2.6 GHz 6-Core Intel Core i7
  • Rust wins due to parallelization
  • A more thorough analysis would be desirable

Takeaways

  • It is not difficult to write clean Python packages!
  • It is not difficult to create mixed Rust/Python packages!
  • Loops can be faster than einsum!
  • Rust is faster because of parallelization!
  • Simple things are not difficult in Rust!
  • The rest is opinionated (and not shown)!

Happy coding, Rustlings!