The Rust and the Python

Efficient numerical Python extensions
… written in Rust

Agenda

  • Goals
    1. Quick intro to Rust
    2. Machine Learning (ML) theory
    3. A modern, clean and efficient Python package
    4. Optimization with a Rust extension
  • All sources (including these slides) are on GitHub
  • Open source license (MIT). Feel free to use as a template!

Rust: A quick introduction

Rust is in its seventh year as the most loved language, with 87% of developers saying they want to continue using it. Rust also ties with Python as the most wanted technology, with TypeScript running a close second. (Stack Overflow Developer Survey 2022)
  • Developed by Graydon Hoare at Mozilla since 2010; first stable version 2015
  • Learning Rust impacts how you program in Python (highly recommended blog post)
  • But … it takes time to learn; a C++ background helps
  • A very active, committed community
  • Common headline on Hacker News: "… but written in Rust"

Structs and traits


                            pub struct NewsArticle {
                                pub headline: String,
                                pub location: String,
                                pub author: String,
                                pub content: String,
                            }
                            pub trait Summary {
                                fn summarize(&self) -> String;
                            }
                            impl Summary for NewsArticle {
                                fn summarize(&self) -> String {
                                    format!("{}, by {} ({})", self.headline, self.author, self.location)
                                }
                            }

                        
  • No inheritance. Use aggregation instead
  • Traits (interfaces) similar to Protocols in Python
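
For comparison, a rough Python counterpart of the Summary trait using typing.Protocol (an illustrative sketch, not taken from the tutorial code):

                            from typing import Protocol

                            class Summary(Protocol):
                                def summarize(self) -> str: ...

                            class NewsArticle:
                                def __init__(self, headline: str, location: str, author: str, content: str) -> None:
                                    self.headline = headline
                                    self.location = location
                                    self.author = author
                                    self.content = content

                                # Satisfies the Summary protocol structurally; no inheritance needed
                                def summarize(self) -> str:
                                    return f"{self.headline}, by {self.author} ({self.location})"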

Borrowing & Lifetimes


                            let mut s = String::from("hello");

                            {
                                let r1 = &mut s;
                            } // r1 goes out of scope here, so we can make a new reference with no problems.

                            let r2 = &mut s;

                        
  • Rust is memory safe (like Python, unlike C/C++)
  • No garbage collection as in Python / Java
  • No reference counting as with C++ smart pointers
  • The borrow checker enforces rules that prevent memory errors

Enums


                            enum Message {
                                Quit,
                                Move { x: i32, y: i32 },
                                Write(String),
                                ChangeColor(i32, i32, i32),
                            }
                        
  • Algebraic data types
  • The foundation for elegant error handling and optional types

… and much more

  • Powerful pattern matching
  • Functional programming
  • Excellent tooling

Which ML algorithm to choose?

  • Easy to understand
  • Educational yet useful / versatile
  • Easy / fast to implement (~50 LoC)
  • Easy / fast to train and evaluate
  • Bonus: probabilistic

Gaussian Mixture Models

$$ \begin{aligned} p(\bm x | M ) &= \sum_{m\in M} \pi_m \cdot p(\bm x | m) \\ &= \sum_{m\in M} \pi_m \cdot \mathcal N(\bm x; \bm \mu_m, \Sigma_m) \\ &= \sum_{m\in M} \pi_m \cdot \frac{1}{ \sqrt{(2\pi)^d \det({\Sigma_m})} } \exp \left(-\frac{1}{2}({\bm x}-{\bm\mu_m})^{\top}{{\Sigma_m}}^{-1}({\bm x}-{\bm\mu_m}) \right) \end{aligned} $$
$$ X = \left\{ \begin{pmatrix} \bm x_i \\ z_i \end{pmatrix}\right\}_i, \quad \bm x \in \mathbb R^d, z \in M $$
  • 1 Multinomial/Categorical distribution
  • $k$ Normal distributions
  • Each draw $i$ first determines the Normal distribution $z_i$ and then the sample $\bm x_i$ (see the sketch after this list)
  • Example: Weight / height distribution in an animal population
  • see my blog post (no formulas)
  • Data distributed by combination of $k$ Normal distributions
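
A minimal numpy sketch of this generative process (the parameter values below are made up for illustration):

                            import numpy as np

                            rng = np.random.default_rng(0)

                            # Hypothetical parameters: k=2 components in d=2 dimensions
                            weights = np.array([0.3, 0.7])                 # pi_m
                            means = np.array([[0.0, 0.0], [5.0, 5.0]])     # mu_m
                            covs = np.array([np.eye(2), 2.0 * np.eye(2)])  # Sigma_m

                            # Each draw first picks a component z_i, then samples x_i from that Gaussian
                            z = rng.choice(len(weights), size=1000, p=weights)
                            x = np.array([rng.multivariate_normal(means[m], covs[m]) for m in z])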

Expectation Maximization (EM)

Gaussian mixture models can be trained with EM

$$ X = \left\{ \bm x_i\right\}_i, \quad \bm x \in \mathbb R^d $$
  • Algorithm for unsupervised learning / Clustering
  • Missing data: We don't know which Gaussian $z_i$ produced $\bm x_i$
  • The unobservable variable $z_i$ is called the latent state
  • Alternating pattern: E-Step and M-Step (see the loop sketch after this list)
  • Guaranteed to converge, but prone to local optima
  • Many models can be trained with EM:
    • $k$-Means
    • Self-organizing neural networks
    • Other mixture models
    • Hidden Markov Models
    • Kalman filters
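
The alternating pattern as a loop over the functions implemented later in this tutorial (the fixed iteration count is arbitrary; a real stopping criterion would monitor the likelihood):

                            # EM sketch: initialize, then alternate E-Step and M-Step
                            model = initialize(data, 20)
                            for _ in range(100):
                                responsibilities = expect(model, data)   # E-Step: guess the latent states
                                maximize(model, responsibilities, data)  # M-Step: refit the parameters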

E(xpectation)-step

$$ \begin{aligned} \forall m \in M, \bm x \in X&: \\ \bar \gamma_{\bm x,m} &= \pi_m \cdot p(\bm x | m) \\ &= \pi_m \cdot \mathcal N( \bm x; \bm\mu_m, \Sigma_m) \\ &= \pi_m \cdot \frac{1}{ \sqrt{(2\pi)^d \det({\Sigma_m})} } \exp \left(-\frac{1}{2}({\bm x}-{\bm\mu_m})^{\top}{{\Sigma_m}}^{-1}({\bm x}-{\bm\mu_m}) \right)\\ \gamma_{\bm x,m} &= \frac{\bar \gamma_{\bm x, m}}{ \sum_{m' \in M} \bar \gamma_{\bm x, m'} } \end{aligned} $$
  • "Guessing the hidden states"
  • For each sample and each component, the probability
    that the sample has been generated by the component

M(aximization)-step

$$ \begin{aligned} \bm\mu_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot \bm x }{\sum_{\bm x \in X} \gamma_{\bm x,m}} \\ \Sigma_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot (\bm x - \bm\mu_m)^{\top}\cdot (\bm x - \bm \mu_m) }{\sum_{\bm x \in X} \gamma_{\bm x,m}}\\ \pi_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m}}{|X|} \end{aligned} $$
  • Maximize the likelihood of the distribution
  • Compute the new parameters of each Gaussian

EM in action

Applications

  • High-dimensional clustering (Warning: $\mathcal O(n^3)$!)
  • Regression
  • Image recognition

GMM/EM in Python


                            $ # Python packaging and dependency management
                            $ curl -sSL https://install.python-poetry.org | python3 -
                            $ # Get the code
                            $ git clone git@github.com:StefanUlbrich/numeric-rust-python-tutorial.git tutorial
                            $ cd tutorial && git checkout python-skeleton
                            tutorial$ # Create virtual environment and install dependencies
                            tutorial$ poetry env use python3.11 # Optional
                            tutorial$ poetry install
                        
  • System setup (Python installation, IDE setup, etc.) is out of scope
  • All steps of the tutorial are available as git tags (e.g., python-skeleton and python-version)
  • Tested on macOS and Linux only (WSL should work)
  • I use VSCode and JupyterLab for development and prototyping

Design decisions

  • ADT / Data class represents the GMM
    • Numpy arrays for parameters
    • Additional dimensions instead of lists
  • No OOP (easier to replace functions)
  • EM implemented as three functions:
    • E-Step
    • M-Step variants: 2 Python + 1 Rust
    • Simple initialization (random $\gamma_{\bm x,m}$ + M-Step)
  • Benchmark (timeit) and data generation
                                    
                                        from dataclasses import dataclass

                                        import numpy as np
                                        from numpy.typing import NDArray

                                        @dataclass
                                        class GMM:
                                            means: NDArray[np.float64]   # k x d
                                            covs: NDArray[np.float64]    # k x d x d
                                            weights: NDArray[np.float64] # k

                                        def expect(
                                            gmm: GMM,
                                            data: NDArray[np.float64]
                                        ) -> NDArray[np.float64]:
                                            ...

                                        def maximize(
                                            gmm: GMM,
                                            responsibilities: NDArray[np.float64],
                                            data: NDArray[np.float64]
                                        ) -> None:
                                            ...
                                

The E-Step

$$ \begin{aligned} \forall m \in M, \bm x \in X&: \\ \bar \gamma_{\bm x,m} &= \pi_m \cdot p(\bm x | m) \\ &= \pi_m \cdot \mathcal N( \bm x; \bm\mu_m, \Sigma_m) \\ &= \pi_m \cdot \frac{1}{ \sqrt{(2\pi)^d \det({\Sigma_m})} } \exp \left(-\frac{1}{2}({\bm x}-{\bm\mu_m})^{\top}{{\Sigma_m}}^{-1}({\bm x}-{\bm\mu_m}) \right)\\ \gamma_{\bm x,m} &= \frac{\bar \gamma_{\bm x, m}}{ \sum_{m' \in M} \bar \gamma_{\bm x, m'} } \end{aligned} $$

Intimidating, but …

  • Only $n\times k$ Gaussian densities to evaluate
  • Plus a normalization (to obtain probabilities)
  • Try it out (a sketch follows below)
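
One possible E-Step implementation using scipy.stats for the Gaussian densities (scipy is an assumption here; the tutorial code may well compute the densities directly):

                            import numpy as np
                            from scipy.stats import multivariate_normal

                            def expect(gmm, data):
                                """E-Step sketch: responsibilities of shape (k, n) for data of shape (n, d)."""
                                gamma = np.stack([
                                    gmm.weights[m] * multivariate_normal.pdf(data, gmm.means[m], gmm.covs[m])
                                    for m in range(gmm.means.shape[0])
                                ])
                                return gamma / gamma.sum(axis=0)  # normalize over the k components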

The M-Step — Naïve

$$ \begin{aligned} \bm\mu_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot \bm x }{\sum_{\bm x \in X} \gamma_{\bm x,m}} \\ \Sigma_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot (\bm x - \bm\mu_m)^{\top}\cdot (\bm x - \bm \mu_m) }{\sum_{\bm x \in X} \gamma_{\bm x,m}} \\ \pi_m &\leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m}}{|X|} \end{aligned} $$
  • We will focus on the M-Step
  • Loops in Python are slow, so we want to avoid as many of them as possible
  • First, a naïve implementation with only one loop (sketched below)
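
A sketch of such a single-loop M-Step, following the formulas above (not necessarily identical to the tutorial code):

                            import numpy as np

                            def maximize(gmm, responsibilities, data):
                                """Naive M-Step sketch: one explicit loop over the k components."""
                                n = data.shape[0]
                                for m in range(gmm.means.shape[0]):
                                    gamma = responsibilities[m]         # shape (n,)
                                    norm = gamma.sum()
                                    gmm.means[m] = gamma @ data / norm  # weighted mean
                                    centered = data - gmm.means[m]
                                    gmm.covs[m] = (gamma[:, None] * centered).T @ centered / norm  # weighted covariance
                                    gmm.weights[m] = norm / n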

The M-Step — The physicist's way

$$ \Sigma_m \leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot (\bm x - \bm\mu_m)^{\top}\cdot (\bm x - \bm \mu_m) } {\sum_{\bm x \in X} \gamma_{\bm x,m}}, \qquad \forall m \in M $$
$$ \Sigma_m \leftarrow \frac{\sum_{\bm x \in X} \gamma_{\bm x,m} \cdot \bar{\bm x}_m^{\top}\cdot \bar{\bm x}_m } {\sum_{\bm x \in X} \gamma_{\bm x,m}}, \qquad \bar{\bm x}_m = \bm x - \bm \mu_m, \quad \forall m \in M $$
$$ \bar X_{k,n,d} \cdot \Gamma_{k,n} \cdot \bar X_{k,n,d} = \bm \Sigma_{k,d,d} $$
                                    
                                        knd, kn, knd -> kdd

                                        # The data must first be centered per component: shape (k, n, d)
                                        centered = data[np.newaxis, :, :] - model.means[:, np.newaxis, :]
                                        # numpy rejects a repeated output label, so the second d becomes e
                                        covs = np.einsum('knd, kn, kne -> kde', centered, responsibilities, centered)
                                    
                                
  • Think in tensors to avoid loops
  • This is called Einstein notation
  • Included in NumPy!
  • Must be faster, right? Let's find out!

Benchmark


                                data, _ = gmm.make_blobs(n_samples=10000, centers=20, n_features=2, random_state=7)

                                model = gmm.initialize(data, 20)
                                print(",",model.means)
                                r = gmm.expect(model, data)
                            
Pure Python with einsum

                                    13 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
Pure Python with loops

                                    7.37 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
  • The Python implementation is performant!
  • Surprisingly, the loop version is faster (likely because the einsum expression allocates large intermediates instead of mapping onto a single optimized BLAS call)
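
The numbers above are IPython %timeit output; a comparable measurement with the standard library (assuming the variant under test is exposed as gmm.maximize, continuing the snippet above):

                            from timeit import timeit

                            # Time 100 M-Step calls and report the mean duration per call
                            seconds = timeit(lambda: gmm.maximize(model, r, data), number=100)
                            print(f"{1000 * seconds / 100:.2f} ms per loop")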

Translating the M-Step to Rust


                            $ # Installation
                            $ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
                            $ rustup update # Optional: update the toolchain
                            $ cd tutorial && git checkout rust-examples
                            tutorial$ # git checkout rust-implementation # spoiler alert!
                            tutorial$ (cd data; poetry run data) # we need data for the experiments
                            tutorial$ cargo run --example read_data
                            tutorial$ # cargo bench # run benchmarks later
                        
  • Install instructions
  • There is a bug in the read_data example, sorry!
  • Tested on macOS and Linux only (WSL should work)

Loading numpy data

                            
                                use ndarray::Array2;
                                use ndarray_npy::read_npy;

                                fn main() {
                                    // The type annotations tell read_npy what to deserialize
                                    let data: Array2<f64> = read_npy("data/data.npy").unwrap();
                                    println!("{}", data);

                                    let responsibilities: Array2<f64> = read_npy("data/responsibilities.npy").unwrap();
                                    println!("{}", responsibilities);

                                    let means: Array2<f64> = read_npy("data/means.npy").unwrap();
                                    println!("{}", means);
                                }
                            
                        
  • No unit tests (due to Python binding). Using examples instead
  • Wrong folder in the template

Array passing

                                
                                    use ndarray::prelude::*;

                                    // Pass by value: the function takes ownership of the array
                                    pub fn foo(data: Array2<f64>) -> Array2<f64> { Array2::<f64>::zeros((0, 0)) }


                                    use ndarray::prelude::*;

                                    // Pass by immutable reference: the caller keeps ownership
                                    pub fn foo(data: &Array2<f64>) -> Array2<f64> { Array2::<f64>::zeros((0, 0)) }


                                    use ndarray::prelude::*;

                                    // Mutable reference plus a read-only view
                                    pub fn foo(data: &mut Array2<f64>, other: ArrayView2<f64>) {
                                        data.assign(&other);
                                    }


                                    use ndarray::prelude::*;
                                    use ndarray::Data;

                                    // Reconstructed sketch (this slide fragment was empty in the export):
                                    // generic over the storage type, so owned arrays and views are accepted
                                    pub fn foo<S: Data<Elem = f64>>(data: &ArrayBase<S, Ix2>) -> Array2<f64> {
                                        data.to_owned()
                                    }
                                
                            
  • Ownership of an array can be passed to a function
  • Arrays can be passed as a mutable or immutable reference
  • A view of an array can be created that holds a reference
  • A function can be written generically to accept all of the above
  • In Python, you don't even need the type

Array handling

Creating sums
                                    
                                        let sum_responsibilities = responsibilities.sum_axis(Axis(0));
                                    
                                
                                    
                                        sum_responsibilities = responsibilities.sum(axis=1)
                                    
                                
Broadcasting
                                    
                                        let x = (&responsibilities.slice(s![.., .., NewAxis]) * &data.slice(s![.., NewAxis, ..]))
                                    
                                
                                    
                                        x = np.sum(data[np.newaxis, :, :] * responsibilities[:, :, np.newaxis], axis=1)
                                    
                                
Dot product
                                    
                                        let cov = &x.t().dot(&y)
                                    
                                
                                    
                                        covs = x.T @ y
                                    
                                

The M-Step

  • The C-way
  • Iterators
  • Parallelization
  • Benchmarking with criterion.rs

Creating the extension


                            tutorial$ git checkout extension-skeleton
                            tutorial$ # git checkout extension-final # spoiler alert!
                            tutorial$ maturin develop -r --strip # Builds the extension and adds it to the venv
                            tutorial$ maturin build -r --strip # Creates a binary wheel
                        
  • PyO3 is a crate for the binding boilerplate
  • Maturin is a build tool; it shares pyproject.toml with poetry
  • We lose unit testing unless we create another crate
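
After maturin develop, the compiled module can be imported from the active environment. A hypothetical smoke test (maximize_rust is an assumed name; the actual symbol depends on the PyO3 module definition):

                            import gmm  # the tutorial's package, now including the compiled extension

                            data, _ = gmm.make_blobs(n_samples=1000, centers=5, n_features=2, random_state=7)
                            model = gmm.initialize(data, 5)
                            r = gmm.expect(model, data)
                            gmm.maximize_rust(model, r, data)  # hypothetical name of the Rust-backed M-Step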

Benchmark


                                data, _ = gmm.make_blobs(n_samples=10000, centers=20, n_features=2, random_state=7)

                                model = gmm.initialize(data, 20)
                                print(",",model.means)
                                r = gmm.expect(model, data)
                            
Pure Python with einsum

                                    13 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
Pure Python with loops

                                    7.37 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
Parallel Rust

                                    3.49 ms ± 23.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                                
  • MacBook, 2.6 GHz 6-Core Intel Core i7
  • Rust wins due to parallelization
  • A more thorough analysis would be desirable

Takeaways

  • It is not difficult to write clean Python packages!
  • It is not difficult to create mixed Rust/Python packages!
  • Loops can be faster than einsum!
  • Rust is faster because of parallelization!
  • Simple things are not difficult in Rust!
  • The rest is opinionated (and not shown)!

Happy coding, Rustlings!