Hummingbird: A library for accelerating inference with traditional machine learning models
Karla Saur/Matteo Interlandi

May 28, 2020
Traditional machine learning (ML), such as linear regressions and decision trees, is extremely popular. As shown in the chart below of the Kaggle Survey from 2019, the most popular ML algorithms are still traditional.

Recently, the ever-increasing interest around deep learning and neural networks has led to a vast increase in processing frameworks that are highly specialized and optimized for running these types of computations. Frameworks like TensorFlow, PyTorch, and ONNX Runtime are built around the idea of a computational graph that models the dataflow between individual units, and they use tensors as their basic computational unit. These frameworks can run efficiently on hardware accelerators (e.g., GPUs), and their prediction performance can be further optimized with compiler frameworks such as TVM.

Unfortunately, traditional ML libraries and toolkits (such as Scikit-Learn, ML.NET, and H2O) are usually developed to run on CPU environments. While they may exploit multi-core parallelism to improve performance, they do not use a common abstraction (such as tensors) to represent their computation. The lack of this common abstraction means that for these frameworks to make use of hardware acceleration, one would need a separate implementation for each operator on each hardware backend (operators x backends), which does not scale well. As a result, traditional ML often misses out on the accelerations that deep learning and neural networks enjoy.
Hummingbird is a library for compiling trained traditional ML models into tensor computations. Hummingbird allows users to seamlessly leverage neural network frameworks (such as PyTorch) to accelerate traditional ML models. Thanks to Hummingbird, users can benefit from: (1) all the current and future optimizations implemented in neural network frameworks; (2) native hardware acceleration; (3) a single platform that supports both traditional and neural network models; and (4) all of this without having to re-engineer their models.
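To make "compiling a model into tensor computations" more concrete, here is a minimal sketch of one way a small decision tree can be rewritten as a few matrix operations. The toy tree, its thresholds and leaf values, and the matrix names A through E are all made up for illustration; this is not a description of Hummingbird's internals, only an example of the general idea, written with NumPy.

```python
import numpy as np

# Hypothetical toy regression tree (2 features, 2 internal nodes, 3 leaves):
#   node 0: x[0] < 0.5 ? go to node 1 : leaf 2 (value 30)
#   node 1: x[1] < 2.0 ? leaf 0 (value 10) : leaf 1 (value 20)

# A[f, i] = 1 if internal node i tests feature f.
A = np.array([[1, 0],
              [0, 1]], dtype=np.float32)
# B[i] = threshold of internal node i.
B = np.array([0.5, 2.0], dtype=np.float32)
# C[i, l] = 1 if leaf l is in the left subtree of node i,
#          -1 if it is in the right subtree, 0 if node i is not an ancestor.
C = np.array([[1,  1, -1],
              [1, -1,  0]], dtype=np.float32)
# D[l] = number of ancestors of leaf l for which l lies in the left subtree.
D = np.array([2, 1, 0], dtype=np.float32)
# E[l] = value stored at leaf l.
E = np.array([10.0, 20.0, 30.0], dtype=np.float32)


def predict_tensor(X):
    """Evaluate the tree for a whole batch using only tensor operations."""
    went_left = ((X @ A) < B).astype(np.float32)   # (batch, internal nodes)
    path_score = went_left @ C                     # (batch, leaves)
    leaf_onehot = (path_score == D).astype(np.float32)
    return leaf_onehot @ E                         # (batch,)


def predict_traversal(x):
    """Reference implementation: ordinary if/else tree traversal."""
    if x[0] < 0.5:
        return 10.0 if x[1] < 2.0 else 20.0
    return 30.0


X = np.array([[0.2, 1.0],
              [0.2, 3.0],
              [0.9, 1.0],
              [0.9, 3.0]], dtype=np.float32)

tensor_out = predict_tensor(X)
loop_out = np.array([predict_traversal(x) for x in X], dtype=np.float32)
print(tensor_out)  # matches loop_out for every row
```

Because the batched version consists only of matrix multiplications and element-wise comparisons, the same computation could in principle be handed to any tensor framework and run on whatever hardware that framework supports.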

Currently, you can use Hummingbird to convert your trained traditional ML models into PyTorch. Hummingbird supports a variety of tree-based classifiers and regressors. These models include scikit-learn Decision Trees and Random Forests, as well as LightGBM and XGBoost Classifiers/Regressors. Support for other neural network backends (e.g., ONNX, TVM) and models is on the roadmap.

Source: GitHub