Author(s): Fourie, Alexander (2024)
Abstract:
Random forests are one of the most widely used machine learning methods and offer high interpretability and explainability. They address the main shortcomings of a single decision tree, namely overfitting and poor generalization, and they handle outliers and noise better. They are also powerful tools for solving a variety of nonlinear classification and regression problems, with applications in healthcare, finance, marketing, and data mining. To enable C++ applications to store and exchange tree-based models, researchers developed Treelite and its sub-module TL2cgen. These libraries minimize code duplication and allow models to be stored in a platform-independent format. This project investigates and analyses the runtime of generating predictions for random forests using TL2cgen and Treelite, with an emphasis on the former. The results show that Treelite's thread pool implementation is not the governing factor behind its faster runtime compared with TL2cgen.
Document(s):
Fourie_BA_EEMCS.pdf