Heatchmap: A Gaussian process approach to predict hitchhiking waiting times
TL;DR
We applied a probabilistic machine learning approach to a dataset of hitchhiking waiting times from around the world. In contrast to previous endeavors, we can make waiting time predictions for any spot on the globe, resulting in a complete world map that is also more accurate than previous partial maps, according to both our chosen performance metric and experienced hitchhikers. In addition, we model the uncertainty of each waiting time prediction, so that we do not pretend our predictions are accurate where there is no evidence, and so that we can call on hitchhikers to share their experiences, especially in the regions where we still lack data.
To achieve this we applied a Gaussian process to data points in a 2D feature space (longitude, latitude) with quite noisy target values (waiting times). Specifically, we used a kernel that combines three radial basis function (RBF) kernels, each modeling a different scale on which hitchhiking waiting times depend on each other. We also built the model so that its predictions are inherently constrained to positive waiting times. Finally, we demonstrated how domain expertise can be used to tune the model's hyperparameters.
For one-dimensional inputs, the GP kernel we applied looks as follows:
$$k_{3\_RBFs}(x, x') = \sum_{i \in \{a, b, c\}}{\sigma_i}^2 \exp\left(- \frac{(x - x')^2}{2 l_i^2}\right) + \sigma_n^2 \, \mathbb{I}(x = x')$$
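A minimal sketch of this sum-of-three-RBFs kernel using scikit-learn's kernel algebra (the length scales and variances below are illustrative placeholders, not the fitted hyperparameters from the project):

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Three RBF components, each capturing dependence on a different spatial
# scale, plus an i.i.d. noise term sigma_n^2 * I(x = x').
# All length scales / variances here are made-up example values.
kernel = (
    ConstantKernel(1.0) * RBF(length_scale=0.5)    # local scale
    + ConstantKernel(1.0) * RBF(length_scale=5.0)  # regional scale
    + ConstantKernel(1.0) * RBF(length_scale=50.0) # continental scale
    + WhiteKernel(noise_level=1.0)                 # observation noise
)

# Evaluating the kernel on two 1D inputs yields a 2x2 covariance matrix;
# the noise term contributes only to the diagonal.
X = np.array([[0.0], [1.0]])
K = kernel(X)
```

Because scikit-learn kernels support `+` and `*`, the formula above translates almost term by term; the same object can then be passed to `GaussianProcessRegressor(kernel=...)`.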
The data challenges we had to face, which might be transferable to other GP problems:
- extremely noisy target values
- target values are not normally distributed and constrained to be greater than 0
- 2D feature-space
- varying density of data in the feature-space that prohibits the use of a simple RBF kernel
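One common way to obtain the positivity constraint mentioned above is to fit the GP on log-transformed waiting times and exponentiate the predictions; this is a sketch under that assumption (the original model's exact mechanism is not spelled out here), on synthetic stand-in data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy stand-in data: (longitude, latitude) features and noisy,
# strictly positive waiting times in minutes (roughly log-normal).
X = rng.uniform(-10, 10, size=(50, 2))
y = np.exp(rng.normal(loc=3.0, scale=0.8, size=50))

# Fit the GP on log-waits: the latent function is unconstrained,
# but exponentiating the predictive mean guarantees positive outputs.
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=5.0) + WhiteKernel(noise_level=0.5),
    normalize_y=True,
)
gp.fit(X, np.log(y))

mu, std = gp.predict(X[:5], return_std=True)
waits = np.exp(mu)  # back-transformed predictions, always > 0
```

The price of this trick is that the predictive distribution becomes log-normal in the original units, so the back-transformed mean and uncertainty need to be interpreted accordingly.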
Our work resulted in a model that enabled us to draw the following map, where a region is grayed out more strongly the more uncertain the model is about how long a hitchhiker would wait there.
```python
show_map()
```
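The gray-out could be driven directly by the predictive standard deviation; a hypothetical helper (not the project's actual plotting code) mapping uncertainty to an overlay opacity might look like this:

```python
import numpy as np

def uncertainty_to_alpha(std, std_max):
    """Map predictive standard deviation to a gray-overlay opacity in [0, 1].

    Hypothetical helper: regions with higher uncertainty get a more
    opaque gray overlay, saturating at std_max.
    """
    return np.clip(np.asarray(std, dtype=float) / std_max, 0.0, 1.0)

# Example: three grid cells with increasing predictive uncertainty.
alphas = uncertainty_to_alpha([0.0, 0.5, 2.0], std_max=1.0)  # -> [0.0, 0.5, 1.0]
```

Such an array can be passed as the alpha channel of a gray layer drawn over the predicted-waiting-time map.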