Who’s at risk on the road? Demonstrating young and novice driver risk in low-to middle-income contexts
Journal article   Open access   Peer reviewed

Luis A. Guzman, Ignacio Sarmiento-Barbieri, Olga L. Sarmiento, Darío Hidalgo and Alex Quistberg
Journal of Safety Research, vol. 97, pp. 406-417
Jun 2026
Featured in Collection: Drexel's Newest Publications
url
https://doi.org/10.1016/j.jsr.2026.03.015
Published, Version of Record (VoR) Open

Abstract

Keywords: Crash risks; Logistic regression; Low- and middle-income countries; Novice young drivers; Random forests; Road safety; Colombia; Machine learning
• We use machine learning techniques to select variables and interactions to estimate road crash risk in Colombia.
• Analysis covers Colombian license and fatal-injury-crash data from 2007 to 2020.
• Younger and less-experienced drivers face the highest motor crash risk.
• Crash risk was higher for males, motorcyclists, and drivers in some Colombian regions.
• Graduated licensing is recommended to lower the crash risk among novice young drivers in Colombia.

Young novice drivers experience higher crash rates, yet most studies focus on high-income countries, leaving limited evidence from low- and middle-income countries (LMICs). This study examined crash involvement and its risk factors in Colombia to inform improvements to the national driver-licensing process. We analyzed the national driver-license registry from 2007 to 2020 (n = 5,822,842) and all police-reported road crashes during the same period (n = 541,134). Crash probability was modeled with a non-parametric machine-learning approach (random forests) and, for interpretation, a logistic regression that incorporated age, driving experience, license category, sex, number of fines, and region. Main effects explained 51% of the predictive variability, while interactions accounted for the remaining 49%. The random-forest model achieved an F1 score of 96.38%, with a precision of 93.9% (indicating a low false-positive rate) and a recall of 98.9%. Driving experience exhibited the strongest interaction effects: its interactions with other variables explained ∼2.0% of the variability, the pairwise interaction between experience and license category accounted for ∼0.4%, and the interaction between experience and region explained ∼0.25%. Logistic-regression results corroborated the machine-learning findings, revealing negative associations of both age and experience with crash probability across multiple model specifications. As expected, younger and less-experienced drivers faced the highest crash risk. Risk also varied by sex (higher in males), by license category (higher among motorcyclists), and across regions. These findings suggest that implementing a graduated licensing system could reduce crash risk among novice and young drivers in Colombia and, by extension, in other LMICs.
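
The reported F1 score can be checked against the reported precision and recall. The abstract does not say how F1 was computed, so the sketch below assumes the standard definition (the harmonic mean of precision and recall). The function name `f1_score` and the variable names are illustrative, not taken from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1].

    Assumption: the standard F1 definition; the article does not state its formula.
    """
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
precision = 0.939
recall = 0.989

f1 = f1_score(precision, recall)
print(f"F1 from rounded inputs: {f1:.4f}")
```

With these rounded inputs the result is about 0.963, close to the reported 96.38%. The small gap is what one would expect if the published precision and recall were rounded to one decimal place, although the abstract does not confirm this.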
