Published, Version of Record (VoR), CC BY V4.0, Open
Abstract
Business & Economics; Science & Technology; Economics; Social Sciences; Technology; Transportation
In regions with scarce data, such as Norway, predicting cost performance in large-scale road (LSR) projects presents a unique challenge due to the high risk of cost overruns and their significant economic implications. This study aims to develop a data-driven framework for predicting cost performance in LSR projects by combining synthetic data generation and machine learning models. The approach employs synthetic data generation via a Conditional Tabular Generative Adversarial Network (CTGAN) to enhance the data pool and improve predictive accuracy. By integrating 173 synthetically generated samples with 52 actual project samples, a robust dataset of 225 road projects was created. Three machine learning classifiers (i.e., XGBoost, MLP, and SVM) were applied to this enriched dataset. The models achieved an average accuracy of 0.76 and an F1 score of 0.74 when tested against real-world data, demonstrating substantial alignment with actual project outcomes. Further validation with 5-fold cross-validation on the combined datasets confirmed the consistency of these results, with similar accuracy and F1 scores. This research highlights the effectiveness of synthetic data in overcoming the limitations of small datasets and underscores its potential to substantially improve decision-making in highway engineering by providing more accurate, data-driven insights for project planning, design, and management.
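The evaluation quantities named in the abstract (accuracy, F1 score, and 5-fold cross-validation over the 225 combined samples) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the binary labelling, the function names, and the shuffling seed are assumptions made only for illustration, and no CTGAN or classifier is run here.

```python
import random

# Sample counts stated in the abstract.
N_REAL, N_SYNTHETIC = 52, 173


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall.

    Assumption: a binary label (e.g. overrun vs. no overrun); the
    abstract does not specify how the classes are defined.
    """
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# 52 real + 173 synthetic = 225 samples, split into five test folds.
folds = k_fold_indices(N_REAL + N_SYNTHETIC, k=5)
fold_sizes = [len(fold) for fold in folds]
```

In each cross-validation round, one fold would serve as the test set and the remaining four as training data, with accuracy and F1 averaged over the five rounds.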
Details
Title
Predicting cost performance in road projects with limited data: Exploring synthetic data generation using CTGAN
Creators
Ali Foroutan Mirhosseini - Norwegian Public Roads Administration
Kelly Pitera - Norwegian University of Science and Technology
James Odeck - Norwegian Public Roads Administration
Amirreza Rouhi - Drexel University, Electrical and Computer Engineering
Publication Details
Research in transportation economics, v 117, 101755