Paper 1: EU Public Procurement Contract Value Prediction: A Machine Learning Approach
Abstract: Modern analytical tools such as machine learning allow organisations to collect and analyse large volumes of data on suppliers, pricing, demand structure, and market trends. This study applies machine learning models to predict the value of EU public procurement contracts using the Tenders Electronic Daily (TED) database. The analysis covers 13,345,120 initial contract records from 2006 to 2025 across 33 countries and 63 procurement sectors. After data quality control, the analytical set comprised 10,038,018 valid contracts (a 75% retention rate). Three complementary methods were used: Random Forest regression to capture nonlinear patterns, ordinary least squares (OLS) regression to obtain interpretable coefficients, and K-means clustering to classify procurement behaviour at the country level. The Random Forest achieved a cross-validated R² of 0.2795 and a test R² of 0.2613, with country of origin dominating feature importance (Germany: 24.39%, United Kingdom: 2.33%, Italy: 1.96%). Temporal features accounted for 18.36% of feature importance, while competition indicators (number of bids: 8.99%) and structural characteristics (lot number: 12.29%) also contributed substantially. The OLS regression showed statistically significant country effects: Germany had 98.4% lower contract values despite being Europe's largest economy, a pattern attributed to federal administrative fragmentation, whereas Italy had 295.7% higher values, attributed to centralised infrastructure projects. K-means clustering revealed three distinct procurement profiles: Greece as a transparency-focused outlier (an average of 109 bids per contract), 19 mature, high-value economies, and 13 lower-value, fragmented systems, including Germany. The results indicate that institutional frameworks outweigh economic factors in determining contract value, with policy implications for the design of procurement systems across the EU.
Keywords: Public procurement; EU procurement; machine learning; Random Forest; contract value prediction; TED database; institutional economics; OLS regression; K-means clustering; predictive modelling; data quality; EU AI Act
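The percentage country effects reported for the OLS model are consistent with a log-linear specification in which the dependent variable is the natural logarithm of contract value. The abstract does not state this, so treat it as an assumption. Under that assumption, a country dummy coefficient β corresponds to a percentage difference of 100 × (exp(β) − 1) relative to the reference category. The minimal sketch below uses hypothetical helper names (`pct_effect`, `implied_beta`) to convert in both directions and recovers the log-scale coefficients implied by the reported Germany and Italy figures.

```python
import math


def pct_effect(beta: float) -> float:
    """Percentage difference implied by a dummy coefficient in a log-linear
    OLS model (assumed specification): 100 * (exp(beta) - 1)."""
    return 100.0 * math.expm1(beta)


def implied_beta(pct: float) -> float:
    """Inverse of pct_effect: the log-scale coefficient that yields `pct`."""
    return math.log1p(pct / 100.0)


# Percentage effects as reported in the abstract.
reported = {"Germany": -98.4, "Italy": 295.7}

for country, pct in reported.items():
    beta = implied_beta(pct)
    # Round-trip check: converting beta back should reproduce the percentage.
    print(f"{country}: beta ≈ {beta:.3f}, effect ≈ {pct_effect(beta):.1f}%")
# Germany: beta ≈ -4.135, effect ≈ -98.4%
# Italy: beta ≈ 1.375, effect ≈ 295.7%
```

Because exp(β) is always positive, a log-linear effect can never fall below −100%, which is why a large negative coefficient such as Germany's appears as a "98.4% lower" value rather than a negative one.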
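The country-level K-means step can be illustrated with Lloyd's algorithm applied to standardised country profiles. This is a dependency-free sketch, not the study's implementation. The two features (mean log contract value and mean bids per contract), the country labels C1 to C7, and their values are all hypothetical. The deterministic farthest-point initialisation is a stand-in for whatever seeding the study used (scikit-learn's `KMeans`, for instance, defaults to k-means++).

```python
import math


def standardise(rows):
    """Z-score each column so features on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical country profiles: (mean log contract value, mean bids per contract).
profiles = {
    "C1": (12.0, 100.0),  # outlier with very many bids per contract
    "C2": (13.5, 5.0),
    "C3": (13.4, 6.0),
    "C4": (13.6, 4.0),
    "C5": (10.5, 3.0),
    "C6": (10.4, 4.0),
    "C7": (10.6, 3.5),
}

labels, _ = kmeans(standardise(list(profiles.values())), k=3)
for name, lab in zip(profiles, labels):
    print(name, lab)
```

Standardising before clustering matters here: without it, the bids-per-contract column, whose spread is far larger, would dominate the Euclidean distances and the value dimension would barely affect the grouping.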