Predicting non-specific myocardial fibrosis in clinical setting in a large cohort of young and veteran athletes using a powerful machine learning model


E Androulakis, N Dikaros, E Papatheodorou, A Merghani, S Fyyaz, G Perry-Williams, S Sharma, M Papadakis. European Journal of Preventive Cardiology.

Abstract:

Background

Non-specific myocardial fibrosis (NSMF) is a heterogeneous entity with potentially significant clinical implications. We aimed to create a machine-learning (ML)-based model to predict NSMF from thorough baseline investigations in a large cohort of young and veteran athletes.

Methods

We analysed data from 980 young and veteran athletes referred to our Sports Cardiology service. All athletes underwent comprehensive evaluation with 12-lead ECG, Holter monitoring, cardiopulmonary exercise testing (CPET), and cardiac magnetic resonance (CMR) imaging. After excluding individuals with well-defined cardiac conditions, we identified 61 young athletes with NSMF (Group A1) and compared them with a matched group of 75 athletes with no fibrosis (Group A2). We also identified two groups of veteran athletes: Group B1, comprising 112 athletes with NSMF, and Group B2, comprising 458 athletes with no fibrosis. In total, 706 athletes were included in the analysis. Because no NSMF prediction model had been attempted before, we tested selected baseline characteristics (demographics, ECG findings, exercise intensity, ventricular arrhythmia on Holter/CPET, and echocardiography) as inputs for models implemented in Python. We tested various ML algorithms to build a model that classifies athletes into two distinct classes: Class A, containing athletes with no fibrosis or insertion-point fibrosis, and Class B, containing athletes with NSMF. We created four classifiers: a logistic regression classifier, a random forest classifier, a naive Bayes classifier, and a voting classifier that selects the class predicted by the majority of the other three.
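The voting rule (each athlete is assigned the class predicted by at least two of the three base classifiers) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name `hard_vote` and the example predictions are hypothetical, and the sketch assumes "hard" voting on predicted labels rather than averaging predicted probabilities. In scikit-learn the same rule corresponds to `VotingClassifier(..., voting="hard")`, although the abstract does not say which library was used.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def hard_vote(per_model_predictions: Sequence[Sequence[str]]) -> list[str]:
    """Return, for each athlete, the class predicted by most models.

    per_model_predictions[m][i] is model m's predicted class for athlete i.
    With three binary classifiers a strict majority always exists.
    """
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("all models must predict the same number of athletes")
    combined = []
    for i in range(n_samples):
        # Count how many models voted for each class for athlete i
        votes = Counter(preds[i] for preds in per_model_predictions)
        # most_common(1) returns [(label, count)] for the top-voted label
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions for four athletes:
# "A" = no or insertion-point fibrosis, "B" = NSMF
logreg = ["A", "B", "B", "A"]
forest = ["A", "B", "A", "A"]
bayes = ["B", "B", "A", "B"]
print(hard_vote([logreg, forest, bayes]))
```

With an odd number of binary voters no tie-breaking rule is needed, which is one practical reason to combine exactly three classifiers.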

Results

A dummy classifier that always predicts Class A would achieve an accuracy of ~73%, reflecting the class distribution of the cohort. We therefore used this a-priori classifier as the baseline against which we compared our trained classifiers. For each classifier, we split the dataset, trained the model, and measured its accuracy on the test set. The ML classifiers achieved the following accuracies:
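The baseline comparison amounts to a majority-class predictor evaluated on a held-out split. The sketch below uses only the standard library; `split_holdout`, `a_priori_baseline`, the 20% test fraction, and the toy labels are illustrative assumptions, since the abstract does not report the split ratio or the exact procedure.

```python
import random
from collections import Counter

# Hypothetical helpers: the abstract does not describe the split ratio or procedure.


def _take(seq, indices):
    """Return the elements of seq at the given positions."""
    return [seq[j] for j in indices]


def split_holdout(features, labels, test_fraction=0.2, seed=0):
    """Shuffle reproducibly, then hold out test_fraction of samples for testing."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (_take(features, train_idx), _take(features, test_idx),
            _take(labels, train_idx), _take(labels, test_idx))


def a_priori_baseline(train_labels, n_test):
    """Always predict the most frequent training class, ignoring all features."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Toy cohort (not study data): 75 "A" labels and 25 "B" labels
X = [[i] for i in range(100)]  # placeholder feature vectors
y = ["A"] * 75 + ["B"] * 25
X_train, X_test, y_train, y_test = split_holdout(X, y, test_fraction=0.2, seed=1)
baseline = a_priori_baseline(y_train, len(y_test))
print(f"a-priori accuracy: {accuracy(y_test, baseline):.2f}")
```

Because the baseline ignores every feature, its test accuracy equals the share of the majority class in the test set; a trained classifier is only informative if it beats that figure.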

A-priori classifier: 73%
Naive Bayes: 83% (p < 0.001)
Logistic regression: 87% (p < 0.001)
Voting classifier: 89% (p < 0.001)
Random forest: 90% (p < 0.001)

All ML-derived models significantly outperformed the benchmark method, and the random forest classifier achieved the best accuracy (90%). Of note, age was not included in the model, as it did not provide any additional predictive information.
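The abstract reports p < 0.001 for each model against the a-priori classifier but does not name the statistical test. One plausible choice, assumed here purely for illustration, is a one-sided exact binomial test of whether a model's number of correct test-set predictions exceeds what the 73% baseline rate would produce. The function name `binomial_test_greater` and the test-set size of 100 athletes are hypothetical.

```python
from math import comb


def binomial_test_greater(k_correct: int, n_test: int, p_baseline: float) -> float:
    """One-sided exact binomial test: P(X >= k_correct), X ~ Binomial(n_test, p_baseline).

    A small value suggests the classifier's accuracy exceeds the baseline rate.
    Assumed test for illustration only; the abstract does not name its test.
    """
    if not 0 <= k_correct <= n_test:
        raise ValueError("k_correct must lie in [0, n_test]")
    # Sum the binomial probabilities of k_correct or more correct predictions
    return sum(
        comb(n_test, i) * p_baseline**i * (1 - p_baseline) ** (n_test - i)
        for i in range(k_correct, n_test + 1)
    )


# Hypothetical test set: 100 athletes, 90 classified correctly, baseline rate 0.73
p_value = binomial_test_greater(90, 100, 0.73)
print(f"p = {p_value:.2e}")
```

A paired test such as McNemar's test, which compares two classifiers on the same test athletes, is a common alternative; which test the authors used cannot be determined from the abstract.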

Conclusions

This study showed that ECG findings, ventricular arrhythmia, CPET performance, and clinical symptoms can be used to develop a powerful prediction model of NSMF in clinical practice for both young and veteran athletes, independent of age. All ML-derived models significantly outperformed the benchmark method, with the best model reaching 90% accuracy. The creation of an online platform for use by the responsible clinicians in practice is pending.