Developing a Machine Learning-Based Cardiometabolic Disease Model for Predicting Liver Disease
Abstract
Cardiometabolic diseases, a leading cause of global mortality, are interconnected metabolic and cardiovascular disorders that include diabetes, metabolic dysfunction-associated steatotic liver disease (MASLD) and ischemic heart disease. Predicting disease may aid early diagnosis and treatment. Cohort studies are crucial in cardiometabolic disease research because they give significant insight into disease demographics, prevalence and prediction. Here, we utilise data from a national longitudinal cohort study to investigate and predict liver disease. Clinical and anthropometric data from the Phenome India Cohort ($n=207$) were analysed and divided into subgroups based on hepatic steatosis and fibrosis status. Sixteen key metadata parameters, including liver enzyme, renal, FibroScan and anthropometric parameters, were used for initial model development, and eight parameters were identified using forward and recursive feature selection. The data were split into training (75%) and testing (25%) sets, and seven machine learning (ML) algorithms, namely Random Forest, XGBoost, CatBoost, SVM, Logistic Regression, Naïve Bayes and Neural Network, were trained on the selected parameters. Models using all 16 features tended to overfit, achieving perfect performance on the training set but poorer generalisation on the testing set. Reducing the feature set to eight yielded a simpler model with similar performance. Among the seven algorithms, SVM gave the most desirable test performance, balancing sensitivity and specificity (accuracy 0.738, sensitivity 0.857, specificity 0.500, F1-score 0.814, ROC-AUC 0.724; 5-fold cross-validated accuracy 0.710 and ROC-AUC 0.741). Adjusting the decision threshold between 0.55 and 0.80 led to lower sensitivity at lower thresholds and higher sensitivity at higher thresholds. Applying ML algorithms to clinical metadata can help in the prediction of liver disease.
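The threshold-dependent metrics reported for the SVM (accuracy, sensitivity, specificity, F1-score) are all functions of the four confusion-matrix counts. The following is a minimal sketch of that arithmetic, not the authors' code. The abstract does not give the test-set counts, so the values `tp=24, fn=4, fp=7, tn=7` are a hypothetical example chosen only because they happen to round to the reported figures; the function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,       # fraction of all cases classified correctly
        "sensitivity": tp / (tp + fn),       # true-positive rate (recall)
        "specificity": tn / (tn + fp),       # true-negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


# Hypothetical counts (NOT taken from the study), used only to illustrate the formulas.
metrics = classification_metrics(tp=24, fn=4, fp=7, tn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, a specificity of 0.500 means that half of the disease-negative cases are flagged as positive, which is the cost of the higher sensitivity.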
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.