Sources of data and participants
The database used to develop the predictive algorithm was the International Center for Nutritional Status Assessment (ICANS, University of Milan, Milan, Italy) database, which contains data from an ongoing large-scale open-cohort nutrition study. Per the study protocol, all patients undergo a complete nutritional assessment at baseline, lifestyle interventions (and, where indicated, pharmacological interventions) are prescribed, and follow-up examinations are scheduled. A more limited set of parameters is routinely collected at follow-up to assess changes in weight, body composition, and laboratory tests. The development of the algorithm included all prediabetic patients enrolled from 2009 to the beginning of 2019. The complete database contains 18,973 baseline observations and a total of 45,148 follow-up observations. In this study, we included a total of 59 variables from the database.
Patients included in this study were self-referred patients seeking a weight-loss program, mainly residing in Milan or nearby cities, and newly or recently diagnosed with prediabetes. Eligibility criteria were: age 18 years or older; not pregnant or breastfeeding; no conditions significantly limiting movement or physical activity; no severe cardiovascular, neurological, endocrine, or psychiatric disease; and prescription of lifestyle interventions only. The lifestyle intervention consisted of a low-calorie omnivorous diet with a Mediterranean pattern, with macronutrient and micronutrient levels set according to the Italian Recommended Daily Intake (5). Physical activity recommendations were also provided according to the WHO physical activity guidelines (6).
This study followed the principles established by the Declaration of Helsinki, and written informed consent was obtained from each subject. The Ethics Committee of the University of Milan (n. 6/2019) approved the study procedures.
Outcome and predictors
The outcome was normalization of blood glucose (binary, fasting blood glucose <100 mg/dL) within 1 year of starting the lifestyle intervention.
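As a minimal sketch, the binary outcome could be derived from the follow-up records roughly as follows; the data frame and column names (followup, patient_id, months_since_baseline, fasting_glucose_mg_dl) are hypothetical placeholders, not the actual ICANS variable names.

```r
library(dplyr)

# One row per follow-up visit; a patient is labeled "yes" if fasting
# glucose fell below 100 mg/dL at any visit within 12 months of baseline.
# All names below are illustrative assumptions.
outcome_df <- followup %>%
  filter(months_since_baseline <= 12) %>%
  group_by(patient_id) %>%
  summarise(normalized = any(fasting_glucose_mg_dl < 100, na.rm = TRUE)) %>%
  mutate(outcome = factor(if_else(normalized, "yes", "no"),
                          levels = c("yes", "no")))
```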
A total of 59 predictor variables were used in the analysis.
Demographic data: age, gender, education, occupation, marital status
Anthropometric measurements: height, weight, arm length, arm circumference, wrist circumference, waist circumference, biceps skinfold thickness, triceps skinfold thickness, subscapular skinfold thickness, upper-arm skinfold thickness, arm muscle area, arm fat area, body density, fat mass, lean mass
Bioimpedance analysis: intracellular water, extracellular water
Abdominal ultrasound examination: sternal subcutaneous adipose tissue, sternal visceral adipose tissue, abdominal subcutaneous adipose tissue, abdominal visceral adipose tissue
Indirect calorimetry: oxygen consumption, carbon dioxide production, respiratory quotient, resting energy expenditure
Medical history: family history, menstruation, pregnancy, dietary status, dietary history, physical activity, smoking, medications, clinical symptoms, weight history
Vital signs: heart rate, systolic blood pressure, diastolic blood pressure
Blood and urine tests: white blood cell count, red blood cell count, hemoglobin, mean corpuscular volume, glucose, total cholesterol, HDL cholesterol, LDL cholesterol, triglycerides, glutamate pyruvate transaminase, glutamate oxaloacetate transaminase, gamma-glutamyl transferase, thyroid stimulating hormone, creatinine, uric acid, urea
Statistical and machine learning analysis techniques
All patients eligible at the time of the study were included; the sample size was therefore determined by the available data, and no a priori sample-size calculation was performed.
For algorithms requiring complete data, k-nearest neighbor imputation (Gower’s distance, number of neighbors = 5) was used to impute missing data during the preprocessing phase.
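In tidymodels (the framework named in the Software paragraph below), this step can be expressed as a recipe; recipes::step_impute_knn() uses Gower's distance and five neighbors by default, matching the settings above. The names train_data and outcome are placeholders.

```r
library(recipes)

# Recipe with kNN imputation of all predictors
# (Gower's distance, 5 neighbors: the package defaults).
impute_rec <- recipe(outcome ~ ., data = train_data) %>%
  step_impute_knn(all_predictors(), neighbors = 5)
```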
Predictive performance was assessed by optimizing the correct classification rate (CCR) and the area under the receiver operating characteristic curve (AUROC). Between accuracy and discriminative ability, accuracy (i.e., maximizing the CCR) was chosen as the metric most relevant to clinical practice.
We compared several statistical and machine learning models using 10-fold cross-validation resampling. For models with tuning parameters, grids of candidate parameter combinations were evaluated with the same 10-fold cross-validation.
Before model selection, model-specific preprocessing steps were defined to ensure the best predictive ability for each model. All preprocessing steps were repeated within each cross-validation fold, so that the uncertainty arising from data-dependent (non-deterministic) preprocessing was propagated into the resampling estimates.
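A minimal sketch of this resampling setup with tidymodels is shown below; wrapping the recipe and a model specification in a workflow ensures every preprocessing step is re-estimated inside each fold. Here model_spec stands for any of the parsnip specifications listed in the next section, and the seed and grid size are illustrative assumptions.

```r
library(tidymodels)

set.seed(2019)                        # arbitrary seed for reproducibility
folds <- vfold_cv(train_data, v = 10) # 10-fold cross-validation

# accuracy = correct classification rate (CCR); roc_auc = AUROC
cls_metrics <- metric_set(accuracy, roc_auc)

wf <- workflow() %>%
  add_recipe(impute_rec) %>%  # preprocessing re-run inside each fold
  add_model(model_spec)       # a parsnip model with tune() parameters

tuned <- tune_grid(wf, resamples = folds, grid = 20, metrics = cls_metrics)
show_best(tuned, metric = "accuracy")
```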
Principal component analysis (PCA) was employed as an optional preprocessing step to reduce the dimensionality of the dataset. In these cases, PCA transformed the predictors into a smaller set of components designed to capture the maximum amount of information in the original variables. A potential advantage of this approach, beyond dimensionality reduction, is that the resulting components are uncorrelated, which mitigates the problem of correlated predictors in the dataset.
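Under the same assumptions as above, the optional PCA step could be added to a recipe as follows; the 90% variance threshold is illustrative, as the number of components retained is not reported here.

```r
library(recipes)

# Normalize predictors, then replace them with the principal components
# capturing (here, as an assumed threshold) 90% of total variance;
# the resulting component scores are uncorrelated by construction.
pca_rec <- recipe(outcome ~ ., data = train_data) %>%
  step_impute_knn(all_predictors(), neighbors = 5) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), threshold = 0.90)
```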
The following models were evaluated:
Logistic regression
Linear discriminant analysis
Quadratic discriminant analysis
Naive Bayes, tuned for kernel smoothness and Laplace correction
K-nearest neighbors, tuned for the number of neighbors, the distance-weighting function, and the order of the Minkowski distance
Ridge regression and LASSO, tuned for the amount of regularization and the proportion of the LASSO penalty
Decision tree (CART), tuned for the cost-complexity parameter, the maximum tree depth, and the minimum number of data points in a node required for a further split
Bagged trees, tuned for the same parameters as the CART model (cost-complexity, maximum tree depth, and minimum node size)
Random forest, tuned for the number of randomly selected predictors, the number of trees, and the minimum node size (see the specification sketch after this list)
Boosted trees, tuned for tree depth, number of trees, learning rate, number of randomly selected predictors, minimum node size, minimum loss reduction, proportion of observations sampled, and number of iterations before early stopping
Linear support vector machine, tuned for cost and margin
Single-layer neural network, tuned for the number of hidden units, the amount of regularization, and the number of epochs
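For illustration, two of the candidate models above could be specified in parsnip as follows, with tune() marking the parameters searched over the cross-validation grids. The engine choices ("ranger", "glmnet") are assumptions; the algorithm-specific packages actually used are listed in the Appendix.

```r
library(parsnip)
library(tune)

# Random forest: mtry = number of randomly selected predictors,
# trees = number of trees, min_n = minimum node size.
rf_spec <- rand_forest(mtry = tune(), trees = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# Ridge/LASSO: penalty = amount of regularization,
# mixture = proportion of the LASSO (L1) penalty.
glmnet_spec <- logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet") %>%
  set_mode("classification")
```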
Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for the best model as follows: sensitivity = TP/(TP + FN); specificity = TN/(TN + FP); PPV = TP/(TP + FP); NPV = TN/(TN + FN), where TP = true positive, TN = true negative, FP = false positive, and FN = false negative.
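With yardstick (part of tidymodels), these quantities follow directly from the out-of-fold predictions; preds, outcome, and .pred_class below are the assumed prediction data frame and its truth/estimate columns.

```r
library(yardstick)

# By default, yardstick treats the class corresponding to the first
# factor level as the event (positive) class.
conf_mat(preds, truth = outcome, estimate = .pred_class)  # 2x2 table
sens(preds, truth = outcome, estimate = .pred_class)      # TP / (TP + FN)
spec(preds, truth = outcome, estimate = .pred_class)      # TN / (TN + FP)
ppv(preds, truth = outcome, estimate = .pred_class)       # TP / (TP + FP)
npv(preds, truth = outcome, estimate = .pred_class)       # TN / (TN + FN)
```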
All statistical analyses were performed using R 4.1.1 (7). Model preprocessing, tuning, resampling, and fitting were performed with the tidymodels framework for R (see Appendix for algorithm-specific packages).