Diabetes Analysis and Detection
🚩 Analysis and Detection of Diabetes 🚩
The main objective of this project is to analyze how a series of medical and socioeconomic variables influence the presence of diabetes, and to detect diabetes (or the risk of developing the disease) based on these variables.
🔸 The analysis part is based on descriptive statistics (EDA) as well as inferential statistics through the Logistic Regression model.
🔸 The detection part is developed following a Machine Learning methodology based on the optimization and comparison of alternatives (internal evaluation) and on the estimation of the future performance of the best alternative (external evaluation).
🔸 Special attention is paid to the fact that this is a highly imbalanced classification problem, so techniques and metrics appropriate for such cases are used.
🔸 It is also shown how to obtain probabilistic predictions using Logistic Regression, which is the second-best model of those considered and performs almost as well as the best one (XGBoost).
🔸 Probabilistic predictions can be interpreted as the risk of developing the disease: given a patient’s information, the system predicts the patient’s percentage risk of diabetes, that is, the probability of presenting the disease. This could be used as a decision-support system by health personnel.
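To make the last two ideas more concrete, here is a minimal sketch in plain Python. It is not the project's code. The variable names (`age`, `bmi`, `high_bp`), the intercept and the coefficients are hypothetical values chosen only for illustration, and the labels in the second part are a toy 90/10 example rather than real data. The first function shows how a fitted Logistic Regression turns a patient's variables into a risk probability. The second shows why plain accuracy can mislead on an imbalanced problem and how balanced accuracy, one possible imbalance-aware metric, behaves differently.

```python
import math

# Hypothetical intercept and coefficients on the log-odds scale.
# They are illustrative assumptions, NOT estimates from the project's data.
INTERCEPT = -5.0
COEFS = {"age": 0.03, "bmi": 0.08, "high_bp": 0.7}


def diabetes_risk(patient, intercept=INTERCEPT, coefs=COEFS):
    """Return P(diabetes | patient) under a logistic regression model."""
    # Linear predictor: intercept plus the weighted sum of the variables.
    log_odds = intercept + sum(coefs[name] * patient[name] for name in coefs)
    # The logistic (sigmoid) function maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return 0.5 * (tp / positives + tn / negatives)


# Risk for one hypothetical patient, reported as a percentage.
patient = {"age": 60, "bmi": 32.0, "high_bp": 1}
risk = diabetes_risk(patient)
print(f"Estimated risk of diabetes: {risk:.1%}")

# Toy imbalanced labels: 90 patients without diabetes, 10 with it.
y_true = [0] * 90 + [1] * 10
# A useless classifier that always predicts "no diabetes".
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}, balanced accuracy: {bal_acc:.2f}")
```

In the toy example the always-negative classifier reaches 90% accuracy without detecting a single case, while its balanced accuracy is only 50%, which is why metrics of this kind are preferred when the classes are highly imbalanced.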