This is an integrative project for the course Technologies for Information Systems at Politecnico di Milano. The development of this project has been supervised by Chiara Criscuolo (@chiaracriscuolo), Tommaso Dolci (@TommasoD) and Mattia Salnitri (@MattiaSalnitri).
The objective of this project is to create a user-friendly notebook for health researchers who may lack expertise in computer science or data science. The notebook is designed to analyze the ethical aspects and potential biases of a given dataset, with an emphasis on fairness.
To achieve this goal, we have combined three distinct notebooks that were originally created by Daniel Caputo (@DanielCaputo111296) for analyzing a diabetes dataset. This work is based on the paper: Criscuolo, C., Dolci, T., Salnitri, M. (2022). Towards Assessing Data Bias in Clinical Trials. In: Heterogeneous Data Management, Polystores, and Analytics for Healthcare.
The final notebook has been designed to be adaptable to various datasets. Users can adjust input parameters (such as column names, protected variables, sensitive attributes, target variables, etc.) before running the notebook. This flexibility allows researchers to use the notebook with different datasets and apply the techniques to evaluate fairness in various scenarios.
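As an illustration of the kind of parameters involved, the inputs could be gathered in one place as sketched below. The class name `NotebookConfig`, its fields, and the column names are hypothetical; the actual variable names are the ones defined in the notebook's configuration section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NotebookConfig:
    """Hypothetical container for the inputs a user sets before running."""
    dataset_path: str
    target_variable: str
    protected_attributes: List[str] = field(default_factory=list)

    def validate(self, available_columns: List[str]) -> None:
        """Raise ValueError if a configured column is missing from the dataset."""
        required = [self.target_variable, *self.protected_attributes]
        missing = [c for c in required if c not in available_columns]
        if missing:
            raise ValueError(f"Columns not found in dataset: {missing}")


# Example values; the column names are illustrative assumptions.
config = NotebookConfig(
    dataset_path="data/Diabetes_dataset.csv",
    target_variable="Outcome",
    protected_attributes=["Age", "Sex"],
)
config.validate(["Age", "Sex", "BMI", "Outcome"])  # passes silently
```

Checking the configured names against the dataset columns before running the analysis surfaces typos early, which matters when the notebook is reused on a new dataset.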
In order to execute the notebook on Google Colab you need to:
- Download this project and upload the whole folder to your Google Drive
- Enter the project folder and open the notebook using Google Colab
- Configure the setup cell to install the requirements and set the path_to_project variable (*check the 💡 TIP*)
- Enjoy!
The requirements are defined in the requirements.colab.txt file; the notebook installs them automatically in the Google Colab setup cell. Make sure to set the path_to_project variable and update the pip installation command based on your Google Drive folder structure.
💡 TIP: Use the file explorer on the left of Google Colab to navigate to the project folder and copy the path.
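A setup cell along these lines is one way to wire the pieces together. It is only a sketch: the Drive path below assumes the folder was uploaded to the top level of "My Drive" under the name tis-project-diabetes-analysis, and the notebook's own setup cell may be organised differently.

```python
# Assumed location of the uploaded folder; adjust it to your own Drive layout.
path_to_project = "/content/drive/MyDrive/tis-project-diabetes-analysis"

try:
    # This module only exists inside Google Colab.
    from google.colab import drive
    drive.mount("/content/drive")
    in_colab = True
except ImportError:
    in_colab = False

# Build the installation command from the project path.
requirements_file = f"{path_to_project}/requirements.colab.txt"
pip_command = f'pip install -r "{requirements_file}"'
print(pip_command)  # in a Colab cell, run this command prefixed with "!"
```

Quoting the requirements path keeps the command valid when a Drive folder name contains spaces.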
To run the notebook locally with conda, create an environment and install the requirements:
conda create -n <name> python=3.9.6
conda activate <name>
pip install -r requirements.txt
If you are running the project on an Apple Silicon chip, you can use the requirements.osx-arm64.txt file:
- Make sure that the conda-forge channel is added:
conda config --add channels conda-forge
conda config --set channel_priority strict
- Create the environment
conda create -n <name> python=3.9.6
conda activate <name>
conda install --file requirements.osx-arm64.txt
Alternatively, you can install the requirements directly with pip:
pip install -r requirements.txt
In order to create the list of requirements you can run:
pip list --format=freeze > requirements.txt
or if you are using conda and an Apple Silicon chip:
conda list -e > requirements.osx-arm64.txt
The notebook uses various techniques and technologies, such as AIF360, fairlearn, RankingFacts, and scikit-learn, to preprocess and analyze the data, and to train and evaluate machine learning models. The notebook also includes visualizations and statistics to help understand the distribution and correlations of the data, and to identify any potential biases.
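To give a sense of what a group fairness metric measures, here is a dependency-free sketch of two common ones: statistical parity difference and disparate impact. Libraries such as AIF360 and fairlearn compute these and many others. The function names and the toy data below are illustrative and are not taken from the notebook.

```python
from typing import List


def selection_rate(predictions: List[int], groups: List[int], group_value: int) -> float:
    """Share of positive predictions (1) within one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group_value]
    return sum(selected) / len(selected)


def statistical_parity_difference(predictions, groups, unprivileged=0, privileged=1):
    """Selection rate of the unprivileged group minus that of the privileged group."""
    return (selection_rate(predictions, groups, unprivileged)
            - selection_rate(predictions, groups, privileged))


def disparate_impact(predictions, groups, unprivileged=0, privileged=1):
    """Ratio of the unprivileged selection rate to the privileged one."""
    return (selection_rate(predictions, groups, unprivileged)
            / selection_rate(predictions, groups, privileged))


# Toy data: 1 marks the privileged group in `groups`.
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

print(statistical_parity_difference(y_pred, groups))  # -0.5
print(disparate_impact(y_pred, groups))
```

A difference of 0 and a ratio of 1 both indicate equal selection rates; here the unprivileged group receives positive predictions a third as often as the privileged one.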
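This notebook has been created starting from these three notebooks, kept in the old_notebooks folder:
- Diabetes_FairLearn.ipynb
- Diabetes_AIF360.ipynb
- Diabetes_RankingFacts.ipynb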
PLEASE NOTE: The notebook must be configured with a dataset and some configuration variables in the Configure the notebook section. For attribute selection and weighting, the notebook automatically computes the weights based on the 3 attributes with the highest correlation to the target variable.
It is also possible to manually insert the selected attributes and the corresponding weights in the Configure the notebook section.
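The automatic selection can be pictured with the sketch below. It ranks attributes by absolute Pearson correlation with the target, keeps the top 3, and normalises their correlations so the weights sum to 1. The normalisation step and the toy column values are assumptions; this text does not state the exact formula the notebook uses. Plain Python is used so the example runs without dependencies.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def top_correlated_weights(columns: Dict[str, List[float]], target: str, k: int = 3) -> Dict[str, float]:
    """Keep the k attributes most correlated with the target; weights sum to 1."""
    scores = {
        name: abs(pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    }
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    total = sum(scores[name] for name in top)
    return {name: scores[name] / total for name in top}


# Toy data with illustrative column names.
data = {
    "Glucose": [85, 90, 120, 150, 160, 180],
    "BMI": [22, 30, 25, 33, 28, 35],
    "Age": [25, 50, 31, 45, 60, 38],
    "Pregnancies": [3, 1, 4, 1, 5, 2],
    "Outcome": [0, 0, 0, 1, 1, 1],
}
weights = top_correlated_weights(data, target="Outcome")
print(weights)
```

Using the absolute value means a strongly negative correlation counts as much as a strongly positive one when ranking attributes.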
The protected attributes must be categorical and binary (0, 1), but the original column must be maintained as a continuous variable.
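One reading of this requirement is that a binary column is derived from the continuous one while the original values are kept alongside it. The sketch below follows that reading. The threshold of 40, the `_binary` suffix, and the choice of which group maps to 1 are hypothetical and depend on the analysis.

```python
from typing import Dict, List


def add_binary_column(rows: List[Dict], column: str, threshold: float,
                      suffix: str = "_binary") -> List[Dict]:
    """Return copies of the rows with an extra 0/1 column; the original value is kept."""
    return [
        {**row, column + suffix: int(row[column] >= threshold)}
        for row in rows
    ]


rows = [
    {"Age": 23.0, "Outcome": 0},
    {"Age": 41.0, "Outcome": 1},
    {"Age": 67.0, "Outcome": 1},
]
encoded = add_binary_column(rows, "Age", threshold=40)
print(encoded[0])  # {'Age': 23.0, 'Outcome': 0, 'Age_binary': 0}
```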
tis-project-diabetes-analysis
├── data
│   └── Diabetes_dataset.csv            // Pre-processed dataset
├── old_notebooks
│   ├── Diabetes_FairLearn.ipynb
│   ├── Diabetes_AIF360.ipynb
│   └── Diabetes_RankingFacts.ipynb
├── RankingFacts                        // RankingFacts library
│   ├── FAIR
│   └── ...
├── utils
│   ├── data_preprocessing.py           // Data preprocessing functions
│   ├── print_util.py                   // Print and visualization functions
│   └── util.py                         // Utility logic functions
├── ethics_analysis_diabetes_example.ipynb
├── ethics_analysis.ipynb
├── README.md
├── requirements.colab.txt
├── requirements.osx-arm64.txt
└── requirements.txt
- Andrea Prisciantelli (@priscia99)
- Mattia Redaelli (@redaellimattia)