The objective of this project is to create a user-friendly notebook for health researchers who may lack expertise in computer science or data science. The notebook is designed to analyze the ethical and biased aspects of a given dataset, with an emphasis on fairness.

priscia99/TIS-project-ethics-analysis

Technologies for Information Systems - Ethics Analysis

Introduction

This is an integrative project for the course Technologies for Information Systems at Politecnico di Milano. The development of this project has been supervised by Chiara Criscuolo (@chiaracriscuolo), Tommaso Dolci (@TommasoD) and Mattia Salnitri (@MattiaSalnitri).

The objective of this project is to create a user-friendly notebook for health researchers who may lack expertise in computer science or data science. The notebook is designed to analyze the ethical and biased aspects of a given dataset, with an emphasis on fairness.

To achieve this goal, we combined three distinct notebooks originally created by Daniel Caputo (@DanielCaputo111296) for analyzing a diabetes dataset. This work is based on the paper: Criscuolo, C., Dolci, T., Salnitri, M. (2022). Towards Assessing Data Bias in Clinical Trials. In: Heterogeneous Data Management, Polystores, and Analytics for Healthcare.

The final notebook has been designed to be adaptable to various datasets. Users can adjust input parameters (such as column names, protected variables, sensitive attributes, target variables, etc.) before running the notebook. This flexibility allows researchers to use the notebook with different datasets and apply the techniques to evaluate fairness in various scenarios.
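As an illustration of the kind of input parameters described above, the following is a minimal sketch of a configuration dictionary and a check that every configured column exists in the dataset. The key names (`target_variable`, `protected_attributes`, `feature_columns`), the column names and the helper `missing_columns` are hypothetical and are not taken from the notebook; only the dataset path comes from this repository.

```python
import csv
import io

# Hypothetical configuration: key names and column names are illustrative,
# not the variables actually used in the notebook.
config = {
    "dataset_path": "data/Diabetes_dataset.csv",  # path listed in the folder structure
    "target_variable": "Outcome",
    "protected_attributes": ["Age_binary", "Sex"],
    "feature_columns": ["Glucose", "BMI", "Age"],
}


def missing_columns(config, header):
    """Return the configured column names that are absent from the dataset header."""
    wanted = [config["target_variable"]]
    wanted += config["protected_attributes"]
    wanted += config["feature_columns"]
    return [name for name in wanted if name not in header]


# Stand-in for the first lines of a real CSV file, so the sketch runs anywhere.
sample_csv = io.StringIO("Glucose,BMI,Age,Sex,Outcome\n120,28.1,45,1,0\n")
header = next(csv.reader(sample_csv))

print(missing_columns(config, header))  # prints ['Age_binary']
```

Running a check like this before the analysis cells makes a misspelled or missing column visible immediately instead of failing halfway through the notebook.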

📦 Installation

To run the notebook on Google Colab:

  1. Download this project and upload the whole folder to your Google Drive
  2. Open the project folder and open the notebook with Google Colab
  3. Configure the setup cell to install the requirements and set the path_to_project variable (*check the 💡 TIP*)
  4. Enjoy!

The requirements are defined in the requirements.colab.txt file; the notebook installs them automatically in the Google setup cell. Make sure to set path_to_project and update the pip installation command to match your Google Drive folder structure.

💡 TIP: Use the file explorer on the left of Google Colab to navigate to the project folder and copy the path.
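The setup step can be sketched roughly as follows. This is not the notebook's actual setup cell: the Drive path is a hypothetical example that you should replace with the path copied from the Colab file explorer, and the sketch only builds the pip command so that it also runs outside Colab.

```python
import os
import sys

# Hypothetical location: replace it with the path copied from the Colab file explorer.
path_to_project = "/content/drive/MyDrive/TIS-project-ethics-analysis"

try:
    # google.colab is only importable inside a Google Colab runtime.
    from google.colab import drive
    drive.mount("/content/drive")
except ImportError:
    print("Not running on Colab; skipping the Google Drive mount.")

# Build the pip command that points at the Colab requirements file.
requirements_file = os.path.join(path_to_project, "requirements.colab.txt")
pip_command = [sys.executable, "-m", "pip", "install", "-r", requirements_file]
print(" ".join(pip_command))

# On Colab, the command could then be executed with:
#   import subprocess
#   subprocess.check_call(pip_command)
```

If the project lives in a subfolder of your Drive, only path_to_project needs to change; the requirements path is derived from it.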

To run the notebook locally, create a conda environment and install the requirements:

conda create -n <name> python=3.9.6
conda activate <name>
pip install -r requirements.txt

If you are running the project on an Apple Silicon chip you can use the requirements.osx-arm64.txt file:

  • Make sure that the conda-forge channel is added:
conda config --add channels conda-forge
conda config --set channel_priority strict
  • Create the environment
conda create -n <name> python=3.9.6
conda activate <name>
conda install --file requirements.osx-arm64.txt

If you already have a suitable Python environment, you can skip the conda steps and install the requirements directly:

pip install -r requirements.txt

List of requirements

In order to create the list of requirements you can run:

pip list --format=freeze > requirements.txt

or, if you are using conda on an Apple Silicon chip:

conda list -e > requirements.osx-arm64.txt

🧐 What is this notebook about?

The notebook uses various techniques and technologies, such as AIF360, fairlearn, RankingFacts, and scikit-learn, to preprocess and analyze the data, and to train and evaluate machine learning models. The notebook also includes visualizations and statistics to help understand the distribution and correlations of the data, and to identify any potential biases.
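To give a sense of what a group fairness check measures, here is a minimal plain-Python sketch of two common metrics, statistical parity difference and disparate impact. It does not call AIF360 or fairlearn and is not the notebook's code; the function names, the toy data and the convention that group 0 is unprivileged and group 1 is privileged are assumptions made for illustration.

```python
def positive_rate(outcomes, groups, group_value):
    """Share of favorable outcomes (1) among the rows belonging to one group."""
    selected = [y for y, g in zip(outcomes, groups) if g == group_value]
    return sum(selected) / len(selected)


def statistical_parity_difference(outcomes, groups, unprivileged=0, privileged=1):
    """Favorable-outcome rate of the unprivileged group minus that of the privileged group."""
    return (positive_rate(outcomes, groups, unprivileged)
            - positive_rate(outcomes, groups, privileged))


def disparate_impact(outcomes, groups, unprivileged=0, privileged=1):
    """Ratio of the unprivileged group's favorable-outcome rate to the privileged group's."""
    return (positive_rate(outcomes, groups, unprivileged)
            / positive_rate(outcomes, groups, privileged))


# Toy data (assumption): 1 is the favorable outcome, group 0 is unprivileged.
outcomes = [1, 0, 0, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(outcomes, groups)
di = disparate_impact(outcomes, groups)
print(f"statistical parity difference: {spd:.2f}")  # -0.50
print(f"disparate impact: {di:.2f}")                # 0.33
```

A statistical parity difference of 0 and a disparate impact of 1 would mean both groups receive the favorable outcome at the same rate; in the toy data the unprivileged group receives it far less often.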

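This notebook was created starting from these three notebooks, kept in the old_notebooks folder:

  • Diabetes_FairLearn.ipynb
  • Diabetes_AIF360.ipynb
  • Diabetes_RankingFacts.ipynb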

βš™οΈ Data configuration

PLEASE NOTE: The notebook must be configured with a dataset and some configuration variables in the Configure the notebook section. For attribute selection and weighting, the notebook automatically computes the weights from the 3 attributes with the highest correlation to the target variable. You can also enter the selected attributes and their weights manually in the Configure the notebook section.
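The automatic selection can be pictured with the sketch below. It is an assumption-laden illustration, not the notebook's implementation: it ranks attributes by absolute Pearson correlation with the target, keeps the top 3, and normalizes their correlations so the weights sum to 1. The normalization rule, the function names and the toy columns are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def top_attribute_weights(columns, target, k=3):
    """Pick the k attributes most correlated with the target and weight them.

    Assumption: weights are the absolute correlations rescaled to sum to 1.
    """
    scores = {
        name: abs(pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    }
    selected = sorted(scores, key=scores.get, reverse=True)[:k]
    total = sum(scores[name] for name in selected)
    return {name: scores[name] / total for name in selected}


# Toy columns (hypothetical); "outcome" plays the role of the target variable.
columns = {
    "age": [25, 40, 55, 60, 35, 70],
    "bmi": [22.0, 31.0, 29.5, 35.0, 24.0, 33.0],
    "glucose": [85, 140, 150, 180, 90, 170],
    "patient_id": [3, 1, 6, 2, 5, 4],
    "outcome": [0, 1, 1, 1, 0, 1],
}

weights = top_attribute_weights(columns, target="outcome")
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

In this toy data the weakly correlated patient_id column is left out, which is the behavior you would want to confirm before trusting automatically chosen weights on a real dataset.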

Protected attributes must be categorical and binary (0 or 1). When a protected attribute is derived from a continuous variable, the original continuous column must be kept in the dataset.
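One way to satisfy this requirement is to add a binary column next to the continuous one instead of overwriting it. The sketch below assumes a threshold-based split; the helper name, the `_binary` suffix, the threshold of 50 and the choice of which side maps to 1 are hypothetical and should be adapted to your dataset.

```python
def add_binary_protected(rows, column, threshold, new_column=None):
    """Return copies of the rows with an extra 0/1 column derived from `column`.

    Assumption: values greater than or equal to the threshold map to 1.
    The original continuous column is left untouched.
    """
    new_column = new_column or f"{column}_binary"
    return [
        {**row, new_column: 1 if row[column] >= threshold else 0}
        for row in rows
    ]


# Toy rows (hypothetical column names and values).
rows = [
    {"Age": 34, "Outcome": 0},
    {"Age": 61, "Outcome": 1},
    {"Age": 50, "Outcome": 1},
]

prepared = add_binary_protected(rows, column="Age", threshold=50)
for row in prepared:
    print(row)
```

Each output row keeps its original Age value and gains an Age_binary value of 0 or 1, so both the continuous and the protected representation remain available.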

📂 Folder structure

tis-project-diabetes-analysis
├── data
│   └── Diabetes_dataset.csv     // Pre-processed dataset
├── old_notebooks
│   ├── Diabetes_FairLearn.ipynb
│   ├── Diabetes_AIF360.ipynb
│   └── Diabetes_RankingFacts.ipynb
├── RankingFacts                 // RankingFacts library
│   ├── FAIR
│   └── ...
├── utils
│   ├── data_preprocessing.py    // Data preprocessing functions
│   ├── print_util.py            // Print and visualization functions
│   └── util.py                  // Utility logic functions
├── ethics_analysis_diabetes_example.ipynb
├── ethics_analysis.ipynb
├── README.md
├── requirements.colab.txt
├── requirements.osx-arm64.txt
└── requirements.txt

📚 Resources

👨🏼‍💻 Group Members
