Welcome to the NCAR Dask Tutorial!
Organized by: Brian Vanderwende, Negin Sobhani, Deepak Cherian, and Ben Kirk
The materials and notebooks in this tutorial are published as a Jupyter book here.
Here you will find the tutorial materials from the CISL/CSG Dask Tutorial. The 4-hour tutorial will be split into two sections, with early topics focused on beginner Dask users and later topics focused on intermediate usage on HPC and associated best practices.
This tutorial is open to non-UCAR staff. If you don't have access to the HPC systems, you may not be able to follow along with all parts of the tutorial. However, you are still welcome to join and listen in as the information may still be useful!
Video Recording: Will be available after the event
- Dask Overview
- Dask Data Arrays
- Dask DataFrames
- Dask + Xarray
- Dask Schedulers
- Dask on HPC Systems
- Dask Best Practices
Before beginning any of the tutorials, it is highly recommended that you have a basic understanding of Python programming and Python libraries such as NumPy, pandas, and Xarray.
Running the notebooks on the NCAR JupyterHub is the preferred way to interact with this tutorial. Users with access to Casper can run the notebooks interactively, save their work, and pull in new updates. To connect to the NCAR JupyterHub, open this link in a web browser: https://jupyterhub.hpc.ucar.edu/
Next, clone the repository to your local directory:
git clone https://github.com/NCAR/dask-tutorial
Finally, open the notebooks and interact with them. Make sure to choose the "NPL 2023a" kernel.
Users without access to the NCAR/UCAR Casper cluster can only run through the first few notebooks. To run the notebooks locally:
First, clone this repository to your local machine via:
git clone https://github.com/NCAR/dask-tutorial
Next, install the conda package manager if you do not already have it, following the instructions here.
Now, create a conda environment:
Navigate to the dask-tutorial/ directory and create a new conda environment with the required packages via:
cd dask-tutorial
conda env update --file environment.yml
This will create a new conda environment named "dask-tutorial".
Next, activate the environment:
conda activate dask-tutorial
Finally, launch JupyterLab with:
jupyter lab
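Once the environment is active, a quick sanity check can confirm that the main libraries are visible to Python. The sketch below is only an illustration: the package list is taken from the libraries named in this README, not from environment.yml itself, and the helper name missing_packages is hypothetical.

```python
from importlib.util import find_spec

# Assumption: these are the libraries this README mentions by name.
# The actual contents of environment.yml may differ.
EXPECTED = ["dask", "numpy", "pandas", "xarray"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

If any package is reported as missing, the most likely cause is that the "dask-tutorial" environment is not the active one in the current shell or notebook kernel.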
We welcome contributions from the community! If you have a tutorial you would like to add or if you would like to improve an existing tutorial, please follow these steps:
Fork the repository.
Clone your fork to your local machine:
git clone https://github.com/your-username/dask-tutorial.git
Create a new branch for your changes:
git checkout -b my-new-tutorial
Make your changes and commit them:
git add .
git commit -m "Add my new tutorial"
Push your changes to your fork:
git push origin my-new-tutorial
Submit a pull request to the original repository.
If you have any questions or need help with the tutorials, please open a GitHub issue in the repository.
- NCAR CISL/CSG Team
- ESDS Initiative
The tutorials in this repository are released under the MIT License.