Thanks for checking out the contribution page!
amltk
is open to all open-source contributions, whether it be fixes,
features, integrations or even just simple doc fixes.
This guide will walk through some simple guidelines on the workflow
of amltk
but also some design principles that are core to its
development. Some of these principles may be new for first time
contributors and examples will be given where necessary.
Clone the repo manually or with the below hub
cli from GitHub.
hub repo fork automl/amltk
Below is a quickstart guide for those familiar with open-source development. You
do not need to use just
, however we provide it as a convenient workflow tool.
Please refer to the justfile
for the commands ran if you wish to use your own
workflow.
NOTE: If you are using Windows, please go to the Windows installation section.
# Install just, the Makefile-like tool for this repo
# https://github.com/casey/just#installation
sudo apt install just
# Make virtual env (however you like)
python -m venv .my-virtual-env
source ./.my-virtual-env/bin/activate
# Install the library with dev dependancies
just install
# ... make a new branch
just pr-feat my-new-feature
# ... make changes
# ... commit changes
# Run tests
just test
# Run the documentation, fix any warnings
just docs
# just docs-code # Run code and display output (slower)
# just docs-full # Run examples and code (slowest)
# Run pre-commit checks
just check
# ... fix anything that needs fixing
# Push to your fork
git push
# Create a PR (opening the browser too)
hub pull-request --browse
Below we will go into more detail on each of these steps.
The core workflows of amltk
are accessed through the justfile
It is recommended to have this installed with
their simple one-liners on their repo.
All of these were developed with bash in mind and your usage with other platforms may vary,
please use the justfile
as reference if this is the case.
If you are contributing from outside the automl
org and under your own
github profile, you'll have to create your own fork.
If you use the hub
tool from github, you can do this locally with:
# Clones this repo
hub clone automl/amltk
# Forks the repo to your own user account and sets up tracking
# to match your repo, not the automl/amltk version.
hub fork
To install amltk
for development, we rely on specific dependencies
that are not required for the actual library to run. These are listed
in the pyproject.toml
under the [project.optional-dependencies]
header.
You can install all of these by doing the following:
# Create a virtual environment in your preferred way.
python -m venv .my-virtual-env
source ./.my-virtual-env/bin/activate
# Install all required dependencies
just install
When you ran just install
, the tool pre-commit
was installed.
This is a framework that the repo has set up to run a set of code
quality tools upon each commit, fixing up easy to fix issues, run some
automatic formatting and run a static type checker on the code in the
repository. The configuration for pre-commit
can be found in
.pre-commit-config.yaml
.
To run these checks at any time, use the command just fix
, followed
by just check
. Any list of errors will be presented to you, and we
recommend fixing these before committing.
While these can certainly be skipped, these checks will be run using github actions, a Continuous Integration (CI) service. If there are problems you are not sure how to fix, please feel free to discuss them in the Pull Request and we will help you solve them!
To see a list of tools used and their purposes, please see the section on Code Quality.
We follow a Pull Request into main
workflow, which is essentially
that any contributions to amltk
should be done in a branch with
a pull request to the main
branch. We prefer a branch name that
describes the kind of pull request that it is. We have provided some
default options but please feel free to use your own if you are familiar
with these workflows:
# These utilities will pull the most recent `main` branch,
# create a new branch with your `branchname` and and push
# the new branch back to github
just pr-feat branchname # Creates a branch feat-branchname
just pr-doc branchname # Creates a branch doc-branchname
just pr-fix branchname # Creates a branch fix-branchname
just pr-other branchname # Creates a branch other-branchname
If you are unfamiliar with creating a PR on github, please check out this guide.
Please provide a short summary of your changes to help give context to any reviewers who may be looking at your code. If submitting a more complex PR that changes behaviours, please consider more context when describing not only what you changed but why and how.
We may ask you to break up your changes into smaller atomic units that are easier to verify and review, but we will describe this process to you if required.
Once your PR is submitted, we will likely have comments and changes that
are required. Please be patient as we may not be able to respond
immediately. If there are only minor comments, we will simply annotate
your code where these changes are required and upon fixing them, we will
happily merge these into the main
branch and thank you for your
open-source contributions!
Good practice is to actually review your own PR after submitting it. You'll often find small issues such as out-of-sync doc strings or even small logical issues. In general, if you can't understand your own PR, it's likely we won't either.
If the PR requires larger structural changes or more discussion, there will likely be a few back-and-forth discussion points which we will actively respond to help get your contribution in.
If you do not wish to actively monitor the PR for whatever reason, granting us access to modify your PR will substantially help integration. To do so, please follow the instructions here.
This library uses conventional commits
as a way to write commits. This makes commit messages simpler and easier
to read as well as allows for an easier time managing the repo, such as
changelogs and versioning. This is important enough that we even enforce
this through pre-commit
to fail the commit if the message does not follow
the format. Please follow the link above to find out more but for reference,
here are some short examples:
fix(scheduler): Use X instead of Y
feat(pipeline): Allow for Z
refactor(Optuna): Move integrations to seperate file
doc(Example): Integrating custom space parser
Our testing for the library is done using pytest
,
with some additional utilities from pytest-coverage
for code coverage and pytest-cases
for test structure.
In general, writing a test and running just test
to test the whole library should be sufficient.
If you need more fine-grained control, such as only testing a particular test, please refer to this
cheatsheet.
pytest # Test whole library and examples
pytest "tests/path/to/testfile.py" # Test a particular file
pytest -k "test_name_of_my_test" # Test a particular test
⚠️ In general, you should prefer to runjust test
overpytest
if new to testing. This will run all test until it hits its first failure which allows for better incremental testing. It will also avoid running the examples which are often longer and saved for CI.
If testing any added examples, please use the just test-examples
command, which is
a shortcut for pytest "tests/test_examples.py" -x --lf
. There is unfortunately no way
to sub-select one.
If you are not sure how to test your contribution or need some pointers to get started, please reach out in the PR comments and we will be happy to assist!
To ensure a consistent code quality and to reduce noise in the PR, there are a selection of code quality tools that run.
These will be run automatically before a commit can be done with
pre-commit
. The configuration for this can be found in
.pre-commit-config.yaml
. All of these can be manually triggered using just check
.
pre-commit run --all-files
This will automatically run the below tools.
The primary linter we use is ruff
. The fixes and formatting
can be done manually as so:
ruff --fix src
ruff format src
# Or this which does both
just fix
This codebase also relies heavily on pythons typing
and mypy
to ensure correctness across modules. Running this standalone on all files can take some time,
so we don't require you to run this, our automated testers will. If you wish to do so manually,
then use just check-types
.
If any of the typing concepts are confusing, now is a good chance to learn,
and we would be happy to assist in helping properly type your PR if things do not work. If all else
fails, please feel free to introduce a # type: ignore
to tell mypy
to shut up along with a
description to why it is there. This will help future contributors and maintainers understand the
reasons behind these ignores. You can find a cheatsheet for basic mypy type
hinting here.
We follow a PR into trunk development flow (whatever that's meant to be called),
in which all development is done in feature branches and the merged into main
.
The reason for feature branches is to allow multiple maintainers to actively work on
amltk
without interfering. The main
branch is locked down, meaning commits can not be made directly to main,
and features actions to trigger releases.
The documentation is handled by mkdocs-material
with some additional plugins. This is a markdown based documentation generator which allows
custom documentation and renders API documentation written in the code itself.
Document features where the code lives. For example, if you introduce a new function, prefer to add documentation with an example in the docstring of the function itself. There is also some module level documentation which is used as a reference for the API. Most likely updating this is only required for larger changes.
You can find a collection of features for custom documentation here as well as code reference documentation here
You can find the entry point for the documentation infrastructure of mkdocs
in mkdocs.yml
.
You can live view documentation changes by running just docs
,
which will open your webbrowser and run mkdocs --serve
to watch all files
and live update any changes that occur.
Please do not ignore warnings. The CI will fail if there are any warnings and we will not merge any PR's that have warnings.
If creating an example, there is a custom format used to render .py
files
and convert them to markdown that we can host.
An example is just a python file, using the triple quote """
comments to switch between commentary and code blocks.
The first """
block is special, in that the first line,
in this case My Example Name is the name of the example,
with anything following it being simple commentary.
"""My Example Name
Here's a short description.
"""
from x import a
from y import b
"""
This is a commentary section. To see what can go in here,
take a look at https://squidfunk.github.io/mkdocs-material/reference/
Below we set some variables to some values.
!!! note "Special note"
We use the variables p, q for fun.
"""
p = 2 # (1)!
p = 3 # <!> (2)!
print(p) # (3)!
# 1. You can add annotations to lines, where the text to annotate goes at
the bottom of the code block, before the next commentary section.
https://squidfunk.github.io/mkdocs-material/reference/annotations/
# 2. You can use <!> to highlight specific lines
# 3. Anything printed out using `print` will be rendered
"""
This concludes this example, check out ./examples for examples on how
to create ... examples.
"""
This section serves as a guide for active maintainers of amltk
to
keep the ship running smoothly and help foster a growing user-base.
All maintainers must be familiar with the rest of the CONTRIBUTING.md
.
We appreciate all open-source contributions, whether that be a question, issue or PR. This also pertains to potentially first-time contributors and people new to Python and open-source in general. This includes objective non-personal criticisms. We will try to be as helpful and communicative as possible with respect to our availability, and encourage open discussion.
To foster growth and contribution, we will guide users through the library as required and encourage any and all contributions. If more work is required on a PR, please encourage users to grant access to their fork such that we can actively contribute to their contribution and utilize a collaborative approach. This will also help prevent staling contributions.
In the event of any individual who makes personal attacks or derogative comments, please maintain decorum, respond nicely, and if issues persist, then inform the user they will be blocked.
Please check the CODE_OF_CONDUCT.md
for more details.
We use squash-merge
from feature branches to keep the commit log
to the convential-commits
standard. This helps automate systems.
Please familiarize yourself with conventional commits and ensure that
the PR is up-to-date with the main
branch before merging.
There should be no manual versioning, as this will take place during releases automatically.
In the case of staling PR's, these will likely need a forceful rebase
from the main
branch. This often has a negative impact on the commit
history of a pull request but this will be removed by squash-merge
.
To keep things relatively uniform, we try support recommended workflows
through the justfile
. If there is a workflow that you prefer and is
not covered, please add your own.
Maintaining repositories is time-consuming work,
whether that be benchmarking, experimenting, testing, versioning,
issues, pull requests, documentation and anything else tangential to
code features. Any and all automation to the repository is greatly
appreciated but should be documented in the Maintainers
section.
One of the hardest parts of maintenance for a mature library,
especially one that supports integrations from both mature and
research code is managing dependencies. Where possible,
prefer not adding an explicit dependency. This mainly holds for
the required dependencies which all users must install. For
developer dependencies, please feel free to add one with good
justification. When integrating some machine learning ecosystem like
scikit-learn
or pytorch
, please try to bundle these dependencies
as optional and reflect so accordingly in the code.
There is some utility to work with optional dependencies in amltk.types
,
such as safe_isinstance
and safe_issubclass
, to not rely on the
library being installed for runtime type checking. For static
compile time type checking, please use mypy's if TYPE_CHECKING:
idiom. This will prevent runtime errors for users who do not have
these dependencies installed. For example:
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from ConfigSpace import ConfigurationSpace
def draw_configspace(self, space: ConfigurationSpace) -> None:
...
The exception to this rule is any modules a user must explicitly import
for the integration. In this case, it is fine to assume the user has the
required dependencies and any error generated is considered user error and
if possible guide them to the pip install "amltk[optional_dep]"
that
they require for the integration.
We have dependabot
enabled in the repository using
the .github/dependabot.yml
. This bot will periodically
make pull requests to the repository that update dependencies. Do
not accept these blindly but rather wait for any CI to finish and
ensure all tests still pass.
If you are not using Windows, feel free to skip this section.
- Install WSL and Ubuntu by following the steps outlined in the official Ubuntu installation guide.
- Open the cloned repo in PyCharm and navigate to "Add new Interpreter" -> "On WSL..."
- Choose WSL and specify the directory of the virtual env.
- Open a PyCharm terminal and click "New predefined session" and select Ubuntu.
Since WSL is a Linux environment, you need to install Python separately, even if you have it on your Windows machine.
In the terminal, run the following commands to set up the project dependencies:
sudo apt update && sudo apt upgrade
sudo apt install python3.10
sudo apt install python3-pip
pip install --upgrade pip setuptools
sudo apt install python3.10-venv
python3 -m venv .venv
Then create a virtual environment and activate it:
source .venv/bin/activate
just install