1. Introduction
Last Updated: 2024-05-10
What is Kaggle?
Kaggle is the largest AI & ML community, the ultimate platform for data science and machine learning enthusiasts of all levels to level up with the latest techniques and technologies. Discover a vast repository of datasets, notebooks, and pre-trained models to kickstart your next project. Participate in competitions, learn from courses, and connect with a diverse community of over 18 million users from around the globe. Whether you're a beginner or a seasoned pro, Kaggle is the place to hone your skills, stay ahead of the curve, and collaborate on cutting-edge projects.
What you'll build
In this codelab you'll create, configure, and launch a Kaggle competition. You'll also walk through the competitor experience and learn best practices for running an engaging competition.
What you'll learn
- Understand how to create and manage a Kaggle competition from the host's side
- Navigate the competitor experience, from exploration to submission
- Learn best practices for running an engaging competition
This codelab is focused on creating a competition quickly and leverages Kaggle's growing competition library.
What you'll need
- A recent web browser
- Basic knowledge of Python
2. Getting set up
Create a Kaggle Account
Visit the Kaggle website (https://www.kaggle.com/) and click "Register" to create a free account.
Verify your account
- In the upper right corner of the page click on your profile image
- Click "Your Profile"
- Click on the "Settings" button on the right side of the profile content
- Under "Phone Verification", follow the instructions to verify your account
3. Creating your first competition
Introducing AI generated competition templates
AI Generated Competitions is a new feature on Kaggle that allows users to create machine learning competitions quickly and easily. It leverages AI to generate synthetic datasets that mimic the statistical properties of existing datasets without containing any personally identifiable information.
Here's how it works:
- Choose a template: Select from a list of templates based on different machine learning tasks (e.g., classification, regression).
- AI generates a dataset: Kaggle's AI creates a new dataset for your competition based on your chosen template. This dataset is similar to the original but uses a subset of features and has slightly different feature distributions.
- Customize your competition: Enter basic details like the competition name, description, and timeline. You can also choose the privacy settings for your competition.
- Launch: After finalizing the details, you're ready to launch your competition.
This feature streamlines the competition creation process, making it accessible to more users and enabling them to focus on the machine learning aspects rather than dataset preparation.
Create a competition
Navigate to https://www.kaggle.com/competitions/new and select "New AI Generated Competition".
Select the "Regression with a Crab Age Dataset" competition.
Competition Details
Fill out a descriptive name and subtitle. For example, you could use "<Your Name>'s Test Crab Competition" as the title and "Creating my first competition to see how it works" as the subtitle. Note that the competition URL is automatically filled in based on the title.
Visibility and Access
We now need to set the visibility and access for the competition.
Visibility
- Public: Your competition is visible to anyone on Kaggle. It'll show up in search results, so anyone interested can join.
- Private: Your competition is hidden from public view. It won't appear in searches, and only people you specifically invite can participate.
Who Can Join
- Anyone: This is like an open door policy. Anyone on Kaggle can join your competition.
- Only people with a link: This is more exclusive. You'll generate a special link, and only people with that link can join.
- Restricted email list: This is the most controlled option. You provide a list of specific email addresses or domains (like @yourschool.edu), and only people with those addresses can join.
We'll talk more about the Enable Notebooks and Models setting later. For now, make sure it is toggled on. For our example competition, set Visibility to "Private" and Who Can Join to "Only people with a link".
Read and agree to the terms and click "Create Competition".
4. Understanding and configuring your competition
Behind the scenes we've created a completely new competition with a unique dataset. Let's do a quick review of the competition settings.
Host Tab
The host tab contains everything you need as a host to properly configure your competition. Specifically see the page list on the right of the page:
Basic Details
This section includes:
- General
- Privacy, Access & Resources
- Timeline
- Scoring & Teams
We covered the General and Privacy sections when launching the competition.
Timeline
The competition end date is timezone aware.
Scoring & Teams
The Scoring & Teams section allows you to control how many folks can join a team, how many times they can submit each day, and how many of their submissions they need to choose for final evaluation.
Images
Images allows you to customize the banner and thumbnail for your competition. This will affect the home page of the competition as well as the listing entry for your competition.
Hosts
Here you can add other Kaggle users as hosts for your competition. Other hosts will have full access to your competition, including the ability to launch it.
Evaluation Metric
The Evaluation Metric tab is the heart of the competition. When creating a competition from scratch, this is where you need to think carefully about which evaluation (or scoring) metric to use, upload your solution file, define the public/private test split, and provide a sample submission. However, since we used a generated competition, we don't need to do any of this!
Scoring Metric
This determines how a submission is scored against the solution file. Each metric has documentation and actual code available.
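To make the idea of a scoring metric concrete, here is a minimal sketch of mean absolute error (MAE), a common metric for regression problems. Whether your competition uses MAE is an assumption; check the Scoring Metric section for the metric actually selected and its real code. The function name and sample values below are hypothetical.

```python
def mean_absolute_error(solution, submission):
    """Average absolute difference between true and predicted values."""
    if len(solution) != len(submission):
        raise ValueError("solution and submission must have the same number of rows")
    if not solution:
        raise ValueError("cannot score an empty solution")
    return sum(abs(t - p) for t, p in zip(solution, submission)) / len(solution)


# Hypothetical ages from a solution file and a competitor's predictions
true_ages = [9, 12, 7, 10]
predicted_ages = [10, 11, 7, 13]

score = mean_absolute_error(true_ages, predicted_ages)
print(score)  # 1.25
```

Lower is better for an error metric like this one: a perfect submission would score 0.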
Solution File
Since we are using a generated competition, this file is unique to your competition!
The Solution Sampling setting lets you adjust how much of the solution file is used to score submissions during the competition (the public leaderboard) versus how many rows are used to determine the final leaderboard (the private leaderboard). During the competition, competitors select which of their submissions count toward the private leaderboard; the Scored Private Submissions setting controls how many they can select.
This process ensures that competitors are not rewarded for overfitting or flooding with submissions.
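The public/private split can be pictured with a short sketch. This is only an illustration of the idea, not Kaggle's actual sampling logic, which this codelab does not describe; the function name, the 20% public fraction, and the row count are all assumptions.

```python
import random


def split_solution(row_ids, public_fraction, seed=0):
    """Randomly assign each solution row to the public or private leaderboard."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(row_ids)
    rng.shuffle(shuffled)
    n_public = round(len(shuffled) * public_fraction)
    return set(shuffled[:n_public]), set(shuffled[n_public:])


# Hypothetical solution file with 1,000 rows; 20% are scored during the competition
public_ids, private_ids = split_solution(range(1000), public_fraction=0.2)
print(len(public_ids), len(private_ids))  # 200 800
```

Because the private rows are never scored until the end, a model tuned only to climb the public leaderboard gains no advantage on the final one.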
Sandbox Submissions
These allow competition hosts to check that scoring works as anticipated and to set "benchmark" submissions for competitors to compare against. These benchmark submissions will show up on the leaderboard.
Teams & Submissions
During the competition this allows the hosts to download all the scores, as well as manage teams. Before the competition starts, this is empty.
Launch Checklist
This will be covered in the next section!
5. Launching your competition
From the top of the competition page, click on the "Launch Checklist" button.
Launch Checklist
The Launch Checklist shows the steps required before launching a competition. Since we started from a competition template, most of these steps are already completed! Only two tasks remain: setting a deadline and updating the competition rules.
Set Deadline
First, click on the arrow next to Set Deadline. Competitions usually last at least a couple of months. The maximum length for a competition is one year.
Edit Rules
Your competition rules need to be updated from the default template before launching. If you are running this competition for a class or group, this is a good place to put any information about expectations.
Launch
We are ready for launch! Go ahead and launch your competition! You are now ready for competitors to join!
6. Competitor Experience
Now that you've launched your competition, let's take a look at what the competitor experience looks like. We'll cover joining the competition and making a submission. For this, you can join the Google IO Demo Competition here: https://www.kaggle.com/competitions/google-io-demo-competition
Joining the competition
After navigating to the competition home page, click the "Join Competition" button in the upper right, then read and acknowledge the rules.
Making your first submission
Go to the Code tab and click "New Notebook". This will open a notebook from which you can submit to the competition.
First we will read in the train and test data.
# read the train and test data
import pandas as pd
train = pd.read_csv('/kaggle/input/google-io-demo-competition/train.csv')
test = pd.read_csv('/kaggle/input/google-io-demo-competition/test.csv')
Let's take a look at the data.
# take a look at some of the data
train.head()
Let's prepare the data for training. In this case we drop Sex because it's not a numeric value. (Hint: figuring out how to include this should improve the performance of your model.)
# separate the features from the target (Age) in the training data
data = train.drop(columns=['Age', 'Sex'])
answers = train['Age']
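One possible way to follow the hint about Sex is one-hot encoding, sketched below with pandas. This is an assumption about how you might include the column, not the approach the codelab prescribes. The tiny DataFrame is a made-up stand-in for the training data; only the Sex and Age column names come from the notebook, and the category values shown are illustrative.

```python
import pandas as pd

# Tiny stand-in for the training data; these values are made up for illustration
train_demo = pd.DataFrame({
    "id": [0, 1, 2],
    "Sex": ["M", "F", "I"],
    "Weight": [24.6, 10.4, 5.7],
    "Age": [9, 8, 6],
})

# Replace the text column with one 0/1 indicator column per category
encoded = pd.get_dummies(train_demo, columns=["Sex"], dtype=int)

print(sorted(c for c in encoded.columns if c.startswith("Sex_")))
# ['Sex_F', 'Sex_I', 'Sex_M']
```

If you try this, apply the same encoding to the test data so that both frames end up with identical feature columns.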
Then we create a model. In this case we use a random forest.
# imports for the model
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
model = RandomForestRegressor()
# train the model
model.fit(data, answers)
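The imports above include train_test_split and mean_absolute_error, which the notebook does not otherwise use. One way to put them to work is to estimate model quality on held-out rows before submitting. The sketch below uses synthetic data rather than the competition files, and the 20% hold-out size, the seeds, and the variable names are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and target (not the competition data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows so the score reflects data the model has not seen
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

holdout_model = RandomForestRegressor(n_estimators=50, random_state=0)
holdout_model.fit(X_train, y_train)

validation_mae = mean_absolute_error(y_valid, holdout_model.predict(X_valid))
print(f"Validation MAE: {validation_mae:.3f}")
```

In the notebook you would pass data and answers in place of X and y. A local validation score gives quick feedback without spending one of your daily submissions.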
Create a submission:
predictions = model.predict(test.drop(columns=['Sex']))
submission = pd.DataFrame({'id': test['id'], 'Age': predictions})
submission.to_csv('submission.csv', index=False)
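Before submitting, it can help to sanity-check the submission frame. The helper below is a hypothetical sketch: the checks it performs are assumptions about what a well-formed file looks like for this notebook (an id column and an Age column), not Kaggle's own validation rules.

```python
import pandas as pd


def check_submission(submission, expected_ids):
    """Return a list of problems found in a submission DataFrame."""
    problems = []
    if list(submission.columns) != ["id", "Age"]:
        problems.append("columns should be exactly ['id', 'Age']")
        return problems  # the remaining checks need both columns
    if submission["Age"].isna().any():
        problems.append("Age contains missing values")
    if sorted(submission["id"]) != sorted(expected_ids):
        problems.append("ids do not match the test set")
    return problems


# Made-up example rows; in the notebook you would pass submission and test['id']
demo = pd.DataFrame({"id": [1, 2, 3], "Age": [9.5, 11.0, 7.2]})
print(check_submission(demo, expected_ids=[1, 2, 3]))  # []
```

An empty list means none of these particular checks failed; the competition's sample submission remains the authoritative reference for the expected format.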
Then you can submit to the competition by selecting "Submit to Competition" on the right side menu.
Tips for running a great competition
- Make sure to include a starter notebook that makes a basic submission
- Encourage use of the discussions and sharing notebooks early in the competition
- Have fun!