A practical, top-down approach, starting with high-level frameworks with a focus on Deep Learning.
UPDATED VERSION: 👉 Check out my 60-page guide, No ML Degree, on how to land a machine learning job without a degree.
There are three main goals to get up to speed with deep learning:
- Get familiar with the tools you will be working with, e.g. Python, the command line, and Jupyter notebooks
- Get used to the workflow, everything from finding the data to deploying a trained model
- Build a deep learning mindset: an intuition for how deep learning models behave and how to improve them
- Spend a week on codecademy.com and learn Python syntax, the command line, and Git. If you don't have any previous programming experience, it's good to spend a few months learning how to program. Otherwise, it's easy to become overwhelmed.
- Spend one to two weeks using Pandas and Scikit-learn on Kaggle problems using Jupyter Notebook on Colab, e.g. Titanic, House Prices, and Iris. This gives you an overview of the machine learning mindset and workflow.
- Spend one month implementing models on cloud GPUs. Start with FastAI and PyTorch. The FastAI community is the go-to place for people wanting to apply deep learning and share state-of-the-art techniques.
Once you have done this, you will know how to add value with ML.
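As a rough illustration of the Pandas and Scikit-learn step above, here is a minimal sketch of the basic workflow on the Iris dataset: load data, hold out a test set, fit a baseline, and evaluate. The function name `run_iris_baseline`, the choice of logistic regression, and the split size are assumptions made for this example, not a prescribed recipe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_iris_baseline(test_size=0.25, seed=0):
    """Train a baseline classifier on Iris and return its held-out accuracy."""
    # 1. Get the data (Iris ships with scikit-learn, so nothing is downloaded).
    features, labels = load_iris(return_X_y=True)

    # 2. Hold out a test set so the score reflects unseen examples.
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=seed, stratify=labels
    )

    # 3. Fit a simple model first; more complex models come after a baseline.
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)

    # 4. Evaluate on the held-out data.
    return accuracy_score(y_test, model.predict(x_test))


if __name__ == "__main__":
    print(f"Held-out accuracy: {run_iris_baseline():.2f}")
```

The same four steps carry over to the Titanic and House Prices problems; what changes is mostly the data cleaning that happens before step 2.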
Think of your portfolio as evidence to a potential employer that you can provide value for them.
When you are looking for your first job, there are four main roles you can apply for:
1. Machine Learning Engineering,
2. Applied Machine Learning Researcher / Residencies,
3. Machine Learning Research Scientist, and
4. Software Engineering.
A lot of the work related to machine learning is pure software engineering (category 4), e.g. scaling infrastructure, but that's out of scope for this article.
It's easiest to get a foot in the door if you aim for Machine Learning Engineering roles. There are an order of magnitude more ML engineering roles than category 2 and 3 roles, they require little to no theory, and they are less competitive. Most employers prefer scaling and leveraging stable implementations, often ~1 year old, instead of allocating scarce resources to implementing SOTA papers, which are often time-consuming and seldom work well in practice.
Once you can cover your bills and have a few years of experience, you are in a better position to learn theory and advance to category 2 and 3 roles. This is especially true if you are self-taught: you often have an edge over the average university graduate. In general, graduates have weak practical skills and strong theory skills.
You'll have a mix of 3 to 10 technical and non-technical people looking at your portfolio. Regardless of their background, you want to spark the following reactions:
- the applicant has experience tackling our type of problems,
- the applicant's work is easy to understand and well organized, and
- the work was without a doubt 100% made by the applicant.
Most ML learners end up with the same portfolio as everyone else. Portfolio items include such things as MOOC participation, dog/cat classifiers, and implementations on toy datasets such as the Titanic and Iris datasets. They often indicate that you actively avoid real-world problem-solving and prefer staying in your comfort zone by copy-pasting from tutorials. These portfolio items often signal negative value instead of signaling that you are a high-quality candidate.
A unique portfolio item implies that you have tackled a unique problem without an existing solution, and thus had to engage in the type of problem-solving an employee does daily. A good starting point is to look for portfolio ideas in active Kaggle competitions, machine learning consulting projects, and demo versions of common production pipelines. Here's a Twitter thread on how to come up with portfolio ideas.
Here are rough guidelines to self-assess the strength of your portfolio:
Even though ML engineering roles are the most strategic entry point, they are still highly competitive. In general, there are ~50 software engineering roles for every ML role. Of the self-learners I know, two-thirds fail to get a foot in the door and end up taking software engineering roles instead. You are ready to look for a job when you have two high-quality projects that are well-documented, have unique datasets, and are relevant to a specific industry, say banking or insurance.
Project Type | Base score |
---|---|
Common project | -1 p |
Unique project | 10 p |

Multiplier Type | Factor |
---|---|
Strong documentation | 5x |
5000-word article | 5x |
Kaggle Medal | 10x |
Employer relevancy | 20x |

- Hireable: 5,250 p
- Competitive: 15,000 p
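The tables do not spell out how points combine, so the sketch below assumes one plausible reading: a project's score is its base score multiplied by every factor that applies, and a portfolio's score is the sum over its projects. The function names and dictionary keys are made up for this example. Under that assumption, two unique projects (one documented, written up, and employer-relevant; one documented and written up) add up to the 5,250 p "Hireable" threshold.

```python
from math import prod

# Base scores and multipliers copied from the ML engineering tables above.
BASE_SCORES = {"common": -1, "unique": 10}
MULTIPLIERS = {
    "strong_documentation": 5,
    "5000_word_article": 5,
    "kaggle_medal": 10,
    "employer_relevancy": 20,
}


def project_score(project_type, multipliers=()):
    """Assumed rule: base score times every multiplier that applies."""
    return BASE_SCORES[project_type] * prod(MULTIPLIERS[m] for m in multipliers)


def portfolio_score(projects):
    """Assumed rule: a portfolio is worth the sum of its project scores."""
    return sum(project_score(kind, mults) for kind, mults in projects)


# Hypothetical portfolio with two unique projects.
example_portfolio = [
    ("unique", ["strong_documentation", "5000_word_article", "employer_relevancy"]),
    ("unique", ["strong_documentation", "5000_word_article"]),
]

if __name__ == "__main__":
    # 10 * 5 * 5 * 20 = 5000, plus 10 * 5 * 5 = 250
    print(portfolio_score(example_portfolio))  # 5250
```

Note how the multipliers dominate: a common project stays negative no matter how well it is documented, while employer relevancy alone lifts a unique project by 20x.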
For most companies, the risk of pursuing cutting-edge research is too high, so only the biggest companies tend to need this skillset. There are smaller research organizations that hire for these positions, but these positions tend to be poorly advertised and have a bias for people in their existing community.
Many of these roles don't require a Ph.D., which makes them available to most people with a Bachelor's or Master's degree, or to self-learners with one year of focussed study.
Given the status, scarcity, and requirements for these positions, they are the most competitive ML positions. Positions at well-known companies tend to get more than a thousand applicants per position.
Day to day, these roles require that you understand and can implement SOTA papers, so that's what employers will be looking for in your portfolio.
Project type | Base score |
---|---|
Common project | -10 p |
Unique project | 1 p |
SOTA paper implementation | 20 p |

Multiplier type | Factor |
---|---|
Strong documentation | 5x |
5000-word article | 5x |
SOTA performance | 5x |
Employer relevancy | 20x |

- Hireable: 52,500 p
- Competitive: 150,000 p
Research scientist roles require a Ph.D. or equivalent experience. While the former category requires the ability to implement SOTA papers, this category requires you to come up with research ideas. The mainstream research community measures the quality of research ideas by their impact; here is a list of the venues and their impact. To have a competitive portfolio, you need two published papers in top venues in an area that's relevant to your potential employer.
Project type | Base score |
---|---|
Common project | -100 p |
An unpublished paper | 5 p |
ICML/ICLR/NeurIPS publication | 500 p |
All other publications | 50 p |

Multiplier type | Factor |
---|---|
First author paper | 10x |
Employer relevancy | 20x |

- Hireable: 20,000 p
- Competitive roles and elite PhD positions: 200,000 p
Examples:
- My first portfolio item (after 2 months of learning): Code | Write-up
- My second portfolio item (after 4 months of learning): Code | Write-up
- Dylan Djian's first portfolio item: Code | Write-up
- Dylan Djian's second portfolio item: Code | Write-up
- Reiichiro Nakano's first portfolio item: Code | Write-up
- Reiichiro Nakano's second portfolio item: Write-up
Most recruiters will spend 10-20 seconds on each of your portfolio items. Unless they can understand the value in that time frame, the value of the project is close to zero. Thus, writing and documentation are key. Here's another thread on how to write about portfolio items.
The last key point is relevancy. It's more fun to make a wide range of projects, but if you want to optimize for breaking into the industry, do all your projects in one niche. That makes your skillset highly relevant to a specific pool of employers.
Further Inspiration:
Learning how to read papers is critical if you want to get into research, and a brilliant asset as an ML engineer. There are three key areas to feel comfortable reading papers:
- Understanding the details of the most frequent algorithms, such as gradient descent, linear regression, and MLPs
- Learning how to translate the most frequent math notations into code
- Learning the basics of algebra, calculus, statistics, and machine learning
- For the first week, work through 3Blue1Brown's Essence of Linear Algebra and Essence of Calculus, and StatQuest's The Basics (of statistics) and Machine Learning. Use a spaced repetition app like Anki to memorize all the key concepts. Use images as much as possible; they are easier to memorize.
- Spend one month recoding the core concepts in Python with NumPy, including least squares, gradient descent, linear regression, and a vanilla neural network. This will reduce a lot of cognitive load down the line. Learning that notation is compact logic, and how to translate it into code, will make you feel less anxious about the theory.
- I believe the best deep learning theory curriculum is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. I use it as a curriculum, and then use online courses and internet resources to learn the details of each concept. Spend three months on part 1 of the Deep Learning Book. Use lectures and videos to understand the concepts, Khan Academy-type exercises to master each concept, and Anki flashcards to remember them long-term.
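To give a feel for the "recode the core concepts in NumPy" step, here is a minimal sketch that fits a line with gradient descent on the mean squared error and cross-checks it against closed-form least squares. The function name, learning rate, step count, and toy data are assumptions chosen for illustration. The comments show how each piece of notation maps to a line of code.

```python
import numpy as np


def fit_linear_regression(x, y, learning_rate=0.1, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # error_i = (w * x_i + b) - y_i, computed for all samples at once.
        error = (w * x + b) - y
        # Loss L = mean(error ** 2), so dL/dw = 2 * mean(error * x)
        # and dL/db = 2 * mean(error).
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Noise-free toy data generated from y = 3x + 1 (illustrative values only).
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 1.0

w, b = fit_linear_regression(x, y)

# Closed-form least squares on the same data, as a cross-check.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
w_ls, b_ls = np.polyfit(x, y, deg=1)

if __name__ == "__main__":
    print(f"gradient descent: w={w:.3f}, b={b:.3f}")
    print(f"least squares:    w={w_ls:.3f}, b={b_ls:.3f}")
```

Writing the gradient by hand once, rather than relying on a framework's autograd, is what makes the same expressions easier to recognize when they show up in papers.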
Key Books:
- Deep Learning Book by Ian Goodfellow and Yoshua Bengio and Aaron Courville.
- Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD by Jeremy Howard and Sylvain Gugger.
- Deep Learning with Python by François Chollet.
- Neural Networks and Deep Learning by Michael Nielsen.
- Grokking Deep Learning by Andrew W. Trask.
- FastAI
- Keras Slack
- Distill Slack
- PyTorch
- Emil Wallner
- S. Zayd Enam
- Catherine Olsson
- Greg Brockman V2
- Greg Brockman V1
- Andrew Ng
- Amid Fish
- Spinning Up by OpenAI
- Confession as an AI researcher
- YC Threads: One and Two
If you have suggestions or questions, create an issue or ping me on Twitter.