See the react interface live here (for as long as I keep the API running).
Search for Steam games based off a text description.
Describe the kind of game you're looking for, and get a list of games that closest match the vibe you're looking for based off the store description and reviews.
- Python
- Linux or Mac
- Ability to execute
.sh
scripts. - Windows untested, might work with WSL. (GPU support unknown/unlikely)
- Ability to execute
- ~20 GB disk space to hold libraries and models.
- A GPU with > 6 GB VRAM
- Lots of time 😊
- Clone this repository.
- Install Python, if you don't already have it.
Download prepopulated database here (X.XX GB).Will let you skip to step 04, bypassing GPU requirement and many hours of calling Steam API.- Not yet available.
- There are multiple numbered folders in this repository that should be run in order.
- In each directory, there is an
init.sh
script that will create a virtual environment directory (e.g..venv_01
for step 01) and install the required dependencies for that step. - Once the virtual environment is created, activate it (e.g.
source ./.venv_01/bin/activate
), then runpython run.py
. - The first step will initialize and populate a database.
- You should use the same database file for all steps.
- Scripts will only add/process new data for that step, so you can run them multiple times as you add more data to the database.
- Requirements: A web connection, time (~6 hours)
- Overview: Queries the Steam API to populate the database with steam app ids, game names, store descriptions, and top 100 reviews (decided by Steam).
- Use
--new
to create a new database. - Rate limited to 1 request per 1.5 seconds, to meet Valve's rate limiting policy.
- Currently only populates randomly selected app ids with store descriptions and reviews per run.
- Controllable with
--limit <number>
(default 5,000) - Populating the database with all 180k app ids and their reviews will take about a week.
- Controllable with
- Takes about 2 hours for 1,000 app ids, 6-10 hours for 5,000 app ids.
- Re-running the script with the same database will add more app ids to the database.
- Example invocation:
python run.py --db ./steam.db --limit 1000
- Requirements: A CUDA GPU (highly recommended), or without that, time (a couple hours)
- Overview: Creates embeddings for the store descriptions and reviews.
- Uses Instructor to create embeddings.
- Can specify which model is used with the
--model-name
argument.--model-name hkunlp/instructor-large
(default, requires 2.5 GB VRAM)--model-name hkunlp/instructor-xl
(highly recommended, requires ~6 GB VRAM)
- With RTX 3090, takes about 30 minutes per 5,000 new items added by step 01.
- Example invocation:
python run.py --db ./steam.db --model-name hkunlp/instructor-xl
- Requirements: A good CPU is recommended
- Overview: Uses hnswlib to create a nearest neighbor index for the embeddings.
- Work in progress.
- Requirements: Fairly low. A CPU with 3-6 GB of RAM.
- Overview: Queries the database for games that match a given query.
- Takes about 10 seconds to complete and return results.
- Most of the time spent in this script is loading Instructor to generate an embedding for your query.
- Important: Use the same
--model-name
you used to generate the database in step 02. - You can use
--query-for-type <type>
to limit search toall
,description
orreview
embeddings. - Example invocation:
python run.py --db ./steam.db --query "I want a game that is like a mix of Minecraft and Skyrim"
- Requirements: Same as 04
- Overview: A Flask API that can be used to query the database.
- Important: Not very configurable at the current moment. Has hardcoded assumptions about the database and model.
- Includes Dockerfile and docker-compose.yml for easy deployment.
- Can also be executed with
gunicorn
orflask run
. - Example invocation:
python run.py
- Requirements: Static file hosting.
- Overview: A simple React interface for querying the API.
- Important: Not very configurable at the current moment. API is hardcoded to my home computer.
- Use
npm run build
to build the static files (available in./build
). - Use
npm start
to start a development server.
Note: These results are from a small testing dataset of 5,000 games. The results should be better with a larger dataset.
Output is formatted as:
<app_id>: <game_name> (<source>)
Match: <cosine_similarity>
app_id
: The Steam app id for the game.game_name
: The name of the game.source
: The source of the embedding. Eitherdescription
orreview
.cosine_similarity
: The cosine similarity between the query and the embedding.
1748370: The Hunt (review)
Match: 78.49%
1466060: Tainted Grail: The Fall of Avalon (review)
Match: 78.28%
611670: The Elder Scrolls V: Skyrim VR (review)
Match: 78.03%
2157290: Azarine Heart (review)
Match: 77.88%
641980: Block Survival: Legend of the Lost Islands (review)
Match: 77.04%
367030: Lootfest (review)
Match: 76.45%
2175570: Eternamine (review)
Match: 76.21%
1604030: V Rising (review)
Match: 76.20%
1179410: Lootcraft (review)
Match: 76.15%
1507910: Survival Engine (review)
Match: 76.06%
848260: Super GTR Racing (description)
Match: 80.96%
1586930: Extreme racing (description)
Match: 80.66%
808080: Ultimate Racing 2D (description)
Match: 80.59%
1318280: OverShoot Battle Race (review)
Match: 80.46%
661290: Arizona Derby (review)
Match: 80.11%
397640: Nulldrifters (review)
Match: 79.83%
1814150: Atomic Cyclecar Racing (review)
Match: 79.77%
2178650: Fastraq (description)
Match: 79.64%
1634090: Mini Racing World (review)
Match: 79.59%
1927710: Supreme Race on Highway (review)
Match: 79.41%
1758320: Lofty Quest (review)
Match: 85.18%
360840: Lumini (review)
Match: 83.94%
1783370: Aery - Dreamscape (review)
Match: 83.53%
1536520: Equinoxe (review)
Match: 81.90%
612740: Bokida - Heartfelt Reunion (review)
Match: 81.85%
406220: Gnomes Vs. Fairies (review)
Match: 81.68%
2487460: Lost Forest (review)
Match: 81.53%
1121140: Ionia (review)
Match: 81.53%
1941890: Santo Island Incident (review)
Match: 81.43%
1481000: Triversal (review)
Match: 81.43%
Related Repository: https://github.com/Netruk44/repo-search
- Previous version of this idea.
- Searches through text documents/code for content that matches a description.