[Doc] Better doc for distributed RBs #2378

Open
wants to merge 39 commits into base: main
Changes from 1 commit
amend
vmoens committed Aug 7, 2024
commit 68d9819ef5d5c2daf0014c072f38eda57a41277e
4 changes: 2 additions & 2 deletions examples/replay-buffers/distributed_replay_buffer.py
@@ -1,6 +1,6 @@
 """
-Example use of a distributed replay buffer
-==========================================
+Example use of a distributed replay buffer (custom)
+===================================================
 
 This example illustrates how a skeleton reinforcement learning algorithm can be implemented in a distributed fashion
 with communication between nodes/workers handled using `torch.rpc`.
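The examples in this PR split the algorithm into a Trainer node and a ReplayBuffer node that exchange `extend`/`sample` calls over `torch.rpc`. As a rough, framework-free sketch of that request/response pattern (the class, the thread-based transport, and all names below are ours, standing in for the `torch.rpc` machinery in the actual example):

```python
import queue
import random
import threading


class ReplayBufferService:
    """Toy replay buffer served over request/response queues,
    standing in for the remote torch.rpc buffer in the example."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.storage = []
        self.requests = queue.Queue()

    def serve(self):
        # Buffer-node loop: handle one request at a time until shutdown.
        while True:
            op, payload, reply = self.requests.get()
            if op == "extend":
                self.storage.extend(payload)
                self.storage = self.storage[-self.capacity :]
                reply.put(len(self.storage))
            elif op == "sample":
                k = min(payload, len(self.storage))
                reply.put(random.sample(self.storage, k))
            elif op == "shutdown":
                reply.put(None)
                break

    def call(self, op, payload=None):
        # Trainer-side blocking "RPC": send a request, wait for the reply.
        reply = queue.Queue()
        self.requests.put((op, payload, reply))
        return reply.get()


buffer = ReplayBufferService()
server = threading.Thread(target=buffer.serve, daemon=True)
server.start()

# Trainer side: push transitions, then sample a training batch.
buffer.call("extend", [{"obs": i, "reward": 0.0} for i in range(32)])
batch = buffer.call("sample", 8)
buffer.call("shutdown")
print(len(batch))  # -> 8
```

The real example replaces the queue pair with `rpc.remote`/`rpc.rpc_sync` calls between named workers, but the control flow is the same: the buffer node blocks on incoming requests while trainer nodes treat it as a synchronous service.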
12 changes: 3 additions & 9 deletions examples/replay-buffers/distributed_replay_buffer_multiproc.py
@@ -1,6 +1,6 @@
 """
-Example use of a distributed replay buffer
-==========================================
+Example use of a distributed replay buffer (single node)
+========================================================
 
 This example illustrates how a skeleton reinforcement learning algorithm can be implemented in a distributed fashion
 with communication between nodes/workers handled using `torch.rpc`.
@@ -20,15 +20,9 @@
 
 """
 
-import os
-import sys
-import time
-
-import torch.distributed.rpc as rpc
-
-from distributed_rb_utils import main
 from torch import multiprocessing as mp
 
+from distributed_rb_utils import main
 
 REPLAY_BUFFER_NODE = "ReplayBuffer"
 TRAINER_NODE = "Trainer"
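The single-node variant launches every role on one machine with `torch.multiprocessing`, conventionally giving rank 0 to the replay buffer and the remaining ranks to trainers. A minimal sketch of that rank-to-role mapping, using a thread pool so it runs anywhere (`REPLAY_BUFFER_NODE` and `TRAINER_NODE` come from the example script; everything else here is ours):

```python
from concurrent.futures import ThreadPoolExecutor

# Node names as used in the example script.
REPLAY_BUFFER_NODE = "ReplayBuffer"
TRAINER_NODE = "Trainer"


def worker(rank: int, world_size: int) -> str:
    # Rank 0 hosts the replay buffer; every other rank is a trainer.
    role = REPLAY_BUFFER_NODE if rank == 0 else TRAINER_NODE
    return f"{role}:{rank}/{world_size}"


world_size = 3
with ThreadPoolExecutor(max_workers=world_size) as pool:
    roles = list(pool.map(worker, range(world_size), [world_size] * world_size))

print(roles)  # -> ['ReplayBuffer:0/3', 'Trainer:1/3', 'Trainer:2/3']
```

In the actual example each worker would also call `rpc.init_rpc` with its rank before doing any work, which is what lets the trainer ranks address the buffer rank by name.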
14 changes: 3 additions & 11 deletions examples/replay-buffers/distributed_replay_buffer_submitit.py
@@ -1,6 +1,6 @@
 """
-Example use of a distributed replay buffer
-==========================================
+Example use of a distributed replay buffer (submitit)
+=====================================================
 
 This example illustrates how a skeleton reinforcement learning algorithm can be implemented in a distributed fashion
 with communication between nodes/workers handled using `torch.rpc`.
@@ -20,18 +20,10 @@
 
 """
 
-import os
-import sys
-import time
-
 import submitit
 
-import torch.distributed.rpc as rpc
-
-from distributed_rb_utils import main
-from torch import multiprocessing as mp
-from torchrl._utils import logger as torchrl_logger
 
+from distributed_rb_utils import main
 
 DEFAULT_SLURM_CONF = {
     "timeout_min": 10,
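The submitit variant hands `main` to a SLURM scheduler instead of spawning processes locally. A hedged sketch of that launch path: only `"timeout_min": 10` is visible in the diff (the rest of `DEFAULT_SLURM_CONF` is hidden behind the fold, so we do not guess at it), the helper name is ours, and the `submitit` calls follow its public `AutoExecutor` API but are not exercised here:

```python
# Only "timeout_min": 10 appears in the visible diff; the remaining
# DEFAULT_SLURM_CONF fields are elided in the PR and left out here.
DEFAULT_SLURM_CONF = {
    "timeout_min": 10,
}


def launch_on_slurm(fn, conf=None, *args):
    """Submit fn(*args) as a SLURM job via submitit (requires a cluster)."""
    import submitit  # imported lazily so the sketch loads without submitit

    executor = submitit.AutoExecutor(folder="submitit_logs")
    executor.update_parameters(**(conf or DEFAULT_SLURM_CONF))
    return executor.submit(fn, *args)  # returns a submitit Job handle
```

On a cluster one would call `job = launch_on_slurm(main)` and later block on `job.result()`; both `update_parameters` and `submit` are part of submitit's documented executor interface.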