Hi! If I want to run inference with a customized model, how do I enable tensor parallelism to shard the model across multiple GPUs? I couldn't find clear instructions on how to set the correct injection_policy, or whether there are other solutions.

For my specific case, I have a multimodal LLM made up of a ViT, a projector, and an LLM, but I'm not sure how to evaluate it in a sharded way with DeepSpeed.
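For context, here is a minimal sketch of what I understand the injection_policy route to look like, based on the DeepSpeed inference tutorial. `MyMultimodalLM`, `MyDecoderLayer`, the `my_model` module, the checkpoint path, and the attribute names `self_attn.o_proj` / `mlp.down_proj` are all placeholders for my custom model, and the `tensor_parallel` argument assumes a recent DeepSpeed release (older releases used `mp_size` instead):

```python
import torch
import deepspeed

# Placeholders: my custom multimodal model (ViT + projector + LLM) and its
# decoder-layer class. Replace with the real classes and checkpoint path.
from my_model import MyMultimodalLM, MyDecoderLayer

model = MyMultimodalLM.from_pretrained("path/to/checkpoint")

# injection_policy maps a transformer-layer class to the attribute names of
# the linear layers whose outputs must be all-reduced after sharding
# (typically the attention output projection and the MLP down projection).
# "self_attn.o_proj" / "mlp.down_proj" are guesses; use the names from the
# actual decoder layer.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": torch.cuda.device_count()},  # older releases: mp_size=N
    dtype=torch.float16,
    injection_policy={MyDecoderLayer: ("self_attn.o_proj", "mlp.down_proj")},
)

# Launched with: deepspeed --num_gpus=<N> infer.py
```

My understanding is that this would shard only the decoder layers across GPUs, while the ViT and projector stay replicated on every rank. Is that the intended way to handle the vision tower, or is there a better approach?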
Replies: 1 comment

Just following up on this question; grateful if anyone has suggestions!