Hi! If I want to run inference with a customized model, how do I enable tensor parallelism to shard the model across multiple GPUs? I couldn't find clear instructions on how to set the correct injection_policy, or whether there are other solutions.

For my specific case, I have a multimodal LLM made up of a ViT, a projector, and an LLM, but I'm not sure how to evaluate it in a sharded way with DeepSpeed.
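For context, here is a minimal sketch of what I understand the injection_policy route to look like, based on the DeepSpeed inference tutorial. `MyMultimodalLM`, `MyDecoderLayer`, the `my_model` module, the checkpoint path, and the attribute names `self_attn.o_proj` / `mlp.down_proj` are all placeholders for my custom model, and the `tensor_parallel` argument assumes a recent DeepSpeed release (older releases used `mp_size` instead):

```python
import torch
import deepspeed

# Placeholders: my custom multimodal model (ViT + projector + LLM) and its
# decoder-layer class. Replace with the real classes and checkpoint path.
from my_model import MyMultimodalLM, MyDecoderLayer

model = MyMultimodalLM.from_pretrained("path/to/checkpoint")

# injection_policy maps a transformer-layer class to the attribute names of
# the linear layers whose outputs must be all-reduced after sharding
# (typically the attention output projection and the MLP down projection).
# "self_attn.o_proj" / "mlp.down_proj" are guesses; use the names from the
# actual decoder layer.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": torch.cuda.device_count()},  # older releases: mp_size=N
    dtype=torch.float16,
    injection_policy={MyDecoderLayer: ("self_attn.o_proj", "mlp.down_proj")},
)

# Launched with: deepspeed --num_gpus=<N> infer.py
```

My understanding is that this would shard only the decoder layers across GPUs, while the ViT and projector stay replicated on every rank. Is that the intended way to handle the vision tower, or is there a better approach?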
Replies: 1 comment

Just following up on this question; grateful if anyone has suggestions!