Hello, I'm using LibAI to accelerate glm-10b-chinese inference. Current behavior: LibAI's 2-GPU inference takes twice as long as HuggingFace's single-GPU inference (0.6 s vs. 0.3 s). Could you please help analyze the cause? Thanks.

HuggingFace inference code:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import time
import torch

ckpt_path = "./models/glm/glm_10b_cn"
# tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
# model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt_path, trust_remote_code=True)
model = model.half().cuda()
model.eval()

# Inference
# inputs = tokenizer("Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a pioneer in online education, Ng co-founded Coursera and deeplearning.ai.", return_tensors="pt")
while True:
    t0 = time.time()
    # Prompt: "The color of an orange is [MASK]."
    inputs = tokenizer("橘子的颜色是[MASK]。", return_tensors="pt")
    inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=512)
    inputs = inputs.to('cuda')
    outputs = model.generate(**inputs, max_length=512, eos_token_id=tokenizer.eop_token_id)
    torch.cuda.synchronize()  # wait for queued GPU work before reading the clock
    print("cost time", time.time() - t0)
    print(tokenizer.decode(outputs[0].tolist()))
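A side note on methodology: the loop above reports every iteration, including the first, and one-time costs (CUDA kernel autotuning, graph/kernel compilation on the LibAI/OneFlow side) can dominate early steps. Below is a minimal benchmarking sketch that averages over several iterations after a warm-up phase; the `bench` helper and its parameters are illustrative, not part of the original report:

import time
import torch

def bench(run_once, warmup=3, iters=10):
    # Discard warm-up iterations so one-time setup costs do not
    # inflate the measured per-step latency.
    for _ in range(warmup):
        run_once()
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        run_once()
    torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
    return (time.time() - t0) / iters

# Example usage against the HuggingFace model above:
# avg = bench(lambda: model.generate(**inputs, max_length=512,
#                                    eos_token_id=tokenizer.eop_token_id))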
Inference hardware:

GPU:
2x A100 (80 GB each)

CPU (excerpt from /proc/cpuinfo):
processor : 27
cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz
cpu MHz : 2294.608
cache size : 55296 KB
LibAI inference code (launch command):
python3 -m oneflow.distributed.launch --nproc_per_node 2 demo.py
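The demo.py driven by this command is not shown above. As a hedged sketch only: with two launched processes, printing latency on every rank interleaves the output, so one might restrict timing to rank 0. `timed_generate` and `generate_step` are hypothetical names, and `flow.cuda.synchronize()` / `flow.env.get_rank()` assume OneFlow's torch-aligned API:

import time
import oneflow as flow

def timed_generate(generate_step):
    # generate_step: a zero-argument callable wrapping the LibAI generation call.
    t0 = time.time()
    out = generate_step()
    flow.cuda.synchronize()  # assumption: OneFlow mirrors torch.cuda.synchronize()
    if flow.env.get_rank() == 0:  # log from a single rank to avoid interleaved output
        print("cost time", time.time() - t0)
    return out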