TypeError: __init__() got an unexpected keyword argument 'flags' #463
Comments
Is your omegaconf version 2.1.0? Please check.
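One way to run this check is sketched below. It prints the installed omegaconf version and reports whether `DictConfig.__init__` accepts the `flags` keyword that `libai/config/lazy.py` passes. The helper names `accepts_keyword` and `installed_version` are made up for this illustration and are not part of libai or omegaconf.

```python
import inspect
from importlib import metadata


def accepts_keyword(cls, name):
    """Return True if cls.__init__ accepts a keyword argument called `name`."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("omegaconf version:", installed_version("omegaconf"))
    try:
        from omegaconf import DictConfig
    except ImportError:
        print("omegaconf is not importable in this environment")
    else:
        print("DictConfig accepts flags:", accepts_keyword(DictConfig, "flags"))
```

Run it with the same Python interpreter that launches training (the traceback in this issue shows the `py39` environment), since a different environment may have a different omegaconf installed.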
I updated omegaconf to 2.1.0, but now there is another error:
bash tools/train.sh tools/train_net.py projects/T5/configs/mt
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system bein
[02/22 17:17:52 libai]: Rank of current process: 0. World size: 2
from libai.config import LazyCall
from configs.common.train import train
from projects.T5.configs.optim import optim
train_data_path = "projects/T5/data/training_data/part_0"
micro_batch_size = 64
# dataloader
dataloader = OmegaConf.create()
model = LazyCall(T5ForPreTraining)(cfg=cfg)
# model config
model.cfg.vocab_size = 12902
train.zero_optimization.enabled = True
[02/22 17:17:53 libai]: Full config saved to projects/T5/output/mt5_output/config.yaml
You could try installing the latest oneflow.
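Could you confirm which model size configuration you used for training?
LiBai, like other libraries such as Megatron, provides pretraining tasks for its models, so to check the results you can evaluate the pretraining metrics on a test set. If you want to train a complete T5, that is, one that matches what LiBai achieves when it runs inference with the T5 weights, you also need to fine-tune the pretrained model on multiple downstream tasks and then evaluate it.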
I want to run the T5 example with the command below, but it fails with an error. How can I fix it?
export CUDA_VISIBLE_DEVICES=2,3
bash tools/train.sh tools/train_net.py projects/T5/configs/mt5_pretrain.py 2
bash: /home/qyh/anaconda3/envs/syl-env/lib/libtinfo.so.6: no version information available (required by bash)
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W20230222 16:59:05.227458 4178164 rpc_client.cpp:190] LoadServer 127.0.0.1 Failed at 0 times error_code 14 error_message failed to connect to all addresses
Traceback (most recent call last):
  File "/resources/qyh/big-model/libai/tools/train_net.py", line 25, in <module>
    from libai.config import LazyConfig, default_argument_parser, try_get_key
  File "/resources/qyh/big-model/libai/libai/__init__.py", line 20, in <module>
    from libai import data
  File "/resources/qyh/big-model/libai/libai/data/__init__.py", line 17, in <module>
    from .build import (
  File "/resources/qyh/big-model/libai/libai/data/build.py", line 35, in <module>
    train_sampler=LazyCall(CyclicSampler)(shuffle=True),
  File "/resources/qyh/big-model/libai/libai/config/lazy.py", line 123, in __call__
    return DictConfig(content=kwargs, flags={"allow_objects": True})
TypeError: __init__() got an unexpected keyword argument 'flags'
Traceback (most recent call last):
  File "/resources/qyh/big-model/libai/tools/train_net.py", line 25, in <module>
    from libai.config import LazyConfig, default_argument_parser, try_get_key
  File "/resources/qyh/big-model/libai/libai/__init__.py", line 20, in <module>
    from libai import data
  File "/resources/qyh/big-model/libai/libai/data/__init__.py", line 17, in <module>
    from .build import (
  File "/resources/qyh/big-model/libai/libai/data/build.py", line 35, in <module>
    train_sampler=LazyCall(CyclicSampler)(shuffle=True),
  File "/resources/qyh/big-model/libai/libai/config/lazy.py", line 123, in __call__
    return DictConfig(content=kwargs, flags={"allow_objects": True})
TypeError: __init__() got an unexpected keyword argument 'flags'
Killing subprocess 4172867
Killing subprocess 4172868
Traceback (most recent call last):
  File "/home/qyh/anaconda3/envs/py39/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/qyh/anaconda3/envs/py39/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/qyh/anaconda3/envs/py39/lib/python3.9/site-packages/oneflow/distributed/launch.py", line 240, in <module>
    main()
  File "/home/qyh/anaconda3/envs/py39/lib/python3.9/site-packages/oneflow/distributed/launch.py", line 228, in main
    sigkill_handler(signal.SIGTERM, None)
  File "/home/qyh/anaconda3/envs/py39/lib/python3.9/site-packages/oneflow/distributed/launch.py", line 196, in sigkill_handler
    raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command '['/home/qyh/anaconda3/envs/py39/bin/python3', '-u', 'tools/train_net.py', '--config-file', 'projects/T5/configs/mt5_pretrain.py']' returned non-zero exit status 1.