Multi card parallel #51

Open
Re-dot-art opened this issue Jan 17, 2024 · 17 comments

Comments

@Re-dot-art

[image attachment]
May I ask whether there is a multi-GPU parallel issue here? Is there a solution? I ran the training with four V100 GPUs.

@Adamdad
Owner

Adamdad commented Jan 17, 2024

In the paper, we train all models with Distributed Data Parallel on 8x V100 GPUs, so parallel training should work. Can you check your own environment and PyTorch distributed configuration?
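
(As a sanity check, here is a minimal generic PyTorch DDP script — not this repo's launch code — that you can run to verify multi-GPU communication works in your environment:)

```python
# Generic PyTorch DDP sanity check (not part of this repo).
# Launch with: torchrun --nproc_per_node=4 ddp_check.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL is the standard backend for multi-GPU training
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(16, 2).cuda(local_rank), device_ids=[local_rank])
    loss = model(torch.randn(8, 16).cuda(local_rank)).sum()
    loss.backward()
    print(f"rank {dist.get_rank()} / world {dist.get_world_size()} OK")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```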

@Code-of-Liujie

[image attachment]

@Adamdad
Owner

Adamdad commented Jan 17, 2024

[image attachment]

Please indicate which config you are using. All configs in this repo define model.train_cfg. For example:

train_cfg=dict(
    assigner=dict(type='DynamicSoftLabelAssigner', topk=13, iou_factor=2.0),
    alpha=1,
    beta=6,
    allowed_border=-1,
    pos_weight=-1,
    debug=False),
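
(A quick way to confirm where train_cfg ends up after config inheritance — a sketch assuming mmcv 1.x, matching mmdet 2.x; the config path below is illustrative:)

```python
# Sketch: load a config from this repo and print the merged model.train_cfg.
# Assumes mmcv 1.x (mmdet 2.x); the exact path is illustrative.
from mmcv import Config

cfg = Config.fromfile('configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p.py')
print(cfg.model.train_cfg)  # should show the DynamicSoftLabelAssigner settings above
```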

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Adamdad
Owner

Adamdad commented Jan 17, 2024

> Do you want to try it remotely?

Sorry, but what do you mean by "try it remotely"?

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Adamdad
Owner

Adamdad commented Jan 17, 2024

> _base_ = './consistent_teacher_r50_fpn_coco_180k_10p.py'
> fold = 1
> percent = 10
> data = dict(
>     train=dict(
>         sup=dict(
>             ann_file="data/coco_semi/semi_supervised/instances_train2017.${fold}@${percent}.json",
>         ),
>         unsup=dict(
>             ann_file="data/coco_semi/semi_supervised/instances_train2017.${fold}@${percent}-unlabeled.json",
>         ),
>     ),
> )
> log_config = dict(
>     delete=True,
>     interval=50,
>     hooks=[
>         dict(type="TextLoggerHook", by_epoch=False),
>         dict(
>             type="WandbLoggerHook",
>             init_kwargs=dict(
>                 project="consistent-teacher",
>                 name="${cfg_name}",
>                 config=dict(
>                     fold="${fold}",
>                     percent="${percent}",
>                     work_dirs="${work_dir}",
>                     total_step="${runner.max_iters}",
>                 ),
>             ),
>             by_epoch=False,
>         )
>     ],
> )
>
> Where are you talking about train_cfg?

In the first line, we import hyper-parameters from another base config by default:

_base_ = './consistent_teacher_r50_fpn_coco_180k_10p.py'

The train_cfg is defined in that base config.

Sorry, but I cannot assist you through remote control.
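
(To illustrate the inheritance mechanism with a self-contained sketch — the file names below are made up — keys that only appear in the base file are merged into the child config:)

```python
# Self-contained illustration of _base_ inheritance in mmcv-style configs.
# File names are made up for the demo; assumes mmcv 1.x.
from mmcv import Config

with open('base_cfg.py', 'w') as f:
    f.write("model = dict(train_cfg=dict(alpha=1, beta=6))\n")

with open('child_cfg.py', 'w') as f:
    f.write("_base_ = './base_cfg.py'\nfold = 1\npercent = 10\n")

cfg = Config.fromfile('child_cfg.py')
print(cfg.model.train_cfg)    # inherited from base_cfg.py
print(cfg.fold, cfg.percent)  # defined in the child config
```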

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Adamdad
Owner

Adamdad commented Jan 17, 2024

> Here this mmdetection config is the latest version, but the mmdet in the environment is ==2.28.1; it is not caused by this reason.

I would suggest downgrading the mmdetection version of the base config as well.

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Adamdad
Owner

Adamdad commented Jan 17, 2024

> The lower version of mmdetection doesn't. Is it possible to do without mmdetection in this directory?

You can also copy the older-version config of mmdet into a local folder, and change the base config path in your current config.
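
(A sketch of that change — the local folder name below is made up; point it at wherever you copy the file:)

```python
# Hypothetical: after copying the older-version base config into a local folder,
# point _base_ at the local copy instead of the original path.
_base_ = './local_configs/consistent_teacher_r50_fpn_coco_180k_10p.py'
```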

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Adamdad
Owner

Adamdad commented Jan 17, 2024

>     wandb_login._login(anonymous=anonymous, force=force, _disable_warning=True)
>   File "D:\Anaconda3\envs\teacher\lib\site-packages\wandb\sdk\wandb_login.py", line 238, in _login
>     wlogin.prompt_api_key()
>   File "D:\Anaconda3\envs\teacher\lib\site-packages\wandb\sdk\wandb_login.py", line 174, in prompt_api_key
>     raise UsageError("api_key not configured (no-tty). call " + directive)
> wandb.errors.UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])

Please register on wandb and put your username and API key in the code.
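
(A sketch of the two usual ways to provide the key; the key value is a placeholder:)

```python
# Option 1: set the key in the environment before launching training, e.g.
#   export WANDB_API_KEY=<your_api_key>
# Option 2: log in explicitly before the WandbLoggerHook initializes.
import wandb

wandb.login(key="<your_api_key>")  # placeholder; copy the real key from https://wandb.ai/authorize
```

If wandb logging is not needed, the WandbLoggerHook entry can also be removed from log_config.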

@Code-of-Liujie

Code-of-Liujie commented Jan 17, 2024 via email

@Code-of-Liujie

Code-of-Liujie commented Jan 18, 2024 via email
