DBNet trains very slowly on Ascend910B3 #683
Comments
@zx214 Hello, thank you for your feedback.
For reference, here is our log from training DBNet ResNet-50 with MindSpore r2.2.11 on the same hardware:
Thanks, I'll give it a try.
Hello, I installed MindSpore 2.2.12 and CANN 7.0.0.beta1 following the official site. With the unmodified source code, the model compilation time was fine. To meet my task requirements I added summary recording and modified the data-loading part, leaving the model itself untouched, but model compilation still takes very long. What could be causing the slow model compilation?
@zx214 Hello. From your description, it sounds like the long delay is between launching the Python script and the start of training computation, which is not necessarily MindSpore's static-graph compilation taking long. We suggest the following approach to narrow down which code is actually slow:
From your description, possible causes of the long delay are:
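One way to delimit the slow stage is to log wall-clock time around each phase of the training script. The sketch below is illustrative only; the stage names and the places where you call `mark()` are placeholders, not from this thread:

```python
import time

class StageTimer:
    """Minimal helper: record elapsed wall-clock time between named stages."""

    def __init__(self):
        self.last = time.perf_counter()
        self.records = {}

    def mark(self, stage_name):
        # Record the time elapsed since the previous mark under `stage_name`.
        now = time.perf_counter()
        self.records[stage_name] = now - self.last
        print(f"[timing] {stage_name}: {self.records[stage_name]:.2f}s")
        self.last = now

timer = StageTimer()
# ... import mindspore, construct the dataset here ...
timer.mark("imports_and_dataset")
# ... build the network and run the first training step here ...
timer.mark("first_step")  # static-graph compilation completes within the first step
```

Comparing the per-stage times tells you whether the bottleneck is process startup, dataset construction, or the first-step graph compilation.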
@zx214 Hello. Judging from the error log you posted, the message does not appear in the MindSpore source code, so the problem is probably not introduced by MindSpore. Please check your Ascend environment configuration: whether the hardware is installed correctly, and whether the CANN package is installed properly (for example, whether the te and hccl wheel packages were installed after installing CANN).
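A quick way to confirm that the te and hccl wheels are visible from the current Python environment is shown below. This is a sketch: `te` and `hccl` are the module names typically shipped with the CANN wheels; adjust if your installation differs.

```python
import importlib.util

# A None result from find_spec means the wheel is not importable
# from this interpreter, i.e. it is missing or in a different env.
for module_name in ("te", "hccl"):
    spec = importlib.util.find_spec(module_name)
    status = "found" if spec is not None else "MISSING"
    print(f"{module_name}: {status}")
```

Run this with the same Python interpreter you use for training; a "MISSING" result points to an incomplete CANN installation rather than a MindSpore problem.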
1. Environment: Ascend910B3, MindSpore 2.2.0, CANN 7.0.rc1; 1012 training images, 253 test images, batch size 8
2. Training log (attached):
dbnet_log.log
3. YAML configuration:
```yaml
system:
  mode: 0  # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: False
  amp_level: 'O0'
  seed: 42
  log_interval: 100
  val_while_train: True
  drop_overflow_update: False
  ckpt_max_keep: 0

common:
  num_epochs: &num_epochs 1
  batch_size: &batch_size 1
  num_workers: &num_workers 8
  ckpt_save_dir: &ckpt_save_dir './tmp_det'
  ckpt_load_path: &ckpt_load_path ./tmp_det/best.ckpt
  dataset_root: &dataset_root ./dataset_ic15/det
  resume: &resume True
  lr: &lr 0.007
  resume_epochs: 0
  data_shape: (736,1280)

model:
  type: det
  transform: null
  backbone:
    name: det_resnet50
    pretrained: True
  neck:
    name: DBFPN
    out_channels: 256
    bias: False
  head:
    name: DBHead
    k: 50
    bias: False
    adaptive: True
  resume: True
  pretrained: False

postprocess:
  name: DBPostprocess
  box_type: quad          # whether to output a polygon or a box
  binary_thresh: 0.3      # binarization threshold
  box_thresh: 0.6         # box score threshold
  max_candidates: 1000
  expand_ratio: 1.5       # coefficient for expanding predictions

metric:
  name: DetMetric
  main_indicator: f-score

loss:
  name: DBLoss
  eps: 1.0e-6
  l1_scale: 10
  bce_scale: 5
  bce_replace: bceloss

scheduler:
  scheduler: polynomial_decay
  lr: *lr
  num_epochs: *num_epochs
  decay_rate: 0.9
  warmup_epochs: 3

optimizer:
  opt: SGD
  filter_bias_and_bn: false
  momentum: 0.9
  weight_decay: 1.0e-4

# only used for mixed precision training
loss_scaler:
  type: dynamic
  loss_scale: 512
  scale_factor: 2
  scale_window: 1000

train:
  ckpt_save_dir: *ckpt_save_dir
  dataset_sink_mode: False
  gradient_accumulation_steps: 1
  dataset:
    type: DetDataset
    dataset_root: *dataset_root
    data_dir: train/images
    label_file: train_gt.txt
    sample_ratio: 1.0
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - DetLabelEncode:
      - RandomColorAdjust:
          brightness: 0.1255  # 32.0 / 255
          saturation: 0.5
      - RandomHorizontalFlip:
          p: 0.5
      - RandomRotate:
          degrees: [ -10, 10 ]
          expand_canvas: False
          p: 1.0
      - RandomScale:
          scale_range: [ 0.5, 3.0 ]
          p: 1.0
      - RandomCropWithBBox:
          max_tries: 10
          min_crop_ratio: 0.1
          crop_size: [ 640, 640 ]
          p: 1.0
      - ValidatePolygons:
      - ShrinkBinaryMap:
          min_text_size: 8
          shrink_ratio: 0.4
      - BorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - NormalizeImage:
          bgr_to_rgb: False
          is_hwc: True
          mean: imagenet
          std: imagenet
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: [ 'image', 'binary_map', 'mask', 'thresh_map', 'thresh_mask' ]  # 'img_path']
    output_columns: ['image']  # for debug op performance
    loader:
      shuffle: True
      batch_size: *batch_size
      drop_remainder: True
      num_workers: *num_workers

eval:
  ckpt_load_path: tmp_det/best.ckpt
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: *dataset_root
    data_dir: val/images
    label_file: test_gt.txt
    sample_ratio: 1.0
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - DetLabelEncode:
      - DetResize:  # GridResize 32
          target_size: [ 736, 1280 ]
          keep_ratio: False
          limit_type: none
          divisor: 32
      - NormalizeImage:
          bgr_to_rgb: False
          is_hwc: True
          mean: imagenet
          std: imagenet
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the labels for evaluation
    output_columns: [ 'image', 'polys', 'ignore_tags', 'shape_list' ]
    net_input_column_index: [0]   # input indices for network forward func in output_columns
    label_column_index: [1, 2]    # input indices marked as label
    loader:
      shuffle: False
      batch_size: 1  # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
      drop_remainder: False
      num_workers: 2
```
4. The slowness currently breaks down into two parts: (1) compiling the kernel_meta information takes over an hour; (2) the training itself is slow. How can these be resolved?
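For the kernel_meta / graph compilation cost specifically, one commonly suggested mitigation is enabling MindSpore's compile cache, so repeated runs reuse compiled artifacts instead of recompiling from scratch. This is a sketch: verify the environment variable names against the MindSpore documentation for your version.

```python
import os

# Assumed MindSpore compile-cache environment variables (check the docs
# for your MindSpore version). They must be set before importing mindspore.
os.environ["MS_COMPILER_CACHE_ENABLE"] = "1"               # reuse compiled graphs across runs
os.environ["MS_COMPILER_CACHE_PATH"] = "./compile_cache"   # directory for cached artifacts

# import mindspore  # import only after the environment is configured
```

The first run still pays the full compilation cost; subsequent runs with an unchanged network should load from the cache. This does not address per-step training speed, which is a separate issue.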