Comet ML doesn't log on segment/train.py #12693

Closed
1 of 2 tasks
james-imi opened this issue Jan 31, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@james-imi

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

I installed comet_ml, exported the environment variables, and created a .comet.config file. Training still neither creates a new Comet ML project nor opens an existing one.

python segment/train.py --img 640 --epochs 40 --data settings/opg-pathology-data.yaml --weights yolov5m-seg.pt --seed 1111 --optimizer SGD --device 0 --hyp settings/opg-pathology-exp1.yaml --project opg-pathology-v3 --name opg-pathology-exp1 --batch-size 8 --cache
segment/train: weights=yolov5m-seg.pt, cfg=, data=settings/opg-pathology-data.yaml, hyp=settings/opg-pathology-exp1.yaml, epochs=40, batch_size=8, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=opg-pathology-v3, name=opg-pathology-exp1, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=1111, local_rank=-1, mask_ratio=4, no_overlap=False
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-282-g9cdbd1de Python-3.10.13 torch-1.12.1+cu113 CUDA:0 (NVIDIA GeForce RTX 3070 Laptop GPU, 7957MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.4, hsv_v=0.3, degrees=0.0, translate=0.1, scale=0.2, shear=0.0, perspective=0.0, flipud=0.5, fliplr=0.5, mosaic=0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir opg-pathology-v3', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=2

                 from  n    params  module                                  arguments                     
  0                -1  1      5280  models.common.Conv                      [3, 48, 6, 2, 2]              
  1                -1  1     41664  models.common.Conv                      [48, 96, 3, 2]                
  2                -1  2     65280  models.common.C3                        [96, 96, 2]                   
  3                -1  1    166272  models.common.Conv                      [96, 192, 3, 2]               
  4                -1  4    444672  models.common.C3                        [192, 192, 4]                 
  5                -1  1    664320  models.common.Conv                      [192, 384, 3, 2]              
  6                -1  6   2512896  models.common.C3                        [384, 384, 6]                 
  7                -1  1   2655744  models.common.Conv                      [384, 768, 3, 2]              
  8                -1  2   4134912  models.common.C3                        [768, 768, 2]                 
  9                -1  1   1476864  models.common.SPPF                      [768, 768, 5]                 
 10                -1  1    295680  models.common.Conv                      [768, 384, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  2   1182720  models.common.C3                        [768, 384, 2, False]          
 14                -1  1     74112  models.common.Conv                      [384, 192, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  2    296448  models.common.C3                        [384, 192, 2, False]          
 18                -1  1    332160  models.common.Conv                      [192, 192, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  2   1035264  models.common.C3                        [384, 384, 2, False]          
 21                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  2   4134912  models.common.C3                        [768, 768, 2, False]          
 24      [17, 20, 23]  1    828127  models.yolo.Segment                     [2, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], 32, 192, [192, 384, 768]]
Model summary: 302 layers, 21675199 parameters, 21675199 gradients, 70.3 GFLOPs

Transferred 493/499 items from yolov5m-seg.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 82 weight(decay=0.0), 85 weight(decay=0.0005), 85 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning /home/james/workspace/Adra/train_df/yolov5/new_yolov5/yolov5/PATHOLOGY_FEATURES_v2/labels/train.cache... 882 images, 300 backgrounds, 0 corrupt: 100%|████
train: Caching images (0.8GB ram): 100%|██████████| 882/882 [00:00<00:00, 3519.14it/s]
val: Scanning /home/james/workspace/Adra/train_df/yolov5/new_yolov5/yolov5/PATHOLOGY_FEATURES_v2/labels/val.cache... 224 images, 70 backgrounds, 0 corrupt: 100%|█████████
val: Caching images (0.2GB ram): 100%|██████████| 224/224 [00:00<00:00, 1096.30it/s]

AutoAnchor: 5.63 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to opg-pathology-v3/opg-pathology-exp1/labels.jpg...

Environment

Latest YOLO

Minimal Reproducible Example

python segment/train.py --img 640 --epochs 40 --data settings/opg-pathology-data.yaml --weights yolov5m-seg.pt --seed 1111 --optimizer SGD --device 0 --hyp settings/opg-pathology-exp1.yaml --project opg-pathology-v3 --name opg-pathology-exp1 --batch-size 8 --cache

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@james-imi james-imi added the bug Something isn't working label Jan 31, 2024
@glenn-jocher
Member

@james-imi hello! Thanks for reaching out and providing detailed information about the issue you're facing with Comet ML integration.

It seems like you've already checked the basics, such as exporting environment variables and setting up the .comet.config file. Here are a few steps you can take to troubleshoot the issue further:

  1. Check Comet ML Installation: Ensure that Comet ML is installed correctly in your environment. You can do this by running pip show comet-ml to see if the package is present and which version is installed.

  2. Environment Variables: Double-check that your Comet ML API key and other necessary environment variables are correctly set in your environment. Sometimes, a simple typo can cause issues.

  3. Comet ML Configuration: Verify that your .comet.config file is correctly formatted and located in your home directory or the root of your project.

  4. Logging Level: Increase the logging level of Comet ML to see more detailed output, which might give you clues about what's going wrong.

  5. Check for Errors: Look for any error messages in the console output related to Comet ML. They can provide insights into what might be causing the logging issue.

  6. Comet ML Version: Ensure that you are using a version of Comet ML that is compatible with the current version of YOLOv5.
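
Checks 1 to 3 above can be sketched as a small local script. This is only an illustration, not part of YOLOv5 or Comet: the helper name `check_comet_setup` is hypothetical, and the search order (current directory, then home directory) is an assumption taken from step 3. It uses only the Python standard library, never contacts Comet, and reports whether a key is present without printing it.

```python
import configparser
import importlib.util
import os
from pathlib import Path


def check_comet_setup(search_dirs=None, env=None):
    """Summarise the local Comet setup without contacting Comet (hypothetical helper)."""
    env = os.environ if env is None else env
    if search_dirs is None:
        # Assumed search order: project root (current directory), then home directory.
        search_dirs = [Path.cwd(), Path.home()]

    report = {
        "package_installed": importlib.util.find_spec("comet_ml") is not None,
        "api_key_in_env": bool(env.get("COMET_API_KEY")),
        "config_file": None,
        "config_has_api_key": False,
        "config_project_name": None,
    }

    for directory in search_dirs:
        candidate = Path(directory) / ".comet.config"
        if not candidate.is_file():
            continue
        parser = configparser.ConfigParser()
        parser.read(candidate)
        report["config_file"] = str(candidate)
        if parser.has_section("comet"):
            # Record only whether a key exists, never its value.
            report["config_has_api_key"] = bool(parser.get("comet", "api_key", fallback=""))
            report["config_project_name"] = parser.get("comet", "project_name", fallback=None)
        break  # stop at the first config file found
    return report


if __name__ == "__main__":
    for key, value in check_comet_setup().items():
        print(f"{key}: {value}")
```

Run it from the same shell and conda environment used for training, so that it sees the same environment variables and working directory.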

If after following these steps you're still facing issues, please provide any error messages or additional information that could help diagnose the problem. If necessary, you can also refer to our documentation for more details on integrating Comet ML with YOLOv5.

Remember, the YOLO community and the Ultralytics team are here to help! 🚀

@james-imi
Author

For #1

Name: comet-ml
Version: 3.37.0
Summary: Supercharging Machine Learning
Home-page: https://www.comet.com
Author: Comet ML Inc.
Author-email: [email protected]
License: Proprietary
Location: /home/james/anaconda3/envs/yolov5/lib/python3.10/site-packages
Requires: dulwich, everett, jsonschema, psutil, python-box, requests, requests-toolbelt, rich, semantic-version, sentry-sdk, simplejson, six, urllib3, websocket-client, wrapt, wurlitzer
Required-by:

For #2, the environment variable is set:

export COMET_API_KEY=0av....

For #3, the .comet.config file is properly formatted:

[comet]
api_key=0av...
project_name=name

There are no Comet-related errors in the console output.

Contributor

github-actions bot commented Mar 2, 2024

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label Mar 2, 2024
@github-actions github-actions bot closed this as not planned on Mar 12, 2024
@glenn-jocher
Member

Hey @james-imi! Thanks for confirming the details. It looks like your Comet ML setup is correct, and the environment is properly configured. Since there are no logging errors and everything seems in place, let's try a couple of quick checks:

  • Restart: Sometimes, a simple restart of your session or terminal can help, especially after setting environment variables or making configuration changes.
  • Script Check: Ensure that the Comet integration code or hooks are correctly implemented in your segment/train.py script. Comet should initiate when the training starts.
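
One way to approach the script check is to search the training scripts for any reference to Comet. The sketch below is an assumption-laden illustration: `find_mentions` is a hypothetical helper, and comparing `segment/train.py` against the top-level `train.py` is simply a suggestion for spotting differences between the two entry points. What it finds depends on the YOLOv5 version checked out, and a script with no matches may still reach Comet through modules it imports.

```python
from pathlib import Path


def find_mentions(script_path, needle="comet"):
    """Return (line_number, text) pairs for lines containing `needle`, case-insensitively."""
    hits = []
    text = Path(script_path).read_text(encoding="utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if needle.lower() in line.lower():
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Paths are relative to the repository root; adjust if running elsewhere.
    for script in ("train.py", "segment/train.py"):
        if not Path(script).is_file():
            print(f"{script}: not found (run from the repository root)")
            continue
        hits = find_mentions(script)
        print(f"{script}: {len(hits)} line(s) mention 'comet'")
        for number, line in hits:
            print(f"  {number}: {line}")
```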

If the issue persists, it might be helpful to manually insert a few logging statements in your training script to confirm that the Comet API is being hit. You can also try a minimal example with Comet to isolate whether the issue is with the integration or something specific to your current project setup.
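
A minimal example along those lines might look like the sketch below. It assumes `comet_ml` is installed and that the API key is available through `COMET_API_KEY` or `.comet.config`, as described earlier in this thread. The function name `comet_smoke_test`, the project name, and the metric name are placeholders chosen for illustration. If an experiment shows up in the Comet UI after running it, the account and credentials work, and the problem is more likely in how the training script connects to Comet.

```python
def comet_smoke_test(project_name="comet-smoke-test", steps=3):
    """Create one Comet experiment, log a few dummy metrics, and close it.

    Returns True if the calls completed, False if comet_ml cannot be imported.
    """
    try:
        import comet_ml
    except ImportError:
        print("comet_ml is not importable in this environment")
        return False

    # The API key is expected to come from COMET_API_KEY or .comet.config.
    experiment = comet_ml.Experiment(project_name=project_name)
    try:
        for step in range(steps):
            # Dummy values only; the point is to confirm that logging reaches Comet.
            experiment.log_metric("dummy_loss", 1.0 / (step + 1), step=step)
    finally:
        # Always close the experiment so buffered data is sent.
        experiment.end()
    return True
```

Call `comet_smoke_test()` from a Python session started in the same environment used for training. It needs network access and a valid key.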

Keep me posted on how it goes! 🌟

3 participants
@glenn-jocher @james-imi and others