segmentation fault #1620

Closed
IceAndSea opened this issue Dec 7, 2020 · 55 comments
Labels
bug (Something isn't working), Stale

Comments

@IceAndSea

When I run detect.py, it prints 'segmentation fault'. My operating system is Ubuntu 20, PyTorch is 1.7, and Python is 3.8.
Can you tell me how to solve this problem?

IceAndSea added the bug label Dec 7, 2020
@github-actions
Contributor

github-actions bot commented Dec 7, 2020

Hello @IceAndSea, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at [email protected].

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@IceAndSea
Author

IceAndSea commented Dec 7, 2020 via email

@github-actions
Contributor

github-actions bot commented Jan 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@glenn-jocher
Member

Hello @kim0418,

Thank you for reporting this issue. The "segmentation fault" error usually indicates a memory access violation. This can be caused by various factors such as incompatible versions of dependencies or other system-related issues.

To help us diagnose the problem, could you please provide more information about your environment? Specifically, the versions of the following:

  • Python
  • PyTorch
  • CUDA
  • CUDNN

Additionally, it would be helpful if you could provide the complete stack trace and any relevant error messages that you receive. This will allow us to narrow down the cause of the issue and provide you with a more specific solution.
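As a rough sketch of how the requested version information could be gathered in one step, the script below prints the interpreter, CPU architecture, and a few package versions. The function name `collect_environment` and the default package list are illustrative choices, not part of YOLOv5. Note that if importing torch itself crashes the interpreter, that is already a useful detail to report.

```python
import platform
from importlib import metadata


def collect_environment(packages=("torch", "torchvision", "opencv-python")):
    """Return a dict describing the interpreter, machine, and selected package versions."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # e.g. "x86_64" or "aarch64"
        "system": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"

    # CUDA / cuDNN can only be reported if PyTorch itself imports cleanly.
    try:
        import torch

        info["cuda"] = torch.version.cuda or "none (CPU-only build)"
        info["cudnn"] = torch.backends.cudnn.version() if torch.cuda.is_available() else "n/a"
    except Exception as exc:  # ImportError, or a broken native library
        info["cuda"] = info["cudnn"] = f"unknown ({type(exc).__name__})"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue gives the maintainers the versions in a consistent form.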

Thank you for your cooperation. We'll do our best to assist you further.

Best regards,
Glenn Jocher

@kim0418

kim0418 commented Oct 16, 2023

Thank you for your answer. @glenn-jocher

python=3.11.2, torch=2.1.0.
I don't think CUDA and cuDNN are installed separately.
The only commands I ran to install were
[git clone https://github.com/ultralytics/yolov5,
cd yolov5, pip install -r requirements.txt].
And the only command I ran after that was [python3 detect.py --source 0 --weights yolov5s.pt --conf 0.25].

@glenn-jocher
Member

Hi @kim0418,

Thank you for providing the information. YOLOv5 is officially tested and supported with Python 3.8 or later and PyTorch 1.7 or later. Your Python 3.11.2 and PyTorch 2.1.0 already meet those minimums, so the version numbers alone do not explain the crash, although a particular PyTorch build can still be incompatible with your platform.

CUDA and cuDNN are optional dependencies for YOLOv5. They are recommended for performance if you have an NVIDIA GPU. You can check whether your PyTorch build can use CUDA with torch.cuda.is_available().

To narrow down the issue, I would suggest the following steps:

  1. Create a clean Python virtual environment and reinstall the requirements.
  2. Check that your torch and torchvision versions are a matching pair built for your platform.
  3. Install CUDA and cuDNN only if you have a compatible NVIDIA GPU and want to take advantage of GPU acceleration.

After completing these steps, please try running the detect.py command again. If the issue persists, please provide the complete stack trace and any additional error messages you receive, and we will further assist you.

Thank you for your understanding and cooperation. Let us know if you need further assistance.

Glenn Jocher

@kim0418

kim0418 commented Oct 17, 2023

Thank you so much for your reply @glenn-jocher .

I will attach my Raspberry Pi 4 details, the Python version, the torch version, and the code where the segmentation fault currently occurs.
It is a Raspberry Pi 4, so there is no usable GPU.
My Python version is 3.11.2.
My torch version is 2.1.0.
No CUDA, no cuDNN.

Can CUDA and cuDNN be installed and used even without a GPU?

Thank you so much for your answer.

Kim0418
[Attached image: rasp]

@glenn-jocher
Member

Thank you for providing the additional information, @kim0418.

Based on the details you've shared, it seems that you are using a Raspberry Pi 4, which does not have a compatible GPU for CUDA or CUDNN. CUDA and CUDNN are typically used for GPU acceleration in deep learning tasks.

In your case, since you don't have a GPU available, you won't be able to install or use CUDA or CUDNN. YOLOv5 can still run on the CPU without GPU acceleration, but it may be slower compared to running on a GPU.

Regarding the segmentation fault, YOLOv5 is officially tested and supported with Python 3.8 or later and PyTorch 1.7 or later. Python 3.11.2 and PyTorch 2.1.0 already satisfy these minimums, so upgrading will not help. The crash may instead come from the specific PyTorch build installed for the Raspberry Pi's ARM CPU.

To narrow down the segmentation fault, I recommend the following steps:

  1. Create a clean Python virtual environment and reinstall the requirements.
  2. Try a different matched pair of PyTorch and TorchVision versions.

After completing these steps, please try running the detect.py command again and see if the issue persists.

If you encounter any further issues or have any additional questions, please feel free to ask. We're here to assist you.

Kim0418

@kim0418

kim0418 commented Oct 17, 2023

Hello @glenn-jocher

I set breakpoints in detect.py.
As a result, I traced the problem to experimental.py.

The segmentation fault occurs at line 88:
model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, 'fuse') else ckpt.eval())

Do you happen to know the cause?

@glenn-jocher
Member

Hello @kim0418,

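Thank you for providing the additional information. The segmentation fault at line 88 of experimental.py points to the fuse() or eval() call on the loaded checkpoint. That line already guards the call with hasattr(ckpt, 'fuse'), and a missing attribute would raise a Python AttributeError rather than a segmentation fault, so the crash more likely happens inside native PyTorch code while the layers are being fused.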

To investigate further, we need to understand the execution flow and the dependencies within the experimental.py module. Could you please share the relevant code snippets leading up to line 88? This will help us identify the exact cause of the issue and provide you with a more accurate solution.
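One way to capture more detail than the bare "segmentation fault" message is Python's standard `faulthandler` module, which writes the Python-level traceback when the process receives a fatal signal. The sketch below is a minimal illustration; the helper name `enable_crash_log` and the log file location are assumptions, and it would need to be called near the top of the script being debugged.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(path):
    """Write a Python traceback to `path` if the process gets a fatal signal such as SIGSEGV."""
    log_file = open(path, "w")  # must stay open for as long as faulthandler is enabled
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical location for the crash log; any writable path works.
CRASH_LOG_PATH = os.path.join(tempfile.gettempdir(), "yolov5_crash.log")
```

Without editing any code, the same handler can be switched on from the command line with `python3 -X faulthandler detect.py --source 0 --weights yolov5s.pt`, which prints the traceback to stderr.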

Thank you for your patience and cooperation. We'll do our best to assist you further.

Glenn Jocher

@Nicoll2020

Hi @kim0418,
I have the same error, did you manage to fix this error?

@kim0418

kim0418 commented Oct 19, 2023

sorry @Nicoll2020
Unfortunately, I haven't solved it yet. I'm sorry I didn't help you.

@glenn-jocher
Member

Hi @kim0418,

I apologize for the inconvenience. Unfortunately, I haven't been able to resolve the issue yet. I apologize for not being able to help you at this time.

Thank you for understanding.

Glenn Jocher

@kim0418

kim0418 commented Oct 19, 2023

Thank you for your consideration to help.

@glenn-jocher
Member

@Nicoll2020 thanks for reaching out to us! We appreciate your interest in YOLOv5 and we're here to assist you.

Regarding your query, could you please provide more details about the specific issue or error message you're encountering? This will help us better understand the problem and provide you with appropriate guidance.

Please include any relevant code snippets, error messages, or steps to reproduce the issue. Additionally, let us know the versions of Python, PyTorch, and any other relevant software you're using.

We'll do our best to assist you and troubleshoot the problem.

Thank you for your cooperation.

Glenn Jocher, Ultralytics YOLOv5

@SohaibKtb

I have the same issue.
I have been trying to solve it for a week.
It still persists 🙁
Could you please help me solve it?

@glenn-jocher
Member

@SohaibKtb i understand that you're facing the same issue and have been trying to resolve it for a week without success. I'm here to help you troubleshoot and find a solution.

To assist you better, could you please provide more details about the specific issue you're facing? It would be helpful if you could provide any error messages or relevant code snippets. Additionally, please let me know the versions of Python, PyTorch, and any other relevant software you're using.

With this additional information, I'll be able to provide you with more targeted guidance to solve the problem.

Thank you for your patience, and I'm looking forward to helping you further.

Glenn Jocher, Ultralytics YOLOv5

@SohaibKtb

SohaibKtb commented Oct 20, 2023

Hello @glenn-jocher, thanks for your fast response.

so I am using only these packages:

python=3.11.2
torch=2.1.0
torchaudio=2.1.0
torchvision=0.16.0
ultralytics=8.0.200

with yolov5 and with my own '.pt' file.

on Raspberry pi 4:

Linux raspberrypi 6.1.0-rpi4-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.54-1+rpt2 (2023-10-05) aarch64 GNU/Linux

The only error I receive is "segmentation fault" after the "fusing layers...." message.

@glenn-jocher
Member

@SohaibKtb hello,

Thank you for providing the details of your setup and the error you are encountering.

The "segmentation fault" error after the "fusing layers...." message indicates that there is a problem during the fusion process. One possible cause could be an incompatibility between the versions of PyTorch and TorchVision you are using.

First, please ensure that the versions of PyTorch and TorchVision are compatible with each other. PyTorch 2.1.0 and TorchVision 0.16.0, which you are using, are a matching release pair, so a mismatch between these two packages is unlikely to be the cause. The prebuilt PyTorch 2.1.0 package may still misbehave on the Raspberry Pi 4, so trying another matched pair is a reasonable next step.
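A small sketch of how a torch/torchvision pair can be checked against known release lines. The table below is deliberately incomplete: it only lists the release lines that appear in this thread, and the helper names are illustrative.

```python
# Release pairs for the versions mentioned in this thread; not a complete table.
KNOWN_PAIRS = {
    "1.8": "0.9",
    "2.0": "0.15",
    "2.1": "0.16",
}


def minor(version):
    """Reduce a version such as '2.1.0+cpu' to its 'major.minor' part, '2.1'."""
    core = version.split("+")[0]
    return ".".join(core.split(".")[:2])


def is_matched_pair(torch_version, torchvision_version):
    """Return True or False for torch lines in the table, None when the line is unknown."""
    expected = KNOWN_PAIRS.get(minor(torch_version))
    if expected is None:
        return None
    return minor(torchvision_version) == expected


if __name__ == "__main__":
    print(is_matched_pair("2.1.0", "0.16.0"))
```

A `None` result only means the table has no entry for that torch release, not that the pair is wrong.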

Next, please check if your '.pt' file is compatible with the YOLOv5 version you are using. Ensure that the model architecture and weights are compatible. If you trained the model using a different version of YOLOv5, it may cause compatibility issues.

Additionally, you mentioned that you are using a Raspberry Pi 4. Please make sure you have sufficient free RAM and swap available, as running YOLOv5 on resource-constrained devices can sometimes cause issues. The Raspberry Pi 4 has no CUDA-capable GPU, so inference runs on the CPU.

If you have already checked these points and the issue still persists, please provide the complete error message and any relevant code snippets so that we can further investigate the problem.

Best regards,
Glenn Jocher

@Nicoll2020

Hi @SohaibKtb,

You are right, the solution is in the versions of PyTorch, torchvision and tensorflow.

thank you so much.

@glenn-jocher
Member

Hi @Nicoll2020,

I'm glad to hear that you have made progress in resolving the issue. Indeed, ensuring the compatibility of the PyTorch, TorchVision, and TensorFlow versions is crucial for a smooth execution of YOLOv5.

If you have any further questions or need assistance with anything else, feel free to reach out. We're here to help!

Best regards,
Glenn Jocher

@kim0418

kim0418 commented Oct 23, 2023

Hi. @glenn-jocher
Can it be solved by lowering the torch and Python versions?

@glenn-jocher
Member

@kim0418 hello,

Thank you for reaching out.

Lowering the versions of PyTorch and Python might help resolve the issue. In some cases, YOLOv5 may have compatibility issues with certain versions of PyTorch and Python. By downgrading to older versions that are known to work well with YOLOv5, you may be able to overcome the problem.

I would recommend trying a different combination of PyTorch and Python versions and see if the issue persists. You can also refer to the YOLOv5 documentation or community forum for any specific version requirements or recommendations.

Please let me know if you have any further questions or need more assistance with this.

Thank you.

Glenn Jocher, Ultralytics YOLOv5

@kurniadi92

Hi @SohaibKrb,

You are right, the solution is in the versions of the Pytorch, torchvsion and tensorflow.

thank you so much.

Hi @Nicoll2020, may I know which versions of PyTorch and torchvision you used to make it work? I have the same problem on a Raspberry Pi 4 too.

@OosululyoO

Hi, I am using a Raspberry Pi 4 too.
@glenn-jocher I followed your suggestion and changed the PyTorch and TorchVision versions.
After that, I get an "Illegal instruction" error with no other error code.
[Attached screenshot: 2023-10-23-213351_1920x1080_scrot]

@SohaibKtb

SohaibKtb commented Oct 23, 2023

Hello, it was a torch and torchvision version issue.

I downgraded Python to version 3.9.2, since Python 3.11.2 can't work with PyTorch versions older than 2.1.0.

I also downgraded Raspberry Pi OS to "2022-09-22-raspios-bullseye-arm64", since the default Python version on the latest OS is 3.11.2.

so the versions are:

python=3.9.2
torch=1.8.1
torchvision=0.9.1
torchaudio=0.10.0
ultralytics=8.0.200

this is the setup that worked for me
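For anyone wanting to compare their own installation against the versions listed above, here is a minimal sketch using the standard library. The pin dictionary simply copies three of the reported versions, and `find_mismatches` is an illustrative helper name.

```python
from importlib import metadata

# Versions reported as working in the comment above (Raspberry Pi 4, Python 3.9.2).
REPORTED_PINS = {
    "torch": "1.8.1",
    "torchvision": "0.9.1",
    "ultralytics": "8.0.200",
}


def find_mismatches(pins):
    """Return {package: (wanted, installed_or_None)} for every package that differs from its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cpu" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in find_mismatches(REPORTED_PINS).items():
        print(f"{package}: wanted {wanted}, found {installed}")
```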

thanks for your help 🙏

@OosululyoO

@SohaibKtb

Good news!

I used your settings and I can run YOLO on the Raspberry Pi 4. BUT!
There are some warnings.
[Attached screenshot: 2023-10-23-225257_1920x1080_scrot]

@glenn-jocher
Member

@SohaibKtb hello,

That's great to hear that you were able to run YOLOv5 on your Raspberry Pi 4. Congratulations on the progress!

Regarding the warning you mentioned in the screenshot, it appears to be related to the deprecation of certain functions in PyTorch. These warnings are informational and should not affect the functionality of YOLOv5.

As long as you are able to run YOLOv5 successfully and obtain the desired results, these warnings can be safely ignored.

If you have any further questions or need assistance with anything else, please feel free to ask.

Happy YOLOv5-ing!

Glenn Jocher, Ultralytics YOLOv5

@kim0418

kim0418 commented Oct 23, 2023

Hello @glenn-jocher,
In the end I could not solve it, so I switched to running it in a Docker container.

python3 detect.py --source data/images --weights yolov5s runs successfully.

But python3 detect.py --source 0 --weights yolov5s.pt does not work. I will attach the error that occurs when running it below.

I confirmed that the camera is working normally.

Command used to create the Docker container: sudo docker container run -it -d --name <container_name> --privileged --device /dev/vchiq --device /dev/video0:/dev/video0 ultralytics/yolov5:latest-arm64
[Attached screenshot: 20231024_03h10m50s_grim]

@glenn-jocher
Member

@kim0418 hi,

It's unfortunate that you were unable to resolve the issue and had to resort to implementing it as a Docker container. Thank you for sharing the error message you encountered.

In the error message, it seems that the code fails to execute when using the webcam as a source (--source 0) with the yolov5s weights. It's good to know that you have confirmed that the camera is operating fine.

To further investigate the issue, it would be helpful to obtain additional information, such as the versions of PyTorch, TorchVision, and Python that you are currently using. Additionally, providing the complete error stack trace could assist in identifying the root cause of the problem.

Looking forward to hearing back from you with more details.

Glenn Jocher, Ultralytics YOLOv5

@kim0418

kim0418 commented Oct 28, 2023

@glenn-jocher I'm sorry to contact you all of a sudden.

Can I access the Dockerfile and modify it?
When I install libcamera inside the container, meson setup fails with an error saying g++ is missing.

Can I solve this without modifying the Dockerfile?

@kim0418

kim0418 commented Oct 29, 2023

[Attached image: error code]

@glenn-jocher
Member

@kim0418 this error usually occurs when the g++ package is missing or not installed in your Docker container. To resolve this issue, you can try updating the package list and installing the g++ package in your Dockerfile before building Meson setup.

Here's an example of how you can add these steps to your Dockerfile:

# ... previous steps ...

# Update package list and install g++
RUN apt-get update && apt-get install -y g++

# Build Meson setup
RUN meson setup build

# ... continue with the rest of your Dockerfile ...

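Make sure to rebuild your Docker image after making these changes. If you prefer not to modify the Dockerfile, you can run the same apt-get commands inside the running container instead, although the installed package is lost when the container is recreated.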

By including the RUN apt-get update && apt-get install -y g++ step in your Dockerfile, you should be able to resolve the missing g++ error.

Let me know if you have any further questions or need additional assistance.

Glenn Jocher, Ultralytics YOLOv5

@kim0418

kim0418 commented Nov 1, 2023

@glenn-jocher Thank you very much for your help.

I used journalctl to look for the error.
I found the following record: unicam fe801000.csi: Wrong width or height 640x480 (remote pad set to 1296x972).
Are there any files that need to be modified to fix this?

@glenn-jocher
Member

@kim0418 hi there,

Thank you for sharing the information about the error you encountered and for investigating further using journalctl.

Regarding the unicam error message you found (unicam fe801000.csi: Wrong width or height 640x480 (remote pad set to 1296x972)), it indicates that the requested capture size (640x480) does not match the format configured on the sensor side (1296x972).

To address this issue, you may need to modify the camera configuration settings or parameters to ensure that the width and height values are set correctly. You can refer to the documentation or user manual of the camera module or seek assistance from the camera module manufacturer for guidance on adjusting the settings.

Please note that the specific files or settings to be modified can vary depending on your camera hardware and the software you are using. It's recommended to consult the camera documentation or seek support from the camera module provider for appropriate guidance.
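One thing that may be worth trying from the OpenCV side is to request the resolution named in the log explicitly and read back what the device actually reports. Whether this resolves the unicam message is not established in this thread; the sketch below only shows the mechanics. The helper names are illustrative, and the two property ids are the values of `cv2.CAP_PROP_FRAME_WIDTH` and `cv2.CAP_PROP_FRAME_HEIGHT`.

```python
# Values of cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT in OpenCV.
FRAME_WIDTH_PROP = 3
FRAME_HEIGHT_PROP = 4


def request_resolution(cap, width, height):
    """Ask the capture device for width x height and return the size it actually reports."""
    cap.set(FRAME_WIDTH_PROP, width)
    cap.set(FRAME_HEIGHT_PROP, height)
    return int(cap.get(FRAME_WIDTH_PROP)), int(cap.get(FRAME_HEIGHT_PROP))


def open_camera_at(source, width, height):
    """Open `source` with OpenCV and request a resolution (needs opencv-python and a camera)."""
    import cv2  # imported lazily so request_resolution can be used without OpenCV

    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise RuntimeError(f"could not open camera source {source!r}")
    return cap, request_resolution(cap, width, height)
```

For example, `open_camera_at(0, 1296, 972)` would request the size shown in the log; if the returned size differs, the driver did not accept the request.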

If you have any further questions or need additional assistance, please let us know.

Glenn Jocher, Ultralytics YOLOv5

@kim0418

kim0418 commented Nov 7, 2023

hello, @glenn-jocher

My journalctl log shows 'unicam fe8010000.csi : Failed to start media pipeline: -22'.
Are there any options that need to be passed when creating the Docker container?

Do you know anything about this?

@glenn-jocher
Member

@kim0418 hi there,

To address the 'unicam fe8010000.csi : Failed to start media pipeline: -22' error in your journalctl logs (-22 corresponds to EINVAL, an invalid argument, which is consistent with the width/height mismatch reported earlier), there are a few things you can try when creating your Docker container.

Firstly, it's recommended to run the Docker container with the --privileged flag, which provides the container with access to system devices and capabilities. You can also try adding the --device /dev/vchiq and --device /dev/video0:/dev/video0 flags to explicitly grant access to the camera device.

For example:

sudo docker container run -it -d --name <container_name> --privileged --device /dev/vchiq --device /dev/video0:/dev/video0 <image_name>:<tag>

Make sure to replace <container_name> with your desired name and <image_name>:<tag> with the appropriate image and version you are using.

These steps should help provide the necessary permissions for the Docker container to access the camera device.
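To confirm from inside the container that the device nodes were actually passed through, a short check like the following can help. It is a sketch using only the standard library; the function name `describe_device` is illustrative, and the two paths are the ones used in the command above.

```python
import os
import stat


def describe_device(path="/dev/video0"):
    """Report whether `path` exists, is a character device, and is accessible to this process."""
    report = {"path": path, "exists": os.path.exists(path)}
    if report["exists"]:
        mode = os.stat(path).st_mode
        report["is_char_device"] = stat.S_ISCHR(mode)
        report["readable"] = os.access(path, os.R_OK)
        report["writable"] = os.access(path, os.W_OK)
    return report


if __name__ == "__main__":
    for device in ("/dev/video0", "/dev/vchiq"):
        print(describe_device(device))
```

If a device is reported as missing or not readable inside the container, the problem lies in how the container was started rather than in YOLOv5.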

If the issue persists, please provide additional details or error messages, if any, so that we can further investigate and assist you with resolving the problem.

Best,
Glenn Jocher, Ultralytics YOLOv5

@kim0418

kim0418 commented Nov 7, 2023

hello @glenn-jocher

Is cap defined in the dataloaders.py file? Can I specify the camera's location or other information instead of using cv2.VideoCapture(0)?
I will attach the information that appears when the camera is running.
[Attached image: libcamera]

@glenn-jocher
Member

@kim0418 the attachment shows libcamera output. Note that dataloaders.py opens the camera through OpenCV's cv2.VideoCapture rather than through libcamera directly, so the camera has to be reachable as a video device that OpenCV can open. If it is not, the cause could be the specific camera model or its driver support.

In the dataloaders.py file, you can attempt to specify the camera location or information by modifying the cv2.VideoCapture() call. Instead of using cv2.VideoCapture(0), you can provide the specific device index or the path to the camera.

For example:

# Instead of
cap = cv2.VideoCapture(0)

# You can try
cap = cv2.VideoCapture('/dev/video0')  # Replace '/dev/video0' with the actual camera device path

By specifying the device path directly, you can potentially address the issue related to libcamera.

Please ensure that the path you specify matches the actual camera device on your system.
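When it is unclear which source works, the candidates can be probed in turn. The sketch below tries each one and returns the first that opens and delivers a frame. The helper name and the `open_capture` parameter are illustrative; by default it uses `cv2.VideoCapture`, which requires opencv-python to be installed.

```python
def first_working_source(candidates, open_capture=None):
    """Return the first candidate that opens and delivers a frame, or None if none does."""
    if open_capture is None:
        import cv2  # lazy import: only needed when no opener is injected

        open_capture = cv2.VideoCapture
    for source in candidates:
        cap = open_capture(source)
        try:
            if cap.isOpened():
                ok, _frame = cap.read()
                if ok:
                    return source
        finally:
            cap.release()  # always free the device before trying the next one
    return None
```

A call such as `first_working_source([0, '/dev/video0'])` tells you which value to use in place of `cv2.VideoCapture(0)`.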

If you continue to encounter issues, it may be beneficial to refer to the libcamera documentation or seek support from the libcamera community for guidance on resolving the problem.

Let me know if you require any further assistance.

@Nicoll2020

Hi @SohaibKrb,
You are right, the solution is in the versions of the Pytorch, torchvsion and tensorflow.
thank you so much.

Hi, @Nicoll2020 may i know which version Pytorch, torchvsion you used to make it work ? I have same problem with raspberry pi 4 too

Hi @kurniadi92
The following libraries worked for me
torch==2.0.1
torchvision==0.15.2
ultralytics==8.0.186
tensorflow==2.13.0
opencv-python==4.8.0.76

@glenn-jocher
Member

@Nicoll2020 Thank you for sharing the versions of PyTorch, TorchVision, and TensorFlow that worked for you on Raspberry Pi 4. This information will be helpful for others facing similar issues.

If you have any further questions or need assistance with anything else related to YOLOv5, feel free to reach out.

@kim0418

kim0418 commented Nov 12, 2023

@glenn-jocher With your help, the camera now works normally. Thank you.

The camera runs successfully, but it cannot be interrupted while running, apparently because there is no waitKey call. Where should I add the key check for interrupting it?

@glenn-jocher
Member

@kim0418 I'm glad to hear that the camera is now functioning normally.

To add an interruption key for the camera operation, you can use the following code snippet as an example to incorporate the use of the 'waitKey' function from OpenCV, which allows you to capture a key press event. You can define the key to be used for interruption, such as ESC key (27 key code) in this example:

import cv2

# Open the camera
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError('Could not open camera 0')

while True:
    # Capture the frame
    ret, frame = cap.read()
    if not ret:
        # Stop if the camera did not deliver a frame
        break

    # Display the frame
    cv2.imshow('Camera Feed', frame)

    # Check for interruption key (ESC key)
    if cv2.waitKey(1) & 0xFF == 27:
        break

# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()

In this example, the cv2.waitKey(1) & 0xFF line captures the key press event, and if the pressed key's ASCII value matches 27 (which is the code for ESC key), the loop breaks and the camera and window are released.

You can modify this example as needed to fit into your specific code structure.

If you have any further questions or need additional assistance, feel free to ask.

@kim0418

kim0418 commented Nov 12, 2023

I think I need to add waitKey to the dataloaders.py file. On which line should I add it?

@glenn-jocher
Member

@kim0418 in the dataloaders.py file, you can incorporate the use of the 'waitKey' function within the 'load_image' function to facilitate the ability to capture key press events and introduce an interruption mechanism. Specifically, you can add the 'waitKey' command right after displaying the image using OpenCV's 'imshow' function. Here's a general example:

import cv2

def load_image(self, index):
    ...
    # Your existing code to read and process the image

    # Display the image
    cv2.imshow('Image', image)

    # Poll for a key press for 1 millisecond; ESC (key code 27) requests a stop.
    # waitKey only sees keys while an OpenCV window has focus.
    # 'break' is only valid inside a loop, so signal the interruption to the caller instead.
    if cv2.waitKey(1) & 0xFF == 27:
        cv2.destroyAllWindows()
        raise KeyboardInterrupt('ESC pressed')

    # Continue with other operations or processing as needed
    ...

By adding the 'waitKey' command and checking for a specific key press (e.g., the ESC key) within the 'load_image' function, you can enable the interruption capability when images are loaded and displayed in your YOLOv5 pipeline.

Please ensure to adapt this example to fit into your dataloaders.py file and the specific logic and requirements of your YOLOv5 implementation.

If you have any further questions or need assistance with the integration, feel free to ask.

@kim0418

kim0418 commented Nov 20, 2023

hello @glenn-jocher

Thank you very much for your help. There is a problem in training the model this time, so I would like to ask you a question.

When I use YOLOv5 train.py, there are so many classes that it is difficult to decompress the whole dataset at once. Therefore, I want to train on one class and then the next. Is there a way to do this?

@glenn-jocher
Member

@kim0418 You can indeed train your YOLOv5 model to learn classes incrementally. This can be achieved by following a few steps:

  1. Prepare Data: Organize your dataset to include images and annotations for all classes. However, for the incremental learning approach, you will initially focus on a subset of classes. Ensure that the annotations accurately reflect the classes being learned at each stage.

  2. Define Data Configuration: In your data configuration file, specify the classes to be learned in each training phase. For example, if you have 10 classes, you might initially only focus on classes 1-5 in the first training phase, and then include all classes in subsequent phases.

  3. Run Training: Start the training process using the defined data configuration. For each training phase, specify the subset of classes that the model should learn. After completing training for the first subset of classes, you can expand the class set and train the model on the next set of classes.

  4. Evaluate and Iterate: Following each training phase, evaluate the model's performance and iterate on the training process as needed.

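By employing this incremental learning approach, you can gradually expand your model's knowledge to encompass all desired classes. Keep the earlier classes in the training data at every phase, because a model trained only on new classes tends to lose accuracy on the classes it learned before.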
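To make the annotations match the classes used in a given phase, the label files can be filtered. The sketch below assumes the YOLO text label format (one `class x_center y_center width height` line per box) and keeps only the selected classes, renumbering them from zero. The function name `filter_labels` is illustrative.

```python
def filter_labels(lines, keep_ids):
    """Keep only boxes whose class id is in keep_ids and renumber them 0..len(keep_ids)-1."""
    remap = {old: new for new, old in enumerate(keep_ids)}
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        if class_id in remap:
            kept.append(" ".join([str(remap[class_id])] + parts[1:]))
    return kept


if __name__ == "__main__":
    labels = ["0 0.5 0.5 0.2 0.2", "7 0.1 0.1 0.05 0.05", "3 0.3 0.4 0.1 0.1"]
    print(filter_labels(labels, keep_ids=[0, 3]))
```

Renumbering changes class ids between phases unless each phase keeps a prefix of the final class list (for example classes 0 to 4 first, then 0 to 9), so choosing the subsets that way keeps the ids stable.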

If you have any further questions or require detailed guidance on implementing this approach, feel free to ask.

@kim0418

kim0418 commented Nov 20, 2023

Can you give me an example?

@glenn-jocher
Member

@kim0418 certainly! Here's an example of how you can approach incremental learning with YOLOv5:

  1. Prepare Data Configuration:

    • Create a data configuration file, such as data.yaml, to define the classes to be learned in each training phase.
    • Define the classes for each phase:

    train: ../path_to_train_images
    val: ../path_to_validation_images

    nc: 5  # Total number of classes

    # Classes for first phase
    names: ['class1', 'class2', 'class3', 'class4', 'class5']

    
    
  2. First Training Phase:

    • Run the initial training phase with the first subset of classes:

    python train.py --data data.yaml --cfg yolov5s.yaml --epochs 50

    
    
  3. Update Data Configuration:

    • Expand the class set in the data configuration file for the next phase:

    train: ../path_to_train_images
    val: ../path_to_validation_images

    nc: 10  # Total number of classes

    # All classes
    names: ['class1', 'class2', 'class3', 'class4', 'class5', 'class6', 'class7', 'class8', 'class9', 'class10']

    
    
  4. Subsequent Training Phases:

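    • Conduct additional training phases with the expanded class set, starting from the weights of the previous phase (the path below assumes the default output directory of the first run; adjust it to your run):

    python train.py --data data.yaml --cfg yolov5s.yaml --weights runs/train/exp/weights/last.pt --epochs 50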

    
    

This approach allows you to gradually introduce and train on additional classes in a step-by-step manner. The data configuration file is updated to reflect the classes being learned at each phase, and training is performed accordingly.

Feel free to customize this example based on your specific class sets and training requirements. If you have further questions or need additional assistance, please let me know.

@kim0418

kim0418 commented Nov 20, 2023

There are 40 classes we are trying to train on, with 100,000 images in total. Decompressing all of this on Google Drive fails, which makes it difficult. That's why I want to know how to train in parts. The gradual method still has the difficulty of eventually having to extract all images of the 40 classes.

@glenn-jocher
Member

@kim0418 I understand the challenge with handling a large dataset on Google Drive. One approach for a large number of classes and images is to train in stages. This is sometimes loosely called "Curriculum Learning," although that term usually refers to ordering training examples from easy to hard.

With staged training, you start on a subset of the classes and gradually introduce additional classes as training progresses. This lets you work with part of the dataset at a time and add complexity incrementally. Here's a high-level overview of how you can apply it to your scenario:

  1. Initial Training Phase:

    • Start with a subset of classes, for example, the first 10 classes.
    • Train the model using the subset of images and annotations for these classes.
  2. Subsequent Training Phases:

    • Expand the class set for each subsequent training phase.
    • Introduce new classes and images into the training process, gradually increasing the model's exposure to additional classes over multiple training iterations.

By following this approach, you can systematically introduce the entire 40-class dataset to the model. Note that later stages still need images of the earlier classes, so this reduces how much data you handle at first but does not remove the need to extract the full dataset eventually.

If you have specific questions or require further assistance in implementing staged training with YOLOv5, feel free to ask.
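If the difficulty is extracting everything in one go, the archive can be unpacked a batch of files at a time. The sketch below assumes the dataset is a zip archive, which the thread does not confirm; the function name and batch size are illustrative.

```python
import zipfile


def extract_in_batches(archive_path, dest_dir, batch_size=1000):
    """Extract a zip archive batch_size members at a time, yielding the names in each batch."""
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        for start in range(0, len(names), batch_size):
            batch = names[start:start + batch_size]
            archive.extractall(path=dest_dir, members=batch)
            yield batch
```

Because it is a generator, the caller can pause, log progress, or stop between batches, for example `for batch in extract_in_batches('dataset.zip', 'dataset'): print(len(batch))`, where both paths are placeholders.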

@pythonstuff8

What is your raspberry os

@pythonstuff8

version

@pythonstuff8

this started happening to me when I changed to bookworm

@glenn-jocher
Member

@pythonstuff8 I apologize for any confusion, but as the maintainer of the Ultralytics YOLOv5 repository, I don't have a specific Raspberry Pi OS version to provide. However, if you're experiencing issues after changing to Debian Bookworm, it's important to ensure that all dependencies for YOLOv5 are compatible with your current OS version.

Debian Bookworm (Debian 12) became the stable Debian release in 2023, succeeding Bullseye. As reported earlier in this thread, the latest Raspberry Pi OS ships Python 3.11.2 by default, and the segmentation fault was seen with Python 3.11.2 and PyTorch 2.1.0 on a Raspberry Pi 4.

Here are a few steps you can take to troubleshoot the issue:

  1. Update & Upgrade: Make sure your system is up to date with the latest packages.

    sudo apt update
    sudo apt full-upgrade
  2. Dependencies: Check that all required dependencies for YOLOv5 are installed and compatible with Bookworm. This includes Python, PyTorch, and other libraries.

  3. Python Environment: Consider creating a new Python virtual environment to ensure a clean setup for YOLOv5 dependencies.

  4. Check Logs: Look at the system logs and the output of dmesg to see if there are any kernel or driver issues that could be causing the segmentation fault.

  5. Hardware: Ensure that your Raspberry Pi hardware is functioning correctly and that there are no issues with the SD card or memory.

If you continue to experience issues, you might want to consider the Bullseye-based Raspberry Pi OS image that another user in this thread reported as working (2022-09-22-raspios-bullseye-arm64), or seek assistance from the Raspberry Pi community forums or Debian support channels.
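To answer the "which OS version" question precisely, the release codename can be read from `/etc/os-release`. The sketch below parses that file's `KEY=value` format; the function names are illustrative.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def read_os_release(path="/etc/os-release"):
    """Read and parse the os-release file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return parse_os_release(handle.read())
```

On the Pi, `read_os_release().get('VERSION_CODENAME')` should distinguish a Bullseye image from a Bookworm one.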

8 participants