
Introduction

MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.

Major Features
  • Modular design: We decouple the OCR task into several configurable modules. Users can easily set up the training and evaluation pipelines for customized data and models with only a few lines of modification.
  • High performance: MindOCR provides pretrained weights and the training recipes used to produce them, which reach competitive performance on OCR tasks.
  • Low cost to apply: We provide easy-to-use inference tools for text detection and recognition tasks.

Installation

Dependency

To install the dependencies, please run

pip install -r requirements.txt

Additionally, please install MindSpore (>=1.9) following the official installation instructions that best fit your machine.

For distributed training, please install OpenMPI 4.0.3.
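
Once MindSpore is installed, a quick way to confirm the version satisfies the >=1.9 requirement is to compare the major/minor components of the version string (this helper and its name are illustrative, not part of MindOCR; in practice you would pass it mindspore.__version__):

```python
# Illustrative helper (not part of MindOCR): check that an installed
# MindSpore version string satisfies the >=1.9 requirement.
def version_ok(version: str, minimum: str = "1.9") -> bool:
    # Compare only (major, minor) components as integer tuples.
    to_pair = lambda v: tuple(int(x) for x in v.split(".")[:2])
    return to_pair(version) >= to_pair(minimum)

# e.g. version_ok(mindspore.__version__) after installation
print(version_ok("2.0.0"))   # True
print(version_ok("1.8.1"))   # False
```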

| Environment | Version |
| ----------- | ------- |
| MindSpore   | >=1.9   |
| Python      | >=3.7   |

Notes:

  • If you use the MX Engine for inference, the Python version should be 3.9.
  • If scikit_image cannot be imported, you can set the environment variable $LD_PRELOAD with the following command (replace path/to with your actual directory):
    export LD_PRELOAD=path/to/scikit_image.libs/libgomp-d22c30c5.so.1.0.0:$LD_PRELOAD
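
If you would rather locate the bundled libgomp programmatically than hard-code the path, a small search like the one below can help. Only the scikit_image.libs directory and file naming come from the note above; the search helper itself is illustrative:

```python
# Illustrative sketch: find the libgomp shared object bundled with
# scikit-image so its path can be exported via LD_PRELOAD.
import glob
import os
from typing import Optional

def find_libgomp(root: str) -> Optional[str]:
    # Search for e.g. .../scikit_image.libs/libgomp-*.so* anywhere under root
    pattern = os.path.join(root, "**", "scikit_image.libs", "libgomp-*.so*")
    hits = sorted(glob.glob(pattern, recursive=True))
    return hits[0] if hits else None

# Usage: find_libgomp("/path/to/site-packages"), then
#   export LD_PRELOAD=<result>:$LD_PRELOAD
```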

Install with PyPI

Coming soon

Install from Source

The latest version of MindOCR can be installed as follows:

pip install git+https://github.com/mindspore-lab/mindocr.git

Note: MindOCR is currently tested only with MindSpore>=1.9, on Linux with GPU/Ascend devices.

Quick Start

1. Model Training and Evaluation

1.1 Text Detection

We will take the DBNet model and the ICDAR2015 dataset as an example to illustrate how to configure the training process by modifying a few lines in the yaml file.

Please refer to DBNet readme for detailed instructions.

1.2 Text Recognition

We will take the CRNN model and the LMDB dataset to illustrate how to easily configure and launch the training process.

Detailed instructions can be viewed in CRNN readme.

Note: The training pipeline is fully extensible. To train other text detection/recognition models on a new dataset, please configure the model architecture (backbone, neck, head) and the data pipeline in the yaml file, then launch the training script with python tools/train.py -c /path/to/yaml_config.
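
As a sketch, the same launch can be scripted from Python. Only the tools/train.py entry point and the -c flag come from the command above; the wrapper functions here are illustrative:

```python
# Illustrative wrapper around the MindOCR training entry point.
import subprocess
import sys

def build_train_cmd(config_path: str) -> list:
    # Same invocation as: python tools/train.py -c /path/to/yaml_config
    return [sys.executable, "tools/train.py", "-c", config_path]

def launch_training(config_path: str):
    # Run the training script and raise if it exits non-zero.
    return subprocess.run(build_train_cmd(config_path), check=True)

print(" ".join(build_train_cmd("/path/to/yaml_config")))
```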

2. Inference and Deployment

2.1 Inference with MX Engine

MX, which is short for MindX, allows efficient model inference and deployment on Ascend devices.

MindOCR supports OCR model inference with MX Engine. Please refer to mx_infer for detailed illustrations.

2.2 Inference with MS Lite

Coming soon

2.3 Inference with native MindSpore

Coming soon

Model List

Text Detection
Text Recognition
  • CRNN (TPAMI'2016)
  • ABINet (CVPR'2021) [dev]
  • SVTR (IJCAI'2022) [infer only]

For the detailed performance of the trained models, please refer to configs.

For detailed inference performance using the MX Engine, please refer to mx inference performance.

Datasets

Download

We give instructions on how to download the following datasets.

Text Detection

Conversion

After downloading these datasets into the DATASETS_DIR folder, you can run bash tools/convert_datasets.sh to convert all downloaded datasets into the target format. Here is an example of converting the ICDAR2015 dataset.
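
For intuition, an ICDAR2015 detection ground-truth line has the form x1,y1,x2,y2,x3,y3,x4,y4,transcription. The sketch below parses one such line into a structured record; the output dict schema is an assumption for illustration, and tools/convert_datasets.sh defines the actual target format:

```python
# Illustrative parser for one ICDAR2015 detection ground-truth line.
# The dict schema is an assumption; the conversion script defines the
# real target format.
def parse_icdar15_line(line: str) -> dict:
    parts = line.strip().lstrip("\ufeff").split(",")
    coords = [int(x) for x in parts[:8]]
    return {
        # the transcription may itself contain commas, so re-join the tail
        "transcription": ",".join(parts[8:]),
        # group the 8 coordinates into 4 (x, y) corner points
        "points": [coords[i:i + 2] for i in range(0, 8, 2)],
    }

print(parse_icdar15_line("377,117,463,117,465,130,378,130,Genaxis Theatre"))
```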

Notes

Change Log

  • 2023/04/12
  1. Support parameter grouping, which can be configured via the grouping_strategy or no_weight_decay_params argument.
  • 2023/03/23
  1. Add dynamic loss scaler support, compatible with drop-overflow update. To enable the dynamic loss scaler, please set the type of loss_scale to dynamic. A YAML example can be viewed in configs/rec/crnn/crnn_icdar15.yaml.
  • 2023/03/20
  1. Arg names changed: output_keys -> output_columns, num_keys_to_net -> num_columns_to_net
  2. Data pipeline updated
  • 2023/03/13
  1. Add system test and CI workflow.
  2. Add ModelArts adapter to allow training on the OpenI platform. To train on OpenI:
  i)   Create a new training task on the OpenI cloud platform.
  ii)  Link the dataset (e.g., ic15_mindocr) on the webpage.
  iii) Add the run parameter `config` and set it to the yaml file path in the web UI, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'.
  iv)  Add the run parameter `enable_modelarts` and set it to True in the web UI.
  v)   Fill in the remaining fields and launch.
  • 2023/03/08
  1. Add evaluation script with the arg ckpt_load_path.
  2. Arg ckpt_save_dir is moved from system to train in the yaml file.
  3. Add drop_overflow_update control.
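
Old configs can be updated mechanically for the 2023/03/20 renames; the helper below is illustrative, and only the key names come from the change log:

```python
# Illustrative migration for the arg renames from the 2023/03/20 change.
RENAMES = {
    "output_keys": "output_columns",
    "num_keys_to_net": "num_columns_to_net",
}

def migrate_keys(cfg: dict) -> dict:
    # Rename known keys; leave everything else untouched.
    return {RENAMES.get(k, k): v for k, v in cfg.items()}

print(migrate_keys({"output_keys": ["image", "label"], "shuffle": True}))
```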

How to Contribute

We appreciate all kinds of contributions including issues and PRs to make MindOCR better.

Please refer to CONTRIBUTING.md for the contributing guideline. Please follow the Model Template and Guideline for contributing a model that fits the overall interface :)

License

This project follows the Apache License 2.0 open-source license.

Citation

If you find this project useful in your research, please consider citing:

@misc{MindSporeOCR2023,
    title={{MindSpore OCR}: MindSpore OCR Toolbox},
    author={MindSpore Team},
    howpublished={\url{https://github.com/mindspore-lab/mindocr/}},
    year={2023}
}
