The largest open-source Bengali NLP dataset (still being built; contributions are welcome)

neuropark/bengali-dataset

Bengali Dataset

Version: 0.1.0 (pre-release)

Introduction

Bengali Dataset is the largest open-source Bengali dataset for NLP. Solving NLP for Bengali comes with a broad set of challenges, and this dataset is our first step toward addressing them. In the future, the dataset will be integrated with the HuggingFace datasets library.

Number of Samples

Once complete, the dataset will contain 1M (one million) annotated samples.

Contribute

This dataset is still in the development phase. We need more contributors and developers to reach the initial goal of 1M annotated Bengali samples.

See the how-to-contribute guide.

Contact the maintainers of the dataset

Join our Discord community for further discussion.

LivingThings Community
