Country-Level Dialectal Arabic Lists: An Unsupervised Approach

Maha-J-Althobaiti/Twt15DA_Lists

Twt15DA_Lists

Country-Level Dialectal Arabic Lists: An Unsupervised Approach

This repository provides lists of dialectal Arabic words for 15 countries, collected from Twitter. Each word in a country's list is given with its PMI score, which represents how strongly the word is associated with that dialect.

The unsupervised approach used to build the lists is an iterative procedure with three main steps: automatic creation of dialectal word lists, selection of seed words, and collection of dialectal sentences. The Pointwise Mutual Information (PMI) association measure, together with the geographical frequency of word occurrence online, was used to classify dialectal words. To extract the dialectal words, the approach exploits the poor performance of a Modern Standard Arabic (MSA) part-of-speech (POS) tagger on dialectal Arabic content.
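As a rough illustration of how a PMI score relates a word to a dialect, the sketch below computes the textbook PMI, log2( P(word, country) / (P(word) * P(country)) ), from raw token counts. This is an assumption for illustration only: the function name `pmi_scores`, the input layout, and the base-2 logarithm are hypothetical, and the exact formulation and preprocessing used to produce the released lists are described in the paper and may differ.

```python
import math
from collections import Counter


def pmi_scores(tokens_by_country):
    """Return {country: {word: PMI(word, country)}}.

    tokens_by_country maps a country label to a list of tokens collected
    for that country (hypothetical input layout, not the repository's format).
    """
    word_country = Counter()   # joint counts of (word, country)
    word_total = Counter()     # counts of each word over all countries
    country_total = Counter()  # token counts per country

    for country, tokens in tokens_by_country.items():
        for tok in tokens:
            word_country[(tok, country)] += 1
            word_total[tok] += 1
            country_total[country] += 1

    n = sum(country_total.values())  # total number of tokens
    scores = {country: {} for country in tokens_by_country}
    for (word, country), joint in word_country.items():
        # P(w, c) / (P(w) * P(c)) simplifies to joint * n / (count(w) * count(c))
        ratio = joint * n / (word_total[word] * country_total[country])
        scores[country][word] = math.log2(ratio)
    return scores
```

With placeholder tokens, `pmi_scores({"A": ["x", "x", "y"], "B": ["y", "y", "z"]})` gives a positive score to a word that occurs only in one country (`x` in `A`) and a negative score to a word that is relatively more frequent elsewhere (`y` in `A`). A higher score means a stronger association between the word and that country's dialect.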

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

https://creativecommons.org/licenses/by-nc-nd/4.0/

You are free to:
	Share — copy and redistribute the material in any medium or format 

Under the following terms:
	Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
	NonCommercial — You may not use the material for commercial purposes.
	NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material. 

Please cite our paper in any published work using this resource:

@article{althobaiti2021creation,
  title={Creation of annotated country-level dialectal Arabic resources: An unsupervised approach},
  author={Althobaiti, Maha J},
  journal={Natural Language Engineering},
  pages={1--42},
  year={2021},
  publisher={Cambridge University Press}
}
