
0.2 - Updated with v13 Mozilla Common Voice data

@KathyReid KathyReid released this 28 Mar 10:42
· 13 commits to master since this release
4abd602

The key changes in this version are:

  • The number of categories identified in the data has increased from 16 in the first version to 20 in this one. The four additional categories are:

    • Linguistic heritage of speaker - indicating the speaker's language acquisition or immersion heritage, such as time spent in a location, or being born or raised in a location.
    • Socio-economic marker - indicating a speaker's association with a socio-economic group or class, such as Middle Class.
    • Hybrid dialect - indicating that the speaker uses a dialect formed where two languages have come into contact, such as Denglish (German, or Deutsch, and English) and Hinglish (Hindi and English, spoken in India).
    • Generational marker - indicating the speaker's association with a generation, which implies their age range, such as Gen Z.
  • The number of individual accents identified has increased from 164 in the first version to 235 in this one.

  • The number of relationships between individual accents has increased from 297 in the first version to 515 in this one. Each relationship indicates a co-occurrence between speaker-described accents, such as "German" and "England English".
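
To illustrate what a co-occurrence relationship means here, the following is a minimal sketch, not the code used to produce this release. It assumes each speaker's accent description has already been split into a list of accent labels; the function name `cooccurrence_edges` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(speaker_accents):
    """Count how often each unordered pair of accents is described together.

    speaker_accents: iterable of lists, one list of accent labels per speaker
    (an assumed input shape, not the dataset's actual format).
    """
    edges = Counter()
    for accents in speaker_accents:
        # Deduplicate and sort so ("A", "B") and ("B", "A") map to the same key.
        unique = sorted(set(accents))
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges


# Hypothetical sample: two speakers describe both accents, one describes a single accent.
sample = [
    ["German", "England English"],
    ["England English", "German"],
    ["Hinglish"],
]
edges = cooccurrence_edges(sample)
```

In this sketch, the number of distinct keys in `edges` corresponds to the number of relationships, and a speaker who describes only one accent contributes no relationship.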