toasted-nutbread/yomichan-bccwj-frequency-dictionary

About

This repository contains the source code of a script that generates a frequency dictionary for use with Yomichan. It uses data from the Balanced Corpus of Contemporary Written Japanese (BCCWJ) and supports both the short unit word (SUW) and long unit word (LUW) lists. The generated dictionary file does not contain part-of-speech information, as Yomichan does not currently support this.
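
For orientation, the sketch below shows roughly what a single frequency entry might look like. It assumes the Yomichan term meta bank layout of `[expression, "freq", data]`, where `data` is either a plain number or an object carrying a reading. The function name `createFrequencyEntry` is hypothetical and is not taken from main.js, and whether the script actually emits readings is also an assumption.

```javascript
// Hypothetical helper, not part of main.js.
// Assumption: entries follow the layout [expression, 'freq', data].
function createFrequencyEntry(expression, reading, frequency) {
    // Assumption: the reading is only attached when it differs from the expression.
    const hasDistinctReading = typeof reading === 'string' && reading.length > 0 && reading !== expression;
    const data = hasDistinctReading ? {reading, frequency} : frequency;
    // No part-of-speech field is included, matching the limitation described above.
    return [expression, 'freq', data];
}
```

An entry built this way holds only the expression, an optional reading, and the occurrence count, which is why part-of-speech data from the corpus has nowhere to go.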

Links

Usage

Prerequisites

This script uses a component from Yomichan's implementation, specifically the JapaneseUtil class from japanese-util.js.

This file must be manually copied into the same directory as main.js in order for the script to work.

Running

A Node.js script is used to generate the dictionary data:

node main.js path/to/bccwj-data.tsv ./output [long-unit-words] [min-frequency]
  • [long-unit-words] (optional) - true if using the long unit words (LUW) list; false otherwise.
  • [min-frequency] (optional) - Integer representing the minimum number of occurrences. Default is 0.
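
The argument order above can be sketched as follows. This is a minimal illustration and not the code in main.js: the names `parseCliArgs` and `filterByMinFrequency` are hypothetical, an omitted `[long-unit-words]` is assumed to mean false, and a row is assumed to be kept when its count is greater than or equal to `[min-frequency]`.

```javascript
// Hypothetical sketch of the documented positional arguments; not taken from main.js.
function parseCliArgs(argv) {
    const [inputPath, outputDir, longUnitWords, minFrequency] = argv;
    if (typeof inputPath !== 'string' || typeof outputDir !== 'string') {
        throw new Error('Expected: <bccwj-data.tsv> <output-dir> [long-unit-words] [min-frequency]');
    }
    // Documented default: a minimum frequency of 0 when the argument is omitted.
    const parsedMin = typeof minFrequency === 'undefined' ? 0 : Number.parseInt(minFrequency, 10);
    if (!Number.isInteger(parsedMin) || parsedMin < 0) {
        throw new Error('min-frequency must be a non-negative integer');
    }
    return {
        inputPath,
        outputDir,
        // Assumption: only the literal string 'true' selects the LUW list.
        longUnitWords: longUnitWords === 'true',
        minFrequency: parsedMin
    };
}

// Hypothetical filter; assumes each row object has a numeric `frequency` field.
function filterByMinFrequency(rows, minFrequency) {
    return rows.filter((row) => row.frequency >= minFrequency);
}
```

In a script, the helper would typically be called as `parseCliArgs(process.argv.slice(2))`, since the first two elements of `process.argv` are the node executable and the script path.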

The data can then be added to a .zip archive using any software. The example below uses the 7z command line executable to generate the archive:

7z a -tzip -mx=9 -mm=Deflate -mtc=off -mcu=on BCCWJ-SUW.zip ./output/*.json
