Releases: paithiov909/gibasa

v1.1.1

06 Jul 05:49

What's Changed

  • tokenize now issues a warning instead of throwing an error when it receives invalid input during partial parsing. As a result, a single invalid string no longer aborts the whole tokenize call; parsing of that string is simply skipped.
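
Because the failure now surfaces as a warning, callers who want to log or inspect skipped inputs can capture it with base R condition handling. The following is a minimal sketch and not part of gibasa: `with_warnings` is a hypothetical helper, and the `warning()` call merely stands in for a tokenizer that skips an invalid string.

```r
# Hypothetical helper (not part of gibasa): evaluate an expression,
# collect the messages of any warnings it raises, and still return its value.
with_warnings <- function(expr) {
  warnings <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message, then resume evaluation after the warning.
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = warnings)
}

# Stand-in for a parser that warns about bad input and carries on.
demo <- with_warnings({
  warning("skipped an invalid string")
  "remaining tokens"
})
```

In practice the braces would wrap a call such as `tokenize(x, partial = TRUE)`. The exact warning message that gibasa emits is not given in these notes, so the message above is only a placeholder.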

Full Changelog: v1.1.0...v1.1.1

v1.1.0

17 Feb 04:01

What's Changed

Full Changelog: v1.0.1...v1.1.0

v1.0.1

03 Dec 01:29

New Feature: dictionary compiler is integrated 🚀

This release adds wrappers around MeCab's 'dictionary compiler'.
With source dictionaries and CSV files, you can build MeCab system/user dictionaries without leaving your R console.

Even in environments where MeCab is not installed, such as Posit Cloud, you can try this snippet right away!!

require(gibasa)

if (requireNamespace("withr")) {
    # create a sample dictionary in temporary directory
    build_sys_dic(
        dic_dir = system.file("latin", package = "gibasa"),
        out_dir = tempdir(),
        encoding = "utf8"
    )
    # copy the 'dicrc' file
    file.copy(
        system.file("latin/dicrc", package = "gibasa"),
        tempdir()
    )
    # write a csv file and compile it into a user dictionary
    csv_file <- tempfile(fileext = ".csv")
    writeLines(
        c(
            "qa, 0, 0, 5, \u304f\u3041",
            "qi, 0, 0, 5, \u304f\u3043",
            "qu, 0, 0, 5, \u304f",
            "qe, 0, 0, 5, \u304f\u3047",
            "qo, 0, 0, 5, \u304f\u3049"
        ),
        csv_file
    )
    build_user_dic(
        dic_dir = tempdir(),
        file = (user_dic <- tempfile(fileext = ".dic")),
        csv_file = csv_file,
        encoding = "utf8"
    )
    # mocking a 'mecabrc' file to temporarily use the dictionary
    withr::with_envvar(
        c(
            "MECABRC" = if (.Platform$OS.type == "windows") {
                "nul"
            } else {
                "/dev/null"
            },
            "RCPP_PARALLEL_BACKEND" = "tinythread"
        ),
        {
            tokenize("quensan", sys_dic = tempdir(), user_dic = user_dic)
        }
    )
}

Full Changelog: v0.9.5...v1.0.1

v0.9.5

09 Jul 13:02

Full Changelog: v0.9.4...v0.9.5

v0.9.4

03 Jun 13:11

Updated Makevars for Unix-alikes. Users can now set up dictionaries through a file specified by the MECABRC environment variable or through ~/.mecabrc.
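
As a rough illustration of the MECABRC route, the sketch below writes a one-line rc file and points the environment variable at it from R. The dictionary path is a placeholder that must be replaced, and whether the variable has to be set before the package is loaded is an assumption to verify locally.

```r
# Placeholder path: replace with the directory of an installed MeCab dictionary.
dic_dir <- "/path/to/your/mecab/dictionary"

# A mecabrc file can be as small as a single 'dicdir' entry.
rc_file <- file.path(tempdir(), "mecabrc")
writeLines(paste("dicdir =", dic_dir), rc_file)

# Point MeCab at the file for the current R session and confirm the value.
Sys.setenv(MECABRC = rc_file)
Sys.getenv("MECABRC")
```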

Full Changelog: v0.9.3...v0.9.4

v0.9.3

20 Apr 23:28

This is a patch release. Unnecessary C++ files were removed to pass CRAN's checks.

v0.9.2

12 Apr 12:39

Initial CRAN release 🚀😎✨

I'm excited to announce {gibasa} is now on CRAN!!
Now you can more easily install {gibasa} from CRAN as well as from r-universe.

Full Changelog: v0.8.1...v0.9.2

v0.8.1

14 Mar 08:13

Full Changelog: v0.8.0...v0.8.1

v0.8.0

04 Mar 11:43

What's Changed

  • [Breaking Change] Changed numbering style of 'sentence_id' when split is FALSE.
  • Added grain_size argument to tokenize.
  • Added new bind_lr function.
  • Now uses RcppParallel::parallelFor instead of tbb::parallel_for.

Full Changelog: v0.7.1...v0.8.0

v0.7.1

20 Jan 14:38

What's Changed

gibasa 0.7.1

  • Fixed documentation. There are no user-visible changes.

gibasa 0.7.0

  • tokenize can now accept a character vector in addition to a data.frame-like object.
  • gbs_tokenize is now deprecated. Please use the tokenize function instead.
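
The two input shapes can be sketched as follows. This assumes gibasa is installed with a usable dictionary, and that the data.frame uses columns named `doc_id` and `text` (assumed here to match the defaults of tokenize). The calls are wrapped in `try()` so the snippet still runs where no dictionary is available.

```r
# Sample texts; the names are illustrative document identifiers.
texts <- c(doc1 = "こんにちは", doc2 = "形態素解析を試す")

# The same texts as a data.frame-like object.
# Column names are an assumption about tokenize's defaults.
docs <- data.frame(
  doc_id = names(texts),
  text = unname(texts),
  stringsAsFactors = FALSE
)

if (requireNamespace("gibasa", quietly = TRUE)) {
  # Character vector input (accepted since 0.7.0).
  from_vector <- try(gibasa::tokenize(texts), silent = TRUE)
  # data.frame input.
  from_frame <- try(gibasa::tokenize(docs), silent = TRUE)
}
```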

gibasa 0.6.4

  • Refactored is_blank.

gibasa 0.6.3

  • Added the partial argument to gbs_tokenize and tokenize. This argument controls the partial parsing mode, which, when activated, forces the given chunks of sentences to be extracted.

gibasa 0.6.2

  • Friendlier errors are now returned when an invalid dictionary path is provided.
  • Added new posDebugRcpp function.

gibasa 0.6.1

  • Restored some missing examples.

Full Changelog: v0.6.0...v0.7.1