Update dependency sentencepiece to v0.2.0 #58

Open
wants to merge 1 commit into
base: master

renovate[bot]
Contributor

@renovate renovate bot commented Dec 5, 2022

This PR contains the following updates:

Package          Change
sentencepiece    ==0.1.3 -> ==0.2.0
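
Because the pin moves from ==0.1.3 to ==0.2.0, it may be worth confirming which version an environment actually resolves after merging. A minimal sketch using only the standard library; the helper names `installed_version` and `satisfies_pin` are illustrative and are not part of Renovate or sentencepiece.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def satisfies_pin(version: Optional[str], pin: str) -> bool:
    """Check an exact `==X.Y.Z` pin; other specifier forms are not handled."""
    if not pin.startswith("=="):
        raise ValueError(f"only exact pins are supported, got {pin!r}")
    return version is not None and version == pin[2:].strip()


if __name__ == "__main__":
    found = installed_version("sentencepiece")
    print("installed:", found, "| matches ==0.2.0:", satisfies_pin(found, "==0.2.0"))
```

The check is deliberately limited to exact pins, since that is the only form this PR touches.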

Release Notes

google/sentencepiece (sentencepiece)

v0.2.0

Compare Source

Major changes

N/A

New features

  • [ALL] Added the SentencePieceNormalizer class in C++/Python. It provides nearly the same functionality as spm_normalize. (Python Sample, C++ Sample)
  • [ALL] Added the SentencePieceProcessor::Normalize method in C++/Python. (Python Sample, C++ Sample)
  • [ALL] Added the ability to override the normalization spec before processing. (Python Sample)
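
The normalization features above build on Unicode normalization rules; SentencePiece's default rule, nmt_nfkc, is based on NFKC. The sketch below uses only Python's standard `unicodedata` module to show what this kind of folding does to text. It does not call the sentencepiece API, and the helper name `fold` is an assumption made for illustration.

```python
import unicodedata


def fold(text: str, form: str = "NFKC") -> str:
    """Apply a standard Unicode normalization form (for example NFKC or NFKD)."""
    return unicodedata.normalize(form, text)


if __name__ == "__main__":
    # Full-width Latin letters fold to their ASCII counterparts under NFKC.
    print(fold("ＫＡＤＯＫＡＷＡ"))
    # NFKC composes "e" + combining acute accent into one code point,
    # while NFKD decomposes the precomposed character into two.
    print(len(fold("e\u0301", "NFKC")), len(fold("\u00e9", "NFKD")))
```

The real normalizers apply additional rules on top of this, so output from the library may differ from plain NFKC.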

Bug fixes & minor changes

v0.1.99

Compare Source

Major changes

N/A

New features

N/A

Bug fixes & minor changes

v0.1.98

Compare Source

Major changes

  • Python 3.11 support (wheel packages for Python 3.11 are available).
  • Includes the full sources in the Python source package to reduce pip install problems.
  • Improves the algorithm that initializes the unigram seed vocabulary. Coverage is improved.

New features

  • [ALL] Added the ability to train the model with pre-tokenization boundary constraints (--pretokenization_delimiter flag).

Bug fixes & minor changes

  • [ALL] Makes the error message more descriptive.
  • [ALL] Fixes the crash when std::random_device fails.
  • [ALL] Fixes the build error on Raspberry Pi related to atomic operations.
  • [ALL] Fixes minor bugs in nbest enumeration.
  • [ALL] Fixes the build error when using the external protobuf library.
  • [ALL] Fixes the build error on big-endian machines.
  • [Windows] Uses the /MD build flag instead of /MT.
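
The --pretokenization_delimiter entry describes training under boundary constraints, meaning candidate pieces should not span a pre-tokenization boundary. The following is a toy sketch of that idea, not the library's trainer; the "|" delimiter and the helper name `candidate_pieces` are assumptions made for illustration.

```python
from typing import Set


def candidate_pieces(sentence: str, delimiter: str, max_len: int = 4) -> Set[str]:
    """Enumerate substrings that stay inside one delimiter-separated segment."""
    pieces: Set[str] = set()
    for segment in sentence.split(delimiter):
        for start in range(len(segment)):
            # Cap the end index so no piece is longer than max_len characters.
            for end in range(start + 1, min(len(segment), start + max_len) + 1):
                pieces.add(segment[start:end])
    return pieces


if __name__ == "__main__":
    # "|" is a hypothetical delimiter marking a boundary between two words.
    pieces = candidate_pieces("tokyo|tower", "|", max_len=5)
    print("tokyo" in pieces, "ot" in pieces)
```

Here "ot" never becomes a candidate because it would have to cross the boundary between "tokyo" and "tower".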

v0.1.97

Compare Source

Major changes

  • Migrated the C++ version from C++11 to C++17.
  • Migrated the CI environment from Travis CI to GitHub Actions.
  • Started using cibuildwheel to build PyPI wheel packages.

New features

  • [ALL] Support differential privacy while training. https://aclanthology.org/2022.findings-acl.171.pdf
  • [ALL] Introduced APIs that return an ImmutableSentencePieceText struct, which encodes the string token, id, and UTF-8 byte offsets at once. The new API is available from both C++ and Python.
  • [ALL] Allows tab ‘\t’ to be included in user-defined symbols.
  • [ALL] Added the NFKD normalization rule. The NFKD rule is provided as a TSV file.
  • [ALL] Added an option to emit the unknown symbol instead of the raw symbol.
  • [Python] Batch encode/decode requests are performed in native multi-threads.
  • [Python] Supports passing a custom log stream during training.
  • [Python] Adds a module-level version variable: spm.__version__
  • [Python] Creates a wheel package of the Mac universal binary.
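
The ImmutableSentencePieceText entry mentions UTF-8 byte offsets, which differ from character offsets for non-ASCII text. A minimal sketch of that distinction, using a whitespace split in place of a real model and omitting ids for the same reason; the `Piece` record and `pieces_with_byte_offsets` are hypothetical and do not mirror the library's actual field names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    surface: str
    begin: int  # UTF-8 byte offset, inclusive
    end: int    # UTF-8 byte offset, exclusive


def pieces_with_byte_offsets(text: str) -> List[Piece]:
    """Split on single spaces and record each word's UTF-8 byte span."""
    pieces: List[Piece] = []
    cursor = 0  # running byte position in the UTF-8 encoding of `text`
    for word in text.split(" "):
        size = len(word.encode("utf-8"))
        if word:
            pieces.append(Piece(word, cursor, cursor + size))
        cursor += size + 1  # +1 for the one-byte space separator
    return pieces


if __name__ == "__main__":
    text = "café au lait"
    raw = text.encode("utf-8")
    for piece in pieces_with_byte_offsets(text):
        # Slicing the encoded bytes with the offsets recovers the surface form.
        print(piece, raw[piece.begin:piece.end].decode("utf-8"))
```

Note that "café" is four characters but five bytes, so every later offset is shifted by one relative to character positions.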

Bug fixes & minor changes

  • Uses the efficient encoding algorithm by default. Removed the option to switch the Viterbi tokenization algorithm.
  • Makes the output of Encode and the 1-best result from NBestEncode identical.
  • Uses std::string_view wherever possible.
  • [Python] Removed the pip packages for the ppc64le and s390x architectures, as cibuildwheel doesn’t support them.

v0.1.96

Compare Source

Updates

  • Improves the performance of unigram training
  • Updated the nfkc normalization with the latest ICU module.
  • Stops treating the zero-width joiner as whitespace.

New features

  • Added a new sampling algorithm without replacement.
  • Added an API for the new sampling and for perplexity calculation.
  • Added the allow_whitespace_only_pieces mode.
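
One generic way to draw k distinct candidates in proportion to their scores is the Gumbel top-k trick. The sketch below illustrates sampling without replacement in general; the notes above do not say whether the library uses this exact method, and the function name and toy scores are assumptions.

```python
import math
import random
from typing import Dict, List


def sample_without_replacement(
    log_scores: Dict[str, float], k: int, rng: random.Random
) -> List[str]:
    """Gumbel top-k: perturb each log-score with Gumbel noise, keep the k largest."""
    if k > len(log_scores):
        raise ValueError("k exceeds the number of candidates")
    perturbed = {}
    for item, log_score in log_scores.items():
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        perturbed[item] = log_score - math.log(-math.log(u))
    return sorted(perturbed, key=perturbed.get, reverse=True)[:k]


if __name__ == "__main__":
    scores = {"a": -1.0, "b": -2.0, "c": -3.0, "d": -0.5}
    # A fixed seed makes the draw repeatable across runs.
    print(sample_without_replacement(scores, 3, random.Random(0)))
```

Each candidate appears at most once in the result, and passing a seeded generator keeps the draw reproducible.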

v0.1.95

Compare Source

Updates

  • Supports building sentencepiece with the external (official) abseil library.
  • Upgraded protobuf to 3.14.0.
  • Changed the type of input_sentence_size from int32 to uint64.

v0.1.94

Compare Source

Updates

  • Added the SetRandomGeneratorSeed function to set the seed value for the random generator. This makes sampling reproducible.
  • Validates the range of the vocab id in the Python module.
  • Changed the directory arrangement of the Python module.
  • Added the protobuf Python module.

Bug fixes

  • Supports building the Python wheel from the source package.

v0.1.92

Compare Source

Bug fix

  • Fixed the regression bug around the flag --minloglevel
  • Fixed build break on Solaris.

Minor upgrade

  • Upgraded the builtin protobuf to 3.12.3.
  • Implemented the absl::flags port.

v0.1.91

Compare Source

New API

Bug Fix

  • Ignores nbest parameter in BPE-dropout
  • Fixed the build error when SPM_ENABLE_NFKC_COMPILE=ON.
  • Fixed the cost computation around user_defined_symbol and the faster encoding introduced in the previous release.

v0.1.90

Compare Source

Renamed v0.1.9 to v0.1.90 because PyPI doesn't recognize 0.1.9 as the latest release.
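
The rename makes sense once version components are compared numerically rather than as text: the third component 9 is smaller than 86, so 0.1.9 sorts before 0.1.86. A small sketch of that ordering, assuming plain dotted numeric versions; `version_key` is an illustrative helper, not a packaging API.

```python
from typing import Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '0.1.86' into a comparable tuple."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


if __name__ == "__main__":
    releases = ["0.1.86", "0.1.9", "0.1.90"]
    # Sorting by numeric components places 0.1.9 before 0.1.86.
    print(sorted(releases, key=version_key))
```

Real version strings with pre-release or build suffixes need a proper version parser; this helper only covers the simple case in these notes.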

v0.1.86

Compare Source

  • Supports tf 1.5.1, 2.0.0, 2.0.1, 2.1.0, and 2.2.0rc3.
  • Added a Python wrapper for Python 3.8 on Mac.

v0.1.85

Compare Source

Support tf 1.15 and Python3.8 on Windows

v0.1.83

Compare Source

  • Use the official docker image to build tf_sentencepiece ops
  • support tf 1.14.0 and tf 2.0.0-beta1.

v0.1.82

Compare Source

Bug fix: fixed the behavior of is_unknown method in Python module.

v0.1.81

Compare Source

Fix: support tensorflow 0.13.1

v0.1.8

Compare Source

Feature: Removed the dependency on external protobuf.
Feature: Added the (Encode|Decode)AsSerializedProto interface so that the Python module can get full access to the SentencePieceText proto, including the byte offsets/alignments.
Feature: Added the --treat_whitespace_as_suffix option to make _ a suffix of the word.
Feature: Added normalization rules to remove control characters in the default nmt_* normalizers.
Minor fix: Simplified the error messages.
Minor fix: do not emit full source path in LOG(INFO)
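
To illustrate the --treat_whitespace_as_suffix entry: SentencePiece marks whitespace with the meta symbol ▁ (U+2581), normally at the start of a word. The sketch below shows the prefix and suffix placements at the word level only. It is a toy, not the library's tokenizer, and `mark_whitespace` is a hypothetical name.

```python
from typing import List

META = "\u2581"  # the whitespace meta symbol "▁"


def mark_whitespace(text: str, as_suffix: bool = False) -> List[str]:
    """Attach the meta symbol before each word (default) or after it."""
    return [word + META if as_suffix else META + word for word in text.split()]


if __name__ == "__main__":
    print(mark_whitespace("hello world"))
    print(mark_whitespace("hello world", as_suffix=True))
```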

For more detail: google/sentencepiece@v0.1.7...v0.1.8

v0.1.7

Compare Source

Deprecated: --mining_sentence_size and --training_sentence_size. All sentences are now loaded by default; --input_sentence_size can be specified to limit the number of sentences loaded.
Feature: added --unk_piece/--bos_piece/--eos_piece/--pad_piece flags to change the surface representations of these special symbols.
Bug fix: added third_party directory for cmake's subdirectory.

For more detail:

v0.1.6

Compare Source

  • Bug fix: do not apply normalization to the user-defined-symbols.
  • Bug fix: stop adding extra whitespaces before user-defined symbols
  • Feature: added --minloglevel flag to suppress LOG(INFO) message
  • Feature: added --split_by_number flag to allow numbers to attach to other symbols.
  • Feature: added --max_sentence_length flag to control the maximum byte length of an input sentence for training.
  • Used a tf-versioned .so file for _sentencepiece_processor_ops to minimize ABI incompatibility for the tf wrapper.

For more detail: google/sentencepiece@v0.1.5...master

v0.1.5

Compare Source

v0.1.4

Initial SentencePiece releases


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/sentencepiece-0.x branch from 1509275 to 3b171f6 Compare April 17, 2023 11:42
@renovate renovate bot changed the title Update dependency sentencepiece to v0.1.97 Update dependency sentencepiece to v0.1.98 Apr 17, 2023
@renovate renovate bot force-pushed the renovate/sentencepiece-0.x branch from 3b171f6 to 411aa3e Compare May 28, 2023 11:07
@renovate renovate bot changed the title Update dependency sentencepiece to v0.1.98 Update dependency sentencepiece to v0.1.99 May 28, 2023
@renovate renovate bot force-pushed the renovate/sentencepiece-0.x branch from 411aa3e to f11f17d Compare February 19, 2024 19:34
@renovate renovate bot changed the title Update dependency sentencepiece to v0.1.99 Update dependency sentencepiece to v0.2.0 Feb 19, 2024