This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

fix UnicodeEncodeError when retrieving words from utf-8 encoded file
Summary:
This commit fixes issue #746.
pybind11's `py::str` constructor [behaves differently](https://github.com/pybind/pybind11/blob/ccbe68b084806dece5863437a7dc93de20bd9b15/include/pybind11/pytypes.h#L930) under Python 2 and Python 3. When casting a C++ string to `py::str`, we decode it as UTF-8; under Python 2 we must also encode the decoded object back to UTF-8 so that `py::str` is constructed correctly.

Reviewed By: EdouardGrave

Differential Revision: D14783627

fbshipit-source-id: 8a7d4b16f42d6d892203cf3d72f144427008dd7f
Celebio authored and facebook-github-bot committed Apr 16, 2019
1 parent 4cd9db0 commit 71c0ee5
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions python/fastText/pybind/fasttext_pybind.cc
@@ -26,6 +26,14 @@ py::str castToPythonString(const std::string& s, const char* onUnicodeError) {
if (!handle) {
throw py::error_already_set();
}

// py::str's constructor from a PyObject assumes the string has been encoded
// for python 2 and not encoded for python 3 :
// https://github.com/pybind/pybind11/blob/ccbe68b084806dece5863437a7dc93de20bd9b15/include/pybind11/pytypes.h#L930
#if PY_MAJOR_VERSION < 3
handle = PyUnicode_AsEncodedString(handle, "utf-8", onUnicodeError);
#endif

return py::str(handle);
}
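
The effect of the patch can be sketched at the Python level. This is an illustration only, not the binding code itself: `cast_to_python_string` and its `python2` flag are hypothetical names, and the sketch assumes that the decode step corresponds to `bytes.decode` and that `PyUnicode_AsEncodedString` corresponds to `str.encode` with the same error handler.

```python
def cast_to_python_string(raw, on_unicode_error="strict", python2=False):
    """Hypothetical Python-level mirror of castToPythonString (illustration only)."""
    # Decode step: the bytes held by the C++ std::string are read as UTF-8,
    # using the caller-supplied error handler ("strict", "ignore", "replace", ...).
    text = raw.decode("utf-8", on_unicode_error)
    if python2:
        # Branch added by this commit (PY_MAJOR_VERSION < 3): encode the unicode
        # object back to UTF-8 bytes before handing it to py::str.
        return text.encode("utf-8", on_unicode_error)
    # Python 3: the decoded unicode object is passed through unchanged.
    return text


word = "café".encode("utf-8")  # bytes as they would sit in the C++ string

as_py3 = cast_to_python_string(word)                # unicode text
as_py2 = cast_to_python_string(word, python2=True)  # UTF-8 encoded bytes

# Invalid input is governed by the error handler in both steps.
replaced = cast_to_python_string(b"caf\xff", "replace")
```

In this sketch the Python 2 path returns the same bytes it was given whenever the input is valid UTF-8, so non-ASCII words survive the round trip instead of going through an implicit ASCII encode.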
