Characters in pages are garbled (get "mojibake") with Perl 5.22.0 or later #134

Merged
merged 3 commits into from
Dec 11, 2017


ikedas
Member

@ikedas ikedas commented Dec 4, 2017

This bug was reported by a Japanese listmaster. Users of non-Latin-1 languages (languages other than "en", "de", "fr", ...) will notice this bug. Latin-1 users rarely notice it, but warnings are written to the error log:

wwsympa.fcgi: Use of wide characters in FCGI::Stream::PRINT is deprecated and will stop wprking in a future version of FCGI at /usr/local/lib/perl5/site_perl/mach/5.24/Template.pm line 162.

Background: Starting with Perl 5.22.0, POSIX::strftime() returns a Unicode string (utf8 flag set) under a UTF-8 locale.

Sympa::Language::gettext_strftime() uses it to format localized dates and times, so utf8-flagged text was mixed into the output.

Fixed by dropping the utf8 flag from the result of POSIX::strftime().
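
To illustrate the idea, here is a minimal sketch of a wrapper that returns binary UTF-8 regardless of what POSIX::strftime() hands back. The name `strftime_bytes` is hypothetical and this is not necessarily the code merged in this pull request; it only assumes that the caller wants a UTF-8 byte string with the utf8 flag off.

```perl
use strict;
use warnings;
use POSIX ();
use Encode ();

# Hypothetical helper (not Sympa's actual code): format a date/time and
# always return binary UTF-8 (utf8 flag off).
sub strftime_bytes {
    my ($format, @time) = @_;

    my $result = POSIX::strftime($format, @time);

    # On Perl 5.22.0 or later under a UTF-8 locale the result may be a
    # Unicode string. Encode it back to UTF-8 bytes in that case.
    $result = Encode::encode_utf8($result) if utf8::is_utf8($result);

    return $result;
}
```

For example, `strftime_bytes('%Y-%m-%d', gmtime(0))` can then be concatenated with other binary UTF-8 text without triggering an upgrade.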


4 Dec Update: Added warning message.

References for the change in POSIX::strftime() behavior in Perl 5.22.0:
- https://metacpan.org/pod/release/SHAY/perl-5.26.1/pod/perl5220delta.pod#Better-heuristics-on-older-platforms-for-determining-locale-UTF-8ness
- https://perl5.git.perl.org/perl.git/commit/9717af6d049902fc887c412facb2d15e785ef1a4

@ikedas ikedas added the bug label Dec 4, 2017
@ikedas ikedas changed the title [bug] Most of characters in pages are garbled (get "mojibake") with Perl 5.22.0 or later Most of characters in pages are garbled (get "mojibake") with Perl 5.22.0 or later Dec 4, 2017
@racke
Contributor

racke commented Dec 4, 2017

In most cases it should not be necessary to mess with the UTF-8 flags, so maybe the real problem is elsewhere?

@ikedas
Member Author

ikedas commented Dec 4, 2017

Well, this type of problem causes real trouble for users of non-Latin-1 languages.

Sympa uses binary UTF-8 (utf8 flag off) as its internal text encoding, not Unicode (utf8 flag on). Today several modules return Unicode strings as results. When binary UTF-8 and Unicode strings are mixed, the binary part is "upgraded" to Unicode as if it were Latin-1 encoded (and print() will issue "wide characters" warnings).

So for Latin-1 users the problem often does not surface. For non-Latin-1 users, however, it is serious: most or all non-ASCII letters will be broken.
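
The upgrade described above can be reproduced in a few lines. This is a minimal sketch with made-up sample text (the variable names and the Japanese sample characters are illustrative, not taken from Sympa): it concatenates a binary UTF-8 string with a Unicode string and shows that the bytes end up double-encoded.

```perl
use strict;
use warnings;
use Encode ();

# Binary UTF-8 (utf8 flag off): two Japanese characters as 6 bytes.
my $bytes = Encode::encode_utf8("\x{65E5}\x{672C}");

# Unicode string (utf8 flag on): one wide character, standing in for
# what a module such as POSIX::strftime() might return.
my $chars = "\x{8A9E}";

# Concatenation upgrades $bytes as if each byte were a Latin-1 character.
my $mixed = $bytes . $chars;

# Encoding the mixed string for output double-encodes the original bytes.
my $out = Encode::encode_utf8($mixed);

printf "%d\n", length $bytes;    # 6 bytes
printf "%d\n", length $mixed;    # 7 characters (6 "Latin-1" + 1 wide)
printf "%d\n", length $out;      # 15 bytes instead of the correct 9
```

A correctly encoded result would be 9 bytes (three characters of 3 bytes each); the extra 6 bytes are the mojibake.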

@ikedas ikedas changed the title Most of characters in pages are garbled (get "mojibake") with Perl 5.22.0 or later Characters in pages are garbled (get "mojibake") with Perl 5.22.0 or later Dec 5, 2017
@ikedas ikedas merged commit dca9b6d into sympa-community:sympa-6.2 Dec 11, 2017
@ikedas ikedas added this to the 6.2.24 milestone Dec 11, 2017
@ikedas ikedas deleted the mojibake-strftime branch December 14, 2017 05:44
@ikedas ikedas mentioned this pull request Aug 19, 2019