UnicodeDecodeError on lists with international users #8

Open
jpl166 opened this issue Jul 24, 2024 · 0 comments

jpl166 commented Jul 24, 2024

When trying to migrate lists with non-ASCII characters anywhere in their configs, I get a decode error from mm2s_unpickle:

Traceback (most recent call last):
  File "./mm2s_unpickle.py", line 29, in <module>
    print(json.dumps(config_dict))
  File "/usr/lib64/python2.7/json/__init__.py", line 244, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xdc in position 0: invalid continuation byte

I modified mm2s_unpickle to print the raw unpickled config before the json.dumps() call, and digging around in that output I found a user as follows:

'[email protected]': '\xdcz\xfcc\xfchemzem'
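
For what it's worth, that value looks like Latin-1 rather than UTF-8. That is a guess on my part, since nothing in the config records an encoding. A minimal sketch that reproduces the failure on just that one string, with the value written as a bytes literal so it behaves the same on Python 2.7 and Python 3 (the variable names are only for illustration):

```python
# The byte string from the unpickled config, copied from the output above.
raw = b'\xdcz\xfcc\xfchemzem'

# 0xdc is a UTF-8 lead byte, but the next byte ('z') is not a continuation
# byte, so a strict UTF-8 decode fails at position 0.
try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Assumption: the data is Latin-1, where 0xdc is U+00DC and 0xfc is U+00FC.
as_latin1 = raw.decode('latin-1')

print(repr(utf8_error))
print(repr(as_latin1))
```

Latin-1 maps every byte to a code point, so that decode cannot fail, but it also cannot tell me whether Latin-1 is what the subscriber actually typed.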

Digging further, I found hundreds of non-ASCII characters in this one list's config and membership data, both in passwords and in some users' names. Other lists have such characters in their descriptions and info. All of these fail to migrate, and explosively so (tens of thousands of lines of console output, very quickly). For some of these lists we could work around the problem by changing one subscriber's name to ASCII, migrating, and then changing it back, but that one list has 14,000 subscribers and literally hundreds of values that break the migration.

I tried adding ensure_ascii=False to the json.dumps() call in mm2s_unpickle.py, and it made no difference. It appears that json.dumps() in Python 3 would just do the right thing, but under Python 3 the script won't load the mailman.bouncer module.
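
One possible workaround, sketched below, is to decode every byte string in the unpickled config before it reaches json.dumps(): try UTF-8 first and fall back to Latin-1. This is not what mm2s_unpickle.py does today. The helper name `to_text`, the Latin-1 fallback, and the sample dict (including the example.com address) are all assumptions for illustration, and I have not checked whether the rest of the migration accepts the result.

```python
import json


def to_text(value, fallback="latin-1"):
    """Recursively decode byte strings: strict UTF-8 first, then the fallback.

    On Python 2.7, `bytes` is an alias for `str`, so plain str values from
    the pickle are decoded to unicode here as well.
    """
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: anything that is not valid UTF-8 is Latin-1.
            return value.decode(fallback)
    if isinstance(value, dict):
        return dict(
            (to_text(key, fallback), to_text(item, fallback))
            for key, item in value.items()
        )
    if isinstance(value, (list, tuple)):
        return [to_text(item, fallback) for item in value]
    # Numbers, None, and already-decoded text pass through unchanged.
    return value


# Hypothetical stand-in for the unpickled config: one Latin-1 value and
# one UTF-8 value.
sample = {
    b"user@example.com": b"\xdcz\xfcc\xfchemzem",
    b"description": b"caf\xc3\xa9",
}

encoded = json.dumps(to_text(sample))
print(encoded)
```

In the script this would amount to calling json.dumps(to_text(config_dict)). Any objects in the config that are not dicts, lists, tuples, or strings are passed through untouched, so they would still need whatever handling the script already gives them.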

And of course, mailman2 itself has no problems with these characters; it's just the migration tool.
