Japanese characters changed on paste in Editor #722

takarabune · 2023-11-25T12:07:05Z

System Information

Pythonista N/A (N/A), Default interpreter 3.10.4
iOS 17.1, model iPad11,1, resolution (portrait) 1536.0 x 2048.0 @ 2.0

Pythonista 3.4

Problem
Certain Japanese characters are changed when they are pasted into the IDE editor.
The affected characters include:
（〜で）で
だぢづでど　ぱぴぷぺぽ　がぎぐげご　バビブべボ
ゟ
ヿ
｟-ﾟ
～
㈠-㉃㊀-㋾㌀-㍿

When the above is pasted into the IDE, it changes to:
(〜で)で
だぢづでどぱぴぷぺぽがぎぐげごバビブべボ
より
コト
⦅-゚
~
(一)-(至)一-ヲアパート-株式会社

Many characters are decomposed into separate characters, or are replaced by similar (but importantly different) characters. Some characters may look identical, but they have different code points. Note how the brackets have changed from full width to ASCII. Also note the change in the tilde-like character 〜 and the changes in the symbols on the last line.

If the correct characters are in the editor they can be copied from there and pasted correctly to another text editor.

Demonstration code

import unicodedata

typed_text = ("で")
pasted_text = ("で") # manual copy and paste from previous line

print(typed_text, "==", pasted_text, ":", typed_text == pasted_text)

# List the code point and Unicode name of every character in each string
print("typed text: ")
for c in typed_text:
    print(ord(c), unicodedata.name(c))

print("pasted text: ")
for c in pasted_text:
    print(ord(c), unicodedata.name(c))

NB: do not copy and paste this code into the Pythonista IDE, because the paste itself changes the characters.
Save it with another text editor, then run the file in Pythonista.

It outputs the following in the console:

で == で : False
typed text: 
12391 HIRAGANA LETTER DE
pasted text: 
12390 HIRAGANA LETTER TE
12441 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
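
The same comparison can be written with escape sequences, so the file stays ASCII-only and survives a paste into the editor. This is a minimal sketch that assumes the code points reported in the console output above; the helper name `describe` is made up for illustration.

```python
import unicodedata

# Code points taken from the console output of the demonstration.
typed_text = "\u3067"         # HIRAGANA LETTER DE (precomposed)
pasted_text = "\u3066\u3099"  # HIRAGANA LETTER TE + combining voiced sound mark


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(ord(c), unicodedata.name(c)) for c in text]


print(typed_text == pasted_text)
print(describe(typed_text))
print(describe(pasted_text))
```

Because the literals contain no Japanese characters, pasting this version should not alter the strings being compared.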

Expected behavior
I expect pasted characters to be identical, code point for code point, to the characters that were copied.

Comment
I noticed this when some regex patterns I pasted in weren’t correct.
Tokenizing some pasted text showed that some characters had been decomposed.
When searching for a term in an SQLite database, a pasted-in query didn’t return an expected match because characters had been changed on pasting.

It can be worked around by not pasting, I suppose, or by using a third-party IDE or text editor.
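
Where a paste cannot be avoided, normalizing both sides of a comparison to the same Unicode form may restore matching for canonically equivalent text, such as the decomposed HIRAGANA LETTER DE. This is a rough sketch using only the standard library; `same_text` is a hypothetical helper, and NFC does not undo compatibility substitutions such as a full-width bracket becoming ASCII.

```python
import unicodedata


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u3067"        # HIRAGANA LETTER DE
decomposed = "\u3066\u3099"   # HIRAGANA LETTER TE + combining voiced sound mark

print(precomposed == decomposed)             # plain comparison
print(same_text(precomposed, decomposed))    # compared after NFC
print(same_text("\uff08", "("))              # NFC keeps the full-width bracket distinct
print(same_text("\uff08", "(", "NFKC"))      # NFKC folds it to ASCII
```

Folding with NFKC makes more strings compare equal, but it also discards the full-width versus ASCII distinction, so it only helps where that distinction does not matter.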

When inputting data in a finished script it most likely isn’t an issue as data will be read from an external file.

This points to Unicode not being handled properly on paste.
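
The substitutions listed under Problem look like what Unicode compatibility decomposition (NFKD) produces. That is an assumption about the cause, not something confirmed for Pythonista. The sketch below only checks that `unicodedata.normalize` gives the same results for a few of the samples; the pairs are written as escapes, using the code points I believe correspond to the characters in the report.

```python
import unicodedata

# (original, form seen after pasting), written as escapes so the file is ASCII-only.
samples = [
    ("\u3067", "\u3066\u3099"),   # HIRAGANA LETTER DE -> TE + combining voiced sound mark
    ("\uff08", "("),              # FULLWIDTH LEFT PARENTHESIS -> ASCII parenthesis
    ("\uff5e", "~"),              # FULLWIDTH TILDE -> ASCII tilde
    ("\u309f", "\u3088\u308a"),   # HIRAGANA DIGRAPH YORI -> two hiragana letters
    ("\u30ff", "\u30b3\u30c8"),   # KATAKANA DIGRAPH KOTO -> two katakana letters
    ("\u3220", "(\u4e00)"),       # PARENTHESIZED IDEOGRAPH ONE -> "(" + ideograph + ")"
]

# True where NFKD of the original equals the pasted form.
matches = [unicodedata.normalize("NFKD", original) == pasted
           for original, pasted in samples]
print(matches)
```

If every entry is true, the paste behaves like NFKD for these samples, which would suggest the editor or the pasteboard path normalizes text on the way in.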
