Incorrect processing result for keywords having symbols #5

brandonbai · 2019-03-01T04:12:37Z

Running profanity.censor with the default config on words such as "s&m", "s & m", or "2 girls 1 cups" gives an incorrect result.
For example:

from better_profanity import profanity

print(profanity.censor("s & m"))
# s & m

The text is returned uncensored. Why?

snguyenthanh · 2019-03-01T05:34:11Z

Thank you for reporting the issue.

I'm at an early stage of troubleshooting the problem. It seems to be caused by the function update_next_words_indices, which returns the wrong list of next words to be parsed.

I will keep this issue updated when I have any new findings.

snguyenthanh · 2019-03-14T16:24:51Z

From my side, 2 girls 1 cups returns the correct result.

The s & m case seems to be caused by update_next_words_indices, which doesn't create the expected list of words because of the & character.

Take hello 123 as an example.

The library works like this: when a word is identified (hello), it checks whether any contiguous combination of that word and the following word(s) forms a swear word in the wordlist.
The function update_next_words_indices returns a list of the words that follow the current one. So in this example it will return the list ['123', ' 123'].

However, for s & m, the & character is treated as a separate value (just as , and ), instead of being grouped into the list of following words from update_next_words_indices.
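To make the failure mode concrete, here is a rough sketch of the idea, not the library's actual code. The function candidate_phrases and its max_words parameter are made up for illustration, and it assumes that anything other than an ASCII letter or digit acts as a separator. Words are joined back with a single space, so the & never ends up in any candidate phrase.

```python
import re

# Assumption for this sketch: a "word" is a run of ASCII letters or digits,
# and every other character (spaces, commas, &, ...) is a separator.
WORD = re.compile(r"[A-Za-z0-9]+")


def candidate_phrases(text, max_words=3):
    """Return every run of up to max_words consecutive words, joined by one space."""
    words = WORD.findall(text)
    phrases = []
    for start in range(len(words)):
        last = min(start + max_words, len(words))
        for end in range(start + 1, last + 1):
            phrases.append(" ".join(words[start:end]))
    return phrases


print(candidate_phrases("hello 123"))  # ['hello', 'hello 123', '123']
print(candidate_phrases("s & m"))      # ['s', 's m', 'm']
```

Under these assumptions, a wordlist entry such as "s & m" or "s&m" can never equal any of the candidates, because the separator is dropped before the comparison happens.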

As I'm very busy with my studies at the moment, I won't be able to fix this bug for at least ~1 month.
Please feel free to create a PR for this.

snguyenthanh · 2019-05-15T16:01:58Z

This is considered a major development for the library, and I won't be able to do it in the near future because of a tight schedule as a final-year student.

A suggestion for a fix is to create a separate wordlist for special words: ones whose separators are something other than a single space ' ' and which require the separator(s) to match exactly (such as s & m).
While parsing the text, if the current word and the next word(s) match a set of words in the special wordlist, return True if the separators are also identical; otherwise, return False.
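The suggestion above could be sketched roughly as follows. This is only one possible reading of it, not code from the library: build_special_wordlist, contains_special, and the tokenizing regex are all hypothetical names and choices, and the sketch assumes every special entry starts and ends with a word.

```python
import re

# Assumption: text alternates between words (ASCII letters/digits) and
# separators (everything else). Both kinds of token are kept.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")
WORD = re.compile(r"[A-Za-z0-9]+")


def tokenize(text):
    """Split lowercased text into alternating word and separator tokens."""
    return TOKEN.findall(text.lower())


def build_special_wordlist(entries):
    """Map each tuple of words to the set of separator tuples it accepts."""
    special = {}
    for entry in entries:
        tokens = tokenize(entry)
        # Entries are assumed to start and end with a word, so words sit at
        # even positions and separators at odd positions.
        words, seps = tuple(tokens[0::2]), tuple(tokens[1::2])
        special.setdefault(words, set()).add(seps)
    return special


def contains_special(text, special):
    """True if some special entry occurs with exactly matching separators."""
    tokens = tokenize(text)
    for start, token in enumerate(tokens):
        if not WORD.fullmatch(token):
            continue  # only start matching at a word token
        for words, sep_options in special.items():
            window = tokens[start:start + 2 * len(words) - 1]
            if tuple(window[0::2]) == words and tuple(window[1::2]) in sep_options:
                return True
    return False


special = build_special_wordlist(["s & m", "s&m"])
print(contains_special("s & m", special))  # True
print(contains_special("s, m", special))   # False
```

Keeping the separators in the wordlist entry means "s & m" and "s&m" are detected while "s m" or "s, m" are left alone, which is the exact-match behaviour described in the comment.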

oliver408i · 2022-05-17T01:25:46Z

Can't you just run the check on the text first, then, if nothing is detected, use regex to remove duplicates and try again?

snguyenthanh added the bug (Something isn't working) and help wanted (Extra attention is needed) labels · Mar 14, 2019

snguyenthanh changed the title from "Incorrect processing result" to "Incorrect processing result for keywords having symbols" · Jul 23, 2019
