Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes in unescape routine #771

Merged
merged 10 commits into from
Jun 29, 2024
Merged

Fixes in unescape routine #771

merged 10 commits into from
Jun 29, 2024

Conversation

Mingun
Copy link
Collaborator

@Mingun Mingun commented Jun 26, 2024

This PR fixes 2 errors that I found when working on a new way to represent entity references:

  1. Currently, quick_xml does not allow too long numbers to represent character references. It, however, does not takes in account, that any number of leading zeroes are allowed, so the real character entity length can be any. Conformance test suite contains such tests.
  2. Allow override of default escape/unescape behavior in more situations #739 introduced regression where Attribute::unescape_value no longer unescaped predefined entities. That was fixed and checked that no similar places remains

Corresponding test (xmltest\valid\sa\042.xml -- one of):

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
]>
<doc>&#00000000000000000000000000000000065;</doc>

@Mingun Mingun requested a review from dralley June 26, 2024 18:51
@Mingun Mingun added the bug label Jun 26, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 30.88235% with 47 lines in your changes missing coverage. Please review.

Project coverage is 60.16%. Comparing base (7558577) to head (0315ed0).
Report is 59 commits behind head on master.

Files Patch % Lines
src/escape.rs 24.13% 44 Missing ⚠️
src/events/attributes.rs 0.00% 3 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #771      +/-   ##
==========================================
- Coverage   61.81%   60.16%   -1.66%     
==========================================
  Files          41       41              
  Lines       16798    16138     -660     
==========================================
- Hits        10384     9709     -675     
- Misses       6414     6429      +15     
Flag Coverage Δ
unittests 60.16% <30.88%> (-1.66%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/escape.rs Outdated
// We should not allow sign numbers, but u32::from_str_radix will accept them.
// Because we cannot directly construct necessary error, we use the function
// which would return it for us
b'+' | b'-' => u32::from_str_radix("=", 10),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this very much, personally. I can see the appeal of using from_str_radix and ParseIntError, the overlap is pretty significant, but this is definitely a hack around the mismatch in semantics. I'd rather define our own ParseIntError (or equivalent) that is easily convertible from the original and can be easily returned on its own. You can copy the variants and descriptions straight from the original.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking I realized that here we needed our own type of error in any case (ParseCharRefError), into which I moved the variants that make sense only when parsing the character reference.

Negative sing will return ParseIntError(InvalidDigit) because we parse into unsigned number,
but `+` sign is allowed by the parse method. To return the same error for both `+` and `-`
we check for sign itself before parse
///
/// Currently, only `0x0` character produces this error.
IllegalCharacter(u32),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// We should not allow sign numbers, but u32::from_str_radix will accept `+`.
// We also handle `-` to be consistent in returned errors
Some(b'+') | Some(b'-') => Err(ParseCharRefError::UnexpectedSign),
_ => u32::from_str_radix(src, radix).map_err(ParseCharRefError::InvalidNumber),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Much better, thanks

@Mingun Mingun merged commit 9a72c7b into tafia:master Jun 29, 2024
7 checks passed
@Mingun Mingun deleted the escape-fixes branch June 29, 2024 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants