Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing malformed closing tags #776

Closed
evilpie opened this issue Jun 30, 2024 · 7 comments · Fixed by #780
Closed

Parsing malformed closing tags #776

evilpie opened this issue Jun 30, 2024 · 7 comments · Fixed by #780
Labels

Comments

@evilpie
Copy link

evilpie commented Jun 30, 2024

For Ruffle it would be useful if we could parse these two kinds of malformed XML:

<a></a/>

From ruffle-rs/ruffle#15674. Note the extra /.

<a></a hello="world">

From ruffle-rs/ruffle#16862. Note the attribute in the closing tag!

Excuse me if I missed a config option for this.

@Mingun
Copy link
Collaborator

Mingun commented Jun 30, 2024

I think, that should be possible today, because we do not check (yet. I plan to add optional validate method to events) content of the start / end events.

To be clear, both should be recognized as Event::End. You should disable end name validation because names would be a/ and a hello="world".

@dralley
Copy link
Collaborator

dralley commented Jul 1, 2024

It's literally possible to construct such an event, however there's no way to get the attributes from it unless you parse them yourself. BytesEnd doesn't provide any mechanism for this.

Looking at the documentation for BytesEnd: https://docs.rs/quick-xml/latest/quick_xml/events/struct.BytesEnd.html

We do give an example of creating such an event (although I think this isn't a great idea because it is currently our only example - we should not have the only example provided be an instance of nonconformant XML).

But in this example the entire string, including the attributes, is returned when you call .name(). I'm guessing that probably is not what you want @evilpie ?

@evilpie
Copy link
Author

evilpie commented Jul 1, 2024

We don't actually care about the attributes or anything else in the closing tag. I just found about the config option check_end_names. Setting this to false almost gives us the Flash behavior.

However Flash treats <a></a foo="bar"> and <a></a/> as just <a></a>, but it will throw for <a></b>, because the tags don't match ... The config option also allows the last as well, so we can't use it directly.

@Mingun
Copy link
Collaborator

Mingun commented Jul 3, 2024

I know only one example which we do not support right now:

</a foo=">">

will be parsed as

Event::End(BytesEnd::new("a foo=\""));
Event::Text(BytesText::from_escaped("\">"));

Do you need a mode that will parse this as

Event::End(BytesEnd::new("a foo=\">\""));

?

@evilpie
Copy link
Author

evilpie commented Jul 3, 2024

Oh no. Flash does parse it ...

var xml = new XML('<a></a foo=">">')
trace(xml.toXMLString()) // <a/>

It does seem very unlikely to happen, so if that specific case would be very hard we might be able to skip it.

@Mingun
Copy link
Collaborator

Mingun commented Jul 4, 2024

No, that is simple to support. I think, we even may always parse this in such way, because anyway the tag name cannot contain neither " or ' and any compliant parser will throw an error anyway. I plan to add an optional validate abilities which will reject such XMLs in any case no matter as we would parse it without validation.

@Mingun
Copy link
Collaborator

Mingun commented Jul 8, 2024

With release 0.36.0 it is possible to parse such XMLs without surprises.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants