fix: out of bounds panic when diffing large number of words #695

JonathanxD · 2024-04-05T15:16:20Z

I dropped all the old text because it does not make sense anymore.

The problem is basically that Rust's str::lines() will return the same number of lines regardless of the last character being a new line or not, and the assumptions made on inline and side-by-side display implementations depends on having a clear distinction of those cases.

Those changes introduces a function that will chain an additional element when the string ends with a newline. I'm not aware of any function from stdlib that does what we need (and I couldn't find one by looking at the documentation), the closest one was split, but it has obvious implications (\r\n, for example).

This does solves #688, #694, #682 and #681, I tested all the provided files and they're all working without any panics.

Update 2024-07-08:

I reworked on this patch and now it does not add a new line to the end of the string, therefore, all tests are passing, including the regression ones (which was failing before). In this new revision it fixes #682 again, which involves --display=inline and was not working anymore when I initially merged the changes from master into my branch.

I also tested another case that I found out and it fixes #656.

The new approach is more hacky: in one case it just removes the additional line range when the line is empty, and in another case it just checks if the index is within range before trying to access the index. However, unlike the previous solution, this one does not fail the regression tests.

JonathanxD · 2024-04-06T22:14:42Z

I ran a small script to compare the output of db281c6 with my branch to find the cause of regression, it's all the trailing newlines that were not being printed before.

Wilfred · 2024-04-07T23:32:53Z

Thanks for the pull request, and the clear explanation! This sounds like a regression from 8c004be.

JonathanxD · 2024-04-13T01:17:08Z

Thanks for the pull request, and the clear explanation!

You're welcome.

This sounds like a regression from 8c004be.

I'm not sure if this is the cause of the regression because unless I'm missing something, the changes in this commit are just inlining the content of split_on_newlines on the call-site, which was already using lines(). I thought it could be different but after trying on playground, both versions yielded the same results.

@Wilfred Do you have any plans to merge my changes in the near future? If so, I can update the regression file so the pipeline succeeds. If you don't want to merge, I can use the examples provided in the reported issues to bisect which commit caused the regression and bring it to the discussion.

fschoenm · 2024-05-24T14:54:04Z

What's the status of this PR? I'm running into this issue too.

JonathanxD · 2024-05-25T00:21:49Z

What's the status of this PR? I'm running into this issue too.

Idk, everything is fine from my side. It depends on how @Wilfred wants to go from there.

I've been personally just building difftastic with my patches on top since without them difft would crash 90% of the time for my use-case.

This approach removes the additional empty line that would cause the display logic to disagree in line count. The major difference is that this new approach does not change the output of the display implementation, therefore, no test files need to be updated, and no additional line is printed to the end-user. As for the cause of the index out-of-bounds, this is simply a disagreement between line sources, such as the `line-numbers` crate and Tree Sitter, which all reports the correct number of lines, with the `str::lines` iterator which does not iterate over the last new line character if the last line is empty.

This is caused by the same problem as before, but the disagreement seems to be with the Tree itself and happens inside `syntax::change_positions`, there may be another way to work around this by removing the correct entry from `rhs_positions` when the conditions are met.

JonathanxD · 2024-07-09T20:20:39Z

I noticed that #731 does the same thing as I did (but slightly differently), which is to ignore the last empty line, and it will fix most of the out-of-bounds panics except for the yaml one, which is caused by the same problem, but in a different location (mine "fixes" this one by skipping out of bounds indexes).

JonathanxD force-pushed the fix/out-of-bounds-panic branch 5 times, most recently from f4161f2 to b686f58 Compare April 6, 2024 03:25

JonathanxD force-pushed the fix/out-of-bounds-panic branch from 017381e to d6c22b8 Compare July 8, 2024 23:48

JonathanxD force-pushed the fix/out-of-bounds-panic branch from d6c22b8 to ab39730 Compare July 8, 2024 23:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: out of bounds panic when diffing large number of words #695

fix: out of bounds panic when diffing large number of words #695

JonathanxD commented Apr 5, 2024 •

edited

Loading

JonathanxD commented Apr 6, 2024

Wilfred commented Apr 7, 2024

JonathanxD commented Apr 13, 2024 •

edited

Loading

fschoenm commented May 24, 2024

JonathanxD commented May 25, 2024

JonathanxD commented Jul 9, 2024

fix: out of bounds panic when diffing large number of words #695

Are you sure you want to change the base?

fix: out of bounds panic when diffing large number of words #695

Conversation

JonathanxD commented Apr 5, 2024 • edited Loading

JonathanxD commented Apr 6, 2024

Wilfred commented Apr 7, 2024

JonathanxD commented Apr 13, 2024 • edited Loading

fschoenm commented May 24, 2024

JonathanxD commented May 25, 2024

JonathanxD commented Jul 9, 2024

JonathanxD commented Apr 5, 2024 •

edited

Loading

JonathanxD commented Apr 13, 2024 •

edited

Loading