-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Word differ returns out-of-bounds column numbers #699
Comments
Aha: // Don't create a MatchedPos for empty positions at the start
// or end. We still want empty positions in the middle of
// multiline atoms, as a multiline string literal may include
// empty lines.
let pos = filter_empty_ends(pos); This ends badly, since the content of the tag starts with a newline and there is no corresponding filtering of that. So we take the byte offset of the I guess a possible fix (which also would save some RAM) would be to just work with byte offsets everywhere instead of line:col, but that would probably be hard. I guess you also have alternatives like this hack (which is mostly for illustration purposes): diff --git a/src/parse/syntax.rs b/src/parse/syntax.rs
index 4d9e22fa6..d25d28058 100644
--- a/src/parse/syntax.rs
+++ b/src/parse/syntax.rs
@@ -4,6 +4,7 @@
use std::{cell::Cell, env, fmt, hash::Hash, num::NonZeroU32};
+use line_numbers::LineNumber;
use line_numbers::LinePositions;
use line_numbers::SingleLineSpan;
use typed_arena::Arena;
@@ -673,6 +674,7 @@ pub(crate) struct MatchedPos {
fn split_atom_words(
content: &str,
pos: &[SingleLineSpan],
+ skipped_first: bool,
opposite_content: &str,
opposite_pos: &[SingleLineSpan],
kind: AtomKind,
@@ -700,6 +702,15 @@ fn split_atom_words(
let content_newlines = LinePositions::from(content);
let opposite_content_newlines = LinePositions::from(opposite_content);
+ let root_pos = if skipped_first {
+ SingleLineSpan {
+ line: LineNumber::from(pos[0].line.0 - 1),
+ start_col: pos[0].start_col,
+ end_col: pos[0].end_col,
+ }
+ } else {
+ pos[0]
+ };
let mut offset = 0;
let mut opposite_offset = 0;
@@ -716,7 +727,7 @@ fn split_atom_words(
},
pos: content_newlines.from_region_relative_to(
// TODO: don't assume a single line atom.
- pos[0],
+ root_pos,
offset,
offset + word.len(),
)[0],
@@ -728,7 +739,7 @@ fn split_atom_words(
// This word is present on both sides.
// TODO: don't assume this atom is on a single line.
let word_pos =
- content_newlines.from_region_relative_to(pos[0], offset, offset + word.len())
+ content_newlines.from_region_relative_to(root_pos, offset, offset + word.len())
[0];
let opposite_word_pos = opposite_content_newlines.from_region_relative_to(
opposite_pos[0],
@@ -787,18 +798,22 @@ fn has_common_words(word_diffs: &Vec<myers_diff::DiffResult<&&str>>) -> bool {
}
/// Skip line spans at the beginning or end that have zero width.
-fn filter_empty_ends(line_spans: &[SingleLineSpan]) -> Vec<SingleLineSpan> {
+fn filter_empty_ends(line_spans: &[SingleLineSpan]) -> (Vec<SingleLineSpan>, bool) {
let mut spans: Vec<SingleLineSpan> = vec![];
+ let mut skipped_first = false;
for (i, span) in line_spans.iter().enumerate() {
if (i == 0 || i == line_spans.len() - 1) && span.start_col == span.end_col {
+ if i == 0 {
+ skipped_first = true;
+ }
continue;
}
spans.push(*span);
}
- spans
+ (spans, skipped_first)
}
impl MatchedPos {
@@ -812,7 +827,7 @@ impl MatchedPos {
// or end. We still want empty positions in the middle of
// multiline atoms, as a multiline string literal may include
// empty lines.
- let pos = filter_empty_ends(pos);
+ let (pos, skipped_first) = filter_empty_ends(pos);
match ck {
ReplacedComment(this, opposite) | ReplacedString(this, opposite) => {
@@ -839,7 +854,14 @@ impl MatchedPos {
AtomKind::Comment
};
- split_atom_words(this_content, &pos, opposite_content, opposite_pos, kind)
+ split_atom_words(
+ this_content,
+ &pos,
+ skipped_first,
+ opposite_content,
+ opposite_pos,
+ kind,
+ )
}
Unchanged(opposite) => {
let opposite_pos = match opposite { |
It seems that when a large token is split into multiple Syntax atoms, and that token spans multiple lines, column numbers that are out-of-bounds can be created (this may be a bug in the line-numbers crate).
To reproduce, run difftastic on old.xml and new.xml (they are from the Chromium repository). Files in files.tar.gz
You will get:
The badness happens somewhere in split_atom_words(), and I would assume that it is related to the comment “TODO: don't assume this atom is on a single line”. For instance, we have a single atom '.' at some point, at offset 67. We have
pos
that looks like this (though we only use the first element, pos[0]):What we get back, and put in word_pos, is:
This is nonsensical; line 1103 has only 64 columns, so column “66 to 67” doesn't exist and the diff gets confused. But on column 67 on line 1102, there is indeed a period. Why it ends up with line 1103 is a bit beyond me, because it's not entirely clear to me exactly what
from_region_relative_to
is supposed to be doing.The text was updated successfully, but these errors were encountered: