Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make frontend normalize line endings to LF #3903

Merged
merged 8 commits into from
May 20, 2024

Conversation

kmill
Copy link
Collaborator

@kmill kmill commented Apr 14, 2024

To eliminate parsing differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF, like in Rust.

Effects:

  • This makes Lake hashes be faithful to what Lean sees (Lake already normalizes line endings before computing hashes).
  • Docstrings now have normalized line endings. In particular, this fixes #guard_msgs failing multiline tests for Windows users using CRLF.
  • Now strings don't have different lengths depending on the platform. Before this PR, the following theorem is true for LF and false for CRLF files.
example : "
".length = 1 := rfl

Note: the normalization will take \r\r\n and turn it into \r\n. In the elaborator, we reject loose \r's that appear in whitespace. Rust instead takes the approach of making the normalization routine fail. They do this so that there's no downstream confusion about any \r\n that appears.

Implementation note: the LSP maintains its own copy of a source file that it updates when edit operations are applied. We are assuming that edit operations never split or join CRLFs. If this assumption is not correct, then the LSP copy of a source file can become slightly out of sync. If this is an issue, there is some discussion here.

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Apr 14, 2024
@@ -125,7 +125,7 @@ def applyDocumentChange (oldText : FileMap) : (change : Lsp.TextDocumentContentC
| TextDocumentContentChangeEvent.rangeChange (range : Range) (newText : String) =>
replaceLspRange oldText range newText
| TextDocumentContentChangeEvent.fullChange (newText : String) =>
newText.toFileMap
newText.crlfToLf.toFileMap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move crlfToLf to toFileMap?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replaceLspRange case made me decide not to incorporate crlfToLf into toFileMap. We only want to normalize the line endings of the new text, and it's worth making it explicit where we're normalizing.

There's still something unsatisfactory however, which is that if somehow a \r\n is split and re-merged by edit operations, then we can get into a situation where there's an unnormalized \r\n. I'm not sure if this is an issue that could crop up in practice.

If we're not happy with leaving it in the current form, here are some solutions that came to mind:

  1. Have a second FileMap with the unnormalized text. Make sure LSP operations edit that, and then re-normalize it to create the actual FileMap used by elaboration each time.
  2. Do that, but accelerate it somehow. There's probably some fancy data structure supporting inserting-and-normalizing.
  3. Make the LSP throw an error when it detects a loose \r. It's not clear how to get this information back to the user in a nice way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state seems fine. I'm more worried about accidentally forgetting crlfToLf somewhere in the future, but I agree that this is not that important and explicitly seeing the normalization in the code is helpful, too.

src/Init/Data/String/Extra.lean Show resolved Hide resolved
src/Init/Data/String/Extra.lean Show resolved Hide resolved
kmill added 3 commits May 7, 2024 10:31
To reduce differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF.

Effects:
- Lake's hashes already use normalized line endings. This change makes the hashes be faithful to what Lean sees.
- Docstrings are affected by line endings. In particular, this fixes `#guard_msgs` failing multilines tests for Windows users using CRLF.
- Now strings don't have different lengths depending on the platform. The following theorem is true for LF and false for CRLF files.
```lean
example : "
".length = 1 := rfl
```
@kmill kmill marked this pull request as ready for review May 7, 2024 18:44
@leanprover-community-mathlib4-bot
Copy link
Collaborator

leanprover-community-mathlib4-bot commented May 7, 2024

Mathlib CI status (docs):

  • ❗ Std CI can not be attempted yet, as the nightly-testing-2024-05-07 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-mathlib, Std CI should run now. (2024-05-07 19:01:17)
  • ❗ Mathlib CI can not be attempted yet, as the nightly-testing-2024-05-17 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-mathlib, Mathlib CI should run now. (2024-05-19 17:36:03)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase b278f9dd3096a6f183812b6f9129fc79f8675f00 --onto f53b778c0d4a474383eab709d67db5e12357f39d. (2024-05-20 16:30:51)

@tydeu
Copy link
Member

tydeu commented May 7, 2024

Now that we have position termination proofs in core (e.g., termination_by text.utf8ByteSize - pos.byteIdx), it may be worth supporting these automatically through decreasing_trivial. For example, I did this over in my Partax library (code).

@kmill kmill enabled auto-merge May 19, 2024 17:33
@kmill kmill added this pull request to the merge queue May 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 19, 2024
@kmill kmill added the full-ci label May 19, 2024
@kmill kmill added this pull request to the merge queue May 20, 2024
Merged via the queue into leanprover:master with commit a7338c5 May 20, 2024
20 checks passed
grunweg added a commit to leanprover-community/mathlib4 that referenced this pull request May 21, 2024
Now that the Lean frontend normalises all line endings to LF (since leanprover/lean4#3903), this check is not necessary any more. It is also one fewer Python linter to rewrite in Lean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants