-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add colored text handling library #169
Conversation
Using #166 it is possible to implement many helper procedures, like this one, that can be used in lots of different reports (commands mismatches, typos, spelling correction for linter, diff implementations) proc stringMismatchMessage*(
input: string,
expected: openarray[string],
colored: bool = true,
fixSuggestion: bool = true,
showAll: bool = false,
): ColoredText =
## - TODO :: Better heuristics for missing/extraneous prefix/suffix
let expected = deduplicate(expected)
if expected.len == 0:
return clt("No matching alternatives")
var results = stringMismatchCandidates(input, expected).
sortedByIt(it.distance)
let best = results[0]
if best.distance > int(input.len.float * 0.8):
result = &"No close matches to {toRed(input, colored)}, possible " &
namedItemListing(
clt("alternative"),
results[0 .. min(results.high, 3)].mapIt(it.target.toYellow()),
clt("or")
)
else:
result = clt(&"Did you mean to use '{toYellow(best.target, colored)}'?")
if fixSuggestion:
if best.edits.len < min(3, input.len div 2):
result &= " (" & getEditVisual(
toSeq(input),
toSeq(best.target),
best.edits,
dollar[char]
) & ")"
else:
result &= clt(
&" ({toRed(input, colored)} -> {toGreen(best.target, colored)})")
if showAll and expected.len > 1:
result &= " ("
for idx, alt in results[1 ..^ 1]:
if idx > 0:
result &= " "
result &= (toItalic(alt.target, colored) & "?") + tcGrey63
result &= ")"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, I am not a fan of this. It adds several data structures and 765 lines of code just to handle colored text in a very inefficient form. Then it seems to oversee many attributes of unicode and how styling text is done in general, that I can't approve of this addition.
I've added colored to Nim as well, in astalgo all it does is it declares some constants in AnsiEscapes and adds them to the strings when needed. No additional types necessary.
From me this is a no to this PR, sorry.
lib/experimental/colortext.nim
Outdated
|
||
type | ||
ColStyle* = object | ||
case use8Bit*: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you store per rune a boolean that says if it uses 8 bit or not? Seems very inefficient if you ask me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It uses 8-bit colors styling, not 8 bits per color
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added colored to Nim as well
You mean this one, correct? nim-lang/Nim@20a21aa
Well define what you see as "correct". It is correct in terms of ANSI color escapes, and it is very tiny. But it is certainly not correct in terms of platform agnostic. It just supports one single (standardized) type of text coloring in a minimalistic way.
Then it seems to oversee many attributes of Unicode and how styling text is done in general,
I don't understand what attributes of Unicode I'm missing, or what I'm missing from text styling. I do support all 16-color and 256-color schemes, as well as all regular text styling attributes.
https://www.compart.com/en/unicode/U+200F
grapheme clusters will break vertical alignment as far as I can see
That suffers from exactly the same issue as I've described in the PR title - all code is littered with logical checks, you can't remove colors after they have been added, you can't do anything else on the text after it has been colorized (alignment does not work properly, you can't pad things left/right, get length etc., only append new text)
Well except from the logical checks, that are cetainly bad/annoying. This is designed to be minimal, yes. Only implement what is actually necessary.
Maybe you should start with a spec/requirements sheet first, so that I can judge according to the spec.
Again, this is not a high-performance string handling library, it is a create-and-write-out solution. I don't plan to store these strings anywhere, or try to constantly rearrange them beyond what is minimally necessary to produce formatted output. I can try and squeeze a couple of bytes from the color handling, for example by using 64-bit integer and some bit hacking if you want this, but considering it is almost guaranteed that I will call
stdout.write
orstderr.write
afterwards for each constructed string, I see little to no reason to do these micro-optimizations and the do IO that would dwarf all the runtime.
Actually I did measurements in the past, string handling and measurements are a performance bottleneck in the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string handling and measurements are a performance bottleneck in the compiler.
Which part of the string handling - error reporting or PSym/PIdent/nkStrLit handling? Those are two different use cases, and this library is not going to be used to represent all existing string handling operations from now on, so general measurements hardly apply here. Yes, it is slower than concatenating \e[31m
from const, but whether this becomes a bottleneck is another question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grapheme clusters will break vertical alignment as far as I can see
Is that common issue that comes up in handing of the compiler messages (which are mostly English text) and some basic Unicode drawing characters?
To be honest, I'm not even sure if I understand the issue at hand, I would need to read Unicode spec/documentation or something of that nature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is designed to be minimal, yes. Only implement what is actually necessary.
Yes, it is minimal and does exactly what was necessary at the time, which is to colorize a couple string elements with hardcoded styles. That's where things usually start, and if there is no demand to change anything, it is generally fine. But in this specific case, I'm trying to add a module that would make it easier to quickly progress and iterate through different error message formats and styles, without getting constantly bogged down in details.
How about we add red underline for the error message location, like clang/rust/elm etc. do? Or implement inline type diffs like clang does on templates? Maybe some other way of error formatting - without being able to even align text blocks to each other, we won't get anywhere if we try to cling to the "minimal" approach here.
Try printing out highlighted code on "but expression has type" instead of garbled monochrome mess. Or do something for the abysmal quality of the type mismatch errors - try styling mismatched parts differently, or doing some smarter highlighting inside the failed candidate.
If we have to constantly drag around if conf.useColors(): wantedTypeColor & typeToStr(arg.typ) & ansiCloso else: typeToStr(arg.typ)
or even conf.maybeWrap(typeToStr(arg.typ), wantedTypeColor)
for the sake of some ephemeral performance gains in an area where easy iterative improvements are much more important, we won't get anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that common issue that comes up in handing of the compiler messages (which are mostly English text) and some basic Unicode drawing characters?
To be honest, I'm not even sure if I understand the issue at hand, I would need to read Unicode spec/documentation or something of that nature.
The question is, is this for compiler internal use only, or is it supposed to be a string coloring library exposed for the public to be used? If it is compiler internal only, I think you would get almost away with ASCII only, When it is a public API, you have to dig into the Unicode spec/documentation, there is no way to avoid its complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now it is most for compiler use, just like experimental/diff
, in the future it can be expanded as necessary and moved out of experimental
lib/experimental/colortext.nim
Outdated
|
||
of false: | ||
fg*: ForegroundColor | ||
bg*: BackgroundColor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8 bit colors are usually used to save storage space, but with this condition, no storage space will be saved at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I'm not aiming to save storage space here, this is not a high-performance string handling library, it is a convenience wrapper that would make it easier to write colored output
func `+`*(text: ColRuneLine, style: ColStyle): ColRuneLine = | ||
for ch in text: | ||
result.add ch + style | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so many +
operators. Nim uses &
for string concatenation. Why differ here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is not a string concatenation, it is a text + style
operation. I don't specifically care what operator to use here, bit &
wasn't used specifically because it is not a concatenation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use the mathematical +
operator for this. Using operators this way, was the primary reason many programming languages abandoned operator overloading entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What operator do you want to use then? Except for &
which is used to denote concatenation, which is not what is going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+
is already used for sets, and for style + style
operations it is exactly what happens - I merge to sets of formatting options.
Using &
for text + style
will prevent code like "red" + fgRed & "blue" + fgBlue
and require ("red" & fgRed) & ("blue" & fgBlue)
for no apparent benefit
%
might be used for text % style
it has higher precedence, and already used for string interpolation in std/strutils
and generally is not considered to be commutative (C uses it for mod etc.), so probably is a good solution here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually mixing %
and +
suffers from the same issues - need to add extraneous ()
all over the place to avoid compilation errors, so it can be any single operator that is not &
and has a higher precedence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure yet. I don't know about any established style operators yet. Generally I would prefer a real verb instead of an operator. From my experience in language design it is as important to read something as it is to write it.
Design improvements, if we're running into performance issues makes sense. With that said, reading the PR description we've got colouring in many locations, so implementing that in 4 plus places I think some abstraction makes sense. I'm trying to think about the API and if it precludes optimization. But it seems to create a template and we interpolate into it to get coloured output. |
You mean this one, correct? nim-lang/Nim@20a21aa
I don't understand what attributes of Unicode I'm missing, or what I'm missing from text styling. I do support all 16-color and 256-color schemes, as well as all regular text styling attributes.
That suffers from exactly the same issue as I've described in the PR title - all code is littered with logical checks, you can't remove colors after they have been added, you can't do anything else on the text after it has been colorized (alignment does not work properly, you can't pad things left/right, get length etc., only append new text) Again, this is not a high-performance string handling library, it is a create-and-write-out solution. I don't plan to store these strings anywhere, or try to constantly rearrange them beyond what is minimally necessary to produce formatted output. I can try and squeeze a couple of bytes from the color handling, for example by using 64-bit integer and some bit hacking if you want this, but considering it is almost guaranteed that I will call |
proc treeRepr*(
pnode: PNode,
pathIndexed: bool = false,
positionIndexed: bool = true,
maxdepth: int = 120,
maxlen: int = 30,
lineInfo: bool = false
): ColoredText =
coloredResult()
proc aux(n: PNode, level: int, idx: seq[int]) =
let pref = joinPrefix(level, idx, pathIndexed, positionIndexed)
add pref
if isNil(n):
add "<nil>" + fgRed
return
if level > maxdepth:
add " ..."
return
add hshow(n.kind)
proc addComment(sep: bool = true) =
if n.comment.len > 0:
add "\n"
for idx, line in enumerate(
split(n.comment.strip(leading = false), '\n')
):
if idx > 0: add "\n"
add pref & " # " & line + fgCyan
elif sep:
add " "
proc addFlags() =
if not isNil(n.typ):
add " "
add n.typ.lispRepr(colored)
if n.flags.len > 0:
add " nflags:" & to8Bit($n.flags, 2, 0, 3)
if lineInfo:
add "@"
add $n.info.fileIndex.int + fgBlue
add "/"
add $n.info.line + fgCyan
add ":"
add $n.info.col + fgCyan
add " "
case n.kind:
of nkStrKinds:
add " "
add "\"" & n.getStrVal() + fgYellow & "\""
addFlags()
addComment()
of nkIntKinds:
add " "
add $n.intVal + fgBlue
addFlags()
addComment()
of nkFloatKinds:
add " "
add $n.floatVal + fgMagenta
addFlags()
addComment()
of nkIdent:
add " "
add n.getStrVal() + fgGreen
addFlags()
addComment()
of nkSym:
add " "
add n.getStrVal() + fgCyan
add " sk:"
add ($n.sym.kind)[2 ..^ 1] + fgCyan
if n.sym.flags.len > 0:
add " flags:" & to8Bit($n.sym.flags, 2, 0, 3)
if n.sym.magic != mNone:
add " magic:" & to8Bit($n.sym.magic, 2, 0, 5)
addFlags()
addComment()
of nkCommentStmt:
addFlags()
addComment()
else:
discard
if n.kind notin nkTokenKinds:
addFlags()
if n.len > 0:
add "\n"
addComment(false)
for newIdx, subn in n:
aux(subn, level + 1, idx & newIdx)
if level + 1 > maxDepth:
break
if newIdx > maxLen:
break
if newIdx < n.len - 1:
add "\n"
aux(pnode, 0, @[]) example API usage - I've decided to not make |
1cccd6f
to
1f8b971
Compare
Just as an inspiration, nos as, this is how it should be done. In emacs a styled string is a tuple of a string, and a data structure that contains all styling for that string. So the split is at a higher level, rather than at a lower level. This that also allows an easy style removal while also being effective when there is little style to be applied. The cost isn't spent per character. But this is also only how emacs does it for string literals, styled text within buffers is probably stored somewhat differently, otherwise insertion would break all these styling properties all the time. There is certainly something to read about how text styling and alignment can be done. |
Yes, if this actually happens to be a bottleneck (which I really doubt), we can invest more time optimizing this. Pretty sure there are a lot of other places where we can speed up the compiler, and I'd rather prefer to make accent of making error message better rather than failing faster. I don't especially care how quickly a compiler can tell me my code is wrong, I care how much time do I need to spend fixing it up (and understanding what was wrong) - and by extensions, it should be easy for compiler authors to iterate on the messages. |
I've actually measured the compiler performance in the past and inefficient data structures for text generation (mostly tiny ropes for backend code) is actually affecting the performance measurably. I know this is for error messages and not for code generation, but it just happend to be two days ago that I actually went in the compiler and just print something on all the hidden reports that the compiler generates but doesn't print on screen (for whatever reason). I wanted to find out why tunittests doesn't print a deprecation warning for the usage of Anyway, I think I was annoying enough for now. |
Yes, with the old approach to error generation it would've been a more noticeable performance hit, but with structured reports it becomes completely irrelevant, as we only pay for messages that actually are going to be displayed and nothing else. The old system eagerly formatted all the reports and had randomly placed 'optimization' hacks for |
746b4de
to
78b20ca
Compare
Checklist done, ready to be reviewed and merged. |
8e0352c
to
4dd9e3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick skim of the docs. Will go through it again tomorrow, as in struggling to hold things in my head right now (tired).
Add colored text handling library for use in message reporting, treeRepr pretty printing, `compiler/debugutils.render()`, testament diff, spellsuggest corrections and more. Basic design overview: `seq[Rune + Styling]` is a "colored text". Rune is used to allow proper formatting of the Unicode text. Helper procs to merge string primitives and formatting styles (`"test" + fgRed`) and do basic formatting operations (align left/right/center, indent)
4dd9e3e
to
4235ba0
Compare
bors r+ |
Build succeeded: |
Add colored text handling library for use in message reporting,
treeRepr
pretty printing,compiler/debugutils.render()
, testament diff, spellsuggest corrections and more. Implementation-wise it is complete (ready for review), but missing some other parts:Basic design overview:
seq[Rune + Styling]
is a "colored text".Rune
is used to allow proper formatting of the Unicode text. Using just escape codes with string would've been highly impractical due to the need of filtering them out later with--colors=off
. Current implementation provides necessary abstraction layer that allowscli_report
generator to always generate colored reports texts that are simply rendered as non-colored ones.