add colored text handling library #169

haxscramper · 2022-01-17T15:49:13Z

Add colored text handling library for use in message reporting, treeRepr pretty printing, compiler/debugutils.render(), testament diff, spellsuggest corrections and more. Implementation-wise it is complete (ready for review), but missing some other parts:

Text formatting tests, at least for basic SGR ANSI escape codes
Documentation for public procs
Adequate commit messages
Examples of usage

Basic design overview: seq[Rune + Styling] is a "colored text". Rune is used to allow proper formatting of the Unicode text. Using just escape codes with string would've been highly impractical due to the need of filtering them out later with --colors=off. Current implementation provides necessary abstraction layer that allows cli_report generator to always generate colored reports texts that are simply rendered as non-colored ones.

haxscramper · 2022-01-17T15:56:50Z

Using #166 it is possible to implement many helper procedures, like this one, that can be used in lots of different reports (commands mismatches, typos, spelling correction for linter, diff implementations)

proc stringMismatchMessage*(
    input: string,
    expected: openarray[string],
    colored: bool = true,
    fixSuggestion: bool = true,
    showAll: bool = false,
  ): ColoredText =
  ## - TODO :: Better heuristics for missing/extraneous prefix/suffix

  let expected = deduplicate(expected)

  if expected.len == 0:
    return clt("No matching alternatives")


  var results = stringMismatchCandidates(input, expected).
    sortedByIt(it.distance)

  let best = results[0]

  if best.distance > int(input.len.float * 0.8):
    result = &"No close matches to {toRed(input, colored)}, possible " &
      namedItemListing(
        clt("alternative"),
        results[0 .. min(results.high, 3)].mapIt(it.target.toYellow()),
        clt("or")
      )

  else:
    result = clt(&"Did you mean to use '{toYellow(best.target, colored)}'?")

    if fixSuggestion:
      if best.edits.len < min(3, input.len div 2):
        result &= " (" & getEditVisual(
          toSeq(input),
          toSeq(best.target),
          best.edits,
          dollar[char]
        ) & ")"

      else:
        result &= clt(
          &" ({toRed(input, colored)} -> {toGreen(best.target, colored)})")

    if showAll and expected.len > 1:
      result &= " ("
      for idx, alt in results[1 ..^ 1]:
        if idx > 0:
          result &= " "

        result &= (toItalic(alt.target, colored) & "?") + tcGrey63

      result &= ")"

krux02

Generally speaking, I am not a fan of this. It adds several data structures and 765 lines of code just to handle colored text in a very inefficient form. Then it seems to oversee many attributes of unicode and how styling text is done in general, that I can't approve of this addition.

I've added colored to Nim as well, in astalgo all it does is it declares some constants in AnsiEscapes and adds them to the strings when needed. No additional types necessary.

From me this is a no to this PR, sorry.

krux02 · 2022-01-17T23:25:48Z

lib/experimental/colortext.nim

+
+type
+  ColStyle* = object
+    case use8Bit*: bool


so you store per rune a boolean that says if it uses 8 bit or not? Seems very inefficient if you ask me.

It uses 8-bit colors styling, not 8 bits per color

I've added colored to Nim as well

You mean this one, correct? nim-lang/Nim@20a21aa

Well define what you see as "correct". It is correct in terms of ANSI color escapes, and it is very tiny. But it is certainly not correct in terms of platform agnostic. It just supports one single (standardized) type of text coloring in a minimalistic way.

Then it seems to oversee many attributes of Unicode and how styling text is done in general,

I don't understand what attributes of Unicode I'm missing, or what I'm missing from text styling. I do support all 16-color and 256-color schemes, as well as all regular text styling attributes.

https://www.compart.com/en/unicode/U+200F
grapheme clusters will break vertical alignment as far as I can see

That suffers from exactly the same issue as I've described in the PR title - all code is littered with logical checks, you can't remove colors after they have been added, you can't do anything else on the text after it has been colorized (alignment does not work properly, you can't pad things left/right, get length etc., only append new text)

Well except from the logical checks, that are cetainly bad/annoying. This is designed to be minimal, yes. Only implement what is actually necessary.

Maybe you should start with a spec/requirements sheet first, so that I can judge according to the spec.

Again, this is not a high-performance string handling library, it is a create-and-write-out solution. I don't plan to store these strings anywhere, or try to constantly rearrange them beyond what is minimally necessary to produce formatted output. I can try and squeeze a couple of bytes from the color handling, for example by using 64-bit integer and some bit hacking if you want this, but considering it is almost guaranteed that I will call stdout.write or stderr.write afterwards for each constructed string, I see little to no reason to do these micro-optimizations and the do IO that would dwarf all the runtime.

Actually I did measurements in the past, string handling and measurements are a performance bottleneck in the compiler.

string handling and measurements are a performance bottleneck in the compiler.

Which part of the string handling - error reporting or PSym/PIdent/nkStrLit handling? Those are two different use cases, and this library is not going to be used to represent all existing string handling operations from now on, so general measurements hardly apply here. Yes, it is slower than concatenating \e[31m from const, but whether this becomes a bottleneck is another question.

grapheme clusters will break vertical alignment as far as I can see

Is that common issue that comes up in handing of the compiler messages (which are mostly English text) and some basic Unicode drawing characters?

To be honest, I'm not even sure if I understand the issue at hand, I would need to read Unicode spec/documentation or something of that nature.

This is designed to be minimal, yes. Only implement what is actually necessary.

Yes, it is minimal and does exactly what was necessary at the time, which is to colorize a couple string elements with hardcoded styles. That's where things usually start, and if there is no demand to change anything, it is generally fine. But in this specific case, I'm trying to add a module that would make it easier to quickly progress and iterate through different error message formats and styles, without getting constantly bogged down in details.

How about we add red underline for the error message location, like clang/rust/elm etc. do? Or implement inline type diffs like clang does on templates? Maybe some other way of error formatting - without being able to even align text blocks to each other, we won't get anywhere if we try to cling to the "minimal" approach here.

https://clang.llvm.org/diagnostics.html

https://elm-lang.org/news/compiler-errors-for-humans

Try printing out highlighted code on "but expression has type" instead of garbled monochrome mess. Or do something for the abysmal quality of the type mismatch errors - try styling mismatched parts differently, or doing some smarter highlighting inside the failed candidate.

If we have to constantly drag around if conf.useColors(): wantedTypeColor & typeToStr(arg.typ) & ansiCloso else: typeToStr(arg.typ) or even conf.maybeWrap(typeToStr(arg.typ), wantedTypeColor) for the sake of some ephemeral performance gains in an area where easy iterative improvements are much more important, we won't get anywhere.

Is that common issue that comes up in handing of the compiler messages (which are mostly English text) and some basic Unicode drawing characters?

To be honest, I'm not even sure if I understand the issue at hand, I would need to read Unicode spec/documentation or something of that nature.

The question is, is this for compiler internal use only, or is it supposed to be a string coloring library exposed for the public to be used? If it is compiler internal only, I think you would get almost away with ASCII only, When it is a public API, you have to dig into the Unicode spec/documentation, there is no way to avoid its complexity.

For now it is most for compiler use, just like experimental/diff, in the future it can be expanded as necessary and moved out of experimental

krux02 · 2022-01-17T23:27:53Z

lib/experimental/colortext.nim

+
+      of false:
+        fg*: ForegroundColor
+        bg*: BackgroundColor


8 bit colors are usually used to save storage space, but with this condition, no storage space will be saved at all.

Because I'm not aiming to save storage space here, this is not a high-performance string handling library, it is a convenience wrapper that would make it easier to write colored output

krux02 · 2022-01-17T23:33:34Z

lib/experimental/colortext.nim

+func `+`*(text: ColRuneLine, style: ColStyle): ColRuneLine =
+  for ch in text:
+    result.add ch + style
+


so many + operators. Nim uses & for string concatenation. Why differ here?

Because it is not a string concatenation, it is a text + style operation. I don't specifically care what operator to use here, bit & wasn't used specifically because it is not a concatenation.

Please don't use the mathematical + operator for this. Using operators this way, was the primary reason many programming languages abandoned operator overloading entirely.

What operator do you want to use then? Except for & which is used to denote concatenation, which is not what is going on here.

+ is already used for sets, and for style + style operations it is exactly what happens - I merge to sets of formatting options.

Using & for text + style will prevent code like "red" + fgRed & "blue" + fgBlue and require ("red" & fgRed) & ("blue" & fgBlue) for no apparent benefit

% might be used for text % style it has higher precedence, and already used for string interpolation in std/strutils and generally is not considered to be commutative (C uses it for mod etc.), so probably is a good solution here

Actually mixing % and + suffers from the same issues - need to add extraneous () all over the place to avoid compilation errors, so it can be any single operator that is not & and has a higher precedence.

I am not sure yet. I don't know about any established style operators yet. Generally I would prefer a real verb instead of an operator. From my experience in language design it is as important to read something as it is to write it.

saem · 2022-01-18T03:44:14Z

Generally speaking, I am not a fan of this. It adds several data structures and 765 lines of code just to handle colored text in a very inefficient form. Then it seems to oversee many attributes of unicode and how styling text is done in general, that I can't approve of this addition.

Design improvements, if we're running into performance issues makes sense. With that said, reading the PR description we've got colouring in many locations, so implementing that in 4 plus places I think some abstraction makes sense.

I'm trying to think about the API and if it precludes optimization. But it seems to create a template and we interpolate into it to get coloured output.

haxscramper · 2022-01-18T06:57:18Z

I've added colored to Nim as well

You mean this one, correct? nim-lang/Nim@20a21aa

Then it seems to oversee many attributes of Unicode and how styling text is done in general,

I don't understand what attributes of Unicode I'm missing, or what I'm missing from text styling. I do support all 16-color and 256-color schemes, as well as all regular text styling attributes.

I've added colored to Nim as well, in astalgo all it does is it declares some constants in AnsiEscapes and adds them to the strings when needed.

That suffers from exactly the same issue as I've described in the PR title - all code is littered with logical checks, you can't remove colors after they have been added, you can't do anything else on the text after it has been colorized (alignment does not work properly, you can't pad things left/right, get length etc., only append new text)

Again, this is not a high-performance string handling library, it is a create-and-write-out solution. I don't plan to store these strings anywhere, or try to constantly rearrange them beyond what is minimally necessary to produce formatted output. I can try and squeeze a couple of bytes from the color handling, for example by using 64-bit integer and some bit hacking if you want this, but considering it is almost guaranteed that I will call stdout.write or stderr.write afterwards for each constructed string, I see little to no reason to do these micro-optimizations and the do IO that would dwarf all the runtime.

haxscramper · 2022-01-18T07:00:15Z

@saem

But it seems to create a template and we interpolate into it to get coloured output.

proc treeRepr*(
    pnode: PNode,
    pathIndexed: bool     = false,
    positionIndexed: bool = true,
    maxdepth: int         = 120,
    maxlen: int           = 30,
    lineInfo: bool        = false
  ): ColoredText =

  coloredResult()

  proc aux(n: PNode, level: int, idx: seq[int]) =
    let pref = joinPrefix(level, idx, pathIndexed, positionIndexed)

    add pref
    if isNil(n):
      add "<nil>" + fgRed
      return

    if level > maxdepth:
      add " ..."
      return

    add hshow(n.kind)

    proc addComment(sep: bool = true) =
      if n.comment.len > 0:
        add "\n"
        for idx, line in enumerate(
          split(n.comment.strip(leading = false), '\n')
        ):
          if idx > 0: add "\n"
          add pref & "  # " & line + fgCyan

      elif sep:
        add " "

    proc addFlags() =
      if not isNil(n.typ):
        add " "
        add n.typ.lispRepr(colored)

      if n.flags.len > 0:
        add " nflags:" & to8Bit($n.flags, 2, 0, 3)


    if lineInfo:
      add "@"
      add $n.info.fileIndex.int + fgBlue
      add "/"
      add $n.info.line + fgCyan
      add ":"
      add $n.info.col + fgCyan
      add " "

    case n.kind:
      of nkStrKinds:
        add " "
        add "\"" & n.getStrVal() + fgYellow & "\""
        addFlags()
        addComment()

      of nkIntKinds:
        add " "
        add $n.intVal + fgBlue
        addFlags()
        addComment()

      of nkFloatKinds:
        add " "
        add $n.floatVal + fgMagenta
        addFlags()
        addComment()

      of nkIdent:
        add " "
        add n.getStrVal() + fgGreen
        addFlags()
        addComment()

      of nkSym:
        add " "
        add n.getStrVal() + fgCyan
        add " sk:"
        add ($n.sym.kind)[2 ..^ 1] + fgCyan

        if n.sym.flags.len > 0:
          add " flags:" & to8Bit($n.sym.flags, 2, 0, 3)

        if n.sym.magic != mNone:
          add " magic:" & to8Bit($n.sym.magic, 2, 0, 5)

        addFlags()
        addComment()

      of nkCommentStmt:
        addFlags()
        addComment()

      else:
        discard


    if n.kind notin nkTokenKinds:
      addFlags()
      if n.len > 0:
        add "\n"

      addComment(false)

      for newIdx, subn in n:
        aux(subn, level + 1, idx & newIdx)
        if level + 1 > maxDepth:
          break

        if newIdx > maxLen:
          break

        if newIdx < n.len - 1:
          add "\n"

  aux(pnode, 0, @[])

example API usage - I've decided to not make add varargs, because template does not work very good with these. Maybe add should've been a proc though

krux02 · 2022-01-18T13:44:19Z

Just as an inspiration, nos as, this is how it should be done.

In emacs a styled string is a tuple of a string, and a data structure that contains all styling for that string. So the split is at a higher level, rather than at a lower level. This that also allows an easy style removal while also being effective when there is little style to be applied. The cost isn't spent per character.

But this is also only how emacs does it for string literals, styled text within buffers is probably stored somewhat differently, otherwise insertion would break all these styling properties all the time.

There is certainly something to read about how text styling and alignment can be done.

haxscramper · 2022-01-18T14:09:08Z

Yes, if this actually happens to be a bottleneck (which I really doubt), we can invest more time optimizing this. Pretty sure there are a lot of other places where we can speed up the compiler, and I'd rather prefer to make accent of making error message better rather than failing faster. I don't especially care how quickly a compiler can tell me my code is wrong, I care how much time do I need to spend fixing it up (and understanding what was wrong) - and by extensions, it should be easy for compiler authors to iterate on the messages.

krux02 · 2022-01-18T15:51:04Z

Yes, if this actually happens to be a bottleneck (which I really doubt), we can invest more time optimizing this. Pretty sure there are a lot of other places where we can speed up the compiler, and I'd rather prefer to make accent of making error message better rather than failing faster. I don't especially care how quickly a compiler can tell me my code is wrong, I care how much time do I need to spend fixing it up (and understanding what was wrong) - and by extensions, it should be easy for compiler authors to iterate on the messages.

I've actually measured the compiler performance in the past and inefficient data structures for text generation (mostly tiny ropes for backend code) is actually affecting the performance measurably. I know this is for error messages and not for code generation, but it just happend to be two days ago that I actually went in the compiler and just print something on all the hidden reports that the compiler generates but doesn't print on screen (for whatever reason). I wanted to find out why tunittests doesn't print a deprecation warning for the usage of macros.callsite, which is used in unittests. And damn it was a lot of messages, I did not expect it. I didn't save them though because I focused on that deprecation warning alone, so sorry I can't give you an exact number. I can't do it right now, but maybe later I can come back with an exact number.

Anyway, I think I was annoying enough for now.

haxscramper · 2022-01-18T16:47:10Z

print something on all the hidden reports that the compiler generates but doesn't print on screen (for whatever reason)

Yes, with the old approach to error generation it would've been a more noticeable performance hit, but with structured reports it becomes completely irrelevant, as we only pay for messages that actually are going to be displayed and nothing else.

The old system eagerly formatted all the reports and had randomly placed 'optimization' hacks for compiles(), tryConstExpr(), presentFailedCandidates() to partially mitigate this.

haxscramper · 2022-01-19T12:35:53Z

Checklist done, ready to be reviewed and merged.

saem

I did a quick skim of the docs. Will go through it again tomorrow, as in struggling to hold things in my head right now (tired).

lib/experimental/colortext.nim

Add colored text handling library for use in message reporting, treeRepr pretty printing, `compiler/debugutils.render()`, testament diff, spellsuggest corrections and more. Basic design overview: `seq[Rune + Styling]` is a "colored text". Rune is used to allow proper formatting of the Unicode text. Helper procs to merge string primitives and formatting styles (`"test" + fgRed`) and do basic formatting operations (align left/right/center, indent)

haxscramper · 2022-01-21T09:19:20Z

bors r+

bors · 2022-01-21T10:00:35Z

Build succeeded:

haxscramper changed the title ~~experimental: add colored text handling library~~ add colored text handling library Jan 17, 2022

krux02 requested changes Jan 17, 2022

View reviewed changes

haxscramper force-pushed the colortext branch 2 times, most recently from 1cccd6f to 1f8b971 Compare January 18, 2022 13:21

haxscramper force-pushed the colortext branch 4 times, most recently from 746b4de to 78b20ca Compare January 19, 2022 12:33

haxscramper force-pushed the colortext branch 4 times, most recently from 8e0352c to 4dd9e3e Compare January 19, 2022 16:17

saem requested changes Jan 20, 2022

View reviewed changes

haxscramper force-pushed the colortext branch from 4dd9e3e to 4235ba0 Compare January 20, 2022 15:51

saem approved these changes Jan 21, 2022

View reviewed changes

bors bot merged commit e6119b4 into nim-works:devel Jan 21, 2022

haxscramper added the stdlib Standard library label Nov 21, 2022

haxscramper added this to the Standard library additions or cleanu milestone Dec 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add colored text handling library #169

add colored text handling library #169

haxscramper commented Jan 17, 2022 •

edited

Loading

haxscramper commented Jan 17, 2022 •

edited

Loading

krux02 left a comment

krux02 Jan 17, 2022

haxscramper Jan 18, 2022

krux02 Jan 18, 2022

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper Jan 18, 2022 •

edited

Loading

krux02 Jan 18, 2022

haxscramper Jan 18, 2022

krux02 Jan 17, 2022

haxscramper Jan 18, 2022 •

edited

Loading

krux02 Jan 17, 2022

haxscramper Jan 18, 2022

krux02 Jan 18, 2022

haxscramper Jan 18, 2022

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper Jan 18, 2022

krux02 Jan 18, 2022

saem commented Jan 18, 2022

haxscramper commented Jan 18, 2022

haxscramper commented Jan 18, 2022 •

edited

Loading

krux02 commented Jan 18, 2022

haxscramper commented Jan 18, 2022 •

edited

Loading

krux02 commented Jan 18, 2022

haxscramper commented Jan 18, 2022 •

edited

Loading

haxscramper commented Jan 19, 2022 •

edited

Loading

saem left a comment

haxscramper commented Jan 21, 2022

bors bot commented Jan 21, 2022

add colored text handling library #169

add colored text handling library #169

Conversation

haxscramper commented Jan 17, 2022 • edited Loading

haxscramper commented Jan 17, 2022 • edited Loading

krux02 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

haxscramper Jan 18, 2022 • edited Loading

Choose a reason for hiding this comment

haxscramper Jan 18, 2022 • edited Loading

Choose a reason for hiding this comment

haxscramper Jan 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

haxscramper Jan 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

haxscramper Jan 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

saem commented Jan 18, 2022

haxscramper commented Jan 18, 2022

haxscramper commented Jan 18, 2022 • edited Loading

krux02 commented Jan 18, 2022

haxscramper commented Jan 18, 2022 • edited Loading

krux02 commented Jan 18, 2022

haxscramper commented Jan 18, 2022 • edited Loading

haxscramper commented Jan 19, 2022 • edited Loading

saem left a comment

Choose a reason for hiding this comment

haxscramper commented Jan 21, 2022

bors bot commented Jan 21, 2022

haxscramper commented Jan 17, 2022 •

edited

Loading

haxscramper commented Jan 17, 2022 •

edited

Loading

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper Jan 18, 2022 •

edited

Loading

haxscramper commented Jan 18, 2022 •

edited

Loading

haxscramper commented Jan 18, 2022 •

edited

Loading

haxscramper commented Jan 18, 2022 •

edited

Loading

haxscramper commented Jan 19, 2022 •

edited

Loading