Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add colored text handling library #169

Merged
merged 1 commit into from
Jan 21, 2022
Merged

Conversation

haxscramper
Copy link
Collaborator

@haxscramper haxscramper commented Jan 17, 2022

Add colored text handling library for use in message reporting, treeRepr pretty printing, compiler/debugutils.render(), testament diff, spellsuggest corrections and more. Implementation-wise it is complete (ready for review), but missing some other parts:

  • Text formatting tests, at least for basic SGR ANSI escape codes
  • Documentation for public procs
  • Adequate commit messages
  • Examples of usage

Basic design overview: seq[Rune + Styling] is a "colored text". Rune is used to allow proper formatting of the Unicode text. Using just escape codes with string would've been highly impractical due to the need of filtering them out later with --colors=off. Current implementation provides necessary abstraction layer that allows cli_report generator to always generate colored reports texts that are simply rendered as non-colored ones.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Jan 17, 2022

Using #166 it is possible to implement many helper procedures, like this one, that can be used in lots of different reports (commands mismatches, typos, spelling correction for linter, diff implementations)

proc stringMismatchMessage*(
    input: string,
    expected: openarray[string],
    colored: bool = true,
    fixSuggestion: bool = true,
    showAll: bool = false,
  ): ColoredText =
  ## - TODO :: Better heuristics for missing/extraneous prefix/suffix

  let expected = deduplicate(expected)

  if expected.len == 0:
    return clt("No matching alternatives")


  var results = stringMismatchCandidates(input, expected).
    sortedByIt(it.distance)

  let best = results[0]

  if best.distance > int(input.len.float * 0.8):
    result = &"No close matches to {toRed(input, colored)}, possible " &
      namedItemListing(
        clt("alternative"),
        results[0 .. min(results.high, 3)].mapIt(it.target.toYellow()),
        clt("or")
      )

  else:
    result = clt(&"Did you mean to use '{toYellow(best.target, colored)}'?")

    if fixSuggestion:
      if best.edits.len < min(3, input.len div 2):
        result &= " (" & getEditVisual(
          toSeq(input),
          toSeq(best.target),
          best.edits,
          dollar[char]
        ) & ")"

      else:
        result &= clt(
          &" ({toRed(input, colored)} -> {toGreen(best.target, colored)})")

    if showAll and expected.len > 1:
      result &= " ("
      for idx, alt in results[1 ..^ 1]:
        if idx > 0:
          result &= " "

        result &= (toItalic(alt.target, colored) & "?") + tcGrey63

      result &= ")"

image

@haxscramper haxscramper changed the title experimental: add colored text handling library add colored text handling library Jan 17, 2022
Copy link
Contributor

@krux02 krux02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, I am not a fan of this. It adds several data structures and 765 lines of code just to handle colored text in a very inefficient form. Then it seems to oversee many attributes of unicode and how styling text is done in general, that I can't approve of this addition.

I've added colored to Nim as well, in astalgo all it does is it declares some constants in AnsiEscapes and adds them to the strings when needed. No additional types necessary.

From me this is a no to this PR, sorry.


type
ColStyle* = object
case use8Bit*: bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you store per rune a boolean that says if it uses 8 bit or not? Seems very inefficient if you ask me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses 8-bit colors styling, not 8 bits per color

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added colored to Nim as well

You mean this one, correct? nim-lang/Nim@20a21aa

Well define what you see as "correct". It is correct in terms of ANSI color escapes, and it is very tiny. But it is certainly not correct in terms of platform agnostic. It just supports one single (standardized) type of text coloring in a minimalistic way.

Then it seems to oversee many attributes of Unicode and how styling text is done in general,

I don't understand what attributes of Unicode I'm missing, or what I'm missing from text styling. I do support all 16-color and 256-color schemes, as well as all regular text styling attributes.

https://www.compart.com/en/unicode/U+200F
grapheme clusters will break vertical alignment as far as I can see

That suffers from exactly the same issue as I've described in the PR title - all code is littered with logical checks, you can't remove colors after they have been added, you can't do anything else on the text after it has been colorized (alignment does not work properly, you can't pad things left/right, get length etc., only append new text)

Well except from the logical checks, that are cetainly bad/annoying. This is designed to be minimal, yes. Only implement what is actually necessary.

Maybe you should start with a spec/requirements sheet first, so that I can judge according to the spec.

Again, this is not a high-performance string handling library, it is a create-and-write-out solution. I don't plan to store these strings anywhere, or try to constantly rearrange them beyond what is minimally necessary to produce formatted output. I can try and squeeze a couple of bytes from the color handling, for example by using 64-bit integer and some bit hacking if you want this, but considering it is almost guaranteed that I will call stdout.write or stderr.write afterwards for each constructed string, I see little to no reason to do these micro-optimizations and the do IO that would dwarf all the runtime.

Actually I did measurements in the past, string handling and measurements are a performance bottleneck in the compiler.

Copy link
Collaborator Author

@haxscramper haxscramper Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string handling and measurements are a performance bottleneck in the compiler.

Which part of the string handling - error reporting or PSym/PIdent/nkStrLit handling? Those are two different use cases, and this library is not going to be used to represent all existing string handling operations from now on, so general measurements hardly apply here. Yes, it is slower than concatenating \e[31m from const, but whether this becomes a bottleneck is another question.

Copy link
Collaborator Author

@haxscramper haxscramper Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grapheme clusters will break vertical alignment as far as I can see

Is that common issue that comes up in handing of the compiler messages (which are mostly English text) and some basic Unicode drawing characters?

To be honest, I'm not even sure if I understand the issue at hand, I would need to read Unicode spec/documentation or something of that nature.

Copy link
Collaborator Author

@haxscramper haxscramper Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is designed to be minimal, yes. Only implement what is actually necessary.

Yes, it is minimal and does exactly what was necessary at the time, which is to colorize a couple string elements with hardcoded styles. That's where things usually start, and if there is no demand to change anything, it is generally fine. But in this specific case, I'm trying to add a module that would make it easier to quickly progress and iterate through different error message formats and styles, without getting constantly bogged down in details.

How about we add red underline for the error message location, like clang/rust/elm etc. do? Or implement inline type diffs like clang does on templates? Maybe some other way of error formatting - without being able to even align text blocks to each other, we won't get anywhere if we try to cling to the "minimal" approach here.

Try printing out highlighted code on "but expression has type" instead of garbled monochrome mess. Or do something for the abysmal quality of the type mismatch errors - try styling mismatched parts differently, or doing some smarter highlighting inside the failed candidate.

If we have to constantly drag around if conf.useColors(): wantedTypeColor & typeToStr(arg.typ) & ansiCloso else: typeToStr(arg.typ) or even conf.maybeWrap(typeToStr(arg.typ), wantedTypeColor) for the sake of some ephemeral performance gains in an area where easy iterative improvements are much more important, we won't get anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that common issue that comes up in handing of the compiler messages (which are mostly English text) and some basic Unicode drawing characters?

To be honest, I'm not even sure if I understand the issue at hand, I would need to read Unicode spec/documentation or something of that nature.

The question is, is this for compiler internal use only, or is it supposed to be a string coloring library exposed for the public to be used? If it is compiler internal only, I think you would get almost away with ASCII only, When it is a public API, you have to dig into the Unicode spec/documentation, there is no way to avoid its complexity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it is most for compiler use, just like experimental/diff, in the future it can be expanded as necessary and moved out of experimental


of false:
fg*: ForegroundColor
bg*: BackgroundColor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 bit colors are usually used to save storage space, but with this condition, no storage space will be saved at all.

Copy link
Collaborator Author

@haxscramper haxscramper Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I'm not aiming to save storage space here, this is not a high-performance string handling library, it is a convenience wrapper that would make it easier to write colored output

func `+`*(text: ColRuneLine, style: ColStyle): ColRuneLine =
for ch in text:
result.add ch + style

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so many + operators. Nim uses & for string concatenation. Why differ here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is not a string concatenation, it is a text + style operation. I don't specifically care what operator to use here, bit & wasn't used specifically because it is not a concatenation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use the mathematical + operator for this. Using operators this way, was the primary reason many programming languages abandoned operator overloading entirely.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What operator do you want to use then? Except for & which is used to denote concatenation, which is not what is going on here.

Copy link
Collaborator Author

@haxscramper haxscramper Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ is already used for sets, and for style + style operations it is exactly what happens - I merge to sets of formatting options.

Using & for text + style will prevent code like "red" + fgRed & "blue" + fgBlue and require ("red" & fgRed) & ("blue" & fgBlue) for no apparent benefit

% might be used for text % style it has higher precedence, and already used for string interpolation in std/strutils and generally is not considered to be commutative (C uses it for mod etc.), so probably is a good solution here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually mixing % and + suffers from the same issues - need to add extraneous () all over the place to avoid compilation errors, so it can be any single operator that is not & and has a higher precedence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure yet. I don't know about any established style operators yet. Generally I would prefer a real verb instead of an operator. From my experience in language design it is as important to read something as it is to write it.

@saem
Copy link
Collaborator

saem commented Jan 18, 2022

Generally speaking, I am not a fan of this. It adds several data structures and 765 lines of code just to handle colored text in a very inefficient form. Then it seems to oversee many attributes of unicode and how styling text is done in general, that I can't approve of this addition.

Design improvements, if we're running into performance issues makes sense. With that said, reading the PR description we've got colouring in many locations, so implementing that in 4 plus places I think some abstraction makes sense.

I'm trying to think about the API and if it precludes optimization. But it seems to create a template and we interpolate into it to get coloured output.

@haxscramper
Copy link
Collaborator Author

I've added colored to Nim as well

You mean this one, correct? nim-lang/Nim@20a21aa

Then it seems to oversee many attributes of Unicode and how styling text is done in general,

I don't understand what attributes of Unicode I'm missing, or what I'm missing from text styling. I do support all 16-color and 256-color schemes, as well as all regular text styling attributes.

I've added colored to Nim as well, in astalgo all it does is it declares some constants in AnsiEscapes and adds them to the strings when needed.

That suffers from exactly the same issue as I've described in the PR title - all code is littered with logical checks, you can't remove colors after they have been added, you can't do anything else on the text after it has been colorized (alignment does not work properly, you can't pad things left/right, get length etc., only append new text)

Again, this is not a high-performance string handling library, it is a create-and-write-out solution. I don't plan to store these strings anywhere, or try to constantly rearrange them beyond what is minimally necessary to produce formatted output. I can try and squeeze a couple of bytes from the color handling, for example by using 64-bit integer and some bit hacking if you want this, but considering it is almost guaranteed that I will call stdout.write or stderr.write afterwards for each constructed string, I see little to no reason to do these micro-optimizations and the do IO that would dwarf all the runtime.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Jan 18, 2022

@saem

But it seems to create a template and we interpolate into it to get coloured output.

proc treeRepr*(
    pnode: PNode,
    pathIndexed: bool     = false,
    positionIndexed: bool = true,
    maxdepth: int         = 120,
    maxlen: int           = 30,
    lineInfo: bool        = false
  ): ColoredText =

  coloredResult()

  proc aux(n: PNode, level: int, idx: seq[int]) =
    let pref = joinPrefix(level, idx, pathIndexed, positionIndexed)

    add pref
    if isNil(n):
      add "<nil>" + fgRed
      return

    if level > maxdepth:
      add " ..."
      return

    add hshow(n.kind)

    proc addComment(sep: bool = true) =
      if n.comment.len > 0:
        add "\n"
        for idx, line in enumerate(
          split(n.comment.strip(leading = false), '\n')
        ):
          if idx > 0: add "\n"
          add pref & "  # " & line + fgCyan

      elif sep:
        add " "

    proc addFlags() =
      if not isNil(n.typ):
        add " "
        add n.typ.lispRepr(colored)

      if n.flags.len > 0:
        add " nflags:" & to8Bit($n.flags, 2, 0, 3)


    if lineInfo:
      add "@"
      add $n.info.fileIndex.int + fgBlue
      add "/"
      add $n.info.line + fgCyan
      add ":"
      add $n.info.col + fgCyan
      add " "

    case n.kind:
      of nkStrKinds:
        add " "
        add "\"" & n.getStrVal() + fgYellow & "\""
        addFlags()
        addComment()

      of nkIntKinds:
        add " "
        add $n.intVal + fgBlue
        addFlags()
        addComment()

      of nkFloatKinds:
        add " "
        add $n.floatVal + fgMagenta
        addFlags()
        addComment()

      of nkIdent:
        add " "
        add n.getStrVal() + fgGreen
        addFlags()
        addComment()

      of nkSym:
        add " "
        add n.getStrVal() + fgCyan
        add " sk:"
        add ($n.sym.kind)[2 ..^ 1] + fgCyan

        if n.sym.flags.len > 0:
          add " flags:" & to8Bit($n.sym.flags, 2, 0, 3)

        if n.sym.magic != mNone:
          add " magic:" & to8Bit($n.sym.magic, 2, 0, 5)

        addFlags()
        addComment()

      of nkCommentStmt:
        addFlags()
        addComment()

      else:
        discard


    if n.kind notin nkTokenKinds:
      addFlags()
      if n.len > 0:
        add "\n"

      addComment(false)

      for newIdx, subn in n:
        aux(subn, level + 1, idx & newIdx)
        if level + 1 > maxDepth:
          break

        if newIdx > maxLen:
          break

        if newIdx < n.len - 1:
          add "\n"

  aux(pnode, 0, @[])

example API usage - I've decided to not make add varargs, because template does not work very good with these. Maybe add should've been a proc though

@haxscramper haxscramper force-pushed the colortext branch 2 times, most recently from 1cccd6f to 1f8b971 Compare January 18, 2022 13:21
@krux02
Copy link
Contributor

krux02 commented Jan 18, 2022

Just as an inspiration, nos as, this is how it should be done.

In emacs a styled string is a tuple of a string, and a data structure that contains all styling for that string. So the split is at a higher level, rather than at a lower level. This that also allows an easy style removal while also being effective when there is little style to be applied. The cost isn't spent per character.

But this is also only how emacs does it for string literals, styled text within buffers is probably stored somewhat differently, otherwise insertion would break all these styling properties all the time.

There is certainly something to read about how text styling and alignment can be done.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Jan 18, 2022

Yes, if this actually happens to be a bottleneck (which I really doubt), we can invest more time optimizing this. Pretty sure there are a lot of other places where we can speed up the compiler, and I'd rather prefer to make accent of making error message better rather than failing faster. I don't especially care how quickly a compiler can tell me my code is wrong, I care how much time do I need to spend fixing it up (and understanding what was wrong) - and by extensions, it should be easy for compiler authors to iterate on the messages.

@krux02
Copy link
Contributor

krux02 commented Jan 18, 2022

Yes, if this actually happens to be a bottleneck (which I really doubt), we can invest more time optimizing this. Pretty sure there are a lot of other places where we can speed up the compiler, and I'd rather prefer to make accent of making error message better rather than failing faster. I don't especially care how quickly a compiler can tell me my code is wrong, I care how much time do I need to spend fixing it up (and understanding what was wrong) - and by extensions, it should be easy for compiler authors to iterate on the messages.

I've actually measured the compiler performance in the past and inefficient data structures for text generation (mostly tiny ropes for backend code) is actually affecting the performance measurably. I know this is for error messages and not for code generation, but it just happend to be two days ago that I actually went in the compiler and just print something on all the hidden reports that the compiler generates but doesn't print on screen (for whatever reason). I wanted to find out why tunittests doesn't print a deprecation warning for the usage of macros.callsite, which is used in unittests. And damn it was a lot of messages, I did not expect it. I didn't save them though because I focused on that deprecation warning alone, so sorry I can't give you an exact number. I can't do it right now, but maybe later I can come back with an exact number.

Anyway, I think I was annoying enough for now.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Jan 18, 2022

print something on all the hidden reports that the compiler generates but doesn't print on screen (for whatever reason)

Yes, with the old approach to error generation it would've been a more noticeable performance hit, but with structured reports it becomes completely irrelevant, as we only pay for messages that actually are going to be displayed and nothing else.

The old system eagerly formatted all the reports and had randomly placed 'optimization' hacks for compiles(), tryConstExpr(), presentFailedCandidates() to partially mitigate this.

@haxscramper haxscramper force-pushed the colortext branch 4 times, most recently from 746b4de to 78b20ca Compare January 19, 2022 12:33
@haxscramper
Copy link
Collaborator Author

haxscramper commented Jan 19, 2022

Checklist done, ready to be reviewed and merged.

@haxscramper haxscramper force-pushed the colortext branch 4 times, most recently from 8e0352c to 4dd9e3e Compare January 19, 2022 16:17
Copy link
Collaborator

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick skim of the docs. Will go through it again tomorrow, as in struggling to hold things in my head right now (tired).

lib/experimental/colortext.nim Outdated Show resolved Hide resolved
lib/experimental/colortext.nim Outdated Show resolved Hide resolved
lib/experimental/colortext.nim Outdated Show resolved Hide resolved
lib/experimental/colortext.nim Show resolved Hide resolved
lib/experimental/colortext.nim Outdated Show resolved Hide resolved
lib/experimental/colortext.nim Outdated Show resolved Hide resolved
lib/experimental/colortext.nim Show resolved Hide resolved
lib/experimental/colortext.nim Outdated Show resolved Hide resolved
lib/experimental/colortext.nim Outdated Show resolved Hide resolved
lib/experimental/colortext.nim Outdated Show resolved Hide resolved
Add colored text handling library for use in message reporting, treeRepr
pretty printing, `compiler/debugutils.render()`, testament diff,
spellsuggest corrections and more.

Basic design overview: `seq[Rune + Styling]` is a "colored text". Rune is
used to allow proper formatting of the Unicode text. Helper procs to merge
string primitives and formatting styles (`"test" + fgRed`) and do basic
formatting operations (align left/right/center, indent)
@haxscramper
Copy link
Collaborator Author

bors r+

@bors
Copy link
Contributor

bors bot commented Jan 21, 2022

Build succeeded:

@bors bors bot merged commit e6119b4 into nim-works:devel Jan 21, 2022
@haxscramper haxscramper added the stdlib Standard library label Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Standard library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants