Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured reports #94

Merged
merged 1 commit into from
Jan 17, 2022
Merged

Structured reports #94

merged 1 commit into from
Jan 17, 2022

Conversation

haxscramper
Copy link
Collaborator

@haxscramper haxscramper commented Dec 4, 2021

Full rework of the compiler message handling pipeline. Remove old-style message generation that was based purely on strings that were formatted in-place, and instead implement structured logging where main compiler code only instantiates objects with required information.

This would allow to

  • decouple error presentation logic from the semantic checking and other parts of the compiler. Added bonus: user-facing error messages can be improved much faster, since changes don't have to be propagated through the test suite each time.
  • allow compilation reports to be printed out as a stream of S-expression or JSON lines
  • Added bonus: user-facing error messages can be improved much faster, since changes don't have to be propagated through the test suite each time.

In addition to architectural improvements this PR also reorganizes and documents various error kinds. Whole thing has to be rewritten from the ground up, so RFCs like
nim-lang/RFCs#323 nim-lang/RFCs#324 nim-lang/RFCs#325 will be implemented as a collateral

Decoupling of the data and presentation also allows to properly test compilation errors, warnings and hints generated by the compiler and finally untie compiler unit tests from the hardcoded string formatting in the semantic checking phase.

This PR is an orthogonal to the nkError project, and is only concerned with a content of the error report, while nkError is all about presence of errors during compilation.



type
ReportLineInfo* = object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because of reasons we're going to bias towards report points, line and column pair, but do we want to support the option to have a report span, so we have a start and end line and column pairs?

This would be in a variant because sometimes points still make sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it into a combination of a separate types for point and range that are combined in a single variant - this way cases where explicit range or point is necessary can still be handled correctly (without creating two different alternatives).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only report event I can recall right now, one that would actually require spans, is a "but expression <XXX> has a type" (ideally the report should contain both expanded an unexpanded form, so people could make sense of this mess - mapIt().filterIt() generates absolutely unreadable garbage mountain when used)

## This module provides type definitions for all structured report entries
## that compiler can provide.
##
## Note that this module specifically does not import anything else from
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two ways this can be handled - either by getting all the necessary information when we need it and populating various string fields (for type names, arguments and so on), effectively replicating a section of the compiler data types, or by importing ast.nim and adding it to the reports. First approach would make it easier to write out full report (in json), and generate pretty-printed version of it (no need to pull in ModuleGraph and ConfigRef to be able to print things out), but it will require some duplication.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to see if I can find a good example of how one report can be generated with these two approaches and then figure out which one is more promising.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main use case for the import-ast solution is reporting of the overload resolution errors - "but expression has type", known proc signatures.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msgs.writeContex() (and some other procedures) report information in form of "XXX of <name of the procedure/template/macro>, maybe with defined in <location>. Report might contain type like

SemReportEntry = object
  name*: string
  kind*: enum = sreProc, sreMacro, sreField, sreXXX ...
  definedIn*: ReportLineInfo
  sigDescription*: <object that encodes whole description of the used entry signature>
                   <for example for procs it would be an arg/return pragmas and so on>

If ast is used we would need to reconstruct the whole "sig description" report using procs that traverse the AST and get necessary information for pretty printing, and this kind of thing would be very annoying to serialize properly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to decouple these sections of the compiler at the cost of some repetition, since we would still have to do exactly the same thing for pretty-printing or serialization, so why not prepare the data fully beforehand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modularity and data oriented go hand in hand I find. Basically the nkError stuff bright back a whole bunch of old lessons for me confined with a bunch of other old lessons from seemingly unrelated experience. I'll speak generally first, then try and go more specific.


general

Generally speaking I think module decomposition should work like so:

  1. Write a module that covers a while topic or responsibility and call it good

Once the dependencies start arising out they're there from the get go, because existing code the fuller decomposition of modules is more like this:

  • data module: fundamental data for the topic, optimized for write or reporting of that's much more expensive
  • command or write module: contains the constructors, necessary or minimal read functionality for queries required for write decision making, traversal, etc; all the sort of create commands
  • effects or reporting or read module: all the heavy reads and consequent actions, uses the above module to traverse and decide where necessary; might produce secondary or additional data structures related to its bookkeeping

The various dependencies force a separation like this anyhow because otherwise you end up with recursive import issues.

So now back to this, the current AST isn't data oriented, but also not entirely the worst, and this has a mild cascade effect that it as some chonky boys like ConfigRef and friends to come along for the ride. The abstractions to work around this aren't great either.


specific

I'm always more partial to fundamental fixes, which I think would begin with wrapping all node construction with a config aware constructor. That would let us use the config as a context which will enable making the whole AST data oriented. Locating all the construction sites might not be too bad, if we can leverage error or deprecation pragma or some other trickery.

If the above feels like a mountain then I would head the data duplication route, but make this data oriented so we can more easily converge later, somehow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't exactly go into fundamental fixes there, because it would create this crooked intermediate state where I spend just as much time making work continiously instead of burning this whole contraption to the ground. Example - "template/generic instantiation of" is built on top of the construction that essentially appears to be a stack of pairs that are push-pop edited via msgs.pushInfoContext and msgs.popInfoContext. This stack contains strings, or "detail" information, which is generated immediately in various places. So I need to get rid of this string as well, otherwise I'm unable to even construct SemContext object meaningfully. I could've just made it into a string and passed it along, but this feels like an absolute hack that I would have to remove anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but make this data oriented so we can more easily converge later, somehow.

Can you elaborate on this one specifically - does my approach seem like a move in a right direction for that, or I need to change something? Because right now I just want to go in the "Create an object that thoroughly describes the report" route. Later this object is simply converter to a string (for pretty-printing) or json (for structured reporting)

Copy link
Collaborator

@saem saem Dec 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe (as in - syms are not invalidated at some point?) to push syms into configRef

Yes, I believe this is safe, the only time things get a bit weird is that we could be in metaprogramming land, like a compilation inside a compilation, and so the inner stuff might be throw away. If we hang onto a ref of the symbols from the inner speculative compilation then we will have potential space leak (that the right term?).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this one specifically - does my approach seem like a move in a right direction for that, or I need to change something? Because right now I just want to go in the "Create an object that thoroughly describes the report" route. Later this object is simply converter to a string (for pretty-printing) or json (for structured reporting)

From a strict performance perspective it isn't. I don't care about that and it totally misses the primary benefit of data oriented design.

I'm looking at the data and seeing about normalizing it, mostly. You've got no refs which is a good sign. The only thing that stands out is the inheritance, which isn't always a non-starter, and I'd consider the look aside arrays/tables approach, but I'm not a purist. I see it potentially needing to change but I'll admit this might be easier to read.

For me I'd say yes for now, but watch out if you feel the urge to add refs in the middle of the data.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 5, 2021

I'm not exactly sure if separation of report kinds by topics is the right approach - it makes creation and processing more meaningful, but that has to deal with setting/unsetting various hints will be somewhat more ugly. Previously, absolutely all compilation messages were stored in the TMsgKinds, including things that should probably be a configuration options (like hintSource). errGenerated is a complete disaster - it has to be split in ~80 separate enum values that actually make report contextually valuable. With current approach, each compiler subsystem has its own XXXReport type - SemReport, LexerReport, BackendReport and so on., which makes it easier to add more specific fields, especially compared to a single object with single enum type that ranges over all pipeline elements at once. But if I want to set the hint from the string (via --hint[] flag) I don't really have a distinct mapping - "HintLineTooLong" belongs to a LexerReportKind, but "HintPattern" is a sem layer (hlo.nim pass when TRM macros are expanded).


UPD: this problem was solved by using a single enum and defining object variants for each category. Then adding helper asserts for the valid ranges of each report category. Ideally this could've been fixed by using range[CatMin .. CatHigh] as a discriminant, but this is currently not allowed because discriminant must have a low() equal to zero.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 5, 2021

As per our earlier discussion, I think that nkError node should store a Report object, although not itself, but rather index into the table of reports, since doing otherwise would either needlessly increase the size of the PNode, or use ref Report which is not in the "middle" of the data, is on the very top. It still seems like .sons field must be accessible in the nkError, so maybe something like makes sense?

    else:
      report*: ref Report ## Report associated with a node (can (in most cases is) be nil)
      sons*: TNodeSeq

or

    else:
      report*: ReportId ## Id of the report, the real list is in the `.msgContext` field in
                        ## the config.
      sons*: TNodeSeq

compiler/reports.nim Outdated Show resolved Hide resolved
@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 7, 2021

image

Brought to you by

const
  errInvalidCmdLineOption = "invalid command line option: '$1'"
  errOnOrOffExpectedButXFound = "'on' or 'off' expected, but '$1' found"
  errOnOffOrListExpectedButXFound = "'on', 'off' or 'list' expected, but '$1' found"
  errOffHintsError = "'off', 'hint' or 'error' expected, but '$1' found"

const
  errNoneBoehmRefcExpectedButXFound = "'arc', 'orc', 'markAndSweep', 'boehm', 'go', 'none', 'regions', or 'refc' expected, but '$1' found"
  errNoneSpeedOrSizeExpectedButXFound = "'none', 'speed' or 'size' expected, but '$1' found"
  errGuiConsoleOrLibExpectedButXFound = "'gui', 'console' or 'lib' expected, but '$1' found"
  errInvalidExceptionSystem = "'goto', 'setjump', 'cpp' or 'quirky' expected, but '$1' found"

@haxscramper
Copy link
Collaborator Author

This would also allow ensuring that the compiler is exhaustively tested for all possible user-facing reports, previously it would've been a test to one's sanity (and continuous maintenance burden that would make changing error messages much harder)

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 10, 2021

And ensure everything is documented, what tf is "union type may not have an object header" is even responsible for? It was added here, 21fc8b4 of course ctrlf for this commit includes one occurence, so my assumption is that this supposedly user-facing (emitted via localError, not internalError) might potentially bubble up for users to see (although right now I'm not even sure what "object header" is exactly - I know object, I know {.header: - is that object with header? Or some 'header' part of an object type?).


UPD: yes, "object header" is not a {.header, it is an implicit chunk of data that is added from the inheritance (maybe in some other contexts as well)

type
  A {.union.} = object of RootObj
    field: int
    field2: float

Crude search for tests against this error rg 'object\sheader' shows it is not tested (with current test architecture, I would've found errormsg: "union type may not have an object header" or something similar.

lib/system/alloc.nim
968:    sysAssert(cast[ptr FreeCell](x).zeroField == 1, "dealloc: object header corrupted")

lib/system/arc.nim
13:at offset 0 then. The ``ref`` object header is independent from the
51:    rc: int # the object header is now a single RC field.

compiler/sizealignoffsetimpl.nim
379:          localError(conf, info, "union type may not have an object header")

tests/misc/tsizeof.nim
326:      # only there the object header is smaller than the first member

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 10, 2021

When I stated this PR, I thought of it as a detour that I might take before returning to the https://github.com/nim-works/nimskull/projects/2 PR, but now I understand it is a hard requirement to get any meaningful progress on the part of the specification that are concerned with errors, warnings and hints.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 10, 2021

I'm not sure if I have the capacity to clean up the reporting and think about global implications of various errors, so this part need to be revised separately as well - I'm adding tons of enums for different errors and diagnostics, but I'm pretty sure at least some of them can be generalized after careful inspection. Current implementation should still be considered a proof-of-concept wrt. to general design of the reporting system.

I will return to this when I get to the actual formatting part, which hopefully will happen soon enough. There are ~160 files in the compiler/ folder, I've progressed through ~30 of them, so I'm around 20% through with basic changes on this PR (file-wise, not SLOC-wise, that part will be much smaller - looking at the current trend of code edits, I would say that most of the remaining stuff is dumb edits like

-            localError(g.config, s.info,
-              "{.intdefine.} const was set to an invalid integer: '" &
-                g.config.symbols[s.name.s] & "'")
+            g.config.localError SemReport(
+              kind: rsemInvalidIntdefine, expressionStr: g.config.symbols[s.name.s])

@saem
Copy link
Collaborator

saem commented Dec 20, 2021

Caught up on this PR after a while, it's definitely headed in the right direction. The approach of focusing on the necessary mechanical cleanup of the callsites is looking quite promising.

@juancarlospaco
Copy link
Contributor

const
  errInvalidCmdLineOption = "invalid command line option: '$1'"
  errOnOrOffExpectedButXFound = "'on' or 'off' expected, but '$1' found"
  errOnOffOrListExpectedButXFound = "'on', 'off' or 'list' expected, but '$1' found"
  errOffHintsError = "'off', 'hint' or 'error' expected, but '$1' found"

I think if you gonna do it this way, for single formatted arguments,
work with the wording in a way that allows you to skip the formatting string ops at all, kinda like:

const someErr = "Invalid Foo in your Bar: "

echo someErr & wrongArgument

@haxscramper
Copy link
Collaborator Author

@juancarlospaco more like this.

proc report(rep: ExternallReport) = 
  case rep.kind:
    of rextUnexpectedValue:
      echo(
        "Unexpected value for ",
        $rep.cmdlineSwitch,
        " expected one of ",
        $rep.cmdlineAllowed,
        ", but found ",
        $rep.cmdlineProvided
      )

    of ...:
     ...

or this

proc report(rep: ExternallReport) = 
  case rep.kind:
    of rextUnexpectedValue:
      echo "Unexpected value for $1. Expected one of $2, but found $3 (did you mean $4)" % [
        $rep.cmdlineSwitch,
        $rep.cmdlineAllowed,
        $rep.cmdlineProvided,
        suggestCorrections(rep.cmdlineProvided, rep.cmdlineAllowed)
      ]

    of ...:
     ...

I see absolutely no value in moving things into const strings for templating, if the report is done a single function anyway. The current compiler probably used it because the same message could be generated in dozens of places, and the only option was to copy-paste the message (which was used pretty often), or move into const strings.

@haxscramper
Copy link
Collaborator Author

Structured reports would also allow us to actually implement a real typo suggestion - not just for missing identifiers, but also for fields, named arguments, obj.call() and so on. Command-line options.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 25, 2021

For many errors, I have absolutely no idea whether it is a user-facing error, or it is a "internal assert of some kind" that just somehow ended up using localError or globalError

    if result.kind notin {kind, skTemp}:
      localError(c.config, n.info, "cannot use symbol of kind '$1' as a '$2'" %
        [result.kind.toHumanStr, kind.toHumanStr])
    localError(c.config, n.info,
      "VM: immediate value does not fit into an int8")
  if d >= 0:
    globalError(c.config, n.info, "VM problem: dest register is set")

and so on

@haxscramper
Copy link
Collaborator Author

Current count: 328 different kinds of diagnostics

@haxscramper
Copy link
Collaborator Author

Current count: 373 different kinds of diagnostics

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 27, 2021

This PR can be applied in several steps - I did a nuclear refactor of the msgs.nim, which was mandatory to weed out all the string-based report handling, but in the end changes can be split into several steps:

  1. Reorganize compiler files and add report type definitions - test suite fully passes as it is simply file shuffling
  2. Replace string-based errors with strings wrapped in the Report nodes - test suite fully passes, as we just wrapped string into a special type.
  3. Replace all localError calls etc. with Reports (major part of this PR, and most time-consuming one). Test suite is completely broken due to changed error reporting strategies
  4. Automatically patch found error messages in the tests and examine for changes
    1. Tests must be automatically rewritable - thankfully that part of the testament is relatively separated. I need to replace discard """errmsg: ''""" with a new one
    2. New compiler output is structured, but for the time being it can be compared as a literal string. Implementing line-by-line diffing (for better test failure reporting) is not a hard prerequisite
    3. Manually examine all changed tests - previous compiler output did not have any structure, so the best thing we can do is to rely on the human checks

Change is fully applied, and nim compiler is transitioned to the structured output

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 27, 2021

I think steps 3 and 4 can be applied iteratively, although I don't see any real value in this, especially considering that the test suite itself is a mix of everything at once in every subdirectory.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 27, 2021

There must be some kind of error organization as well. Right now, compiler reports (generally speaking) things that are:

  • Expected (rsemExpectedObjectForOf, rsemExpectedTypeOrValue)
  • Disallowed (rsemDisallowedNilDeref)
  • Unexpected (rsemUnexpectedYield)
  • Invalid (rsemInvalidTupleSubscript), Illegal (rsemIllegalNimvmContext, rsemIllegalConversion`)
  • Cannot (rsemCannotBeConvertedTo)
  • Requires (rsemRequiresDeepCopyEnabled, rsemEnableNotNilExperimental) - ideally dozens of separate (and often broken) language dialects need to be phased out, but for a transition period warnings/errors are necessary.
  • General errors like rsemExprHasNoAddress, rsemIndexOutOfBounds

At least naming-wise those need to be stabilized somehow

Fixup also created several hints and diagnostics that were never configurable, and a couple existing ones have really weird names like rsemUnusedRaises = "Effect". For now, I used old string mappings, but in future those should be reconsidered as well

@haxscramper
Copy link
Collaborator Author

Current count: 493 different kinds of diagnostics

@haxscramper
Copy link
Collaborator Author

Current count: 524 different kinds of diagnostics

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 28, 2021

All report refactoring is in, total count of the compiler diagnostics is:

557

Implementation is not finished (I can't run the compiled compiler since I haven't implemented the reporting hook itself yet), but I think I will manage to implement proof-of-concept form today as well.

@haxscramper
Copy link
Collaborator Author

haxscramper commented Dec 28, 2021

Quick & dirty test for different modes of output (I omitted printing of the symbols/types/nodes as they require manual writing - PSym by itself contains ~15 different state fields, none of which are necessary for the end user)

S-expression (~75 lines to implement)

(Sem-XDeclaredButNotUsed :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind XDeclaredButNotUsed :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/lib/system/mmdisp.nim" :line 17 :col 2)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/lookups.nim" :line 360 :col 12))
(Sem-XDeclaredButNotUsed :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind XDeclaredButNotUsed :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/lib/system/mmdisp.nim" :line 38 :col 2)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/lookups.nim" :line 360 :col 12))
(Sem-XDeclaredButNotUsed :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind XDeclaredButNotUsed :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/lib/system/mmdisp.nim" :line 39 :col 2)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/lookups.nim" :line 360 :col 12))
(Sem-XDeclaredButNotUsed :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind XDeclaredButNotUsed :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/lib/system/osalloc.nim" :line 27 :col 6)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/lookups.nim" :line 360 :col 12))
(Sem-XDeclaredButNotUsed :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind XDeclaredButNotUsed :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/lib/system/repr.nim" :line 145 :col 7)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/lookups.nim" :line 360 :col 12))
(Sem-XDeclaredButNotUsed :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind XDeclaredButNotUsed :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/lib/system/seqs_v2_reimpl.nim" :line 19 :col 9)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/lookups.nim" :line 360 :col 12))
(Sem-XDeclaredButNotUsed :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind XDeclaredButNotUsed :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/lib/system/sysstr.nim" :line 118 :col 5)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/lookups.nim" :line 360 :col 12))
(Sem-Processing :expression nil :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind Processing :processing (:isNimscript nil :importStackLen 1 :fromModule "" :isToplevel t :moduleStatus "import" :path "/mnt/workspace/github/nimskull/file.nim") :context nil :location nil :reportInst (:file "/mnt/workspace/github/nimskull/compiler/modulegraphs.nim" :line 607 :col 8))
(Sem-ProcessingStmt :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind ProcessingStmt :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/file.nim" :line 2 :col 2)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/passaux.nim" :line 29 :col 13))
(Sem-rsemUndeclaredField :expression (Ident field12) :expressionStr "" :rtype <type> :psym <sym> :msg "" :kind rsemUndeclaredField :context nil :location (:isRange nil :lpoint (:file "/mnt/workspace/github/nimskull/file.nim" :line 5 :col 6)) :reportInst (:file "/mnt/workspace/github/nimskull/compiler/semcall.nim" :line 379 :col 27))

json (for testing I reused serializer from hmisc, so it only took 4 lines to add)

{"category": "repSem", "semReport": {"kind": "XDeclaredButNotUsed", "expression": "[TODO write PNode]", "expressionStr": "", "rtype": "[TODO write PType]", "psym": "[TODO write PSym]", "msg": "", "context": [], "location": {"isRange": false, "lpoint": {"file": "/mnt/workspace/github/nimskull/lib/system/osalloc.nim", "line": 27, "col": 6}}, "reportInst": {"file": "/mnt/workspace/github/nimskull/compiler/lookups.nim", "line": 360, "col": 12}}}
{"category": "repSem", "semReport": {"kind": "XDeclaredButNotUsed", "expression": "[TODO write PNode]", "expressionStr": "", "rtype": "[TODO write PType]", "psym": "[TODO write PSym]", "msg": "", "context": [], "location": {"isRange": false, "lpoint": {"file": "/mnt/workspace/github/nimskull/lib/system/repr.nim", "line": 145, "col": 7}}, "reportInst": {"file": "/mnt/workspace/github/nimskull/compiler/lookups.nim", "line": 360, "col": 12}}}
{"category": "repSem", "semReport": {"kind": "XDeclaredButNotUsed", "expression": "[TODO write PNode]", "expressionStr": "", "rtype": "[TODO write PType]", "psym": "[TODO write PSym]", "msg": "", "context": [], "location": {"isRange": false, "lpoint": {"file": "/mnt/workspace/github/nimskull/lib/system/seqs_v2_reimpl.nim", "line": 19, "col": 9}}, "reportInst": {"file": "/mnt/workspace/github/nimskull/compiler/lookups.nim", "line": 360, "col": 12}}}
{"category": "repSem", "semReport": {"kind": "XDeclaredButNotUsed", "expression": "[TODO write PNode]", "expressionStr": "", "rtype": "[TODO write PType]", "psym": "[TODO write PSym]", "msg": "", "context": [], "location": {"isRange": false, "lpoint": {"file": "/mnt/workspace/github/nimskull/lib/system/sysstr.nim", "line": 118, "col": 5}}, "reportInst": {"file": "/mnt/workspace/github/nimskull/compiler/lookups.nim", "line": 360, "col": 12}}}
{"category": "repSem", "semReport": {"kind": "Processing", "expression": "[TODO write PNode]", "expressionStr": "", "rtype": "[TODO write PType]", "psym": "[TODO write PSym]", "msg": "", "processing": {"isNimscript": false, "importStackLen": 1, "fromModule": "", "isToplevel": true, "moduleStatus": "import", "path": "/mnt/workspace/github/nimskull/file.nim"}, "context": [], "location": null, "reportInst": {"file": "/mnt/workspace/github/nimskull/compiler/modulegraphs.nim", "line": 607, "col": 8}}}
{"category": "repCmd", "cmdReport": {"kind": "rcmdExecuting", "cmd": "gcc   -o /mnt/workspace/github/nimskull/file  /tmp/nimcache/stdlib_digitsutils.nim.c.o /tmp/nimcache/stdlib_assertions.nim.c.o /tmp/nimcache/stdlib_dollars.nim.c.o /tmp/nimcache/stdlib_system.nim.c.o /tmp/nimcache/@mfile.nim.c.o    -ldl", "msg": "", "code": 0, "context": [], "location": null, "reportInst": {"file": "/mnt/workspace/github/nimskull/compiler/extccomp.nim", "line": 405, "col": 6}}}
{"category": "repInternal", "internalReport": {"kind": "rintSuccessX", "msg": "", "buildParams": {"isCompilation": true, "project": "/mnt/workspace/github/nimskull/file.nim", "output": "/mnt/workspace/github/nimskull/file", "mem": 69410816, "isMaxMem": true, "sec": 1.179445743560791, "threads": false, "backend": "", "buildMode": "debug", "optimize": "debug", "gc": "refc"}, "context": [], "location": null, "reportInst": {"file": "/mnt/workspace/github/nimskull/compiler/msgs.nim", "line": 799, "col": 6}}}

@haxscramper
Copy link
Collaborator Author

Change is fully applied, and nim compiler is transitioned to the structured output

After spending some more time in the error generation, I think this approach is a little too idealistic. If we had a proper test suite that allowed us to safely do the compiler refactoring, this would've been a non-issue, but considering we test at most sem/lexer/parser, and not every possible report at that, I don't want to take any changes and go with "smart" plans. For now, I will reroll old error output as close as I possibly can - even with this, benefits of structured reporting would be accessible, since ALL pretty-printing will still happen in the single file and single callback that we can iterate on (for better error formatting), or swap out at moment's notice (json and S-expression reports would be readily available at all times).

@haxscramper haxscramper force-pushed the structured-reports branch 2 times, most recently from ac794e6 to d33eca4 Compare January 4, 2022 09:13
@haxscramper haxscramper marked this pull request as ready for review January 4, 2022 11:10
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 16, 2022
Full rework of the compiler message handling pipeline. Remove old-style message generation that was
based purely on strings that were formatted in-place, and instead implement structured logging where
main compiler code only instantiates objects with required information.

Explanation of changes for this commit will be split into several sections, matching number of edit
iterations I had to make in order for this to work properly.

* File reorganization

In addition to the architectural changes this PR requires some type definition movements.

- PNode, PSym and PType definitions (and all associated types and enums) were moved to the
  ast_types.nim file which is then reexported in the ast.nim
- Enum defininitions for the nilability checking were moved to nilcheck_enums.nim and then
  reexported
- Enum definitions for the VM were moved to to vm_enums.nim

* New files

- Main definition of the report types is provided in the reports.nim file, together with minor
  helper procedures and definition of the ReportList type. This file is imported by options, msgs
  and other parts of the compiler.
- Implementation of the default error reporting is provided in the cli_reporter.nim - all
  disguisting hardcoded string hacks were moved to this single file. Now you can really witness the
  "error messages are good enough for me" thinking that prevailed in the compiler UX since it's
  inception.

* Changed files

Most of the changes follow very simple pattern - old-style hardcoded hacks are replaced with
structural report that is constructed in-place and then converted to string /when necessary/. I'm
pretty sure this change already puts me close to getting CS PHD - it was *unimaginably hard* to
figure out that hardcoding text formatting in place is not the best architecture. Damn, I can
probably apply to the nobel prize right now after I figure that out.

** Changes in message reporting pipeline

Old error messages where reportined via random combinations of the following:

- Direct calls to `msgs.liMessage` proc - it was mostly used in the helper templates like
  `lintReport`
- `message`
- `rawMessage`
- `fatal`
- `globalError` - template for reporting global errors. Misused to the point where name completely
  lost it's meaning and documentation does not make any sense whatsoever. [fn:global-err]
- `localError` - template for reporting necessary error location, wrapper around `liMessage`
- `warningDeprecated` - used two times in the whole compiler in order to print out error message
  about deprecated command switches.

Full pipeline of the error processing was:

- Message was generated in-place, without any colored formatting (was added in `liMessage`)
  - There were several ways message could be generated - all of them were used interchangeably,
    without any clear system.
    1. Some file had a collection of constant strings, such as `errCannotInferStaticParam = "cannot
       infer the value of the static param '$1'"` that were used in the `localReport` message
       formatting immediately. Some constants were used pretty often, some were used only once.
    2. Warning and hint messages were categorized to some extent, so and the enum was used to
       provide message formatting via `array[TMsgKind, string]` where `string` was a `std/strutils`
       formatting string.
    3. In a lot of cases error message was generated using hardcoded (and repeatedly copy-pasted)
       string
- It was passed down to the `liMessage` (removed now) procedure, that dispatched based on the global
  configuration state (whether `ignoreMsgBecauseOfIdeTools` holds for example)
- Then message, together with necessary bits of formatting (such as `Hint` prefix with coloring) was
  passed down to the `styledMsgWriteln(args: varargs[typed])` template, whcih did the final dispatch
  into
- One of the two different /macros/ for writing text out - `callStyledWriteLineStderr` and
  `callIgnoringStyle`.
  - Dispatch could happen into stdout/stderr or message writer hook if it was non-nil
- When message was written out it went into `writeLnHook` callback (misused for
  `{.explain.}` [fn:writeln-explain]) (hacked for `compiles()` [fn:writeln-compiles]) and was
  written out to the stdout/stderr.

It is now replaced with:

- `Report` object is instantiated at some point of a compilation process (it might be an immediate
  report via `localError` or delayed via `nkError` and written out later)
- `structuredReportHook` converts it to string internally. All decitions for formatting, coloring
  etc. happen inside of the structured report hook. Where to write message, which format and so on.
- `writeLnHook` /can/ be used by the `structuredReportHook` if needed - right now it is turned into
  simple convenience callback. In future this could even be replaced with simple helper proc, for
  now I left it as it is now because I'm not 100% sure that I can safely remove this.

** Changes in the warning and hint filtering

Report filtering is done in the particular *implementation* of the error hook -
`options.isEnabled()` provides a default predicate to check whether specific report can be written
out, but it must still be called manually. This allows to still see all the reports if needed. For
example, `cli_reporter.reportHook()` uses additional checks to force write some reports (such as
debug trace in `compiles()` context).

Most of the hint and warning were already categorized, altough some reports had to be split into
several (such as `--hint=Performance` that actually controlled reports for `CopiesToSink` and
`CannotMakeSink`, `--hint=Name` that controlled three reports).

None of the errors were categorized (reports from the `nkError` progress was incorporated, but in
I'm mostly describing changes wrt. to old reporting system) and all were generated in-place

** Minor related changes

- List of allowed reports is now stored in the `noteSets: array[ConfNoteSet, ReportKinds]` field of
  the `ConfigRef`, instead of *seven* separate feilds. Access is now done via getter/setter procs,
  which allows to track changes in the current configuration state.
- Renamed list of local options to `localOptions` - added accessors to track changes in the state.
- Move all default report filter configuration to `lineinfos.computeNotesVerbosity()` - previously
  it was scattered in two different places: `NotesVerbosity` for `--verbosity:N` was calculated in
  `lineinfos`, foreignPackageNotesDefault was constructed in `options`
- Changed formatting of the `internalAssert` and `internalError` messages
- Removed lots of string formatting procs from the various `sem*` modules - ideally they should
  *all* be moved to the `cli_reporter` and related.

-------

Additional notes

[fn:global-err] As I said earlier, `globalError` was misused - it is still not possible to fully get
rid of this sickening hack, since it is used for /control flow/ (you read this correct - it is an
error reporting template, and it is used for control flow. Wonderful design right?).

So, in short - `globalError` can raise `ERecoverableError` that can be caught in some other layer
(for example `sem.tryConstExpr` abuses that in order to bail out (pretty sure it was done as an
'optimization' of some sort, just like 'compiles') when expression fails to evaluate at
compile-time [fn:except])

[fn:except] nim-works#94 (comment)

[fn:writeln-explain] Of course you can't have a proper error reporting in the nim compiler, so this
hook was also misused to the point of complete nonsense. Most notable clusterfuck where you could
spot this little shit is implementation of `{.explain.}` pragma for concepts. It was implemented via
really 'smart' (aka welcome to hell) solution in

nim-works@74a8098 (sigmatch
~704) and then further "improved" in
nim-lang/Nim@fe48dd1 by slicing out parts
of the error message with `let msg = s.replace("Error:", errorPrefix)`

For now I had to reuse similar hack - I *do* override error reporting hook in order to collect all
the missing report information. In the future it would be unnecessary - when concept is matched it's
body will be evaluated to `nkError` with reports that are written out later.

[fn:writeln-compiles] when `compiles()` check is executed, all error output is temporarily disabled
(both stderr and stdout) via setting allowed output flags to `{}` (by default it is set to
Copy link
Contributor

@alaviss alaviss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I got out from a quick skim

.gitignore Outdated Show resolved Hide resolved
compiler/ast.nim Show resolved Hide resolved
compiler/ast_types.nim Show resolved Hide resolved
compiler/cli_reporter.nim Show resolved Hide resolved
compiler/cli_reporter.nim Show resolved Hide resolved
compiler/nim.cfg Outdated Show resolved Hide resolved
compiler/options.nim Show resolved Hide resolved
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 16, 2022
Full rework of the compiler message handling pipeline. Remove old-style message generation that was
based purely on strings that were formatted in-place, and instead implement structured logging where
main compiler code only instantiates objects with required information.

Explanation of changes for this commit will be split into several sections, matching number of edit
iterations I had to make in order for this to work properly.

* File reorganization

In addition to the architectural changes this PR requires some type definition movements.

- PNode, PSym and PType definitions (and all associated types and enums) were moved to the
  ast_types.nim file which is then reexported in the ast.nim
- Enum defininitions for the nilability checking were moved to nilcheck_enums.nim and then
  reexported
- Enum definitions for the VM were moved to to vm_enums.nim

* New files

- Main definition of the report types is provided in the reports.nim file, together with minor
  helper procedures and definition of the ReportList type. This file is imported by options, msgs
  and other parts of the compiler.
- Implementation of the default error reporting is provided in the cli_reporter.nim - all
  disguisting hardcoded string hacks were moved to this single file. Now you can really witness the
  "error messages are good enough for me" thinking that prevailed in the compiler UX since it's
  inception.

* Changed files

Most of the changes follow very simple pattern - old-style hardcoded hacks are replaced with
structural report that is constructed in-place and then converted to string /when necessary/. I'm
pretty sure this change already puts me close to getting CS PHD - it was *unimaginably hard* to
figure out that hardcoding text formatting in place is not the best architecture. Damn, I can
probably apply to the nobel prize right now after I figure that out.

** Changes in message reporting pipeline

Old error messages where reportined via random combinations of the following:

- Direct calls to `msgs.liMessage` proc - it was mostly used in the helper templates like
  `lintReport`
- `message`
- `rawMessage`
- `fatal`
- `globalError` - template for reporting global errors. Misused to the point where name completely
  lost it's meaning and documentation does not make any sense whatsoever. [fn:global-err]
- `localError` - template for reporting necessary error location, wrapper around `liMessage`
- `warningDeprecated` - used two times in the whole compiler in order to print out error message
  about deprecated command switches.

Full pipeline of the error processing was:

- Message was generated in-place, without any colored formatting (was added in `liMessage`)
  - There were several ways message could be generated - all of them were used interchangeably,
    without any clear system.
    1. Some file had a collection of constant strings, such as `errCannotInferStaticParam = "cannot
       infer the value of the static param '$1'"` that were used in the `localReport` message
       formatting immediately. Some constants were used pretty often, some were used only once.
    2. Warning and hint messages were categorized to some extent, so and the enum was used to
       provide message formatting via `array[TMsgKind, string]` where `string` was a `std/strutils`
       formatting string.
    3. In a lot of cases error message was generated using hardcoded (and repeatedly copy-pasted)
       string
- It was passed down to the `liMessage` (removed now) procedure, that dispatched based on the global
  configuration state (whether `ignoreMsgBecauseOfIdeTools` holds for example)
- Then message, together with necessary bits of formatting (such as `Hint` prefix with coloring) was
  passed down to the `styledMsgWriteln(args: varargs[typed])` template, whcih did the final dispatch
  into
- One of the two different /macros/ for writing text out - `callStyledWriteLineStderr` and
  `callIgnoringStyle`.
  - Dispatch could happen into stdout/stderr or message writer hook if it was non-nil
- When message was written out it went into `writeLnHook` callback (misused for
  `{.explain.}` [fn:writeln-explain]) (hacked for `compiles()` [fn:writeln-compiles]) and was
  written out to the stdout/stderr.

It is now replaced with:

- `Report` object is instantiated at some point of a compilation process (it might be an immediate
  report via `localError` or delayed via `nkError` and written out later)
- `structuredReportHook` converts it to string internally. All decitions for formatting, coloring
  etc. happen inside of the structured report hook. Where to write message, which format and so on.
- `writeLnHook` /can/ be used by the `structuredReportHook` if needed - right now it is turned into
  simple convenience callback. In future this could even be replaced with simple helper proc, for
  now I left it as it is now because I'm not 100% sure that I can safely remove this.

** Changes in the warning and hint filtering

Report filtering is done in the particular *implementation* of the error hook -
`options.isEnabled()` provides a default predicate to check whether specific report can be written
out, but it must still be called manually. This allows to still see all the reports if needed. For
example, `cli_reporter.reportHook()` uses additional checks to force write some reports (such as
debug trace in `compiles()` context).

Most of the hint and warning were already categorized, altough some reports had to be split into
several (such as `--hint=Performance` that actually controlled reports for `CopiesToSink` and
`CannotMakeSink`, `--hint=Name` that controlled three reports).

None of the errors were categorized (reports from the `nkError` progress was incorporated, but in
I'm mostly describing changes wrt. to old reporting system) and all were generated in-place

** Minor related changes

- List of allowed reports is now stored in the `noteSets: array[ConfNoteSet, ReportKinds]` field of
  the `ConfigRef`, instead of *seven* separate feilds. Access is now done via getter/setter procs,
  which allows to track changes in the current configuration state.
- Renamed list of local options to `localOptions` - added accessors to track changes in the state.
- Move all default report filter configuration to `lineinfos.computeNotesVerbosity()` - previously
  it was scattered in two different places: `NotesVerbosity` for `--verbosity:N` was calculated in
  `lineinfos`, foreignPackageNotesDefault was constructed in `options`
- Changed formatting of the `internalAssert` and `internalError` messages
- Removed lots of string formatting procs from the various `sem*` modules - ideally they should
  *all* be moved to the `cli_reporter` and related.

-------

Additional notes

[fn:global-err] As I said earlier, `globalError` was misused - it is still not possible to fully get
rid of this sickening hack, since it is used for /control flow/ (you read this correct - it is an
error reporting template, and it is used for control flow. Wonderful design right?).

So, in short - `globalError` can raise `ERecoverableError` that can be caught in some other layer
(for example `sem.tryConstExpr` abuses that in order to bail out (pretty sure it was done as an
'optimization' of some sort, just like 'compiles') when expression fails to evaluate at
compile-time [fn:except])

[fn:except] nim-works#94 (comment)

[fn:writeln-explain] Of course you can't have a proper error reporting in the nim compiler, so this
hook was also misused to the point of complete nonsense. Most notable clusterfuck where you could
spot this little shit is implementation of `{.explain.}` pragma for concepts. It was implemented via
really 'smart' (aka welcome to hell) solution in

nim-works@74a8098 (sigmatch
~704) and then further "improved" in
nim-lang/Nim@fe48dd1 by slicing out parts
of the error message with `let msg = s.replace("Error:", errorPrefix)`

For now I had to reuse similar hack - I *do* override error reporting hook in order to collect all
the missing report information. In the future it would be unnecessary - when concept is matched it's
body will be evaluated to `nkError` with reports that are written out later.

[fn:writeln-compiles] when `compiles()` check is executed, all error output is temporarily disabled
(both stderr and stdout) via setting allowed output flags to `{}` (by default it is set to
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 16, 2022
Full rework of the compiler message handling pipeline. Remove old-style message generation that was
based purely on strings that were formatted in-place, and instead implement structured logging where
main compiler code only instantiates objects with required information.

Explanation of changes for this commit will be split into several sections, matching number of edit
iterations I had to make in order for this to work properly.

* File reorganization

In addition to the architectural changes this PR requires some type definition movements.

- PNode, PSym and PType definitions (and all associated types and enums) were moved to the
  ast_types.nim file which is then reexported in the ast.nim
- Enum defininitions for the nilability checking were moved to nilcheck_enums.nim and then
  reexported
- Enum definitions for the VM were moved to to vm_enums.nim

* New files

- Main definition of the report types is provided in the reports.nim file, together with minor
  helper procedures and definition of the ReportList type. This file is imported by options, msgs
  and other parts of the compiler.
- Implementation of the default error reporting is provided in the cli_reporter.nim - all
  disguisting hardcoded string hacks were moved to this single file. Now you can really witness the
  "error messages are good enough for me" thinking that prevailed in the compiler UX since it's
  inception.

* Changed files

Most of the changes follow very simple pattern - old-style hardcoded hacks are replaced with
structural report that is constructed in-place and then converted to string /when necessary/. I'm
pretty sure this change already puts me close to getting CS PHD - it was *unimaginably hard* to
figure out that hardcoding text formatting in place is not the best architecture. Damn, I can
probably apply to the nobel prize right now after I figure that out.

** Changes in message reporting pipeline

Old error messages where reportined via random combinations of the following:

- Direct calls to `msgs.liMessage` proc - it was mostly used in the helper templates like
  `lintReport`
- `message`
- `rawMessage`
- `fatal`
- `globalError` - template for reporting global errors. Misused to the point where name completely
  lost it's meaning and documentation does not make any sense whatsoever. [fn:global-err]
- `localError` - template for reporting necessary error location, wrapper around `liMessage`
- `warningDeprecated` - used two times in the whole compiler in order to print out error message
  about deprecated command switches.

Full pipeline of the error processing was:

- Message was generated in-place, without any colored formatting (was added in `liMessage`)
  - There were several ways message could be generated - all of them were used interchangeably,
    without any clear system.
    1. Some file had a collection of constant strings, such as `errCannotInferStaticParam = "cannot
       infer the value of the static param '$1'"` that were used in the `localReport` message
       formatting immediately. Some constants were used pretty often, some were used only once.
    2. Warning and hint messages were categorized to some extent, so and the enum was used to
       provide message formatting via `array[TMsgKind, string]` where `string` was a `std/strutils`
       formatting string.
    3. In a lot of cases error message was generated using hardcoded (and repeatedly copy-pasted)
       string
- It was passed down to the `liMessage` (removed now) procedure, that dispatched based on the global
  configuration state (whether `ignoreMsgBecauseOfIdeTools` holds for example)
- Then message, together with necessary bits of formatting (such as `Hint` prefix with coloring) was
  passed down to the `styledMsgWriteln(args: varargs[typed])` template, whcih did the final dispatch
  into
- One of the two different /macros/ for writing text out - `callStyledWriteLineStderr` and
  `callIgnoringStyle`.
  - Dispatch could happen into stdout/stderr or message writer hook if it was non-nil
- When message was written out it went into `writeLnHook` callback (misused for
  `{.explain.}` [fn:writeln-explain]) (hacked for `compiles()` [fn:writeln-compiles]) and was
  written out to the stdout/stderr.

It is now replaced with:

- `Report` object is instantiated at some point of a compilation process (it might be an immediate
  report via `localError` or delayed via `nkError` and written out later)
- `structuredReportHook` converts it to string internally. All decitions for formatting, coloring
  etc. happen inside of the structured report hook. Where to write message, which format and so on.
- `writeLnHook` /can/ be used by the `structuredReportHook` if needed - right now it is turned into
  simple convenience callback. In future this could even be replaced with simple helper proc, for
  now I left it as it is now because I'm not 100% sure that I can safely remove this.

** Changes in the warning and hint filtering

Report filtering is done in the particular *implementation* of the error hook -
`options.isEnabled()` provides a default predicate to check whether specific report can be written
out, but it must still be called manually. This allows to still see all the reports if needed. For
example, `cli_reporter.reportHook()` uses additional checks to force write some reports (such as
debug trace in `compiles()` context).

Most of the hint and warning were already categorized, altough some reports had to be split into
several (such as `--hint=Performance` that actually controlled reports for `CopiesToSink` and
`CannotMakeSink`, `--hint=Name` that controlled three reports).

None of the errors were categorized (reports from the `nkError` progress was incorporated, but in
I'm mostly describing changes wrt. to old reporting system) and all were generated in-place

** Minor related changes

- List of allowed reports is now stored in the `noteSets: array[ConfNoteSet, ReportKinds]` field of
  the `ConfigRef`, instead of *seven* separate feilds. Access is now done via getter/setter procs,
  which allows to track changes in the current configuration state.
- Renamed list of local options to `localOptions` - added accessors to track changes in the state.
- Move all default report filter configuration to `lineinfos.computeNotesVerbosity()` - previously
  it was scattered in two different places: `NotesVerbosity` for `--verbosity:N` was calculated in
  `lineinfos`, foreignPackageNotesDefault was constructed in `options`
- Changed formatting of the `internalAssert` and `internalError` messages
- Removed lots of string formatting procs from the various `sem*` modules - ideally they should
  *all* be moved to the `cli_reporter` and related.

-------

Additional notes

[fn:global-err] As I said earlier, `globalError` was misused - it is still not possible to fully get
rid of this sickening hack, since it is used for /control flow/ (you read this correct - it is an
error reporting template, and it is used for control flow. Wonderful design right?).

So, in short - `globalError` can raise `ERecoverableError` that can be caught in some other layer
(for example `sem.tryConstExpr` abuses that in order to bail out (pretty sure it was done as an
'optimization' of some sort, just like 'compiles') when expression fails to evaluate at
compile-time [fn:except])

[fn:except] nim-works#94 (comment)

[fn:writeln-explain] Of course you can't have a proper error reporting in the nim compiler, so this
hook was also misused to the point of complete nonsense. Most notable clusterfuck where you could
spot this little shit is implementation of `{.explain.}` pragma for concepts. It was implemented via
really 'smart' (aka welcome to hell) solution in

nim-works@74a8098 (sigmatch
~704) and then further "improved" in
nim-lang/Nim@fe48dd1 by slicing out parts
of the error message with `let msg = s.replace("Error:", errorPrefix)`

For now I had to reuse similar hack - I *do* override error reporting hook in order to collect all
the missing report information. In the future it would be unnecessary - when concept is matched it's
body will be evaluated to `nkError` with reports that are written out later.

[fn:writeln-compiles] when `compiles()` check is executed, all error output is temporarily disabled
(both stderr and stdout) via setting allowed output flags to `{}` (by default it is set to
@haxscramper
Copy link
Collaborator Author

bors r+

bors bot added a commit that referenced this pull request Jan 16, 2022
94: Structured reports r=haxscramper a=haxscramper

Full rework of the compiler message handling pipeline. Remove old-style message generation that was based purely on strings that were formatted in-place, and instead implement structured logging where main compiler code only instantiates objects with required information.

This would allow to
- decouple error presentation logic from the semantic checking and other parts of the compiler. Added bonus: user-facing error messages can be improved much faster, since changes don't have to be propagated through the test suite each time.
- allow compilation reports to be printed out as a stream of S-expression or JSON lines
- Added bonus: user-facing error messages can be improved much faster, since changes don't have to be propagated through the test suite each time.

In addition to architectural improvements this PR also reorganizes and documents various error kinds. Whole thing has to be rewritten from the ground up, so RFCs like 
nim-lang/RFCs#323 nim-lang/RFCs#324 nim-lang/RFCs#325 will be implemented as a collateral

Decoupling of the data and presentation also allows to properly test compilation errors, warnings and hints generated by the compiler and finally untie compiler unit tests from the hardcoded string formatting in the semantic checking phase.

This PR is an orthogonal to the `nkError` project, and is only concerned with a *content* of the error report, while `nkError` is all about *presence* of errors during compilation. 

Co-authored-by: haxscramper <[email protected]>
@saem
Copy link
Collaborator

saem commented Jan 16, 2022

Bors r+

@bors
Copy link
Contributor

bors bot commented Jan 16, 2022

Already running a review

@haxscramper haxscramper mentioned this pull request Jan 16, 2022
52 tasks
@bors
Copy link
Contributor

bors bot commented Jan 16, 2022

Build failed:

# Path not registered in the filename table - most likely an
# instantiation info report location
if conf.filenameOption == foCanonical and
path.startsWith(compilerRoot):
Copy link
Contributor

@alaviss alaviss Jan 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need a bit more explaination (code comments) for this code path.

Also, the whole full source path query should be gated behind excessiveStackTrace as discussed in chat

Full rework of the compiler message handling pipeline. Remove old-style message generation that was
based purely on strings that were formatted in-place, and instead implement structured logging where
main compiler code only instantiates objects with required information.

Explanation of changes for this commit will be split into several sections, matching number of edit
iterations I had to make in order for this to work properly.

* File reorganization

In addition to the architectural changes this PR requires some type definition movements.

- PNode, PSym and PType definitions (and all associated types and enums) were moved to the
  ast_types.nim file which is then reexported in the ast.nim
- Enum defininitions for the nilability checking were moved to nilcheck_enums.nim and then
  reexported
- Enum definitions for the VM were moved to to vm_enums.nim

* New files

- Main definition of the report types is provided in the reports.nim file, together with minor
  helper procedures and definition of the ReportList type. This file is imported by options, msgs
  and other parts of the compiler.
- Implementation of the default error reporting is provided in the cli_reporter.nim - all
  disguisting hardcoded string hacks were moved to this single file. Now you can really witness the
  "error messages are good enough for me" thinking that prevailed in the compiler UX since it's
  inception.

* Changed files

Most of the changes follow very simple pattern - old-style hardcoded hacks are replaced with
structural report that is constructed in-place and then converted to string /when necessary/. I'm
pretty sure this change already puts me close to getting CS PHD - it was *unimaginably hard* to
figure out that hardcoding text formatting in place is not the best architecture. Damn, I can
probably apply to the nobel prize right now after I figure that out.

** Changes in message reporting pipeline

Old error messages where reportined via random combinations of the following:

- Direct calls to `msgs.liMessage` proc - it was mostly used in the helper templates like
  `lintReport`
- `message`
- `rawMessage`
- `fatal`
- `globalError` - template for reporting global errors. Misused to the point where name completely
  lost it's meaning and documentation does not make any sense whatsoever. [fn:global-err]
- `localError` - template for reporting necessary error location, wrapper around `liMessage`
- `warningDeprecated` - used two times in the whole compiler in order to print out error message
  about deprecated command switches.

Full pipeline of the error processing was:

- Message was generated in-place, without any colored formatting (was added in `liMessage`)
  - There were several ways message could be generated - all of them were used interchangeably,
    without any clear system.
    1. Some file had a collection of constant strings, such as `errCannotInferStaticParam = "cannot
       infer the value of the static param '$1'"` that were used in the `localReport` message
       formatting immediately. Some constants were used pretty often, some were used only once.
    2. Warning and hint messages were categorized to some extent, so and the enum was used to
       provide message formatting via `array[TMsgKind, string]` where `string` was a `std/strutils`
       formatting string.
    3. In a lot of cases error message was generated using hardcoded (and repeatedly copy-pasted)
       string
- It was passed down to the `liMessage` (removed now) procedure, that dispatched based on the global
  configuration state (whether `ignoreMsgBecauseOfIdeTools` holds for example)
- Then message, together with necessary bits of formatting (such as `Hint` prefix with coloring) was
  passed down to the `styledMsgWriteln(args: varargs[typed])` template, whcih did the final dispatch
  into
- One of the two different /macros/ for writing text out - `callStyledWriteLineStderr` and
  `callIgnoringStyle`.
  - Dispatch could happen into stdout/stderr or message writer hook if it was non-nil
- When message was written out it went into `writeLnHook` callback (misused for
  `{.explain.}` [fn:writeln-explain]) (hacked for `compiles()` [fn:writeln-compiles]) and was
  written out to the stdout/stderr.

It is now replaced with:

- `Report` object is instantiated at some point of a compilation process (it might be an immediate
  report via `localError` or delayed via `nkError` and written out later)
- `structuredReportHook` converts it to string internally. All decitions for formatting, coloring
  etc. happen inside of the structured report hook. Where to write message, which format and so on.
- `writeLnHook` /can/ be used by the `structuredReportHook` if needed - right now it is turned into
  simple convenience callback. In future this could even be replaced with simple helper proc, for
  now I left it as it is now because I'm not 100% sure that I can safely remove this.

** Changes in the warning and hint filtering

Report filtering is done in the particular *implementation* of the error hook -
`options.isEnabled()` provides a default predicate to check whether specific report can be written
out, but it must still be called manually. This allows to still see all the reports if needed. For
example, `cli_reporter.reportHook()` uses additional checks to force write some reports (such as
debug trace in `compiles()` context).

Most of the hint and warning were already categorized, altough some reports had to be split into
several (such as `--hint=Performance` that actually controlled reports for `CopiesToSink` and
`CannotMakeSink`, `--hint=Name` that controlled three reports).

None of the errors were categorized (reports from the `nkError` progress was incorporated, but in
I'm mostly describing changes wrt. to old reporting system) and all were generated in-place

** Minor related changes

- List of allowed reports is now stored in the `noteSets: array[ConfNoteSet, ReportKinds]` field of
  the `ConfigRef`, instead of *seven* separate feilds. Access is now done via getter/setter procs,
  which allows to track changes in the current configuration state.
- Renamed list of local options to `localOptions` - added accessors to track changes in the state.
- Move all default report filter configuration to `lineinfos.computeNotesVerbosity()` - previously
  it was scattered in two different places: `NotesVerbosity` for `--verbosity:N` was calculated in
  `lineinfos`, foreignPackageNotesDefault was constructed in `options`
- Changed formatting of the `internalAssert` and `internalError` messages
- Removed lots of string formatting procs from the various `sem*` modules - ideally they should
  *all* be moved to the `cli_reporter` and related.

-------

Additional notes

[fn:global-err] As I said earlier, `globalError` was misused - it is still not possible to fully get
rid of this sickening hack, since it is used for /control flow/ (you read this correct - it is an
error reporting template, and it is used for control flow. Wonderful design right?).

So, in short - `globalError` can raise `ERecoverableError` that can be caught in some other layer
(for example `sem.tryConstExpr` abuses that in order to bail out (pretty sure it was done as an
'optimization' of some sort, just like 'compiles') when expression fails to evaluate at
compile-time [fn:except])

[fn:except] nim-works#94 (comment)

[fn:writeln-explain] Of course you can't have a proper error reporting in the nim compiler, so this
hook was also misused to the point of complete nonsense. Most notable clusterfuck where you could
spot this little shit is implementation of `{.explain.}` pragma for concepts. It was implemented via
really 'smart' (aka welcome to hell) solution in

nim-works@74a8098 (sigmatch
~704) and then further "improved" in
nim-lang/Nim@fe48dd1 by slicing out parts
of the error message with `let msg = s.replace("Error:", errorPrefix)`

For now I had to reuse similar hack - I *do* override error reporting hook in order to collect all
the missing report information. In the future it would be unnecessary - when concept is matched it's
body will be evaluated to `nkError` with reports that are written out later.

[fn:writeln-compiles] when `compiles()` check is executed, all error output is temporarily disabled
(both stderr and stdout) via setting allowed output flags to `{}` (by default it is set to
@haxscramper
Copy link
Collaborator Author

bors r+

@bors
Copy link
Contributor

bors bot commented Jan 17, 2022

Build succeeded:

@bors bors bot merged commit fff9e80 into nim-works:devel Jan 17, 2022
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 27, 2022
- Removes main drnim directory along with all the hacks added into `sem*`
  implementation - special keywords for
  `ensures/requires/assume/invariant`, more logic built into sempass and so
  on. Sem reports that were added in the main compilation.
- External tooling for code analysis should better be implemented via
  completely external modifications, instead of clobbering the main
  compiler codebase with logic for barely functioning tooling. With dod
  refactor this task becomes much easier, since extra metada can be
  attached to any symbol.
- We definitely want to move in the direction of automatic code correctness
  and static analysis, but right now this is not a priority, and
  considering dubious implementation of the tool it would be easier to
  remove it for now and come up with a proper implementation later on, or
  wen can turn all hacks back into sem.
- Not testsed as a part of CI so right now it can't even be built, due to
  file reorganization nim-works#177.
- Adds reports for `--staticBounds:on|off` that were mistakenly assumed to
  be drnim-only during earlier refactor
  nim-works#94 for structured reports.

Closes nim-works#130 as this pull removes
`drnim` as well, and the author of that PR is no longer responsive. Due to
multiple merge conflicts it is highly unlikely the old PR will be revived.
@haxscramper haxscramper mentioned this pull request Jan 27, 2022
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 27, 2022
- Removes main drnim directory along with all the hacks added into `sem*`
  implementation - special keywords for
  `ensures/requires/assume/invariant`, more logic built into sempass and so
  on. Sem reports that were added in the main compilation.
- External tooling for code analysis should better be implemented via
  completely external modifications, instead of clobbering the main
  compiler codebase with logic for barely functioning tooling. With DOD
  refactor this task becomes much easier, since extra metadata can be
  attached to any symbol.
- We definitely want to move in the direction of automatic code correctness
  and static analysis, but right now this is not a priority, and
  considering dubious implementation of the tool, it would be easier to
  remove it for now and come up with a proper implementation later on, or
  when can turn all hacks back into sem.
- drnim itself not tested as a part of CI so right now it can't even be
  built, due to file reorganization
  nim-works#177.
- Adds reports for `--staticBounds:on|off` that were mistakenly assumed to
  be drnim-only during earlier refactor
  nim-works#94 for structured reports.

Closes nim-works#130 as this pull removes
`drnim` as well, and the author of that PR is no longer responsive. Due to
multiple merge conflicts it is highly unlikely the old PR will be revived.
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 27, 2022
- Removes main drnim directory along with all the hacks added into `sem*`
  implementation - special keywords for
  `ensures/requires/assume/invariant`, more logic built into sempass and so
  on. Sem reports that were added in the main compilation.
- External tooling for code analysis should better be implemented via
  completely external modifications, instead of clobbering the main
  compiler codebase with logic for barely functioning tooling. With DOD
  refactor this task becomes much easier, since extra metadata can be
  attached to any symbol.
- We definitely want to move in the direction of automatic code correctness
  and static analysis, but right now this is not a priority, and
  considering dubious implementation of the tool, it would be easier to
  remove it for now and come up with a proper implementation later on, or
  when can turn all hacks back into sem.
- drnim itself not tested as a part of CI so right now it can't even be
  built, due to file reorganization
  nim-works#177.
- Adds reports for `--staticBounds:on|off` that were mistakenly assumed to
  be drnim-only during earlier refactor
  nim-works#94 for structured reports.

Closes nim-works#130 as this pull removes
`drnim` as well, and the author of that PR is no longer responsive. Due to
multiple merge conflicts it is highly unlikely the old PR will be revived.
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 27, 2022
- Removes main drnim directory along with all the hacks added into `sem*`
  implementation - special keywords for
  `ensures/requires/assume/invariant`, more logic built into sempass and so
  on. Sem reports that were added in the main compilation.
- External tooling for code analysis should better be implemented via
  completely external modifications, instead of clobbering the main
  compiler codebase with logic for barely functioning tooling. With DOD
  refactor this task becomes much easier, since extra metadata can be
  attached to any symbol.
- We definitely want to move in the direction of automatic code correctness
  and static analysis, but right now this is not a priority, and
  considering dubious implementation of the tool, it would be easier to
  remove it for now and come up with a proper implementation later on, or
  when can turn all hacks back into sem.
- drnim itself not tested as a part of CI so right now it can't even be
  built, due to file reorganization
  nim-works#177.
- Adds reports for `--staticBounds:on|off` that were mistakenly assumed to
  be drnim-only during earlier refactor
  nim-works#94 for structured reports.

Closes nim-works#130 as this pull removes
`drnim` as well, and the author of that PR is no longer responsive. Due to
multiple merge conflicts it is highly unlikely the old PR will be revived.
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 27, 2022
- Removes main drnim directory along with all the hacks added into `sem*`
  implementation - special keywords for
  `ensures/requires/assume/invariant`, more logic built into sempass and so
  on. Sem reports that were added in the main compilation.
- External tooling for code analysis should better be implemented via
  completely external modifications, instead of clobbering the main
  compiler codebase with logic for barely functioning tooling. With DOD
  refactor this task becomes much easier, since extra metadata can be
  attached to any symbol.
- We definitely want to move in the direction of automatic code correctness
  and static analysis, but right now this is not a priority, and
  considering dubious implementation of the tool, it would be easier to
  remove it for now and come up with a proper implementation later on, or
  when can turn all hacks back into sem.
- drnim itself not tested as a part of CI so right now it can't even be
  built, due to file reorganization
  nim-works#177.
- Adds reports for `--staticBounds:on|off` that were mistakenly assumed to
  be drnim-only during earlier refactor
  nim-works#94 for structured reports.

Closes nim-works#130 as this pull removes
`drnim` as well, and the author of that PR is no longer responsive. Due to
multiple merge conflicts it is highly unlikely the old PR will be revived.
haxscramper added a commit to haxscramper/nimskull that referenced this pull request Jan 27, 2022
- Removes main drnim directory along with all the hacks added into `sem*`
  implementation - special keywords for
  `ensures/requires/assume/invariant`, more logic built into sempass and so
  on. Sem reports that were added in the main compilation.
- External tooling for code analysis should better be implemented via
  completely external modifications, instead of clobbering the main
  compiler codebase with logic for barely functioning tooling. With DOD
  refactor this task becomes much easier, since extra metadata can be
  attached to any symbol.
- We definitely want to move in the direction of automatic code correctness
  and static analysis, but right now this is not a priority, and
  considering dubious implementation of the tool, it would be easier to
  remove it for now and come up with a proper implementation later on, or
  when can turn all hacks back into sem.
- drnim itself not tested as a part of CI so right now it can't even be
  built, due to file reorganization
  nim-works#177.
- Adds reports for `--staticBounds:on|off` that were mistakenly assumed to
  be drnim-only during earlier refactor
  nim-works#94 for structured reports.

Closes nim-works#130 as this pull removes
`drnim` as well, and the author of that PR is no longer responsive. Due to
multiple merge conflicts it is highly unlikely the old PR will be revived.
bors bot added a commit that referenced this pull request Jan 28, 2022
207: Unbundle drnim r=saem a=haxscramper

- Removes main drnim directory along with all the hacks added into `sem*`
  implementation - special keywords for
  `ensures/requires/assume/invariant`, more logic built into sempass and so
  on. Sem reports that were added in the main compilation.
- External tooling for code analysis should better be implemented via
  completely external modifications, instead of clobbering the main
  compiler codebase with logic for barely functioning tooling. With DOD
  refactor this task becomes much easier, since extra metadata can be
  attached to any symbol.
- We definitely want to move in the direction of automatic code correctness
  and static analysis, but right now this is not a priority, and
  considering dubious implementation of the tool, it would be easier to
  remove it for now and come up with a proper implementation later on, or
  when can turn all hacks back into sem.
- drnim itself Not tested as a part of CI so right now it can't even be built, due to
  file reorganization #177.
- Adds reports for `--staticBounds:on|off` that were mistakenly assumed to
  be drnim-only during earlier refactor
  #94 for structured reports.

Closes #130 as this pull removes
`drnim` as well, and the author of that PR is no longer responsive. Due to
multiple merge conflicts it is highly unlikely the old PR will be revived.



Co-authored-by: haxscramper <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants