Add builtin support for truncation #25

kzh · 2021-08-03T12:38:15Z

This PR adds in truncation support to the redact library. This is a proof of concept of what this feature could look like. It is exposed through the API as a wrapper function similar to redact.Safe.

func Truncate(a interface{}, len int) TruncatedValue {
    return TruncatedValue{Value: a, Length: len}
}

An example usage is:

log.Infof(ctx, "initiating a split of this range at key %s [r%d] (%s)%s",
	splitKey.StringWithDirs(nil /* valDirs */, 50 /* maxLen */), rightRangeID, reason, extra)
// --->
log.Infof(ctx, "initiating a split of this range at key %s [r%d] (%s)%s",
	redact.Truncate(splitKey, 50), rightRangeID, reason, extra)

Truncation limits the rune length of a printed value and appends a truncation marker ... in the case that the limit is exceeded. The motivation behind this feature is from cockroachdb/cockroach#67065 where printed keys can take too much screen real estate hence the need for truncation. It seems appropriate to insert this logic into the redaction library since the notation of unsafe/safe segments of a printed value is construed within. It would feel somewhat redundant to interpret these semantics outside.

Regarding overhead, the truncation logic is embedded in such a way that there is minimal additional performance/memory demands when writing to the buffer and only when necessary.

Notes:

The current behavior of truncation inside of redactable strings that contain both unsafe/safe segments is that if the truncation cuts into an unsafe segment, the ending marker is abruptly added and the truncation marker is appended afterwards. Example:

Truncate("hello", 3) -> "‹hel›..."

I'm unsure if this is the expected behavior of truncation. I'm a tad worried this could be somewhat misleading in suggesting that the unsafe segment is complete in contrast to "‹hel...›".

The truncation logic currently expects valid unicode character inputs. This seems like a reasonable requirement since the outputted redacted strings are intended to be sent to logs for human consumption.

@knz I'd love to hear your thoughts on this!

This change is

cockroach-teamcity · 2021-08-03T12:38:18Z

All committers have signed the CLA.

dhartunian

Thx for the detailed PR message.

Reviewed 5 of 6 files at r1.
Reviewable status: 5 of 6 files reviewed, 4 unresolved discussions (waiting on @abarganier, @knz, @kzh, and @rimadeodhar)

markers_test.go, line 237 at r1 (raw file):

				)
			}}, 20))
		}, "Inside saf... ‹uns›..."},

btw yeah I see how the nested truncate looks odd with the multiple ... in there. I think it's something we can get feedback from users on once people have a chance to use the truncation feature. I imagine these complex cases will be less common since we'll truncate at the top level usually.

interfaces/interfaces.go, line 151 at r1 (raw file):

}

type TruncatedValue struct {

nit: struct inside a file named interfaces.go. Maybe there's a better place for it.

internal/buffer/buffer.go, line 37 at r1 (raw file):

	// TODO(kzh): describe these fields
	runes int
	limit int

nit: wonder if runeCount and runeLimit might be better...?

internal/buffer/buffer.go, line 149 at r1 (raw file):

func (b *Buffer) WriteString(s string) (n int, err error) {
	if b.limit != 0 {
		r := []rune(s)

move r definition below if statement.

internal/buffer/buffer.go, line 222 at r1 (raw file):

}

type TruncateState struct {

nit: add docstring

knz

Thanks for the experiment.

There are two discussions needed here.

I am not too convinced yet by the idea of a helper Truncate function. There are several unanswered questions with it:

What does Truncate(Truncate(a, 20), 3) even mean? What about T(T(a, 3), 20? i.e. does it commute? do we want it to? Is the current behavior of the implementation deliberate or merely an accident?
What does Safe(Truncate(x, n)) mean? (This needs tests)
What does Truncate(x, 0) mean? (This needs tests)
What does Truncate(x, n) do if x is formatted using a mix of safe and unsafe sub-strings? (This needs tests)

If I am not mistaken, the current behavior of Truncate(x, n) is (approximately)

format x as usual
truncate x to n characters, or n-1 if it starts with a redaction marker
re-introduce a closing redaction marker if it was truncated away

Then, why not implement Truncate like this - without changing the formatting code?

when we first discussed this, you had proposed a formatting flag, for example %#20v. Why did you not do this? It looks to me as if the implementation of this would be simpler.

Reviewed 5 of 6 files at r1.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @abarganier, @kzh, and @rimadeodhar)

internal/buffer/buffer.go, line 149 at r1 (raw file):

func (b *Buffer) WriteString(s string) (n int, err error) {
	if b.limit != 0 {
		r := []rune(s)

please don't implement this like this. The redact library really means to go the longest way possible to avoid heap allocations. Here you're creating a new slice on the heap for each string argument and using sub-slices for something that could be implemented with a for loop.

internal/markers/constants.go, line 28 at r1 (raw file):

	EscapeMarkS = string(EscapeMark)
	RedactedS   = StartS + "×" + EndS
	Truncate    = `...`

I'd recommend … instead.

kzh

Thanks for the feedback @dhartunian and @knz! Sorry for the long wait. I brushed this to the side for a bit to focus on other work (namely the Datadog Agent in CC).

What does Truncate(Truncate(a, 20), 3) even mean? What about T(T(a, 3), 20? i.e. does it commute? do we want it to? Is the current behavior of the implementation deliberate or merely an accident?

The current behavior is deliberate. In the cases of nested truncation operations, the limit of each inner truncation is min'ed against the outer limit. This logic happens in the Truncate method of the Buffer struct. So T(T(a, 20), 3) would effectively be the same as T(T(a, 3), 3) since 20 > 3.

The usage of nested truncation will likely be rare. Support for it was added just in case someone tries:

type Foo struct {
    a string
    b string
}

func (f Foo) SafeFormat(w redact.SafePrinter, _ rune) {
    w.Print(redact.Truncate(f.a, 10), redact.Truncate(f.b, 15))
}

log.Println(redact.Truncate(Foo{}, 20))

What does Safe(Truncate(x, n)) mean? (This needs tests)

The Truncate operation does not affect the "safe/unsafe-ness" of a value. So Safe(Truncate(x, n)) should mean the same as Safe(x) but truncated to n runes.

What does Truncate(x, 0) mean? (This needs tests)

Truncate only applies when there is a positive limit. The entire value of x would be outputted here.

What does Truncate(x, n) do if x is formatted using a mix of safe and unsafe sub-strings? (This needs tests)

It would first limit the output x to n runes. If the truncation ends with an open redaction marker, the ending redaction marker would be promptly appended in addition to a truncation marker afterwards. So: Truncate(Unsafe("hello", 3)) -> ‹hel›….

If I am not mistaken, the current behavior of Truncate(x, n) is (approximately)

The truncation logic is mostly embedded into the Buffer struct write methods. Truncate sets a limit in the struct's field runeLimit. All future write operations then keeps count of runes written in the buffer and will restrict writing when this limit is hit. This is done this way to prevent unnecessary memory allocation that happens when the buffer grows. This is a key detail since truncation doesn't actually truncate once after the entire output is generated. It performs this operation as the buffer is being written to. Something else to note is that all markers are excluded from this rune count.

Then, why not implement Truncate like this - without changing the formatting code?
when we first discussed this, you had proposed a formatting flag, for example %#20v. Why did you not do this? It looks to me as if the implementation of this would be simpler.

In my mind, the implementation would have been somewhat similar with the bulk of the logic happening within the Buffer struct write operations. It was just a matter of how this truncation feature is exposed to developers that use this library. I opted for this Truncate function since it makes this feature more visible/discoverable from a developer's perspective than a formatting flag which is a bit much more subtle. It also makes any code that uses this functionality more understandable from a glance rather than cryptic. You can somewhat tell right off the bat what log.Info(redact.Truncate("hello", 5) intends to do versus log.Infof("%#5v", "hello"). I don't have a strong opinion on this and totally super cool to switch over, but I feel the underlying implementation would still be somewhat similar.

Reviewable status: 1 of 6 files reviewed, 4 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @rimadeodhar)

markers_test.go, line 237 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

btw yeah I see how the nested truncate looks odd with the multiple ... in there. I think it's something we can get feedback from users on once people have a chance to use the truncation feature. I imagine these complex cases will be less common since we'll truncate at the top level usually.

👍

interfaces/interfaces.go, line 151 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: struct inside a file named interfaces.go. Maybe there's a better place for it.

I think the file name might mean more broadly interfaces in general, not strictly limited specifically to interfaces in Go. This is judging off of the fact that there are already some non interface type definitions in there. However, I could very well be misinterpreting this 🤔

internal/buffer/buffer.go, line 37 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: wonder if runeCount and runeLimit might be better...?

Done.

internal/buffer/buffer.go, line 149 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

move r definition below if statement.

Done.

internal/buffer/buffer.go, line 149 at r1 (raw file):

Previously, knz (kena) wrote…

please don't implement this like this. The redact library really means to go the longest way possible to avoid heap allocations. Here you're creating a new slice on the heap for each string argument and using sub-slices for something that could be implemented with a for loop.

Done. Is this what you were thinking of?

internal/buffer/buffer.go, line 222 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: add docstring

Done.

internal/markers/constants.go, line 28 at r1 (raw file):

Previously, knz (kena) wrote…

I'd recommend … instead.

Done.

rimadeodhar

Reviewed 1 of 3 files at r4.
Reviewable status: 2 of 6 files reviewed, 9 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @kzh)

internal/buffer/buffer.go, line 140 at r4 (raw file):

	if b.runeLimit != 0 {
		if b.runeLimit == b.runeCount {
			if len(p) != 0 {

Nit: You can use length here.

internal/buffer/buffer.go, line 143 at r4 (raw file):

Quoted 5 lines of code…

		if b.runeLimit == b.runeCount {
			if len(p) != 0 {
				b.excess = true
			}
			return

Seems like this can be made simpler

if b.runeLimit == b.runeCount {
  b.excess = length != 0
  return
}

internal/buffer/buffer.go, line 201 at r4 (raw file):

	}
	excess = len(p) != 0
	return

Don't need this return statement right? It's at the end of the function?

internal/buffer/buffer.go, line 214 at r4 (raw file):

	}
	excess = len(str) != 0
	return

Ditto.

internal/buffer/buffer.go, line 222 at r4 (raw file):

		if b.runeLimit == b.runeCount {
			b.excess = true
			return nil

Do we not consider truncation to be an error? i.e. we don't actually write the byte in this case but simply set the excess field right?

knz

I wonder what happens if you specify Truncate with an object (say, a struct or an error) which has a SafeFormat method that mixes multiple redactable and non-redactable substrings. If my reading of the code is right, we'll get multiple ellipses in that case. (This needs tests!)

I think that if you combine the %xxv approach and also do not use a separate TruncatedValue type (i.e. do not implement this logic in internal.Buffer, and instead do it in the printf code), this will remove the artifacts when an object that contains multiple sub-fields gets formatted.

Meanwhile, from a purely engineering perspective:

The truncation logic is mostly embedded into the Buffer struct write methods

This still feels strange to me. This design means we're paying overhead for a rarely-used feature in the common case.

Reviewable status: 2 of 6 files reviewed, 9 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @kzh)

knz

As found out in PR #26, you'll also need to double check the behavior of %q and %x with truncation
As well as the behavior of %v with a nil pointer, i.e. what does Truncate(nil) do

(In fact I recommend you pilfer the unit tests in that other PR and find out what they reveal about your implementation)

Reviewable status: 2 of 6 files reviewed, 9 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @kzh)

kzh requested review from knz, dhartunian, abarganier and rimadeodhar August 3, 2021 12:38

kzh self-assigned this Aug 3, 2021

kzh force-pushed the truncate branch from c50ee6b to c941448 Compare August 3, 2021 15:12

dhartunian reviewed Aug 4, 2021

View reviewed changes

knz reviewed Aug 5, 2021

View reviewed changes

kzh force-pushed the truncate branch 2 times, most recently from f3d2d0e to f79d56f Compare August 19, 2021 10:31

Add builtin support for truncation

591a5fc

kzh force-pushed the truncate branch from f79d56f to 591a5fc Compare August 31, 2021 13:48

kzh commented Aug 31, 2021

View reviewed changes

rimadeodhar reviewed Sep 2, 2021

View reviewed changes

knz reviewed Sep 3, 2021

View reviewed changes

knz mentioned this pull request Sep 3, 2021

Demonstrate and extend printf truncation #26

Open

knz reviewed Sep 3, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add builtin support for truncation #25

Add builtin support for truncation #25

kzh commented Aug 3, 2021 •

edited

Loading

cockroach-teamcity commented Aug 3, 2021 •

edited

Loading

dhartunian left a comment

knz left a comment

kzh left a comment •

edited

Loading

rimadeodhar left a comment

knz left a comment

knz left a comment

Add builtin support for truncation #25

Are you sure you want to change the base?

Add builtin support for truncation #25

Conversation

kzh commented Aug 3, 2021 • edited Loading

cockroach-teamcity commented Aug 3, 2021 • edited Loading

dhartunian left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

kzh left a comment • edited Loading

Choose a reason for hiding this comment

rimadeodhar left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

kzh commented Aug 3, 2021 •

edited

Loading

cockroach-teamcity commented Aug 3, 2021 •

edited

Loading

kzh left a comment •

edited

Loading