
Fix memory leak caused by registry ignoring metrics of external types #264

Open · wants to merge 1 commit into base: master
Conversation

michaelschlies

fixes #244

@michaelschlies
Author

Reverted the move to sync.Map to keep backward compatibility with Go versions before 1.9. Memory is still not increasing, so this is ready for review.

@mihasya
Collaborator

mihasya commented Aug 24, 2019

Thanks for tracking this down and submitting a PR. I wonder why this check was there in the first place. Nothing obvious in blame; let me see if I can track down @rcrowley and understand the reasoning behind the original check.
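
For readers landing here from the linked issue, the check in question has roughly this shape (paraphrased, not a verbatim quote; see registry.go for the exact code): register type-switches on the built-in metric types and silently ignores everything else, without returning an error.

// Paraphrased shape of StandardRegistry.register; not the exact source.
func (r *StandardRegistry) register(name string, i interface{}) error {
	if _, ok := r.metrics[name]; ok {
		return DuplicateMetric(name)
	}
	switch i.(type) {
	case Counter, Gauge, GaugeFloat64, Healthcheck, Histogram, Meter, Timer:
		// Only the package's own metric types get stored.
		r.metrics[name] = i
	}
	// External types fall through the switch: not stored, and no error returned.
	return nil
}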

@michaelschlies
Author

I eventually did see some slow leaks coming from other places in the code, but I haven't opened a PR for those yet. One of my running variants of this is perfectly happy with this change (no increase in memory); the other is still increasing, and the growth appears to come out of meter.go. The two consume this slightly differently, so I'm hoping to get some more pprof dumps from the one that is increasing (albeit very slowly) from go-metrics when I build a passthrough route from my main application (Java Ratpack) into the sidecar container that uses this library.
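
A minimal sketch of how those heap dumps can be gathered from a Go sidecar like this, in case it helps anyone reproducing the leak; the port and wiring here are assumptions for illustration, not part of this PR.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Expose the profiling endpoints on a local-only port chosen for this sketch.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// ... the sidecar's normal work would go here ...
	select {}
}

Heap profiles can then be pulled and compared over time with go tool pprof http://localhost:6060/debug/pprof/heap.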

@michaelschlies
Author

(I know our big leak in registry.go came from a wrapper metric that composes other metric types: the registry was discarding it, so we just kept allocating new objects for the same metrics.)
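
To make that failure mode concrete, here is a sketch of the pattern being described, using a hypothetical wrapper like the HTTPMetric that comes up later in this thread (the names and registry wiring are illustrative, not the actual internal code). Because register silently drops types it doesn't recognize, every lookup misses and a fresh wrapper is allocated; and, if I'm reading meter.go right, each discarded wrapper's Meter stays referenced by the package-level meter arbiter until Stop() is called.

package example

import metrics "github.com/rcrowley/go-metrics"

// Hypothetical wrapper type, mirroring the one described in this thread.
type HTTPMetric struct {
	Meter                 metrics.Meter
	ResponseTimer         metrics.Timer
	ResponseSizeHistogram metrics.Histogram
}

func NewHTTPMetric() *HTTPMetric {
	return &HTTPMetric{
		Meter:                 metrics.NewMeter(),
		ResponseTimer:         metrics.NewTimer(),
		ResponseSizeHistogram: metrics.NewHistogram(metrics.NewUniformSample(100)),
	}
}

func recordRequest(r metrics.Registry) {
	// The registry's type switch doesn't recognize *HTTPMetric, so nothing is
	// ever stored under "http.requests". Each call therefore misses the lookup,
	// runs the constructor again, and returns a brand-new wrapper.
	m := r.GetOrRegister("http.requests", NewHTTPMetric).(*HTTPMetric)
	m.Meter.Mark(1)
}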

@mihasya
Collaborator

mihasya commented Aug 24, 2019

This probably calls for a broader discussion of what makes a "metric" and whether we should be using an interface here rather than manual type checking 🤔 I think the main thing preventing the use of an interface is that we want metrics to be able to produce either a float64 or an int64, and there's no type in Go that captures both (at least I remember lamenting this, oh, five years ago when we last talked about it; it could well be that some newer feature makes this possible now).

@rcrowley and I exchanged some texts about this, and he confirmed my suspicion that the check is there to guard against crashes down the line, since both the argument to register and the value type of the r.metrics map are interface{}. He agrees that it's probably reasonably safe to remove the check, because (1) all of this is buried in private space and (2) any consumer of the registry has to type-switch on the metrics later anyway, since nothing can be assumed about interface{}.

On that note, in your use of wrapper metric types, how were you accessing them later? Do you have your own function that uses .Each and typechecks the metrics that come out? I would assume this function, too, would never see the metrics you registered, since register drops unrecognized types on the ground without returning an error.
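
For concreteness, this is the kind of consumer being described: iterate the registry with .Each and type-switch on whatever comes out. This is a sketch, not anyone's actual reporter; the point is that anything register silently dropped never shows up here at all.

package example

import (
	"fmt"

	metrics "github.com/rcrowley/go-metrics"
)

func reportAll(r metrics.Registry) {
	r.Each(func(name string, i interface{}) {
		switch m := i.(type) {
		case metrics.Meter:
			fmt.Printf("%s rate1=%.2f\n", name, m.Rate1())
		case metrics.Timer:
			fmt.Printf("%s p99=%.2f\n", name, m.Percentile(0.99))
		case metrics.Histogram:
			fmt.Printf("%s mean=%.2f\n", name, m.Mean())
		default:
			// A custom wrapper type would land here -- but only if the
			// registry had actually kept it in the first place.
		}
	})
}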

@michaelschlies
Author

In our case (and I'm trying to infer why it was built a particular way, because I'm not on the team that built it), we have an HTTPMetric that contains a Meter, Timer, and Histogram, all of which trickle into an internal metrics pipeline with zero configuration for the consuming application team. That way, anything using net/http or gin gets metrics basically for free (https://youtu.be/cnHfK4MZA2Y?t=933 if you're curious about the whole "services for free" aspect of our internal platform).

@mihasya
Collaborator

mihasya commented Aug 24, 2019

@michaelschlies right, but if register doesn't actually add the HTTPMetric to its registry because it fails the type check, how are you currently getting the data back out? Or do you basically have some parallel registry?

@mihasya
Collaborator

mihasya commented Aug 24, 2019

@michaelschlies also, isn't a metrics.Timer already a Meter, Counter, and Histogram all in one? It's not explicitly called out as such, so we should probably think about some documentation polish; however, check the actual interface. Is what you have different?

go-metrics/timer.go

Lines 8 to 28 in 9180854

// Timers capture the duration and rate of events.
type Timer interface {
	Count() int64
	Max() int64
	Mean() float64
	Min() int64
	Percentile(float64) float64
	Percentiles([]float64) []float64
	Rate1() float64
	Rate5() float64
	Rate15() float64
	RateMean() float64
	Snapshot() Timer
	StdDev() float64
	Stop()
	Sum() int64
	Time(func())
	Update(time.Duration)
	UpdateSince(time.Time)
	Variance() float64
}

FWIW, go-tigertonic, which is a simple HTTP lib @rcrowley wrote concurrently with this library, just uses metrics.Timer: https://github.com/rcrowley/go-tigertonic/blob/87d5e7ed6b42eaa9daadd697d2e169bf0f8d8c7d/metrics.go#L210-L225
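
Put differently, a single Timer already covers the rate-plus-latency part of such a wrapper. A minimal sketch of using one in HTTP middleware follows; the metric name and handler wiring are made up for illustration. It does not cover the response-size histogram, which would still need its own Histogram.

package example

import (
	"net/http"
	"time"

	metrics "github.com/rcrowley/go-metrics"
)

func instrument(next http.Handler) http.Handler {
	// One Timer yields both Meter-style rates (Rate1, Rate5, RateMean, ...)
	// and Histogram-style stats (Percentile, Mean, StdDev, ...).
	t := metrics.GetOrRegisterTimer("http.requests", metrics.DefaultRegistry)
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, req)
		t.UpdateSince(start)
	})
}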

@michaelschlies
Author

I can pass that along to the team that actually built this library internally; I was just trying to figure out why my application, which started out at about 2 MB of RAM, grew like crazy *shrug*

@mihasya
Collaborator

mihasya commented Aug 24, 2019 via email

@michaelschlies
Author

Apparently the person who built the specific HTTPMetric struct isn't here anymore, so that's a bit of a bummer, but the only real funkiness appears to be in the histogram:

func NewHTTPMetric() *HTTPMetric {
	return &HTTPMetric{
		Meter:                 metrics.NewMeter(),
		ResponseTimer:         metrics.NewTimer(),
		// Response sizes go into a uniform (reservoir) sample of size 100.
		ResponseSizeHistogram: metrics.NewHistogram(metrics.NewUniformSample(100)),
	}
}

So I can't get the background on the why, but we do have a fair amount of automation in several downstream applications that use our wrapper library (which wraps go-metrics in this case).

Successfully merging this pull request may close these issues.

memory is keeping increase until crash