S3: rewrite metrics tests, stop collecting errors from missing keys#118
S3: rewrite metrics tests, stop collecting errors from missing keys#118ahouene merged 4 commits intoPowerDNS:mainfrom
Conversation
| } | ||
| errRes := minio.ToErrorResponse(err) | ||
| if !isList && errRes.StatusCode == 404 { | ||
| if !isList && errRes.Code == "NoSuchKey" { |
There was a problem hiding this comment.
This change is not very obvious about what it does: a missing bucket also has errRes.StatusCode == 404. This makes the function only return ErrNotExist when we have a missing key. A missing bucket will return a non-converted error.
There was a problem hiding this comment.
Maybe add a comment in the code to this effect?
There was a problem hiding this comment.
I like being able to differentiate between missing key and missing bucket, and looking at your tests it seems that there is a "NoSuchBucket" code, so the label will be correct.
| SecretKey: "bar", | ||
| Bucket: "test-bucket", | ||
| CreateBucket: false, | ||
| DialTimeout: 1 * time.Second, |
There was a problem hiding this comment.
I found why this test results in different behavior for me:
$ time dig +short nosuchhost
real 0m3,920s
user 0m0,008s
sys 0m0,003s
There was a problem hiding this comment.
Interesting, on my side it takes less than a second, and changing the timeout to 20s just in case doesn't change the diff in the test failure! Does raising the timeout fix it for you?
There was a problem hiding this comment.
If I increase the DialTimeout, something else is getting REAL slow, because the default 30s timeout in my editor aborts the test.
There was a problem hiding this comment.
It does fix some of it, though not the storage_s3_call_error_by_type_total{error="NotFound",method="load"} 3 discrepancy.
There was a problem hiding this comment.
Leaving DialTimeout at 1 second but changing nosuchhost to nosuchhost.invalid yields the same results but takes one fifth as long.
There was a problem hiding this comment.
I think would have been solved with the metricCallErrorsType.Reset() change that Rod added
There was a problem hiding this comment.
Nope, same output for a full go test -count=1 ./....
There was a problem hiding this comment.
Oh, sorry, if I fix my env for devcontainers it succeeds.
neilcook
left a comment
There was a problem hiding this comment.
A couple of comment but overall LGTM
| } | ||
| errRes := minio.ToErrorResponse(err) | ||
| if !isList && errRes.StatusCode == 404 { | ||
| if !isList && errRes.Code == "NoSuchKey" { |
There was a problem hiding this comment.
I like being able to differentiate between missing key and missing bucket, and looking at your tests it seems that there is a "NoSuchBucket" code, so the label will be correct.
| if err = convertMinioError(err, false); err != nil { | ||
| metricCallErrors.WithLabelValues("load").Inc() | ||
| metricCallErrorsType.WithLabelValues("load", errorToMetricsLabel(err)).Inc() | ||
| if !errors.Is(err, os.ErrNotExist) { |
There was a problem hiding this comment.
Can you give some more info on why you think 'missing bucket' should be reported for this metric, but not "missing key"? I guess because missing key is likely to happen for all kinds of good reasons (like cleanup of files by another process) but missing bucket is always bad?
There was a problem hiding this comment.
Yes, a missing bucket means either the ability to create a bucket was not given to the client in the options and the bucket does not exist, or the bucket was removed from another client we have no control over.
I'm documenting it better in aa828a5
…ing bucket is important
The
TestMetricstests are failing on my machine's and Luit's, as the Minio Go SDK seems to issue different errors on macOS (a difference in resolving?).That prompted me to rewrite the tests to make sure the use of a healthy backend sees no metrics being collected, and test that metrics are collected with an error we have more control over.
Moreover, working on this made me realise that we collect
ErrNotExistjust as other errors. Since it does not reflect an unhealthy backend but merely a missing key, I also propose we no longer collect it.Lastly, we no longer return
ErrNotExistfor a missing bucket, as it is a completely different error.