Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky: TestGameServerResourceValidation #3362

Closed
markmandel opened this issue Sep 6, 2023 · 4 comments · Fixed by #3373
Closed

Flaky: TestGameServerResourceValidation #3362

markmandel opened this issue Sep 6, 2023 · 4 comments · Fixed by #3373
Labels
area/tests Unit tests, e2e tests, anything to make sure things don't break help wanted We would love help on these issues. Please come help us! kind/bug These are bugs.

Comments

@markmandel
Copy link
Member

What happened:

Occasional failure on e2e tests

time="2023-09-06 19:33:45.261" level=info msg="RoleBinding 1694028825/agones-sdk-access is created"
=== RUN   TestGameServerResourceValidation
=== PAUSE TestGameServerResourceValidation
=== CONT  TestGameServerResourceValidation
    gameserver_test.go:953: 
        	Error Trace:	/go/src/agones.dev/agones/test/e2e/gameserver_test.go:953
        	Error:      	Not equal: 
        	            	expected: "spec.template.spec.containers[0].resources.requests[cpu]"
        	            	actual  : "spec.template.spec.containers[0].resources.requests"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-spec.template.spec.containers[0].resources.requests[cpu]
        	            	+spec.template.spec.containers[0].resources.requests
        	Test:       	TestGameServerResourceValidation
--- FAIL: TestGameServerResourceValidation (0.17s)

As far as I can tell it's doesn't seem tied to a Kubernetes version (but I could be wrong) - most recently seen on 1.27 standard cluster.

What you expected to happen:

To get a consistent error message back. Very odd.

How to reproduce it (as minimally and precisely as possible):

Run a few tests. What's even weirder is on subsequent builds it doesn't fail, but it consistently fails on e2e test retries.

Anything else we need to know?:

Environment:

@markmandel markmandel added kind/bug These are bugs. help wanted We would love help on these issues. Please come help us! area/tests Unit tests, e2e tests, anything to make sure things don't break labels Sep 6, 2023
@markmandel
Copy link
Member Author

Just saw it again, and it was also 1.27.x - now I am wondering if it's a k8s version thing.

Logs: https://console.cloud.google.com/cloud-build/builds/4e05fa57-2b0b-4bc4-8f43-42b08ad8d242?project=258182270954

@markmandel
Copy link
Member Author

@igooch
Copy link
Collaborator

igooch commented Sep 11, 2023

Based on a println of a passing test (tested locally with ~/agones/test/e2e$ go test -race -run ^TestGameServerResourceValidation$):

statusErr.Status().Details.Causes: [{FieldValueInvalid Invalid value: "-50m": must be greater than or equal to 0 spec.template.spec.containers[0].resources.requests[cpu]} {FieldValueInvalid Invalid value: "128Mi": must be less than or equal to memory limit of 64Mi spec.template.spec.containers[0].resources.requests}]

My guess is that since this is only failing on assert assert.Equal(t, "spec.template.spec.containers[0].resources.requests[cpu]", statusErr.Status().Details.Causes[0].Field), and not the previous assert assert.Len(t, statusErr.Status().Details.Causes, 2), that this could be an issue with the ordering of the errors in the list.

gs.Spec.Template.Spec.Containers[0].Resources.Requests[corev1.ResourceCPU] = resource.MustParse("-50m")
_, err = gsClient.Create(ctx, gs.DeepCopy(), metav1.CreateOptions{})
assert.NotNil(t, err)
statusErr, ok = err.(*k8serrors.StatusError)
assert.True(t, ok)
assert.Len(t, statusErr.Status().Details.Causes, 2)
assert.Equal(t, metav1.CauseTypeFieldValueInvalid, statusErr.Status().Details.Causes[0].Type)
assert.Equal(t, "spec.template.spec.containers[0].resources.requests[cpu]", statusErr.Status().Details.Causes[0].Field)

@markmandel
Copy link
Member Author

Oh that's a good suggestion!

One way to test would be to sort the result list, and see if we can repro the issue by sorting asc, then desc, and see if we can guarantee the result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tests Unit tests, e2e tests, anything to make sure things don't break help wanted We would love help on these issues. Please come help us! kind/bug These are bugs.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants