Fix failing eden tests #986

milan-zededa · 2024-06-20T08:55:13Z

Multiple patches in this PR:

* Increased the default EVE version to the latest, 12.4.
* Removed the workflow dependency on the Smoke test suite. With reusable workflows this does not work properly in the EVE repository: the other test suites are triggered regardless of the outcome of the Smoke tests. The only effect of this dependency is that the other workflows do not start until the Smoke tests finish, which increases the total execution time of all tests.
* Moved tests that depend on virtualization (apps must be deployed inside VMs) to a separate test suite. The primary reason is that HW-assisted nested virtualization performs poorly on the buildjet runners and we are getting many failures (not related to EVE code). For every other test it is therefore better to run EVE with acceleration disabled. A secondary reason is that most of these tests come from the networking test suite, which is already quite long and takes well over an hour to execute, so it makes sense to move some tests to a different suite. Please note that when the Virtualization test suite is run locally, all tests pass. On buildjet runners, however, they are expected to fail until we resolve the issue with nested virtualization.
* Fixed 'eden pod modify' (wrong timeout; the app state to wait for also had to be changed to reflect recent EVE changes).
* Fixed the ctrl_cert_change test (improper use of -check-new).
* Gave the 'nodered' app more time to get deployed.
* Test publish_location is now skipped and will likely be removed, because after the recent changes in the EVE/wwan microservice, injecting fake GPS location data is no longer possible.
* Modified the EVE-upgrade tests to use a newer base EVE version (> 12.0). The policy_version of fscrypt was bumped from 1 to 2 and this change is not backward compatible, meaning we cannot downgrade from 12.X to 11.Y or older.
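
The base-version constraint for the upgrade tests can be illustrated with a small check. This is only a sketch, not code from eden: the helper names and `MIN_BASE_MAJOR` are hypothetical, the version strings it accepts are illustrative, and it assumes that "> 12.0" means any release from the 12.X line onward, which is how the downgrade remark reads.

```python
import re

# Assumption: every 12.X release already uses fscrypt policy_version 2,
# so any of them is a safe base. The PR text itself only says "> 12.0".
MIN_BASE_MAJOR = 12


def parse_eve_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '12.4'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized EVE version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def usable_as_upgrade_base(version: str) -> bool:
    """Return True if the version is new enough to serve as the upgrade base.

    Older bases (11.Y and earlier) are rejected because going back to them
    from 12.X would cross the incompatible fscrypt policy change.
    """
    major, _minor = parse_eve_version(version)
    return major >= MIN_BASE_MAJOR
```

With this rule, a 12.X string is accepted as a base and an 11.Y string is rejected; anything that does not start with `major.minor` raises `ValueError` instead of being silently accepted.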

Please note that we are still getting some occasional kernel crashes on buildjet runners (even with accel=false), so even stable tests may fail once in a while.

uncleDecart

LGTM, thank you Milan. I think we should merge this, release eden, and bump the eden version in EVE. For Buildjet we should also integrate #984, and point out to the guys that we still have problems with the virtualisation tests, as we found out, and that locally they run perfectly fine. Maybe we should also try running the test suite on self-hosted runners to see the difference.

Also @christoph-zededa, this PR bumps the EVE version, so yours won't have to if we merge this one first.

uncleDecart · 2024-06-20T10:22:23Z

Forgot the sticker :D

And also I see that LPC LOC tests are failing, right?

Edit: some of the Smoke tests are still failing, and they fail on rebooting. I wonder if that has to do with the Buildjet runners as well.

Multiple patches in this PR:
* Increased the default EVE version to the latest 12.4
* Removed workflow dependency on Smoke test suite. With reusable workflows this does not work properly in the EVE repository. The other test suites are triggered regardless of the outcome of Smoke tests. The only effect this dependency has, is that other workflows are delayed from starting until Smoke tests finish, increasing the total execution time of all tests.
* Moved tests that depend on virtualization (apps must be deployed inside VMs) to a separate test suite. The primary reason is that the HW-assisted nested virtualization is performing poorly on the buildjet runners and we are getting many failures (not related to EVE code). For every other test it is therefore better to run EVE with acceleration disabled. A secondary reason is that most of these tests are from the networking test suite, which is already quite long and takes well over an hour to execute. It therefore makes sense to move some tests to a different test suite. Please note that when the Virtualization test suite is run locally, all tests are passing. However, on buildjet runners they are expected to fail until we resolve the issue with the nested virtualization.
* Fixed 'eden pod modify' (wrong timeout + app state to wait for had to be modified to reflect recent EVE changes)
* Fixed ctrl_cert_change test (improper use of -check-new)
* App 'nodered' needs more time to get deployed
* Test publish_location is now skipped and likely will be removed. This is because after the recent changes in the EVE/wwan microservice, injecting a fake GPS location data is no longer possible.
* Modified EVE-upgrade tests to use newer base EVE version (> 12.0). This is because the policy_version of fscrypt was bumped from 1 to 2 and this change is not backward compatible. Meaning we cannot downgrade from 12.X to 11.Y or older.

Signed-off-by: Milan Lenco <milan@zededa.com>

milan-zededa · 2024-06-20T13:55:13Z

> Forgot the sticker :D
>
> And also I see that LPC LOC tests are failing, right?
>
> Edit: some of the Smoke tests are still failing, and they fail on rebooting. I wonder if that has to do with the Buildjet runners as well.

Yes, all of those were due to kernel crashes that we are occasionally getting.

milan-zededa · 2024-06-20T16:12:17Z

Runners are without Internet connectivity, will try again tomorrow...

milan-zededa · 2024-06-21T07:32:25Z

Kernel crashes are quite frequent.
I wonder if the "noisy neighbor" (quoting the guy from Buildjet) is actually us and these problems are more frequent when we run many eden workflows at the same time (likely scheduled on the same host(s)).

uncleDecart · 2024-06-21T10:07:16Z

Okay, anyway, I'm merging this PR and the telemetry one. Let's try to get as much info as possible.

uncleDecart · 2024-06-21T10:08:12Z

Let's try to run it on a local runner, collect statistics from it, and compare them.
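
One simple way to do the comparison suggested above is to reduce each workflow run to a (runner label, passed) pair and compute a failure rate per runner type. The sketch below is only an illustration: the record format and the `failure_rates` helper are hypothetical and do not come from eden or from the telemetry PR.

```python
from collections import defaultdict


def failure_rates(runs):
    """Map each runner label to the fraction of its runs that failed.

    `runs` is an iterable of (runner_label, passed) pairs, one per
    workflow run. This record format is an assumption for illustration.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for runner, passed in runs:
        totals[runner] += 1
        if not passed:
            failures[runner] += 1
    # Every label in `totals` has at least one run, so no division by zero.
    return {runner: failures[runner] / totals[runner] for runner in totals}
```

Feeding it one record per run from the Buildjet and the local runners would give two rates to compare. A fair comparison also needs enough runs on each side to tell a real difference from noise, since the crashes are intermittent.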

milan-zededa requested a review from giggsoff June 20, 2024 08:55

milan-zededa requested a review from uncleDecart as a code owner June 20, 2024 08:55

This was referenced Jun 20, 2024

Use latest EVE LTS version for testing upgrades #969

Closed

Testing my eden test patches lf-edge/eve#3988

Closed

uncleDecart approved these changes Jun 20, 2024

milan-zededa force-pushed the test-fixing branch from f6b72b5 to 308accf Compare June 20, 2024 13:42

uncleDecart merged commit 6e38968 into lf-edge:master Jun 21, 2024
17 of 19 checks passed

milan-zededa mentioned this pull request Jun 25, 2024

Manually assign uid/gid to username and groupname in dom0 lf-edge/eve#3989

Merged
