
[FAB-17286] Improve Unit-Test Framework #394

Closed
wants to merge 1 commit into from

Conversation

lindluni
Contributor

@lindluni lindluni commented Dec 11, 2019

I've made several changes here to address feedback I've gotten in conversations with Jason while he was trying to do reviews, keep up with making sure CI passed, and debug CI since we've moved:

  • Introduce gotestsum: This change attempts to fix some of the shortcomings of the standard go test framework. gotestsum is a wrapper around go test using the test2json framework, and was written (and maintained) by CircleCI while they were working on mainstream support for Go. The benefits we are particularly interested in are the ability to generate JUnit reports and coverage reports in a single command (a sketch of such an invocation follows this description), and the summary of failures gotestsum prints at the end of the run (including the function, stack trace and line in the test that actually failed), unlike go test, which requires you to dig through your logs to figure out exactly where failures occurred. It also provides a clean UI of green check marks and red Xs to highlight failures.

  • Introduce JUnit reporting for Unit Tests: On merge we will now collect JUnit metrics. This will allow us to better triage failing unit tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14 and 30 days.

  • Introduce JUnit reporting for Integration Tests: On merge we will now run integration tests and collect JUnit metrics. This will allow us to better triage failing integration tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14 and 30 days. I am not particularly happy about the introduction of the ENABLE_JUNIT environment variable to collect JUnit reports in Ginkgo, but Ginkgo's only mechanism for collecting JUnit reports is adding the reporters in code; there is no command line flag. Adding the ENABLE_JUNIT flag prevents us from polluting contributors' local repos every time someone runs ginkgo. I'm happy to take suggestions if anyone has any.

  • Limit the scope of the run-unit-test script: Historically we had paths for benchmarking and other things which have since been removed from the script. Since we don't use any of this historical functionality anywhere, I've slimmed the script down and simplified the logic around conditionally skipping tests (e.g., gossip).

Signed-off-by: Brett Logan Brett.T.Logan@ibm.com
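For illustration, a minimal sketch of the kind of single-command invocation described above; the output paths and package pattern are placeholders, not the exact flags used in this PR:

```sh
# Hypothetical sketch: one gotestsum run that emits both a JUnit report and a
# Go coverage profile. Arguments after "--" are passed straight to "go test".
gotestsum --junitfile report/unit-tests.xml -- \
    -coverprofile=report/coverage.out -covermode=atomic ./...
```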

@lindluni lindluni changed the title [DRAFT] Improve Unit-Tests [DRAFT] [NOT FOR REVIEW] Improve Unit-Tests Dec 12, 2019
@lindluni lindluni changed the title [DRAFT] [NOT FOR REVIEW] Improve Unit-Tests [DRAFT] [NOT FOR REVIEW] Improve Unit-Test Framework Dec 12, 2019
@lindluni lindluni changed the title [DRAFT] [NOT FOR REVIEW] Improve Unit-Test Framework Improve Unit-Test Framework Dec 19, 2019
@lindluni lindluni marked this pull request as ready for review December 19, 2019 10:20
@lindluni lindluni changed the title Improve Unit-Test Framework [FAB-17286] Improve Unit-Test Framework Dec 19, 2019
Contributor

@sykesm sykesm left a comment


I do not support adding even more complexity to the process. The reporting and wiring seem unnecessary.

For gossip, as is typical, we are addressing the symptom instead of the cause.

I’m not sure how we are doing this again: we want less magic in the build - not more. We are heading in the wrong direction.

@lindluni
Contributor Author

I decided to take a day before replying to this. I don't see any complexity in what this adds; as a matter of fact, it reduces the complexity of running unit tests. Ask someone to tell you how we run unit tests today, and they may get back to you by the end of the week.

I'm not sure what you mean by "For gossip, as is typical, we are addressing the symptom instead of the cause"; it was at your request that we added the conditional skipping. I didn't modify anything here, only simplified the logic.

What is magic about this? Adding reporting isn't magic. I don't think anyone expects that Azure is going to solve our flake problems for us, but it can certainly help us get better at identifying them. Today we rely on people opening Jiras to report when they've seen flakes (a heavy manual process). How many of those flakes do you think go unreported? It would be nice to know that a particular flake is being hit 3 times a week versus another that is hit 200 times a week (which has actually been the case with some of our flakes recently).

Moreover, today we have multiple people who are dedicating all of their time to triaging flakes and CI failures. This can help reduce that part of their work and allow them to focus on fixing problems. You want to fix a flake? Great, go look at the flake report, see which ones have an open Jira and which are most prevalent, start from the top and work your way down. This isn't about solving the problem "magically" for us, it's about helping find and triage the problems.

I'm not sure how you would have us bring a better approach to flakes. Consider when we move to promotion pipelines: it will become even harder to detect flakes. In a multi-branch project, consider the case where commit 1 comes in and automated testing hits it, and then commits 2 and 3 come in and are queued waiting for commit 1 to determine if it should be auto-promoted. Commit 1 flakes out (and goes unreported in your no-reporting scenario). Now commits 1, 2 and 3 are all tested in the next run with no flake, and now they are all promoted to the next branch, all while we miss the fact there was ever a flake in the first place.

Help me understand what is a better way, I'm tired of people complaining that things are being done wrong, but not offering real help to fix them.

TL;DR: Automated reporting helps us make sure flakes don't go unreported, helps us triage flakes, and reduces the manpower needed to identify the most prevalent problems.

@sykesm
Contributor

sykesm commented Dec 20, 2019

  1. Reports are not needed to evaluate pass/fail.
  2. We have gone from running “go test” to having additional wrappers. That’s more moving pieces and another dependency. This is complexity.
  3. Ginkgo tests now require everyone to wire custom reporters. (Ginkgo is used in some units too.)
  4. We need to focus on improving the speed and stability of the tests across the team, not adding decorations that can be implemented in the build.
  5. I do not see how this helps with flakes; it is creating a dashboard view for tests.

If you simply wanted to trim the unit test script to remove the unnecessary bits, that would be great. That’s not all that’s in here. This PR makes structural changes to how tests are executed and bifurcates the process such that CI and developer tests do not use the same process. This is going backwards. CI builds and dev builds should be as consistent as possible.

On the gossip front, we have been fighting flakes for two years. It simply has not gotten the attention it requires from those who are most knowledgeable in the domain. We attempted to reduce the impact by skipping the tests when no transitive dependency changed. That worked well in gerrit but seems to not work so well now, since they run most of the time. (I have a hunch that it’s because of how the PR branches are being pulled down in azure, and I’m guessing the behavior is not well understood.) We also forced the gossip tests to run serially to avoid issues with port conflicts and timing. These are addressing symptoms; compromises based on resources.

On the serial/parallel front, it looks like the serial execution of gossip was dropped, which means we will also hit port conflicts from the tests.

In summary:

  • we need local test execution and CI execution to be as consistent as possible
  • we should focus on speed and stability over reporting.
  • we need to understand why gossip tests are running more frequently than they were with gerrit and jenkins

@mastersingh24
Contributor

  • we need to understand why gossip tests are running more frequently than they were with gerrit and jenkins

I'd really like to get to the bottom of this as well

@mastersingh24
Contributor

  • requires you to dig through your logs to figure out exactly where failures occurred.

This would not be such an issue if we actually learned how to do logging. We ABUSE logging ... especially debug logging

@lindluni
Contributor Author

I can't let this one drop; I am willing to iterate on this till I have nothing left. I believe in this so strongly, and it didn't come out of my own mind alone.

Danny, Dave and I sat in a room after his CI-focal week and we had a long conversation about what we can do to improve CI. Danny and I both agreed the worst part is that we spend so much time digging through failed tests to find flakes because people just don't report them. This is a HUGE struggle. You (Matt) are probably the only one who consistently reports them, and a couple of us do our best, but the vast majority go completely unreported.

Even worse, for every flake there are a dozen Jiras open, some as old as 3 years, and almost always with no up-to-date information. Moreover, sometimes just finding an open Jira is impossible (even though it exists) when I am trying to determine if this is the first time we're seeing this flake.

Having test analytics gives you a one-stop shop to solve all of these problems. You want to know if this is the first time a flake has been seen? Go check the 30-day report. You want to know if a flake in an open Jira has been seen recently? Go check the 14-day report. How do you know if a flake has been hit one time 5 days ago, or 100 times in the last 12 hours, unless you spend your time digging through failure logs, or praying people have opened Jiras (with names you can actually search on) with logs actually attached?

By adding analytics we can get out of this pattern of people having to dedicate all of their time to triaging CI and scouring every failed CI test to check for flakes. I for one know how painful this is, and until you spend enough time doing it, it's hard to understand how frustrating it is. Now we have a central place to see how we are doing and where we should focus our time, and instead of someone dedicating their 40 hours in a week to triaging CI, they can dedicate their 40 hours to fixing even one test flake.

On the gotestsum front, I understand, 100%, that adding dependencies is never taken lightly, but what it solves is profound. If we were to do JUnit with go test we would have to add a junit-conversion dependency, turn on verbose output, pipe the go test -v output into it, and still pipe the results and coverage to standard out (a sketch of what that looks like follows below). We trade one dependency (the junit converter) for another (gotestsum), but eliminate a ton of complexity in doing so.
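A hedged sketch of that multi-step alternative, assuming a converter such as go-junit-report; the file paths are placeholders and none of this is part of this PR:

```sh
# Hypothetical multi-step flow: keep go test output visible on stdout, then
# convert the verbose log into a JUnit report with a separate tool.
go test -v -coverprofile=report/coverage.out ./... 2>&1 | tee report/test.log
go-junit-report < report/test.log > report/unit-tests.xml
```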

Moreover, gotestsum gives us added benefits; in particular, it gives you an actual failure summary at the end of the run, unlike go test. I can show you go test failures that literally don't even show the failure aside from marking the package failed; there is absolutely no obvious reason for the failure.

We don't have to wire the JUnitReporter for the unit tests: gotestsum, which will be used to drive unit tests, uses the test2json framework, which ginkgo supports. While it's not 100% perfect (it does dump the slow-spec output into the report), it still collects failure information.

Before I ever considered using gotestsum, I actually had a long conversation with Jason about it. It was actually his statement that the failure report at the end of the run is a huge improvement over go test, and something he has always felt go test was missing.

There is no difference between people using gotestsum, go test, or ginkgo to drive their tests locally. gotestsum uses the exact same arguments as we used when we ran go test, so people can continue to use go test on the CLI, or, as in my case, use gotestsum; the results are the same (see the comparison sketch below).
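To illustrate the point, a hedged sketch; the flags and package pattern here are arbitrary examples, not a prescribed command:

```sh
# Same packages, same go test flags, driven either way.
go test -race -count=1 ./...
gotestsum -- -race -count=1 ./...
```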

I get it, and I understand your point of view, but if we want to get past people having to dedicate their lives to triaging CI, we have to do something that gives us automated insight into our test flakes and helps improve our ability to identify them and triage them when we are looking at open Jiras.

I've modified the code to remove ENABLE_JUNIT. I'm not trying to reinvent the wheel; there is a reason test reporting exists. Having spent the last year 100% dedicated to CI, I believe with all my being that this is the right thing and that it will improve our ability to identify and fix flakes. The worst that happens is we decide to revert the change.

- Introduce `gotestsum`: This change attempts to fix some of the shortcomings
of the standard `go test` framework. `gotestsum` is a wrapper around `go test`
using the test2json framework, and was written by CircleCI while they were
working on mainstream support for Go. The benefits we are particularly interested
in are the ability to generate JUnit reports and coverage reports in a single
command. `gotestsum` provides a summary of failures at the end of the run
(including the function, stacktrace and line in the test that actually failed),
unlike `go test`, which requires you to dig through your logs to figure out
exactly where failures occurred. It also provides a clean UI of green check
marks and red Xs to highlight failures.

- Introduce JUnit reporting for Unit Tests: On merge we will now collect JUnit metrics.
This will allow us to better triage failing unit tests by automatically collecting metrics
and getting an instant snapshot in Azure DevOps Analytics of historical failures
over the last 1, 7, 14 and 30 days.

- Introduce JUnit reporting for Integration Tests: On merge we will now run integration
tests and collect JUnit metrics. This will allow us to better triage failing integration
tests by automatically collecting metrics and getting an instant snapshot
in Azure DevOps Analytics of historical failures over the last 1, 7, 14 and 30 days.

- Limit the scope of the run-unit-test script: Historically we had paths for benchmarking
and other things which have since been removed from the script. Since we don't use any of
this historical functionality anywhere, I've slimmed the script down and simplified the
logic around conditionally skipping tests (e.g., **gossip**).

Signed-off-by: Brett Logan <Brett.T.Logan@ibm.com>
@lindluni lindluni requested a review from a team as a code owner January 6, 2020 13:09
@sykesm sykesm self-assigned this Jan 7, 2020
@lindluni lindluni closed this Jan 9, 2020
@lindluni lindluni deleted the junit branch April 15, 2020 02:57
justin-themedium added a commit to justin-themedium/fabric that referenced this pull request Sep 16, 2021
Fix the result message on running "./network.sh createChannel"

The result message format has been changed from the change of
test-network/scripts/createChannel.sh in hyperledger/fabric-samples
since
b690d8f Stop using deprecated outputAnchorPeersUpdate in test-network (hyperledger#394)

Signed-off-by: Justin Yang <justin.yang@themedium.io>
denyeart pushed a commit that referenced this pull request Sep 16, 2021
Fix the result message on running "./network.sh createChannel"

The result message format has been changed from the change of
test-network/scripts/createChannel.sh in hyperledger/fabric-samples
since
b690d8f Stop using deprecated outputAnchorPeersUpdate in test-network (#394)

Signed-off-by: Justin Yang <justin.yang@themedium.io>