
[FAB-17286] Improve Unit-Test Framework #394

Closed
wants to merge 1 commit into from

Conversation

lindluni
Contributor

@lindluni lindluni commented Dec 11, 2019

I've made several changes here to address feedback I've gotten in conversations with Jason while he was trying to do reviews, keep up with making sure CI passed, and debug CI since we've moved:

  • Introduce gotestsum: This change attempts to fix some of the shortcomings of the standard go test framework. gotestsum is a wrapper around go test using the test2json framework, and was written (and maintained) by CircleCI while they were working on mainstream support for Go. The benefits we are particularly interested in are the ability to generate JUnit reports and coverage reports in a single command (a sketch of such an invocation follows this description), and the summary of failures gotestsum prints at the end of the run (including the function, stack trace and line in the test that actually failed), unlike go test, which requires you to dig through your logs to figure out exactly where failures occurred. It also provides a clean UI of green check marks and red Xs to highlight failures.

  • Introduce JUnit reporting for Unit Tests: On merge we will now collect JUnit metrics. This will allow us to better triage failing unit tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14 and 30 days.

  • Introduce JUnit reporting for Integration Tests: On merge we will now run integration tests and collect JUnit metrics. This will allow us to better triage failing integration tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14 and 30 days. I am not particularly happy about the introduction of the ENABLE_JUNIT environment variable to collect JUnit reports in Ginkgo, but Ginkgo's only mechanism for collecting JUnit reports is adding the reporters in code; there is no command line flag. Adding the ENABLE_JUNIT flag prevents us from polluting contributors' local repos every time someone runs ginkgo. I'm happy to take suggestions if anyone has any.

  • Limit the scope of the run-unit-test script: Historically we had paths for benchmarking and other things which have since been removed from the script. Since we don't use any of this historical functionality anywhere, I've slimmed the script down and simplified the logic around conditionally skipping tests (e.g., gossip).

Signed-off-by: Brett Logan Brett.T.Logan@ibm.com
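For illustration, a minimal sketch of the kind of single-command invocation described above; the output paths and package pattern are placeholders, not the exact flags used in this PR:

```sh
# Hypothetical sketch: one gotestsum run that emits both a JUnit report and a
# Go coverage profile. Arguments after "--" are passed straight to "go test".
gotestsum --junitfile report/unit-tests.xml -- \
    -coverprofile=report/coverage.out -covermode=atomic ./...
```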

@lindluni lindluni changed the title [DRAFT] Improve Unit-Tests [DRAFT] [NOT FOR REVIEW] Improve Unit-Tests Dec 12, 2019
@lindluni lindluni changed the title [DRAFT] [NOT FOR REVIEW] Improve Unit-Tests [DRAFT] [NOT FOR REVIEW] Improve Unit-Test Framework Dec 12, 2019
@lindluni lindluni changed the title [DRAFT] [NOT FOR REVIEW] Improve Unit-Test Framework Improve Unit-Test Framework Dec 19, 2019
@lindluni lindluni marked this pull request as ready for review December 19, 2019 10:20
@lindluni lindluni changed the title Improve Unit-Test Framework [FAB-17286] Improve Unit-Test Framework Dec 19, 2019
Contributor

@sykesm sykesm left a comment


I do not support adding even more complexity to the process. The reporting and wiring seem unnecessary.

For gossip, as is typical, we are addressing the symptom instead of the cause.

I’m not sure how we are doing this again: we want less magic in the build - not more. We are heading in the wrong direction.

@lindluni
Contributor Author

I decided to take a day before replying to this. I don't see any complexity in what this adds; as a matter of fact, it reduces the complexity of running unit tests. Ask someone to tell you how we run unit tests today, and they may get back to you by the end of the week.

I'm not sure what you mean by "For gossip, as is typical, we are addressing the symptom instead of the cause"; it was at your request that we added the conditional skipping. I didn't modify anything here, only simplified the logic.

What is magic about this? Adding reporting isn't magic. I don't think anyone expects that Azure is going to solve our flake problems for us, but it can certainly help us get better at identifying them. Today we rely on people opening Jiras to report when they've seen flakes (a heavy manual process). How many of those flakes do you think go unreported? It would be nice to know that a particular flake is being hit 3 times a week versus another that is hit 200 times a week (which has actually been the case with some of our flakes recently).

Moreover, today we have multiple people who are dedicating all of their time to triaging flakes and CI failures. This can help reduce that part of their work and allow them to focus on fixing problems. You want to fix a flake? Great, go look at the flake report, see which ones have an open Jira and which are most prevalent, start from the top and work your way down. This isn't about solving the problem "magically" for us, it's about helping find and triage the problems.

I'm not sure how you would have us bring a better approach to flakes. Consider when we move to promotion pipelines: it will become even harder to detect flakes. In a multi-branch project, consider the case where commit 1 comes in and automated testing hits it, and then commits 2 and 3 come in and are queued waiting for commit 1 to determine if it should be auto-promoted. Commit 1 flakes out (and goes unreported in your no-reporting scenario). Now commits 1, 2 and 3 are all tested in the next run with no flake, and now they are all promoted to the next branch, all while we miss the fact there was ever a flake in the first place.

Help me understand what is a better way, I'm tired of people complaining that things are being done wrong, but not offering real help to fix them.

TL;DR: Automated reporting helps us make sure flakes don't go unreported, helps us triage flakes, and reduces the manpower needed to identify the most prevalent problems.

@sykesm
Contributor

sykesm commented Dec 20, 2019

  1. Reports are not needed to evaluate pass/fail.
  2. We have gone from running “go test” to having additional wrappers. That’s more moving pieces and another dependency. This is complexity.
  3. Ginkgo tests now require everyone to wire custom reporters. (Ginkgo is used in some units too.)
  4. We need to focus on improving the speed and stability of the tests across the team, not adding decorations that can be implemented in the build.
  5. I do not see how this helps with flakes; it is creating a dashboard view for tests.

If you simply wanted to trim the unit test script to remove the unnecessary bits, that would be great. That’s not all that’s in here. This PR makes structural changes to how tests are executed and bifurcates the process such that CI and developer tests do not use the same process. This is going backwards. CI builds and dev builds should be as consistent as possible.

On the gossip front, we have been fighting flakes for two years. It simply has not gotten the attention it requires from those who are most knowledgeable in the domain. We attempted to reduce the impact by skipping the tests when no transitive dependency changed. That worked well in gerrit but seems to not work so well now, since they run most of the time. (I have a hunch that it’s because of how the PR branches are being pulled down in azure, and I’m guessing the behavior is not well understood.) We also forced the gossip tests to run serially to avoid issues with port conflicts and timing. These are addressing symptoms; compromises based on resources.

On the serial/parallel front, it looks like the serial execution of gossip was dropped, which means we will also hit port conflicts from the tests.

In summary:

  • we need local test execution and CI execution to be as consistent as possible
  • we should focus on speed and stability over reporting.
  • we need to understand why gossip tests are running more frequently than they were with gerrit and jenkins

@mastersingh24
Contributor

  • we need to understand why gossip tests are running more frequently than they were with gerrit and jenkins

I'd really like to get to the bottom of this as well

@mastersingh24
Contributor

  • requires you to dig through your logs to figure out exactly where failures occurred.

This would not be such an issue if we actually learned how to do logging. We ABUSE logging ... especially debug logging

@lindluni
Contributor Author

I can't let this one drop; I am willing to iterate on this till I have nothing left. I believe in this so strongly, and it didn't come out of my own mind alone.

Danny, Dave and I sat in a room after his CI-focal week and we had a long conversation about what we can do to improve CI. Danny and I both agreed the worst part is that we spend so much time digging through failed tests to find flakes because people just don't report them. This is a HUGE struggle. You (Matt) are probably the only one who consistently reports them, and a couple of us do our best, but the vast majority go completely unreported.

Even worse, for every flake there are a dozen Jiras open, some as old as 3 years, and almost always with no up-to-date information. Moreover, sometimes just finding an open Jira is impossible (even though it exists) when I am trying to determine if this is the first time we're seeing this flake.

Having test analytics gives you a one-stop shop to solve all of these problems. You want to know if this is the first time a flake has been seen? Go check the 30-day report. You want to know if a flake in an open Jira has been seen recently? Go check the 14-day report. How do you know if a flake has been hit one time 5 days ago, or 100 times in the last 12 hours, unless you spend your time digging through failure logs, or praying people have opened Jiras (with names you can actually search on) with logs actually attached?

By adding analytics we can get out of this pattern of people having to dedicate all of their time to triaging CI and scouring every failed CI test to check for flakes. I for one know how painful this is, and until you spend enough time doing it, it's hard to understand how frustrating it is. Now we have a central place to see how we are doing and where we should focus our time, and instead of someone dedicating their 40 hours in a week to triaging CI, they can dedicate their 40 hours to fixing even one test flake.

On the gotestsum front, I understand, 100%, that adding dependencies is never taken lightly, but what it solves is profound. If we were to do JUnit with go test we would have to add a junit-conversion dependency, turn on verbose output, pipe the go test -v output into it, and still pipe the results and coverage to standard out (a sketch of what that looks like follows below). We trade one dependency (the junit converter) for another (gotestsum), but eliminate a ton of complexity in doing so.
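A hedged sketch of that multi-step alternative, assuming a converter such as go-junit-report; the file paths are placeholders and none of this is part of this PR:

```sh
# Hypothetical multi-step flow: keep go test output visible on stdout, then
# convert the verbose log into a JUnit report with a separate tool.
go test -v -coverprofile=report/coverage.out ./... 2>&1 | tee report/test.log
go-junit-report < report/test.log > report/unit-tests.xml
```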

Moreover, gotestsum gives us added benefits; in particular, it gives you an actual failure summary at the end of the run, unlike go test. I can show you go test failures that literally don't even show the failure aside from marking the package failed; there is absolutely no obvious reason for the failure.

We don't have to wire the JUnitReporter for the unit tests: gotestsum, which will be used to drive unit tests, uses the test2json framework, which ginkgo supports. While it's not 100% perfect (it does dump the slow-spec output into the report), it still collects failure information.

Before I ever considered using gotestsum, I actually had a long conversation with Jason about it. It was actually his statement that the failure report at the end of the run is a huge improvement over go test, and something he has always felt go test was missing.

There is no difference between people using gotestsum, go test, or ginkgo to drive their tests locally. gotestsum uses the exact same arguments as we used when we ran go test, so people can continue to use go test on the CLI, or, as in my case, use gotestsum; the results are the same (see the comparison sketch below).
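To illustrate the point, a hedged sketch; the flags and package pattern here are arbitrary examples, not a prescribed command:

```sh
# Same packages, same go test flags, driven either way.
go test -race -count=1 ./...
gotestsum -- -race -count=1 ./...
```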

I get it, and I understand your point of view, but if we want to get past people having to dedicate their lives to triaging CI, we have to do something that gives us automated insight into our test flakes and helps improve our ability to identify them and triage them when we are looking at open Jiras.

I've modified the code to remove ENABLE_JUNIT. I'm not trying to reinvent the wheel; there is a reason test reporting exists. Having spent the last year 100% dedicated to CI, I believe with all my being that this is the right thing and that it will improve our ability to identify and fix flakes. The worst that happens is we decide to revert the change.

- Introduce `gotestsum`: This change attempts to fix some of the shortcomings
of the standard `go test` framework. `gotestsum` is a wrapper around `go test`
using the test2json framework, and was written by CircleCI while they were
working on mainstream support for Go. The benefits we are particularly interested
in are the ability to generate JUnit reports and coverage reports in a single
command. `gotestsum` provides a summary of failures at the end of the run
(including the function, stacktrace and line in the test that actually failed),
unlike `go test`, which requires you to dig through your logs to figure out
exactly where failures occurred. It also provides a clean UI of green check
marks and red Xs to highlight failures.

- Introduce JUnit reporting for Unit Tests: On merge we will now collect JUnit metrics.
This will allow us to better triage failing unit tests by automatically collecting metrics
and getting an instant snapshot in Azure DevOps Analytics of historical failures
over the last 1, 7, 14 and 30 days.

- Introduce JUnit reporting for Integration Tests: On merge we will now run integration
tests and collect JUnit metrics. This will allow us to better triage failing integration
tests by automatically collecting metrics and getting an instant snapshot
in Azure DevOps Analytics of historical failures over the last 1, 7, 14 and 30 days.

- Limit the scope of the run-unit-test script: Historically we had paths for benchmarking
and other things which have since been removed from the script. Since we don't use any of
this historical functionality anywhere, I've slimmed the script down and simplified the
logic around conditionally skipping tests (e.g., **gossip**).

Signed-off-by: Brett Logan <Brett.T.Logan@ibm.com>
@lindluni lindluni requested a review from a team as a code owner January 6, 2020 13:09
@sykesm sykesm self-assigned this Jan 7, 2020
@lindluni lindluni closed this Jan 9, 2020
@lindluni lindluni deleted the junit branch April 15, 2020 02:57
justin-themedium added a commit to justin-themedium/fabric that referenced this pull request Sep 16, 2021
Fix the result message on running "./network.sh createChannel"

The result message format has been changed from the change of
test-network/scripts/createChannel.sh in hyperledger/fabric-samples
since
b690d8f Stop using deprecated outputAnchorPeersUpdate in test-network (hyperledger#394)

Signed-off-by: Justin Yang <justin.yang@themedium.io>
denyeart pushed a commit that referenced this pull request Sep 16, 2021
Fix the result message on running "./network.sh createChannel"

The result message format has been changed from the change of
test-network/scripts/createChannel.sh in hyperledger/fabric-samples
since
b690d8f Stop using deprecated outputAnchorPeersUpdate in test-network (#394)

Signed-off-by: Justin Yang <justin.yang@themedium.io>