[FAB-17286] Improve Unit-Test Framework #394
Conversation
I do not support adding even more complexity to the process. The reporting and wiring seem unnecessary.
For gossip, as is typical, we are addressing the symptom instead of the cause.
I'm not sure how we are doing this again: we want less magic in the build, not more. We are heading in the wrong direction.
I decided to take a day before replying to this. I don't see any complexity in what this adds; as a matter of fact, it reduces the complexity of running unit tests. Ask someone to tell you how we run unit tests today, and they may get back to you by the end of the week. I'm not sure what you mean by "For gossip, as is typical, we are addressing the symptom instead of the cause"; it was at your request that we added the conditional skipping. I didn't modify anything here, only simplified the logic.

What is magic about this? Adding reporting isn't magic. I don't think anyone expects that Azure is going to solve our flake problems for us, but it can certainly help us get better at identifying them. Today we rely on people opening Jiras to report when they've seen flakes (a heavy manual process). How many of those flakes do you think go unreported? It would be nice to know that a particular flake is being hit 3 times a week, versus another that is hit 200 times a week (which has actually been the case with some of our flakes recently). Moreover, today we have multiple people who are dedicating all of their time to triaging flakes and CI failures; this can help reduce that part of their work and allow them to focus on fixing problems. You want to fix a flake? Great: go look at the flake report, see which ones have an open Jira and which are most prevalent, start from the top, and work your way down. This isn't about solving the problem "magically" for us, it's about helping find and triage the problems.

I'm not sure how you would have us bring a better approach to flakes. Consider when we move to promotion pipelines: it will become even harder to detect flakes. In a multi-branch project, consider the case where commit 1 comes in and automated testing hits it, and then commits 2 and 3 come in and are queued waiting for commit 1 to determine if it should be auto-promoted. Commit 1 flakes out (and goes unreported in your non-reported scenario). Now commits 1, 2, and 3 are all tested in the next run with no flake, and they are all promoted to the next branch, all while we miss the fact there was ever a flake in the first place. Help me understand what a better way is; I'm tired of people complaining that things are being done wrong, but not offering real help to fix them.

TL;DR: Automated reporting helps us make sure flakes don't go unreported, helps us triage flakes, and reduces the manpower needed to identify the most prevalent problems.
If you simply wanted to fix the unit test script to remove the unnecessary bits, that would be great. That's not all that's in here. This PR makes structural changes to how tests are executed and bifurcates the process such that CI and developer tests do not use the same process. This is going backwards: CI builds and dev builds should be as consistent as possible.

On the gossip front, we have been fighting flakes for two years. It simply has not gotten the attention it requires by those who are most knowledgeable in the domain. We attempted to reduce the impact by skipping them when no transitive dependency changed. That worked well in Gerrit but seems to not work so well now, since they run most of the time. (I have a hunch that it's because of how the PR branches are being pulled down in Azure, and I'm guessing the behavior is not well understood.) We also forced the gossip tests to run serially to avoid issues with port conflicts and timing. These are addressing symptoms; compromises based on resources. On the serial/parallel front, it looks like the serial execution of gossip was dropped, which means we will also hit port conflicts from the tests. In summary:
I'd really like to get to the bottom of this as well.
This would not be such an issue if we actually learned how to do logging. We ABUSE logging ... especially debug logging.
I can't drop this. I am willing to iterate on this until I have nothing left. I believe in this so strongly, and it didn't come out of my own mind: Danny, Dave, and I sat in a room after his CI-focal week and we had a long conversation about what we can do to improve CI. Danny and I both agreed the worst part is that we spend so much time digging through failed tests to find flakes, because people just don't report them. This is a HUGE struggle. You (Matt) are probably the only one who consistently reports them, and a couple of us do our best, but the vast majority go completely unreported. Even worse, for every flake there are a dozen Jiras open, some as old as 3 years, and almost always with no up-to-date information. Moreover, sometimes just finding an open Jira is impossible (even though it exists) when I am trying to determine if this is the first time a flake has been seen.

Having test analytics gives you a one-stop shop to solve all of these problems. You want to know if this is the first time a flake has been seen? Go check the 30-day report. You want to know if a flake in an open Jira has been seen recently? Go check the 14-day report. How do you know if a flake has been hit one time 5 days ago, or 100 times in the last 12 hours, unless you spend your time digging through failure logs, or praying people have opened Jiras (with names you can actually search on) with logs actually attached?

By adding analytics we can get out of this pattern of people having to dedicate all of their time to triaging CI and scouring every failed CI test to check for flakes. I for one know how painful this is, and until you spend enough time doing it, it's hard to understand how frustrating it is. Now we have a central place to see how we are doing and where we should focus our time, and instead of someone dedicating their 40 hours in a week to triaging CI, they can dedicate their 40 hours to fixing even one test flake.
On the "Moreover": we don't have to wire the JUnitReporter for the unit tests. Before I ever considered using gotestsum, I actually had a long conversation with Jason about it. It was actually his statement that the failure report at the end of it is a huge improvement over `go test` that convinced me. There is no difference between people using `go test` and `gotestsum` locally.

I get it, and I understand your point of view, but if we want to get past people having to dedicate their lives to triaging CI, we have to do something that gives us automated insight into our test flakes, helps improve our ability to identify them, and helps triage them when we are looking at open Jiras. I've modified the code to remove ENABLE_JUNIT. I'm not trying to reinvent the wheel; there is a reason test reporting exists. Having spent the last year 100% dedicated to CI, I believe with all my being that this is the right thing and will improve our ability to identify and fix flakes. The worst that happens is we decide to revert the change.
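For context on how JUnit files become Azure DevOps test analytics: publishing them is typically a single pipeline task. This is a hedged sketch, not taken from this PR; `PublishTestResults@2` and its inputs are the standard Azure Pipelines task, but the result-file path and run title are illustrative assumptions:

```yaml
# Publish JUnit results so failures feed Azure DevOps test analytics.
# The glob path and run title below are assumptions for illustration.
- task: PublishTestResults@2
  condition: always()  # publish even when tests fail, so flakes get recorded
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/report.xml'
    testRunTitle: 'Unit Tests'
```

With results published on every run, the 1/7/14/30-day failure views in Azure DevOps populate automatically, which is what makes the flake reports described above possible.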
- Introduce `gotestsum`: This change attempts to fix some of the shortcomings of the standard `go test` framework. `gotestsum` is a wrapper around `go test` using the test2json framework, and was written by CircleCI while they were working on mainstream support for Go. The benefits we are particularly interested in are the ability to generate JUnit reports and coverage reports in a single command. `gotestsum` provides a summary of failures at the end of the test run (including the function, stack trace, and line in the test that actually failed), unlike `go test`, which requires you to dig through your logs to figure out exactly where failures occurred. It also provides a clean UI of green check marks and red X's to highlight failures.
- Introduce JUnit reporting for Unit Tests: On merge we will now collect JUnit metrics. This will allow us to better triage failing unit tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14, and 30 days.
- Introduce JUnit reporting for Integration Tests: On merge we will now run integration tests and collect JUnit metrics. This will allow us to better triage failing integration tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14, and 30 days.
- Limit the scope of the run-unit-test script: Historically we had paths for benchmarking and other things which have since been removed from the script. Since we don't use any of this historical functionality anywhere, I've slimmed the script down and simplified the logic around conditionally skipping tests (i.e. **gossip**).

Signed-off-by: Brett Logan <Brett.T.Logan@ibm.com>
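The single-command JUnit-plus-coverage run described above can be sketched roughly as follows. `--junitfile`, `--format`, and passing `go test` flags after `--` are standard `gotestsum` usage, but the output paths and format value here are illustrative assumptions, not taken from this PR:

```shell
# Sketch: one gotestsum invocation producing both a JUnit report and a
# coverage profile. Output paths are illustrative, not from this PR.
mkdir -p results

if command -v gotestsum >/dev/null 2>&1; then
  # Everything after "--" is passed straight through to "go test".
  gotestsum --format short-verbose \
    --junitfile results/unit-tests.xml \
    -- -coverprofile=results/coverage.out ./...
else
  echo "gotestsum not installed; try: go install gotest.tools/gotestsum@latest"
fi
```

The JUnit file is what CI would upload for analytics; the coverage profile feeds the usual `go tool cover` workflow.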
Fix the result message on running "./network.sh createChannel". The result message format has been changed by a change to test-network/scripts/createChannel.sh in hyperledger/fabric-samples since commit b690d8f ("Stop using deprecated outputAnchorPeersUpdate in test-network", hyperledger#394). Signed-off-by: Justin Yang <justin.yang@themedium.io>
There were several changes I've made here to address feedback I've gotten in conversation with Jason while he was trying to do reviews, keep CI passing, and debug CI since we've moved:

- Introduce `gotestsum`: This change attempts to fix some of the shortcomings of the standard `go test` framework. `gotestsum` is a wrapper around `go test` using the test2json framework, and was written (and maintained) by CircleCI while they were working on mainstream support for Go. The benefits we are particularly interested in are the ability to generate JUnit reports and coverage reports in a single command. `gotestsum` provides a summary of failures at the end of the test run (including the function, stack trace, and line in the test that actually failed), unlike `go test`, which requires you to dig through your logs to figure out exactly where failures occurred. It also provides a clean UI of green check marks and red X's to highlight failures.
- Introduce JUnit reporting for Unit Tests: On merge we will now collect JUnit metrics. This will allow us to better triage failing unit tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14, and 30 days.
- Introduce JUnit reporting for Integration Tests: On merge we will now run integration tests and collect JUnit metrics. This will allow us to better triage failing integration tests by automatically collecting metrics and getting an instant snapshot in Azure DevOps Analytics of historical failures over the last 1, 7, 14, and 30 days. I am not particularly happy about the introduction of the ENABLE_JUNIT environment variable to collect JUnit reports in Ginkgo, but Ginkgo's only mechanism for collecting JUnit reports is adding the reporters in code; there is no command line flag. Gating on ENABLE_JUNIT prevents us from polluting contributors' local repos every time someone runs `ginkgo`. I'm happy to take suggestions if anyone has any.
- Limit the scope of the run-unit-test script: Historically we had paths for benchmarking and other things which have since been removed from the script. Since we don't use any of this historical functionality anywhere, I've slimmed the script down and simplified the logic around conditionally skipping tests (i.e. gossip).
Signed-off-by: Brett Logan <Brett.T.Logan@ibm.com>