tests linearizability: trigger snapshot related failpoints #15104
Looks promising, great work! Suggestions:
I managed to get it working. Please take a look at serathius@a13adec. What is left is to integrate the snapshot failpoints into the existing tests. As snapshot failpoints can only be triggered on a follower, they cannot be enabled in a single-node cluster. You should be able to skip those failpoints if you change …
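The suggestion to skip follower-only failpoints on a single-node cluster can be sketched as a simple filter. This is a minimal sketch under assumptions: the Failpoint type, its FollowerOnly field, the Available method and the "kill" failpoint are hypothetical and not the actual etcd linearizability-test API; only the failpoint name raftBeforeFollowerSend comes from this PR.

```go
package main

import "fmt"

// Failpoint is a hypothetical, simplified model of a linearizability-test
// failpoint; the real etcd test framework types may differ.
type Failpoint struct {
	Name string
	// FollowerOnly marks failpoints (such as the snapshot-related ones)
	// that can only be triggered on a follower.
	FollowerOnly bool
}

// Available reports whether the failpoint can be triggered in a cluster of
// the given size. A single-node cluster has no followers, so follower-only
// failpoints are not available there.
func (f Failpoint) Available(clusterSize int) bool {
	if f.FollowerOnly && clusterSize < 2 {
		return false
	}
	return true
}

// selectFailpoints keeps only the failpoints usable with the given cluster size.
func selectFailpoints(all []Failpoint, clusterSize int) []Failpoint {
	var out []Failpoint
	for _, f := range all {
		if f.Available(clusterSize) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	all := []Failpoint{
		{Name: "kill"}, // hypothetical failpoint that works on any member
		{Name: "raftBeforeFollowerSend", FollowerOnly: true},
	}
	// Single-node cluster keeps 1 failpoint, 3-node cluster keeps both.
	fmt.Println(len(selectFailpoints(all, 1)), len(selectFailpoints(all, 3)))
}
```

Keeping the check on the failpoint itself means test scenarios do not need to know which failpoints depend on followers.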
Codecov Report
@@            Coverage Diff             @@
##             main   #15104      +/-   ##
==========================================
- Coverage   74.75%   74.62%   -0.14%
==========================================
  Files         415      415
  Lines       34341    34341
==========================================
- Hits        25673    25628      -45
- Misses       7038     7075      +37
- Partials     1630     1638       +8
Ping @lavacat, are you planning to finish this?
@serathius Yes, sorry, I got sidetracked by another issue.
Ping, this change is blocking #15102.
@serathius I've enabled debug logs for the proxy and it reports correct info.
But the blackholed process still receives non-empty entries in …
Please let me know if you need any help.
@serathius If you have time, please take a look. I'll continue investigating as well. It seems to me that after blackholing traffic, the peer should get deactivated and the logs should show …
I would guess the reason is that blackholing just drops packets but doesn't interrupt the connection.
OK, I think the explanation of this issue is that cutting traffic between … Here is an example Procfile to reproduce this. etcd1 has a different …
Each peer X registers a writer with the other peers, so they can write messages addressed to X. That's my understanding based on debugging. I don't think this is a bug, but there isn't much documentation about the design. I'm still not sure why blackholing starts working after some time. But to fix the test, we need to blackhole pipeline traffic somehow.
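Why blocking only one kind of peer connection may not be enough can be sketched with a simplified channel-selection function. This is an assumption-laden sketch loosely modeled on how etcd's rafthttp peer chooses between its long-lived stream and its pipeline. The MessageType constants, peerTransport type and pick method here are hypothetical simplifications, and the real selection logic has more cases. Snapshots go over the pipeline, and other messages fall back to the pipeline when no stream writer is available.

```go
package main

import "fmt"

// MessageType is a hypothetical, trimmed-down stand-in for raftpb.MessageType.
type MessageType int

const (
	MsgApp MessageType = iota
	MsgHeartbeat
	MsgSnap
)

// peerTransport is a hypothetical simplification of the per-peer writers.
// streamAvailable reports whether the long-lived stream to the remote peer
// is currently usable.
type peerTransport struct {
	streamAvailable bool
}

// pick returns the name of the channel a message would be sent on.
// Snapshots always use the pipeline because they can be large and would
// block the stream; other messages prefer the stream and fall back to the
// pipeline when the stream is unavailable.
func (p peerTransport) pick(t MessageType) string {
	switch {
	case t == MsgSnap:
		return "pipeline"
	case p.streamAvailable:
		return "stream"
	default:
		return "pipeline"
	}
}

func main() {
	p := peerTransport{streamAvailable: true}
	fmt.Println(p.pick(MsgApp), p.pick(MsgSnap))
}
```

Under this model, a blackhole that only affects stream traffic would still let snapshots through, which is consistent with the need to blackhole pipeline traffic as well.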
Great job on the investigation! It seems like an issue with using a proxy for network blackholing; however, we can still work around it. Let's focus on getting the PR merged, leave 1000 for now, and create a follow-up issue.
Force-pushed from f877cd3 to d983b86.
@serathius I made a small improvement: instead of waiting for 1000 revisions, the test now checks revLeader - revBlackholedMember. In terms of stability, sometimes I get …
That's great. The error you linked is usually connected to incorrect history merging or patching. Still, it's weird, as I thought I had addressed all of those.
Please let me know when the PR is ready for review.
@serathius I've run into another … Otherwise, the PR is ready. Maybe we can merge it but disable this scenario while I debug the problem with …
Sounds good. Please comment the scenario out and leave a TODO to fix and restore it.
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
Force-pushed from d983b86 to 91b0569.
@serathius I think the functional test failure is a flake; I documented it in #14826 (comment). Can you please trigger a test rerun?
@lavacat I looked at the file you sent; it looks like a data inconsistency. I will investigate it further.
Fixes #14726, except raftBeforeFollowerSend.
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.