Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver/*, wal/*: changes to snapshots and WAL logic to fix #10219 #10356

Closed
wants to merge 5 commits into from

Conversation

brk0v
Copy link

@brk0v brk0v commented Dec 27, 2018

Proposal to fix inconsistent state between WAL and saved Snapshot #10219

Saving logic was changed from the current:

  1. Save to WAL: entries and hard state (storage.Save())
  2. Save snapshot to disk (storage.WAL.SaveSnapshot())
  3. Save snapshot meta data to WAL (storage.Snapshotter.SaveSnap())
  4. Release WAL entries up to snapshot (storage.WAL.ReleaseLockTo())
  5. Apply snapshot (raftStorage.ApplySnapshot())

to

  1. Save snapshot data to snapshot file (storage.SaveSnapshot())
  2. Sequentially save to WAL: entries, snapshot meta data (if exists) and hard state (storage.SaveAll())
  3. Apply snapshot (raftStorage.ApplySnapshot())
  4. Release WAL entries (storage.Release())

Also added a snapshot check step during a node starting/restarting to prevent loading a snapshot that doesn't have a proper WAL record.

@codecov-io
Copy link

codecov-io commented Dec 27, 2018

Codecov Report

Merging #10356 into master will decrease coverage by 0.09%.
The diff coverage is 52.77%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #10356     +/-   ##
=========================================
- Coverage    71.6%   71.51%   -0.1%     
=========================================
  Files         392      392             
  Lines       36461    36522     +61     
=========================================
+ Hits        26109    26119     +10     
- Misses       8524     8576     +52     
+ Partials     1828     1827      -1
Impacted Files Coverage Δ
etcdserver/raft.go 79.39% <10%> (-1.42%) ⬇️
wal/wal.go 53.73% <100%> (+0.22%) ⬆️
pkg/mock/mockstorage/storage_recorder.go 100% <100%> (ø) ⬆️
etcdserver/server.go 73.99% <30%> (-1.19%) ⬇️
etcdserver/api/snap/snapshotter.go 60.28% <55.55%> (-0.33%) ⬇️
etcdserver/storage.go 53.52% <70.83%> (+9.52%) ⬆️
pkg/adt/interval_tree.go 80.18% <0%> (-8.41%) ⬇️
pkg/tlsutil/tlsutil.go 86.2% <0%> (-6.9%) ⬇️
clientv3/namespace/watch.go 87.87% <0%> (-6.07%) ⬇️
etcdserver/v2_server.go 80.76% <0%> (-3.85%) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de8e29e...389c1a0. Read the comment docs.

@jingyih
Copy link
Contributor

jingyih commented Feb 4, 2019

cc @jpbetz

@@ -125,6 +126,23 @@ func (s *Snapshotter) Load() (*raftpb.Snapshot, error) {
return snap, nil
}

func (s *Snapshotter) LoadIndex(i uint64) (*raftpb.Snapshot, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document function.

return nil, err

// Find a snapshot to start/restart a raft node
for i := uint64(0); ; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be extracted out into a separate function?

@stale
Copy link

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this Apr 27, 2020
@jpbetz jpbetz reopened this May 13, 2020
@stale stale bot removed the stale label May 13, 2020
@jpbetz
Copy link
Contributor

jpbetz commented May 13, 2020

I'd like to get this fixed. @brk0v any interest in following up? If not, I'm happy to shepherd it through.

@jpbetz
Copy link
Contributor

jpbetz commented May 13, 2020

@jingyih When you get some cycles, would you also review this? I'd like to get this fix (or some variant of it) merged.

@jpbetz
Copy link
Contributor

jpbetz commented May 14, 2020

Superseded by #11888 (which retains the commits from this PR with full attribution).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants