pageserver: deletion queue & generation validation for deletions #5207
2520 tests run: 2403 passed, 0 failed, 117 skipped (full report)

Flaky tests (5): Postgres 16, Postgres 14

Code coverage (full report)

The comment gets automatically updated with the latest test results: 8af4131 at 2023-09-26T15:05:57.985Z
…#5231)

## Problem

Currently our testing environment only supports running a single pageserver at a time. This is insufficient for testing failover and migrations.

- Dependency of writing tests for #5207

## Summary of changes

- `neon_local` and `neon_fixture` now handle multiple pageservers
- This is a breaking change to the `.neon/config` format: any local environments will need recreating
- Existing tests continue to work unchanged:
  - The default number of pageservers is 1
  - `NeonEnv.pageserver` is now a helper property that retrieves the first pageserver if there is only one, else throws.
- Pageserver data directories are now at `.neon/pageserver_{n}` where n is 1,2,3...
- Compatibility tests get some special casing to migrate neon_local configs: these are not meant to be backward/forward compatible, but they were treated that way by the test.
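To illustrate the helper-property behaviour described in that commit message, here is a minimal sketch. It is not the actual `neon_fixture` code: `EnvSketch` and `PageserverSketch` are hypothetical names, and the only details taken from the commit message are the default of one pageserver, the "first pageserver if there is only one, else throws" rule, and the `.neon/pageserver_{n}` directory layout.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class PageserverSketch:
    """Hypothetical stand-in for a pageserver fixture object."""
    id: int
    repo_dir: Path

    @property
    def workdir(self) -> Path:
        # Data directories are numbered from 1: pageserver_1, pageserver_2, ...
        return self.repo_dir / f"pageserver_{self.id}"


@dataclass
class EnvSketch:
    """Hypothetical stand-in for a test environment with N pageservers."""
    repo_dir: Path
    num_pageservers: int = 1  # default keeps single-pageserver tests unchanged
    pageservers: List[PageserverSketch] = field(init=False)

    def __post_init__(self) -> None:
        self.pageservers = [
            PageserverSketch(n, self.repo_dir)
            for n in range(1, self.num_pageservers + 1)
        ]

    @property
    def pageserver(self) -> PageserverSketch:
        # Convenience accessor: only valid when exactly one pageserver exists.
        if len(self.pageservers) != 1:
            raise RuntimeError(
                "env.pageserver is ambiguous with multiple pageservers; "
                "index env.pageservers instead"
            )
        return self.pageservers[0]
```

With this shape, a test written against a single pageserver keeps using `env.pageserver`, while a failover or migration test would construct the environment with `num_pageservers=2` and index `env.pageservers` explicitly.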
This is just for testing. Eventually we'll remove this after everything is upgraded.
This controls the lifetime of the MockDeletionQueue.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
This is a big chunk, but let's handle the rest of the changes, if any, after merging.
consistent LSN updates and deletions.
Re-reviewed validation & recovery.
I'm having trouble keeping track of error handling correctness in validator.rs when we write to disk.
At the offsite, we agreed that local IO error should crash the pageserver.
Started Slack thread about that.
Other than that, only a few small remarks.
We agreed in Slack that there will be a follow-up PR that will make the pageserver process abort/panic/exit if the deletion queue encounters any local filesystem IO errors.
With that, I'm OK with merging this PR.
(I still dislike the over-use of the term "flush", but, let's take that to a follow-up if any.)
This hit a test failure that didn't have an existing ticket: #5385. I looked at the test and I don't think it's touching anything that changed in this PR, and the test also looks like it might be racy by not waiting for pageserver ingest after it writes to the database. Will follow up separately.
(re-ran failed)
Problem
Pageservers must not delete objects or advertise updates to remote_consistent_lsn without checking that they hold the latest generation for the tenant in question (see the RFC)
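The rule above can be sketched as a queue that holds deletions until the generation they were issued under has been checked. This is an illustration only, not the pageserver's implementation (which is written in Rust): `DeletionQueueSketch`, its fields, and the `latest_generation` mapping standing in for the control plane are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DeletionQueueSketch:
    # Hypothetical stand-in for the control plane's view:
    # tenant id -> latest generation issued for that tenant.
    latest_generation: Dict[str, int]
    # Queued (tenant_id, generation, object_key) entries awaiting validation.
    pending: List[Tuple[str, int, str]] = field(default_factory=list)
    executed: List[str] = field(default_factory=list)
    dropped: List[str] = field(default_factory=list)

    def push(self, tenant_id: str, generation: int, key: str) -> None:
        """Queue a deletion, tagged with the generation the caller holds."""
        self.pending.append((tenant_id, generation, key))

    def validate_and_execute(self) -> None:
        """Execute only deletions whose generation is still the latest."""
        for tenant_id, generation, key in self.pending:
            if self.latest_generation.get(tenant_id) == generation:
                self.executed.append(key)  # safe: we still own the tenant
            else:
                self.dropped.append(key)   # stale generation: do not delete
        self.pending.clear()
```

A deletion queued by a pageserver holding a stale generation ends up in `dropped` rather than `executed`, so a pageserver that has been superseded cannot remove objects the newer attachment may still reference.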
In this PR:
- `RemoteTimelineClient` is modified to send deletions through the deletion queue.
- The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced with a mechanism that maintains a "projected" LSN (equivalent to the previous property), and a "visible" LSN (which is the one that we may share with safekeepers).
- Unless `control_plane_api` is set, all deletions skip generation validation.
- `test_pageserver_generations.py`
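The projected/visible split can be pictured with a small sketch. The description only states that the two values exist and which one may be shared with safekeepers; the assumption here, labeled as such, is that the visible value advances only after generation validation succeeds and never overtakes the projected value. `ConsistentLsnSketch` and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConsistentLsnSketch:
    projected: int = 0  # what has been uploaded (the previous property)
    visible: int = 0    # what may be shared with safekeepers

    def on_upload(self, lsn: int) -> None:
        """An upload completed: advance the projected LSN only."""
        self.projected = max(self.projected, lsn)

    def on_validated(self, lsn: int) -> None:
        """Assumed hook: generation validation passed for updates up to lsn."""
        # Never expose more than has actually been uploaded, never go backwards.
        self.visible = max(self.visible, min(lsn, self.projected))
```

Under this assumption, an upload alone leaves `visible` unchanged, which is what keeps a pageserver with a stale generation from advertising progress to safekeepers.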
Once this lands, if a pageserver is configured with the `control_plane_api` configuration added in #5163, it becomes safe to attach a tenant to multiple pageservers concurrently.

Summary of changes
Checklist before requesting a review
Checklist before merging