Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSN and ids Serialize impls force unnecessary copy-paste #3511

Closed
koivunej opened this issue Feb 1, 2023 · 5 comments · Fixed by #5335
Closed

LSN and ids Serialize impls force unnecessary copy-paste #3511

koivunej opened this issue Feb 1, 2023 · 5 comments · Fixed by #5335
Labels
a/tech_debt Area: related to tech debt m/good_first_issue Moment: when doing your first Neon contributions

Comments

@koivunej
Copy link
Member

koivunej commented Feb 1, 2023

Repeated:

#[serde_with::serde_as]
#[derive(serde::Serialize)]
struct Foo {
  #[serde_as(as = "DisplayFromStr")]
  lsn_start: Lsn,
}

This could be just:

#[deride(serde::Serialize)]
struct Foo {
  lsn_start: Lsn,
}

If we'd query the serializer for https://docs.rs/serde/latest/serde/trait.Serializer.html#method.is_human_readable in the serde::Serialize impl for LSN, id and then decide to use stringified (for json) or u64 (for bincode).

This has been a bigger issue with ids by default serializing to array of bytes.

Deserialize impls should handle strings and u64 or string and array of bytes (haven't checked if they already do).

(I haven't yet implemented this so there might be some issues.)

@koivunej koivunej added the a/tech_debt Area: related to tech debt label Feb 1, 2023
@SomeoneToIgnore
Copy link
Contributor

Deserialize impls should handle strings and u64 or string and array of bytes (haven't checked if they already do).

AFAIK, this was the issue last time I've tried to implement this: I failed to distinguish "bytes of a json string" from "bytes of a bincode serialization", causing obscure bugs in tests.

@koivunej
Copy link
Member Author

koivunej commented Sep 6, 2023

There is #[serde(with = "hex")] as well in safekeeper code, see: #5163 (comment)

@koivunej koivunej added the m/good_first_issue Moment: when doing your first Neon contributions label Sep 13, 2023
@koivunej
Copy link
Member Author

koivunej commented Sep 13, 2023

Marking this as good_first_issue.

First thing which needs to be done here is to add image tests for both bincode OR serde-json serialized types which have LSN, TenantId, OR TimelineId within them. Tests should be something like:

#[test]
fn ensure_type_roundtrips() {
  let val = /* creation omitted */;
  
  let expected = b"....";  // for serde-json
  let expected = [...] or hex; // for bincode, whichever you find more aesthetic
  
  assert_eq!(&serialize(&val), &expected);
  assert_eq!(deserialize(&expected).unwrap(), val);
}

After that in the Serialize implementation my plan was to use https://docs.rs/serde/latest/serde/trait.Serializer.html#method.is_human_readable to query which kind of serializer it is, and in the Deserialize implementation, support both kind of values with the ability to drop one for the other (bincode wants a byte array for TenantId, json wants a hex string -- might not be great to support deserializing both, but might be).

I haven't yet implemented this so there might be some issues, but after adding the tests and making the modifications to the serialize and deserialize impls, it would be safe to start removing the copypaste, and tests should tell us if the idea was good. Suggest starting out with one type of each serialization.

Lsn within bincode is most likely just a u64. Within serde-json it is {upper_u32_hex}/{lower_u32_hex}.

@koivunej
Copy link
Member Author

Alternatively a new test type with all (Lsn, TenantId, TimelineId) could be created as two variants, one for bincode and other for serde_json. Then tests could be created for that type and the different formats roundtripping.

However, it would still be better to find the two examples and show that the really used serialization format does not change. We only have "image based" testing for index_part.json, which is not much.

@duguorong009
Copy link
Contributor

I would like to take this issue!

@koivunej
I created the draft PR with Lsn serde implementation.
Can you take a look & give me comments?
Thx!

koivunej added a commit that referenced this issue Nov 6, 2023
…melineId` ...) (#5335)

Improve the serde impl for several types (`Lsn`, `TenantId`,
`TimelineId`) by making them sensitive to
`Serializer::is_human_readadable` (true for json, false for bincode).

Fixes #3511 by:
- Implement the custom serde for `Lsn`
- Implement the custom serde for `Id`
- Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs`
- Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in
all possible structs

Additionally some safekeeper types gained serde tests.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/tech_debt Area: related to tech debt m/good_first_issue Moment: when doing your first Neon contributions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants