Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid 64-bit arithmetic in LEB encoding #345

Merged
merged 3 commits into from
Jan 18, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 27 additions & 21 deletions src/leb.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
pub(crate) fn leb64(x: u64, buf: &mut [u8; 10]) -> usize {
/// LEB128-encodes a `usize` value into `buf`.
///
/// This handles 32-bit and 64-bit `usize`.
pub(crate) fn leb64(x: usize, buf: &mut [u8; 10]) -> usize {
let mut low = x as u32;
let mut high = (x >> 32) as u32;
let mut high = ((x >> 16) >> 16) as u32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the double-shift instead of going for >>32 right away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, should've added a comment for that. Pushed a commit that explains this.


let mut i = 0;
loop {
Expand Down Expand Up @@ -58,8 +61,9 @@ pub(crate) fn leb64(x: u64, buf: &mut [u8; 10]) -> usize {
}
}

pub fn zigzag_encode(v: i64) -> u64 {
((v << 1) ^ (v >> 63)) as u64
pub fn zigzag_encode(v: isize) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a 1 sentence doc on what's happening here for those could be helpful who are not familiar with zigzag encoding (not blocking though)

const USIZE_BITS: usize = core::mem::size_of::<usize>() * 8;
((v << 1) ^ (v >> (USIZE_BITS - 1))) as usize
}

#[cfg(test)]
Expand All @@ -86,26 +90,28 @@ mod tests {
assert_eq!(buf[..i], [0x80, 1]);
buf.iter_mut().for_each(|b| *b = 0x55);

let i = leb64((1 << 32) - 1, &mut buf);
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0xf]);
buf.iter_mut().for_each(|b| *b = 0x55);
if cfg!(target_pointer_width = "64") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another nitpick: it's not very intuitive to figure out what's being tested here... Personally I'd put each check in a dedicated test case anyway (this way they're independent and their purpose is documented on the fly) but in this constellation comments would be helpful and time-saving

let i = leb64((1 << 32) - 1, &mut buf);
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0xf]);
buf.iter_mut().for_each(|b| *b = 0x55);

let i = leb64((1 << 35) - 1, &mut buf);
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0x7f]);
buf.iter_mut().for_each(|b| *b = 0x55);
let i = leb64((1 << 35) - 1, &mut buf);
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0x7f]);
buf.iter_mut().for_each(|b| *b = 0x55);

let i = leb64(1 << 35, &mut buf);
assert_eq!(buf[..i], [0x80, 0x80, 0x80, 0x80, 0x80, 1]);
buf.iter_mut().for_each(|b| *b = 0x55);
let i = leb64(1 << 35, &mut buf);
assert_eq!(buf[..i], [0x80, 0x80, 0x80, 0x80, 0x80, 1]);
buf.iter_mut().for_each(|b| *b = 0x55);

let i = leb64((1 << 42) - 1, &mut buf);
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0xff, 0x7f]);
buf.iter_mut().for_each(|b| *b = 0x55);
let i = leb64((1 << 42) - 1, &mut buf);
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0xff, 0x7f]);
buf.iter_mut().for_each(|b| *b = 0x55);

let i = leb64(u64::max_value(), &mut buf);
assert_eq!(
buf[..i],
[0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 1]
);
let i = leb64(usize::max_value(), &mut buf);
assert_eq!(
buf[..i],
[0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 1]
);
}
}
}
14 changes: 6 additions & 8 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -417,9 +417,7 @@ impl InternalFormatter {

/// Implementation detail
/// leb64-encode `x` and write it to self.bytes
pub fn leb64(&mut self, x: u64) {
// FIXME: Avoid 64-bit arithmetic on 32-bit systems. This should only be used for
// pointer-sized values.
pub fn leb64(&mut self, x: usize) {
let mut buf: [u8; 10] = [0; 10];
let i = leb::leb64(x, &mut buf);
self.write(&buf[..i])
Expand Down Expand Up @@ -453,12 +451,12 @@ impl InternalFormatter {
/// Implementation detail
pub fn isize(&mut self, b: &isize) {
// Zig-zag encode the signed value.
self.leb64(leb::zigzag_encode(*b as i64));
self.leb64(leb::zigzag_encode(*b));
}

/// Implementation detail
pub fn fmt_slice(&mut self, values: &[impl Format]) {
self.leb64(values.len() as u64);
self.leb64(values.len());
let mut is_first = true;
for value in values {
let omit_tag = !is_first;
Expand Down Expand Up @@ -505,7 +503,7 @@ impl InternalFormatter {

/// Implementation detail
pub fn usize(&mut self, b: &usize) {
self.leb64(*b as u64);
self.leb64(*b);
}

/// Implementation detail
Expand All @@ -514,12 +512,12 @@ impl InternalFormatter {
}

pub fn str(&mut self, s: &str) {
self.leb64(s.len() as u64);
self.leb64(s.len());
self.write(s.as_bytes());
}

pub fn slice(&mut self, s: &[u8]) {
self.leb64(s.len() as u64);
self.leb64(s.len());
self.write(s);
}

Expand Down