Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid 64-bit arithmetic in LEB encoding #345

Merged
merged 3 commits into from
Jan 18, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 19 additions & 5 deletions src/leb.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
pub(crate) fn leb64(x: u64, buf: &mut [u8; 10]) -> usize {
/// LEB128-encodes a `usize` value into `buf`.
///
/// This handles 32-bit and 64-bit `usize`.
pub(crate) fn leb64(x: usize, buf: &mut [u8; 10]) -> usize {
let mut low = x as u32;
let mut high = (x >> 32) as u32;
// Shift by 16 twice, to avoid a panic/error when shifting a 32-bit usize by 32 bits.
let mut high = ((x >> 16) >> 16) as u32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the double-shift instead of going for >>32 right away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, should've added a comment for that. Pushed a commit that explains this.


let mut i = 0;
loop {
Expand Down Expand Up @@ -58,8 +62,11 @@ pub(crate) fn leb64(x: u64, buf: &mut [u8; 10]) -> usize {
}
}

pub fn zigzag_encode(v: i64) -> u64 {
((v << 1) ^ (v >> 63)) as u64
/// Encodes an `isize` as a `usize` by pulling its sign bit to the least-significant place (this
/// makes it have an efficient LEB-encoding for small positive and negative values).
pub fn zigzag_encode(v: isize) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a 1 sentence doc on what's happening here for those could be helpful who are not familiar with zigzag encoding (not blocking though)

const USIZE_BITS: usize = core::mem::size_of::<usize>() * 8;
((v << 1) ^ (v >> (USIZE_BITS - 1))) as usize
}

#[cfg(test)]
Expand All @@ -85,6 +92,13 @@ mod tests {
let i = leb64(1 << 7, &mut buf);
assert_eq!(buf[..i], [0x80, 1]);
buf.iter_mut().for_each(|b| *b = 0x55);
}

/// Smoke test for bit patterns that require 64-bit `usize`s.
#[test]
#[cfg(target_pointer_width = "64")]
fn leb_64_bit() {
let mut buf = [0x55; 10];

let i = leb64((1 << 32) - 1, &mut buf);
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0xf]);
Expand All @@ -102,7 +116,7 @@ mod tests {
assert_eq!(buf[..i], [0xff, 0xff, 0xff, 0xff, 0xff, 0x7f]);
buf.iter_mut().for_each(|b| *b = 0x55);

let i = leb64(u64::max_value(), &mut buf);
let i = leb64(usize::max_value(), &mut buf);
assert_eq!(
buf[..i],
[0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 1]
Expand Down
14 changes: 6 additions & 8 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -417,9 +417,7 @@ impl InternalFormatter {

/// Implementation detail
/// leb64-encode `x` and write it to self.bytes
pub fn leb64(&mut self, x: u64) {
// FIXME: Avoid 64-bit arithmetic on 32-bit systems. This should only be used for
// pointer-sized values.
pub fn leb64(&mut self, x: usize) {
let mut buf: [u8; 10] = [0; 10];
let i = leb::leb64(x, &mut buf);
self.write(&buf[..i])
Expand Down Expand Up @@ -453,12 +451,12 @@ impl InternalFormatter {
/// Implementation detail
pub fn isize(&mut self, b: &isize) {
// Zig-zag encode the signed value.
self.leb64(leb::zigzag_encode(*b as i64));
self.leb64(leb::zigzag_encode(*b));
}

/// Implementation detail
pub fn fmt_slice(&mut self, values: &[impl Format]) {
self.leb64(values.len() as u64);
self.leb64(values.len());
let mut is_first = true;
for value in values {
let omit_tag = !is_first;
Expand Down Expand Up @@ -505,7 +503,7 @@ impl InternalFormatter {

/// Implementation detail
pub fn usize(&mut self, b: &usize) {
self.leb64(*b as u64);
self.leb64(*b);
}

/// Implementation detail
Expand All @@ -514,12 +512,12 @@ impl InternalFormatter {
}

pub fn str(&mut self, s: &str) {
self.leb64(s.len() as u64);
self.leb64(s.len());
self.write(s.as_bytes());
}

pub fn slice(&mut self, s: &[u8]) {
self.leb64(s.len() as u64);
self.leb64(s.len());
self.write(s);
}

Expand Down