Make char::is_ascii_whitespace branchless on 32 and 64-bit targets #77021
r? @cramertj (rust_highfive has picked a reviewer for you, use r? to override)
Benchmark code:

    use criterion::{black_box, criterion_group, criterion_main, Criterion};

    fn bench_is_ascii_whitespace(c: &mut Criterion) {
        let mut group = c.benchmark_group("is_ascii_whitespace");

        // Baseline: the existing std implementation.
        group.bench_function("std", |b| {
            b.iter(|| {
                let mut n = 0;
                for i in 0..128u8 {
                    if is_whitespace::std_is_ascii_whitespace(black_box(&(i as char))) {
                        n += 1;
                    }
                }
                black_box(n);
            })
        });

        // Candidate: the implementation proposed in this PR.
        group.bench_function("pr", |b| {
            b.iter(|| {
                let mut n = 0;
                for i in 0..128u8 {
                    if is_whitespace::pr_is_ascii_whitespace(black_box(&(i as char))) {
                        n += 1;
                    }
                }
                black_box(n);
            })
        });
    }

    criterion_group!(benches, bench_is_ascii_whitespace);
    criterion_main!(benches);

Generated code: https://rust.godbolt.org/z/Ws69P3
If you look at the generated assembly of the match-based version, you'll find that LLVM already does the u32-as-bit-set trick and generates leaner code.
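For readers who have not seen the bit-set trick mentioned above, here is a minimal sketch of the idea. It is not the code from this PR or from std: the function names, the offset of 9 and the `MASK` constant are assumptions chosen for illustration. The five ASCII whitespace code points (9, 10, 12, 13, 32) are shifted down by 9 so they all fit in a 24-bit range, and membership becomes a single bit test against a `u32` constant.

```rust
/// Match-based check, written the way a hand-rolled version might look.
fn match_is_ascii_whitespace(c: char) -> bool {
    matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' ')
}

/// Hypothetical bit-set version: bit `i` of MASK is set when code point
/// `i + 9` is ASCII whitespace ('\t'=9, '\n'=10, '\x0C'=12, '\r'=13, ' '=32).
fn bitset_is_ascii_whitespace(c: char) -> bool {
    const MASK: u32 = (1 << 0) | (1 << 1) | (1 << 3) | (1 << 4) | (1 << 23);
    // Code points below 9 wrap around to huge values and fail the range check.
    let offset = (c as u32).wrapping_sub(9);
    // `&` on bools does not short-circuit; `offset & 31` keeps the shift in
    // range even when the range check is false.
    (offset < 24) & ((MASK >> (offset & 31)) & 1 == 1)
}
```

Whether either form actually compiles to branchless machine code depends on the compiler version and target, so the generated assembly is the thing to check.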
I haven't looked at the machine code, but the new version is one instruction shorter and (maybe more importantly) branchless.
Being branchless will not be very beneficial here, because a) in normal text there isn't that much whitespace (or even other chars < 33), so the early-exit branch will usually be taken, resulting in only a cmp, jg and xor being run, and b) branch prediction has become very good on most architectures (although it'll be interesting to see the result on ARM, which was historically worse than Intel or AMD; unfortunately cargo asm doesn't work on my phone; I will look into --emit asm and report back later). Also, even if the branchy version has one more instruction, not all of its instructions are executed on every call, so that shouldn't make a difference.
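To make point (a) concrete, the following sketch models the early-exit shape described in this comment and counts how often a piece of text would take that exit. The names `WS_MASK`, `early_exit_is_ascii_whitespace` and `count_early_exits` are hypothetical, and the Rust source only approximates the control flow; the real instruction sequence depends on what LLVM emits.

```rust
/// u64 bit set with bits 9, 10, 12, 13 and 32 set (the ASCII whitespace code points).
const WS_MASK: u64 = (1u64 << 9) | (1u64 << 10) | (1u64 << 12) | (1u64 << 13) | (1u64 << 32);

/// Branchy shape: anything above ' ' (32) returns early, the rest is a bit test.
fn early_exit_is_ascii_whitespace(c: char) -> bool {
    let code = c as u32;
    if code > 32 {
        return false; // the common path for ordinary text
    }
    (WS_MASK >> code) & 1 == 1
}

/// Returns (characters that take the early exit, total characters).
fn count_early_exits(text: &str) -> (usize, usize) {
    let total = text.chars().count();
    let early = text.chars().filter(|&c| c as u32 > 32).count();
    (early, total)
}
```

For the sentence "The quick brown fox jumps over the lazy dog", `count_early_exits` returns `(35, 43)`: 35 of the 43 characters leave through the early exit, which is the kind of skew that makes the branch easy to predict.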
☔ The latest upstream changes (presumably #77630) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels: