Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make char::is_ascii_whitespace branchless on 32 and 64-bit targets #77021

Closed

Conversation

lnicola
Copy link
Member

@lnicola lnicola commented Sep 21, 2020

No description provided.

@rust-highfive
Copy link
Collaborator

r? @cramertj

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 21, 2020
@lnicola
Copy link
Member Author

lnicola commented Sep 21, 2020

is_ascii_whitespace/std time:   [267.44 ns 267.82 ns 268.32 ns]                                    
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
is_ascii_whitespace/pr  time:   [196.86 ns 197.11 ns 197.37 ns]                                   
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

image

Benchmark code:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_is_ascii_whitespace(c: &mut Criterion) {
    let mut group = c.benchmark_group("is_ascii_whitespace");
    group.bench_function("std", |b| {
        b.iter(|| {
            let mut n = 0;
            for i in 0..128u8 {
                if is_whitespace::std_is_ascii_whitespace(black_box(&(i as char))) {
                    n += 1;
                }
            }
            black_box(n);
        })
    });
    group.bench_function("pr", |b| {
        b.iter(|| {
            let mut n = 0;
            for i in 0..128u8 {
                if is_whitespace::pr_is_ascii_whitespace(black_box(&(i as char))) {
                    n += 1;
                }
            }
            black_box(n);
        })
    });
}

criterion_group!(benches, bench_is_ascii_whitespace);
criterion_main!(benches);

Generated code: https://rust.godbolt.org/z/Ws69P3

@lnicola lnicola force-pushed the branchless-is-ascii-whitespace branch from 6fa9a37 to 82ff02b Compare September 21, 2020 18:16
@lnicola lnicola changed the title Make char::is_ascii_whitespace branchless Make char::is_ascii_whitespace branchless on 64-bit targets Sep 21, 2020
@lnicola lnicola force-pushed the branchless-is-ascii-whitespace branch from 65ad849 to 960c039 Compare September 21, 2020 18:23
@lnicola
Copy link
Member Author

lnicola commented Sep 21, 2020

Lies, damned lies and statistics

This might not be a real improvement. Here I ran one version first, and then the other (red is std):

image

@lnicola lnicola force-pushed the branchless-is-ascii-whitespace branch 3 times, most recently from 6bce617 to 10c01ee Compare September 22, 2020 09:19
@lnicola lnicola force-pushed the branchless-is-ascii-whitespace branch from 10c01ee to f1e7495 Compare September 22, 2020 09:21
@lnicola lnicola changed the title Make char::is_ascii_whitespace branchless on 64-bit targets Make char::is_ascii_whitespace branchless on 32 and 64-bit targets Sep 23, 2020
@llogiq
Copy link
Contributor

llogiq commented Sep 30, 2020

If you look at the generated assembly of the match-based version, you'll find that LLVM already does the u32 as bit set trick, and generates leaner code.

@lnicola
Copy link
Member Author

lnicola commented Oct 1, 2020

I haven't looked at the machine code, but the new version is one instruction shorter and (maybe more importantly) branchless.

@llogiq
Copy link
Contributor

llogiq commented Oct 3, 2020

Being branchless will not be very beneficial here, because a) In normal text, there isn't that much whitespace (or even other chars < 33), and so the branch will often be taken resulting in only a cmp, jg and xor being run and b) branch prediction has become very good on most architectures (although it'll be interesting to see the result on ARM, which was historically worse than Intel or AMD; unfortunately cargo asm doesn't work on my phone; will look into --emit asm and report back later). Also even if there's 1 more instruction, with the branchy version not all instructions are actually used, so that shouldn't make a difference.

@bors
Copy link
Contributor

bors commented Oct 7, 2020

☔ The latest upstream changes (presumably #77630) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@lnicola lnicola closed this Oct 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants