upgrade to aho-corasick 0.7 #566

Merged
merged 4 commits into from
Mar 30, 2019

Commits on Mar 28, 2019

  1. ci: only test Rust benchmarks

    No need to build the benchmark suite 4 times.
    BurntSushi committed Mar 28, 2019
    a763b97
  2. literal: upgrade to aho-corasick 0.7

    This is a "dumb" update in that we retain exactly the same functionality
    as before.
    BurntSushi committed Mar 28, 2019
    5734233

Commits on Mar 29, 2019

  1. syntax: add is_literal and is_alternation_literal

    This adds a couple of new methods on HIR expressions for determining
    whether they are literals or not. This is useful for determining whether
    to apply optimizations such as Aho-Corasick without re-analyzing the syntax.
    BurntSushi committed Mar 29, 2019
    461673d
  2. exec: add Aho-Corasick optimization

    Finally, if a regex is just `foo|bar|baz|...|quux`, we will now use plain
    old Aho-Corasick. We weren't doing this before because Aho-Corasick
    didn't support proper leftmost-first match semantics. But since
    aho-corasick 0.7, it does, so we can now use it as a drop-in replacement.
    
    This basically fixes a pretty bad performance bug in a really common case,
    but it is otherwise really hacked. First of all, this only happens when a
    regex is literally `foo|bar|...|baz`. Even something like
    `foo|b(a)r|...|baz` will prevent this optimization from happening, which
    is a little silly. Second of all, this optimization only kicks in after
    we've compiled the full pattern, which adds quite a bit of overhead. Fixing
    this isn't trivial, since we may need the compiled program to resolve
    capturing groups. The way to do this is probably to specialize compilation
    for certain types of expressions. Maybe.
    
    Anyway, we hack this in for now, and punt on further improvements until
    we can really re-think how this should all work.
    BurntSushi committed Mar 29, 2019
    d7c01cc
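
The two ideas in the Mar 29 commits (checking whether an expression is an alternation of literals, and matching with leftmost-first semantics) can be illustrated with a rough, std-only sketch. Everything here is an assumption made for illustration: `Expr` is a hypothetical stand-in and not the regex-syntax `Hir` type, and the two search functions are naive scans, not the aho-corasick automaton. `find_earliest_end` only approximates how a match that ends first can be reported ahead of the one a regex alternation would pick.

```rust
// Hypothetical, simplified expression tree. NOT the regex-syntax `Hir` type.
enum Expr {
    Literal(String),
    Concat(Vec<Expr>),
    Alternation(Vec<Expr>),
    Group(Box<Expr>), // e.g. a capturing group such as `(a)`
}

impl Expr {
    /// True if this is a literal or a concatenation of literals.
    fn is_literal(&self) -> bool {
        match self {
            Expr::Literal(_) => true,
            Expr::Concat(parts) => parts.iter().all(|p| p.is_literal()),
            _ => false,
        }
    }

    /// True if this is a literal or an alternation whose branches are all
    /// literals. A group anywhere inside a branch makes this false.
    fn is_alternation_literal(&self) -> bool {
        match self {
            Expr::Alternation(alts) => alts.iter().all(|a| a.is_literal()),
            other => other.is_literal(),
        }
    }
}

/// Naive leftmost-first search: take the earliest start position, and among
/// the patterns matching there, the one listed first in the alternation.
/// Returns (pattern index, start, end).
fn find_leftmost_first(patterns: &[&str], haystack: &str) -> Option<(usize, usize, usize)> {
    let hay = haystack.as_bytes();
    for start in 0..=hay.len() {
        for (i, pat) in patterns.iter().enumerate() {
            if hay[start..].starts_with(pat.as_bytes()) {
                return Some((i, start, start + pat.len()));
            }
        }
    }
    None
}

/// Naive "report whichever match ends first" search, for contrast.
/// Returns (pattern index, start, end).
fn find_earliest_end(patterns: &[&str], haystack: &str) -> Option<(usize, usize, usize)> {
    let hay = haystack.as_bytes();
    for end in 0..=hay.len() {
        for (i, pat) in patterns.iter().enumerate() {
            if hay[..end].ends_with(pat.as_bytes()) {
                // `ends_with` succeeded, so pat.len() <= end.
                return Some((i, end - pat.len(), end));
            }
        }
    }
    None
}
```

With the patterns `abcd` and `bc` on the haystack `xabcd`, `find_leftmost_first` returns the `abcd` match at 1..5, which is what the regex `abcd|bc` would report, while `find_earliest_end` returns `bc` at 2..4. In this sketch, a tree for `foo|b(a)r` fails `is_alternation_literal` because of the group, mirroring the limitation described in the commit message.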