Add support for lazy matchers #185

masklinn · 2024-02-13T19:45:19Z

Support is added for lazy builtin matchers (with a separately compiled file), as well as for loading json or yaml files using lazy matchers.

Lazy matchers are very much a tradeoff: they improve import speed, but slow down run speed, possibly dramatically.
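
To make the tradeoff concrete, here is a minimal sketch of what a lazy matcher could look like. `LazyMatcher` is a hypothetical class written for this illustration, not the code added by this PR: it keeps the pattern as a plain string at construction time and only calls `re.compile` the first time it is used.

```python
import re
from typing import Optional


class LazyMatcher:
    """Hypothetical lazy matcher: holds the pattern as a string until first use."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        # Nothing is compiled at construction (and so nothing at import time).
        self._regex: Optional[re.Pattern] = None

    @property
    def compiled(self) -> bool:
        return self._regex is not None

    def search(self, ua: str) -> Optional[re.Match]:
        # The compilation cost is paid here, on the first call only.
        if self._regex is None:
            self._regex = re.compile(self.pattern)
        return self._regex.search(ua)


matcher = LazyMatcher(r"Firefox/(\d+)")
was_compiled_before_use = matcher.compiled
m = matcher.search("Mozilla/5.0 Firefox/122.0")
```

Construction is cheap because it only stores a string, which is where the import-time gain would come from. The first `search` call then pays for compilation, which is the runtime cost.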

Use them by default for the re2 parser, but not the basic parser. Experimentally, on Python 3.11:

- importing the package itself takes ~36ms
- importing the lazy matchers takes ~36ms (including the package, so ~0)
- importing the eager matchers takes ~97ms
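
The description does not say how these import timings were taken. One possible way to get comparable numbers is to time the import in a fresh interpreter so nothing is already cached in `sys.modules`. The helper `import_time_ms` below is hypothetical, and `json` is only a stand-in module name. CPython's `-X importtime` option gives a more detailed breakdown.

```python
import subprocess
import sys


def import_time_ms(module: str) -> float:
    """Time `import <module>` in a fresh interpreter, in milliseconds."""
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module}\n"
        "print((time.perf_counter() - start) * 1000)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    return float(result.stdout.strip())


# `json` is a stand-in; substitute the module whose import cost is of interest.
elapsed = import_time_ms("json")
print(f"import json: {elapsed:.1f}ms")
```

A single run is noisy, so repeating the measurement and taking the minimum or median is usually more informative.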

The eager matchers have a significant import overhead. However, when running the bench on the sample file, the lazy matchers cause a runtime increase of 700~800ms on the basic parser bench, as that ends up instantiating every regex (likely due to match failures). Relatively this is not huge (~2.5%), but the tradeoff doesn't seem great, especially since the parser itself is initialized lazily.

The re2 parser does much better, only losing 20~30ms (~1%). This is likely because it only needs to compile a fraction of the regexes (156 out of 1162 as of regexes.yaml version 0.18), and possibly because it gets to avoid some of the most expensive ones to compile.
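
The effect of match failures can be illustrated with a small sketch. It assumes a parser that tries matchers in order and stops at the first hit. `CountingLazy`, `first_match` and the four patterns are hypothetical stand-ins, not the library's code or its regexes.

```python
import re
from typing import List, Optional


class CountingLazy:
    """Hypothetical lazy matcher that records whether it was ever compiled."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        self._regex: Optional[re.Pattern] = None

    @property
    def compiled(self) -> bool:
        return self._regex is not None

    def search(self, text: str) -> Optional[re.Match]:
        if self._regex is None:
            self._regex = re.compile(self.pattern)
        return self._regex.search(text)


def first_match(matchers: List[CountingLazy], text: str) -> Optional[CountingLazy]:
    """Linear scan: try each matcher in order and stop at the first hit."""
    for matcher in matchers:
        if matcher.search(text):
            return matcher
    return None


def compiled_count(matchers: List[CountingLazy]) -> int:
    return sum(m.compiled for m in matchers)


patterns = [r"Firefox/\d+", r"Chrome/\d+", r"Safari/\d+", r"Opera/\d+"]

# An early hit only forces the matchers tried before (and including) the hit.
hit = [CountingLazy(p) for p in patterns]
first_match(hit, "Mozilla/5.0 Firefox/122.0")
hit_count = compiled_count(hit)  # 1

# A miss walks the whole list, so every lazy matcher ends up compiled.
miss = [CountingLazy(p) for p in patterns]
first_match(miss, "curl/8.5.0")
miss_count = compiled_count(miss)  # 4
```

A parser that narrows the candidate set before touching individual regexes would leave the other matchers uncompiled. That is consistent with the re2 figure above, though this sketch does not model re2 itself.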

TODO:

- test the lazy matchers
- note the space overhead of the additional precompiled file (compression level 77% in wheel file, 27783 bytes stored, 123017 raw)
- note the memory overhead of the additional precompiled file

Turns out the eagerly compiled regexes likely consume a bunch of memory:
- loading _matchers.py adds 760~780k to the process
- loading _lazy.py adds 65~75k (depends on loading order, likely because the literal strings are shared); forcing all the regexes to be compiled increases memory use by ~800k, so that tracks

The literal strings are likely shared but the compiled regexes definitely are not. There could be a shared cache, but the use case of loading multiple builtin sets in actual production seems unlikely.
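
A rough way to observe the memory cost of compiled patterns is `tracemalloc`, sketched below. Note the assumptions: the patterns are synthetic stand-ins rather than the library's regexes, and `tracemalloc` reports Python-level allocations rather than process RSS, so the numbers are not directly comparable to the figures above.

```python
import re
import tracemalloc

# Synthetic stand-in patterns, each unique. They are built before tracing
# starts so that only the compiled objects are counted, not the strings.
patterns = [rf"Browser{i}/(\d+)\.(\d+)" for i in range(200)]

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
compiled = [re.compile(p) for p in patterns]
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

compiled_bytes = after - before
print(f"~{compiled_bytes / len(patterns):.0f} traced bytes per compiled pattern")
```

Measuring before and after a lazy set is forced to compile, in the same way, would give a Python-side estimate of the delta described in the notes above.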

Add lazy builtin matchers (with a separately compiled file), as well as loading json or yaml files using lazy matchers.

Lazy matchers are very much a tradeoff: they improve import speed (and memory consumption until triggered), but slow down run speed, possibly dramatically:

- importing the package itself takes ~36ms
- importing the lazy matchers takes ~36ms (including the package, so ~0) and ~70kB RSS
- importing the eager matchers takes ~97ms and ~780kB RSS
- triggering the instantiation of the lazy matchers adds ~800kB RSS
- running bench on the sample file using the lazy matcher has 700~800ms overhead compared to the eager matchers

While the lazy matchers are less costly across the board until they're used, benching the sample file causes the loading of *every* regex -- likely due to matching failures -- which has a 700~800ms overhead over eager matchers, and increases the RSS by ~800kB (on top of the original 70). Thus lazy matchers are not a great default for the basic parser, though they might be a good opt-in if the user only ever uses one of the domains (especially if it's not the devices one, as that's by far the largest).

With the re2 parser however, only 156 of the 1162 regexes get evaluated, leading to a minor CPU overhead of 20~30ms (1% of bench time) and a more reasonable memory overhead. Thus use the lazy matcher for the re2 parser.

On the more net-negative but relatively minor side of things, the pregenerated lazy matchers file adds 120k to the on-disk requirements of the library, and ~25k to the wheel archive. This is also what the _regexes and _matchers precompiled files do. pyc files seem to be even bigger (~130k) so the tradeoff is dubious even if they are slightly faster.

Fixes ua-parser#171, fixes ua-parser#173

masklinn force-pushed the lazy-matchers branch 2 times, most recently from 51d1d6f to bdc33fd Compare February 17, 2024 19:25

masklinn force-pushed the lazy-matchers branch from bdc33fd to 2856614 Compare February 18, 2024 19:13

masklinn enabled auto-merge (rebase) February 18, 2024 19:14

masklinn merged commit 16c1324 into ua-parser:master Feb 18, 2024
29 checks passed

masklinn deleted the lazy-matchers branch February 19, 2024 18:56
