Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(commons): fix metadata regular expressions #1065

Merged
merged 3 commits into from
Jul 18, 2024
Merged

Conversation

delatrie
Copy link
Collaborator

@delatrie delatrie commented Jul 17, 2024

Context

The PR fixes Polynomial regular expression used on uncontrolled data code scan error.

The expression in question is /@?allure.label.(?<name>[^\s]+?)[:=](?<value>[^\s]+)/. Due to how the default regexp engine is implemented in Node.js, it may have polynomial complexity on some inputs.

Example

Here, we test the regular expression against a string allure.label. followed by multiple occurrences of allure.label.!. The engine scans the string from its beginning to the end until it finds out it doesn't match. The backtracking is then engaged, and the engine starts over, this time from the 14th char ('a' in the second allure.label).

The process repeats until all repetitions of allure.label are handled, yielding the O(n*m) complexity, where n is the total length of the string, and m is the number of allure.label. occurrences in the string.

This can be illustrated with the following code:

const createTest = (pattern, prefix, repetitivePart) =>
  (n) => {
    const input = (prefix + new Array(n).fill(repetitivePart).join(""));
    const start = Date.now();
    const result = input.match(pattern);
    const duration = Date.now() - start;
    return { duration: `${duration}ms`, result: result };
  };

const test = createTest(/@?allure.label.(?<name>[^\s]+?)[:=](?<value>[^\s]+)/, "allure.label.", "allure.label.!");

test(10000);
test(20000);
{ duration: '1266ms', result: null }
{ duration: '5052ms', result: null }

Here, doubling the input length gives us x4 increase in the total parsing time.

Change

In this particular case, the regular expression's performance can be fixed by anchoring the expression to the initial position:

export const allureLabelRegexp = /(?:^|\s)@?allure\.label\.(?<name>[^:=\s]+)[:=](?<value>[^\s]+)/;

Now, the following two facts are true about this regular expression:

  1. A matched string cannot contain a whitespace character (that was also the case previously).
  2. A matched string is bound to either the beginning of the input or to a preceding whitespace character.

Together, those give us the complexity of O(n) because once an attempt to match fails, the engine continues from the next character of the input without backtracking.

Behaviour changes

  1. The new expression extracts the preceding whitespace character from the title:
    Before:
    expect(extractMetadataFromString("foo @allure.id:1 bar").cleanTitle).toBe("foo  bar") // two whitespaces between foo and bar
    After:
    expect(extractMetadataFromString("foo @allure.id:1 bar").cleanTitle).toBe("foo bar") // one whitespace between foo and bar
  2. The new expression doesn't match against strings like "foo@allure.id:10". A whitespace is mandatory now. We may work this around by anchoring the pattern to the @ character as well, but that would also require us to exclude it from both label names and values. Otherwise, the same polynomial complexity problem occurs.

Extra changes

  • the PR adds a test suite for extractMetadataFromString.

Checklist

@github-actions github-actions bot added the theme:api Javascript API related issue label Jul 17, 2024
@delatrie delatrie added the type:bug Something isn't working label Jul 18, 2024
Also add tests for extractMetadataFromString.
@delatrie delatrie marked this pull request as ready for review July 18, 2024 09:41
@baev baev merged commit 7c3780b into main Jul 18, 2024
10 checks passed
@baev baev deleted the label-meta-parsing-fix branch July 18, 2024 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme:api Javascript API related issue theme:playwright type:bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants