Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

badTags tagStripper can suffer catastrophic backtracking #111

Open
alexmdavis opened this issue Jun 7, 2023 · 10 comments
Open

badTags tagStripper can suffer catastrophic backtracking #111

alexmdavis opened this issue Jun 7, 2023 · 10 comments

Comments

@alexmdavis
Copy link
Contributor

The tagStripper RegExp for badTags can backtrack catastrophically if the content is very large, and if no closing tag exists (such as a meta tag). https://github.com/DiemenDesign/summernote-cleaner/blob/ee248b92f047bdcfb30007b31a047d78253ae814/summernote-cleaner.js#L217-L218

@alexmdavis
Copy link
Contributor Author

My quick solution is to check for any closing tag first:

var tagClose = new RegExp('</' + badTag + '[^>\v]*>')
if (output.match(tagClose)) {
  tagStripper = new RegExp('<' + badTag + '(.|\r|\n)*</' + badTag + '[^>\v]*>', 'gmi');
  output = output.replace(tagStripper, '');
}

However I imagine that there is some modification to the RegExp that can fix the problem.

@RichardSquires
Copy link
Contributor

Can you give an example that would "backtrack catastrophically" and the configuration you are currently using?

@DennisSuitters
Copy link
Owner

Do you think this issue might have something to do with this: summernote/summernote#4472

@alexmdavis
Copy link
Contributor Author

I don't think it's related to that issue.

Here is an example of catastrophic backtracking, using the tag meta: https://regex101.com/r/5Xyvyu/2

The content consists of just one meta tag and one img tag. The regex reaches the end of the string without finding a closing tag for the meta, and then crashes when trying to backtrack through the huge img content.

@RichardSquires
Copy link
Contributor

RichardSquires commented Jun 23, 2023

@alexmdavis sorry didn't see the notification for the reply.

That is the regex, but I'm curious as to the configuration you have as I have seen this code parse far bigger texts than this and with other single tags such as <hr> (in the html just like that). If the issue occurs for meta I would expect the same for hr, and I'm not confident what the issue is given your description.

It would help myself debug this if I could get the plugin configuration (I can use what you put in the regex as a test for this issue).

Many thanks for you help so far.

@alexmdavis
Copy link
Contributor Author

@RichardSquires this is one config that reproduces the problem (with meta in particular).

      action: 'paste'
      keepHtml: true
      badTags: ['applet', 'col', 'colgroup', 'embed', 'noframes', 'noscript', 'script', 'title', 'meta', 'link', 'head']
      keepTagContents: ['html', 'body']
      badAttributes: ['class', 'data-(.*?)', 'id']
      limitChars: 30000
      limitDisplay: 'none'
      limitStop: false

@DennisSuitters
Copy link
Owner

Should you be using meta within content markup? Keep in mind Summernote is primarily concerned with editing content that is visual, i.e. content that doesn't use elements normally used for page layout, head areas, or script elements. Meta tags shouldn't really be used within editable content areas.

Do you really mean the issue occurs where elements don't use a closing element or ending /?

@alexmdavis
Copy link
Contributor Author

Yes, perhaps I have been too specific to one example. It appears to happen with any tag that doesn't have a closing element or /. In my regex example, the issue persists with hr (or anything else) instead of meta.

@DennisSuitters
Copy link
Owner

Well, that makes sense then, and more importantly img considering it would be more often used than hr.

@RichardSquires
Copy link
Contributor

@RichardSquires this is one config that reproduces the problem (with meta in particular).

      action: 'paste'
      keepHtml: true
      badTags: ['applet', 'col', 'colgroup', 'embed', 'noframes', 'noscript', 'script', 'title', 'meta', 'link', 'head']
      keepTagContents: ['html', 'body']
      badAttributes: ['class', 'data-(.*?)', 'id']
      limitChars: 30000
      limitDisplay: 'none'
      limitStop: false

@alexmdavis Can you shed some light on why keepTagsContents is set to ['html', 'body'] instead of []?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants