Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve en/em/non-breaking/hair space etc. while minifying #849

Merged

Conversation

papandreou
Copy link
Contributor

There are some "precious" space characters matched by /\s/ that should never be collapsed or trimmed away.

See assetgraph/assetgraph#778

There are some "precious" space characters matched by /\s/ that should
never be collapsed or trimmed away.

See assetgraph/assetgraph#778
@papandreou
Copy link
Contributor Author

According to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-white-space \s is equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

Maybe the [ \n\r\t] character class I suggest here should be tweaked further to include more characters.

The White Space Processing Rules says:

Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), form feed (U+000C), and carriage return (U+000D) must be rendered as a visible glyph and otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its codepoint value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.

... so at least form feed (\f) should also be added.

@alexlamsl
Copy link
Collaborator

alexlamsl commented Sep 11, 2017

I prefer only changing the /\s/ that is necessary to pass your new unit tests rather than a blanket replacement.

For instance, https://github.com/kangax/html-minifier/pull/849/files#diff-bd35077e6c438d3f6866b57bb2481260L33 isn't necessary.

(I have tested on a few web browsers and agree that the issue illustrated by your test cases need to be fixed.)

@papandreou
Copy link
Contributor Author

I prefer only changing the /\s/ that is necessary to pass your new unit tests rather than a blanket replacement.
For instance, https://github.com/kangax/html-minifier/pull/849/files#diff-bd35077e6c438d3f6866b57bb2481260L33 isn't necessary.

Just to be sure that I understand your POV correctly -- isn't that actually an argument for adding test cases that validate that and the other replacement?

@alexlamsl
Copy link
Collaborator

@papandreou I see - if you can have the new test cases cover all the replacements of \s to [ \r\n\t\f], I'll be very happy and convinced 😉

@papandreou papandreou force-pushed the feature/preserveOddballWhitespace branch 2 times, most recently from dab9ad2 to 1d600cc Compare September 11, 2017 20:55
@papandreou
Copy link
Contributor Author

Okay, I added some more tests for the cases I could find, then rolled back to \s in the remaining cases.

However, I'm fairly sure that some of those tests can be written, so I probably wouldn't include 1d600cc -- those other kinds of spaces should basically be treated as regular glyphs, so I think /[ \n\r\t\f]/ is a safer default.

@alexlamsl
Copy link
Collaborator

I prefer changing them only if/when we hit an issue, and would therefore has test coverage for it.

The new tests & rollbacks LGTM - if there isn't anything else I shall go ahead & merge.


// Preserve hair space in attributes:
input = '<p class="foo\u200abar"></p>';
assert.equal(minify(input, { collapseWhitespace: true, preserveLineBreaks: true }), input);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preserveLineBreaks doesn't seem to be required here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, fixed + rebased.

…ttributes

Unfortunately this means we'll have to abandon String#trim as it
seems to be implemented as str.replace(/^\s+|\s+$/g, '')

Also, test preservation of oddball whitespace in class names when deduplicating and reordering
@papandreou
Copy link
Contributor Author

if there isn't anything else I shall go ahead & merge

Nothing further for now. I might pick it up again when I get some time :)

@alexlamsl
Copy link
Collaborator

@papandreou I shall merge after your rebase is pushed out then (GitHub having a slow day?)

Thanks a lot for the patch!

@papandreou papandreou force-pushed the feature/preserveOddballWhitespace branch from 1d600cc to c3d239d Compare September 11, 2017 21:30
@papandreou
Copy link
Contributor Author

@alexlamsl Hah, I don't know where I tried to push to before, but here it is :)

@alexlamsl alexlamsl merged commit 7683fbc into kangax:gh-pages Sep 11, 2017
@papandreou
Copy link
Contributor Author

@alexlamsl Thanks! Can I persuade you to do a new release with this fix?

@alexlamsl
Copy link
Collaborator

That's the plan - I went straight to bed after merging your PR 😅

Give me a few hours while I sort out my $day_job, then I'll make 3.5.5 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants