Lexer chokes on certain kinds of whitespace #29590

Closed
catern opened this issue Nov 4, 2015 · 4 comments · Fixed by #30595
Labels
A-parser Area: The parsing of Rust source code to an AST. T-lang Relevant to the language team, which will review and decide on the PR/issue.

@catern

catern commented Nov 4, 2015

The presence in a Rust source file of unusual but useful kinds of whitespace, such as ASCII 0x0C (form feed), leads to the following error:

src/main.rs:1:1: 1:2 error: unknown start of token: \u{c}
src/main.rs:1 
              ^

I have a specific use case for form-feeds in source files. But I think in general it is nice to ignore the same whitespace that every other programming language and file format ignores; it lessens confusion for people coming from other languages and backgrounds.

My specific use case is the long-standing but somewhat uncommon use of the form-feed character (which semantically is a separator between pages of text) as a way to group together especially closely related functions or blocks in a file of source code. Text editors and IDEs such as vim, Emacs, and Xcode provide convenience features to display these form-feeds in an aesthetically pleasing way, move between form-feed-delimited pages, and restrict editing to one form-feed-delimited page at a time. It's just a simple convenience feature, but it would really be nice to support it.

@rntz
Contributor

rntz commented Nov 4, 2015

+1, I also use this feature and was disappointed when I found Rust didn't treat it as whitespace.

@Aatch
Contributor

Aatch commented Nov 5, 2015

Looks like a fairly simple change could be made to the lexer so it uses char::is_whitespace instead of limiting to ' ', '\n', '\t', '\r'. The only explanation I can think of is that the is_whitespace function in lexer/mod.rs has been around since before we had a better is_whitespace function and nobody has changed it since then.
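
A minimal sketch of the difference described in this comment, not the actual libsyntax code. Both function names are assumptions made for illustration: is_whitespace_restricted stands in for a hand-written check limited to the four characters listed above, and is_whitespace_unicode simply delegates to the standard library's char::is_whitespace.

```rust
/// Hypothetical stand-in for a hand-written lexer check that only
/// accepts space, newline, tab, and carriage return.
fn is_whitespace_restricted(c: char) -> bool {
    matches!(c, ' ' | '\n' | '\t' | '\r')
}

/// Hypothetical replacement that defers to the standard library,
/// which follows the Unicode White_Space property.
fn is_whitespace_unicode(c: char) -> bool {
    c.is_whitespace()
}
```

With these definitions, is_whitespace_restricted('\u{c}') is false while is_whitespace_unicode('\u{c}') is true, which matches the form-feed behavior reported at the top of this issue.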

@steveklabnik steveklabnik added A-frontend Area: frontend (errors, parsing and HIR) A-parser Area: The parsing of Rust source code to an AST. and removed A-frontend Area: frontend (errors, parsing and HIR) labels Nov 5, 2015
@steveklabnik
Member

/cc @rust-lang/lang, do we want to accept all kinds of whitespace?

@nikomatsakis
Contributor

I believe we should, yes.

steveklabnik added a commit to steveklabnik/rust that referenced this issue Dec 28, 2015
Some history:

While getting Rust to 1.0, it was a struggle to keep the book in a
working state. I had always wanted a certain kind of TOC, but couldn't
quite get it there.

At the 11th hour, I wrote up "Rust inside other languages" and "Dining
Philosophers" in an attempt to get the book in the direction I wanted to
go. They were fine, but not my best work. I wanted to further expand
this section, but it's just never going to end up happening. We're doing
the second draft of the book now, and these sections are basically gone
already.

Here are the issues with these two sections, and removing them just fixes
it all:

// Philosophers

There was always controversy over which ones were chosen, and why. This
is kind of a perpetual bikeshed, but it comes up every once in a while.

The implementation was originally supposed to show off channels, but
never did, due to time constraints. Months later, I still haven't
re-written it to use them.

People get different results and assume that means they're wrong, rather
than seeing the non-determinism inherent in concurrency. Platform
differences aggravate this, as does the exact amount of sleeping and printing.

// Rust Inside Other Languages

This section is wonderful, and shows off a strength of Rust. However,
it's not clear what qualifies a language to be in this section. And I'm
not sure how tracking a ton of other languages is gonna work, into the
future; we can't test _anything_ in this section, so it's prone to
bitrot.

By removing this section, and making the Guessing Game an initial
tutorial, we will move this version of the book closer to the future
version, and just eliminate all of these questions.

In addition, this also solves the 'split-brained'-ness of having two
paths, which has endlessly confused people in the past.

I'm sad to see these sections go, but I think it's for the best.

Fixes rust-lang#30471
Fixes rust-lang#30163
Fixes rust-lang#30162
Fixes rust-lang#25488
Fixes rust-lang#30345
Fixes rust-lang#29590
Fixes rust-lang#28713
Fixes rust-lang#28915

And probably others. This lengthy list alone is enough to show that
these should have been removed.

RIP.
Manishearth added a commit to Manishearth/rust that referenced this issue Dec 29, 2015
bors added a commit that referenced this issue Jan 5, 2016
@steveklabnik steveklabnik reopened this Jan 5, 2016
bors added a commit that referenced this issue Mar 8, 2016
libsyntax: be more accepting of whitespace in lexer

Fixes #29590.

Perhaps this may need more thorough testing?

r? @Aatch
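
On the question of more thorough testing, one possible shape for such a check is sketched below. This is not the test that shipped with the change. The helper first_token_start and the WHITESPACE_SAMPLES list are hypothetical names introduced here; the helper only models "skip leading whitespace, then report where the first token begins" using char::is_whitespace.

```rust
/// Hypothetical helper: returns the byte offset and value of the first
/// character that is not whitespace according to char::is_whitespace,
/// or None if the input is empty or entirely whitespace.
fn first_token_start(src: &str) -> Option<(usize, char)> {
    src.char_indices().find(|&(_, c)| !c.is_whitespace())
}

/// Hypothetical sample set: the four characters the old check accepted,
/// plus vertical tab and form feed.
const WHITESPACE_SAMPLES: [char; 6] = [' ', '\t', '\n', '\r', '\u{b}', '\u{c}'];

/// Returns true if every sample character is skipped when it precedes `fn`.
fn all_samples_skipped() -> bool {
    WHITESPACE_SAMPLES.iter().all(|&ws| {
        let src = format!("{}fn", ws);
        // The token should start right after the single whitespace character.
        first_token_start(&src) == Some((ws.len_utf8(), 'f'))
    })
}
```

A real test would drive the actual lexer rather than this stand-in, but the same table-driven approach of looping over a list of whitespace characters would apply.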
@steveklabnik steveklabnik added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Mar 24, 2017