Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to parse table names that start with "e1" #578

Open
sergio-paternoster opened this issue Sep 3, 2024 · 8 comments
Open

Unable to parse table names that start with "e1" #578

sergio-paternoster opened this issue Sep 3, 2024 · 8 comments
Labels

Comments

@sergio-paternoster
Copy link

sergio-paternoster commented Sep 3, 2024

Not sure if this is a bug of just a table name with reserved keywords. If the name of my table starts with "e1" the parser is unable to recognize it. It works with any other table name that doesn't start with "e1". Here is my code (SqlParser v.5.10, php version 8.2.16 CLI):

use PhpMyAdmin\SqlParser\Parser         as Parser;

$query = "LOAD DATA LOCAL INFILE '/home/user/myloadfile.csv' 
          IGNORE INTO TABLE erp.e1_table
          FIELDS TERMINATED BY '\t'
          LINES TERMINATED BY '\n'
          IGNORE 0 LINES;";

$parser                                 = new Parser($query);
echo "\n file:   ".$parser->statements[0]->file_name->file;
echo "\n schema: ".$parser->statements[0]->table->database;
echo "\n table:  ".$parser->statements[0]->table->table;

This is the actual output:

 file:   /home/user/myloadfile.csv
 schema:
 table:  erp

Expected output:

 file:   /home/user/myloadfile.csv
 schema: erp
 table:  e1_table
@sergio-paternoster
Copy link
Author

Actually the same problem happens with any table name that starts with "e" followed by a number.

@williamdes williamdes added the bug label Sep 3, 2024
@williamdes
Copy link
Member

Thank you @sergio-paternoster for reporting this

@niconoe- did you encounter this bug ?

@niconoe-
Copy link
Contributor

niconoe- commented Sep 3, 2024

Thank you @sergio-paternoster for reporting this

@niconoe- did you encounter this bug ?

Never heard about it, but based on what has been said here, my thoughts are that the lexer fails to understand "e[0-9]+" as a string here, but instead as an integer (10^n).
I'll try to investigate about it.

EDIT:
So far, I've tested @sergio-paternoster 's query and I confirm there's an error in the parser.
I also tried SELECT * FROM e1_table WHERE 1; and everything went fine. But when I tried with a database prefix SELECT * FROM erp.e1_table WHERE 1;, the parser did not error'd something but I can see in the tokens that this is not OK:

"from": [
    {
        "@type": "PhpMyAdmin\\SqlParser\\Components\\Expression",
        "database": null,
        "table": "erp",
        "column": null,
        "expr": "erp.e1",
        "alias": "_table",
        "function": null,
        "subquery": null
    }
],

I'll try to continue to take a look about it.

@niconoe-
Copy link
Contributor

niconoe- commented Sep 3, 2024

Sorry for double posting here, but this is no longer about testing the bug, but rather about the explaination of the bug.

The lexer is trying all of these methods and stops as soon as the method matches with a "lexable" token:

  • parseDelimiter
  • parseWhitespace
  • parseNumber
  • parseComment
  • parseOperator
  • parseBool
  • parseString
  • parseSymbol
  • parseKeyword
  • parseLabel
  • parseUnknown

With the string "erp.my_table", the lexer detects that:

  • "erp" is matched thanks to parseUnknown
  • "." is matched thanks to parseOperator
  • "my_table" is matched thanks to parseUnknown

But, with the string "erp.e1_table", the lexer detects that:

  • "erp" is matched thanks to parseUnknown
  • ".e1" is matched thanks to parseNumber !!
  • "_table" is matched thanks to parseUnknown

Why that ? Because parseNumber is executed before parseOperator, and the string of characters .e1 is wrongly detected as a number because of scientific notation.

@williamdes it may have an impact, but looking at the parseNumber method in the Lexer, we should review the steps in order to stop considering something that starts with a dot "." can be followed by the "e" or "E" char (if we actually want to parse a number).
What do you think about?

@williamdes
Copy link
Member

Why that ? Because parseNumber is executed before parseOperator, and the string of characters .e1 is wrongly detected as a number because of scientific notation.

@williamdes it may have an impact, but looking at the parseNumber method in the Lexer, we should review the steps in order to stop considering something that starts with a dot "." can be followed by the "e" or "E" char (if we actually want to parse a number).
What do you think about?

If I understand this correctly by reading the code of your PR you change the step order to avoid this e/E detection ?

@niconoe-
Copy link
Contributor

niconoe- commented Sep 3, 2024

If I understand this correctly by reading the code of your PR you change the step order to avoid this e/E detection ?

Yes, but only on a certain context.

When a string starts with ".", it must be detected as a number if it is followed by a digit. But when a string contains a ".", it must be detected as a number if it is followed by a digit or "e"/"E" for scientific notation. This difference wasn't taken into account in the lexer.

I hope this is understandable through my comments and my PR 😄

niconoe- added a commit to niconoe-/sql-parser that referenced this issue Sep 3, 2024
@niconoe-
Copy link
Contributor

niconoe- commented Sep 4, 2024

@sergio-paternoster I think you can avoid the parsing error by using quote-identifiers around your table names.

use PhpMyAdmin\SqlParser\Parser         as Parser;

$query = "LOAD DATA LOCAL INFILE '/home/user/myloadfile.csv' 
          IGNORE INTO TABLE `erp`.`e1_table`
          FIELDS TERMINATED BY '\t'
          LINES TERMINATED BY '\n'
          IGNORE 0 LINES;";

This should unlock the situation for you waiting for a patch version integrating #579 is released.

@sergio-paternoster
Copy link
Author

@niconoe- Thanks for the fix! The workaround you suggested works well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants