Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less false positives for len-as-conditions (pandas's DataFrame, numpy's Array) #3821

Merged

Conversation

Pierre-Sassoulas
Copy link
Member

Description

len-as-conditions is now triggered only for classes that are inheriting directly from list, dict, or set and not implementing the __bool__ function, or from generators like range or list/dict/set comprehension. This should reduce the false positive for other classes, like pandas's DataFrame or numpy's Array. See discussion in related issue.

Type of Changes

Type
🐛 Bug fix

Related Issue

#1879

@Pierre-Sassoulas
Copy link
Member Author

@hippo91 and @bersbersbers I'd like your opinion on this :)
I think @ethan-leba could be interested too.

I don't think infering the class and checking for its implementation is very efficient, suggestions welcomed :)

Copy link
Contributor

@hippo91 hippo91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change! The message should be clearer thanks to it!

@coveralls
Copy link

coveralls commented Sep 7, 2020

Coverage Status

Coverage increased (+0.02%) to 90.66% when pulling b228d77 on Pierre-Sassoulas:fix-github-issue-1879 into 973cdeb on PyCQA:master.

Comment on lines 91 to 75
try:
instance = next(node.args[0].infer())
except astroid.InferenceError:
self.add_message("len-as-condition", node=node)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about the internals of astroid's infer, but what's the reasoning to emitting the error if the value can't be inferred? I'm assuming that astroid would probably less likely to be able to infer the type of an imported library, but that could be wrong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually happen for list comprehension, dict comprehension, set comprehension and generators like range. It's not very elegant nor explicit..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know exactly what to think about it.
The implementation you propose is quite powerfull but it will emit a message as soon as the len argument cannot be inferred. I'm not confident about the fact it will occur only for comprehensions and generator...

Maybe we could try to filter comprehension before trying to infer. Something along the lines:

# we're finally out of any nested boolean operations so check if
# this len() call is part of a test condition
if _is_test_condition(node, parent):
    len_arg = node.args[0]
    # First, the node is a generator or comprehesion as in len([x for x in ...])
    if isinstance(len_arg, (astroid.ListComp, astroid.SetComp, astroid.DictComp, astroid.GeneratorExp):
        self.add_message("len-as-condition", node=node)
        return
    # Or the node is a name as in len(foo) and we look for name assignment and analyze it
    if isinstance(len_arg, astroid.Name):
        lookup for name assignment and check if it is a Comprehension or Generator
    # Last case i see is the node is a call to a function as in len(my_func(my_arg)) but it that case we should be able to check whether the result is a Generator
   if isinstance(len_arg, astroid.Call) and astroid.GeneratorExpr in len_arg.inferred():
       self.add_message("len-as-condition", node=node)
...

Note that i'm not found of this implementation neither because it uses a lot of isinstance calls...
What do you think about it?
Maybe we can ask @AWhetter advice...

Copy link
Member Author

@Pierre-Sassoulas Pierre-Sassoulas Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's very heavy. I'm searching for a better way than checking all the code of the class in search of a __bool__ function. If there isn't one, maybe not raising a message for anything that is not a list, set, dict, range, comprehension or generator has a better yield in term of result versus calculation to do ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing the checks @hippo91 suggested might even end up being faster than calling .infer(). If we're worried about the performance that much, we could do some performance comparisons and see which is faster.
My preference is to do the checks because we're way less likely to do raise false positives, and I think they're more annoying than pylint running slowly. At least this section of code will get called only when len() is calling in the source code being analysed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it like @hippo91 suggested is fine, my implementation wasn't efficient at all.

pylint/testutils.py Outdated Show resolved Hide resolved
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 2 times, most recently from c9c653c to 4fe3ff4 Compare September 7, 2020 20:17
@bersbersbers
Copy link

Based on the change log, it seems this has the desired effect. I did not check the implementation as I am not used to pylint internals.

I don't think infering the class and checking for its implementation is very efficient

I don't even see why the current change does not already do that in infer() and instance_has_bool().

@hippo91
Copy link
Contributor

hippo91 commented Sep 8, 2020

@Pierre-Sassoulas i will try to review this PR next week end. Sorry for the delay.

@Pierre-Sassoulas
Copy link
Member Author

I don't even see why the current change does not already do that in infer() and instance_has_bool().

Yes I did exactly that :) But maybe there is a better way, or the value given is not worth the calculation ?

@Pierre-Sassoulas Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 2 times, most recently from f101d3a to d92c8de Compare September 10, 2020 21:34
Copy link
Contributor

@hippo91 hippo91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice work @Pierre-Sassoulas !
I left a bunch of comments for discussion.

pylint/checkers/refactoring/len_checker.py Show resolved Hide resolved
pylint/testutils.py Outdated Show resolved Hide resolved
tests/functional/l/len_checks.py Show resolved Hide resolved
tests/functional/l/len_checks.py Show resolved Hide resolved
tests/functional/l/len_checks.py Outdated Show resolved Hide resolved
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 3 times, most recently from d69f819 to 98b5e41 Compare September 14, 2020 20:00
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.6.1 release milestone Sep 14, 2020
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 2 times, most recently from 55fed7c to ef5c811 Compare September 15, 2020 05:06
tests/functional/l/len_checks.py Show resolved Hide resolved
@hippo91
Copy link
Contributor

hippo91 commented Nov 26, 2020

@Pierre-Sassoulas let me know if you want another feedback before merging it.

@hippo91 hippo91 added the Waiting on author Indicate that maintainers are waiting for a message of the author label Nov 26, 2020
@Pierre-Sassoulas
Copy link
Member Author

@hippo91 I'm stuck on the way to handle functions. Here's my functional test:

    def function_returning_list(r):
        if r==1:
            return [1]
        return [2]

    def function_returning_int(r):
        if r==1:
            return 1
        return 2

    def function_returning_generator(r):
        for i in [r, 1, 2, 3]:
            yield i

    def function_returning_comprehension(r):
        return [x+1 for x in [r, 1, 2, 3]]

    def function_returning_function(r):
        return function_returning_generator(r)

    assert len(function_returning_list(z))  # [len-as-condition]
    assert len(function_returning_int(z))
    assert len(function_returning_generator(z))  # [len-as-condition] (line 173 here)
    assert len(function_returning_comprehension(z))  # [len-as-condition]
    assert len(function_returning_function(z))  # [len-as-condition]

The last three one (l 173, 174, 175) aren't working right now. How would you infer the type of the return of a function from the following ?

 Call(func=<Name.function_returning_generator l.173 at 0x7fb5e231f970>,
     args=[<Name.z l.173 at 0x7fb5e231f9a0>],
     keywords=None)
{'lineno': 173, 'col_offset': 15, 'parent': <Call l.173 at 0x7fb5e231f880>,
 'func': <Name.function_returning_generator l.173 at 0x7fb5e231f970>, 
'args': [<Name.z l.173 at 0x7fb5e231f9a0>], 'keywords': None, 'fromlineno': 173, 'tolineno': 173}

 Call(func=<Name.function_returning_comprehension l.174 at 0x7fb5e231fb50>,
     args=[<Name.z l.174 at 0x7fb5e231fb80>],
     keywords=None)
{'lineno': 174, 'col_offset': 15, 'parent': <Call l.174 at 0x7fb5e231fa60>,
 'func': <Name.function_returning_comprehension l.174 at 0x7fb5e231fb50>, 
'args': [<Name.z l.174 at 0x7fb5e231fb80>], 'keywords': None, 'fromlineno': 174, 'tolineno': 174}

 Call(func=<Name.function_returning_function l.175 at 0x7fb5e231fd30>,
     args=[<Name.z l.175 at 0x7fb5e231fd60>],
     keywords=None)
{'lineno': 175, 'col_offset': 15, 'parent': <Call l.175 at 0x7fb5e231fc40>, 
'func': <Name.function_returning_function l.175 at 0x7fb5e231fd30>,
 'args': [<Name.z l.175 at 0x7fb5e231fd60>], 'keywords': None, 'tolineno': 175, 'fromlineno': 175}

To be fair, I found other adjacent problem in testutil while doing that and did not take a lot of time for this. But maybe we could ignore function call if this is hard to do ?

@hippo91
Copy link
Contributor

hippo91 commented Dec 12, 2020

@Pierre-Sassoulas sorry for the last answer.
The problem here is that astroid doesn't seem to be able to infer list comprehension nor generator.
For example, if you run this little snippet:

from astroid import extract_node, helpers

ast_node = extract_node(
"""
(x for x in [1,2,3])
""")
print(ast_node.inferred())

you obtain the following:

File "astroid/astroid/node_classes.py", line 734, in _infer
    "No inference function for {node!r}.", node=self, context=context
astroid.exceptions.InferenceError: No inference function for <GeneratorExp l.2 at 0x7f68f3bde990>.

The same arises for ListComp. We will have to try to fix this but IMHO it will be in another astroids PR.

@Pierre-Sassoulas
Copy link
Member Author

Thanks for the review @hippo91, I commented the test cases that are not possible with the current astroid and I created #4002 to keep track of this. I'll merge the current version.

@Pierre-Sassoulas Pierre-Sassoulas merged commit f95bbac into pylint-dev:master Dec 31, 2020
@Pierre-Sassoulas Pierre-Sassoulas added Bug 🪲 False Positive 🦟 A message is emitted but nothing is wrong with the code and removed Waiting on author Indicate that maintainers are waiting for a message of the author Work in progress labels Dec 31, 2020
@Pierre-Sassoulas Pierre-Sassoulas deleted the fix-github-issue-1879 branch December 31, 2020 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🪲 False Positive 🦟 A message is emitted but nothing is wrong with the code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants