Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect "referenced before assignment" variables #236

Closed
pyflakes-bot opened this issue Dec 2, 2015 · 11 comments
Closed

detect "referenced before assignment" variables #236

pyflakes-bot opened this issue Dec 2, 2015 · 11 comments

Comments

@pyflakes-bot
Copy link

Original report by spaceone (@spaceone?) on Launchpad:


Assume the following code:

try:
    fd = open(filename, 'r')
finally:
    if fd is not None:
        fd.close()

This would raise "UnboundLocalError: local variable 'fd' referenced before assignment" if opening the file fails e.g. because of no permissions/not existing files/etc..
It would be very nice if pyflakes detect this kind of error and warn about the use of 'fd' in the finally block if 'fd' is created inside of the try-block.

@pyflakes-bot
Copy link
Author

Original comment by jayvdb (@jayvdb?) on Launchpad:


Note that this would cause errors for 'valid' code, when finally referenced a variable created only in try: which may have been bound before the exception. The coder may know 99% that the variable created in the try block will never fail (except maybe out of memory errors, I suppose). e.g. something like

try:
    bar = foo.x
    baz = bar.y
finally:
    if bar is not None:
        bar.z

Anyway, it is extremely prone to bugs, so IMO it is acceptable to error in that scenario.

I suspect this is quite easy to solve moderately well.

Instead of simply processing all TRY children nodes sequentially

https://github.com/pyflakes/pyflakes/blob/master/pyflakes/checker.py#L1092

process finally: children first, and then the try:, except: and else: children.

The better approach would be to save a copy of the contents of the scope before processing the try: children nodes, and use that saved copy of the scope while processing the finally: children. That way it is possible to check whether the referenced names were created during the try: branch.

@pyflakes-bot
Copy link
Author

Original comment by asmeurer (@asmeurer?) on Launchpad:


I think it's a good idea. I've been bitten by it before with something like

try:
    p = subprocess.Popen([cmd])
    p.communicate()
finally:
    return p.returncode

where cmd is not found. It's worse in Python 2 where the exceptions don't chain, but a NameError or UnboundLocalError looks bad regardless. It's also a good reason to use context managers instead of try/finally when possible, and to minimize the code that is inside of a try block.

But I suppose it would be good to also check some real code for potential false positives.

@eddieViscosity
Copy link

eddieViscosity commented Nov 4, 2017

This is a very common programming mistake and should be caught. Even this simple example is not detected:

def Afunction(d):
    if d:
        x=1
    return x



@bitglue
Copy link
Member

bitglue commented Nov 7, 2017

Separating the cases where there is actually an error from the ones where there is not seems difficult. For example:

def f(d):
    if d % 2 == 0:
        result = "d is even"
    if d % 2 == 1:
        result = "d is odd"
    return result

Or what about:

def Afunction(d):
    """Return 1.

    `d` must be a true value.
    """

    if d:
        x=1
    return x

@eddieViscosity
Copy link

It would still be nice to have unassigned usage flagged even though all cases might not be true errors. Outside of variables defined outside the scope of the module, it could be argued that most cases should be recoded to avoid the warning message.

@bitglue
Copy link
Member

bitglue commented Nov 8, 2017

That would go against the design principles.

@wchill
Copy link

wchill commented Nov 1, 2018

Just ran into this issue as well - minimal repro:

def test(data=None):
    if data is None:
        some_var = 'something'
    else:
        pass
    return some_var

some_var is not defined anywhere else in the file, and it is very evident that there is a code path where some_var will not be bound.

@bitglue
Copy link
Member

bitglue commented Nov 1, 2018

This is a well-known limitation, but it is unlikely it will get fixed unless someone smarter than me comes up with a solution. The problem is there may be logical constraints in the conditional expressions that the programmer knows about but which pyflakes can not. For example:

def test(something):
    if is_a_fruit(something):
        foo = "bar"
    if is_an_apple(something):
        return foo
    else:
        return 42

A naive analysis would say it's possible the first conditional branch could not be taken but the second might, thus attempting to return the undefined foo. However we know all apples are fruit, so that never actually happens.

There are things that could be done which would correctly reason about these situations in some circumstances. However there remain things (like knowledge that all apples are fruits) that pyflakes just can't reasonably ever know about. Of the things that are theoretically calculable through static analysis, it seems to me the effort and complexity grows quite rapidly, and pyflakes would end up implementing a mini-python interpreter just to be even a little bit useful. Yet people will still report bugs for false positives and false negatives, and the overall utility gain will be very small relative to the effort required.

There is another way to detect this issue: a unit test. A unit test which covers all possible code paths, which is a reasonable expectation of a good unit test, is guaranteed to catch this issue. Pyflakes complements unit tests by finding issues a unit test can not, such as variables and imports which are never used. The design principles follow from this perspective.

In summary:

  1. I agree it would be nice to fix, but
  2. it's really hard, and
  3. even the hard solution is bound to be incomplete and subject to false positives, and
  4. unit testing already catches this issue, so it's not really a big deal if Pyflakes does not.

Since this issue has been around for many years and discussed exhaustively, I'm closing the issue and putting the discussion to rest until someone can propose a patch which:

  1. Demonstrably does not create false positives in real-world code, and
  2. demonstrably does find true positives in real-world code (not just trivial, contrived examples), and
  3. is not insanely complex, and
  4. is not a significant detriment to performance.

@bitglue bitglue closed this as completed Nov 1, 2018
@asmeurer
Copy link
Contributor

asmeurer commented Nov 1, 2018

I'm curious if you or anyone has tested this logic on real code to get an idea of how many false positives it would produce. Specifically, don't allow variables to be defined in an if block unless that block also has an else block, and in that case, it must be defined in every if/elif/else block. In my experience, writing code like your example is very bug prone. You tend to miss some corner case where is_an_apple(something) is True but is_a_fruit(something) is False, and you're better off defining foo as some default value above the if. Even if the default value would not make sense you'd get a better error message than NameError.

By the way, pyflakes still does the completely incorrect thing of concatenating the bodies of every if/elif/else block and assuming they all get run. Actually, exactly one body can ever be run (or one or zero if there is no else). For example, the following gives no errors

def test(x):
    if x:
        a = 1
    else:
        print(a)

I remember discussing this before, so there may already be an issue open for this.

@lilydjwg
Copy link

lilydjwg commented Nov 6, 2018

Sad to see this. I'm bitten by UnboundLocalError twice recently and neither pyflakes nor mypy warns about it. I really miss C or C++ or Rust in which I never need to worry about this.

I don't mind of possible false positives as I think they are easy-to-break code anyway. Also pyflakes gives me a lot of false positives about unused variables that are re-exported.

@bitglue
Copy link
Member

bitglue commented Nov 6, 2018

If you truly don't mind false positives, you might find pylint more to your liking. If you're getting false positives, that's a bug. Please open a new issue.

I understand everyone wants this feature. I want this feature too. It's simply hard to implement. Moving the conversation forward requires an implementation to demonstrate feasibility and provide data on the signal:noise ratio and utility of the feature. I'm happy to continue the discussion when there's a PR.

@PyCQA PyCQA locked as resolved and limited conversation to collaborators Nov 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants