Cannot seek stdin on pipe #496
@@ -4,6 +4,7 @@
 # SPDX-License-Identifier: Apache-2.0
 import collections
 import fnmatch
+import io
 import json
 import logging
 import os
@@ -269,7 +270,11 @@ def run_tests(self):
             self._show_progress("%s.. " % count, flush=True)
             try:
                 if fname == "-":
-                    sys.stdin = os.fdopen(sys.stdin.fileno(), "rb", 0)
+                    open_fd = os.fdopen(sys.stdin.fileno(), "rb", 0)
+                    sys.stdin = io.BytesIO(open_fd.read())
+                    new_files_list = [
+                        "<stdin>" if x == "-" else x for x in new_files_list
+                    ]
                     self._parse_file("<stdin>", sys.stdin, new_files_list)
                 else:
                     with open(fname, "rb") as fdata:
@@ -315,8 +320,8 @@ def _parse_file(self, fname, fdata, new_files_list):
             # for the line
             nosec_lines = dict()
             try:
-                fdata.seek(0)
-                tokens = tokenize.tokenize(fdata.readline)
+                buf_data = io.BytesIO(data)
So, I'm a bit perplexed by what we're doing here, honestly. If
I'm just worried about what happens here with a 10kloc file with 100-200 line length limits

This code is all a bit inefficient. It reads the file once to count the number of lines. Then reads the full file again to tokenize in order to retrieve comments like # nosec. So this bit of code was a bit perplexing to me as well. So stdin is not a seekable file descriptor (fdata.seek(0) will fail). On line 315, the data is read to its fullest (although I question if that is the case on all platforms). But there's probably a better way to do all this.

Seems Bandit is iterating through the given file at least 4 times.
Performance would probably be greatly improved on large files if we reduced to a single pass.
+                tokens = tokenize.tokenize(buf_data.readline)
 
                 if not self.ignore_nosec:
                     for toktype, tokval, (lineno, _), _, _ in tokens:
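For readers following the hunk above, here is a minimal sketch of why seek(0) fails on a pipe and how buffering the bytes once lets both the line count and the tokenizer work from the same in-memory copy. The helper name find_nosec_lines and the sample source are hypothetical, and this is not Bandit's actual implementation; it only assumes, as the discussion suggests, that the file contents have already been read into a bytes object.

```python
import io
import os
import tokenize


def find_nosec_lines(data: bytes) -> set:
    """Hypothetical helper: line numbers whose comments mention 'nosec'."""
    # Wrap the already-read bytes so tokenize gets a readline() it can use,
    # without ever seeking on the original stream.
    buf = io.BytesIO(data)
    found = set()
    for toktype, tokval, (lineno, _), _, _ in tokenize.tokenize(buf.readline):
        if toktype == tokenize.COMMENT and "nosec" in tokval:
            found.add(lineno)
    return found


# Simulate piped input: a pipe's read end is not seekable.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "wb") as writer:
    writer.write(b"import os\nos.system('ls')  # nosec\nprint('done')\n")

reader = os.fdopen(read_fd, "rb", 0)
pipe_is_seekable = reader.seekable()  # calling seek(0) here would raise OSError
data = reader.read()                  # read once, up to EOF
reader.close()

line_count = len(data.splitlines())   # line count from the same bytes
nosec_lines = find_nosec_lines(data)  # tokenize pass over the same bytes
```

The trade-off raised in the review still applies: the whole input is held in memory, which is what makes the second pass possible without a seekable descriptor.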
So I'm 99% certain this is going to be a problem with some integration somewhere but if we were already patching this like this, perhaps it's fine? Would it not be better to do something like:
It's less code, and then we can pass stdin to self._parse_file instead of sys.stdin

I agree patching sys.stdin looks really ugly. To avoid that, I think we would need to pass a variable around in quite a few places.
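
The reviewer's snippet is not preserved in this thread. One possible reading of the idea, sketched under that assumption, is to read the input into a local seekable buffer and hand that to the parser instead of reassigning sys.stdin. The helper name open_source and its stdin parameter are hypothetical and are not part of Bandit.

```python
import io
import sys


def open_source(fname, stdin=None):
    """Hypothetical helper: return (display_name, seekable binary stream)."""
    if fname == "-":
        # Use the caller-supplied stream if given, else the process's
        # binary stdin. sys.stdin itself is never reassigned.
        raw = stdin if stdin is not None else sys.stdin.buffer
        return "<stdin>", io.BytesIO(raw.read())
    with open(fname, "rb") as fdata:
        return fname, io.BytesIO(fdata.read())


# Usage: inject a fake stdin so the example runs without piped input.
original_stdin = sys.stdin
name, stream = open_source("-", stdin=io.BytesIO(b"x = 1\n"))
first_read = stream.read()
stream.seek(0)                # seeking works because the data is buffered
second_read = stream.read()
```

Passing the stream explicitly is what makes the helper testable, though as noted above it means threading an extra variable through the callers.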