Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32248 - Implement importlib.resources #4911

Merged
merged 18 commits into from
Dec 30, 2017
Merged

Conversation

warsaw
Copy link
Member

@warsaw warsaw commented Dec 17, 2017

This ports importlib_resources to the stdlib as importlib.resource. This includes documentation and tests.

I think we should do the ResourceReader implementation as a separate branch.

https://bugs.python.org/issue32248

@@ -282,7 +282,14 @@ Other Language Changes
New Modules
===========

* None yet.
importlib.resource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib.resources*

*resource* may not contain path separators and it may
not have sub-resources (i.e. it cannot be a directory).
:type resource: ``Resource``
:returns: A context manager for use in a :keyword`with` statement.
Copy link
Contributor

@AraHaan AraHaan Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this line read :returns: A context manager for use in an :keyword:`with` statement.?

@AraHaan
Copy link
Contributor

AraHaan commented Dec 18, 2017

@warsaw I left some review comments of what I seen so far that should be corrected.

Copy link
Contributor

@AraHaan AraHaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright looks good to me now.


Open for binary reading the *resource* within *package*.

:param package: A package name or module object. See above for the API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of markup seems unusual in sdtlib documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unusual, so I vote to drop it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even I do not like this markup style in my own code and drop it.

Copy link
Member Author

@warsaw warsaw Dec 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from the standalone version, where it makes more sense and doesn't have the same consistency problems. I am rewriting it without the markup for this branch.


The following types are defined.

.. class:: Package
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks misleading to describe it as a class, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions on what directive to use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I think it's the first time a stdlib module exposes typing declarations. I'm not sure it's a good idea (see current python-dev discussion).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, typing defines types as classes, so with the lack of a better alternative, I think it's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... technically it's correct. What I mean is that a class is supposed to specify concrete behaviour, but being a type, Package doesn't have any methods or attributes of its own. So I don't see the point of mentioning it in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It describes the expected interfaces for arguments in the API, so it seems valuable to me. We'd still have to describe that somewhere and I don't want to have to repeat it for every method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... It might be worth discussing on python-dev to see what a recommended practice can look like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ivan said that Guido prefers data:: and that makes more sense in the face of PEP 560/563. So I've made that switch in my branch for consistency. I do still think it makes sense to document them because user code may want to use those types in their own annotations.


Return an iterator over the contents of the package. The iterator can
return resources (e.g. files) and non-resources (e.g. directories). The
iterator does not recurse into subdirectories.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is one supposed to do with directories?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how much you want to ignore the API. Basically this is an escape hatch for users if they know they are on a file system and so can actually use the subdirectory results. For others they should filter them out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's a little confusing may be that contents() returns str names (i.e. entries) so you can do with them whatever you want. Anyway, if that's what's confusing here, I'm rewriting this section a bit to make that clearer.

Resource = Union[str, os.PathLike]


def _get_package(package) -> ModuleType:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have docstrings on the helper functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

full_path = os.path.join(package_path, resource)
try:
return builtins_open(full_path, mode='rb')
except IOError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, but the canonical spelling is OSError nowadays ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that in the original standalone version, but forgot to make that change suggestion. I think it's worth using OSError here.

return BytesIO(data)


def open_text(package: Package,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly the same as open_binary, so how about sharing a common implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both text and binary used to be implemented as a single function, but they got split out, since it made documenting and reasoning about the API easier. And it's actually kind of tricky to refactor as you suggest because the text versions require different calls to open() or wrapping in text wrappers. That makes the API for any refactored commonality less readable. I played around with a couple of options but wasn't super happy with them, so I didn't think it was worth it.


def open_text(package: Package,
resource: Resource,
encoding: str = 'utf-8',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does PEP 8 say about this? I would expect no spaces around the equals sign (i.e. encoding: str='utf-8').

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think PEP 8 says anything about type hints. This is just the style based on the type hint PEPs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But if we take the advice for default parameter values, then there should be no space around the equals :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's one of those situations where I'm not sure which way to go, so I just went with the PEP that Guido wrote most recently. 😁

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the extra spaces a little jarring to originally, but trying it both ways, I think I like the spaces actually. Type hints make default arguments read differently so I don't like the stronger association to str than encoding that the lack of spaces implies. It would be interesting to have that discussion on PEP 8!



@contextmanager
def path(package: Package, resource: Resource) -> Iterator[Path]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it return an iterator? Is it the standard way to express context managers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the contextmanager decorator transforms the function to return a context manager. But the undecorated function itself still only yields Path instances. Basically it looks weird in source but at least mypy does the right thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this issue tries to get at better syntax for describing things like @contextmanager s and such.

# contents, and make sure that this entry doesn't have sub-entries.
archive_path = package.__spec__.loader.archive # type: ignore
package_directory = Path(package.__spec__.origin).parent
with ZipFile(archive_path) as zf:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm extremely surprised to see ZIpFile-specific code here. Isn't this whole module supposed to be an abstraction above the various concrete loaders? This seems wrong to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is until we get proper support in zipimport itself (which has not happened yet). It's also an accidental hold-over since the PyPI project has to support back to Python 3.4 and 2.7.

Copy link
Member Author

@warsaw warsaw Dec 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. When we get built-in zipimport support, we can drop this code (and the in-stdlib version doesn't need to worry about Pythons < 3.7).

if archive_path is None:
raise
relpath = package_directory.relative_to(archive_path)
with ZipFile(archive_path) as zf:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @pitrou has some legitimate change requests.

@@ -282,7 +282,14 @@ Other Language Changes
New Modules
===========

* None yet.
importlib.resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use :mod:`importlib.resources` in titles so it links to the documentation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the other entries in this section (i.e. the "Improved Modules") uses links, so I just went with consistency here.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@warsaw
Copy link
Member Author

warsaw commented Dec 27, 2017

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@brettcannon: please review the changes made to this pull request.

---------------------------------------

.. module:: importlib.resources
:synopsis: Package resource reading, opening, and access
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block contents in reST directives should start with three spaces, not four.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we have some inconsistency in this file. I just copied it from every other section in that file except the top level .. module:: importlib.

if reader is not None:
return TextIOWrapper(reader.open_resource(resource), encoding, errors)
# Using pathlib doesn't work well here due to the lack of 'strict'
# argument for pathlib.Path.resolve() prior to Python 3.6.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A later comment mentions that this version of the code (I mean the stdlib version in comparison to the PyPI backport) can be compatible with 3.7+ only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll fix this by actually using Path objects in the stdlib version (and removing the comment).

Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple minor things to make decisions on that will deviate the code even further from the PyPI code in minor ways.

from io import BytesIO, TextIOWrapper
from pathlib import Path
from types import ModuleType
from typing import Iterator, Optional, Set, Union # noqa: F401
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we drop the directive comment?

Or better yet, do we even need the typing-related imports with the "lazy" type hint changes in Python 3.7?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has that landed yet? I tried removing them (except for Union, which we'd still need because of the Package and Resource definitions). Python was not happy. So I guess we can address that again later.

I could remove the #noqa directive, but it makes my editor happier, so I think I'll leave them in for now.



def is_resource(package: Package, name: str) -> bool:
"""True if `name` is a resource inside `package`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'name', 'package' based on what I think of as common practice for marking common-word parameter names in docstrings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Consistency within the project is probably useful, though I've tended to use backticks (i.e. markdown) for these in my own code.

toc = zf.namelist()
relpath = package_directory.relative_to(archive_path)
candidate_path = relpath / name
for entry in toc: # pragma: nobranch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the pragma?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we'll never run coverage over this code inside the stdlib, so I'm okay with that. Plus this code will get replaced once we implement ResourceReader in zipimport.

subparts = path.parts[len(relpath.parts):]
if len(subparts) == 1:
yield subparts[0]
elif len(subparts) > 1: # pragma: nobranch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pragma

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K

@warsaw
Copy link
Member Author

warsaw commented Dec 29, 2017

Looks like Windows is unhappy.

This reverts commit 3607846.

Let's see if this is the cause of the Windows failures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants