cleaning up data api file cache #737

Merged
merged 5 commits into develop from feature/cleanup_dataapi_filecache on Jun 19, 2023

Conversation

emanuel-schmid
Collaborator

@emanuel-schmid commented Jun 16, 2023

Changes proposed in this PR:

  • introduce a purge method in the api_client that removes outdated files from the API client's file cache
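For illustration, a minimal sketch of what such a purge could look like; the method name, the directory walk, and the url_of_cached_file helper below are assumptions for this sketch, not necessarily the merged implementation (list_dataset_infos and keep_testfiles do appear in the reviewed code further down):

```python
from pathlib import Path

def purge_cache(self, target_dir, keep_testfiles=True):
    """Remove files from the local file cache that belong to outdated datasets.

    Sketch only: keeps files of 'active' datasets, and of 'test_dataset'
    datasets unless keep_testfiles is False; everything else is deleted.
    """
    # collect urls from datasets that should not be removed
    keep_status = ['active'] + (['test_dataset'] if keep_testfiles else [])
    keep_urls = {
        file_info.url
        for status in keep_status
        for ds_info in self.list_dataset_infos(status=status)
        for file_info in ds_info.files
    }
    # remove every cached file whose source url is not in the keep set;
    # url_of_cached_file is a hypothetical helper mapping a cached path
    # back to the url it was downloaded from
    for path in Path(target_dir).rglob('*'):
        if path.is_file() and self.url_of_cached_file(path) not in keep_urls:
            path.unlink()
```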

This PR fixes #

PR Author Checklist

PR Reviewer Checklist

@emanuel-schmid
Collaborator Author

@chahank: let me know if this acceptably solves the ever-increasing API data heap problem. If so, I'll add documentation.

Member

@chahank left a comment

That was a very quick fix, thanks! Just a few small comments, but otherwise it looks great to me (once the documentation is added :)).

Comment on lines 599 to 600
" or remove cache entry from database by calling"
f" `Client.purge_cache_db(Path('{local_path}'))`!")
Member

I think this is a bit too difficult to understand. When should a user purge the cache? When should a user wait for the download? Any way to help the user better here?

Collaborator Author

The exception is more verbose now.

Member

Very clear now!

with the API client, if they are beneath the given directory and if one of the following
is the case:
- there status is neither 'active' nor 'test_dataset'
= their status is 'test_dataset' and keep_testfiles is set to False
Member

Suggested change
= their status is 'test_dataset' and keep_testfiles is set to False
- their status is 'test_dataset' and keep_testfiles is set to False


# collect urls from datasets that should not be removed
test_datasets = self.list_dataset_infos(status='test_dataset') if keep_testfiles else []
test_urls = set(filinf.url for dsinf in test_datasets for filinf in dsinf.files)
Member

I do not understand the short variables dsinf and filinf. Are these some standard abbreviations?

Collaborator Author

they're called ds_info and file_info now
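For reference, the renamed comprehension then reads along these lines (a sketch based on the excerpt above, not necessarily the exact merged code):

```python
# collect urls from datasets that should not be removed
test_datasets = self.list_dataset_infos(status='test_dataset') if keep_testfiles else []
test_urls = set(file_info.url for ds_info in test_datasets for file_info in ds_info.files)
```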

rm_empty_dirs(subdir)
try:
    directory.rmdir()
except OSError:
Member

What error is ignored here? I do not understand.

Collaborator Author

I've added an inline comment: `# raised when directory is not empty`
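With that comment in place, the block reads roughly as follows (a sketch; the pass body is an assumption):

```python
rm_empty_dirs(subdir)
try:
    directory.rmdir()
except OSError:  # raised when directory is not empty
    pass
```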

Path(temp_dir).joinpath('hazard/tropical_cyclone/rename_files2/v1').is_dir()
)
self.assertEqual( # test files are still there
3,
Member

Why are there 3 test files?

Collaborator Author

@emanuel-schmid Jun 19, 2023

That's just the nature of that test dataset. Datasets can have any number of files.

Member

Thanks. I meant: why does this particular test result in 3 files?

Collaborator Author

Beats me. I picked it for being small (in file size) and expired. I suspect it's an experimental dataset that was used to explore the data api itself.

Member

Should we use a better-known test file then? The test looks rather mysterious like this.

Collaborator Author

Oh. Wrong. Sorry. The one with 3 files is used in TestStormEurope.test_icon_read. Reading icon files takes a directory as input and collects data from there, so having more than one file makes complete sense. And that test is fairly well known. Apart from that, I think it doesn't really matter which dataset we pick, as long as its size is acceptable and status and version make a difference.

Member

All right, if it is clear to you, I am fine with it.

@emanuel-schmid
Collaborator Author

@chahank many thanks for the equally quick review! 🙌
Amending the docs....

@emanuel-schmid merged commit c15fc53 into develop on Jun 19, 2023
@emanuel-schmid deleted the feature/cleanup_dataapi_filecache branch on June 20, 2023 07:44