Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Lazy tifffile #179

Merged
merged 10 commits into from
Jun 18, 2021
Merged

Conversation

scottclowe
Copy link
Member

Add a new datahandler, DataHandlerTifffileLazy, which uses tifffile to handle TIFFs but does so in a lazy manner, loading the data frame-by-frame when determining the traces (like DataHandlerPillow).

The advantage over DataHandlerPillow is it can handle more dtypes and shapes in the TIFF files.

This replaces DataHandlerPillow as the lowmemory_mode datahandler.

@codecov
Copy link

codecov bot commented Jun 16, 2021

Codecov Report

Merging #179 (1d69acc) into master (fd9d282) will increase coverage by 0.17%.
The diff coverage is 95.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #179      +/-   ##
==========================================
+ Coverage   92.65%   92.83%   +0.17%     
==========================================
  Files           8        8              
  Lines         831      879      +48     
  Branches      165      175      +10     
==========================================
+ Hits          770      816      +46     
- Misses         31       32       +1     
- Partials       30       31       +1     
Flag Coverage Δ
unittests 92.71% <95.91%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
fissa/extraction.py 95.86% <95.83%> (-0.02%) ⬇️
fissa/core.py 96.22% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd9d282...1d69acc. Read the comment docs.

Copy link
Member

@swkeemink swkeemink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One important potential bug.

We should also clarify in the explanation of the low_memory option in the notebooks that it only applies for loading tiff files which might be too high dimensional, and won't work for the option we provide of just inputting arrays instead of tifffiles (if that's not already done properly).

(Incidentally perhaps we should have a separate dataloader for arrays, and detect in the beginning if the provided images are tiffs or strings, and then load the appropriate data loader.)

Parameters
----------
image : str or array_like
Either a path to a TIFF file, or array_like data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current function can only take in a TiffFile object, no arrays.

Example:

import tifffile
import numpy as np
image = np.random.rand(10, 10,10)
tifffile.TiffFile(image)

will give an error.
(Specifically, TiffFile can take as a first argument 'a file name, seekable binary stream, or FileHandle')

@scottclowe
Copy link
Member Author

One important potential bug.

It's not a bug, it's just the docstring is wrong. Like the Pillow loader, lazy loader doesn't support a list of file names.

We should also clarify in the explanation of the low_memory option in the notebooks that it only applies for loading tiff files which might be too high dimensional, and won't work for the option we provide of just inputting arrays instead of tifffiles (if that's not already done properly).

It isn't only for TIFF files which are high dimensional. It's for any (large) TIFF files.

(Incidentally perhaps we should have a separate dataloader for arrays, and detect in the beginning if the provided images are tiffs or strings, and then load the appropriate data loader.)

We could. Although you would want to check which dataloader to use for each argument in the list, since there is no guarantee the list is all file name strings or all arrays (it could be a mixture). I don't think this is worth prioritising.

@swkeemink
Copy link
Member

swkeemink commented Jun 18, 2021

We could. Although you would want to check which dataloader to use for each argument in the list, since there is no guarantee the list is all file name strings or all arrays (it could be a mixture). I don't think this is worth prioritising.

Agreed fully, was more of a side-note.

With the docstring update happy with this PR otherwise.

@scottclowe
Copy link
Member Author

With the docstring update happy with this PR otherwise.

Great! Thanks for spotting the inconsistency between docstring and behaviour.

@scottclowe scottclowe merged commit ab98b90 into rochefort-lab:master Jun 18, 2021
@scottclowe scottclowe deleted the enh_lazy-tifffile branch June 18, 2021 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants