-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update reco datasets and tests #954
Conversation
Codecov Report
@@ Coverage Diff @@
## main #954 +/- ##
==========================================
+ Coverage 95.16% 95.18% +0.01%
==========================================
Files 134 134
Lines 5520 5521 +1
==========================================
+ Hits 5253 5255 +2
+ Misses 267 266 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR Felix!
I added a few questions 👍
self.data: List[Tuple[Union[str, np.ndarray], Dict[str, Any]]] = [] | ||
self.data: List[Tuple[Union[str, np.ndarray], Union[Dict[str, Any], str]]] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make a decision here on the output format for the __getitem__
output signature first
Then we'll update this accordingly
Either way, I agree that self.data should have a string as second member (to save memory, and then the getter can choose whether it should be packed within a dict or not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to return a dict in fact that we support datasets with more than one annotation (boxes/labels) something like
{image: ..,
boxes: .. (can be None),
labels: .. (can be None),
}
both None -> raise
to hold some key informations wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmmh, about getitem:
- dict have a
.get
method, I'd argue we should limit the keys to the target task ("labels" for text reco)
About self.data:
- the simplest / lightest data format that fit the target task (string for text recognition)
So the getitem method need to pack this into a dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frgfm after thinking again about the output i would say it is better to return only a dict if we have more then 2 annotations (labels and boxes) otherwise we return the plain where the output should be clear.
Only recognition has str annotations and pure detection has box annotations.
OCR-end-to-end is the only format with dict where we need both. wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's dangerous to have heterogenous target data types (dict vs non-dict)
That makes transforms are to implement 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh yes i agree will update this next week that all annotations are provided in a dict
doctr/datasets/mjsynth.py
Outdated
label = [path.split('_')[1]] | ||
label = path.split('_')[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the label used to be a list of string? (with a single element)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before: ['xyz'] ( only list() splits the string )
After: 'xyz' :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I don't get how "before" used to work 🤔
Why would we have a list of string? Is it because we weren't thorough enough in the unittest and it's a mistake? Or I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmhh no unitests are fine but we have had no case where a dataset for recognition was used (I have introduced this a bit earlier).. first is the eval_recognition script with:
targets = [t['labels'][0] for t in targets]
The goal was to have a unified output (Tensor, Dict) but this makes no sense only if we have more than 1 annotation we need a dictonary otherwise str (reco) / np.ndarray (boxes) should be enough
d803e72
to
02f7518
Compare
failing tests: link to image isn't currently available |
02f7518
to
65291f8
Compare
5ee8b56
to
42bd64e
Compare
@frgfm after thinking again about this changes i think it is fine as is ... (i know that's a relative overhead to map strings in a dict and a list but in this case we can ensure all especially the transforms are handled the same way). |
I will rethink about this |
@felixdittrich92 I like it's a rather subjective question because we're trying to balance:
Perhaps we could manage all this by using a flag |
@felixdittrich92 #1041 takes one approach but doesn't offer the possibility to switch between dict & simple target The good thing about minimal target (without dict) is that batching is much easier |
@frgfm mh 🤔 but i think it is currently a good way to handle samples which have multible annotations (with = dict / without = string or np.array) do we really need a switch ? Especially for a clear transformation format |
Not necessarily but perhaps we should run a test to check latency overhead due to this 🤷♂️ |
This PR:
Any feedback is welcome 🤗