Missing objects when dumping in COCO format #1941

Closed
2 tasks done
Shravan1804 opened this issue Jul 27, 2020 · 24 comments

@Shravan1804

My actions before raising this issue

Expected Behaviour

Annotations file should contain all objects that were labeled in the task.

Current Behaviour

Dumping annotations in the CVAT XML format works fine, but when using the COCO format, some objects are missing in several frames.

For example with frame 211:

CVAT XML (correct annotations):
<image id="211" name="ppp_test_study_raw_processed_patched1024_labeling_task/00/_0123_A__SEP___h992_w1996.jpg" width="1024" height="1024">
<polygon label="pustules" occluded="0" points="651.79,589.81;635.32,624.24;666.76,661.67;698.20,639.21;696.70,610.77">
</polygon>
<polyline label="brown_spots" occluded="0" points="463.16,474.53;463.16,499.98;467.65,525.43;487.11,517.95;488.61,505.97;488.61,489.50;461.66,471.54">
</polyline>
<polyline label="brown_spots" occluded="0" points="295.48,702.09;289.49,729.04;313.45,744.01;328.42,727.54;325.42,711.07;292.49,703.59">
</polyline>
<polyline label="pustules" occluded="0" points="657.78,814.37;642.80,839.82;659.27,859.28;683.23,859.28;687.72,826.35;663.76,823.35;659.27,812.87">
</polyline>
</image>

COCO JSON (incomplete annotations; no other annotations are present in the file for this frame):
{"bbox": [635.3192982456148, 589.80859375, 62.87719298245611, 71.859242250730932],
"id": 828,
"segmentation": [[651.787109375, 589.80859375, 635.3192982456148, 624.2409356725148, 666.757894736842, 661.6678362573093, 698.196491228071, 639.2116959064333, 696.6994152046791, 610.767251461988]],
"category_id": 1,
"iscrowd": 0,
"image_id": 211,
"area": 2759.0}

There are no errors in the Docker logs:

2020-07-27 12:04:55,246 DEBG 'rqworker_default_0' stderr output:
12:04:55 default: cvat.apps.engine.annotation.dump_task_data('14', <SimpleLazyObject: <User: XX>>, '/home/django/data/14/YY.XX.2020_07_27_12_04_55.json', <AnnotationDumper: AnnotationDumper object (COCO JSON 1.0)>, 'http', 'ZZ:8080') (XX@/api/v1/tasks/14/annotations/COCO JSON 1.0/YY)

2020-07-27 12:04:59,239 DEBG 'rqworker_default_0' stderr output:
12:04:59 default: Job OK (XX@/api/v1/tasks/14/annotations/COCO JSON 1.0/YY)
INFO:rq.worker:default: Job OK (XX@/api/v1/tasks/14/annotations/COCO JSON 1.0/YY)

2020-07-27 12:04:59,239 DEBG 'rqworker_default_0' stderr output:
12:04:59 Result is kept for 500 seconds
INFO:rq.worker:Result is kept for 500 seconds

2020-07-27 12:04:59,260 DEBG 'rqworker_default_0' stderr output:
12:04:59 Cleaning registries for queue: default

Possible Solution

Temporary workaround: convert from CVAT XML to COCO JSON
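
Such a conversion might look like the sketch below for polygons (the path is illustrative, and assembling the full COCO records with ids, bboxes, and areas is omitted):

import xml.etree.ElementTree as ET

def cvat_points_to_coco(points_attr):
    # CVAT stores points as "x1,y1;x2,y2;...";
    # COCO expects a flat list [x1, y1, x2, y2, ...]
    seg = []
    for pair in points_attr.split(";"):
        x, y = pair.split(",")
        seg += [float(x), float(y)]
    return seg

tree = ET.parse("annotations.xml")  # CVAT XML dump; illustrative path
for image in tree.getroot().iter("image"):
    for shape in image.iter("polygon"):
        seg = cvat_points_to_coco(shape.get("points"))
        print(image.get("id"), shape.get("label"), seg)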

Steps to Reproduce (for bugs)

I haven't yet found a way to reproduce the bug. I tried to create a task with just the problematic frames, but when dumping annotations in COCO format, all objects were present.

Context

A task of 2400 frames with 2719 objects labeled as polygons, distributed across two categories.

Your Environment

  • Git hash commit (git log -1): commit 07de714 (HEAD -> master, tag: v1.0.0, origin/master, origin/HEAD)
  • Docker version docker version (e.g. Docker 17.0.05): Docker version 19.03.8, build afacb8b7f0
  • Are you using Docker Swarm or Kubernetes?: Docker
  • Operating System and version (e.g. Linux, Windows, MacOS): Linux
  • Code example or link to GitHub repo or gist to reproduce problem:
  • Other diagnostic information / logs:
    Logs from `cvat` container

Next steps

You may join our Gitter channel for community support.

@Shravan1804 Shravan1804 changed the title Missing objects when dumping in coco format Missing objects when dumping in COCO format Jul 27, 2020
@Shravan1804 commented Jul 27, 2020

I just realized some of the objects were polylines instead of polygons. This is probably the cause of this issue. It would be nice if a warning was issued in such cases.

Is there a way to convert polylines to polygons within CVAT?

@thomasstats

I encountered this issue as well, though I did not have the polylines problem. I was supposed to raise this issue after discussing it on Gitter, but never got around to it. We discovered the issue when annotators marked a whole set of one object as another object, so we wrote a quick script to parse the dumped COCO JSON, replace the tag, and then reuploaded it. On reupload we noticed that some objects were missing and some objects were on totally different frames. We then replicated it without modifying anything, to make sure we hadn't done anything wrong. Dumping COCO, reuploading, and then dumping again produced two different outputs, so something in Datumaro is not very keen on COCO at the moment.

@zhiltsov-max commented Jul 28, 2020

@thomasstats, can you help in reproducing this problem? If you could share annotations in CVAT and COCO formats, this would greatly help. Thanks for reporting!

@azhavoro azhavoro added the bug Something isn't working label Jul 28, 2020
@zhiltsov-max commented Jul 28, 2020

@thomasstats, is it possible that missing annotations were grouped boxes or polygons? If so, could you describe the expected COCO output for such annotations?

@thomasstats

@zhiltsov-max

The initial dumped COCO annotation was this:
{"id": 1, "area": 605.0, "iscrowd": 0, "bbox": [525.87, 219.73, 14.409999999999968, 46.430000000000035], "segmentation": [[540.28, 220.69, 539.0, 263.92, 525.87, 266.16, 526.51, 219.73]], "category_id": 2, "image_id": 70}

After uploading and dumping as COCO again I get this:
{"id": 543, "area": 605.0, "iscrowd": 0, "bbox": [525.87, 219.73, 14.409999999999968, 46.430000000000035], "segmentation": [[540.28, 220.69, 539.0, 263.92, 525.87, 266.16, 526.51, 219.73]], "category_id": 2, "image_id": 70}

There are obviously other differences too. For instance, the initial dump was 6790 KB; after uploading and dumping again, the file is 4371 KB.

I've attached the sample below.

first_dump.zip
second_dump.zip

@zhiltsov-max

@thomasstats, from what I can see, there are the following differences:

  • in annotation ids, which are generated on the fly
  • in floating-point precision (the second file has numbers rounded to 2 digits). By the way, this is why the second file is smaller.

However, the number of annotations and their counts per image are the same, and I can't find differences in the annotations themselves. Is there anything I'm missing?

@thomasstats

@zhiltsov-max I'm not sure; that's just the actual output. But when displaying (as I said before), a significant portion of the annotations don't display, and quite a few of them display on the wrong frame. In the Gitter discussion, another person also mentioned they had a similar problem and got around it by using the bbox instead of the polygon, but in our case we need those polygons.

On top of that, is there any reason why there would even be a difference between the files? Shouldn't dumping, uploading, and then dumping again produce the same result, since it's being processed through Datumaro? It seems as if there's some kind of conversion going on during the upload that isn't happening during the download, which doesn't sound like it should be the case.

@nmanovic nmanovic added this to the 1.1.0-release milestone Jul 30, 2020
@zhiltsov-max

@thomasstats, checking the annotation files with the following Python script shows no differences in the annotations or in which images they appear on, except for annotation ids and coordinate precision (which implies some differences in the area field).


import json
import sys

def compare_objects(obj1, obj2, ignore_keys=None, fp_tolerance=0.1):
    ignore_keys = ignore_keys or set()
    if isinstance(obj1, dict):
        assert isinstance(obj2, dict), "{} != {}".format(obj1, obj2)
        for k, v1 in obj1.items():
            if k in ignore_keys:
                continue
            v2 = obj2[k]
            compare_objects(v1, v2, ignore_keys, fp_tolerance)
    elif isinstance(obj1, list):
        assert isinstance(obj2, list), "{} != {}".format(obj1, obj2)
        assert len(obj1) == len(obj2), "{} != {}".format(obj1, obj2)
        for v1, v2 in zip(obj1, obj2):
            compare_objects(v1, v2, ignore_keys, fp_tolerance)
    else:
        if isinstance(obj1, float) or isinstance(obj2, float):
            assert abs(obj1 - obj2) < fp_tolerance, "%f != %f" % (obj1, obj2)
        else:
            assert obj1 == obj2

def try_compare_objects(*args, **kwargs):
    try:
        compare_objects(*args, **kwargs)
        return True
    except AssertionError:
        return False

def compare_datasets(a, b):
    compare_objects(a['categories'], b['categories'])
    compare_objects(a['images'], b['images'])

    for ann_a in a['annotations']:
        # We might find a few corresponding items, so check them all
        ann_b = [x for x in b['annotations']
            if try_compare_objects(ann_a, x, ignore_keys={'id', 'image_id', 'area'})
        ]
        if not ann_b:
            print("can't find matches for", ann_a['id'], ann_a)
        else:
            if ann_a['image_id'] != ann_b[0]['image_id']:
                print('match', ann_a['id'], ann_b[0]['id'], ann_a['image_id'], ann_b[0]['image_id'])
            # Remove the matched item so it can't be matched twice.
            # This must stay inside the else branch: indexing ann_b[0]
            # when no match was found would raise an IndexError.
            b['annotations'].remove(ann_b[0])
    assert len(b['annotations']) == 0, b['annotations']

with open(sys.argv[1]) as f:
    a = json.load(f)
with open(sys.argv[2]) as f:
    b = json.load(f)

compare_datasets(a, b)
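
(The script can be invoked as, e.g., `python compare.py first_dump.json second_dump.json`; the script name is arbitrary.)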

I can imagine there could be some other problems on the path from uploading to displaying in the UI, but the annotations themselves seem to be equal. Could you dump and share the CVAT for images annotations to check further?

However, it's really interesting where the differences came from, and that is a topic for investigation. Speaking about dumping, uploading, and annotation equality: there are many formats, and each of them has its own specifics. A format has to be mapped onto the tool's model somehow, and sometimes it is impossible to do this unambiguously. For example, there are 2 fields in the COCO format: bbox and segmentation. We can import a bbox, or polygons, or both. The same applies to export: we can generate a bbox from polygons, or leave this field empty, or, vice versa, generate polygon points from a rectangle. What should we do to provide an adequate experience? Such questions arise with almost every format. This is why import and export are not the same as load and save, which is the case only for the CVAT for * formats.
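
As a toy illustration of that ambiguity (not the actual export code), deriving one field from the other works in both directions:

def bbox_from_polygon(seg):
    # seg is COCO's flat [x1, y1, x2, y2, ...]; returns [x, y, width, height]
    xs, ys = seg[0::2], seg[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

def polygon_from_bbox(bbox):
    # the reverse direction: the four corners of the rectangle as one polygon
    x, y, w, h = bbox
    return [x, y, x + w, y, x + w, y + h, x, y + h]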

@thomasstats commented Jul 31, 2020

@zhiltsov-max

Attached. If this isn't good enough and there's a way to send a private message (I can't make my actual images public), I can send screenshots of actual image examples too.

cvat images.zip

What should we do to provide adequate experience?

I'm not entirely sure whether it's possible, but couldn't there be options for prioritizing segments or prioritizing boxes during dump and upload?

@zhiltsov-max

@thomasstats, comparing this file with the second one, I could only find a difference in annotation types: all boxes are converted to polygons. Here are 2 patches (#1953, #1970) to better preserve CVAT annotations in the COCO format. However, they do nothing about missing annotations, as nothing seems to be missing yet.

We've thought about providing a context menu for exporting and importing with options; this is reflected in #1804.

@thomasstats commented Jul 31, 2020

Perhaps the differences are a red herring and it's a client-side issue, then? I've been trying to reproduce it with a toy sample, but I haven't been able to so far. However, the issue does exist on several of our datasets, not just the one.

@nmanovic

@bsekachev, could you please take a look?

@bsekachev

@thomasstats

To sum up the conversation, as I understand it:

  • You have an annotation task that you dump in COCO format. You then upload the dumped file back to CVAT (without any changes), open the task, and some objects aren't displayed, while some are displayed on the wrong frames.
  • Also, if you dump annotations for the task again, you get a smaller file (because of truncated floating-point coordinates).
  • And you cannot reproduce the issue on a toy example, but it appears in your real datasets.

Is everything right?

The first question is: which CVAT version do you use? Could you provide a commit hash?

@thomasstats commented Aug 3, 2020

@bsekachev correct. Here are my details:

Git hash commit (git log -1): Release 1.0.0 (#1335)
Docker version docker version (e.g. Docker 17.0.05): 19.03.8
Operating System and version (e.g. Linux, Windows, MacOS): Ubuntu 18.04

Though I was able to replicate it locally (with our dataset) using Windows 10 and commit 7679434.

@bsekachev

@thomasstats

Sorry for the delay, I was on vacation last week.
Release 1.0.0 is quite outdated. We have fixed a lot of issues since then, and your bug may be one of them.
For a number of reasons we support only the latest CVAT version, so it would be great if you updated CVAT to the latest release and confirmed that the issues we summed up before still persist.

@thomasstats

@bsekachev Thanks for the update, but as stated previously, I was able to replicate this issue with the same dataset that's having the problem on the 1.1 beta, commit 7679434.

@bsekachev commented Aug 10, 2020

@thomasstats

Oh, my inattentiveness.
In that case, are you able to provide the server responses with annotations that the client gets before and after the dump and upload?
We first need to localize the problem and understand whether it is a client-side or a server-side issue.
You can use the request http://localhost:8080/api/v1/jobs/<job_id>/annotations, I suppose.

And it would be great if you provided the server ids of the objects where you see the issue. If you do not know how to get them from the client, let me know.
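
A minimal sketch of fetching that response (the credentials and job id are placeholders; authentication details depend on your setup):

import requests

session = requests.Session()
session.auth = ("user", "password")  # placeholder credentials
resp = session.get("http://localhost:8080/api/v1/jobs/1/annotations")
resp.raise_for_status()
# Save the raw server response for comparison before and after the upload
with open("job_1_annotations.json", "w") as f:
    f.write(resp.text)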

@zhiltsov-max

Closing, as there has been no response for a long time.

@giuseta commented Mar 26, 2021

Please reopen, this issue is still valid. Polyline annotations are not reported in COCO format, whereas they are present in CVAT format XML files.

@bsekachev

@zhiltsov-max

Could you please look?

@bsekachev bsekachev reopened this Mar 26, 2021
@zhiltsov-max

The COCO format does not know about polylines; the official format description is here: https://cocodataset.org/#format-data. Do you want us to implement our own extension?

@giuseta commented Mar 26, 2021

I'll explain what I am doing right now and why. I define a thin polygon around the polyline, with a thickness of just a few pixels. I do that also because I cannot find any deep model that predicts polylines, while there are models that predict masks.
It could be a workaround, but I am not sure about it; maybe developing an extension is more elegant, and then the conversion from polyline to segmentation would be a preprocessing phase implemented by the ML developer.

A bit off-topic: it seems that there is an analogous problem with simple points, i.e. they are not translated into COCO's keypoints.
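
For reference, the polyline-to-thin-polygon trick could be sketched with shapely's buffer (a minimal sketch; the thickness value and the sample points are illustrative):

from shapely.geometry import LineString

def polyline_to_thin_polygon(points, thickness=3.0):
    # Buffer the polyline by half the thickness on each side, then flatten
    # the exterior ring into COCO's [x1, y1, x2, y2, ...] layout
    poly = LineString(points).buffer(thickness / 2.0, cap_style=2, join_style=2)
    return [c for xy in poly.exterior.coords for c in xy]

# e.g. the start of the first "brown_spots" polyline from frame 211 above
points = [(463.16, 474.53), (463.16, 499.98), (467.65, 525.43), (487.11, 517.95)]
print(polyline_to_thin_polygon(points))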

@zhiltsov-max

@giuseta, it looks like you're not bound to COCO, because no models working with COCO support polylines, and they are not supposed to. Maybe you could try the CVAT for images (xml) or Datumaro (json) formats?

A bit off-topic: it seems that there is an analogous problem with simple points, i.e. they are not translated into coco's keypoints.

It is partially supported, read here: #2910 (comment)

@nmanovic

I will close the issue as outdated.
