Missing objects when dumping in COCO format #1941

Closed
2 tasks done
Shravan1804 opened this issue Jul 27, 2020 · 24 comments

@Shravan1804

My actions before raising this issue

Expected Behaviour

Annotations file should contain all objects that were labeled in the task.

Current Behaviour

Dumping annotations in the CVAT XML format works fine, but when using the COCO format, some objects are missing in several frames.

For example with frame 211:

CVAT XML (correct annotations):
<image id="211" name="ppp_test_study_raw_processed_patched1024_labeling_task/00/_0123_A__SEP___h992_w1996.jpg" width="1024" height="1024">
<polygon label="pustules" occluded="0" points="651.79,589.81;635.32,624.24;666.76,661.67;698.20,639.21;696.70,610.77">
</polygon>
<polyline label="brown_spots" occluded="0" points="463.16,474.53;463.16,499.98;467.65,525.43;487.11,517.95;488.61,505.97;488.61,489.50;461.66,471.54">
</polyline>
<polyline label="brown_spots" occluded="0" points="295.48,702.09;289.49,729.04;313.45,744.01;328.42,727.54;325.42,711.07;292.49,703.59">
</polyline>
<polyline label="pustules" occluded="0" points="657.78,814.37;642.80,839.82;659.27,859.28;683.23,859.28;687.72,826.35;663.76,823.35;659.27,812.87">
</polyline>
</image>

COCO JSON (incomplete annotations; no other annotations are present in the file for this frame):
{"bbox": [635.3192982456148, 589.80859375, 62.87719298245611, 71.859242250730932],
"id": 828,
"segmentation": [[651.787109375, 589.80859375, 635.3192982456148, 624.2409356725148, 666.757894736842, 661.6678362573093, 698.196491228071, 639.2116959064333, 696.6994152046791, 610.767251461988]],
"category_id": 1,
"iscrowd": 0,
"image_id": 211,
"area": 2759.0}

There are no errors in the Docker logs:

2020-07-27 12:04:55,246 DEBG 'rqworker_default_0' stderr output:
12:04:55 default: cvat.apps.engine.annotation.dump_task_data('14', <SimpleLazyObject: <User: XX>>, '/home/django/data/14/YY.XX.2020_07_27_12_04_55.json', <AnnotationDumper: AnnotationDumper object (COCO JSON 1.0)>, 'http', 'ZZ:8080') (XX@/api/v1/tasks/14/annotations/COCO JSON 1.0/YY)

2020-07-27 12:04:59,239 DEBG 'rqworker_default_0' stderr output:
12:04:59 default: Job OK (XX@/api/v1/tasks/14/annotations/COCO JSON 1.0/YY)
INFO:rq.worker:default: Job OK (XX@/api/v1/tasks/14/annotations/COCO JSON 1.0/YY)

2020-07-27 12:04:59,239 DEBG 'rqworker_default_0' stderr output:
12:04:59 Result is kept for 500 seconds
INFO:rq.worker:Result is kept for 500 seconds

2020-07-27 12:04:59,260 DEBG 'rqworker_default_0' stderr output:
12:04:59 Cleaning registries for queue: default

Possible Solution

Temporary workaround: convert from CVAT XML to COCO JSON
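
Such a conversion might look like the sketch below for polygons (the path is illustrative, and assembling the full COCO records with ids, bboxes, and areas is omitted):

import xml.etree.ElementTree as ET

def cvat_points_to_coco(points_attr):
    # CVAT stores points as "x1,y1;x2,y2;...";
    # COCO expects a flat list [x1, y1, x2, y2, ...]
    seg = []
    for pair in points_attr.split(";"):
        x, y = pair.split(",")
        seg += [float(x), float(y)]
    return seg

tree = ET.parse("annotations.xml")  # CVAT XML dump; illustrative path
for image in tree.getroot().iter("image"):
    for shape in image.iter("polygon"):
        seg = cvat_points_to_coco(shape.get("points"))
        print(image.get("id"), shape.get("label"), seg)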

Steps to Reproduce (for bugs)

I haven't yet found a way to reproduce the bug. I tried to create a task with just the problematic frames, but when dumping annotations in COCO format, all objects were present.

Context

A task of 2400 frames with 2719 objects labeled as polygons, distributed across two categories.

Your Environment

  • Git hash commit (git log -1): commit 07de714 (HEAD -> master, tag: v1.0.0, origin/master, origin/HEAD)
  • Docker version docker version (e.g. Docker 17.0.05): Docker version 19.03.8, build afacb8b7f0
  • Are you using Docker Swarm or Kubernetes?: Docker
  • Operating System and version (e.g. Linux, Windows, MacOS): Linux
  • Code example or link to GitHub repo or gist to reproduce problem:
  • Other diagnostic information / logs:
    Logs from `cvat` container

Next steps

You may join our Gitter channel for community support.

@Shravan1804 Shravan1804 changed the title Missing objects when dumping in coco format Missing objects when dumping in COCO format Jul 27, 2020
@Shravan1804 commented Jul 27, 2020

I just realized some of the objects were polylines instead of polygons. This is probably the cause of this issue. It would be nice if a warning was issued in such cases.

Is there a way to convert polylines to polygons within CVAT?

@thomasstats

I encountered this issue as well, though I did not have the polylines problem. I was supposed to raise this issue after discussing it on Gitter, but never got around to it. We discovered the issue when annotators marked a whole set of one object as another object, so we wrote a quick script to parse the dumped COCO JSON, replace the tag, and then reuploaded it. On reupload we noticed that some objects were missing and some objects were on totally different frames. We then replicated it without modifying anything, to make sure we hadn't done anything wrong. Dumping COCO, reuploading, and then dumping again produced two different outputs, so something in Datumaro is not very keen on COCO at the moment.

@zhiltsov-max commented Jul 28, 2020

@thomasstats, can you help in reproducing this problem? If you could share annotations in CVAT and COCO formats, this would greatly help. Thanks for reporting!

@azhavoro azhavoro added the bug Something isn't working label Jul 28, 2020
@zhiltsov-max commented Jul 28, 2020

@thomasstats, is it possible that missing annotations were grouped boxes or polygons? If so, could you describe the expected COCO output for such annotations?

@thomasstats

@zhiltsov-max

The initial dumped COCO annotation was this:
{"id": 1, "area": 605.0, "iscrowd": 0, "bbox": [525.87, 219.73, 14.409999999999968, 46.430000000000035], "segmentation": [[540.28, 220.69, 539.0, 263.92, 525.87, 266.16, 526.51, 219.73]], "category_id": 2, "image_id": 70}

After uploading and dumping as COCO again I get this:
{"id": 543, "area": 605.0, "iscrowd": 0, "bbox": [525.87, 219.73, 14.409999999999968, 46.430000000000035], "segmentation": [[540.28, 220.69, 539.0, 263.92, 525.87, 266.16, 526.51, 219.73]], "category_id": 2, "image_id": 70}

There are obviously other differences too. For instance, the initial dump was 6790 KB; after uploading and dumping again, the file is 4371 KB.

I've attached the sample below.

first_dump.zip
second_dump.zip

@zhiltsov-max

@thomasstats, from what I can see, there are the following differences:

  • in annotation ids, which are generated on the fly
  • in floating-point precision (the second file has numbers rounded to 2 digits). By the way, this is why the second file is smaller.

However, the number of annotations and their counts per image are the same, and I can't find differences in the annotations themselves. Is there anything I'm missing?

@thomasstats

@zhiltsov-max I'm not sure; that's just the actual output. But when displaying (as I said before), a significant portion of the annotations don't display, and quite a few of them display on the wrong frame. In the Gitter discussion, another person also mentioned they had a similar problem and got around it by using the bbox instead of the polygon, but in our case we need those polygons.

On top of that, is there any reason why there would even be a difference between the files? Shouldn't dumping, uploading, and then dumping again produce the same result, since it's being processed through Datumaro? It seems as if there's some kind of conversion going on during the upload that isn't happening during the download, which doesn't sound like it should be the case.

@nmanovic nmanovic added this to the 1.1.0-release milestone Jul 30, 2020
@zhiltsov-max

@thomasstats, checking the annotation files with the following Python script shows no differences in the annotations or in which images they appear on, except for annotation ids and coordinate precision (which implies some differences in the area field).


import json
import sys

def compare_objects(obj1, obj2, ignore_keys=None, fp_tolerance=0.1):
    ignore_keys = ignore_keys or set()
    if isinstance(obj1, dict):
        assert isinstance(obj2, dict), "{} != {}".format(obj1, obj2)
        for k, v1 in obj1.items():
            if k in ignore_keys:
                continue
            v2 = obj2[k]
            compare_objects(v1, v2, ignore_keys, fp_tolerance)
    elif isinstance(obj1, list):
        assert isinstance(obj2, list), "{} != {}".format(obj1, obj2)
        assert len(obj1) == len(obj2), "{} != {}".format(obj1, obj2)
        for v1, v2 in zip(obj1, obj2):
            compare_objects(v1, v2, ignore_keys, fp_tolerance)
    else:
        if isinstance(obj1, float) or isinstance(obj2, float):
            assert abs(obj1 - obj2) < fp_tolerance, "%f != %f" % (obj1, obj2)
        else:
            assert obj1 == obj2

def try_compare_objects(*args, **kwargs):
    try:
        compare_objects(*args, **kwargs)
        return True
    except AssertionError:
        return False

def compare_datasets(a, b):
    compare_objects(a['categories'], b['categories'])
    compare_objects(a['images'], b['images'])

    for ann_a in a['annotations']:
        # We might find a few corresponding items, so check them all
        ann_b = [x for x in b['annotations']
            if try_compare_objects(ann_a, x, ignore_keys={'id', 'image_id', 'area'})
        ]
        if not ann_b:
            print("can't find matches for", ann_a['id'], ann_a)
        else:
            if ann_a['image_id'] != ann_b[0]['image_id']:
                print('match', ann_a['id'], ann_b[0]['id'], ann_a['image_id'], ann_b[0]['image_id'])
            # Remove the matched item so it can't be matched twice.
            # This must stay inside the else branch: indexing ann_b[0]
            # when no match was found would raise an IndexError.
            b['annotations'].remove(ann_b[0])
    assert len(b['annotations']) == 0, b['annotations']

with open(sys.argv[1]) as f:
    a = json.load(f)
with open(sys.argv[2]) as f:
    b = json.load(f)

compare_datasets(a, b)
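
(The script can be invoked as, e.g., `python compare.py first_dump.json second_dump.json`; the script name is arbitrary.)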

I can imagine there could be some other problems on the path from uploading to displaying in the UI, but the annotations themselves seem to be equal. Could you dump and share the CVAT for images annotations to check further?

However, it's really interesting where the differences came from, and that is a topic for investigation. Speaking about dumping, uploading, and annotation equality: there are many formats, and each of them has its own specifics. A format has to be mapped onto the tool's model somehow, and sometimes it is impossible to do this unambiguously. For example, there are 2 fields in the COCO format: bbox and segmentation. We can import a bbox, or polygons, or both. The same applies to export: we can generate a bbox from polygons, or leave this field empty, or, vice versa, generate polygon points from a rectangle. What should we do to provide an adequate experience? Such questions arise with almost every format. This is why import and export are not the same as load and save, which is the case only for the CVAT for * formats.
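
As a toy illustration of that ambiguity (not the actual export code), deriving one field from the other works in both directions:

def bbox_from_polygon(seg):
    # seg is COCO's flat [x1, y1, x2, y2, ...]; returns [x, y, width, height]
    xs, ys = seg[0::2], seg[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

def polygon_from_bbox(bbox):
    # the reverse direction: the four corners of the rectangle as one polygon
    x, y, w, h = bbox
    return [x, y, x + w, y, x + w, y + h, x, y + h]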

@thomasstats commented Jul 31, 2020

@zhiltsov-max

Attached. If this isn't good enough and there's a way to send a private message (I can't make my actual images public), I can send screenshots of actual image examples too.

cvat images.zip

What should we do to provide adequate experience?

I'm not entirely sure whether it's possible, but couldn't there be options for prioritizing segments or prioritizing boxes during dump and upload?

@zhiltsov-max

@thomasstats, comparing this file with the second one, I could only find a difference in annotation types: all boxes are converted to polygons. Here are 2 patches (#1953, #1970) to better preserve CVAT annotations in the COCO format. However, they do nothing about missing annotations, as nothing seems to be missing yet.

We've thought about providing a context menu for exporting and importing with options; this is reflected in #1804.

@thomasstats commented Jul 31, 2020

Perhaps the differences are a red herring and it's a client-side issue, then? I've been trying to reproduce it with a toy sample, but I haven't been able to so far. However, the issue does exist on several of our datasets, not just the one.

@nmanovic

@bsekachev, could you please take a look?

@bsekachev

@thomasstats

To sum up the conversation, as I understand it:

  • You have an annotation task that you dump in COCO format. You then upload the dumped file back to CVAT (without any changes), open the task, and some objects aren't displayed, while some are displayed on the wrong frames.
  • Also, if you dump annotations for the task again, you get a smaller file (because of truncated floating-point coordinates).
  • And you cannot reproduce the issue on a toy example, but it appears in your real datasets.

Is everything right?

The first question is: which CVAT version do you use? Could you provide a commit hash?

@thomasstats commented Aug 3, 2020

@bsekachev correct. Here are my details:

Git hash commit (git log -1): Release 1.0.0 (#1335)
Docker version docker version (e.g. Docker 17.0.05): 19.03.8
Operating System and version (e.g. Linux, Windows, MacOS): Ubuntu 18.04

Though I was able to replicate it locally (with our dataset) using Windows 10 and commit 7679434.

@bsekachev

@thomasstats

Sorry for the delay, I was on vacation last week.
Release 1.0.0 is quite outdated. We have fixed a lot of issues since then, and your bug may be one of them.
For a number of reasons we support only the latest CVAT version, so it would be great if you updated CVAT to the latest release and confirmed that the issues we summed up before still persist.

@thomasstats

@bsekachev Thanks for the update, but as stated previously, I was able to replicate this issue with the same dataset that's having the problem on the 1.1 beta, commit 7679434.

@bsekachev commented Aug 10, 2020

@thomasstats

Oh, my inattentiveness.
In that case, are you able to provide the server responses with annotations that the client gets before and after the dump and upload?
We first need to localize the problem and understand whether it is a client-side or a server-side issue.
You can use the request http://localhost:8080/api/v1/jobs/<job_id>/annotations, I suppose.

And it would be great if you provided the server ids of the objects where you see the issue. If you do not know how to get them from the client, let me know.
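
A minimal sketch of fetching that response (the credentials and job id are placeholders; authentication details depend on your setup):

import requests

session = requests.Session()
session.auth = ("user", "password")  # placeholder credentials
resp = session.get("http://localhost:8080/api/v1/jobs/1/annotations")
resp.raise_for_status()
# Save the raw server response for comparison before and after the upload
with open("job_1_annotations.json", "w") as f:
    f.write(resp.text)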

@zhiltsov-max

Closing, as there has been no response for a long time.

@giuseta commented Mar 26, 2021

Please reopen, this issue is still valid. Polyline annotations are not reported in COCO format, whereas they are present in CVAT format XML files.

@bsekachev

@zhiltsov-max

Could you please look?

@bsekachev bsekachev reopened this Mar 26, 2021
@zhiltsov-max

The COCO format does not know about polylines; the official format description is here: https://cocodataset.org/#format-data. Do you want us to implement our own extension?

@giuseta commented Mar 26, 2021

I'll explain what I am doing right now and why. I define a thin polygon around the polyline, with a thickness of just a few pixels. I do that also because I cannot find any deep model that predicts polylines, while there are models that predict masks.
It could be a workaround, but I am not sure about it; maybe developing an extension is more elegant, and then the conversion from polyline to segmentation would be a preprocessing phase implemented by the ML developer.

A bit off-topic: it seems that there is an analogous problem with simple points, i.e. they are not translated into COCO's keypoints.
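
For reference, the polyline-to-thin-polygon trick could be sketched with shapely's buffer (a minimal sketch; the thickness value and the sample points are illustrative):

from shapely.geometry import LineString

def polyline_to_thin_polygon(points, thickness=3.0):
    # Buffer the polyline by half the thickness on each side, then flatten
    # the exterior ring into COCO's [x1, y1, x2, y2, ...] layout
    poly = LineString(points).buffer(thickness / 2.0, cap_style=2, join_style=2)
    return [c for xy in poly.exterior.coords for c in xy]

# e.g. the start of the first "brown_spots" polyline from frame 211 above
points = [(463.16, 474.53), (463.16, 499.98), (467.65, 525.43), (487.11, 517.95)]
print(polyline_to_thin_polygon(points))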

@zhiltsov-max

@giuseta, it looks like you're not bound to COCO, because no models working with COCO support polylines, and they are not supposed to. Maybe you could try the CVAT for images (xml) or Datumaro (json) formats?

A bit off-topic: it seems that there is an analogous problem with simple points, i.e. they are not translated into coco's keypoints.

It is partially supported, read here: #2910 (comment)

@nmanovic

I will close the issue as outdated.
