[doc] further detail for the doc for datasets/dataset_wrappers/ClassBalancedDataset #901

Merged
merged 2 commits into from
Nov 2, 2022
Changes from 1 commit
19 changes: 11 additions & 8 deletions mmcls/datasets/dataset_wrappers.py
@@ -175,17 +175,20 @@ class ClassBalancedDataset(object):

1. For each category c, compute the fraction :math:`f(c)` of images that
contain it.
- 2. For each category c, compute the category-level repeat factor
+ 2. For each category c, compute the category-level repeat factor.

.. math::
r(c) = \max(1, \sqrt{\frac{t}{f(c)}})

- 3. For each image I and its labels :math:`L(I)`, compute the image-level

+ where :math:`t` is `oversample_thr`.
+ 3. For each image I and its labels :math:`L(I)`, compute the image-level.
repeat factor

.. math::
r(I) = \max_{c \in L(I)} r(c)

Each image repeats :math:`\lceil r(I) \rceil` times.

Args:
dataset (:obj:`BaseDataset`): The dataset to be repeated.
oversample_thr (float): frequency threshold below which data is
@@ -214,8 +217,8 @@ def __init__(self, dataset, oversample_thr):
self.flag = np.asarray(flags, dtype=np.uint8)

def _get_repeat_factors(self, dataset, repeat_thr):
- # 1. For each category c, compute the fraction # of images
- # that contain it: f(c)
+ # 1. For each category c, compute the fraction of images
+ # that contain it: f(c)
category_freq = defaultdict(int)
num_images = len(dataset)
for idx in range(num_images):
@@ -227,15 +230,15 @@ def _get_repeat_factors(self, dataset, repeat_thr):
category_freq[k] = v / num_images

# 2. For each category c, compute the category-level repeat factor:
- # r(c) = max(1, sqrt(t/f(c)))
+ # r(c) = max(1, sqrt(t/f(c)))
category_repeat = {
cat_id: max(1.0, math.sqrt(repeat_thr / cat_freq))
for cat_id, cat_freq in category_freq.items()
}

# 3. For each image I and its labels L(I), compute the image-level
- # repeat factor:
- # r(I) = max_{c in L(I)} r(c)
+ # repeat factor:
+ # r(I) = max_{c in L(I)} r(c)
repeat_factors = []
for idx in range(num_images):
cat_ids = set(self.dataset.get_cat_ids(idx))
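
For readers who want to check the three docstring steps by hand, here is a minimal standalone sketch. It is not the mmcls implementation: `repeat_counts` and `labels_per_image` are hypothetical names chosen for illustration, and a plain list of label lists stands in for `dataset.get_cat_ids(idx)`. It also assumes an image with no labels should get a repeat factor of 1, which the diff above does not confirm.

```python
import math
from collections import defaultdict


def repeat_counts(labels_per_image, oversample_thr):
    """Return ceil(r(I)) for every image (illustrative helper, not mmcls API)."""
    num_images = len(labels_per_image)

    # 1. f(c): fraction of images that contain category c.
    category_count = defaultdict(int)
    for labels in labels_per_image:
        for cat_id in set(labels):  # count each category once per image
            category_count[cat_id] += 1
    category_freq = {c: n / num_images for c, n in category_count.items()}

    # 2. r(c) = max(1, sqrt(t / f(c))), with t = oversample_thr.
    category_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / f))
        for c, f in category_freq.items()
    }

    # 3. r(I) = max over the image's labels of r(c); each image is
    #    repeated ceil(r(I)) times. Unlabeled images default to 1.0
    #    (an assumption made for this sketch).
    counts = []
    for labels in labels_per_image:
        r_image = max((category_repeat[c] for c in set(labels)), default=1.0)
        counts.append(math.ceil(r_image))
    return counts


# Category 0 appears in all 4 images, category 1 in only one of them.
labels = [[0], [0], [0], [0, 1]]
print(repeat_counts(labels, oversample_thr=0.5))  # [1, 1, 1, 2]
```

With `oversample_thr = 0.5`, category 0 has f = 1.0 and r = 1, while category 1 has f = 0.25 and r = sqrt(2) ≈ 1.41, so only the last image is repeated twice.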