Artificial object balancing

Rafael Iriya
5 min readDec 11, 2020

Object detection has an intrinsic problem of class imbalance. It is very unlikely that a real-world dataset containing an unlimited number of objects per image will have a similar total number of objects per class. This results in a learning bias towards the more frequent classes, which in many cases is undesirable. In order to mitigate the imbalance, a traditional approach involves feeding images to the network in a specific order, such that we hold back on feeding objects of a too frequent class. Such an approach fails to properly balance the dataset as only a few images may contain a specific combination of class instances. Another approach involves modifying the network’s loss function by applying class weights corresponding to the class frequencies. This approach work in some cases, but might create instabilities in the training and does not guarantee better overall precision and recall.

At this point, we notice it’s very hard to tame the dataset’s imbalanced nature without actually modifying the images. One way to do so would be by adding artificial objects to the images. This allows to effectively balance the dataset and obtain the exact same total number of instances per class. In this topic, we discuss how to balance the dataset if artificial objects are available. However, how to obtain artificial objects or where to place them in the images I’ll leave it to another post.

Problem statement

So here is the problem statement: Given a dataset with N classes and an array of class frequency distribution, how many objects from each class needs to be added so that the new frequency distribution is uniform? For example, we might have a dataset with N=4, with classes “CAR”, “DOG”, “CAT”, “PERSON”, with 10 images and the following frequency distribution:

CAR: 5

DOG : 10

CAT: 2

PERSON : 3

What is the solution?

Simple Solution

There is a very intuitive solution to this problem, we just need to add objects until all the classes have the same number as the most frequent class. In this case, we need to add 5 objects to CAR, 0 to DOG, 8 to CAT, and 7 to PERSON and all classes will have 10 objects.

OK, now we know how many objects we need to add. But if we take not the entire dataset, but only a part of the dataset or more specifically, given one image in the dataset: 1) how many objects per image do we need to add? 2) which objects should we add?

Number of objects per image

This one is very simple. We simply divide the total number of additions by the number of images in the dataset. In our example : 5 + 0 + 8 + 7 = 20, since we have 10 images, 20/10 = 2 objects per image. So we know that on average 2 objects are needed to balance the dataset. If for every image, we add the average number of additions, on average we will balance the dataset. Some images might contain more or less objects, but after the additions, on average they will all contain the same number of objects.

But which objects are we adding?

PS: If the number of objects per image is not an integer, we can round it to the nearest integer. This rounding error should not affect the solution significantly.

Determining the added objects

The best way to do this is to think probabilistically. What’s the probability of adding an object of class i? Given the additions array we calculated earlier [5,0,8,7], the probability of addition for each element will be the element divided by the array sum (the total number of additions). In this case, it would be [1/4,0, 2/5, 7/20]. Cool, so now that we have this probabilities array, all we need to do is add 2 objects per image, which are selected by drawing from this probability distribution. How do we do that? This is a problem mathematically formulated by a random variable transformation from uniform to an arbitrary pdf. The answer lies in the cumulative distribution function. This is also a famous leetcode question (#528 — Random Pick with Weight). To make a long story short, here is the algorithm:

1) Compute the cumulative sum of the probability array into an array cum_prob = [1/4,1/4, 13/20,1].

2) Draw a random number between 0 and 1.

3) Find the interval in cum_prob where that number lies. The index of this interval will be the chosen class. This can be efficiently performed with a binary search.

Finally here is the code for balancing:

for _ in range(number_additions_per_img):             
rand = random.random()
ind = bisect.bisect_left(probs_cum, rand)
new_freq[ind] += 1

Large number of additions per image

The simple solution works when the number of additions per image is small, compared to the image size. When it is not, the image starts running out of space to put the artificial objects. What should we do in this case? The problem is so far we only considered ADDING artificial objects but in fact, we could also think of SUBTRACTING objects from the most frequent classes. Now the solution becomes a little more tricky. How do we determine how many objects to add or subtract for each class in order to balance the dataset?

Linear system of equations

We can formulate the problem as following: Given classes 1,2,…,N with original frequencies f1,f2,…,fN, we need to add a1 objects to class 1, a2 objects to class 2, …, aN objects to class N and balance the frequencies. We can put this in the form of equations:

(f1 + a1) / (F + a1 + a2 + … + aN) = 1 / N

(f2 + a2) / (F+ a1 + a2 + … + aN) = 1 / N

(fN + aN) / (F+ a1 + a2 + … + aN) = 1 / N

F = f1 + f2 + … + fN

In matrix form:

Nice, this a linear system Ax = b, which we can solve by inverting A, right? Wrong! The matrix A is singular, which means it’s non-invertible and there is not a unique solution.

So what else can we do? Well, there’s a standard optimization solution — Least Squares — which minimizes ||Ax -b || = 0. However, this is still not what we are looking for as it does not guarantee to find the solution with minimal changes to the number of objects. The solution we are looking for is an x which minimizes:

||Ax — b||, subject to ||x|| = 0

Done, all we need to do is solve this convex optimization problem. Luckily there are libraries written to solve such kind of problems. Matlab has the inbuilt function lsqlin which people have already ported to python and other languages. You can find such a solution here :

An interesting fact is that if we solve for ||Ax — b||, subject to x > 0, we end up with something close to the simple solution discussed previously.

After we find x, we can proceed as before and calculate the probabilities, however, we need to be mindful and account for negative signs.

A complete example code is found here.

--

--

Rafael Iriya

PhD, computer vision engineer and research scientist