Title
Generating Benchmarks for Commonsense Hard Object Detection in Images
Presentation Type
Event
Start Date
27-4-2019 9:30 AM
End Date
3-5-2019 10:44 AM
Abstract
Massive class imbalance in object detection can cause major issues in applications such as autonomous driving. State-of-the-art object detection techniques, e.g., YOLO (You Only Look Once), can produce detection anomalies in images and videos to some extent because of class imbalance. For example, a person’s arm or leg may be identified by YOLO as a tennis racket. If used in automated systems, such object detectors can sometimes lead to dangerous situations. In May 2016, a semi-autonomous Tesla vehicle collided with a truck after mistaking the truck for an overpass. One reason for this can be the lack of commonsense knowledge in the automation; the absence of commonsense knowledge is a reason class imbalance leads to anomalies. Hence, our focus in this research sub-problem is generating benchmark images for commonsense hard object detection, which poses a great challenge. Note that we focus only on images, not videos, in this work. The procedure to generate these benchmarks is as follows. By embedding commonsense properties [part of, overlap with, collocate], spatial correlations are generated between bounding boxes of parts in images, and images with odd correlations are considered anomalies (e.g., a racket that “overlaps with” a pedestrian crossing is an odd correlation). However, manually finding all images with such commonsense anomalies is a difficult and monotonous task; it does not scale well and is not methodologically sound for big data. For example, if we look at 10,000 images, we might find that YOLO does well on 60% of them, while the remaining 4,000 images contain anomalies due to commonsense errors. Finding these erroneous images is very tedious and not scalable. Therefore, we need a faster, semi-automated approach. We achieve this through a domain-independent procedure for creating a standard: the output of an automated spatial correlation script is compared against manually created spatial correlations. This helps generate benchmarks that provide commonsense hard images for object detection, as shown in our work. These generated benchmarks can then be used by systems such as YOLO to assess their accuracy on images.
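To make the spatial-correlation step concrete, the following is a minimal sketch of how relations such as "part of" and "overlaps with" could be derived from detected bounding boxes and checked against commonsense expectations. It is not the authors' actual script; the class names, thresholds, and the ODD_PAIRS table are illustrative assumptions.

```python
# Sketch: derive spatial relations between bounding boxes and flag
# pairings that violate commonsense (illustrative, not the original script).
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def intersection_area(a: Box, b: Box) -> float:
    """Area of overlap between two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def area(box: Box) -> float:
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)

def spatial_relation(a: Box, b: Box, part_thresh: float = 0.9) -> str:
    """Map box geometry to one of the commonsense relations in the abstract."""
    inter = intersection_area(a, b)
    if inter == 0.0:
        return "disjoint"
    # If one box is (almost) fully inside the other, treat it as "part of".
    if inter / min(area(a), area(b)) >= part_thresh:
        return "part of"
    return "overlaps with"

# Hypothetical table of label pairings that commonsense says should not occur.
ODD_PAIRS = {
    ("tennis racket", "pedestrian crossing"): {"overlaps with", "part of"},
    ("tennis racket", "person"): {"part of"},
}

def find_anomalies(detections: List[Dict]) -> List[Tuple[str, str, str]]:
    """Return (label_a, label_b, relation) triples that violate commonsense."""
    anomalies = []
    for i, d1 in enumerate(detections):
        for d2 in detections[i + 1:]:
            rel = spatial_relation(d1["box"], d2["box"])
            if rel in ODD_PAIRS.get((d1["label"], d2["label"]), set()):
                anomalies.append((d1["label"], d2["label"], rel))
    return anomalies

# Example: YOLO mislabels a person's arm as a tennis racket near a crossing.
detections = [
    {"label": "tennis racket", "box": (120.0, 80.0, 180.0, 160.0)},
    {"label": "pedestrian crossing", "box": (100.0, 100.0, 400.0, 300.0)},
]
print(find_anomalies(detections))
# -> [('tennis racket', 'pedestrian crossing', 'overlaps with')]
```

In the semi-automated procedure described above, output of this kind would then be compared against manually created spatial correlations, and images where the two disagree (or where odd correlations appear) would be collected as commonsense hard benchmark images.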