4. Experimental Results
We present results on the ETHZ shape classes [10]. The dataset has 5 object categories with 255 images in total. All categories exhibit significant intra-class variation, scale changes, and illumination changes. Moreover, many objects are surrounded by extensive background clutter and have interior contours. The dataset comes with ground-truth gray-level edge maps, which is a very important factor for fair comparison, in particular for contour-based methods. Fig. 7 shows P/R curves for three methods: Contour Selection by Zhu et al. [34], Ferrari et al. [8], and our method. We selected these two methods for comparison because they are also contour-based and because a direct comparison is possible: [34] published P/R curves in their paper, and [8] released their code. We first adopt the same criterion used in [8] and [34], i.e., a detection is deemed correct if the detected bounding box covers over 20% of the ground-truth bounding box. Our approach performs better than [8] on four categories (the exception is "Mugs") and also outperforms [34] on four categories (the exception is "Bottles").
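To make the acceptance test concrete, the following minimal Python sketch implements two overlap measures. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and reads the 20% criterion literally as the fraction of the ground-truth box covered by the detection; for comparison it also implements the PASCAL-style intersection-over-union measure used for the stricter 50% criterion. The names `Box`, `gt_coverage`, `iou`, and `is_correct` are hypothetical and are not taken from the code of [8] or [34].

```python
from typing import Callable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they are disjoint)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def gt_coverage(det: Box, gt: Box) -> float:
    """Fraction of the ground-truth box covered by the detection (20% criterion, read literally)."""
    gt_area = gt.area()
    return intersection_area(det, gt) / gt_area if gt_area > 0 else 0.0


def iou(det: Box, gt: Box) -> float:
    """PASCAL-style intersection over union (used for the 50% criterion)."""
    inter = intersection_area(det, gt)
    union = det.area() + gt.area() - inter
    return inter / union if union > 0 else 0.0


def is_correct(det: Box, gt: Box, threshold: float,
               measure: Callable[[Box, Box], float] = iou) -> bool:
    """A detection counts as correct if the chosen overlap measure exceeds the threshold."""
    return measure(det, gt) > threshold


if __name__ == "__main__":
    gt = Box(0, 0, 10, 10)
    det = Box(5, 0, 15, 10)  # shifted right by half the box width
    print(gt_coverage(det, gt))                   # 0.5
    print(round(iou(det, gt), 3))                 # 0.333
    print(is_correct(det, gt, 0.2, gt_coverage))  # True
    print(is_correct(det, gt, 0.5))               # False
```

In the usage example, a detection shifted by half the object width is accepted under the 20% criterion but rejected under 50% intersection over union, which illustrates why a 20% overlap does not necessarily indicate a true detection.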
Following the experimental setup in [34], we use only the single hand-drawn model provided for each class. Since the 20% overlap criterion may accept detections that are not true detections, we also report results with 50% overlap, which is the standard measure on the PASCAL collection. Our P/R curves with 20% and 50% overlap are identical for "Applelogos" and "Swans". For "Mugs", the performance of our system changes little under the 50% criterion. For "Bottles" and "Giraffes" we observe a drop with 50% overlap, but our performance is still better than that of [8]. Neither [34] nor [8] reports results with 50% overlap. However, by running the released code of [8], we are able to report P/R results with both 20% and 50% overlap on the classes "Bottles", "Giraffes", and "Mugs"; [8] itself reports only detection rate (DR) vs. false positives per image (FPPI). Since we were not able to run the code successfully on "Swans" and "Applelogos", for these two classes we report the translation of their results into P/R given in [34]. The P/R curves show that our method performs significantly better than the other two methods on the non-rigid objects "Swans" and "Giraffes". Here we benefit from our novel shape descriptor, whereas [8] uses the Thin-Plate Spline Robust Point Matching algorithm (TPS-RPM) to fine-tune the detected contour and [34] uses shape context as its shape descriptor. To illustrate the benefits of our new shape descriptor in the presence of noise and deformation, we compare it with shape context (SC) [1] on the Kimia99 dataset [24] in Table 1. This dataset has substantial intra-class deformation.
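Both P/R curves and DR/FPPI curves are obtained by sweeping a score threshold over a ranked list of detections, and recall is the same quantity as the detection rate. The minimal sketch below assumes each detection has already been labeled as a true or false positive with a bounding-box overlap criterion, and that each ground-truth object is matched by at most one detection. The helper `dr_fppi_to_precision` shows one way a DR/FPPI point can be converted into precision when the numbers of ground-truth objects and test images are known; it is an illustration under that assumption, not necessarily the exact procedure used in [34]. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ranked_curves(detections: Sequence[Tuple[float, bool]],
                  num_gt: int, num_images: int
                  ) -> Tuple[List[float], List[float], List[float]]:
    """Sweep a score threshold over ranked detections.

    detections: (score, is_true_positive) pairs; each ground-truth object is
    assumed to be matched by at most one detection.
    Returns (recall, precision, fppi), one entry per threshold.
    Recall here equals the detection rate (DR).
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recall: List[float] = []
    precision: List[float] = []
    fppi: List[float] = []
    for _, is_tp in ranked:  # lower the threshold one detection at a time
        if is_tp:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))
        fppi.append(fp / num_images)
    return recall, precision, fppi


def dr_fppi_to_precision(dr: float, fppi: float,
                         num_gt: int, num_images: int) -> float:
    """Convert one DR/FPPI point to precision, given dataset counts."""
    tp = dr * num_gt        # number of true positives at this operating point
    fp = fppi * num_images  # total number of false positives at this point
    # With no detections at all, precision is taken as 1 by convention.
    return tp / (tp + fp) if tp + fp > 0 else 1.0


if __name__ == "__main__":
    dets = [(0.9, True), (0.8, False), (0.7, True), (0.3, False)]
    r, p, f = ranked_curves(dets, num_gt=2, num_images=4)
    print(r)  # [0.5, 0.5, 1.0, 1.0]
    print(f)  # [0.0, 0.25, 0.25, 0.5]
    print(round(dr_fppi_to_precision(1.0, 0.25, 2, 4), 3))  # 0.667, same as p[2]
```

Because both curves come from the same cumulative true/false positive counts, a DR/FPPI point maps to a unique precision once the dataset counts are fixed.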
Table 1. Retrieval results on the Kimia99 dataset.
We use 150 particles and K = 25 nearest neighbors for the proposal distribution. For the shape descriptor, we use 6 distance bins (in log space) and 12 angle bins (between 0 and π).
We also evaluate our results with detection rate vs. false positives per image (DR/FPPI) in Fig. 8. We quote the other curves from [9], a longer version of [8], and compare our system to three methods: [10], [9], and Chamfer matching (also reported in [9]). All methods use the 20% bounding-box overlap criterion. The results show that our method outperforms all of them at 0.3 FPPI and is better in four categories (all except "Swans") at 0.4 FPPI. Our precisions at 0.3/0.4 FPPI are: Applelogos 92.5/92.5, Bottles 95.8/95.8, Giraffes 86.2/92.0, Mugs 83.3/85.4, Swans 93.8/93.8.
Some detection examples of our method are shown in Fig. 9. Since we group edge fragments, the detected objects are precisely localized, in contrast to appearance-based sliding-window approaches. We also show some false positives in the bottom row.
In addition to the well-known sequential filtering benefit of particle filters, which implement delayed decision in a sound statistical framework, one of the main benefits of the proposed PF framework for grouping edge fragments is that global shape similarity can be employed explicitly. It measures how similar the edge fragments of each particle are to the model contour, and thus provides a strong likelihood function for evaluating each particle. Since each particle carries a contour hypothesis, the proposed approach can handle large variations of object contours, including non-rigid deformation and missing parts, in cluttered images.
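The bin layout quoted with Table 1 (6 log-spaced distance bins and 12 angle bins between 0 and π) can be illustrated with a generic shape-context-style log-polar histogram. This is a minimal sketch of the binning only, not the full descriptor proposed here. The radial range `r_inner`/`r_outer`, the normalization of distances by their mean (as in SC [1]), and folding angles modulo π are assumptions made for the example; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

N_DIST_BINS = 6    # distance bins, equally spaced in log(r)
N_ANGLE_BINS = 12  # angle bins covering [0, pi)


def log_polar_histogram(ref: Point, others: Sequence[Point],
                        r_inner: float = 0.125, r_outer: float = 2.0
                        ) -> List[List[int]]:
    """Count points around `ref` in a 6 x 12 log-polar grid.

    Distances are divided by their mean (scale normalization, as in SC);
    points whose normalized distance lies outside [r_inner, r_outer) are
    ignored. Angles are folded modulo pi, so opposite directions share a bin.
    """
    hist = [[0] * N_ANGLE_BINS for _ in range(N_DIST_BINS)]
    dists = [math.hypot(x - ref[0], y - ref[1]) for x, y in others]
    mean_d = sum(dists) / len(dists) if dists else 0.0
    if mean_d == 0.0:
        return hist  # no usable neighbors
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    for (x, y), d in zip(others, dists):
        r = d / mean_d
        if not (r_inner <= r < r_outer):
            continue
        i = int((math.log(r) - log_lo) / (log_hi - log_lo) * N_DIST_BINS)
        i = min(i, N_DIST_BINS - 1)  # guard against float round-off near r_outer
        theta = math.atan2(y - ref[1], x - ref[0]) % math.pi  # in [0, pi)
        j = int(theta / math.pi * N_ANGLE_BINS) % N_ANGLE_BINS
        hist[i][j] += 1
    return hist


if __name__ == "__main__":
    h = log_polar_histogram((0.0, 0.0), [(1, 0), (0, 1), (-1, 0), (0, -1)])
    print(h[4][0], h[4][6])  # 2 2: opposite directions fall into the same angle bin
```

Restricting angles to [0, π) makes the histogram insensitive to the direction in which a contour is traversed, which is one plausible reading of the "between 0 and π" angle range.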
The main limitation of the proposed system is that it works with edge fragments, which are obtained by bottom-up, low-level linking of edge pixels; therefore, it relies heavily on good edge detection results. We assume that the occluding contour of a target object is composed of no more than 10 to 20 edge fragments, which can be broken or deformed, and some parts of which can be missing. With recent progress in edge detection, e.g., the Pb edge detector [20], good edge detection results are possible on many images, e.g., on the ETHZ dataset [10], and consequently our assumption is satisfied. However, on many images the performance of edge detectors is still unsatisfactory, i.e., our assumption is not satisfied; for example, the occluding contour of a target object may be composed of more than 20 edge fragments that are only a few pixels long. This is the main reason why we do not report any experimental results on the PASCAL challenge collection of data sets. While edge detection performs sufficiently well on many images in the ETHZ collection for our assumption to hold, this is not the case for a large percentage of PASCAL images, largely because of their low resolution.