
PanorAMS: Automatic Annotation for Detecting Objects in Urban Context

14 million+ generated bounding boxes in 771,299 panoramic images covering 487 neighbourhoods

147k+ ground-truth bounding boxes in 7,348 panoramic images covering 10 neighbourhoods

Designed to move away from laborious manual annotation

The PanorAMS framework provides a method to automatically generate bounding box annotations in geo-referenced panoramic images based on geospatial context information. Following this method, we acquire large-scale (albeit noisy) annotations solely from open data sources in a fast and automatic manner. For detailed evaluation, the framework includes an efficient protocol, using the generated boxes as a starting point, to crowdsource ground-truth annotations for a subset of the images.

News

Website live - 06/03/2023

BibTeX
If you use PanorAMS in your research, please cite:
 

Automatically generating PanorAMS-noisy annotations for training

Our step-by-step method uses geospatial context information to automatically generate bounding box annotations in geo-referenced panoramic images (code sketches of steps 2 and 3 follow the list):

  1. Based on city observations, geospatial object information, and elevation map data, acquire object attributes and 3D real-world measurements of all objects falling within a 150 meter radius of the image GPS location.
  2. Convert this information to 2D image coordinates using the pinhole camera model in order to generate an initial set of bounding boxes.
  3. Refine and filter the initial set of bounding boxes acquired during step 2 via geometric reasoning based on the percentage of overlap between boxes, the classes associated with overlapping boxes, and the real-world distance between overlapping objects and the camera. Urban knowledge is incorporated at this stage by optimizing these thresholds per class.
  4. Map the final set of bounding boxes onto the image in order to qualitatively analyze the generated bounding boxes per class.
  5. Optimize class rules, thresholds, and estimates of objects’ real-world measurements by qualitative analysis of images and corresponding bounding boxes.
  6. Link object metadata that is available from geospatial object information (e.g. building value) to the automatically generated bounding boxes.
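To make steps 1 and 2 concrete, below is a minimal Python sketch of projecting a geo-referenced object into panorama pixel coordinates. The text names the pinhole camera model; for a full 360° equirectangular panorama we assume the usual azimuth/elevation-to-pixel mapping instead, and the function names, flat-earth offset approximation, and heading parameter are illustrative assumptions on our part (only the 150 metre radius comes from the text).

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, for the offset approximation

def local_offset_m(cam_lat, cam_lon, obj_lat, obj_lon):
    """Approximate east/north offset (metres) of the object from the camera."""
    d_lat = math.radians(obj_lat - cam_lat)
    d_lon = math.radians(obj_lon - cam_lon)
    north = d_lat * EARTH_RADIUS_M
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(cam_lat))
    return east, north

def project_to_panorama(east, north, height_diff, heading_deg, img_w, img_h):
    """Map a 3D offset (metres) to (x, y) pixel coordinates.

    Assumes an equirectangular panorama whose centre column points at
    heading_deg; height_diff is object height minus camera height, as
    obtained from the elevation map data in step 1.
    """
    dist = math.hypot(east, north)
    if dist > 150.0:  # step 1: only objects within a 150 metre radius
        return None
    azimuth = math.degrees(math.atan2(east, north))            # 0 = north, 90 = east
    rel_az = (azimuth - heading_deg + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    elevation = math.degrees(math.atan2(height_diff, dist))
    x = (rel_az / 360.0 + 0.5) * img_w
    y = (0.5 - elevation / 180.0) * img_h
    return x, y
```

Projecting the corners of an object's 3D extent this way yields an initial bounding box; with heading 0°, an object due north of the camera lands in the centre column of the image.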
 
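Step 3's geometric reasoning could look roughly like the following sketch, which drops a generated box when a nearer object's box covers too much of it. The rule set and the per-class threshold values here are placeholders, not the ones optimized in PanorAMS.

```python
# Placeholder per-class overlap thresholds (these are the values tuned per
# class in steps 3 and 5; the numbers below are invented for illustration).
OVERLAP_THRESH = {"building": 0.9, "tree": 0.7, "playground": 0.8}

def overlap_fraction(a, b):
    """Fraction of box a covered by box b; boxes are (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return (ix * iy) / area_a if area_a > 0 else 0.0

def filter_occluded(boxes):
    """Keep a box only if no closer object's box covers too much of it.

    Each entry is a dict with 'bbox', 'cls', and 'dist' (metres to camera).
    """
    keep = []
    for i, a in enumerate(boxes):
        thresh = OVERLAP_THRESH.get(a["cls"], 0.8)
        occluded = any(
            j != i
            and b["dist"] < a["dist"]  # b is closer to the camera than a
            and overlap_fraction(a["bbox"], b["bbox"]) > thresh
            for j, b in enumerate(boxes)
        )
        if not occluded:
            keep.append(a)
    return keep
```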

Efficiently crowdsourcing PanorAMS-clean annotations for evaluation

To evaluate the quality of our automatically generated bounding boxes, we crowdsource ground-truth annotations for a subset of the images contained in PanorAMS-noisy. For this, we implement an efficient crowdsourcing protocol using the generated boxes as a starting point. To minimize the required annotation time, the user interface of our crowdsourcing tool is built such that the necessary mouse and eye movements are kept to a minimum. Our crowdsourcing protocol is subdivided into three tasks in order to avoid task-switching, which is well known to increase response time and decrease accuracy.

We introduce the concept of linked bounding boxes, specific to objects split across the left and right sides of 360° images, whereby two bounding boxes are labeled as belonging to the same object. The image depicts two active linked boxes, ready to be broken up (by clicking the linkage button below the left active box), corrected (by dragging the middle point and borders of the active box), or deleted (by clicking the red X mark button) as needed. The linkage icon in the middle of the screen informs the user that there are two active linked boxes. The boxes can be verified by clicking the green check mark button. The orange color is specific to the playground class.
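As an illustration of the linked-bounding-box concept, the sketch below stores an object that wraps around the panorama seam as two boxes sharing an object id. The class and function names are hypothetical and not taken from the crowdsourcing tool.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    cls: str
    object_id: int  # boxes sharing an object_id are "linked"

def split_wrapped_box(x1, y1, x2, y2, cls, object_id, img_w):
    """Split a box whose x-range crosses the 360-degree seam into two linked boxes."""
    if x2 <= img_w:  # no wrap-around: a single ordinary box
        return [Box(x1, y1, x2, y2, cls, object_id)]
    return [
        Box(x1, y1, img_w, y2, cls, object_id),        # part at the right image edge
        Box(0.0, y1, x2 - img_w, y2, cls, object_id),  # part at the left image edge
    ]
```

Breaking the link in the annotation tool would then amount to assigning the two halves distinct object ids.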