Paper Review - Finding Label Errors in Autonomous Vehicle Data with Learned Observation Assertions
- Akshat Mandloi
- Feb 2, 2023
- 2 min read
Updated: Feb 8, 2023

Link - Finding Label and Model Errors in Perception Data With Learned Observation Assertions (arxiv.org)
Why are label errors an issue?
- Label errors can lead to downstream safety risks in trained models.
- They are a serious risk where trained models are used in safety-critical applications.
- They degrade model training.
Contributions
- A new abstraction, Learned Observation Assertions (LOA), is proposed.
- LOA leverages existing labeled datasets and ML models to learn a probabilistic model for finding errors in labels.
- The proposed system learns priors using user-provided features and existing organizational resources.
Learned Observation Assertions (LOA) -
- Three components: data associations, priors over features, and application objective functions (AOFs).
- Supports associating observations together - across frames and over time - so they are jointly considered when finding errors.
Methods of leveraging organizational resources
- Users specify features over the data.
- These features are used to generate priors and application objective functions that guide the search for errors.
- Priors: the input is a set of observations; the output is the probability of seeing the feature values of that input.
- AOFs: transform prior values for the application at hand.
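To make the prior/AOF split concrete, here is a minimal sketch of the idea. The function names (`volume_prior`, `negative_log_aof`) and the fixed Gaussian are my own illustration, not the paper's API: the prior maps observations to probabilities, and the AOF transforms those probabilities into error scores for the task at hand.

```python
# Hypothetical sketch of the prior / AOF split; names are illustrative,
# not from the paper. A learned prior would be fit from data instead of
# using a hand-picked Gaussian.
import math

def volume_prior(observations):
    """Toy prior: probability of seeing each observation's box volume,
    faked here with a fixed Gaussian over volume."""
    mu, sigma = 10.0, 2.0
    return [
        math.exp(-((o["volume"] - mu) ** 2) / (2 * sigma ** 2))
        / (sigma * math.sqrt(2 * math.pi))
        for o in observations
    ]

def negative_log_aof(prior_values):
    """Toy AOF: turn probabilities into error scores, so that
    low-probability (unusual) observations get high scores."""
    return [-math.log(p + 1e-12) for p in prior_values]

obs = [{"volume": 9.8}, {"volume": 25.0}]  # second box is implausibly large
scores = negative_log_aof(volume_prior(obs))
```

The implausibly large box gets a much higher error score than the typical one, which is exactly the signal used to rank candidate label errors.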
Scene Syntax -
- A scene is a set of tracks.
- A track consists of observation bundles.
- An observation bundle consists of observations.
- Features are defined over individual elements, elements across scenes, or whole tracks - they can be anything.
- AOFs are defined over these feature distributions; by default this is KDEobsDistribution.
- Once the fit is done, a graphical model is generated, creating nodes for each observation and feature distribution.
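The scene/track/bundle/observation hierarchy above can be sketched as a simple data model. The class and field names here are my own illustration of the nesting, not the paper's actual types:

```python
# Illustrative data model for the scene syntax described above;
# class and field names are assumptions, not the paper's API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    frame: int
    cls: str          # labeled (or predicted) class
    volume: float     # e.g., 3D bounding-box volume

@dataclass
class ObservationBundle:
    # Observations of the same object in the same frame,
    # e.g., from different sensors or models.
    observations: List[Observation] = field(default_factory=list)

@dataclass
class Track:
    # One object followed across frames: a sequence of bundles.
    bundles: List[ObservationBundle] = field(default_factory=list)

@dataclass
class Scene:
    tracks: List[Track] = field(default_factory=list)
```

Features can then be written against any level of this hierarchy: a single `Observation`, a whole `ObservationBundle`, pairs of observations, or an entire `Track`.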
Feature Distributions -
- Features over single observations -> e.g., box volume.
- Features over observation bundles -> e.g., observations within a bundle should agree on a class.
- Features between observations -> e.g., velocity estimated from the box-center offset.
- Features over tracks -> can be used to normalize scores over tracks.
These act as input to the learned feature distributions, which in turn act as input to the application objective functions.
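The feature granularities above can be sketched as small functions. The function names and dict fields here are illustrative, not the paper's; each feature maps its input to a number (or flag) that a distribution can later be fit over:

```python
# Sketch of features at the granularities listed above; names and
# field choices are assumptions for illustration.

def box_volume(obs):
    """Feature over a single observation: 3D box volume."""
    return obs["l"] * obs["w"] * obs["h"]

def bundle_class_agreement(bundle):
    """Feature over an observation bundle: do all observations
    of the object in this frame agree on the class?"""
    classes = {o["cls"] for o in bundle}
    return len(classes) == 1

def center_velocity(obs_a, obs_b, dt):
    """Feature between observations: speed estimated from the
    box-center offset across two frames, dt seconds apart."""
    dx = obs_b["cx"] - obs_a["cx"]
    dy = obs_b["cy"] - obs_a["cy"]
    return (dx ** 2 + dy ** 2) ** 0.5 / dt
```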
Scoring -
Scoring is the heart of the process: the learned feature distributions and AOFs drive it, and by default the distribution is a KDE.
- The Kernel Density Estimator from sklearn is used.
- It uses a Ball Tree or KD Tree for efficient queries.
- It can be used to tell whether a data point is unusual.
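A minimal sklearn sketch of this KDE-based scoring idea, assuming a 1-D feature (box volumes): fit a density over typical values, then flag points with low log-density as unusual. The data and bandwidth here are made up for illustration.

```python
# Minimal sketch of KDE-based unusualness scoring with sklearn.
# Synthetic data and bandwidth are assumptions for illustration.
import numpy as np
from sklearn.neighbors import KernelDensity

# Typical box volumes, roughly Gaussian around 10.
volumes = np.random.default_rng(0).normal(10.0, 1.0, size=(500, 1))

# sklearn's KernelDensity backs its queries with a Ball Tree / KD Tree.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(volumes)

# score_samples returns log-density; lower means more unusual.
log_density = kde.score_samples(np.array([[10.0], [30.0]]))
```

The implausible volume of 30.0 receives a far lower log-density than the typical 10.0, so it would surface near the top of a ranked list of candidate label errors.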
Testing on COCO to follow. Stay Tuned.