Paper Review - Finding Label Errors in Autonomous Vehicle Data with Learned Observation Assertions
- Akshat Mandloi
- Feb 2, 2023
- 2 min read
Updated: Feb 8, 2023

Link - Finding Label and Model Errors in Perception Data With Learned Observation Assertions (arxiv.org)
Why are label errors an issue?
- Label errors can lead to downstream safety risks in trained models.
- They are a serious risk where trained models are used in safety-critical applications.
- They degrade model training.
Contributions
- A new abstraction, Learned Observation Assertions (LOA), is proposed.
- LOA leverages existing labeled datasets and ML models to learn a probabilistic model for finding errors in labels.
- The proposed system learns priors using user-provided features and existing organizational resources.
Learned Observation Assertions (LOA) -
- Three components: data associations, priors over features, and application objective functions (AOFs).
- Supports associating observations together - across frames and over time - so they are jointly considered when finding errors.
Methods of leveraging organizational resources
- Users specify features over the data.
- These features are used to generate priors and application objective functions that guide the search for errors.
- Priors: the input is a set of observations; the output is the probability of seeing the feature values of that input.
- AOFs: transform prior values for the application at hand.
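To make the prior/AOF split concrete, here is a minimal sketch of the idea. The function names (`volume_prior`, `negative_log_aof`) and the fixed Gaussian are my own illustration, not the paper's API: the prior maps observations to probabilities, and the AOF transforms those probabilities into error scores for the task at hand.

```python
# Hypothetical sketch of the prior / AOF split; names are illustrative,
# not from the paper. A learned prior would be fit from data instead of
# using a hand-picked Gaussian.
import math

def volume_prior(observations):
    """Toy prior: probability of seeing each observation's box volume,
    faked here with a fixed Gaussian over volume."""
    mu, sigma = 10.0, 2.0
    return [
        math.exp(-((o["volume"] - mu) ** 2) / (2 * sigma ** 2))
        / (sigma * math.sqrt(2 * math.pi))
        for o in observations
    ]

def negative_log_aof(prior_values):
    """Toy AOF: turn probabilities into error scores, so that
    low-probability (unusual) observations get high scores."""
    return [-math.log(p + 1e-12) for p in prior_values]

obs = [{"volume": 9.8}, {"volume": 25.0}]  # second box is implausibly large
scores = negative_log_aof(volume_prior(obs))
```

The implausibly large box gets a much higher error score than the typical one, which is exactly the signal used to rank candidate label errors.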
Scene Syntax -
- A scene is a set of tracks.
- A track consists of observation bundles.
- An observation bundle consists of observations.
- Features are defined over individual elements, elements across scenes, or whole tracks - they can be anything.
- AOFs are defined over these feature distributions; by default this is KDEobsDistribution.
- Once the fit is done, a graphical model is generated, creating nodes for each observation and feature distribution.
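The scene/track/bundle/observation hierarchy above can be sketched as a simple data model. The class and field names here are my own illustration of the nesting, not the paper's actual types:

```python
# Illustrative data model for the scene syntax described above;
# class and field names are assumptions, not the paper's API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    frame: int
    cls: str          # labeled (or predicted) class
    volume: float     # e.g., 3D bounding-box volume

@dataclass
class ObservationBundle:
    # Observations of the same object in the same frame,
    # e.g., from different sensors or models.
    observations: List[Observation] = field(default_factory=list)

@dataclass
class Track:
    # One object followed across frames: a sequence of bundles.
    bundles: List[ObservationBundle] = field(default_factory=list)

@dataclass
class Scene:
    tracks: List[Track] = field(default_factory=list)
```

Features can then be written against any level of this hierarchy: a single `Observation`, a whole `ObservationBundle`, pairs of observations, or an entire `Track`.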
Feature Distributions -
- Features over single observations -> e.g., box volume.
- Features over observation bundles -> e.g., observations within a bundle should agree on a class.
- Features between observations -> e.g., velocity estimated from the box-center offset.
- Features over tracks -> can be used to normalize scores over tracks.
These act as input to the learned feature distributions, which in turn act as input to the application objective functions.
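The feature granularities above can be sketched as small functions. The function names and dict fields here are illustrative, not the paper's; each feature maps its input to a number (or flag) that a distribution can later be fit over:

```python
# Sketch of features at the granularities listed above; names and
# field choices are assumptions for illustration.

def box_volume(obs):
    """Feature over a single observation: 3D box volume."""
    return obs["l"] * obs["w"] * obs["h"]

def bundle_class_agreement(bundle):
    """Feature over an observation bundle: do all observations
    of the object in this frame agree on the class?"""
    classes = {o["cls"] for o in bundle}
    return len(classes) == 1

def center_velocity(obs_a, obs_b, dt):
    """Feature between observations: speed estimated from the
    box-center offset across two frames, dt seconds apart."""
    dx = obs_b["cx"] - obs_a["cx"]
    dy = obs_b["cy"] - obs_a["cy"]
    return (dx ** 2 + dy ** 2) ** 0.5 / dt
```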
Scoring -
Scoring is the heart of the process: the learned feature distributions and AOFs drive it, and by default the distribution is a KDE.
- The Kernel Density Estimator from sklearn is used.
- It uses a Ball Tree or KD Tree for efficient queries.
- It can be used to tell whether a data point is unusual.
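A minimal sklearn sketch of this KDE-based scoring idea, assuming a 1-D feature (box volumes): fit a density over typical values, then flag points with low log-density as unusual. The data and bandwidth here are made up for illustration.

```python
# Minimal sketch of KDE-based unusualness scoring with sklearn.
# Synthetic data and bandwidth are assumptions for illustration.
import numpy as np
from sklearn.neighbors import KernelDensity

# Typical box volumes, roughly Gaussian around 10.
volumes = np.random.default_rng(0).normal(10.0, 1.0, size=(500, 1))

# sklearn's KernelDensity backs its queries with a Ball Tree / KD Tree.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(volumes)

# score_samples returns log-density; lower means more unusual.
log_density = kde.score_samples(np.array([[10.0], [30.0]]))
```

The implausible volume of 30.0 receives a far lower log-density than the typical 10.0, so it would surface near the top of a ranked list of candidate label errors.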
Testing on COCO to follow. Stay Tuned.