Paper Review - GPS-Net: Graph Property Sensing Network for Scene Graph Generation
- Akshat Mandloi
- Sep 2, 2023
- 2 min read
Introduction
Three key properties of a scene graph - a) edge direction information b) difference in priority between nodes c) long-tailed distribution of relationships
The importance of a node varies with the number of triplets it is included in - existing works treat all nodes in a scene graph as equal
Motivation of GPS-Net -
Direction Aware Message Passing (DMP) - enhances node features with node-specific contextual information
Node-Priority Sensitive Loss (NPS Loss) - encodes the difference in priority between nodes
Adaptive Reasoning Module - handles the long-tailed distribution of relationships
Approach

Faster R-CNN is used to obtain object proposals for each image
O - object categories (including background), R - relationship categories
Direction Aware Message Passing
Global Context Message Passing (GCMP) - adopts softmax for normalization
Takes the features of a node and of its neighborhood as input and outputs a refined feature for that node

The exponential part is defined as the pairwise contextual coefficient between two nodes
However, GCMP generates the same contextual information for all nodes - why?
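One possible answer to the question above, as a minimal sketch. It assumes the pairwise coefficient is a linear score on the concatenated pair [x_i; x_j] and that every node attends over the same set of neighbors; the names `w_self` and `w_other` (the two halves of the scoring vector) are illustrative and not taken from the paper. Under those assumptions the x_i term is constant across j, so it cancels inside the softmax and every node ends up with the same coefficients.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gcmp_coefficients(nodes, w_self, w_other, i):
    # Assumed score: w^T [x_i ; x_j] = w_self . x_i + w_other . x_j
    scores = [dot(w_self, nodes[i]) + dot(w_other, x_j) for x_j in nodes]
    return softmax(scores)

# Toy node features (made up for illustration).
nodes = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.5]]
w_self, w_other = [0.3, -0.7], [1.2, 0.4]

# One row of coefficients per receiving node i.
coeffs = [gcmp_coefficients(nodes, w_self, w_other, i) for i in range(len(nodes))]
for row in coeffs:
    print([round(c, 4) for c in row])
```

All printed rows are identical, which is one way to read the claim that GCMP produces the same context for every node.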

Since GCMP treats all nodes equally, it can be simplified as in Figure 3 (b). However, this also ignores edge direction information and cannot provide node-specific contextual information.
DMP - inspired by multi-modal low-rank bilinear pooling
Contextual information is formulated as a tri-linear model based on Tucker decomposition
Advantages - a) union box features expand the receptive field b) the tri-linear model captures feature interactions better c) the features are coupled by a Hadamard product, so they jointly affect context modeling d) the positions of subject and object are specified, so edge direction information is considered
Consider both forward and backward relations, stack them, and then take a Kronecker product
Transformer layer - refines the contextual information
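The direction-aware coefficient described above can be sketched as follows. This is a rough illustration only: it assumes separate projections for subject, object, and union box features fused by a Hadamard product and then scored by a vector. The names `W_s`, `W_o`, `W_u`, `w_e` and all numbers are hypothetical, and the stacking and Kronecker product step is not shown.

```python
def matvec(matrix, vec):
    # Plain matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def hadamard(*vecs):
    # Element-wise product of any number of equal-length vectors.
    out = [1.0] * len(vecs[0])
    for v in vecs:
        out = [a * b for a, b in zip(out, v)]
    return out

def dmp_score(x_subj, x_obj, u_union, W_s, W_o, W_u, w_e):
    # Subject and object go through different projections, so swapping
    # them generally changes the score (edge direction matters).
    fused = hadamard(matvec(W_s, x_subj), matvec(W_o, x_obj), matvec(W_u, u_union))
    return sum(a * b for a, b in zip(w_e, fused))

# Toy parameters and features (made up for illustration).
W_s = [[1.0, 0.0], [0.0, 2.0]]
W_o = [[0.5, 1.0], [1.0, -1.0]]
W_u = [[1.0, 1.0], [0.0, 1.0]]
w_e = [1.0, 0.5]

x_i, x_j = [1.0, 2.0], [3.0, 1.0]
u_ij = [1.0, 1.0]  # union box feature of the pair

forward = dmp_score(x_i, x_j, u_ij, W_s, W_o, W_u, w_e)   # i as subject
backward = dmp_score(x_j, x_i, u_ij, W_s, W_o, W_u, w_e)  # j as subject
print(forward, backward)
```

The two scores differ because the subject and object slots use different projections, unlike the concatenation-based GCMP score.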
Node Priority Sensitive Loss
Cross-entropy loss - doesn't account for the importance of nodes
The priority of a node is proportional to the number of triplets it is involved in
Inspired by focal loss - key differences - a) it is mainly used to solve the node priority problem in Scene Graph Generation (SGG) b) the focusing parameter is a function of node priority
The loss function formula is similar to focal loss
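A minimal sketch of the idea, not the paper's exact formula: node priority is taken as the fraction of triplets a node appears in, and the loss has the focal-loss form with a focusing parameter that depends on that priority. The mapping `gamma_from_priority` is a hypothetical placeholder chosen only so that high-priority nodes are down-weighted less.

```python
import math

def node_priority(triplets, num_nodes):
    # triplets are (subject_index, predicate, object_index) tuples.
    counts = [0] * num_nodes
    for subj, _, obj in triplets:
        counts[subj] += 1
        counts[obj] += 1
    total = len(triplets)
    return [min(1.0, c / total) if total else 0.0 for c in counts]

def gamma_from_priority(theta, gamma_max=2.0):
    # Hypothetical mapping: higher priority -> smaller focusing parameter.
    return gamma_max * (1.0 - theta)

def nps_loss(p_true, theta):
    # Focal-loss form: -(1 - p)^gamma * log(p), with gamma set per node.
    gamma = gamma_from_priority(theta)
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# Toy scene graph: node 0 takes part in every triplet.
triplets = [(0, "on", 1), (0, "near", 2), (0, "has", 3)]
theta = node_priority(triplets, num_nodes=4)

high = nps_loss(0.6, theta[0])  # high-priority node
low = nps_loss(0.6, theta[1])   # low-priority node, same confidence
print(theta, high, low)
```

For the same predicted probability, the high-priority node keeps a larger loss, so its mistakes weigh more during training.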


Adaptive Reasoning Module
Used to classify relationships
Provides a prior for classification in 2 steps - a) frequency softening b) bias adaptation
Frequency Softening -

Bias Adaptation -
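The two steps can be sketched roughly as below. This assumes frequency softening is a log-softmax over the relationship frequency distribution for a subject-object pair, and that bias adaptation scales this softened prior by a coefficient `d` before adding it to the classifier logits. Treating `d` as a plain scalar is a simplification, and all names and numbers are illustrative.

```python
import math

def log_softmax(values):
    # Numerically stable log-softmax.
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_z for v in values]

def softmax(values):
    return [math.exp(v) for v in log_softmax(values)]

def adaptive_reasoning(logits, freq_probs, d):
    # Step a) frequency softening (assumed form): flatten the prior.
    prior = log_softmax(freq_probs)
    # Step b) bias adaptation (assumed form): scale the prior by d.
    biased = [l + d * p for l, p in zip(logits, prior)]
    return softmax(biased)

# Toy long-tailed frequency distribution over three relationships.
freq_probs = [0.9, 0.09, 0.01]
logits = [0.2, 0.1, 0.3]

softened = [math.exp(p) for p in log_softmax(freq_probs)]
no_bias = adaptive_reasoning(logits, freq_probs, d=0.0)
with_bias = adaptive_reasoning(logits, freq_probs, d=1.0)
print(softened, no_bias, with_bias)
```

The softened prior is much flatter than the raw frequencies, and setting `d` to 0 switches the prior off entirely, which is the kind of per-pair control bias adaptation is meant to give.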

Results
