
The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks


arXiv


TensorFlow / PyTorch implementation


Demo



1. Abstract


1-1.   A loss function for direct optimization with respect to the Jaccard index

1-2.   Optimization of the Jaccard index in a continuous optimization framework

1-3.   Lovász hinge loss: binary image segmentation
         Lovász-Softmax loss: multi-class image segmentation



2. Optimization surrogates for submodular loss functions


2-1.   Optimizing the Jaccard loss (choosing a class for each pixel) is a discrete optimization problem over the 2^p possible misprediction sets, and is NP-hard in general.

2-2.   The Jaccard set function (6) has been shown to be submodular (Yu, 2015, The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses), so its convex surrogate, the Lovász extension, can be computed in polynomial time.
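
For reference, my transcription of the definitions behind 2-1 and 2-2 (cosmetics may differ from the paper). For a class c, the mispredicted pixels form a set M_c, and the Jaccard loss becomes a set function of it:

\[
\mathbf{M}_c = \{\, i : y^*_i = c,\ \tilde{y}_i \neq c \,\} \cup \{\, i : y^*_i \neq c,\ \tilde{y}_i = c \,\},
\qquad
\Delta_{J_c}(\mathbf{M}_c) = \frac{|\mathbf{M}_c|}{|\{\mathbf{y}^* = c\} \cup \mathbf{M}_c|}.
\]

Submodularity of a set function \Delta means that for all pixel subsets A and B,

\[
\Delta(A) + \Delta(B) \;\geq\; \Delta(A \cup B) + \Delta(A \cap B).
\]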

 



2-3.   The Lovász extension of a set function (here, the Jaccard loss) is given by equations (8) and (9), transcribed below.

2-4.   By submodularity, the Lovász extension of a set function is its tight convex closure: it is piecewise linear and interpolates the values of the original set function.

2-5.   The vector g(m), whose components are defined in equation (9), is exactly the derivative of the Lovász extension with respect to m.
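
For reference, my transcription of equations (8) and (9): let \pi be a permutation sorting the components of m in decreasing order, m_{\pi_1} \geq m_{\pi_2} \geq \cdots \geq m_{\pi_p}, and take \Delta(\emptyset) = 0; then

\[
\overline{\Delta}(\mathbf{m}) \;=\; \sum_{i=1}^{p} m_{\pi_i}\, g_i(\mathbf{m}),
\qquad
g_i(\mathbf{m}) \;=\; \Delta(\{\pi_1, \ldots, \pi_i\}) - \Delta(\{\pi_1, \ldots, \pi_{i-1}\}).
\]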



3. Lovász hinge

[Figure: the Lovász hinge applied to the Jaccard loss]

3-1.   The figure above shows the Lovász hinge applied to the Jaccard loss.

3-2.   Substituting the hinge margins m_i = max(1 − F_i(x) y_i, 0) (equation (10)) into equation (8) yields the Lovász hinge applied to the Jaccard loss (see the sketch after this list).

3-3.   For a single prediction (a single pixel), the Lovász hinge reduces to the standard hinge loss, since the Jaccard loss of one mispredicted pixel is 1 and equation (8) then gives m_1 = max(1 − F(x) y, 0).

3-4.   In the figure above, the red dots mark the values of the discrete Jaccard loss; they lie on the convex surface of the Lovász hinge.

3-5.   During optimization, the network is trained so that the vector of relative margins r moves toward the lowest red point.
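
A minimal PyTorch sketch of this construction, assuming flattened binary labels in {0, 1} and raw logits; the function names are mine, and this is an illustration of the recipe rather than the authors' reference implementation (linked at the top of this post):

```python
import torch

def lovasz_grad(gt_sorted):
    # g(m) of equation (9) for the Jaccard loss, computed via cumulative
    # intersection/union counts (the paper's Algorithm 1; see Section 5).
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.cumsum(0)
    union = gts + (1.0 - gt_sorted).cumsum(0)
    jaccard = 1.0 - intersection / union
    if len(gt_sorted) > 1:
        jaccard[1:] = jaccard[1:] - jaccard[:-1]  # finite differences give g_i
    return jaccard

def lovasz_hinge(logits, labels):
    # Binary Lovász hinge applied to the Jaccard loss.
    signs = 2.0 * labels.float() - 1.0            # {0, 1} -> {-1, +1}
    errors = torch.relu(1.0 - logits * signs)     # hinge margins m_i, equation (10)
    errors_sorted, perm = torch.sort(errors, descending=True)
    grad = lovasz_grad(labels.float()[perm])
    return torch.dot(errors_sorted, grad)         # <sorted m, g(m)>, equation (8)
```

Since the margins m are piecewise linear in the logits and the surrogate is piecewise linear in m, the resulting loss surface is the convex, piecewise-linear one shown in the figure.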



4. Lovász-Softmax

[Figure: the Lovász-Softmax loss applied to the Jaccard loss]

4-1.   The figure above shows the Lovász-Softmax loss applied to the Jaccard loss.

4-2.   Instead of the max-margin setting of the Lovász hinge, softmax probabilities f_i(c) are used to define the class errors m_i(c) of equation (11): m_i(c) = 1 − f_i(c) if c is the ground-truth class of pixel i, and f_i(c) otherwise.

4-3.   To target the class-averaged mIoU metric, the class-specific loss surrogates are averaged over the classes, as in equation (12).
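
A matching sketch of the multi-class case, reusing lovasz_grad from the Section 3 sketch; the names and tensor shapes (P flattened pixels, C classes) are my assumptions:

```python
import torch

def lovasz_softmax(probs, labels):
    # Lovász-Softmax applied to the Jaccard loss, equations (11)-(12).
    # probs:  (P, C) softmax probabilities; labels: (P,) ground-truth classes.
    num_classes = probs.size(1)
    losses = []
    for c in range(num_classes):
        fg = (labels == c).float()           # 1 where pixel i belongs to class c
        errors = (fg - probs[:, c]).abs()    # m_i(c): 1 - f_i(c) on class c, f_i(c) elsewhere
        errors_sorted, perm = torch.sort(errors, descending=True)
        grad = lovasz_grad(fg[perm])         # from the Section 3 sketch
        losses.append(torch.dot(errors_sorted, grad))
    return sum(losses) / num_classes         # class average, equation (12)
```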



5. Optimization of intersection over union

5-1.   Evaluating the Lovász extension (equation (8)) requires sorting the elements of m, in O(p log p) time, followed by O(p) evaluations of the set function, where p is the number of pixels.

5-2.   The algorithm below computes the gradient of the Jaccard loss extension, g(m) = [g_1(m), g_2(m), ..., g_p(m)], in O(p log p) time by maintaining cumulative intersection and union counts over the sorted errors.

[Algorithm 1: computation of the gradient g(m) of the Jaccard loss extension]
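
As a toy sanity check (my own example, not from the paper), the cumulative-sum routine can be compared against a direct evaluation of g_i(m) = Δ({π_1, ..., π_i}) − Δ({π_1, ..., π_{i−1}}) from equation (9):

```python
import torch
# lovasz_grad: as defined in the Section 3 sketch above.

def delta_jaccard(gt_sorted, k):
    # Direct evaluation of Delta({pi_1, ..., pi_k}): the k pixels with the
    # largest errors form the misprediction set M (cf. equation (6)).
    M = torch.zeros_like(gt_sorted)
    M[:k] = 1.0
    union = ((gt_sorted == 1) | (M == 1)).float().sum()
    return M.sum() / union if union > 0 else torch.tensor(0.0)

gt_sorted = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0])  # toy labels, pre-sorted by error
g_fast = lovasz_grad(gt_sorted)                      # O(p log p) routine
g_direct = torch.stack([delta_jaccard(gt_sorted, k) - delta_jaccard(gt_sorted, k - 1)
                        for k in range(1, 6)])
print(torch.allclose(g_fast, g_direct))              # True
```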



6. References

Berman, M., Rannen Triki, A. and Blaschko, M. B. The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. CVPR 2018.

Yu, J. and Blaschko, M. The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses. 2015.

