Apr 17, 2017

Deep Residual Network (ResNet)

Main idea:
The central idea of the paper itself is simple and elegant. They take a standard feed-forward ConvNet and add skip connections that bypass (or shortcut) a few convolution layers at a time. Each bypass gives rise to a residual block in which the convolution layers predict a residual that is added to the block's input tensor.
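A minimal NumPy sketch of one residual block's forward pass, assuming a single image of shape (channels, height, width) and that the block keeps the channel count, so x and F(x) can be added directly. The function names are hypothetical. Batch normalization is omitted, and so is the ReLU that the original ResNet applies after the addition; the block therefore returns exactly H(x) = F(x) + x.

```python
import numpy as np

def conv3x3(x, w):
    """'Same'-padded 3x3 convolution (cross-correlation, as in most DL frameworks).
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad height and width by 1
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            patch = xp[:, i:i + h, j:j + wd]  # input shifted by kernel offset (i, j)
            # contract over input channels: (C_out, C_in) x (C_in, H, W)
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], patch)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """H(x) = F(x) + x, with residual branch F = conv -> ReLU -> conv."""
    f = conv3x3(relu(conv3x3(x, w1)), w2)  # the residual F(x)
    return f + x                           # identity shortcut: add the input back
```

With all filter weights set to zero the residual branch outputs zero and the block reduces to the identity, which is the "learn F toward zero" case in the notes below.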


However, deep plain feed-forward conv nets tend to suffer from optimization difficulty (both training and validation error are high). The residual network architecture addresses this by adding shortcut connections whose outputs are summed with the outputs of the convolution layers.

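  • add the block input 'x' (carried by the shortcut) to the output F(x) of the stacked conv layers, so the layers only have to learn the residual F(x)
    • H(x) = F(x) + x
    • H(x) = F(x) + Ix, i.e., the shortcut multiplies x by the identity I (hence "identity mapping")
  • If the identity mapping x is already sufficient, F(.) can learn to drive its filter weights toward zero; otherwise it learns weights that adjust x toward the optimal mapping.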
  • Simply stacking more conv layers increases training error: a 56-layer plain net has higher training and test error than a 20-layer one (the paper's CIFAR-10 experiment). Since "overly deep" plain nets have higher training error, this is an optimization problem, not overfitting.
  • Very simple design (series of fixed 3x3 conv layers)
  • If the shortcut mapping is the identity, the signal propagates additively in the forward pass and the loss gradient propagates additively in the backward pass (as opposed to multiplicatively, as it does otherwise)
  • What if the shortcut mapping h ≠ identity?
    • e.g., a 1x1 conv shortcut, exclusive gating, or multiplying by 0.5: all of these increase the error
  • Keep the shortest path as smooth as possible by 
    • using identity
    • forward/backward signals directly flow through this path
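The additive propagation described above can be written out. This sketch follows the analysis in He et al.'s follow-up work on identity mappings and assumes each block computes exactly x_{l+1} = x_l + F(x_l, W_l) (no ReLU after the addition); E denotes the loss and L any deeper block.

```latex
% Unrolling identity-shortcut blocks from block l to a deeper block L
x_{L} = x_{l} + \sum_{i=l}^{L-1} \mathcal{F}(x_{i}, W_{i})

% Backward pass: the "1" term carries the gradient straight back to block l
\frac{\partial E}{\partial x_{l}}
  = \frac{\partial E}{\partial x_{L}}
    \left( 1 + \frac{\partial}{\partial x_{l}} \sum_{i=l}^{L-1} \mathcal{F}(x_{i}, W_{i}) \right)

% Scaled shortcut x_{i+1} = \lambda_{i} x_{i} + \mathcal{F}(x_{i}, W_{i}) instead
% (\hat{\mathcal{F}} absorbs the scalars)
x_{L} = \Big( \prod_{i=l}^{L-1} \lambda_{i} \Big) x_{l} + \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_{i}, W_{i})
```

With λ_i = 0.5 over 50 blocks, the direct-path factor is 0.5^50, roughly 10^-15, so the shortcut no longer delivers a usable signal; with the identity it stays exactly 1.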

Apr 16, 2017

Quote of the day

As I age, the temptation of Fjaka is plunging me over onto the bed.
All I wish is to spend all day long lying, mulling over and listening to a serenade.

Apr 15, 2017

Quote of the day

Mono no aware 

Apr 13, 2017

Reference: https://arxiv.org/pdf/1703.04044v2.pdf

Apr 6, 2017

Secret Garden: 'The things you are to me'

Why CRF?

An MRF is a generative model, so we need to model
i) the likelihood of the image given the labels
ii) the prior over labels.
Inference is then done on the conditional probability of the labels given the image, obtained from the joint probability via Bayes' theorem.
To keep inference tractable, the prior ii) encodes only local relationships between labels.
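In symbols, with x the image and y the labeling, the generative MRF route looks as follows. The pairwise Gibbs form of the prior, over a neighborhood system N of adjacent pixels with a pairwise potential V, is one common choice assumed here for illustration.

```latex
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}
  \;\propto\; \underbrace{p(x \mid y)}_{\text{i) likelihood}} \; \underbrace{p(y)}_{\text{ii) prior}}

% Local prior: only neighbouring labels interact
p(y) = \frac{1}{Z} \exp\Big( - \sum_{(i,j) \in \mathcal{N}} V(y_{i}, y_{j}) \Big)
```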

A CRF directly models the conditional probability of the labels given the image, so we don't need to model i) and ii) explicitly.
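A toy illustration of modelling p(y | x) directly: a brute-force chain CRF where p(y | x) = exp(-E(y; x)) / Z(x). The function name `crf_conditional`, the chain structure, and the Potts pairwise penalty are hypothetical choices for this sketch. The unary costs stand in for image-dependent terms (how they are computed from x is left open), and in real CRFs the pairwise term may depend on x as well. Exhaustive enumeration is only feasible for a handful of sites.

```python
import itertools
import numpy as np

def crf_conditional(unary, w_pair):
    """Brute-force p(y | x) for a chain CRF.
    unary[i, k]: cost of label k at site i, derived from the observed image x.
    w_pair: Potts penalty when neighbouring sites take different labels.
    Returns a dict mapping each labeling (tuple) to its conditional probability."""
    n, k = unary.shape
    energies = {}
    for y in itertools.product(range(k), repeat=n):
        e = sum(unary[i, y[i]] for i in range(n))                  # unary terms
        e += w_pair * sum(y[i] != y[i + 1] for i in range(n - 1))  # pairwise terms
        energies[y] = e
    # Z(x) sums over labelings only; no p(x | y) or p(y) is ever modelled
    z = sum(np.exp(-e) for e in energies.values())
    return {y: np.exp(-e) / z for y, e in energies.items()}
```

For example, with `unary = np.array([[0.0, 2.0], [0.5, 0.6], [2.0, 0.0]])` and `w_pair = 1.0`, the most probable labeling is `(0, 0, 1)` (energy 1.5), and the returned probabilities sum to 1.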