Conv 1x1 configuration for feature reduction
Solution 1:
Since you are going to train your net end-to-end, whatever configuration you use, the weights will be trained to accommodate it.
BatchNorm?
I guess the first question you need to ask yourself is whether you want to use BatchNorm at all. If your net is deep and you are concerned with internal covariate shift, then you probably should have a BatchNorm, and here comes option no. 3.
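For concreteness, a reduction block with BatchNorm is just a 1x1 conv wrapped with normalization and activation. This is only an illustrative sketch (PyTorch is my assumption, and the exact conv/BN/ReLU ordering is whichever of your alternatives you end up choosing):

import torch.nn as nn

def reduce_block(c: int) -> nn.Sequential:
    # 1x1 conv reducing 2c channels to c; the conv bias is redundant when BN follows.
    return nn.Sequential(
        nn.Conv2d(2 * c, c, kernel_size=1, bias=False),
        nn.BatchNorm2d(c),
        nn.ReLU(inplace=True),
    )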
BatchNorm first?
If your x is the output of another conv layer, then there's actually no difference between your first and second alternatives: your net is a cascade of ...-conv-bn-ReLU-conv-bn-ReLU-conv-..., so it's only an "artificial" partitioning of the net into triplets of functions conv, bn, relu, and apart from the very first and last functions you can split things however you wish. Moreover, since batch norm is a linear operation (a per-channel scale + bias), it can be "folded" into an adjacent conv layer without changing the net, so you are basically left with conv-relu pairs.
So, there's not really a big difference between the first two options you highlighted.
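To make the "folding" point concrete, here is a minimal sketch (assuming PyTorch; the helper name is mine) of absorbing a BatchNorm's per-channel scale and bias into the preceding conv at inference time:

import torch
import torch.nn as nn

def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Build a single conv equivalent to conv followed by bn (using running stats).
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      kernel_size=conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, bias=True)
    with torch.no_grad():
        # BN applies a per-output-channel scale gamma / sqrt(var + eps) plus a bias.
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

# Quick sanity check on a 1x1 reduction conv (eval mode, so running stats are used).
conv = nn.Conv2d(16, 8, kernel_size=1, bias=False)
bn = nn.BatchNorm2d(8).eval()
bn.running_mean.uniform_(-1, 1)
bn.running_var.uniform_(0.5, 2.0)
x = torch.randn(2, 16, 10, 10)
assert torch.allclose(fold_bn_into_conv(conv, bn)(x), bn(conv(x)), atol=1e-5)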
What else to consider?
Do you really need a ReLU when changing the dimension of the features? You can think of the dimensionality reduction as a linear mapping: decomposing the weight matrix that maps x into a lower-rank matrix that ultimately maps into a c-dimensional space instead of a 2c-dimensional one. If you treat it as a linear mapping, you might omit the ReLU altogether.
See the SVD trick used in Fast R-CNN for an example.
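As an illustration of that idea, here is a hedged sketch of an SVD-style factorization on a fully connected layer (the helper name and the use of nn.Linear are my assumptions; a 1x1 conv is just a per-pixel linear map, so the same reasoning applies). Because no nonlinearity is inserted between the two factors, the pair stays a purely linear, low-rank map:

import torch
import torch.nn as nn

def factorize_linear(fc: nn.Linear, rank: int) -> nn.Sequential:
    # Approximate one Linear layer with two smaller ones via truncated SVD.
    U, S, Vh = torch.linalg.svd(fc.weight.detach(), full_matrices=False)
    first = nn.Linear(fc.in_features, rank, bias=False)
    second = nn.Linear(rank, fc.out_features, bias=fc.bias is not None)
    with torch.no_grad():
        first.weight.copy_(S[:rank, None] * Vh[:rank])   # (rank, in_features)
        second.weight.copy_(U[:, :rank])                 # (out_features, rank)
        if fc.bias is not None:
            second.bias.copy_(fc.bias)
    return nn.Sequential(first, second)

# Example: 2048*256 + 256*1024 parameters instead of 2048*1024, with no ReLU in between.
fc = nn.Linear(2048, 1024)
approx = factorize_linear(fc, rank=256)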