Depthwise-separable convolutions in PyTorch
In the context of machine learning, a convolution combines two matrices: an input matrix A and a filter B, also called a kernel. The result is a new matrix C, called the feature map.
There are different types of convolutions, each with its own pros and cons. Here we will take a look at depthwise-separable convolutions and how to implement them in PyTorch.
Depthwise-separable convolutions were first utilized in Rigid-Motion Scattering and later popularized by the Xception architecture.
Depthwise convolution is a type of convolution in which a separate filter is applied to each input channel.
Pointwise convolution is a type of convolution that uses a 1x1 kernel to combine the channels.
A depthwise-separable convolution is the combination of the two: a depthwise convolution followed by a pointwise convolution.
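The two building blocks can be sketched with PyTorch's built-in `nn.Conv2d`: setting its `groups` argument equal to the number of input channels makes the convolution depthwise (one filter per channel), while a `kernel_size` of 1 makes it pointwise. The channel and image sizes below are arbitrary, chosen just for illustration:

```python
import torch
import torch.nn as nn

in_channels, out_channels = 3, 64  # arbitrary example sizes

# Depthwise: groups=in_channels gives each input channel its own filter.
depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                      padding=1, groups=in_channels)

# Pointwise: a 1x1 convolution that mixes channels.
pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

x = torch.randn(1, in_channels, 32, 32)
out = pointwise(depthwise(x))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Chaining the two gives the same output shape as a single standard 3x3 convolution from 3 to 64 channels, which is why one can be swapped for the other inside a network.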
Main advantages of depthwise-separable convolutions
- They have fewer parameters to train than standard convolutions, which reduces the risk of overfitting
- They require fewer computations, which makes them computationally cheaper and a great option for low-end hardware (see MobileNet)
PyTorch implementation
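One possible way to implement this is as a small `nn.Module` that chains the depthwise and pointwise steps, alongside a helper to count parameters so the two approaches can be compared. The module name, the helper, and the channel sizes are my own choices for illustration:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A depthwise convolution followed by a pointwise (1x1) convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def count_parameters(module):
    return sum(p.numel() for p in module.parameters())

# Compare a standard 3x3 convolution against the separable version.
standard = nn.Conv2d(10, 32, kernel_size=3, padding=1)
separable = DepthwiseSeparableConv(10, 32)

print(count_parameters(standard))   # 2912
print(count_parameters(separable))  # 452
```

The exact savings depend on the channel counts and kernel size: the standard layer here needs 10 * 32 * 3 * 3 weights (plus biases), while the separable version only needs 10 * 3 * 3 depthwise weights plus 10 * 32 pointwise weights.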
As you can see, the depthwise-separable convolution has far fewer parameters (roughly 1/4 as many), and correspondingly fewer multiplications, so it runs much faster than a standard convolution.
If you want to dig deeper into depthwise-separable convolutions and their performance, I recommend reading the original Xception paper, which explains all of this: https://arxiv.org/pdf/1610.02357v3.pdf