Deep neural networks contribute to solving a wide range of problems in artificial intelligence. They are typically trained with gradient-descent-based learning algorithms. As the number of hidden layers grows, however, vanishing or exploding gradients can arise, posing a major obstacle to learning.
This study classifies the vanishing gradient problem arising from the derivative terms into three types and proposes that a parametric activation function can alleviate the type caused by the nonlinear transformation. A parametric activation function is an activation function whose scale and location can be adjusted to reflect the characteristics of the input data; because these parameters are trained by backpropagation, the loss function can be minimized without constraining the scale of the derivatives. Several parametric activation functions were compared against conventional activation functions on the XOR problem with 10 hidden layers and the MNIST problem with 8 hidden layers. The experiments confirmed that the parametric activation functions alleviate the vanishing gradient problem more effectively than the conventional ones.
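To make the idea concrete, below is a minimal PyTorch sketch of a sigmoid with trainable scale and location parameters. The parameter names `a` (scale) and `b` (location) are illustrative assumptions, not the paper's notation, and the specific parameterization is one plausible instance of the approach described above rather than the authors' exact formulation.

```python
import torch
import torch.nn as nn

class ParametricSigmoid(nn.Module):
    """Sigmoid whose scale (a) and location (b) are trainable,
    so backpropagation can adjust the magnitude of the derivative."""

    def __init__(self):
        super().__init__()
        # Illustrative parameters: 'a' rescales the output,
        # 'b' shifts the input; both are learned along with the weights.
        self.a = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # f(x) = a * sigmoid(x - b); its derivative a * s * (1 - s)
        # is no longer capped at 0.25 as with the plain sigmoid,
        # so repeated layers need not shrink the gradient.
        return self.a * torch.sigmoid(x - self.b)

# Usage sketch: drop the module into a deep network in place of
# a fixed activation, e.g. for the XOR problem mentioned above.
net = nn.Sequential(nn.Linear(2, 16), ParametricSigmoid(),
                    nn.Linear(16, 1))
```

The design point is that a fixed sigmoid bounds each layer's derivative by 0.25, so the product of derivatives across many layers shrinks geometrically; letting the scale be trainable removes that fixed cap.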