What is deep learning and machine vision?
The framework of deep learning, especially frameworks based on artificial neural networks, can be traced back to the neocognitron proposed by Kunihiko Fukushima in 1980, and the history of artificial neural networks is even longer. In 1989, Yann LeCun and others applied the standard backpropagation algorithm, proposed in 1974 [3], to a deep neural network for recognizing handwritten postal codes. Although the algorithm could be executed successfully, its computational cost was enormous: training the network took three days, so it could not be put into practical use [4]. Many factors contributed to this slow training, one of them being the vanishing gradient problem identified in 1991 by Sepp Hochreiter, a student of Jürgen Schmidhuber [5][6]. At the same time, neural networks were challenged by other, simpler models; support vector machines, for example, were the more popular machine learning algorithms from the 1990s to the early 21st century.

The concept of "deep learning" began to attract attention around 2007, when Geoffrey Hinton and Ruslan Salakhutdinov proposed an effective training algorithm for feedforward neural networks. In this algorithm, each layer of the network is treated as an unsupervised restricted Boltzmann machine, and the network is then optimized with the supervised backpropagation algorithm [7]. Earlier, in 1992, Schmidhuber had proposed a similar training method for recurrent neural networks in a more general setting, and showed experimentally that it can effectively speed up supervised learning [8][9].

Since its emergence, deep learning has become part of many leading systems, especially in computer vision and speech recognition. Experiments on standard benchmark datasets, such as TIMIT for speech recognition and ImageNet and CIFAR-10 for image recognition, show that deep learning can improve recognition accuracy.

Progress in hardware is another important factor in the renewed attention to deep learning. The advent of high-performance graphics processors greatly increased the speed of numerical and matrix operations, significantly shortening the running time of machine learning algorithms [10][11].

Basic concepts

The foundation of deep learning is distributed representation in machine learning. Distributed representation assumes that the observed data are generated by the interaction of different factors. Deep learning further assumes that this interaction can be divided into multiple levels, corresponding to a multi-level abstraction of the observations. Different numbers of layers and layer sizes can be used to achieve different degrees of abstraction [1].

Deep learning exploits this hierarchical abstraction: higher-level concepts are learned from lower-level ones. The hierarchy is often constructed layer by layer with a greedy algorithm, from which the features most useful for machine learning are selected [1].

Many deep learning algorithms are formulated as unsupervised learning, so they can be applied to unlabeled data that other algorithms cannot exploit. Such data is more abundant and easier to obtain than labeled data, which gives deep learning an important advantage [1].

Deep learning with artificial neural networks

Some of the most successful deep learning methods involve artificial neural networks. Artificial neural networks were inspired by the theory put forward in 1959 by Nobel Prize winners David H. Hubel and Torsten Wiesel. Hubel and Wiesel found that the primary visual cortex of the brain contains two kinds of cells, simple cells and complex cells, which take on different levels of visual perception. Inspired by this, many neural network models are likewise designed as hierarchical models of nodes [12].

The neocognitron proposed by Kunihiko Fukushima introduced a convolutional neural network trained with unsupervised learning. Yann LeCun applied the supervised backpropagation algorithm to this architecture [13]. In fact, ever since the backpropagation algorithm was proposed in the 1970s, many researchers had tried to use it to train supervised deep neural networks, but most of the early attempts failed. In his doctoral thesis, Sepp Hochreiter attributed the failures to the vanishing gradient, which occurs both in deep feedforward neural networks and in recurrent neural networks, the training of the latter resembling that of deep networks. In layer-wise training, the error signal used to update the model parameters decreases exponentially with the number of layers, making model training inefficient [14][15].

To solve this problem, researchers have proposed several different methods. In 1992, Jürgen Schmidhuber proposed a multi-level hierarchy of networks in which each layer of the deep neural network is trained with unsupervised learning and then fine-tuned with the backpropagation algorithm. In this model, each layer of the neural network is a compressed representation of the observed variables, which is passed on as input to the next layer [8].

Another method is the long short-term memory (LSTM) network proposed by Sepp Hochreiter and Jürgen Schmidhuber [16]. In 2009, deep multidimensional LSTM networks won three of the connected-handwriting recognition competitions held at ICDAR 2009, without any built-in prior knowledge [17][18].

Sven Behnke proposed the Neural Abstraction Pyramid model, which relies only on the sign of the gradient during training, and applied it to problems of image reconstruction and face localization [19].

Other methods use unsupervised pre-training to construct neural networks that discover effective features, and then use supervised backpropagation to discriminate the labeled data. The deep model proposed by Hinton et al. in 2006 learns high-level representations with multiple layers of latent variables. It uses the restricted Boltzmann machine proposed by Smolensky in 1986 [20] to model each layer of higher-level features. The model guarantees that the lower bound on the log-likelihood of the data increases as layers are added. When enough layers have been learned, this deep architecture becomes a generative model, and the whole dataset can be reconstructed by top-down sampling [21]. Hinton claims that this model can effectively extract features from high-dimensional structured data [22].

The Google Brain team of Andrew Ng and Jeff Dean created a neural network that learned high-level concepts (such as cats) purely from watching YouTube videos [23][24].

Other methods rely on the sheer computing power of modern computers, especially GPUs. In 2010, Dan Ciresan and his colleagues in Jürgen Schmidhuber's group at the Swiss AI laboratory IDSIA showed that, despite the vanishing gradient problem, the backpropagation algorithm executed directly on GPUs is feasible, and that it outperformed existing methods on the MNIST handwritten digit dataset of Yann LeCun et al.

As of 2011, the state of the art in deep feedforward neural networks alternates convolutional layers and max-pooling layers, topped by a simple classification layer; no unsupervised pre-training is needed during training [25][26]. Since 2011, GPU implementations of this approach [25] have won many pattern recognition contests, including the IJCNN 2011 traffic sign recognition competition [27] and others.

These deep learning algorithms were also the first to perform as competitively as humans on certain recognition tasks [28].

Deep learning structures

A deep neural network is a neural network with at least one hidden layer. Like shallow neural networks, deep neural networks can model complex nonlinear systems, but the extra layers provide higher levels of abstraction and thus increase the capacity of the model. Deep neural networks are usually feedforward neural networks, although there is also work on language modeling and other tasks that extends them to recurrent neural networks [29]. Convolutional neural networks (CNN) have been applied with great success in computer vision [30]. Since then, convolutional neural networks have also been used as acoustic models in automatic speech recognition, where they achieve better results than previous methods [31].
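
As an illustration (not part of the original article), the following minimal Python sketch shows the forward pass of a deep feedforward network with two hidden layers; the layer sizes, tanh activation, and random data are arbitrary assumptions.

```python
# Minimal sketch: forward pass of a deep feedforward network (NumPy).
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Randomly initialise one fully connected layer (weights and biases)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

# Input -> hidden 1 -> hidden 2 -> output (sizes are illustrative)
W1, b1 = init_layer(784, 256)
W2, b2 = init_layer(256, 128)
W3, b3 = init_layer(128, 10)

def forward(x):
    """Each hidden layer re-represents the previous layer's output,
    providing the extra levels of abstraction described above."""
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return h2 @ W3 + b3          # raw class scores; a softmax would follow

scores = forward(rng.normal(size=(1, 784)))
print(scores.shape)              # (1, 10)
```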

Deep neural networks

A deep neural network (DNN) is a discriminative model that can be trained with the backpropagation algorithm. The weight updates can be computed with the following formula:

$$\Delta w_{ij}(t+1) = \Delta w_{ij}(t) + \eta \frac{\partial C}{\partial w_{ij}}$$

where $\eta$ is the learning rate and $C$ is the cost function. The choice of cost function depends on the type of learning (such as supervised, unsupervised, or reinforcement learning) and on the activation function. For example, for supervised learning on a multi-class classification problem, the usual choice is the softmax function as the activation function and cross-entropy as the cost function. The softmax function is defined as

$$p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)}$$

where $p_j$ represents the probability of class $j$, and $x_j$ and $x_k$ represent the inputs to units $j$ and $k$ respectively. Cross-entropy is defined as

$$C = -\sum_j d_j \log(p_j)$$

where $d_j$ represents the target probability of output unit $j$ and $p_j$ is the probability output of unit $j$ after the activation function is applied [32].
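
To make these definitions concrete, the sketch below (not from the source; the layer size, learning rate, and data are assumptions) implements the softmax activation, the cross-entropy cost, and one gradient-based weight update for a single output layer.

```python
# Minimal sketch: softmax, cross-entropy, and one weight update (NumPy).
import numpy as np

def softmax(x):
    """p_j = exp(x_j) / sum_k exp(x_k), computed in a numerically stable way."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, d):
    """C = -sum_j d_j * log(p_j)."""
    return -np.sum(d * np.log(p + 1e-12))

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(20, 5))   # output-layer weights (assumed sizes)
h = rng.normal(size=20)                  # activations from the previous layer
d = np.eye(5)[2]                         # one-hot target: class 2
eta = 0.1                                # learning rate (assumed)

p = softmax(h @ W)
# For softmax with cross-entropy, dC/dW simplifies to outer(h, p - d).
grad = np.outer(h, p - d)
W -= eta * grad                          # gradient-descent weight update
print(cross_entropy(softmax(h @ W), d))  # cost after one update
```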

Problems with deep neural networks

As with other neural network models, a deep neural network can run into many problems if it is trained naively. Two common problems are overfitting and excessive training time.

Deep neural networks are prone to overfitting because the added layers of abstraction allow the model to fit rare dependencies in the training data. Methods such as weight decay (ℓ2 regularization) or sparsity (ℓ1 regularization) can be applied during training to reduce overfitting [33]. A more recent regularization method for deep neural network training is "dropout": during training, a random subset of the hidden units is discarded, which prevents the model from fitting rare dependencies [34].
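
The following sketch (illustrative only; the keep probability and activations are assumptions) shows how dropout can be applied to one hidden layer, using the common "inverted dropout" convention so the layer needs no rescaling at test time.

```python
# Minimal sketch: dropout regularisation on a hidden layer (NumPy).
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, keep_prob=0.5, training=True):
    """Randomly discard hidden units during training; pass through at test time."""
    if not training:
        return h
    mask = rng.random(h.shape) < keep_prob   # each unit kept with prob keep_prob
    return (h * mask) / keep_prob            # rescale the surviving activations

h = rng.normal(size=(4, 8))                  # a batch of hidden-layer activations
print(dropout(h, keep_prob=0.5, training=True))
```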

Because of their simple implementation and their ability to converge to better local optima than other methods, the backpropagation algorithm and gradient descent have become the standard methods for training neural networks. However, their computational cost is very high, especially when training deep neural networks, because many parameters must be considered, such as the scale of the network (the number of layers and the number of nodes per layer), the learning rate, and the initial weights. Sweeping over all of these parameters is infeasible because of the time cost, so mini-batching, that is, updating with a small batch of training samples at a time rather than a single sample, is used to speed up training [35]. The most significant speed-up comes from GPUs, because matrix and vector computations are very well suited to GPU implementation. Training deep neural networks on large-scale clusters remains difficult, however, so there is still room for improvement in parallelizing their training.
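
As a concrete illustration of mini-batching (not from the source; the model, data, batch size, and learning rate are assumptions), the sketch below trains a simple linear model by averaging the gradient over small batches of samples, the pattern that maps well onto GPU matrix operations.

```python
# Minimal sketch: mini-batch gradient descent on synthetic data (NumPy).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                   # synthetic inputs
true_w = rng.normal(size=(20, 1))
y = X @ true_w + 0.01 * rng.normal(size=(1000, 1))

w = np.zeros((20, 1))
eta, batch_size = 0.05, 32                        # assumed hyperparameters

for epoch in range(10):
    perm = rng.permutation(len(X))                # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb) # batch-averaged gradient
        w -= eta * grad                           # one update per mini-batch

print(np.abs(w - true_w).max())                   # should be close to 0
```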

Deep belief networks

Figure: A restricted Boltzmann machine (RBM) with full connections between the visible layer and the hidden layer; note that units within the visible layer and within the hidden layer are not connected to each other.

A deep belief network (DBN) is a probabilistic generative model with multiple layers of hidden units; it can be viewed as a composition of multiple simple learning modules [36].

A deep belief network can be used to pre-train a deep neural network, providing the initial weights of the network, which is then optimized with backpropagation or other discriminative algorithms. This is particularly valuable when training data is scarce, because poorly initialized weights can significantly harm the performance of the final model, and pre-trained weights lie closer to the optimal weights in weight space than random ones. This not only improves the performance of the model but also speeds up convergence in the fine-tuning stage [37].

Each layer of a deep belief network is a typical restricted Boltzmann machine (RBM), which can be trained with an efficient unsupervised layer-by-layer procedure. A restricted Boltzmann machine is an undirected, energy-based generative model with an input layer and a hidden layer. Edges exist only between the input layer and the hidden layer; there are no edges among the input-layer nodes or among the hidden-layer nodes. The training method for a single RBM was first proposed by Geoffrey Hinton for training "products of experts" and is known as contrastive divergence (CD). Contrastive divergence provides an approximation to the maximum likelihood that would ideally be used to learn the weights of the restricted Boltzmann machine [35]. Once a single-layer RBM has been trained, another RBM can be stacked on top of it to form a multi-layer model. At each stacking step, the input layer of the existing multi-layer network is initialized with a training sample and the weights are those obtained from previous training; the output of this network is then used as the input of the new RBM, which repeats the single-layer training procedure. The whole process can continue until some desired stopping condition is reached [38].
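
The sketch below (illustrative only; binary units, one Gibbs step, no bias terms, and toy data are all assumptions) shows contrastive divergence (CD-1) training of a single RBM, followed by stacking a second RBM on the hidden activations of the first, as in the greedy layer-by-layer procedure described above.

```python
# Minimal sketch: CD-1 training of an RBM and greedy stacking (NumPy).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, eta=0.1, epochs=5):
    """Train one RBM with contrastive divergence (biases omitted for brevity)."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    for _ in range(epochs):
        for v0 in data:                           # one training sample at a time
            # Positive phase: hidden probabilities and a sampled hidden state.
            ph0 = sigmoid(v0 @ W)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            # Negative phase: one step of Gibbs sampling (CD-1).
            pv1 = sigmoid(h0 @ W.T)
            ph1 = sigmoid(pv1 @ W)
            # Contrastive-divergence weight update.
            W += eta * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    return W

data = (rng.random((200, 30)) < 0.3).astype(float)   # toy binary training data
W1 = train_rbm(data, n_hidden=20)
hidden1 = sigmoid(data @ W1)                          # layer-1 representation
W2 = train_rbm(hidden1, n_hidden=10)                  # stack and train the next RBM
```

In a deep belief network, the weights W1 and W2 learned this way could then serve as the initial weights of a feedforward network that is fine-tuned with backpropagation, as described above.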

Although contrastive divergence is a very crude approximation to maximum likelihood (it does not follow the gradient of any function), empirical results show that it is an effective method for training deep architectures [35].

Convolutional neural networks

Main article: Convolutional neural network

A convolutional neural network (CNN) consists of one or more convolutional layers with a fully connected layer on top (corresponding to a classical neural network), together with the associated weights and pooling layers. This structure enables convolutional neural networks to exploit the two-dimensional structure of the input data. Compared with other deep learning architectures, convolutional neural networks give better results in image and speech recognition. The model can also be trained with the backpropagation algorithm. Compared with other deep feedforward neural networks, convolutional neural networks have fewer parameters to estimate, which makes them an attractive deep learning structure [39].
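
The following sketch (not from the source; the image size, kernel size, and naive loop implementation are assumptions) illustrates the two operations that give a CNN its structure: a 2-D convolution with a small shared kernel and a 2x2 max-pooling step. A real CNN stacks several such stages and ends with a fully connected classification layer.

```python
# Minimal sketch: one convolution + max-pooling stage of a CNN (NumPy).
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid convolution of a single-channel image with one shared kernel."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    H, W = x.shape
    H, W = H - H % size, W - W % size
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

image = rng.normal(size=(28, 28))                    # e.g. one MNIST-sized input
kernel = rng.normal(size=(5, 5))                     # shared (learned) filter weights
feature_map = np.maximum(conv2d(image, kernel), 0)   # convolution + ReLU
pooled = max_pool(feature_map)                       # 24x24 -> 12x12
print(pooled.shape)
```

Because the same small kernel is applied across the whole image, far fewer weights need to be estimated than in a fully connected network of the same depth, which is the parameter saving mentioned above.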

Convolutional deep belief networks

Convolutional deep belief networks (CDBN) are a relatively recent branch of deep learning. Structurally, a convolutional deep belief network is similar to a convolutional neural network. Therefore, like convolutional neural networks, it can exploit the two-dimensional structure of images, while also retaining the pre-training advantages of deep belief networks.