Daniel's Deep Study Notes, Deep Learning Quick Guide

This article is transferred from: Reading technology Zouxy finishing

Deep learning is a class of learning algorithms and an important branch of artificial intelligence. Moving rapidly from research to practical application, it has in just a few years overturned the algorithm design approaches of many fields, such as speech recognition, image classification, and text understanding, and has gradually established a new end-to-end paradigm: start from the training data, pass it through an end-to-end model, and directly output the final result. So how deep is deep learning, and how much of it have you learned? This article walks you through the methods and processes behind this impressive technology.

I. Overview
II. Background
III. Human Brain Vision Mechanism
IV. About Features
4.1. Granularity of Feature Representation
4.2. Primary (Shallow) Feature Representation
4.3. Structural Feature Representation
4.4. How Many Features Do We Need?
V. The Basic Idea of Deep Learning
VI. Shallow Learning and Deep Learning
VII. Deep Learning and Neural Networks
VIII. Deep Learning Training Process
8.1. Traditional Neural Network Training Methods
8.2. Deep Learning Training Process
IX. Commonly Used Models or Methods of Deep Learning
9.1. AutoEncoder
9.2. Sparse Coding
9.3. Restricted Boltzmann Machine (RBM)
9.4. Deep Belief Networks
9.5. Convolutional Neural Networks
X. Summary and Outlook

| I. Overview

Artificial intelligence, like immortality and interstellar travel, is one of humankind's most cherished dreams. Although computer technology has made great progress, so far no computer has developed a sense of "self." True, with the help of humans and large amounts of ready-made data, computers can perform very impressively, but without those two things a computer cannot even tell a cat from a dog.

Turing (everyone knows him: the founder of both computing and artificial intelligence, corresponding respectively to his famous Turing machine and Turing test) proposed the idea of the Turing test in his 1950 paper: conversing through a wall, you cannot tell whether you are talking to a person or a computer. This undoubtedly set a very high expectation for computers, and for artificial intelligence in particular. But half a century has passed, and the progress of artificial intelligence is still far from the standard of the Turing test. This has not only disappointed those who waited for years; some have even come to regard artificial intelligence as a hoax and the related fields as "pseudoscience."

However, since 2006 the field of machine learning has made breakthrough progress, and the Turing test at least no longer seems so far out of reach. As for the technical means, the progress relies not only on cloud computing's capacity for parallel processing of big data, but also on an algorithm: Deep Learning. With the help of Deep Learning algorithms, humans have finally found a way to tackle the age-old problem of handling "abstract concepts."

In June 2012, the New York Times revealed the Google Brain project, which attracted wide public attention. The project was led by Andrew Ng, a renowned Stanford University professor of machine learning, and Jeff Dean, a world-leading expert in large-scale computer systems. Using a parallel computing platform with 16,000 CPU cores, they trained a machine learning model called a "deep neural network" (DNN), with a total of 1 billion nodes internally, and it achieved great success in fields such as speech recognition and image recognition. (This network naturally cannot be compared with the human neural network. The human brain has more than 15 billion neurons, and the interconnection points between them, the synapses, are as numerous as grains of sand in the galaxy. It has been estimated that if the axons and dendrites of all the nerve cells in one person's brain were joined end to end and pulled into a straight line, it would stretch from the Earth to the Moon and back again.)

Andrew Ng, one of the project leaders, said: "We did not define the boundaries ourselves as we usually do. Instead, we fed massive amounts of data directly to the algorithm and let the data speak for itself; the system learns from the data automatically." The other leader, Jeff Dean, said: "During training we never told the machine 'this is a cat.' The system actually invented, or came to understand, the concept of 'cat.'"

In November 2012, Microsoft publicly demonstrated a fully automatic simultaneous interpretation system at an event in Tianjin, China. The speaker gave a talk in English, while the computer in the background automatically performed speech recognition, English-to-Chinese machine translation, and Chinese speech synthesis in one pass, and the result was very smooth. According to reports, the key technology behind it is also DNN, or deep learning (DL, Deep Learning).

In January 2013, at Baidu's annual meeting, founder and CEO Robin Li announced with great fanfare the establishment of the Baidu Research Institute, whose first unit was the Institute of Deep Learning (IDL).

Why are Internet companies with big data rushing to invest heavily in deep learning research and development? Deep learning certainly sounds impressive. So what is deep learning? Why do we need it? Where did it come from? What can it do? What difficulties does it currently face? Even brief answers to these questions have to be taken step by step. Let us first look at the background of machine learning (the core of artificial intelligence).

| II. Background

Machine learning is the discipline that studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Can machines learn the way humans do? In 1959, Samuel in the United States designed a checkers-playing program that had the ability to learn: it could improve its play through continued games. Four years later the program defeated its designer. Three years after that, it defeated a US champion who had been undefeated for eight years. This program demonstrated the power of machine learning and raised many thought-provoking social and philosophical questions. (Well, the mainstream of artificial intelligence has not advanced that much, yet the philosophy and ethics around it have developed quickly: what will the future hold, machines becoming more like people and people more like machines, machines turning against humans with the ATM firing the first shot, and so on. Human imagination is endless.)

Although machine learning has developed for decades, there are still many problems that are not well solved:

For example, image recognition, speech recognition, natural language understanding, weather prediction, gene expression, content recommendation, and the like. At present, the usual approach to solving these problems with machine learning looks like this (taking visual perception as an example):

Data is first acquired through a sensor (for example, CMOS). It then goes through preprocessing, feature extraction, and feature selection, and finally reaches inference, prediction, or recognition. The last part is the machine learning part proper, and the vast majority of work is done there, with a large number of papers and studies.

The middle three parts together amount to feature representation. A good feature representation plays a key role in the accuracy of the final algorithm, and most of the system's computation and testing effort is spent on this part. In practice, however, this part is generally done by hand, relying on manual feature extraction.

By now many excellent features have appeared (a good feature should have invariance, to size, scale, rotation, and so on, as well as distinguishability). For example, the appearance of SIFT was a landmark work in the field of local image feature descriptors. Because SIFT is invariant to scale and rotation and to a certain degree of viewpoint and illumination change, and because it is highly distinctive, it has made many problems solvable. But it is not a cure-all.

However, selecting features by hand is laborious and heuristic (it requires expert knowledge). Whether good features are found depends largely on experience and luck, and tuning them takes a lot of time. Since manual feature selection is not very satisfactory, can some features be learned automatically? The answer is yes! This is exactly what Deep Learning does. Its alias, Unsupervised Feature Learning, makes the point clear: "unsupervised" means that people do not take part in the feature selection process.

So how does it learn? How does it know which features are good and which are not? We said that machine learning is the discipline that studies how computers simulate or realize human learning behavior. Well then, how does the human visual system work? Why can we pick out one particular person in a vast sea of people (because you exist deep in my mind, in my dreams, in my heart, in my song...)? Since the human brain is so remarkable, can we take it as a reference and simulate it? (Whenever something is linked to features or algorithms of the human brain, it is considered good, though I do not know whether the link is imposed artificially to make one's work seem sacred and elegant.) In recent decades, the development of disciplines such as cognitive neuroscience and biology has made this mysterious and magical brain of ours less unfamiliar, and has also contributed to the development of artificial intelligence.

| III. Human Brain Vision Mechanism

The 1981 Nobel Prize in Medicine was awarded to David Hubel (an American neurobiologist born in Canada), Torsten Wiesel, and Roger Sperry. The main contribution of the first two was the "discovery of information processing in the visual system": the visual cortex is organized hierarchically.

Let us see what they did. In 1958, David Hubel and Torsten Wiesel at Johns Hopkins University studied the correspondence between the pupil region and neurons in the cerebral cortex. They drilled a 3 mm hole in the back of a cat's skull and inserted electrodes through it to measure the activity of the neurons.

Then they showed objects of various shapes and brightness in front of the kitten's eyes, changing the position and angle of each object as it was shown. They hoped that in this way the kitten's pupils would experience stimuli of different types and different strengths.

The purpose of the experiment was to test a conjecture: that there is some correspondence between different visual neurons in the posterior cortex and the stimuli received by the pupil, so that once the pupil receives a certain stimulus, a certain group of neurons in the posterior cortex becomes active. After many days of tedious experiments, and the sacrifice of several poor kittens, David Hubel and Torsten Wiesel discovered a type of neuron called the orientation-selective cell. When the pupil detects the edge of an object in front of the eye, and that edge points in a certain direction, this type of neuron becomes active.

This discovery prompted further thinking about the nervous system. The working process of the chain from nerves to the nerve center to the brain may be a process of continuous iteration and continuous abstraction. There are two key words here: abstraction and iteration. Starting from the raw signal, low-level abstractions are formed, which are then iterated step by step into high-level abstractions. Human logical thinking often uses highly abstract concepts.

For example, the process starts with the intake of the raw signal (the pupil takes in pixels), then performs preliminary processing (certain cells in the cerebral cortex detect edges and directions), then abstracts (the brain determines that the object in front of the eyes is round), and then abstracts further (the brain determines that the object is a balloon).

This physiological discovery led, forty years later, to a breakthrough in artificial intelligence.

In general, information processing in the human visual system is hierarchical: edge features are extracted in the low-level V1 area, shapes or parts of the target in the V2 area, and then, at higher levels, the whole target and its behavior. In other words, high-level features are combinations of low-level features, and from low level to high level the feature representation becomes more and more abstract and expresses more and more of the semantics or intent. The higher the level of abstraction, the fewer the possible guesses, and the easier classification becomes. For example, the mapping from sets of words to sentences is many-to-one, the mapping from sentences to meanings is many-to-one, and the mapping from meanings to intents is again many-to-one. This is a hierarchical system.

Attentive readers will have noticed the key word: hierarchy. Does the "deep" in Deep Learning refer to how many layers there are, that is, how deep the model is? Exactly. So how does Deep Learning borrow from this process? After all, it is the computer that has to do the processing, so the question is how to model this process.

Because what we want to learn is the representation of features, we need a deeper understanding of features, and of hierarchical features in particular. So before discussing Deep Learning, it is worth spending some time on features. (Having come across such a good explanation of features, it would be a pity not to include it, so it is inserted here.)

| IV. About Features

Features are the raw material of a machine learning system, and their influence on the final model is beyond doubt. If the data is well expressed as features, a linear model can usually achieve satisfactory accuracy. So what do we need to consider about features?

4.1. Granularity of Feature Representation

At what granularity must features be represented for a learning algorithm to work? For an image, pixel-level features have essentially no value. Take a picture of a motorcycle: at the pixel level no useful information can be obtained at all, and motorcycles cannot be distinguished from non-motorcycles. If the features are structural (that is, meaningful), such as whether there is a handlebar and whether there are wheels, then motorcycles are easy to distinguish from non-motorcycles and the learning algorithm can do its job.

4.2. Primary (Shallow) Feature Representation

Since pixel-level feature representation does not work, what kind of representation is useful?

Around 1995, two scholars, Bruno Olshausen and David Field, were working at Cornell University, trying to study vision from both the physiological and the computational side at once. They collected many black-and-white landscape photographs and extracted 400 small fragments from them, each 16x16 pixels in size. Label these 400 fragments S[i], i = 0, ..., 399. Next, another fragment, also 16x16 pixels, is extracted at random from the photographs; call this fragment T.

The question they asked was how to select a set of fragments S[k] from the 400 fragments and superimpose them to synthesize a new fragment that is as similar as possible to the randomly chosen target fragment T, while keeping the number of fragments S[k] as small as possible. In mathematical language:

Sum_k (a[k]S[k]) --> T, where a[k] is the weight coefficient when the fragment S[k] is superimposed.

To solve this problem, Bruno Olshausen and David Field devised an algorithm: sparse coding.

Sparse coding is a repeated iterative process with two steps per iteration:

1) Select a set of S[k] and adjust a[k] so that Sum_k(a[k]S[k]) is closest to T.

2) Fix a[k]. Among the 400 fragments, select other, more suitable fragments S'[k] to replace the original S[k], so that Sum_k(a[k]S'[k]) is closest to T.

After several iterations, the best combination of S[k] is selected. Surprisingly, the selected S[k] are essentially the edge lines of different objects in the photos; these line segments are similar in shape and differ only in orientation.
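The selection problem above (approximate T with as few weighted fragments as possible) can be sketched in code. This is only an illustration, not Olshausen and Field's actual procedure: it swaps in a greedy orthogonal matching pursuit, uses random vectors as stand-ins for real image fragments, and the names `greedy_sparse_fit` and `max_atoms` are hypothetical.

```python
import numpy as np

def greedy_sparse_fit(S, T, max_atoms=5):
    """Approximate T as sum_k a[k] * S[k] using at most max_atoms fragments.

    S: array of shape (n_fragments, n_pixels); T: array of shape (n_pixels,).
    Returns (indices of the chosen fragments, their weights a).
    """
    chosen = []
    weights = np.zeros(0)
    residual = T.copy()
    for _ in range(max_atoms):
        # Pick the fragment most correlated with what is still unexplained.
        scores = np.abs(S @ residual)
        if chosen:
            scores[chosen] = -np.inf   # never pick the same fragment twice
        chosen.append(int(np.argmax(scores)))
        # Re-fit all weights a[k] of the chosen fragments by least squares.
        weights, *_ = np.linalg.lstsq(S[chosen].T, T, rcond=None)
        residual = T - S[chosen].T @ weights
    return chosen, weights

rng = np.random.default_rng(0)
S = rng.standard_normal((400, 256))            # 400 flattened 16x16 fragments (random stand-ins)
S /= np.linalg.norm(S, axis=1, keepdims=True)  # normalize each fragment

# Hypothetical target built from three known fragments.
true_idx, true_w = [3, 57, 210], [0.8, 0.3, 0.5]
T = sum(w * S[i] for w, i in zip(true_w, true_idx))

idx, a = greedy_sparse_fit(S, T, max_atoms=3)
error = np.linalg.norm(T - S[idx].T @ a)       # reconstruction error of the sparse fit
```

With real image fragments the chosen S[k] would be the edge-like pieces described above; here the sketch only shows how a small set of fragments and their weights a[k] can be searched for.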

The algorithmic results of Bruno Olshausen and David Field coincide with the physiological findings of David Hubel and Torsten Wiesel!

In other words, complex shapes are often composed of a few basic structures. For example, an image can be represented as a linear combination of 64 orthogonal edges (which can be understood as an orthogonal set of basic structures). A sample x can be composed from three of the 64 edges with weights 0.8, 0.3, and 0.5; the other basic edges make no contribution, so their weights are 0.
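The example with 64 orthogonal edges can be made concrete with a short sketch. The basis here is a random orthonormal one standing in for real edge structures, and the three indices are chosen arbitrarily; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 64 orthonormal basis vectors (one per row), standing in for the 64 "edges".
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
basis = q.T

# Sparse code: only three of the 64 weights are non-zero.
code = np.zeros(64)
code[[5, 20, 41]] = [0.8, 0.3, 0.5]    # indices are arbitrary

x = code @ basis                        # x = 0.8*b5 + 0.3*b20 + 0.5*b41
recovered = basis @ x                   # project x back onto every basis vector
```

Because the basis is orthonormal, projecting x onto each basis vector returns the original weights, with zeros everywhere except the three contributing edges.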

In addition, researchers found that this regularity exists not only in images but also in sound. They discovered 20 basic sound structures from unlabeled audio, and all other sounds can be synthesized from these 20 basic structures.

4.3. Structural Feature Representation

A small patch of an image can be composed of basic edges, but how are more structured, more complex, conceptual shapes represented? This requires a higher level of feature representation, such as V2 and V4. V1 works at the level of pixels, and V2 treats the output of V1 as its own "pixel level." The levels build on one another: high-level representations are formed by combining lower-level ones. In technical terms, these are bases. The bases learned at V1 are edges; the V2 layer then forms combinations of the V1 bases, and what V2 obtains is a basis one level higher, that is, the result of combining the bases of the layer below. The layer above that is again a combination of the bases of the layer beneath it, and so on. (So one expert joked that Deep Learning is just "playing with bases"; because that sounds unflattering, it goes by the nicer name of Deep Learning or Unsupervised Feature Learning.)

Intuitively, the idea is to find small patches that make sense, combine them to obtain the features of the next layer up, and learn features recursively upward in this way.

When training on different objects, the resulting edge bases are very similar, but the object parts and models are completely different (which makes it much easier to tell a car from a face).

The same holds for text. What does a doc mean? When we describe something, what is the appropriate unit of representation? Individual characters? Probably not: characters are the pixel level. The unit should at least be the term; in other words, every doc is made up of terms. But is that enough to express concepts? Possibly not. One more step up is needed, to the topic level; with topics in place, going on to the doc is reasonable. The number of items differs greatly between levels, though: the concepts expressed by a doc -> topics (thousands to tens of thousands) -> terms (hundreds of thousands) -> words (millions).

When a person reads a doc, what the eyes see are words. The brain automatically segments these words into terms, organizes them by concept using prior learning to obtain topics, and then carries out higher-level learning.

4.4. How Many Features Do We Need?

We know that there needs to be a hierarchy of feature construction, from shallow to deep, but how many features should there be in each layer?

For any method, the more features there are, the more reference information is available and the more accuracy improves. However, more features also mean more computation and a larger space to explore, and the training data becomes sparse with respect to each feature, which brings various problems. So more features are not necessarily better.

Well, at this point we can finally talk about Deep Learning. Above we discussed why Deep Learning exists (to let machines learn good features automatically and dispense with manual selection, taking the human hierarchical visual processing system as a reference), and we reached the conclusion that Deep Learning needs multiple layers to obtain more abstract feature representations. So how many layers are appropriate? What architecture should be used for modeling? And how is unsupervised training carried out?

| V. The Basic Idea of Deep Learning

Suppose we have a system S with n layers (S1, ..., Sn), whose input is I and whose output is O, pictured as: I => S1 => S2 => ... => Sn => O. If the output O equals the input I, that is, if the input I passes through the system without any loss of information, then I passes through every layer Si without loss of information; in other words, at any layer Si, what we have is just another representation of the original information (the input I). (Experts point out that strictly this is impossible. Information theory has the saying that "information is lost layer by layer" (the data processing inequality): if processing a yields b, and processing b yields c, then it can be shown that the mutual information between a and c does not exceed the mutual information between a and b. This shows that processing does not add information, and most processing loses some. Of course, if what is lost is useless information, so much the better.)

Now back to our topic, Deep Learning. We need to learn features automatically. Suppose we have a set of inputs I (such as a collection of images or texts), and suppose we have designed a system S with n layers. By adjusting the parameters of the system so that its output is still the input I, we automatically obtain a series of hierarchical features of I, namely S1, ..., Sn.
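The remark that the mutual information between a and c cannot exceed that between a and b can be checked numerically on a small example. The sketch below assumes discrete variables and two made-up noisy processing steps; the probability tables are arbitrary illustrative values.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution given as a 2-D array."""
    px = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    py = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

p_a = np.array([0.5, 0.3, 0.2])             # distribution of a
a_to_b = np.array([[0.8, 0.1, 0.1],         # P(b | a): first processing step
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]])
b_to_c = np.array([[0.7, 0.3],              # P(c | b): second processing step
                   [0.4, 0.6],
                   [0.2, 0.8]])

joint_ab = p_a[:, None] * a_to_b            # P(a, b)
joint_ac = joint_ab @ b_to_c                # P(a, c): c depends on a only through b

i_ab = mutual_information(joint_ab)
i_ac = mutual_information(joint_ac)         # never larger than i_ab
```

Changing the two tables changes the numbers, but for any chain a -> b -> c built this way `i_ac` stays at or below `i_ab`.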

For deep learning, the idea is to stack multiple layers, so that the output of one layer serves as the input of the next. In this way the input information can be represented hierarchically.

In addition, we assumed above that the output is strictly equal to the input. This constraint is too strict, and we can relax it slightly: for example, we only require that the difference between input and output be as small as possible. This relaxation leads to another class of Deep Learning methods. The above is the basic idea of Deep Learning.
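The relaxed goal, making the difference between input and output as small as possible, can be sketched with a single layer trained by gradient descent. This is a minimal sketch under assumptions: the toy data, the layer sizes, the sigmoid encoder with a linear decoder, and the learning rate are all illustrative choices, not something the text prescribes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((200, 8))               # 200 toy inputs I with 8 values each
n_hidden = 4                           # narrower than the input, so the layer must compress

W1 = rng.standard_normal((8, n_hidden)) * 0.1   # input -> features
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, 8)) * 0.1   # features -> reconstructed output
b2 = np.zeros(8)

lr, losses = 0.5, []
for step in range(500):
    H = sigmoid(X @ W1 + b1)           # the layer's output: the learned features
    O = H @ W2 + b2                    # output O, which should be close to the input
    diff = O - X
    losses.append(float(np.mean(diff ** 2)))
    # Gradients of the mean squared difference between output and input.
    dO = 2 * diff / diff.size
    dH = dO @ W2.T
    dZ = dH * H * (1 - H)
    W2 -= lr * (H.T @ dO)
    b2 -= lr * dO.sum(axis=0)
    W1 -= lr * (X.T @ dZ)
    b1 -= lr * dZ.sum(axis=0)
```

After training, `H` is an alternative, lower-dimensional representation of the input, and `losses` records how the input/output difference shrinks over the iterations.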

| VI. Shallow Learning and Deep Learning

Shallow learning is the first wave of machine learning.

In the late 1980s, the invention of the back-propagation algorithm (BP algorithm) for artificial neural networks brought hope to machine learning and set off a wave of machine learning based on statistical models, a wave that has continued to this day. It was found that the BP algorithm lets an artificial neural network model learn statistical regularities from a large number of training samples and thereby make predictions about unknown events. Compared with earlier systems based on hand-written rules, this statistics-based approach proved superior in many respects. Although the artificial neural network of that time was also called a multi-layer perceptron, it was in fact a shallow model with only one layer of hidden nodes.

In the 1990s, a variety of shallow machine learning models were proposed one after another, such as Support Vector Machines (SVM), Boosting, and maximum entropy methods (such as LR, Logistic Regression). The structure of these models can basically be viewed as having one layer of hidden nodes (SVM, Boosting) or no hidden nodes at all (LR). These models achieved great success in both theoretical analysis and application. By contrast, because theoretical analysis was difficult and training required a great deal of experience and skill, shallow artificial neural networks were relatively quiet during this period.

Deep learning is the second wave of machine learning.

In 2006, Professor Geoffrey Hinton of the University of Toronto, Canada, and his student Ruslan Salakhutdinov published an article in Science that set off the wave of deep learning in academia and industry. The article made two main points: 1) artificial neural networks with many hidden layers have excellent feature learning ability, and the features they learn characterize the data more essentially, which benefits visualization and classification; 2) the difficulty of training deep neural networks can be effectively overcome through layer-wise pre-training. In that article, the layer-wise initialization is achieved through unsupervised learning.

Currently, most learning methods for classification, regression, and the like are shallow-structure algorithms. Their limitation is that, given finite samples and computational units, their ability to represent complex functions is limited, and their generalization on complex classification problems is constrained. Deep learning, by learning a deep nonlinear network structure, can approximate complex functions, characterize distributed representations of the input data, and show a strong ability to learn the essential characteristics of a data set from a small number of samples. (The advantage of multiple layers is that complex functions can be expressed with fewer parameters.)

The essence of deep learning is to learn more useful features by building machine learning models with many hidden layers and using massive training data, so as to ultimately improve the accuracy of classification or prediction. Thus the "deep model" is the means and "feature learning" is the end. Deep learning differs from traditional shallow learning in that: 1) it emphasizes the depth of the model structure, usually with 5, 6, or even 10 layers of hidden nodes; 2) it explicitly highlights the importance of feature learning, that is, through layer-by-layer feature transformation, the representation of a sample in the original space is transformed into a new feature space, making classification or prediction easier. Compared with constructing features by hand-written rules, using big data to learn features is better able to capture the rich intrinsic information of the data.

| VII. Deep Learning and Neural Networks

Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate the human brain for analysis and learning, imitating the mechanisms by which the human brain interprets data such as images, sounds, and text. Deep learning is a kind of unsupervised learning.

The concept of deep learning stems from research on artificial neural networks. A multilayer perceptron with multiple hidden layers is one kind of deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute categories or features) in order to discover distributed feature representations of data.

Deep learning itself is a branch of machine learning and can be understood simply as a development of neural networks. Some twenty or thirty years ago, neural networks were a particularly hot direction in the ML field, but they later gradually faded, for reasons including the following:

1) They overfit easily, the parameters are hard to tune, and many tricks are needed;

2) Training is slow, and with relatively few layers (three or fewer) the results are no better than those of other methods;

So for roughly 20 years in between, neural networks received little attention, and the period was basically the world of SVM and boosting algorithms. But one devoted veteran, Hinton, persisted, and eventually (together with others such as Bengio and Yann LeCun) worked out a practical deep learning framework.

Deep learning and traditional neural networks have things in common as well as many differences.

What the two have in common is that deep learning adopts a layered structure similar to that of neural networks: the system is a multi-layer network consisting of an input layer, hidden layers (several of them), and an output layer. Only nodes in adjacent layers are connected; nodes within the same layer, or across non-adjacent layers, are not connected to each other. Each layer can be viewed as a logistic regression model. This layered structure is relatively close to the structure of the human brain.
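The layered structure just described, with connections only between adjacent layers and each layer acting like a logistic regression on the previous layer's output, can be sketched as a forward pass. The layer sizes and random weights below are arbitrary assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass x through each (W, b) pair in turn; return the activations of every layer."""
    activations = [x]
    for W, b in layers:
        # Each unit is a logistic regression on the previous layer's output.
        # Only adjacent layers are connected: layer k sees nothing but layer k-1.
        activations.append(sigmoid(activations[-1] @ W + b))
    return activations

rng = np.random.default_rng(0)
sizes = [6, 5, 4, 3, 2]      # input layer, three hidden layers, output layer
layers = [(rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

acts = forward(rng.random(6), layers)
```

`acts[0]` is the input, `acts[-1]` the output, and the entries in between are the hidden-layer representations.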

To overcome the problems of neural network training, DL adopts a training mechanism very different from that of traditional neural networks. Traditional neural networks use back propagation: put simply, an iterative algorithm trains the entire network at once, with initial values set at random, the output of the current network computed, and the parameters of the preceding layers then adjusted according to the difference between the current output and the label, until convergence (overall, a gradient descent method). Deep learning as a whole uses a layer-wise training mechanism. The reason is that, with back propagation in a deep network (more than 7 layers), the residual propagated back to the front-most layers becomes too small, the so-called gradient diffusion (vanishing gradient). We discuss this issue next.
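The back-propagation procedure described above (random initial values, compute the current output, use the difference from the label to adjust the earlier layers, repeat) can be sketched on a toy problem. The XOR data, the single hidden layer of 4 units, the squared-error loss, and the learning rate are assumptions chosen to keep the example small.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR labels

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 4)), np.zeros(4)      # random initial values
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)

lr, losses = 0.5, []
for step in range(5000):
    # Forward pass: compute the output of the current network.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backward pass: propagate the output error back to the earlier layer.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent update of every parameter.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)
```

The whole network is updated at once in every iteration, which is the contrast with the layer-wise mechanism of deep learning.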

| VIII. Deep Learning Training Process

8.1. Why can't traditional neural network training methods be used in deep neural networks?

The BP algorithm is the classic algorithm for training multi-layer networks, yet in practice it is already quite unsatisfactory for networks with just a few layers. The local minima that are pervasive in the non-convex objective function of a deep structure (one involving multiple layers of nonlinear processing units) are the main source of training difficulty.

Problems with the BP algorithm:

(1) The gradient becomes increasingly sparse: going from the top layer downward, the error correction signal gets smaller and smaller;

(2) Convergence to local minima: especially when starting far from the optimal region (random initialization can cause this);

(3) In general, we can only train with labeled data: but most data is unlabeled, and the brain can learn from unlabeled data.
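The first problem in the list, the error signal shrinking on its way down from the top layer, can be observed directly in a small experiment. The sketch below assumes sigmoid layers with small random weights (scale 0.1) and an error signal of all ones at the top; with other activations or weight scales the numbers would differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_layers, width = 10, 20
weights = [rng.standard_normal((width, width)) * 0.1 for _ in range(n_layers)]

# Forward pass, remembering the activation of each layer.
acts = [rng.random(width)]
for W in weights:
    acts.append(sigmoid(acts[-1] @ W))

# Backward pass: start with an error signal of ones at the top layer
# and record its size as it travels down towards the input.
delta = np.ones(width)
grad_norms = []
for W, a in zip(reversed(weights), reversed(acts[1:])):
    delta = delta * a * (1 - a)     # through the sigmoid (its derivative is at most 0.25)
    grad_norms.append(float(np.linalg.norm(delta)))
    delta = delta @ W.T             # through the weights to the layer below
```

`grad_norms[0]` belongs to the top layer and `grad_norms[-1]` to the layer nearest the input; in this setup the latter is many orders of magnitude smaller.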

8.2. Deep learning training process

If all layers are trained at the same time, the time complexity is too high; if one layer is trained at a time, the bias is passed on layer by layer. This runs into the opposite problem from the supervised learning described above: severe underfitting (because a deep network has too many neurons and parameters).

In 2006, Hinton proposed an effective method for building multi-layer neural networks on unsupervised data. Put simply, it has two steps: first, train one layer of the network at a time; second, fine-tune so that the high-level representation r generated upward from the original representation x, and the x' generated downward from that high-level representation r, are as consistent as possible. The method is:

1) First build the network one layer of neurons at a time, so that each time a single-layer network is trained.

2) After all layers have been trained, Hinton uses the wake-sleep algorithm for tuning.

Make the weights between all layers except the topmost bidirectional, so that the top layer remains a single-layer neural network while the other layers become a graphical model. The upward weights are used for "recognition" and the downward weights for "generation." Then use the wake-sleep algorithm to adjust all the weights, bringing recognition and generation into agreement, that is, ensuring that the top-level representation produced can restore the bottom-level nodes as accurately as possible. For example, if a node in the top layer represents a face, then images of all faces should activate that node, and the image generated downward from it should look like a rough face. The wake-sleep algorithm has two phases, wake and sleep.

1) Wake phase: the recognition process. Using external features and the upward weights (recognition weights), generate an abstract representation (node states) for each layer, and use gradient descent to modify the downward weights (generative weights) between layers. In other words: "If reality differs from what I imagined, change my weights so that what I imagine becomes like reality."

2) Sleep phase: the generative process. Using the top-level representation (the concepts learned while awake) and the downward weights, generate the states of the lower layers, and at the same time modify the upward weights between layers. In other words: "If the scene in my dream is not the corresponding concept in my mind, change my recognition weights so that this scene appears to me as that concept."

The deep learning training process is as follows:

1) Bottom-up unsupervised learning (that is, start from the bottom layer and train one layer at a time toward the top):

Use unlabeled data (labeled data also works) to train the parameters of each layer in turn. This step can be regarded as an unsupervised training process, and it is the part that differs most from a traditional neural network (it can also be seen as a feature learning process).

Specifically, first train the first layer with unlabeled data and learn its parameters (this layer can be seen as the hidden layer of a three-layer neural network that minimizes the difference between its output and its input). Because of the limit on model capacity and the sparsity constraint, the resulting model learns the structure of the data itself and so obtains features with more representational power than the raw input. Once layer n-1 has been learned, its output is used as the input of layer n, and layer n is trained; in this way the parameters of every layer are obtained.

2) Top-down supervised learning (that is, train with labeled data, propagate the error from the top down, and fine-tune the network):

Starting from the layer parameters obtained in the first step, further fine-tune the parameters of the whole multi-layer model. This step is a supervised training process. The first step plays the role that random initialization plays in an ordinary neural network, but because the first step of deep learning is not random initialization and is instead obtained by learning the structure of the input data, the initial values are closer to the global optimum and better results can be achieved. The effectiveness of deep learning is therefore largely due to the feature learning in the first step.

| IX. Common Deep Learning Models and Methods

9.1, AutoEncoder Automatic Encoder

One of the simplest Deep Learning methods exploits the characteristics of artificial neural networks. An artificial neural network (ANN) is itself a system with a hierarchical structure. Given a neural network, we assume that its output should equal its input, and then train it and adjust its parameters to obtain the weights of each layer. Naturally, we then get several different representations of the input I (each layer gives one representation), and these representations are features. An autoencoder is a neural network that reproduces its input signal as closely as possible. To achieve this, the autoencoder must capture the most important factors that represent the input data, just as PCA finds the principal components that represent the original information.

The specific process is briefly described as follows:

1) Given unlabeled data, learn features using unsupervised learning:

In our earlier neural networks, as in the first figure, the input samples are labeled, i.e. (input, target), so we adjust the parameters of the preceding layers according to the difference between the current output and the target (label) until convergence. But now we only have unlabeled data, as in the figure on the right. So how do we obtain this error?

As shown in the figure above, we feed the input into an encoder and get a code, which is a representation of the input. How do we know that this code really represents the input? We add a decoder, which outputs a signal. If that output is very similar to the original input signal (ideally identical), then we clearly have reason to believe that the code is reliable. So we adjust the parameters of the encoder and decoder to minimize the reconstruction error, and at that point we have the first representation of the input signal, namely the code. Because the data is unlabeled, the error comes directly from comparing the reconstruction with the original input.
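The encoder/decoder idea in step 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch under assumptions that are not in the original text: sigmoid units, a squared reconstruction error, plain full-batch gradient descent, and a toy data set of three repeated binary patterns. The names `train_autoencoder`, `W_enc` and `W_dec` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=1.0, epochs=2000, seed=0):
    """Minimize the reconstruction error of X; return parameters and loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        code = sigmoid(X @ W_enc + b_enc)        # encoder: input -> code
        recon = sigmoid(code @ W_dec + b_dec)    # decoder: code -> reconstruction
        err = recon - X                          # the only "label" is the input itself
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Backpropagate the reconstruction error through the decoder, then the encoder.
        d_out = err * recon * (1.0 - recon) / n
        d_hid = (d_out @ W_dec.T) * code * (1.0 - code)
        W_dec -= lr * (code.T @ d_out)
        b_dec -= lr * d_out.sum(axis=0)
        W_enc -= lr * (X.T @ d_hid)
        b_enc -= lr * d_hid.sum(axis=0)
    return (W_enc, b_enc, W_dec, b_dec), losses

# Toy unlabeled data (an assumption for illustration): three patterns, ten copies each.
patterns = np.array([[1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 0, 0],
                     [0, 0, 0, 0, 1, 1]], dtype=float)
X = np.repeat(patterns, 10, axis=0)
params, losses = train_autoencoder(X, n_hidden=2)
code = sigmoid(X @ params[0] + params[1])        # the first learned representation
print(losses[0], losses[-1], code.shape)
```

The returned `code` is what the text calls the first representation of the input; the decoder exists only to supply a training signal.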

2) Use the encoder to generate features, then train the next layer. Train layer by layer in this way:

Above we obtained the code of the first layer. The minimal reconstruction error makes us believe that this code is a good representation of the original input signal or, to stretch the point, that it is exactly the same as the original signal (expressed differently, but reflecting the same thing). Training the second layer is no different from training the first: we take the code output by the first layer as the input signal of the second layer and minimize the reconstruction error in the same way. This gives the parameters of the second layer and the code of the second layer's input, which is the second representation of the original input. The other layers are handled the same way (while a layer is being trained, the parameters of the earlier layers are fixed, and their decoders are no longer needed).
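The layer-by-layer procedure described above might look like the following sketch. It assumes the same sigmoid autoencoder and squared reconstruction error as a stand-in for whatever the original experiments used; `fit_layer` and `pretrain_stack` are hypothetical helper names, and the random binary data is only there to make the example runnable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_layer(H, n_hidden, lr=1.0, epochs=500, seed=0):
    """Train one autoencoder on H and return only its encoder (W, b)."""
    rng = np.random.default_rng(seed)
    n, d = H.shape
    W = rng.normal(0.0, 0.1, (d, n_hidden))
    b = np.zeros(n_hidden)
    V = rng.normal(0.0, 0.1, (n_hidden, d))      # decoder weights, discarded afterwards
    c = np.zeros(d)
    for _ in range(epochs):
        code = sigmoid(H @ W + b)
        recon = sigmoid(code @ V + c)
        d_out = (recon - H) * recon * (1.0 - recon) / n
        d_hid = (d_out @ V.T) * code * (1.0 - code)
        V -= lr * (code.T @ d_out)
        c -= lr * d_out.sum(axis=0)
        W -= lr * (H.T @ d_hid)
        b -= lr * d_hid.sum(axis=0)
    return W, b

def pretrain_stack(X, layer_sizes):
    """Greedy layer-wise pretraining: each layer is trained on the code below it."""
    encoders, H = [], X
    for k, size in enumerate(layer_sizes):
        W, b = fit_layer(H, size, seed=k)        # earlier layers are already fixed
        encoders.append((W, b))
        H = sigmoid(H @ W + b)                   # this code is the next layer's input
    return encoders, H

rng = np.random.default_rng(0)
X = (rng.random((20, 8)) < 0.5).astype(float)    # toy unlabeled data
encoders, top_code = pretrain_stack(X, layer_sizes=[5, 3])
print([W.shape for W, _ in encoders], top_code.shape)
```

Note that only the encoders are kept: once a layer has been trained, its decoder has served its purpose.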

3) Supervised fine-tuning:

With the method above we can obtain many layers. How many layers are needed (that is, how deep the network should be; there is currently no scientific way to evaluate this) has to be found by experiment. Each layer gives a different representation of the original input. Of course, we think the more abstract the better, just as in the human visual system.

At this point the AutoEncoder cannot yet be used to classify data, because it has not learned how to connect an input with a class. It has only learned how to reconstruct or reproduce its input. In other words, it has only learned a feature that represents the input well, one that captures the original input signal as fully as possible. To perform classification, we can add a classifier (for example logistic regression or an SVM) on top of the topmost coding layer of the AutoEncoder and then train it with the standard supervised training method for multi-layer neural networks (gradient descent).

That is, we now feed the feature code of the last layer into the final classifier and fine-tune with labeled samples through supervised learning. There are two ways to do this. One is to adjust only the classifier (the black part in the figure):

The other is to fine-tune the whole system with labeled samples (if there is enough data, this is the best option: end-to-end learning):

Once supervised training is complete, the network can be used for classification. The top layer of the neural network can serve as a linear classifier, and we can then replace it with a better-performing classifier. Studies have found that adding these automatically learned features to the original features can greatly improve accuracy, and on classification problems can even do better than the best current classification algorithms!
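The two fine-tuning options (classifier only versus the whole system) differ only in which parameters receive gradient updates. The sketch below illustrates this with a single encoder layer and a softmax classifier; the randomly initialized `W0` is a stand-in for a pretrained encoder, and `fine_tune` and its `train_encoder` flag are hypothetical names introduced for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fine_tune(X, y, W, b, n_classes, train_encoder, lr=0.3, epochs=300):
    """Train a softmax classifier on top of encoder (W, b).
    train_encoder=False updates only the classifier; True updates everything."""
    W, b = W.copy(), b.copy()
    n = X.shape[0]
    Y = np.eye(n_classes)[y]                     # one-hot labels
    U = np.zeros((W.shape[1], n_classes))        # classifier weights
    c = np.zeros(n_classes)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)                   # top-level code
        P = softmax(H @ U + c)
        d_logits = (P - Y) / n                   # gradient of the mean cross-entropy
        d_hid = (d_logits @ U.T) * H * (1.0 - H)
        U -= lr * (H.T @ d_logits)
        c -= lr * d_logits.sum(axis=0)
        if train_encoder:                        # end-to-end fine-tuning
            W -= lr * (X.T @ d_hid)
            b -= lr * d_hid.sum(axis=0)
    P = softmax(sigmoid(X @ W + b) @ U + c)
    loss = -np.mean(np.log(P[np.arange(n), y]))
    return W, b, U, c, loss

patterns = np.array([[1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 0, 0],
                     [0, 0, 0, 0, 1, 1]], dtype=float)
X = np.repeat(patterns, 10, axis=0)
y = np.repeat(np.arange(3), 10)                  # labels for the supervised stage
rng = np.random.default_rng(0)
W0 = rng.normal(0.0, 0.5, (6, 4))                # stand-in for a pretrained encoder
b0 = np.zeros(4)

W_frozen, _, _, _, loss_frozen = fine_tune(X, y, W0, b0, 3, train_encoder=False)
W_tuned, _, _, _, loss_tuned = fine_tune(X, y, W0, b0, 3, train_encoder=True)
print(loss_frozen, loss_tuned)
```

With `train_encoder=False` the encoder weights come back unchanged, which corresponds to adjusting only the classifier; with `train_encoder=True` the error is propagated into the encoder as well.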

There are several variants of the AutoEncoder. Two are briefly introduced here:

Sparse AutoEncoder:

We can of course add further constraints to obtain new Deep Learning methods. For example, if we add an L1 regularity constraint to the AutoEncoder (L1 mainly forces most of the nodes in each layer to be 0, with only a few nonzero, which is where the name Sparse comes from), we get the Sparse AutoEncoder method.

As shown in the figure above, this simply constrains each code to be as sparse as possible, because sparse representations are often more effective than other representations (the human brain seems to work the same way: a given input stimulates only some neurons, while most of the other neurons are inhibited).

Denoising AutoEncoders:

A denoising autoencoder (DA) builds on the autoencoder by adding noise to the training data, so the autoencoder must learn to remove this noise and recover the true input that has not been contaminated by noise. This forces the encoder to learn a more robust representation of the input signal, which is also why it generalizes better than an ordinary encoder. A DA can be trained with the gradient descent algorithm.
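Both variants change only the training objective, which the following sketch tries to make explicit. It assumes masking noise (randomly zeroing inputs) as the corruption, since the text does not say which kind of noise is used, and an average L1 norm of the code as the sparsity term. The names `corrupt`, `l1_penalty`, `lam` and `drop_prob` are hypothetical, and the identity encoder and decoder exist only to make the arithmetic easy to follow.

```python
import numpy as np

def corrupt(X, drop_prob, rng):
    """Masking noise: set each input entry to 0 with probability drop_prob."""
    mask = rng.random(X.shape) >= drop_prob
    return X * mask

def l1_penalty(code, lam):
    """Sparsity term: lam times the average L1 norm of the code vectors."""
    return lam * np.mean(np.sum(np.abs(code), axis=1))

def reconstruction_error(X, X_hat):
    return 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))

def sparse_denoising_loss(X, encode, decode, lam, drop_prob, rng):
    X_noisy = corrupt(X, drop_prob, rng)         # the encoder only sees the noisy input
    code = encode(X_noisy)
    X_hat = decode(code)
    # The target is the clean input, which is what forces the model to denoise.
    return reconstruction_error(X, X_hat) + l1_penalty(code, lam)

def identity(Z):
    return Z

rng = np.random.default_rng(0)
X = np.ones((4, 3))
loss_clean = sparse_denoising_loss(X, identity, identity, lam=0.1, drop_prob=0.0, rng=rng)
loss_masked = sparse_denoising_loss(X, identity, identity, lam=0.1, drop_prob=1.0, rng=rng)
print(loss_clean, loss_masked)
```

Setting `lam=0` gives a plain denoising autoencoder objective, and setting `drop_prob=0` gives a plain sparse autoencoder objective.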

9.2, Sparse Coding

If we relax the restriction that the output must equal the input, and use the concept of a basis from linear algebra, i.e. O = a1*Φ1 + a2*Φ2 + ... + an*Φn, where the Φi are the basis vectors and the ai are the coefficients, we get the following optimization problem:

Min |I - O|, where I denotes the input and O the output.

By solving this optimization problem we obtain the coefficients ai and the basis vectors Φi, and these coefficients and basis vectors are another approximate representation of the input.

They can therefore be used to represent the input I, and this process is also learned automatically. If we add an L1 regularity constraint to the formula above, we get:

Min |I - O| + u*(|a1| + |a2| + ... + |an|)

This method is called Sparse Coding. In plain terms, it represents a signal as a linear combination of a set of basis vectors, and requires that only a few of them be needed to represent the signal. "Sparsity" is defined as having only a few nonzero elements, or only a few elements much larger than zero. Requiring the coefficients ai to be sparse means that, for a set of input vectors, we want as few coefficients as possible to be much larger than zero. There is a reason for choosing sparse components to represent the input data: most sensory data, such as natural images, can be represented as the superposition of a small number of basic elements, which in images can be surfaces or lines. The analogy with the primary visual cortex is also strengthened by this (the human brain has a huge number of neurons, but for a given image or edge only a few neurons fire, while the rest are inhibited).

The sparse coding algorithm is an unsupervised learning method that looks for a set of "overcomplete" basis vectors to represent sample data more efficiently. Techniques such as principal component analysis (PCA) let us conveniently find a set of "complete" basis vectors, but here we want to find a set of "overcomplete" basis vectors to represent the input vectors (that is, the number of basis vectors is larger than the dimension of the input vectors). The advantage of an overcomplete basis is that it can more effectively find the structure and patterns hidden in the input data. However, with an overcomplete basis the coefficients ai are no longer uniquely determined by the input vector. Therefore, the sparse coding algorithm adds a further criterion, "sparsity", to resolve the degeneracy caused by overcompleteness.

For example, at the lowest level of Feature Extraction for images we want to generate Edge Detectors. The work here is to randomly select some small patches from Natural Images and use them to generate a "basis" that can describe them, namely the basis on the right made up of 8*8=64 basis vectors. Then, given a test patch, we can obtain it as a linear combination of the basis vectors according to the formula above, and the sparse matrix is a. In the figure below, a has 64 dimensions, of which only 3 are nonzero, hence "sparse".

You may wonder why the bottom layer acts as an Edge Detector, and what the upper layers are. A simple explanation makes this clear: it is an Edge Detector because Edges in different directions are enough to describe the whole image, so Edges in different directions are naturally the basis of the image. The layer above is the result of combining the basis of the layer below, and the layer above that is in turn a combination of the basis of the layer beneath it (just as we described in Part Four above).

Sparse coding is divided into two parts:

1) Training phase: given a series of sample images [x1, x2, ...], we need to learn a set of basis vectors [Φ1, Φ2, ...], that is, a dictionary.

Sparse coding can be viewed as a variant of the k-means algorithm, and its training process is similar (the idea of the EM algorithm: if the objective function to be optimized contains two variables, such as L(W, B), we can first fix W and adjust B to minimize L, then fix B and adjust W to minimize L, alternating in this way and steadily pushing L toward its minimum).

Training is an iterative process. As described above, we alternately change a and Φ to minimize the objective function, which is the L1-regularized reconstruction error given earlier, summed over all training samples.

Each iteration has two steps:

a) Fix the dictionary Φ[k], then adjust a[k] to minimize the objective function (that is, solve a LASSO problem).

b) Then fix a[k] and adjust Φ[k] to minimize the objective function (that is, solve a convex QP problem).

Iterate until convergence. This yields a set of basis vectors that represents this series of x well, that is, the dictionary.

2) Coding phase: given a new image x and the dictionary obtained above, solve a LASSO problem to obtain the sparse vector a. This sparse vector is a sparse representation of the input vector x.
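The training and coding phases above can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the original: the reconstruction term is written as half the squared Euclidean error, the LASSO problem is solved with ISTA (iterative soft-thresholding), and step b) is simplified to an unconstrained least-squares fit followed by renormalizing the basis vectors instead of a constrained QP. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(x, Phi, lam, n_iter=200):
    """Coding phase: ISTA for min_a 0.5*||x - Phi @ a||^2 + lam*||a||_1 (a LASSO problem)."""
    L = np.linalg.norm(Phi, 2) ** 2              # step size from the largest singular value
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)
        a = soft_threshold(a - grad / L, lam / L)
    return a

def learn_dictionary(X, n_basis, lam, n_outer=10, seed=0):
    """Training phase: alternate between the codes A and the dictionary Phi."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(size=(X.shape[1], n_basis))
    Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm basis vectors (columns)
    for _ in range(n_outer):
        # Step a) dictionary fixed: solve one LASSO problem per sample.
        A = np.array([sparse_code(x, Phi, lam) for x in X])
        # Step b) codes fixed: least-squares fit of A @ Phi.T ~ X, then renormalize.
        Phi_new = np.linalg.lstsq(A, X, rcond=None)[0].T
        norms = np.linalg.norm(Phi_new, axis=0)
        used = norms > 1e-8                      # keep the old vector for unused atoms
        Phi[:, used] = Phi_new[:, used] / norms[used]
    return Phi

# With an orthonormal basis, coding reduces to plain soft-thresholding of x.
a = sparse_code(np.array([3.0, 0.2, -1.0]), np.eye(3), lam=0.5)
print(a)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
Phi = learn_dictionary(X, n_basis=6, lam=0.1)    # overcomplete: 6 basis vectors for 4-D data
codes = np.array([sparse_code(x, Phi, 0.1) for x in X])
print(Phi.shape, codes.shape)
```

The coding phase reuses `sparse_code` unchanged on new inputs, which mirrors the text: once the dictionary is fixed, encoding a new x is just one more LASSO problem.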


9.3, Restricted Boltzmann Machine (RBM)

Suppose we have a bipartite graph in which the nodes within each layer are not connected to each other. One layer is the visible layer, i.e. the input data layer (v), and the other is the hidden layer (h). If we assume that all nodes are random binary variables (taking only the value 0 or 1) and that the joint probability distribution p(v,h) follows a Boltzmann distribution, we call this model a Restricted Boltzmann Machine (RBM).

Let us see why it is a Deep Learning method. First, because the model is a bipartite graph, all hidden nodes are conditionally independent given v (because there are no connections between them), i.e. p(h|v) = p(h1|v)...p(hn|v). Likewise, given the hidden layer h, all visible nodes are conditionally independent. And because all v and h follow a Boltzmann distribution, when v is input we can obtain the hidden layer h through p(h|v), and once we have h we can obtain the visible layer again through p(v|h). We adjust the parameters so that the visible layer v1 obtained from the hidden layer is the same as the original visible layer v. If it is, the hidden layer is another representation of the visible layer, so the hidden layer can serve as features of the visible-layer input data, which makes this a Deep Learning method.

How is it trained? That is, how are the weights between the visible nodes and the hidden nodes determined? We need some mathematical analysis, that is, a model.

The energy of a joint configuration can be expressed as:

The joint probability distribution of a configuration is determined by the Boltzmann distribution (and the energy of that configuration):

Because the hidden nodes are conditionally independent (there are no connections between them), we have:

We can then fairly easily (by factorizing the expression above) obtain the probability that the j-th hidden node is 1 or 0 given the visible layer v:

Similarly, given the hidden layer h, the probability that the i-th visible node is 1 or 0 is also easy to obtain:

Given a set of independent and identically distributed samples D = {v(1), v(2), ..., v(N)}, we need to learn the parameters θ = {W, a, b}.

We maximize the following log-likelihood function (maximum likelihood estimation: for a given probability model, we choose the parameters that make the probability of the currently observed samples as large as possible):

That is, by taking the derivative of the log-likelihood function we can obtain the parameters W at which L is maximal.
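For reference, the standard forms of these quantities for a binary RBM are collected below. The assignment of a to the visible biases and b to the hidden biases is an assumption made here (the text only names the parameter set θ = {W, a, b}), and σ denotes the logistic sigmoid.

```latex
\begin{aligned}
E(v,h;\theta) &= -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i W_{ij} h_j \\
p(v,h;\theta) &= \frac{\exp\bigl(-E(v,h;\theta)\bigr)}{Z(\theta)}, \qquad
Z(\theta) = \sum_{v,h} \exp\bigl(-E(v,h;\theta)\bigr) \\
p(h \mid v) &= \prod_j p(h_j \mid v), \qquad
p(h_j = 1 \mid v) = \sigma\Bigl(b_j + \sum_i v_i W_{ij}\Bigr) \\
p(v \mid h) &= \prod_i p(v_i \mid h), \qquad
p(v_i = 1 \mid h) = \sigma\Bigl(a_i + \sum_j W_{ij} h_j\Bigr) \\
L(\theta) &= \sum_{n=1}^{N} \log p\bigl(v^{(n)};\theta\bigr), \qquad
p(v;\theta) = \sum_h p(v,h;\theta) \\
\frac{\partial \log p(v;\theta)}{\partial W_{ij}} &=
\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}
\end{aligned}
```

The last line is the usual form of the gradient: the first expectation is taken with v clamped to the data, the second under the model's own distribution, which is the term that is expensive to compute exactly.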

If we increase the number of hidden layers, we get a Deep Boltzmann Machine (DBM). If we use a Bayesian belief network (i.e. a directed graphical model, still with no connections between nodes within a layer) in the part close to the visible layer, and a Restricted Boltzmann Machine in the part farthest from the visible layer, we get a Deep Belief Net (DBN).

9.4, Deep Belief Networks

DBNs are probabilistic generative models, in contrast to the neural networks of traditional discriminative models. A generative model establishes a joint distribution between observations and labels and evaluates both P(Observation|Label) and P(Label|Observation), whereas a discriminative model evaluates only the latter, P(Label|Observation). Applying the traditional BP algorithm to deep neural networks runs into the following problems:

(1) A labeled sample set must be provided for training;

(2) The learning process is slow;

DBNs are composed of multiple layers of Restricted Boltzmann Machines; a typical network of this type is shown in Figure 3. These networks are "restricted" to one visible layer and one hidden layer, with connections between the layers but none between units within a layer. The hidden units are trained to capture the higher-order correlations in the data that appear at the visible layer.

First, setting aside the top two layers, which form an associative memory, the connections of a DBN are guided and determined by top-down generative weights. RBMs act like building blocks: compared with traditional, deeply layered sigmoid belief networks, they make the connection weights easy to learn.

At the start, the weights of the generative model are obtained by pre-training with an unsupervised greedy layer-by-layer method. Hinton showed this unsupervised greedy layer-by-layer method to be effective; each RBM in it is trained with the procedure he called contrastive divergence.

In this training phase, a vector v is presented at the visible layer and its values are passed to the hidden layer. In the reverse direction, the visible-layer inputs are then chosen stochastically in an attempt to reconstruct the original input signal. Finally, these new visible activations are passed forward again to reconstruct the hidden-layer activations, giving h. (In other words, during training the visible vector is first mapped to the hidden units; the visible units are then reconstructed from the hidden units; and these new visible units are mapped to the hidden units again, giving new hidden units. Performing these repeated backward and forward steps is the familiar Gibbs sampling.) The difference in correlation between the hidden-layer activations and the visible-layer inputs is the main basis for the weight update.
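The up-down-up pass described above corresponds to one step of contrastive divergence (CD-1), which might be sketched as follows for a single binary RBM. The learning rate, layer sizes and toy data are arbitrary choices for illustration; `a` is assumed to hold the visible biases and `b` the hidden biases, and `cd1_step` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(V0, W, a, b, lr, rng):
    """One contrastive-divergence (CD-1) update on a batch V0 of binary rows."""
    ph0 = sigmoid(V0 @ W + b)                          # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample the hidden states
    pv1 = sigmoid(h0 @ W.T + a)                        # p(v = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)   # stochastic reconstruction
    ph1 = sigmoid(v1 @ W + b)                          # hidden probabilities again
    n = V0.shape[0]
    # Update: correlations under the data minus correlations under the reconstruction.
    W = W + lr * (V0.T @ ph0 - v1.T @ ph1) / n
    a = a + lr * (V0 - v1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

def reconstruction_error(V, W, a, b):
    h = sigmoid(V @ W + b)
    return float(np.mean((V - sigmoid(h @ W.T + a)) ** 2))

rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 0, 0],
                     [0, 0, 0, 0, 1, 1]], dtype=float)
V = np.repeat(patterns, 10, axis=0)                    # toy binary training data
W = rng.normal(0.0, 0.01, (6, 3))
a = np.zeros(6)
b = np.zeros(3)
err_before = reconstruction_error(V, W, a, b)
for _ in range(1000):
    W, a, b = cd1_step(V, W, a, b, lr=0.1, rng=rng)
err_after = reconstruction_error(V, W, a, b)
print(err_before, err_after)
```

The reconstruction error is printed only as a rough progress indicator; it is not the quantity that contrastive divergence optimizes.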

Training time is significantly reduced, because a single step is enough to approximate maximum likelihood learning. Each layer added to the network improves the log probability of the training data, which we can understand as getting closer and closer to the true representation of the energy. This meaningful extension, together with the use of unlabeled data, is a decisive factor in any deep learning application.

In the top two layers the weights are tied together, so that the output of the lower layers provides a reference cue or association to the top layer, and the top layer links it to its memory contents. What we care about most in the end is discriminative performance, for example in classification tasks.

After pre-training, a DBN can use labeled data and the BP algorithm to tune its discriminative performance. Here a set of labels is attached to the top layer (extending the associative memory), and the classification surface of the network is obtained through the learned bottom-up recognition weights. The performance is better than that of a network trained purely with the BP algorithm. This has an intuitive explanation: the BP algorithm for DBNs only needs to perform a local search of the weight parameter space, so compared with a feed-forward neural network, training is faster and convergence takes less time.

The flexibility of DBNs makes them relatively easy to extend. One extension is Convolutional Deep Belief Networks (CDBNs). DBNs do not take the 2-D structure of images into account, because the input is simply an image matrix flattened into a one-dimensional vector. CDBNs address this problem: they use the spatial relationship between neighboring pixels and, through a model called convolutional RBMs, achieve transformation invariance in the generative model, and they scale easily to high-dimensional images. DBNs do not explicitly address learning the temporal relationships between observed variables, although there is already research in this area, such as stacked temporal RBMs and, generalizing these, the so-called temporal convolution machines for sequence learning. The application of such sequence learning opens an exciting future research direction for speech signal processing.

At present, research related to DBNs includes stacked autoencoders, which replace the RBMs in traditional DBNs with stacked autoencoders. This makes it possible to train deep multi-layer neural network architectures with the same rules, but without the strict requirements on the parameterization of the layers. Unlike DBNs, autoencoders use a discriminative model, so it is hard for this structure to sample the input space, which makes it harder for the network to capture its internal representation. Denoising autoencoders, however, avoid this problem well and outperform traditional DBNs. They obtain good generalization performance by adding random corruption during training and stacking. Training a single denoising autoencoder is the same as training an RBM as a generative model.

| X. Summary and Outlook

1) Summary of deep learning

Deep learning is about algorithms that automatically learn multi-layer (complex) representations of the latent (hidden) distribution of the data to be modeled. In other words, deep learning algorithms automatically extract the low-level or high-level features needed for classification. A high-level feature is one that depends hierarchically on other features. For example, in machine vision a deep learning algorithm learns a low-level representation from the raw image, such as edge detectors or wavelet filters, then builds further representations on top of these low-level ones, such as their linear or nonlinear combinations, and repeats the process until it obtains a high-level representation.

Deep learning can obtain features that represent the data better. At the same time, because the model has many layers and parameters, it has enough capacity to represent large-scale data. So for problems such as images and speech, where the features are not obvious (they must be designed by hand and many have no intuitive physical meaning), it can achieve better results on large-scale training data. In addition, from the pattern recognition viewpoint of features and classifiers, the deep learning framework combines the features and the classifier into a single framework and uses data to learn the features, which reduces the enormous workload of hand-designing features (currently the area where engineers in industry spend the most effort). So not only can the results be better, it is also more convenient to use in many ways. It is a framework well worth attention, and everyone working in ML should take the time to understand it.

Of course, deep learning itself is not perfect, nor is it a silver bullet for every ML problem in the world, and it should not be inflated into something all-powerful.

2) The future of deep learning

A great deal of work in deep learning still remains to be done. The current focus is on borrowing methods from the field of machine learning that can be used in deep learning, especially from dimensionality reduction. For example, one line of current work is sparse coding, which uses compressed sensing theory to reduce the dimensionality of high-dimensional data, so that a vector with very few elements can accurately represent the original high-dimensional signal. Another example is semi-supervised manifold learning, which measures the similarity of training samples and projects this similarity of the high-dimensional data into a low-dimensional space. Another encouraging direction is evolutionary programming approaches, which can perform conceptual adaptive learning and change the core architecture while minimizing engineering effort.

Deep learning still has many core problems to solve:

(1) For a particular framework, for how many input dimensions can it perform well (for images, possibly millions of dimensions)?

(2) Which architecture is effective for capturing short-term or long-term temporal dependencies?

(4) What sound mechanisms can be used to enhance a given deep learning architecture, to improve its robustness and its invariance to distortion and missing data?

(5) On the modeling side, are there other deep model learning algorithms that are more effective and theoretically grounded?

Exploring new feature extraction models is worth in-depth study. Effective parallelizable training algorithms are another direction worth investigating. Current stochastic gradient optimization algorithms based on mini-batches are hard to train in parallel across multiple computers. The usual approach is to use graphics processing units to accelerate learning. However, a single machine's GPU is not suited to large-scale data recognition or datasets for similar tasks. In terms of extending deep learning applications, how to make full and sensible use of deep learning to enhance the performance of traditional learning algorithms remains a research focus in many fields.

This article is reproduced from Reading technology, an artificial intelligence platform focused on deep learning and embedded vision. To reprint it, please contact the original author.
