This article is the notes for Andrew Ng's deep learning course.
Author: Huang Haiguang
Main writers: Huang Haiguang, Lin Xingmu (all manuscripts of Course 4, the first two weeks of Course 5, and the first three sections of Week 3), Zhu Yansen (all manuscripts of Course 3), He Zhiyao (Week 3 of Course 5), Wang Xiang, Hu Hanwen, Yu Xiao, Zheng Hao, Li Huaisong, Zhu Yuepeng, Chen Weihe, Cao Yue, Lu Haoxiang, Qiu Muchen, Tang Tianze, Zhang Hao, Chen Zhihao, You Ren, Ze Lin, Shen Weichen, Jia Hongshun, Shi Chao, Chen Zhe, Zhao Yifan, Hu Xiaoyang, Duan Xi, Yu Chong, Zhang Xinqian
Participating editors: Huang Haiguang, Chen Kangkai, Shi Qinglu, Zhong Boyan, Xiang Wei, Yan Fenglong, Liu Cheng, He Zhiyao, Duan Xi, Chen Yao, Lin Jiayong, Wang Xiang, Xie Shichen, Jiang Peng
Note: The notes, assignments (including data and original assignment files), and videos can all be downloaded from GitHub.
I will post the course notes successively on the public account "Machine Learning Beginners", so stay tuned.
Course 1: Neural Networks and Deep Learning
Week 1: Introduction to Deep Learning
The first video mainly discusses what deep learning is and what it can do. The following are Andrew Ng's own words:
Deep learning has already transformed traditional Internet services such as web search and advertising. But it has also enabled entirely new products and companies that help people in many ways, from better health care, to reading X-ray images, to personalized education, precision agriculture, and even self-driving cars. If you want to learn these deep learning tools and apply them to build such exciting applications, this course will help you do that. When you complete this Coursera specialization, you will be able to continue down the road of deep learning with more confidence. In the next decade, I think all of us have an opportunity to build an amazing world and society. This is the power of AI (artificial intelligence), and I hope you will play an important role in creating that AI-powered society.
I think AI is the new electricity. About a hundred years ago, the electrification of our society transformed every major industry, from transportation to manufacturing, health care, communications, and more. Today I see AI bringing about an equally dramatic transformation, and clearly, among the various branches of AI, the one advancing fastest is deep learning. So deep learning is now a highly sought-after skill in the technology world.
Through this course, and the courses that follow it, you will acquire and master those skills.
Here is what you will learn:
In this Coursera series, also called a specialization, the first course (Neural Networks and Deep Learning) teaches you the foundations of neural networks and deep learning. This first course lasts four weeks; each course in the specialization lasts two to four weeks.
In this first course, you will learn how to build neural networks (including a deep neural network) and how to train them on data. At the end of this course, you will use a deep neural network to recognize cats.
For whatever reason, the first course uses cat recognition as its running object-recognition example.
Next, the second course lasts three weeks and covers the practical aspects of deep learning: how to build a neural network rigorously and how to make it actually perform well. You will learn about hyperparameter tuning, regularization, diagnosing bias and variance, and advanced optimization algorithms such as Momentum and Adam, rather than treating network building as a kind of black magic. The second course takes only three weeks of study time.
In the third course, which lasts two weeks, you will learn how to structure your machine learning project. It turns out that the strategy for building a machine learning system has changed in the era of deep learning.
For example, the way you split your data into a training set, a development set (also called a hold-out cross-validation set), and a test set has changed in the deep learning era.
So what is the best practice?
Having your training set and test set come from different distributions has a big effect in deep learning, so what should you do about it?
If you have heard of end-to-end deep learning, you will also learn more about it in the third course and come to understand whether you should use it. The material in the third course is relatively unique: I will share with you lessons learned from building and improving many deep learning systems across today's hot fields. This material is rarely taught in most universities' deep learning classes, and I think it will help you make your deep learning systems work better.
In the fourth course, we cover convolutional neural networks (CNNs), which are most often applied to images. You will learn how to build such a model in that course.
Finally, in the fifth course, you will learn about sequence models and how to apply them to natural language processing and other problems.
Sequence models include recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). You will learn what these terms mean in Course 5 and gain the ability to apply them to natural language processing (NLP) problems.
In short, in Course 5 you will learn these models and be able to apply them to sequence data. Natural language, for example, is a sequence of words. You will also see how these models can be applied to problems such as speech recognition and music generation.
Through these courses, then, you will learn the tools of deep learning, be able to use them to do some wonderful things, and advance your career.
1.2 What is a neural network? (What is a Neural Network)
We often use the term deep learning to refer to the process of training neural networks, sometimes very large-scale ones. So what exactly is a neural network? In this video, I will explain some of the basic intuitions.
Let's start with an example of housing price prediction.
Suppose you have a data set containing information about six houses: you know the size of each house in square feet or square meters, and you know its price. You want to fit a function that predicts the price of a house from its size.
If you are familiar with linear regression, you might say: "Well, let's fit a straight line to these data." You might then get a straight line like this.
But here is something a bit strange: you may have noticed that prices can never be negative. So, instead of a straight line that could make the price negative, we bend the line so that it ends at zero. The thick blue line is ultimately your function for predicting price from size: part of it is zero, and the straight part fits the data well. You can think of this function as a fit to house prices.
As a neural network, this is just about the simplest one possible. We take the size of the house as the input to the network (call it x), pass it through a node (a small circle), and output the price (which we denote y). That small circle is a single neuron, and the network implements the function shown on the left.
In the neural network literature, you will often see this function, which starts out at zero and then becomes a straight line. It is called the ReLU activation function, short for Rectified Linear Unit. "Rectified" means taking the maximum of zero and the input, which is why you get a function of this shape.
Don't worry if you don't fully understand the ReLU function yet; you will see it again later in this course.
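As a minimal illustration (my own NumPy sketch, not part of the course materials), ReLU simply outputs the maximum of zero and its input:

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: 0 for negative inputs, z otherwise."""
    return np.maximum(0, z)

# The housing-price intuition: a straight line clipped at zero
sizes = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(relu(sizes))  # [0. 0. 0. 1. 2.]
```

Because it works elementwise on arrays, the same function applies whether the input is a single house size or a whole batch of them.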
If that is a single-neuron network, then a larger neural network, whatever its scale, is formed by stacking many such individual neurons together. If you think of each neuron as a separate Lego brick, you build a larger neural network by stacking bricks.
Let's look at an example where we use more than just the size of a house to predict its price. Suppose you have other features of the house, such as the number of bedrooms. One important factor might be family size: can this house accommodate a family of three, four, or five people? It really is the size of the house and the number of bedrooms that determine whether a house can fit the size of your family.
On a different note, the zip code (postal code) might serve as a feature telling you about walkability: is the neighborhood highly walkable, can you walk to the grocery store or to school, or do you need to drive? Some people prefer to live in highly walkable areas. The zip code, along with the wealth of the area (in the United States, at least), may also indicate how good the nearby schools are.
Each of the small circles drawn in the diagram could be a ReLU, a rectified linear unit, or some other slightly non-linear function. From the size of the house and the number of bedrooms, one can estimate family size; from the zip code, walkability; and from zip code and wealth, school quality. Finally, you might reason that these are the factors that determine how much people are willing to pay.
For a given house, all of these are relevant: in this scenario, family size, walkability, and school quality can all help you predict its price. In this example, x is all four of these inputs and y is the price you are trying to predict. By stacking together these single neurons, we get a slightly larger neural network.
Part of the magic of neural networks is that when you implement one, all you have to do is supply the input x and you get the output y; the network figures out everything in between on its own from however many training examples you give it. So what you actually build is this: a neural network with four inputs, where the input features might be the size, the number of bedrooms, the zip code, and the wealth of the neighborhood. Given these input features, the job of the network is to predict the price y. Notice also these circles, which are called hidden units; in a neural network, each of them takes all four input features as its input. That is, we do not say that the first node represents family size and that family size depends only on the first two features; instead, the neural network decides for itself what this node should represent, and is given all four inputs with which to compute it. We therefore say that the input layer and the middle layer are densely connected.
Remarkably, given enough training examples of x and the corresponding y, neural networks are extremely good at computing functions that accurately map x to y.
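To make the shape of such a network concrete, here is a minimal forward-pass sketch of my own (the feature values and weights are made up for illustration; a real network would learn the weights from (x, y) training pairs):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hypothetical input features: size, bedrooms, zip-code score, wealth score
x = np.array([2100.0, 3.0, 0.8, 0.6])

# Randomly initialized weights, purely for illustration
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4)) * 0.01   # hidden layer: 3 units, each sees all 4 inputs
b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3)) * 0.01   # output layer: one price estimate
b2 = np.zeros(1)

# Forward pass: the hidden units are not hand-assigned meanings
# like "family size" -- training would decide what they compute
h = relu(W1 @ x + b1)
y_hat = W2 @ h + b2
print(y_hat.shape)  # (1,)
```

The key point mirrored from the text: every hidden unit receives all four inputs (the layers are densely connected), and only x and y are ever supplied by you.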
This is a basic neural network. You will find that neural networks are most effective and powerful in supervised learning settings, that is, whenever you feed in an input x and try to map it to an output y, just as we saw in the housing price prediction example.
In the next video, let's look at more examples of supervised learning, some of which will convince you that neural networks can be very useful and that you can use them in practice too.
1.3 Supervised Learning with Neural Networks
There are many types of neural networks, and some work better than others for particular uses. But it turns out that almost all the economic value created by neural networks so far has come from one type of machine learning, called supervised learning. Let's look at an example.
In supervised learning, you have an input x and you want to learn a function mapping it to an output y. In the housing price prediction example we just saw, you input some features of a house and try to output or estimate the price y. Here are some other areas in which neural networks have been applied very effectively.
Probably the most lucrative application of deep learning today is online advertising, which is maybe not the most inspiring, but certainly profitable. By inputting information about an advertisement, together with information about the user, the website decides whether to show you that ad.
Neural networks have become very good at predicting whether you will click on an ad. Showing users the ads they are most likely to click on has been an incredibly lucrative application of neural networks at many companies, because this change in click behavior directly affects the revenue of some of the largest online advertising companies.
Computer vision has also made great strides in the past few years, mostly thanks to deep learning. You can input an image and have a network output an index, say from 1 to 1000, indicating which of 1000 different object classes the photo shows; this can be used, for example, to tag photos.
The recent advances of deep learning in speech recognition are also very exciting: you can now feed an audio clip into a neural network and have it output a transcript. Machine translation has likewise made great progress thanks to deep learning: you can have a neural network take an English sentence as input and output a sentence in Chinese.
In autonomous driving, you can input an image of what is in front of the car, together with some information from radar, and train a neural network to output the positions of the other cars on the road; the neural network is thus a key component of the autonomous driving system.
So much of the value created by deep learning has come from cleverly choosing what x and y should be for your particular problem, and then fitting this supervised learning component into a larger system such as a self-driving car. It also turns out that slightly different types of neural networks are useful for different applications. For example, in the real estate application we discussed in the previous video, we used a fairly universal standard neural network architecture.
For real estate and online advertising, a relatively standard neural network, as we saw earlier, works well. For image applications, we often use convolutional neural networks (CNNs). For sequence data such as audio, there is a temporal component: audio unfolds over time, so it is most naturally represented as a one-dimensional time series (you will see both English terms, one-dimensional time series and temporal sequence). For sequence data, we often use RNNs, recurrent neural networks. Language, whether English or Chinese, arrives as alphabets or words one element at a time, so language is also most naturally represented as sequence data, and more complex versions of RNNs are often used in these applications.
For more complex applications such as autonomous driving, where you have an image that suggests a CNN structure but the radar information is something quite different, you may end up with a more customized, more complex hybrid neural network architecture. To be more concrete about what the standard NN, CNN, and RNN architectures are: you may have seen pictures like the following in the literature. This is a standard neural network.
You may also have seen such a picture, this is an example of a convolutional neural network.
We will cover the principles and implementation behind this picture in a later course. Convolutional networks (CNNs) are typically used for image data.
You may also see pictures like this, and you will learn how to implement it in future courses.
Recurrent neural networks (RNNs) are very well suited to one-dimensional sequence data that has a temporal component.
You may also have heard machine learning described as applying to structured and unstructured data. Structured data means databases of data. For example, in housing price prediction you might have a database whose columns tell you the size and the number of bedrooms; that is structured data. Or, for predicting whether a user will click on an ad, you might have information about the user, such as age, together with information about the ad, and the label you are trying to predict. That too is structured data, meaning that each feature, such as the size of a house, the number of bedrooms, or the age of a user, has a well-defined meaning.
In contrast, unstructured data refers to content such as raw audio, or images or text that you want to recognize. Here the features might be the pixel values of an image or the individual words in a piece of text.
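As a rough illustration of the distinction (my own example, with invented values, not from the course):

```python
import numpy as np

# Structured data: every column is a feature with a well-defined meaning
structured = {
    "size_sqft": [2100, 1600, 2400],
    "bedrooms":  [3, 2, 4],
    "price":     [400_000, 330_000, 540_000],
}

# Unstructured data: a grayscale "image" is just an array of raw pixel
# intensities; no individual entry has a predefined meaning like "bedrooms"
image = np.random.rand(64, 64)

print(len(structured["size_sqft"]), image.shape)  # 3 (64, 64)
```

The features of the structured table can be read off directly; for the image, a model must learn which patterns of pixels matter.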
Historically, unstructured data has been much harder for computers to make sense of than structured data. Humans, by contrast, have evolved to be very good at understanding audio signals and images; text is a more recent invention, but people are remarkably good at interpreting it as well.
One of the most exciting things about the rise of neural networks is that, thanks to deep learning, computers are now much better at interpreting unstructured data than they were even a few years ago. This has created opportunities for many new and exciting applications in speech recognition, image recognition, and natural language processing, far more than were possible even two or three years ago. Because people have an innate ability to understand unstructured data, you tend to hear about neural network successes on unstructured data in the media: when a neural network recognizes a cat, that is genuinely cool, and we all know what it means.
But it also turns out that much of the short-term economic value created by neural networks has come from structured data: better advertising systems, more profitable recommendations, and a better ability of many companies to make accurate predictions from the huge databases they hold.
In this course, many of the techniques we discuss will apply to both structured and unstructured data. For the purposes of explaining the algorithms, we will draw slightly more on examples that use unstructured data, but as you think about applying neural networks in your own team, I hope you will find that they are useful for both kinds of data.
Neural networks have transformed supervised learning and are creating enormous economic value. It turns out, however, that the basic technical ideas behind neural networks have been around for decades. So why are they only now taking off and working so well? In the next video, we will discuss why it is only recently that neural networks have become a powerful tool that you can use.
1.4 Why is deep learning taking off? (Why is Deep Learning taking off?)
This video discusses the main factors behind the rise of deep learning: data scale, computation, and algorithmic innovation.
The basic technical ideas behind deep learning and neural networks have been around for decades. Why are they suddenly taking off now? This lesson covers the main drivers of deep learning's rise, which will help you spot the best opportunities to apply these tools within your own organization.
Over the past few years, many people have asked me why deep learning suddenly works so well. When I answer this question, I usually draw them a graph: on the horizontal axis I plot the amount of data available for a task, and on the vertical axis I plot the performance of a machine learning algorithm, for example the accuracy of a spam filter or of ad-click prediction, or the accuracy with which a self-driving car's neural network judges the positions of other cars. If you plot the performance of a traditional machine learning algorithm as a function of the amount of data, you get a curve like the one in the figure: performance improves as you add more data at first, but after a while it plateaus, as if the algorithms do not know what to do with truly huge amounts of data. And for many of the problems our society faced over the past decades, we only had relatively small amounts of data.
Thanks to the digitization of society, the amount of data available today is enormous. We spend so much of our time in digital realms, on websites, in mobile apps, and using other digital services, all of which create data cheaply. Cameras built into mobile phones, accelerometers, all kinds of sensors, and the Internet of Things have given us more and more data as well. Over just the past 20 years, for many applications we have accumulated far more data than traditional machine learning algorithms are able to exploit.
What neural networks show is this: if you train a small neural network, its performance may look like the yellow curve in the figure below; a somewhat larger, medium-sized network performs better on the same data (the blue curve); and a very large neural network keeps getting better and better (the green curve). Two things follow. To reach the highest levels of performance, you need two ingredients: first, a neural network large enough to take advantage of the huge amount of data, and second, you need to be far out on the horizontal axis, that is, you need a lot of data. We therefore often say that scale has been driving progress in deep learning, where "scale" means both the size of the neural network, one with many hidden units, many parameters, and many connections, and the scale of the data. In fact, the most reliable way to improve performance today is often either to train a bigger network or to get more data. This only works up to a point, because eventually you run out of data, or the network becomes so large that it takes too long to train, but simply pushing on scale has carried us a long way in the world of deep learning. To make this diagram technically a bit more precise: the quantity on the horizontal axis is the amount of labeled data, meaning training examples that contain both the input x and the label y. Let me also introduce a little notation: we use the lowercase letter m to denote the size of the training set, that is, the number of training examples. So m is the label on the horizontal axis of this figure.
In the regime of small training sets, the relative ranking of algorithms is actually not well defined: if you do not have a large training set, the result often depends on your skill at feature engineering, which determines the final performance. Someone hand-engineering features well for an SVM (support vector machine) might outperform someone training a large neural network on that small training set. So in the left region of this figure, the ordering of the algorithms is not well defined, and final performance depends more on your skill at engineering features and on details of how the algorithms are handled. It is only in the big-data regime, far to the right where m is very large, that we consistently see large neural networks dominating the other approaches. So if one of your friends asks you why neural networks are so popular, I encourage you to draw this graph for them.
So in the early days of the rise of deep learning, it was the scale of data and the scale of computation, our ability to train very large neural networks on CPUs or GPUs, that enabled our huge progress. But increasingly, especially in the last several years, we have also witnessed tremendous algorithmic innovation, much of it aimed at making neural networks run faster.
As a concrete example, one huge breakthrough in neural networks was the switch from the sigmoid function to the ReLU function, which we mentioned earlier in the course.
Don't worry if you cannot follow all the details. One of the problems with using the sigmoid function in machine learning is that in the regions where the function flattens out, its gradient is nearly zero, so learning becomes very slow: when you run gradient descent and the gradient is nearly zero, the parameters update very slowly. By changing the activation function from the sigmoid to the ReLU function (the rectified linear unit), whose gradient is 1 for all positive values of the input, the gradient is much less likely to gradually shrink to zero. (The slope of the flat part on the left, for negative inputs, is zero.) Just by switching from the sigmoid function to the ReLU function, the gradient descent algorithm can be made to run much faster. This is an example of a relatively simple algorithmic innovation, but ultimately its impact was on computation: it allowed us to train much bigger networks in a reasonable amount of time. Beyond letting us train large networks on all our data, there is another reason fast computation matters: the process of training a neural network is highly iterative and often guided by intuition. You have an idea for a neural network architecture, you write code to implement it, you run an experiment that tells you how well the network does, and based on the result you go back, modify some details of your network, and repeat the loop. When your neural network takes a long time to train, each trip around this loop takes a long time, and that makes a big difference.
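The gradient contrast described above can be checked numerically. This is a small NumPy sketch of my own, not course code: the sigmoid's slope vanishes for large |z|, while ReLU's slope stays at 1 for any positive input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # slope approaches 0 as |z| grows

def relu_grad(z):
    return (z > 0).astype(float)  # slope is exactly 1 for z > 0, else 0

z = np.array([-10.0, -1.0, 1.0, 10.0])
print(sigmoid_grad(z))  # tiny values at the extremes -> slow gradient descent
print(relu_grad(z))     # [0. 0. 1. 1.] -> no vanishing slope for positive z
```

At z = 10 the sigmoid's slope is already below 0.0001, which is why parameter updates crawl in those regions, while ReLU's slope on the positive side never shrinks.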
Your productivity in building an effective neural network is much higher when you can try an idea and see the result in ten minutes, or within a day, than when training your neural network takes a month, which does sometimes happen. Getting a result in ten minutes or a day lets you try many more ideas, and makes it much more likely that you will discover a network that works well for your application. So faster computation has really helped speed up the rate at which you can get experimental results, and it has helped both neural network practitioners and researchers iterate much faster and improve their ideas much faster. All of this has made the deep learning research community enormously prosperous, with incredible inventions of new algorithms and uninterrupted progress by its pioneers. These forces keep making deep learning grow.
The good news is that these forces are still at work and will keep making deep learning better and better. Our society is still producing more and more digital data; specialized hardware for computation, such as GPUs, and faster networking of many kinds of hardware keep improving, which makes me confident that our ability to build very large neural networks, and our raw computing power, will keep increasing; and the research community continues to produce extraordinary innovation at the algorithmic frontier. So we can be optimistic in answering that deep learning will keep getting better for many years to come.
1.5 About this course
You are approaching the end of the first week of the first course of this specialization. First, a quick overview of what comes next:
As mentioned in the first video, this specialization contains five courses, and we are currently in the first: Neural Networks and Deep Learning. In this course, you will learn the most important foundations. By the end of the first course, you will know how to build a deep neural network and make it work.
Here are some details about the first course, which has four weeks of study materials:
Week 1: An introduction to deep learning. At the end of each week, there are ten multiple-choice questions to test your understanding of the material;
Week 2: The basics of neural network programming: understanding the structure of a neural network, implementing the algorithms step by step, and thinking about how to make them efficient. Starting in Week 2, there are programming exercises (a paid feature) in which you implement the algorithms yourself;
Week 3: Having learned the framework of neural network programming, you will be able to write a neural network with one hidden layer, so you will learn all the key concepts needed to get a neural network working;
Week 4: Build a deep neural network.
This video is about to end. After it, I hope you will check your understanding with the ten multiple-choice questions on the course website. Don't worry if you haven't seen some of the material before; you can keep trying until you get everything right and understand all the concepts.
1.6 Course Resources
I hope you enjoy this course. To help you complete it, this section lists some course resources.
First, if you have any questions, want to discuss problems with other students or with the teaching staff (including me), or want to report a bug, the forum is the best place to go. I and the rest of the teaching staff monitor the forum regularly, and it is also a good place to get answers to your questions from classmates. If you would like to answer your classmates' questions, you can reach the forum from the course home page:
Click the forum tab to enter the forum.
The forum is the best way to ask questions, but for various reasons you may want to contact us directly; you can send email to the address below. We will do our best to read every email and try to address common questions, although given the volume of mail it is not always possible to reply to each one quickly. In addition, some companies want to provide deep learning training for their employees: if you want to hire experts to train hundreds or more of your employees in deep learning, please contact us using your corporate email address. We are also in the early stages of working with universities: if you are a university leader or administrator and would like to offer a deep learning course at your school, please contact us via your university email address. The email address is below. Good luck!
Contact us: firstname.lastname@example.org
Deep learning courses: mooc.study.163.com/university/