MATLAB uses Bayesian-optimized deep learning: convolutional neural network (CNN)

Original link: tecdat.cn/?p=7954

Original source: Tuoduan Data Tribe Official Account

 

This example shows how to apply Bayesian optimization to deep learning and how to find the best network hyperparameters and training options for convolutional neural networks.

To train a deep neural network, you must specify the network architecture and the options of the training algorithm. Selecting and tuning these hyperparameters can be difficult and time-consuming. Bayesian optimization is an algorithm well suited to optimizing the hyperparameters of classification and regression models.

 

Prepare data

Download the CIFAR-10 data set [1]. The data set contains 60,000 images, each with a size of 32 x 32 and three color channels (RGB). The size of the entire data set is 175 MB. 

Load the CIFAR-10 data set as training images and labels, and as test images and labels. To enable network validation, set aside 5000 of the test images for validation.

[XTrain,YTrain,XTest,YTest] = loadCIFARData(datadir);

idx = randperm(numel(YTest),5000);
XValidation = XTest(:,:,:,idx);
XTest(:,:,:,idx) = [];
YValidation = YTest(idx);
YTest(idx) = [];

You can use the following code to display a sample of the training images.

figure;
idx = randperm(numel(YTrain),20);
for i = 1:numel(idx)
    subplot(4,5,i);
    imshow(XTrain(:,:,:,idx(i)));
end

Select variables to optimize

Select the variables to optimize using Bayesian optimization, and specify the ranges to search. Also specify whether each variable is an integer and whether to search the interval in logarithmic space. Optimize the following variables:

  • Network section depth. This parameter controls the depth of the network. The network has three sections, each with SectionDepth identical convolutional layers, so the total number of convolutional layers is 3*SectionDepth. The objective function later in the script makes the number of convolution filters in each layer proportional to 1/sqrt(SectionDepth). As a result, the number of parameters and the amount of computation required per iteration are roughly the same for different section depths.
  • Initial learning rate. The best learning rate depends on your data as well as the network you are training.
  • Stochastic gradient descent momentum.
  • L2 regularization strength.
optimVars = [
    optimizableVariable('SectionDepth',[1 3],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')
    optimizableVariable('Momentum',[0.8 0.98])
    optimizableVariable('L2Regularization',[1e-10 1e-2],'Transform','log')];

Perform Bayesian optimization

Use training and validation data as input to create an objective function for the Bayesian optimizer. The objective function trains the convolutional neural network and returns the classification error on the validation set. 

ObjFcn = makeObjFcn(XTrain,YTrain,XValidation,YValidation);

Perform Bayesian optimization by minimizing the classification error on the validation set. To take full advantage of Bayesian optimization, you should perform at least 30 objective function evaluations.
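A minimal sketch of the bayesopt call, assuming a 14-hour time budget (consistent with the MaxTime of 50400 seconds reported in the output below):

BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxTime',14*60*60, ...               % stop after 14 hours (assumption)
    'IsObjectiveDeterministic',false, ... % network training is stochastic
    'UseParallel',false);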

After each network finishes training, bayesopt prints the result to the command window. The objective function saves the trained network to disk and returns the file name to bayesopt, which collects all the file names in BayesObject.UserDataTrace.

|==================================================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar | SectionDepth | InitialLearn- | Momentum | L2Regulariza- |
|      | result |           | runtime   | (observed) | (estim.)  |              | Rate          |          | tion          |
|==================================================================================================================================|
|    1 | Best   |      0.19 |      2201 |       0.19 |      0.19 |            3 |      0.012114 |   0.8354 |     0.0010624 |
|    2 | Accept |    0.3224 |    1734.1 |       0.19 |   0.19636 |            1 |      0.066481 |  0.88231 |     0.0026626 |
|    3 | Accept |    0.2076 |    1688.7 |       0.19 |   0.19374 |            2 |      0.022346 |  0.91149 |     8.242e-10 |
|    4 | Accept |    0.1908 |    2167.2 |       0.19 |    0.1904 |            3 |       0.97586 |  0.83613 |    4.5143e-08 |
|    5 | Accept |    0.1972 |    2157.4 |       0.19 |   0.19274 |            3 |       0.21193 |  0.97995 |    1.4691e-05 |
|    6 | Accept |    0.2594 |    2152.8 |       0.19 |      0.19 |            3 |       0.98723 |  0.97931 |    2.4847e-10 |
|    7 | Best   |    0.1882 |    2257.5 |     0.1882 |   0.18819 |            3 |        0.1722 |   0.8019 |    4.2149e-06 |
|    8 | Accept |    0.8116 |    1989.7 |     0.1882 |   0.18818 |            3 |       0.42085 |  0.95355 |     0.0092026 |
|    9 | Accept |    0.1986 |      1836 |     0.1882 |   0.18821 |            2 |      0.030291 |  0.94711 |    2.5062e-05 |
|   10 | Accept |    0.2146 |    1909.4 |     0.1882 |   0.18816 |            2 |      0.013379 |   0.8785 |    7.6354e-09 |
|   11 | Accept |    0.2194 |      1562 |     0.1882 |   0.18815 |            1 |       0.14682 |  0.86272 |    8.6242e-09 |
|   12 | Accept |    0.2246 |    1591.2 |     0.1882 |   0.18813 |            1 |       0.70438 |  0.82809 |    1.0102e-06 |
|   13 | Accept |    0.2648 |    1621.8 |     0.1882 |   0.18824 |            1 |      0.010109 |  0.89989 |    1.0481e-10 |
|   14 | Accept |    0.2222 |      1562 |     0.1882 |   0.18812 |            1 |       0.11058 |  0.97432 |    2.4101e-07 |
|   15 | Accept |    0.2364 |    1625.7 |     0.1882 |   0.18813 |            1 |      0.079381 |   0.8292 |    2.6722e-05 |
|   16 | Accept |      0.26 |    1706.2 |     0.1882 |   0.18815 |            1 |      0.010041 |  0.96229 |    1.1066e-05 |
|   17 | Accept |    0.1986 |    2188.3 |     0.1882 |   0.18635 |            3 |       0.35949 |  0.97824 |     3.153e-07 |
|   18 | Accept |    0.1938 |    2169.6 |     0.1882 |   0.18817 |            3 |      0.024365 |  0.88464 |    0.00024507 |
|   19 | Accept |    0.3588 |    1713.7 |     0.1882 |   0.18216 |            1 |      0.010177 |  0.89427 |     0.0090342 |
|   20 | Accept |    0.2224 |    1721.4 |     0.1882 |   0.18193 |            1 |       0.09804 |  0.97947 |    1.0727e-10 |
|   21 | Accept |    0.1904 |    2184.7 |     0.1882 |   0.18498 |            3 |      0.017697 |  0.95057 |    0.00022247 |
|   22 | Accept |    0.1928 |    2184.4 |     0.1882 |   0.18527 |            3 |       0.06813 |   0.9027 |    1.3521e-09 |
|   23 | Accept |    0.1934 |    2183.6 |     0.1882 |    0.1882 |            3 |      0.018269 |  0.90432 |     0.0003573 |
|   24 | Accept |     0.303 |    1707.9 |     0.1882 |   0.18809 |            1 |      0.010157 |  0.88226 |    0.00088737 |
|   25 | Accept |     0.194 |    2189.1 |     0.1882 |   0.18808 |            3 |      0.019354 |  0.94156 |    9.6197e-07 |
|   26 | Accept |    0.2192 |    1752.2 |     0.1882 |   0.18809 |            1 |       0.99324 |  0.91165 |    1.1521e-08 |
|   27 | Accept |    0.1918 |      2185 |     0.1882 |   0.18813 |            3 |       0.05292 |   0.8689 |    1.2449e-05 |
|==================================================================================================================================|

Optimization completed.
MaxTime of 50400 seconds reached.
Total function evaluations: 27
Total elapsed time: 51962.3666 seconds.
Total objective function evaluation time: 51942.8833

Best observed feasible point:
    SectionDepth    InitialLearnRate    Momentum    L2Regularization
    ____________    ________________    ________    ________________
         3               0.1722          0.8019        4.2149e-06

Observed objective function value = 0.1882
Estimated objective function value = 0.18813
Function evaluation time = 2257.4627

Best estimated feasible point (according to models):
    SectionDepth    InitialLearnRate    Momentum    L2Regularization
    ____________    ________________    ________    ________________
         3               0.1722          0.8019        4.2149e-06

Estimated objective function value = 0.18813
Estimated function evaluation time = 2166.2402

Evaluate the final network

Load the best network found in the optimization and its validation accuracy.
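A minimal sketch of this step, assuming the objective function (defined below) saves each trained network under the file name recorded in BayesObject.UserDataTrace:

bestIdx = BayesObject.IndexOfMinimumTrace(end); % iteration with the lowest observed error
fileName = BayesObject.UserDataTrace{bestIdx};  % file saved by the objective function
savedStruct = load(fileName);                   % contains trainedNet and valError
valError = savedStruct.valError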

valError = 0.1882

Predict the labels of the test set and calculate the test error. Treat the classification of each image in the test set as an independent event with a certain probability of success, which means that the number of misclassified images follows a binomial distribution. Use this to calculate the standard error (testErrorSE) and an approximately 95% confidence interval (testError95CI) of the generalization error rate. This method is often called the Wald method.
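A minimal sketch of this computation, assuming savedStruct.trainedNet is the network loaded above:

[YPredicted,probs] = classify(savedStruct.trainedNet,XTest);
testError = mean(YPredicted ~= YTest)

% Wald method: binomial standard error and an approximate 95% interval
NTest = numel(YTest);
testErrorSE = sqrt(testError*(1 - testError)/NTest);
testError95CI = [testError - 1.96*testErrorSE, testError + 1.96*testErrorSE]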

testError = 0.1864

testError95CI =

    0.1756    0.1972

Plot the confusion matrix for the test data. Display the precision and recall of each class by using column and row summaries.

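A minimal sketch using confusionchart, with column and row summaries as described:

figure('Units','normalized','Position',[0.2 0.2 0.4 0.4]);
cm = confusionchart(YTest,YPredicted);
cm.Title = 'Confusion Matrix for Test Data';
cm.ColumnSummary = 'column-normalized'; % precision per predicted class
cm.RowSummary = 'row-normalized';       % recall per true class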

 

You can use the following code to display test images together with their predicted classes and the probabilities of those classes.
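A sketch of such code, assuming probs and YPredicted from the test-set classification above:

figure
idx = randperm(numel(YTest),9);
for i = 1:numel(idx)
    subplot(3,3,i)
    imshow(XTest(:,:,:,idx(i)));
    prob = num2str(100*max(probs(idx(i),:)),3); % top-class probability in percent
    title([char(YPredicted(idx(i))),', ',prob,'%'])
end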

The objective function for optimization

Define the objective function for optimization. 
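A minimal sketch of the wrapper, assuming the usual pattern of returning a handle to a nested function so that the training and validation data are captured in its workspace:

function ObjFcn = makeObjFcn(XTrain,YTrain,XValidation,YValidation)
ObjFcn = @valErrorFun;
    function [valError,cons,fileName] = valErrorFun(optVars)
        % Build the network, set training options, augment the data,
        % train, evaluate, and save, as sketched in the steps below.
    end
end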

Define the convolutional neural network architecture.

  • Add padding to the convolutional layers so that the spatial output size is always the same as the input size.
  • Each time the spatial dimensions are downsampled by a factor of two with a max pooling layer, increase the number of filters by a factor of two. Doing so ensures that the amount of computation required in each convolutional layer is roughly the same.
  • Choose the number of filters proportional to 1/sqrt(SectionDepth), so that networks of different depths have roughly the same number of parameters and require about the same amount of computation per iteration. To increase the number of network parameters and the overall network flexibility, increase numF. To train even deeper networks, change the range of the SectionDepth variable.
  • Use convBlock(filterSize,numFilters,numConvLayers) to create a block of numConvLayers convolutional layers, each with the specified filterSize and numFilters filters, and each followed by a batch normalization layer and a ReLU layer. The convBlock function is defined at the end of this example. A sketch of the assembled architecture appears after this list.
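A sketch of how this architecture might be assembled inside the objective function; optVars holds the optimizable variables, convBlock is defined at the end of the example, and the baseline filter count of 16 is an assumption:

imageSize = [32 32 3];
numClasses = numel(unique(YTrain));
numF = round(16/sqrt(optVars.SectionDepth)); % filters proportional to 1/sqrt(SectionDepth)

layers = [
    imageInputLayer(imageSize)

    % Three sections; each downsampling halves the spatial size
    % and doubles the number of filters.
    convBlock(3,numF,optVars.SectionDepth)
    maxPooling2dLayer(3,'Stride',2,'Padding','same')

    convBlock(3,2*numF,optVars.SectionDepth)
    maxPooling2dLayer(3,'Stride',2,'Padding','same')

    convBlock(3,4*numF,optVars.SectionDepth)
    averagePooling2dLayer(8) % average over the remaining 8-by-8 map

    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];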

Specify the validation data, and choose a 'ValidationFrequency' value so that trainNetwork validates the network once per epoch. Train for a fixed number of epochs, and lower the learning rate by a factor of 10 during the last epochs. This reduces the noise of the parameter updates and lets the network parameters settle closer to a minimum of the loss function.
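A sketch of matching training options; the epoch count of 60 and the learning-rate drop after 40 epochs are assumptions:

miniBatchSize = 256;
validationFrequency = floor(numel(YTrain)/miniBatchSize); % validate once per epoch

options = trainingOptions('sgdm', ...
    'InitialLearnRate',optVars.InitialLearnRate, ...
    'Momentum',optVars.Momentum, ...
    'L2Regularization',optVars.L2Regularization, ...
    'MaxEpochs',60, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',40, ... % drop the learning rate by 10x for the last epochs
    'LearnRateDropFactor',0.1, ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','every-epoch', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',validationFrequency);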

Use data augmentation to randomly flip the training images along the vertical axis and to randomly translate them by up to four pixels horizontally and vertically.
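A sketch of this augmentation:

pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...        % random flip along the vertical axis
    'RandXTranslation',pixelRange, ... % up to 4 pixels horizontally
    'RandYTranslation',pixelRange);    % up to 4 pixels vertically
datasource = augmentedImageDatastore(imageSize,XTrain,YTrain, ...
    'DataAugmentation',imageAugmenter);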

Train the network and plot the training progress during training.
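A sketch of the training call, given the datasource, layers, and options above:

trainedNet = trainNetwork(datasource,layers,options);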

 

Evaluate the trained network on the validation set: predict the image labels and calculate the error rate on the validation data.
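A sketch of this evaluation:

YPredicted = classify(trainedNet,XValidation);
valError = 1 - mean(YPredicted == YValidation); % validation error rate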

Create a file name containing the validation error, then save the network, validation error, and training options to disk. The objective function returns fileName as an output argument, and bayesopt returns all the file names in BayesObject.UserDataTrace.
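A sketch of this step; cons is the (empty) coupled-constraint output that a bayesopt objective function can return:

fileName = num2str(valError) + ".mat";
save(fileName,'trainedNet','valError','options')
cons = [];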

The convBlock function creates a block of numConvLayers convolutional layers, each with the specified filterSize and numFilters filters, and each followed by a batch normalization layer and a ReLU layer.

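A sketch consistent with that description:

function layers = convBlock(filterSize,numFilters,numConvLayers)
% One convolution + batch normalization + ReLU unit, repeated
% numConvLayers times.
layers = [
    convolution2dLayer(filterSize,numFilters,'Padding','same')
    batchNormalizationLayer
    reluLayer];
layers = repmat(layers,numConvLayers,1);
end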

References

[1] Krizhevsky, Alex. "Learning Multiple Layers of Features from Tiny Images." (2009). www.cs.toronto.edu/~kriz/learn...

