## Original link: tecdat.cn/?p=7954

## Original source: Tuoduan Data Tribe Official Account

This example shows how to apply Bayesian optimization to deep learning and how to find the best network hyperparameters and training options for convolutional neural networks.

To train a deep neural network, you must specify the network architecture and the options of the training algorithm. Selecting and tuning these hyperparameters can be difficult and time-consuming. Bayesian optimization is an algorithm well suited to optimizing the hyperparameters of classification and regression models.

### Prepare data

Download the CIFAR-10 data set [1]. The data set contains 60,000 images, each with a size of 32 x 32 and three color channels (RGB). The size of the entire data set is 175 MB.

Load the CIFAR-10 dataset as training images and labels, and as test images and labels. Set aside 5,000 of the test images for validation.

```matlab
[XTrain,YTrain,XTest,YTest] = loadCIFARData(datadir);

idx = randperm(numel(YTest),5000);
XValidation = XTest(:,:,:,idx);
XTest(:,:,:,idx) = [];
YValidation = YTest(idx);
YTest(idx) = [];
```

You can use the following code to display a sample of the training images.

```matlab
figure;
idx = randperm(numel(YTrain),20);
for i = 1:numel(idx)
    subplot(4,5,i);
    imshow(XTrain(:,:,:,idx(i)));
end
```

### Select the variable to be optimized

Select the variables to optimize with Bayesian optimization, and specify the ranges to search. Also specify whether each variable is an integer and whether to search the interval in logarithmic space. Optimize the following variables:

- The network section depth. This parameter controls the depth of the network. The network has three sections, each with `SectionDepth` identical convolutional layers, so the total number of convolutional layers is `3*SectionDepth`. The objective function later in the script makes the number of convolution filters in each layer proportional to `1/sqrt(SectionDepth)`. As a result, for different section depths, the number of parameters and the amount of computation required per iteration are roughly the same.
- The initial learning rate. The optimal learning rate depends on your data as well as on the network you are training.
- Stochastic gradient descent momentum.
- L2 regularization strength.

```matlab
optimVars = [
    optimizableVariable('SectionDepth',[1 3],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')
    optimizableVariable('Momentum',[0.8 0.98])
    optimizableVariable('L2Regularization',[1e-10 1e-2],'Transform','log')];
```

### Perform Bayesian optimization

Use training and validation data as input to create an objective function for the Bayesian optimizer. The objective function trains the convolutional neural network and returns the classification error on the validation set.

```matlab
ObjFcn = makeObjFcn(XTrain,YTrain,XValidation,YValidation);
```

Perform Bayesian optimization by minimizing the classification error on the validation set. To take full advantage of Bayesian optimization, you should perform at least 30 objective function evaluations.
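The optimization call itself can be sketched as follows. This is a minimal sketch using the `bayesopt` function; the 14-hour `MaxTime` (in seconds) matches the 50,400-second budget reported in the output, and training is marked non-deterministic because each run is stochastic:

```matlab
% Run Bayesian optimization over the hyperparameter ranges in optimVars.
% Each objective evaluation trains one full network, so cap the total time.
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxTime',14*60*60, ...               % stop after 14 hours (50400 s)
    'IsObjectiveDeterministic',false, ... % network training is stochastic
    'UseParallel',false);
```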

After each network finishes training, the optimizer prints the result to the command window:

```
|===============================================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar  | SectionDepth | InitialLearn-| Momentum | L2Regulariza-|
|      | result |           | runtime   | (observed) | (estim.)   |              | Rate         |          | tion         |
|===============================================================================================================================|
|    1 | Best   |      0.19 |      2201 |       0.19 |       0.19 |            3 |     0.012114 |   0.8354 |    0.0010624 |
|    2 | Accept |    0.3224 |    1734.1 |       0.19 |    0.19636 |            1 |     0.066481 |  0.88231 |    0.0026626 |
|    3 | Accept |    0.2076 |    1688.7 |       0.19 |    0.19374 |            2 |     0.022346 |  0.91149 |    8.242e-10 |
|    4 | Accept |    0.1908 |    2167.2 |       0.19 |     0.1904 |            3 |      0.97586 |  0.83613 |   4.5143e-08 |
|    5 | Accept |    0.1972 |    2157.4 |       0.19 |    0.19274 |            3 |      0.21193 |  0.97995 |   1.4691e-05 |
|    6 | Accept |    0.2594 |    2152.8 |       0.19 |       0.19 |            3 |      0.98723 |  0.97931 |   2.4847e-10 |
|    7 | Best   |    0.1882 |    2257.5 |     0.1882 |    0.18819 |            3 |       0.1722 |   0.8019 |   4.2149e-06 |
|    8 | Accept |    0.8116 |    1989.7 |     0.1882 |    0.18818 |            3 |      0.42085 |  0.95355 |    0.0092026 |
|    9 | Accept |    0.1986 |      1836 |     0.1882 |    0.18821 |            2 |     0.030291 |  0.94711 |   2.5062e-05 |
|   10 | Accept |    0.2146 |    1909.4 |     0.1882 |    0.18816 |            2 |     0.013379 |   0.8785 |   7.6354e-09 |
|   11 | Accept |    0.2194 |      1562 |     0.1882 |    0.18815 |            1 |      0.14682 |  0.86272 |   8.6242e-09 |
|   12 | Accept |    0.2246 |    1591.2 |     0.1882 |    0.18813 |            1 |      0.70438 |  0.82809 |   1.0102e-06 |
|   13 | Accept |    0.2648 |    1621.8 |     0.1882 |    0.18824 |            1 |     0.010109 |  0.89989 |   1.0481e-10 |
|   14 | Accept |    0.2222 |      1562 |     0.1882 |    0.18812 |            1 |      0.11058 |  0.97432 |   2.4101e-07 |
|   15 | Accept |    0.2364 |    1625.7 |     0.1882 |    0.18813 |            1 |     0.079381 |   0.8292 |   2.6722e-05 |
|   16 | Accept |      0.26 |    1706.2 |     0.1882 |    0.18815 |            1 |     0.010041 |  0.96229 |   1.1066e-05 |
|   17 | Accept |    0.1986 |    2188.3 |     0.1882 |    0.18635 |            3 |      0.35949 |  0.97824 |    3.153e-07 |
|   18 | Accept |    0.1938 |    2169.6 |     0.1882 |    0.18817 |            3 |     0.024365 |  0.88464 |   0.00024507 |
|   19 | Accept |    0.3588 |    1713.7 |     0.1882 |    0.18216 |            1 |     0.010177 |  0.89427 |    0.0090342 |
|   20 | Accept |    0.2224 |    1721.4 |     0.1882 |    0.18193 |            1 |      0.09804 |  0.97947 |   1.0727e-10 |
|===============================================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar  | SectionDepth | InitialLearn-| Momentum | L2Regulariza-|
|      | result |           | runtime   | (observed) | (estim.)   |              | Rate         |          | tion         |
|===============================================================================================================================|
|   21 | Accept |    0.1904 |    2184.7 |     0.1882 |    0.18498 |            3 |     0.017697 |  0.95057 |   0.00022247 |
|   22 | Accept |    0.1928 |    2184.4 |     0.1882 |    0.18527 |            3 |      0.06813 |   0.9027 |   1.3521e-09 |
|   23 | Accept |    0.1934 |    2183.6 |     0.1882 |     0.1882 |            3 |     0.018269 |  0.90432 |    0.0003573 |
|   24 | Accept |     0.303 |    1707.9 |     0.1882 |    0.18809 |            1 |     0.010157 |  0.88226 |   0.00088737 |
|   25 | Accept |     0.194 |    2189.1 |     0.1882 |    0.18808 |            3 |     0.019354 |  0.94156 |   9.6197e-07 |
|   26 | Accept |    0.2192 |    1752.2 |     0.1882 |    0.18809 |            1 |      0.99324 |  0.91165 |   1.1521e-08 |
|   27 | Accept |    0.1918 |      2185 |     0.1882 |    0.18813 |            3 |      0.05292 |   0.8689 |   1.2449e-05 |

__________________________________________________________

Optimization completed.
MaxTime of 50400 seconds reached.
Total function evaluations: 27
Total elapsed time: 51962.3666 seconds.
Total objective function evaluation time: 51942.8833

Best observed feasible point:
    SectionDepth    InitialLearnRate    Momentum    L2Regularization
    ____________    ________________    ________    ________________

         3               0.1722          0.8019        4.2149e-06

Observed objective function value = 0.1882
Estimated objective function value = 0.18813
Function evaluation time = 2257.4627

Best estimated feasible point (according to models):
    SectionDepth    InitialLearnRate    Momentum    L2Regularization
    ____________    ________________    ________    ________________

         3               0.1722          0.8019        4.2149e-06

Estimated objective function value = 0.18813
Estimated function evaluation time = 2166.2402
```

### Evaluate the final network

Load the best network found during the optimization, together with its validation error.
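One way to do this, assuming the objective function (defined later in this example) saves each trained network to disk and returns the file name to `bayesopt` as user data:

```matlab
% Index of the best point in the optimization trace.
bestIdx = BayesObject.IndexOfMinimumTrace(end);
% File name returned by the objective function for that evaluation.
fileName = BayesObject.UserDataTrace{bestIdx};
savedStruct = load(fileName);
valError = savedStruct.valError
```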

```
valError = 0.1882
```

Predict the labels of the test set and calculate the test error. Treat the classification of each image in the test set as an independent event with a certain probability of success, so that the number of misclassified images follows a binomial distribution. Use this to calculate the standard error and an approximate 95% confidence interval of the test error (the *Wald method*).
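A sketch of this computation, assuming the best network was loaded into `savedStruct.trainedNet` as above:

```matlab
% Classify the test set with the best network found by the optimizer.
[YPredicted,probs] = classify(savedStruct.trainedNet,XTest);
testError = mean(YPredicted ~= YTest)

% The number of misclassifications is binomial, so the standard error of
% the error rate is sqrt(p*(1-p)/N) (Wald approximation), giving an
% approximate 95% confidence interval.
NTest = numel(YTest);
testErrorSE = sqrt(testError*(1 - testError)/NTest);
testError95CI = [testError - 1.96*testErrorSE, testError + 1.96*testErrorSE]
```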

```
testError = 0.1864

testError95CI = 1×2

    0.1756    0.1972
```

Plot the confusion matrix for the test data. Display the precision and recall of each class by using column and row summaries.
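A sketch using `confusionchart`, assuming the predicted test labels `YPredicted` from the previous step:

```matlab
figure
cm = confusionchart(YTest,YPredicted);
cm.Title = 'Confusion Matrix for Test Data';
cm.ColumnSummary = 'column-normalized'; % precision per predicted class
cm.RowSummary = 'row-normalized';       % recall per true class
```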


You can also display test images together with their predicted classes and the probabilities of those classes.
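For example, a sketch assuming `YPredicted` and the class probabilities `probs` returned by `classify` above:

```matlab
figure
idx = randperm(numel(YTest),9);
for i = 1:numel(idx)
    subplot(3,3,i)
    imshow(XTest(:,:,:,idx(i)));
    % Show the predicted class and its posterior probability as the title.
    prob = num2str(100*max(probs(idx(i),:)),3);
    title([char(YPredicted(idx(i))), ', ', prob, '%'])
end
```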

### The objective function for optimization

Define the objective function for optimization.

Define the convolutional neural network architecture.

- Add padding to the convolutional layers so that the spatial output size is always the same as the input size.
- Each time you downsample the spatial dimensions by a factor of two with a max pooling layer, double the number of filters. Doing so ensures that each convolutional layer requires roughly the same amount of computation.
- Choose the number of filters proportional to `1/sqrt(SectionDepth)`, so that networks of different depths have roughly the same number of parameters and require roughly the same amount of computation per iteration. To increase the number of network parameters and the overall network flexibility, increase `numF`. To train even deeper networks, change the range of the `SectionDepth` variable.
- Use `convBlock(filterSize,numFilters,numConvLayers)` to create a block of `numConvLayers` convolutional layers, each with the specified `filterSize` and `numFilters` filters, and each followed by a batch normalization layer and a ReLU layer. The `convBlock` function is defined at the end of this example.
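The bullet points above can be sketched as follows, assuming `optVars` holds the current hyperparameter values inside the objective function and a base filter count of `round(16/sqrt(SectionDepth))`:

```matlab
imageSize = [32 32 3];
numClasses = 10;
% Base number of filters, proportional to 1/sqrt(SectionDepth).
numF = round(16/sqrt(optVars.SectionDepth));

layers = [
    imageInputLayer(imageSize)
    % Three sections; the filter count doubles each time the spatial
    % dimensions are downsampled by max pooling.
    convBlock(3,numF,optVars.SectionDepth)
    maxPooling2dLayer(3,'Stride',2,'Padding','same')
    convBlock(3,2*numF,optVars.SectionDepth)
    maxPooling2dLayer(3,'Stride',2,'Padding','same')
    convBlock(3,4*numF,optVars.SectionDepth)
    averagePooling2dLayer(8)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

function layers = convBlock(filterSize,numFilters,numConvLayers)
% A block of numConvLayers [convolution, batch norm, ReLU] units.
layers = [
    convolution2dLayer(filterSize,numFilters,'Padding','same')
    batchNormalizationLayer
    reluLayer];
layers = repmat(layers,numConvLayers,1);
end
```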

Specify the validation data, and choose a `'ValidationFrequency'` value so that the network is validated once per epoch.

Use data augmentation to randomly flip the training images along the vertical axis and randomly translate them by up to four pixels horizontally and vertically.
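A sketch of this augmentation pipeline using `imageDataAugmenter` and `augmentedImageDatastore`:

```matlab
pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...        % random flip along the vertical axis
    'RandXTranslation',pixelRange, ... % up to 4 pixels horizontally
    'RandYTranslation',pixelRange);    % up to 4 pixels vertically
datasource = augmentedImageDatastore([32 32 3],XTrain,YTrain, ...
    'DataAugmentation',imageAugmenter);
```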

Train the network and plot the training progress.

Evaluate the trained network on the validation set, calculate the predicted image labels, and calculate the error rate on the validation data.

Create a file name containing the validation error, then save the network, validation error, and training options to disk. The objective function returns this file name to the Bayesian optimizer.
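This last step might look like the following inside the objective function, assuming `trainedNet`, `valError`, and `options` are in scope:

```matlab
% Save everything under a name derived from the validation error, and
% return the file name so the best network can be reloaded after the
% optimization finishes.
fileName = num2str(valError) + ".mat";
save(fileName,'trainedNet','valError','options')
cons = [];  % no nonlinear constraints for bayesopt
```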


## References

[1] Krizhevsky, Alex. "Learning Multiple Layers of Features from Tiny Images." (2009). www.cs.toronto.edu/~kriz/learn...
