Prediction interval of deep learning neural network

A prediction interval provides a measure of uncertainty for predictions on regression problems.

For example, a 95% prediction interval means that 95 times out of 100, the true value will fall between the lower and upper bounds of the range. This is different from a simple point prediction, which might represent the center of the uncertainty interval. There is no standard technique for calculating a prediction interval for deep learning neural networks on regression predictive modeling problems. However, an ensemble of models can be used to estimate a quick-and-dirty prediction interval: the ensemble provides a distribution of point predictions from which an interval can be calculated.
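For intuition, here is a minimal sketch (with made-up numbers, not drawn from any model in this tutorial) of how a 95% interval can be formed around a point prediction under a Gaussian assumption:

# illustrative only: a 95% Gaussian interval around a hypothetical point prediction
point_prediction = 30.0  # hypothetical point forecast
prediction_stdev = 2.5   # hypothetical estimate of the prediction's standard deviation
interval = 1.96 * prediction_stdev
lower, upper = point_prediction - interval, point_prediction + interval
print('95%% prediction interval: [%.1f, %.1f]' % (lower, upper))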

In this tutorial, you will discover how to calculate the prediction interval of a deep learning neural network. After completing this tutorial, you will know:

  • The prediction interval provides a measure of uncertainty for regression prediction modeling problems.
  • How to develop and evaluate a simple multilayer perceptron neural network on standard regression problems.
  • How to use an ensemble of neural network models to calculate and report a prediction interval.

Tutorial overview

This tutorial is divided into three parts; they are:

  • Prediction interval
  • Neural network for regression
  • Neural network prediction interval

Prediction interval

Usually, a predictive model used for regression problems makes a point prediction. This means it predicts a single value and gives no indication of the uncertainty of that prediction. By definition, a prediction is an estimate or approximation and contains some uncertainty. The uncertainty comes from the errors of the model itself and from noise in the input data: the model is only an approximation of the relationship between the input variables and the output variable. A prediction interval quantifies the uncertainty of a prediction. It provides probabilistic upper and lower bounds on the estimate of the outcome variable.

Prediction intervals are most commonly used when making predictions or forecasts with a regression model, where a quantity is being predicted. A prediction interval surrounds the prediction made by the model and is expected to cover the range of the true outcome. For more on prediction intervals in general, see the tutorial:

"The prediction interval of machine learning":

https://machinelearningmastery.com/prediction-intervals-for-machine-learning/

Now that we are familiar with prediction intervals, we can consider how to calculate an interval for a neural network. First, let us define a regression problem and a neural network model to address it.

Neural network for regression

In this section, we will define a regression predictive modeling problem and a neural network model to address it. First, let us introduce a standard regression data set. We will use the housing data set, a standard machine learning data set comprising 506 rows of data with 13 numerical input variables and a numerical target variable.

Using a test harness of repeated 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 6.6. A top-performing model can achieve an MAE of about 1.9 on the same test harness. This provides bounds on the expected performance for this data set. The data set involves predicting house prices given details of suburbs of Boston in the United States.
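As a rough illustration (not part of the original example code), the naive baseline of roughly 6.6 MAE could be reproduced with something like the following sketch using scikit-learn's DummyRegressor; exact scores will vary with the cross-validation configuration.

# hedged sketch: estimate a naive baseline MAE with repeated 10-fold cross-validation
from numpy import absolute, mean
from pandas import read_csv
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
values = read_csv(url, header=None).values
X, y = values[:, :-1], values[:, -1]
# a naive model that always predicts the median house price
model = DummyRegressor(strategy='median')
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv)
print('Naive MAE: %.3f' % mean(absolute(scores)))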

Housing data set (housing.csv):

https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv

Housing data set description (housing.names):

https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.names

There is no need to download the dataset; as part of our working example, we will download it automatically.

The following example downloads and loads the dataset as a Pandas DataFrame and outlines the shape of the dataset and the first five rows of the data.

# load and summarize the housing dataset
from pandas import read_csv
from matplotlib import pyplot
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
# summarize first few lines
print(dataframe.head())

Running the example confirms the 506 rows of data with 13 input variables and a single numerical target variable (14 columns in total). We can also see that all input variables are numeric.

(506, 14)
        0     1     2   3      4      5  ...  8      9     10      11    12    13
0  0.00632  18.0  2.31   0  0.538  6.575  ...  1  296.0  15.3  396.90  4.98  24.0
1  0.02731   0.0  7.07   0  0.469  6.421  ...  2  242.0  17.8  396.90  9.14  21.6
2  0.02729   0.0  7.07   0  0.469  7.185  ...  2  242.0  17.8  392.83  4.03  34.7
3  0.03237   0.0  2.18   0  0.458  6.998  ...  3  222.0  18.7  394.63  2.94  33.4
4  0.06905   0.0  2.18   0  0.458  7.147  ...  3  222.0  18.7  396.90  5.33  36.2

[5 rows x 14 columns]

Next, we can prepare the data set for modeling. First, the data set can be split into input and output columns, and then the rows can be split into training and test data sets. In this case, we will use about 67% of the rows to train the model and the remaining 33% to estimate the performance of the model.

# split into input and output values
X, y = values[:, :-1], values[:, -1]
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)

You can learn more about train-test splits in this tutorial: "Train-Test Split for Evaluating Machine Learning Algorithms". Then, we scale all input columns (variables) to the range 0-1, called data normalization, which is good practice when working with neural network models.

# scale input data
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

You can learn more about normalizing input data with the MinMaxScaler in this tutorial: "How to Use StandardScaler and MinMaxScaler Transforms in Python":

https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/

A complete example of the data prepared for modeling is listed below.

# load and prepare the dataset for modeling
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
values = dataframe.values
# split into input and output values
X, y = values[:, :-1], values[:, -1]
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)
# scale input data
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# summarize
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Running the example loads the data set as before, splits the columns into input and output elements, splits the rows into training and test sets, and finally scales all input variables to the range [0, 1]. The shapes of the training and test sets are printed, showing that we have 339 rows for training the model and 167 rows for evaluating it.

(339, 13) (167, 13) (339,) (167,)

Next, we can define, train, and evaluate a multilayer perceptron (MLP) model on the data set. We will define a simple model with two hidden layers and an output layer that predicts a numerical value. We will use the ReLU activation function and "he" weight initialization, which are good practice. The number of nodes in each hidden layer was chosen after a little trial and error.

# define neural network model
features = X_train.shape[1]
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1))

We will fit the model using the efficient Adam version of stochastic gradient descent with close-to-default learning rate and momentum values, and use the mean squared error (MSE) loss function, the standard for regression predictive modeling problems.

# compile the model and specify loss and optimizer
opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
model.compile(optimizer=opt, loss='mse')

You can learn more about Adam's optimization algorithm in this tutorial:

"Write code from scratch Adam gradient descent optimization"

https://machinelearningmastery.com/adam-optimization-from-scratch/

The model will then be fit for 300 epochs with a batch size of 16 samples. This configuration was chosen after a little trial and error.

# fit the model on the training dataset
model.fit(X_train, y_train, verbose=2, epochs=300, batch_size=16)

You can learn more about batches and epochs in this tutorial:

"Differences between batches and periods in neural networks"

https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/

Finally, the model can be used to make predictions on the test data set. We can evaluate the predictions by comparing them to the expected values in the test set and calculating the mean absolute error (MAE), a useful measure of model performance.

# make predictions on the test set
yhat = model.predict(X_test, verbose=0)
# calculate the average error in the predictions
mae = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % mae)

The complete example is as follows:

# train and evaluate a multilayer perceptron neural network on the housing regression dataset
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
values = dataframe.values
# split into input and output values
X, y = values[:, :-1], values[:, -1]
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)
# scale input data
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# define neural network model
features = X_train.shape[1]
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1))
# compile the model and specify loss and optimizer
opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
model.compile(optimizer=opt, loss='mse')
# fit the model on the training dataset
model.fit(X_train, y_train, verbose=2, epochs=300, batch_size=16)
# make predictions on the test set
yhat = model.predict(X_test, verbose=0)
# calculate the average error in the predictions
mae = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % mae)

Running the example will load and prepare the data set, define and fit the MLP model on the training data set, and evaluate its performance on the test set.

Note: Due to the randomness of the algorithm or evaluation procedure, or the difference in numerical precision, your results may be different. Consider running the example several times and comparing the average results.

In this case, we can see that the model achieves a mean absolute error of about 2.3, which is better than a naive model and approaching the performance of a top model.

No doubt we could get close to optimal performance with further tuning of the model, but this is good enough for our investigation of prediction intervals.

Epoch 296/300
22/22 - 0s - loss: 7.1741
Epoch 297/300
22/22 - 0s - loss: 6.8044
Epoch 298/300
22/22 - 0s - loss: 6.8623
Epoch 299/300
22/22 - 0s - loss: 7.7010
Epoch 300/300
22/22 - 0s - loss: 6.5374
MAE: 2.300

Next, let's see how to calculate the prediction interval using the MLP model on the housing dataset.

Neural network prediction interval

In this section, we will use the regression problem and model developed in the previous section to develop the prediction interval.

Compared with a linear method like linear regression, where the prediction interval calculation is trivial, calculating a prediction interval for a nonlinear regression algorithm like a neural network is challenging. There is no standard technique. There are many ways to calculate an effective prediction interval for a neural network model; I recommend some of the papers listed in the further reading section to learn more.

In this tutorial, we will use a very simple approach that has plenty of room for extension. I call it "quick and dirty" because it is fast and easy to compute but has limitations. It involves fitting multiple final models (for example, 10 to 30). The distribution of point predictions from the ensemble members is then used to calculate both a point prediction and a prediction interval.

For example, the point prediction can be taken as the mean of the point predictions from the ensemble members, and a 95% prediction interval can be taken as 1.96 standard deviations around that mean. This is a simple Gaussian prediction interval, although alternatives could be used, such as the minimum and maximum of the point predictions. Alternatively, the bootstrap method could be used to train each ensemble member on a different bootstrap sample, and the 2.5th and 97.5th percentiles of the point predictions could be used as the prediction interval.

For more information about the bootstrap method, see the tutorial:

"A brief introduction to the Bootstrap method"

https://machinelearningmastery.com/a-gentle-introduction-to-the-bootstrap-method/

These extensions are reserved as exercises; we will stick to simple Gaussian prediction intervals.
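That said, for readers who want to experiment with the bootstrap/percentile extension, a minimal sketch is shown below. It assumes a fit_model() function like the one defined later in this section and uses sklearn.utils.resample for the bootstrap sampling; treat it as an illustration rather than part of the worked example.

# hedged sketch: bootstrap-trained ensemble with a percentile prediction interval
from numpy import asarray, percentile
from sklearn.utils import resample

def fit_bootstrap_ensemble(n_members, X_train, y_train):
    ensemble = list()
    for _ in range(n_members):
        # train each member on a different bootstrap sample of the training data
        X_boot, y_boot = resample(X_train, y_train)
        ensemble.append(fit_model(X_boot, y_boot))
    return ensemble

def predict_with_percentile_pi(ensemble, X):
    yhat = asarray([model.predict(X, verbose=0) for model in ensemble])
    # 2.5th and 97.5th percentiles of the point predictions give a 95% interval
    lower, upper = percentile(yhat, 2.5), percentile(yhat, 97.5)
    return lower, yhat.mean(), upper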

Suppose that the training data set defined in the previous section stands in for the entire data set and that we are training a final model (or models) on it. We can then make predictions with a prediction interval on the test set and evaluate how effective the interval might be in the future.

We can simplify the code by dividing the elements developed in the previous section into functions. First, let us define a function to load and prepare a regression data set given its URL.

# load and prepare the dataset
def load_dataset(url):
    dataframe = read_csv(url, header=None)
    values = dataframe.values
    # split into input and output values
    X, y = values[:, :-1], values[:, -1]
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)
    # scale input data
    scaler = MinMaxScaler()
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test

Next, we can define a function that will define and train the MLP model given the training data set, and then return a fitted model suitable for prediction.

# define and fit the model
def fit_model(X_train, y_train):
    # define neural network model
    features = X_train.shape[1]
    model = Sequential()
    model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
    model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(1))
    # compile the model and specify loss and optimizer
    opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
    model.compile(optimizer=opt, loss='mse')
    # fit the model on the training dataset
    model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)
    return model

We need multiple models to make point predictions. These models will define the distribution of point predictions from which the interval can be estimated.

Therefore, we need to fit multiple models on the training data set. Each model must differ so that it makes different predictions. This is achievable given the stochastic nature of training an MLP: random initial weights and the use of the stochastic gradient descent optimization algorithm. The more models, the better the point predictions will estimate the capability of the model. I recommend at least 10 models, and more than 30 may bring little additional benefit. The function below fits an ensemble of models and stores them in a list that is returned. Out of interest, each fitted model is also evaluated on the test set, and its score is reported after fitting. We expect each model's estimated performance on the hold-out test set to differ slightly, and the reported scores help confirm this expectation.

# fit an ensemble of models
def fit_ensemble(n_members, X_train, X_test, y_train, y_test):
    ensemble = list()
    for i in range(n_members):
        # define and fit the model on the training set
        model = fit_model(X_train, y_train)
        # evaluate model on the test set
        yhat = model.predict(X_test, verbose=0)
        mae = mean_absolute_error(y_test, yhat)
        print('>%d, MAE: %.3f' % (i+1, mae))
        # store the model
        ensemble.append(model)
    return ensemble

Finally, we can use the fit ensemble of models to make point predictions, which can be summarized into a prediction interval.

The following function achieves this. First, each model makes a point prediction on the provided input data, then the 95% prediction interval is calculated, and the lower bound, mean, and upper bound of the interval are returned.

The function is designed to take a single row of data as input, but it could easily be adapted to multiple rows, as illustrated in the sketch after the code block below.

# make predictions with the ensemble and calculate a prediction interval
def predict_with_pi(ensemble, X):
    # make predictions
    yhat = [model.predict(X, verbose=0) for model in ensemble]
    yhat = asarray(yhat)
    # calculate 95% gaussian prediction interval
    interval = 1.96 * yhat.std()
    lower, upper = yhat.mean() - interval, yhat.mean() + interval
    return lower, yhat.mean(), upper
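If intervals are needed for many rows at once, one possible adaptation (an untested sketch rather than part of the original example) is to compute the mean and standard deviation across the ensemble axis for each row:

# hedged sketch: per-row prediction intervals for a 2D array of input rows
def predict_with_pi_rows(ensemble, X):
    # shape (n_members, n_rows) after flattening each model's column of predictions
    yhat = asarray([model.predict(X, verbose=0).flatten() for model in ensemble])
    mean = yhat.mean(axis=0)
    interval = 1.96 * yhat.std(axis=0)
    # arrays of lower bounds, means, and upper bounds, one entry per input row
    return mean - interval, mean, mean + interval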

Finally, we can call these functions. First, load and prepare the data set, then define and fit the ensemble.

# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
X_train, X_test, y_train, y_test = load_dataset(url)
# fit ensemble
n_members = 30
ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)

Then, we can take a single row of data from the test set, make a prediction with a prediction interval, and report the result.

We also report the expected value, which we would expect to be covered by the prediction interval (perhaps close to 95% of the time; this is not entirely accurate, but a rough approximation).

# make predictions with prediction interval
newX = asarray([X_test[0, :]])
lower, mean, upper = predict_with_pi(ensemble, newX)
print('Point prediction: %.3f' % mean)
print('95%% prediction interval: [%.3f, %.3f]' % (lower, upper))
print('True value: %.3f' % y_test[0])

In summary, the following is a complete example of using a multilayer perceptron neural network to make predictions at prediction intervals.

# prediction interval for mlps on the housing regression dataset
from numpy import asarray
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# load and prepare the dataset
def load_dataset(url):
    dataframe = read_csv(url, header=None)
    values = dataframe.values
    # split into input and output values
    X, y = values[:, :-1], values[:, -1]
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)
    # scale input data
    scaler = MinMaxScaler()
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test

# define and fit the model
def fit_model(X_train, y_train):
    # define neural network model
    features = X_train.shape[1]
    model = Sequential()
    model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
    model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(1))
    # compile the model and specify loss and optimizer
    opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
    model.compile(optimizer=opt, loss='mse')
    # fit the model on the training dataset
    model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)
    return model

# fit an ensemble of models
def fit_ensemble(n_members, X_train, X_test, y_train, y_test):
    ensemble = list()
    for i in range(n_members):
        # define and fit the model on the training set
        model = fit_model(X_train, y_train)
        # evaluate model on the test set
        yhat = model.predict(X_test, verbose=0)
        mae = mean_absolute_error(y_test, yhat)
        print('>%d, MAE: %.3f' % (i+1, mae))
        # store the model
        ensemble.append(model)
    return ensemble

# make predictions with the ensemble and calculate a prediction interval
def predict_with_pi(ensemble, X):
    # make predictions
    yhat = [model.predict(X, verbose=0) for model in ensemble]
    yhat = asarray(yhat)
    # calculate 95% gaussian prediction interval
    interval = 1.96 * yhat.std()
    lower, upper = yhat.mean() - interval, yhat.mean() + interval
    return lower, yhat.mean(), upper

# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
X_train, X_test, y_train, y_test = load_dataset(url)
# fit ensemble
n_members = 30
ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)
# make predictions with prediction interval
newX = asarray([X_test[0, :]])
lower, mean, upper = predict_with_pi(ensemble, newX)
print('Point prediction: %.3f' % mean)
print('95%% prediction interval: [%.3f, %.3f]' % (lower, upper))
print('True value: %.3f' % y_test[0])

Running the example fits each ensemble member in turn and reports its estimated performance on the hold-out test set; finally, a prediction with a prediction interval is made and reported.

Note: Due to the randomness of the algorithm or evaluation procedure, or the difference in numerical precision, your results may be different. Consider running the example several times and comparing the average results.

In this case, we can see that each model has slightly different performance, confirming our expectation that the models are indeed different.

Finally, we can see that the ensemble made a point prediction of about 30.5 with a 95% prediction interval of [26.287, 34.822]. We can also see that the true value was 28.2 and that the interval does capture this value, which is great.

>1, MAE: 2.259
>2, MAE: 2.144
>3, MAE: 2.732
>4, MAE: 2.628
>5, MAE: 2.483
>6, MAE: 2.551
>7, MAE: 2.505
>8, MAE: 2.299
>9, MAE: 2.706
>10, MAE: 2.145
>11, MAE: 2.765
>12, MAE: 3.244
>13, MAE: 2.385
>14, MAE: 2.592
>15, MAE: 2.418
>16, MAE: 2.493
>17, MAE: 2.367
>18, MAE: 2.569
>19, MAE: 2.664
>20, MAE: 2.233
>21, MAE: 2.228
>22, MAE: 2.646
>23, MAE: 2.641
>24, MAE: 2.492
>25, MAE: 2.558
>26, MAE: 2.416
>27, MAE: 2.328
>28, MAE: 2.383
>29, MAE: 2.215
>30, MAE: 2.408
Point prediction: 30.555
95% prediction interval: [26.287, 34.822]
True value: 28.200

As mentioned above, this is a quick-and-dirty technique for making predictions with a prediction interval for neural networks. There are simple extensions, such as applying the bootstrap method to the point predictions, which may be more reliable, as well as the more advanced techniques described in some of the papers I suggest you explore for further reading.
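One way to sanity-check the "roughly 95%" coverage claim (again, a hedged sketch rather than part of the original example) is to measure the empirical coverage of the interval across the whole hold-out test set:

# hedged sketch: empirical coverage of the prediction interval on the test set
covered = 0
for i in range(len(X_test)):
    row = asarray([X_test[i, :]])
    lower, _, upper = predict_with_pi(ensemble, row)
    # count how often the true value falls inside the interval
    if lower <= y_test[i] <= upper:
        covered += 1
print('Empirical coverage: %.1f%%' % (100.0 * covered / len(X_test)))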

Author: Yishui Hancheng, CSDN blog expert, personal research direction: machine learning, deep learning, NLP, CV

Blog:  yishuihancheng.blog.csdn.net