Time series prediction | MATLAB implementation of CNN-LSTM (convolutional long short-term memory neural network) time series prediction

Basic introduction

Test environment: MATLAB 2020b

  • Deep learning methods have developed rapidly in recent years because of their strong feature extraction and fitting capabilities. Common deep learning models include deep belief networks (DBN), restricted Boltzmann machines (RBM), convolutional neural networks (CNN), and long short-term memory (LSTM) networks. Deep learning has achieved great success in image processing, speech recognition and other fields, and it also has significant advantages for data with temporal and spatial structure, especially in prediction problems.
  • A CNN-LSTM deep neural network model, combining a convolutional neural network with a long short-term memory network, is proposed. The data is first preprocessed; the processed data and selected historical data are then fed into the CNN-LSTM network for training to determine the model parameters; finally, the trained model is used to predict the time series.

CNN-LSTM model

CNN network architecture

  • The CNN network can extract the spatial structure of multi-dimensional time series data. It is mainly composed of convolutional layers and pooling layers. Local connections and weight sharing greatly reduce the number of model parameters, which speeds up training and improves generalization performance while the network extracts data features.
  • A typical convolutional neural network structure is shown in the figure. A convolutional layer consists of multiple feature maps, each feature map consists of multiple neurons, and each neuron is connected to a local region of the feature maps in the previous layer through a convolution kernel. Through these convolution operations, the convolutional layer extracts different features from the time series data.
  • The pooling layer that follows the convolutional layer is also composed of multiple feature maps, each corresponding to one feature map of the previous layer, so pooling does not change the number of feature maps. The pooling layer performs a secondary extraction of data features and reduces the dimensionality of the data. Commonly used pooling methods are max pooling and mean pooling.
  • Convolutional neural networks come in one-dimensional, two-dimensional and three-dimensional variants, each with its own applicable scenarios.
  • Among them, one-dimensional convolution is the variant mainly used for time series data. The first layer of the network is assumed to be a convolutional layer, which computes a one-dimensional convolution over the input sequence.
  • For the pooling layer, this paper adopts max pooling.
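
A common way to write the one-dimensional convolution and max pooling described above is sketched here. The notation is assumed for illustration and may differ from the original figure: $x_c^{l-1}$ is channel $c$ of the input to layer $l$, $w_{j,c}^{l}$ is the $j$-th kernel of length $K$ (applied as a sliding dot product, the usual deep learning convention), $b_j^{l}$ is its bias, $f$ is the activation function, and $P$ is the width of a non-overlapping pooling window.

```latex
\[
\begin{aligned}
y_j^{l}(t) &= f\!\left( \sum_{c=1}^{C} \sum_{k=1}^{K} w_{j,c}^{l}(k)\, x_c^{l-1}(t + k - 1) + b_j^{l} \right) \\
p_j^{l}(t) &= \max_{1 \le m \le P} \; y_j^{l}\bigl( (t-1)P + m \bigr)
\end{aligned}
\]
```

The first line produces one feature map $y_j^{l}$ per kernel; the second keeps the largest value in each window, so the number of feature maps stays the same while their length shrinks by a factor of $P$.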

LSTM network architecture

  • The hidden layer of a traditional RNN has a simple internal structure: the network does not filter the input at the current moment or the state passed on from the previous moment. This may prevent key information from being carried across a long series and cause a large deviation in the prediction results. The LSTM network adds an input gate, an output gate and a forget gate to the hidden layer, together with a memory cell for storing state. The internal structure of a typical LSTM hidden layer is shown in the figure, and the gates and the memory cell are linked by a fixed set of functional relationships.
  • When an input enters the hidden layer of the LSTM network, it is first transformed by the input gate and then added to the memory cell state processed by the forget gate to form the new memory cell state. The new memory cell state is then passed through a nonlinear function and multiplied element-wise by the output-gate activation, computed from the current information, to give the output of the hidden layer.
  • The gated recurrent unit (GRU) is similar in structure to the LSTM network, but it has one fewer gate and no separate memory cell state: it uses only a reset gate and an update gate, so its internal structure is simpler. However, it is not as effective as the LSTM network when processing large amounts of data.
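
The gate operations described above correspond to the standard LSTM update, written here in commonly used notation. The symbols are assumptions and may differ from the original figure: $x_t$ is the current input, $h_{t-1}$ the previous hidden-layer output, $c_t$ the memory cell state, $\sigma$ the sigmoid function, and $\odot$ the element-wise product.

```latex
\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]
```

Here $f_t$, $i_t$ and $o_t$ are the forget, input and output gates. The fifth line is the superposition of the gated input with the gated previous cell state, and the last line gives the hidden-layer output.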

CNN-LSTM network

  • In this paper, historical data is used as the input of the prediction model, and the predicted value is used as the output.
  • The input to the CNN-LSTM network is therefore a one-dimensional vector. After testing similar network structures, the parameters of the CNN-LSTM model constructed in this paper are set as follows:
    1) One-dimensional convolutional neural network: the CNN part consists of convolutional layers and pooling layers; the number and size of the convolution kernels and pooling kernels in each layer are set according to the dimension of the input vector; the activation functions are all ReLU or ELU.
    2) Long short-term memory neural network: the LSTM part consists of stacked layer units, and the number of hidden neurons in each layer is set accordingly; the activation function can be ReLU or ELU.
    3) Fully connected layer: a neural network with a single hidden layer is used as the output layer of the CNN-LSTM model to fit and predict the data, and its output is the predicted value at time t.
  • The whole CNN-LSTM training and prediction model, here applied to wind power, is shown in the figure. As can be seen from Figure 4, the CNN-LSTM network model is mainly composed of two parts. First, the input data passes through the CNN network, where convolution and pooling operations extract features and reduce the dimensionality of the data. The data processed by the CNN network is then fed into the LSTM network, whose forget gate, input gate and output gate adjust their parameters through iterative training on a large amount of data, so that the network learns the temporal relationships in the features extracted by the CNN and builds an effective dynamic model of the time series input and output data. Finally, the fully connected network fits the output of the trained CNN-LSTM network and produces the predicted value. The parameters of the network model are determined by training on data.
  • To train the CNN-LSTM wind power prediction model, this paper adopts the backpropagation through time (BPTT) algorithm: the neural network is unrolled into a deep network in time order, and the error backpropagation (BP) algorithm is then used to train the unrolled network. Traditional gradient optimization algorithms such as stochastic gradient descent (SGD) are simple and easy to implement, but suffer from shortcomings such as vanishing gradients, slow convergence and difficulty in converging to the global minimum. Many improved optimization algorithms have been proposed to address these shortcomings, such as momentum, AdaGrad and Adam. This paper uses the Adam algorithm.
  • The simulation experiments in this paper show that simply increasing the number of CNN and LSTM layers in the CNN-LSTM model does not effectively improve prediction accuracy. As the number of layers increases, the number of internal parameters grows sharply, while the structure of the input data remains relatively simple.
  • As a result, after training on a large amount of data, the network model tends to reproduce the characteristics of the training data rather than generalize to new data, which is overfitting.
  • This paper uses the Dropout technique to reduce overfitting. During training, Dropout zeroes the input weights of each hidden node with probability 1-p, so that those weights take part in neither forward propagation nor backpropagation for that training step, which makes the network model more resistant to overfitting.
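
The two training ingredients named above, the Adam update and Dropout, can be illustrated with a small base-MATLAB sketch. It is not the article's training code: the toy objective, the variable names and all numeric values are assumptions chosen for illustration, and the dropout part masks activations (the common formulation) rather than individual weights.

```matlab
% Illustrative sketch only: toy objective, toy activations, assumed values.

%% Adam update on the toy objective f(w) = sum((w - target).^2)
target  = [3; -2];         % minimiser of the toy objective
w       = zeros(2,1);      % parameters being trained
alpha   = 0.05;            % step size (assumed for this toy problem)
beta1   = 0.9;             % decay rate of the first-moment estimate
beta2   = 0.999;           % decay rate of the second-moment estimate
epsilon = 1e-8;            % avoids division by zero
m = zeros(size(w));        % running mean of the gradient
v = zeros(size(w));        % running mean of the squared gradient
for t = 1:1000
    g = 2*(w - target);                         % gradient of the toy objective
    m = beta1*m + (1 - beta1)*g;
    v = beta2*v + (1 - beta2)*g.^2;
    mHat = m/(1 - beta1^t);                     % bias-corrected estimates
    vHat = v/(1 - beta2^t);
    w = w - alpha*mHat./(sqrt(vHat) + epsilon); % per-parameter scaled step
end

%% Dropout with keep probability p (drop probability 1-p)
rng(0);                            % fixed seed for repeatability
p = 0.8;                           % keep probability (assumed value)
h = ones(1000,1);                  % toy hidden-layer activations
mask   = rand(size(h)) < p;        % true where a unit is kept
hTrain = (h .* mask) / p;          % training: drop units, rescale the rest
hTest  = h;                        % prediction: all units are used
```

Dividing by p keeps the expected activation the same in training and prediction. In the Deep Learning Toolbox, `dropoutLayer` takes the drop probability as its argument, which is 1-p in the notation used here.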

Data download


  • Data set partitioning

%% CNN-LSTM time series prediction
%% Input parameters
% time lag orders used as inputs
Lag = 1:8;
% training set ratio
ratio = 0.9;
% mini-batch size
MiniBatchSize = 24;
% maximum number of training epochs
MaxEpochs = 60;
% learning rate
learningrate = 0.005;
%% Load data
% data.mat is assumed to hold a cell array of scalar observations
load data;
data = [data{:}];
%% Split the data into training and test sets
% The first 90% of the data is used for training and the last 10% for testing.
numStepsTraining = round(ratio*numel(data));
indexTrain = 1:numStepsTraining;
dataTrain = data(indexTrain);
indexTest = numStepsTraining+1:numel(data);
dataTest = data(indexTest);
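
The script defines `Lag = 1:8`, but the listing does not show how the lagged inputs are built. A minimal sketch follows, assuming each sample uses the previous eight values to predict the next one. The synthetic series and the names `X` and `Y` are stand-ins introduced here, not taken from the original code.

```matlab
% Toy stand-in for the series loaded from data.mat (assumption).
data = sin((1:100)/5);

Lag    = 1:8;                      % lag orders, as in the script
maxLag = max(Lag);
numObs = numel(data) - maxLag;     % samples that have a full lag history

X = zeros(numel(Lag), numObs);     % one column per sample, one row per lag
Y = zeros(1, numObs);              % value to predict for each sample
for k = 1:numObs
    t = k + maxLag;                % index of the value being predicted
    X(:,k) = data(t - Lag).';      % lagged inputs y(t-1), ..., y(t-8)
    Y(k)   = data(t);
end
```

With this layout, row 1 of `X` holds the most recent value and row 8 the oldest. The same construction can be applied separately to `dataTrain` and `dataTest`.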

  • CNN-LSTM network architecture

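% Create the "CNN-LSTM" model.
% numFeatures is the number of inputs per time step (for example numel(Lag)).
% Filter size, filter count, hidden-unit counts and output modes are placeholder
% assumptions; the original listing does not state them.
    layers = [...
        % input features
        sequenceInputLayer([numFeatures 1 1],'Name','input')
        % fold layer: apply the convolution to each time step independently
        sequenceFoldingLayer('Name','fold')
        % CNN feature extraction
        convolution2dLayer([3 1],16,'Padding','same','Name','conv')
        eluLayer('Name','elu')
        maxPooling2dLayer([2 1],'Stride',[2 1],'Padding','same','Name','pool')
        % unfold layer: restore the sequence structure
        sequenceUnfoldingLayer('Name','unfold')
        % flatten layer
        flattenLayer('Name','flatten')
        % LSTM feature learning
        lstmLayer(64,'OutputMode','sequence','Name','lstm1')
        % LSTM output
        lstmLayer(32,'OutputMode','last','Name','lstm2')
        % fully connected layer
        fullyConnectedLayer(1,'Name','fc')
        regressionLayer('Name','output')    ];

    layers = layerGraph(layers);
    % pass the mini-batch size from the fold layer to the unfold layer
    layers = connectLayers(layers,'fold/miniBatchSize','unfold/miniBatchSize');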
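
One way to pass the parameters defined in the script to training is through `trainingOptions` with the 'adam' solver. This is only a sketch: the original listing does not show its training call, and `XTrain` and `YTrain` are hypothetical names for the prepared predictors and responses. It requires the Deep Learning Toolbox.

```matlab
% Parameter values copied from the script's "Input parameters" section.
MiniBatchSize = 24;
MaxEpochs     = 60;
learningrate  = 0.005;

% Training options for the Adam optimizer.
options = trainingOptions('adam', ...
    'MaxEpochs', MaxEpochs, ...
    'MiniBatchSize', MiniBatchSize, ...
    'InitialLearnRate', learningrate, ...
    'Verbose', false);

% Hypothetical training call; XTrain and YTrain are not defined in the listing.
% net = trainNetwork(XTrain, YTrain, layers, options);
```

The training call is left commented out because it depends on how the predictors and responses are arranged, which the listing does not show.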

Forecast results



