This post contains my reading notes on the paper [1]. The paper proposes a new LSTM structure, IMV-LSTM, that can be interpreted with the help of mixture attention, which yields both variable importance and temporal importance.
[1] Guo, Tian, Tao Lin, and Nino Antulov-Fantulin. "Exploring Interpretable LSTM Neural Networks over Multi-Variable Data." International Conference on Machine Learning (ICML), 2019.
Background
RNNs trained on multi-variable data capture the nonlinear correlation between the historical values of the target and exogenous variables and the future values of the target. However, current RNNs lack interpretability on multi-variable data because their hidden states are opaque. Existing work on enhancing the interpretability of recurrent neural networks rarely touches the internal structure of RNNs to overcome the opacity of hidden states on multi-variable data. This paper aims at a unified framework for accurate forecasting and importance interpretation.
Proposed Model
The proposed model does two things:
- first, it explores the internal structure of LSTM to enable hidden states to encode individual variables;
- then, it uses mixture attention to summarize these variable-wise hidden states for prediction.
The IMV-LSTM Structure
The idea of IMV-LSTM is to use a hidden state matrix and to develop an associated update scheme such that each element (e.g. row) of the hidden matrix encapsulates information exclusively from one variable of the input.
The hidden state update function is constructed in a similar manner to the regular LSTM.
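The operation that keeps one row per variable can be sketched with NumPy. This is a minimal sketch, not the paper's code, assuming the hidden state matrix has N rows of size d (one per input variable), that each input variable is a scalar, and that every variable n owns its own transition matrix `W_j[n]`, input weights `U_j[n]` and bias `b_j[n]`. The names `W_j`, `U_j`, `b_j` and `tensor_dot_update` are hypothetical. Row n of the candidate update then depends only on row n of the previous hidden matrix and on input variable n, which the perturbation check at the end makes visible.

```python
import numpy as np

def tensor_dot_update(W_j, U_j, b_j, h_prev, x_t):
    """Variable-wise candidate update (illustrative sketch, hypothetical names).

    W_j:    (N, d, d) one hidden-to-hidden matrix per variable
    U_j:    (N, d)    one input weight vector per variable (scalar input per variable)
    b_j:    (N, d)    one bias vector per variable
    h_prev: (N, d)    previous hidden state matrix, row n belongs to variable n
    x_t:    (N,)      current input, one scalar per variable
    """
    # W_j[n] @ h_prev[n] for every variable n, rows are never mixed
    hidden_term = np.einsum("nij,nj->ni", W_j, h_prev)
    # U_j[n] * x_t[n] for every variable n
    input_term = U_j * x_t[:, None]
    return np.tanh(hidden_term + input_term + b_j)

# Hypothetical random parameters, for illustration only
rng = np.random.default_rng(0)
N, d = 3, 4
W_j = 0.1 * rng.normal(size=(N, d, d))
U_j = rng.normal(size=(N, d))
b_j = np.zeros((N, d))
h_prev = rng.normal(size=(N, d))
x_t = rng.normal(size=N)

j_t = tensor_dot_update(W_j, U_j, b_j, h_prev, x_t)

# Perturb only variable 0: only row 0 of the update should change
x_perturbed = x_t.copy()
x_perturbed[0] += 1.0
j_perturbed = tensor_dot_update(W_j, U_j, b_j, h_prev, x_perturbed)
changed_rows = np.where(~np.all(np.isclose(j_t, j_perturbed), axis=1))[0]
print(j_t.shape, changed_rows)
```

A regular LSTM would multiply one dense matrix with the whole hidden vector, so a change in any input would spread to every hidden unit; the per-variable `einsum` is what prevents that.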
The authors propose two sets of update equations. The difference between them is that equation set 1 first transforms the matrices into vectors and then restores them to matrices, whereas equation set 2 extends the regular LSTM with tensor operations and operates on the matrices directly. Both approaches share the same goal: keeping the variables independent during propagation.
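To make the second approach concrete, here is a minimal sketch of one step of a tensorized LSTM cell in which every gate, the cell state and the hidden state are (N, d) matrices (the paper calls this variant IMV-Tensor). It assumes scalar inputs per variable and the same per-variable weight layout as a plain tensor-dot update; the parameter dictionary, its keys and the helper names are hypothetical. Because all gate products are element-wise, changing the entire history of one variable leaves the other rows of the final hidden matrix untouched.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def variable_wise(W, U, b, h_prev, x_t):
    # Row n uses only h_prev[n] and x_t[n]
    return np.einsum("nij,nj->ni", W, h_prev) + U * x_t[:, None] + b

def tensor_lstm_step(params, h_prev, c_prev, x_t):
    """One step of a tensorized LSTM cell (illustrative sketch of equation set 2).

    params: hypothetical dict mapping gate name -> (W, U, b),
            shapes (N, d, d), (N, d), (N, d)
    h_prev, c_prev: (N, d) hidden and cell state matrices
    x_t: (N,) one scalar per input variable
    """
    i_t = sigmoid(variable_wise(*params["i"], h_prev, x_t))  # input gate
    f_t = sigmoid(variable_wise(*params["f"], h_prev, x_t))  # forget gate
    o_t = sigmoid(variable_wise(*params["o"], h_prev, x_t))  # output gate
    j_t = np.tanh(variable_wise(*params["j"], h_prev, x_t))  # candidate update
    c_t = f_t * c_prev + i_t * j_t  # element-wise, so rows never mix
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Hypothetical random parameters, for illustration only
rng = np.random.default_rng(1)
N, d, T = 3, 4, 5
params = {
    gate: (0.1 * rng.normal(size=(N, d, d)), 0.1 * rng.normal(size=(N, d)), np.zeros((N, d)))
    for gate in "ifoj"
}
X = rng.normal(size=(T, N))

def run(X):
    h = np.zeros((N, d))
    c = np.zeros((N, d))
    for x_t in X:
        h, c = tensor_lstm_step(params, h, c, x_t)
    return h

h_T = run(X)
X_mod = X.copy()
X_mod[:, 2] = 0.0  # change the whole history of variable 2 only
h_T_mod = run(X_mod)
print(np.allclose(h_T[:2], h_T_mod[:2]), np.allclose(h_T[2], h_T_mod[2]))
```

Equation set 1 would instead compute the gates from the concatenation of the input and the vectorized hidden matrix with a dense matrix, which is why it needs the flatten-then-restore steps described above.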
Mixture Attention
Mixture attention is used to enable interpretability of the IMV-LSTM model. The mixture attention is formulated as:
The notations are defined by:
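A minimal NumPy sketch of the two attention levels follows. It assumes, as the name "mixture attention" suggests, that the predictive distribution is a mixture over the N variables, $p(y_{T+1}\mid X_T)=\sum_{n=1}^{N}\Pr(z_{T+1}=n\mid X_T)\,p(y_{T+1}\mid z_{T+1}=n, X_T)$, with temporal attention over the rows $h_t^n$ of each variable and variable attention across variables producing the mixture weights. The linear scoring vectors `w_temporal` and `w_variable` are hypothetical stand-ins for whatever small networks compute the attention scores.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_attention(H, w_temporal, w_variable):
    """Two-level attention over variable-wise hidden states (illustrative sketch).

    H:          (T, N, d) hidden state matrices h_1..h_T, row n belongs to variable n
    w_temporal: (N, d)    hypothetical per-variable temporal scoring vectors
    w_variable: (2 * d,)  hypothetical scoring vector for [h_T^n ; g^n]
    Returns temporal attention alpha (T, N), per-variable context g (N, d),
    and variable attention p (N,) used as mixture weights.
    """
    # Temporal attention: for each variable n, a softmax over time steps
    scores_t = np.einsum("tnd,nd->tn", H, w_temporal)
    alpha = softmax(scores_t, axis=0)
    # Per-variable context vector g^n = sum_t alpha_t^n * h_t^n
    g = np.einsum("tn,tnd->nd", alpha, H)
    # Variable attention: softmax over variables of a score on [h_T^n ; g^n]
    features = np.concatenate([H[-1], g], axis=1)  # (N, 2d)
    p = softmax(features @ w_variable, axis=0)
    return alpha, g, p

# Hypothetical random hidden states and scorers, for illustration only
rng = np.random.default_rng(2)
T, N, d = 6, 3, 4
H = rng.normal(size=(T, N, d))
alpha, g, p = mixture_attention(H, rng.normal(size=(N, d)), rng.normal(size=2 * d))
print(alpha.sum(axis=0), p.sum())
```

Each column of `alpha` says which time steps matter for one variable, and `p` says how much each variable contributes to the forecast; these two quantities are what the importance vectors later summarize.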
The loss function is defined by Eq. (9) of the paper. Lemma 3.3 in the paper ensures that, during the EM algorithm, this loss function upper-bounds the negative log-likelihood. Therefore, minimizing Eq. (9) makes it possible to learn the network parameters and the importance vectors simultaneously, without any post-processing of the trained network.
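The claim that an EM-style loss upper-bounds the negative log-likelihood can be checked numerically in a generic setting. This sketch is not Eq. (9) itself (which also involves the importance vectors): it uses a hypothetical three-component Gaussian mixture and compares the exact mixture negative log-likelihood with the expected complete-data negative log-likelihood under a distribution q over components. By Jensen's inequality the latter is never smaller for any q; with q equal to the E-step posterior, the gap is exactly the entropy of q.

```python
import math
import random

def gaussian_log_pdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def mixture_nll(y, weights, mus, sigmas):
    """Exact negative log-likelihood of y under a Gaussian mixture."""
    return -math.log(sum(w * math.exp(gaussian_log_pdf(y, m, s))
                         for w, m, s in zip(weights, mus, sigmas)))

def posterior(y, weights, mus, sigmas):
    """E-step: responsibility of each component for y (Bayes' rule)."""
    joint = [w * math.exp(gaussian_log_pdf(y, m, s)) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

def em_loss(y, q, weights, mus, sigmas):
    """Expected complete-data negative log-likelihood under q."""
    return -sum(qn * (gaussian_log_pdf(y, m, s) + math.log(w))
                for qn, w, m, s in zip(q, weights, mus, sigmas))

# Hypothetical mixture: weights play the role of Pr(z = n | X)
weights = [0.5, 0.3, 0.2]
mus = [0.0, 1.5, -2.0]
sigmas = [1.0, 0.5, 2.0]
y = 0.8

nll = mixture_nll(y, weights, mus, sigmas)
q_post = posterior(y, weights, mus, sigmas)
rng = random.Random(0)
raw = [rng.random() for _ in weights]
total_raw = sum(raw)
q_rand = [r / total_raw for r in raw]  # an arbitrary distribution over components

print(nll, em_loss(y, q_post, weights, mus, sigmas), em_loss(y, q_rand, weights, mus, sigmas))
```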
Interpretation
After training, a simple closed-form solution for the variable importance vector $I$ can be derived, and the temporal importance vector can be derived as well.
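One way a closed form for $I$ can arise is sketched below. Assume (this is my assumption, not a quote of the paper) that the part of the EM objective depending on $I$ is the expected log of a categorical prior, $\sum_{m=1}^{M}\sum_{n=1}^{N} q_m^n \log I_n$ with $\sum_n I_n = 1$, where $q_m^n$ is the E-step posterior of variable n for training sample m. A Lagrange-multiplier argument then gives $I_n = \frac{1}{M}\sum_m q_m^n$, the average posterior. The sketch covers only the variable importance vector, uses hypothetical posteriors, and checks that no random point on the simplex scores higher.

```python
import numpy as np

def variable_importance(Q):
    """Closed-form maximizer of sum_m sum_n Q[m, n] * log(I[n]) s.t. sum(I) = 1.

    Q: (M, N) posteriors q_m^n, each row sums to 1.
    """
    return Q.mean(axis=0)

def objective(Q, I):
    return float(np.sum(Q * np.log(I)))

# Hypothetical posteriors for M training samples and N variables
rng = np.random.default_rng(3)
M, N = 200, 4
logits = rng.normal(size=(M, N)) + np.array([1.0, 0.0, -0.5, 0.5])
Q = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

I = variable_importance(Q)
best = objective(Q, I)

# No other point on the probability simplex should achieve a higher objective
alternatives = rng.dirichlet(np.ones(N), size=1000)
beats_all = all(objective(Q, a) <= best for a in alternatives)
print(I, I.sum(), beats_all)
```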
Prediction
In the prediction phase, the prediction of $y_{T+1}$ is obtained as a weighted sum of the per-variable means $\mu_n$.
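A minimal sketch of this prediction step follows, assuming the weights are the variable attention probabilities $\Pr(z_{T+1}=n\mid X_T)$ and each mixture component is a Gaussian with mean $\mu_n$; all numbers are hypothetical. The weighted sum of means is simply the mean of the mixture, which the Monte Carlo check confirms, and the per-variable terms show how much each variable contributes to the forecast.

```python
import numpy as np

def predict(p, mus):
    """Mixture point forecast: weighted sum of per-variable means."""
    p = np.asarray(p, dtype=float)
    mus = np.asarray(mus, dtype=float)
    return float(p @ mus), p * mus  # forecast and per-variable contributions

p = np.array([0.6, 0.3, 0.1])       # hypothetical variable attention Pr(z_{T+1}=n | X_T)
mus = np.array([2.0, -1.0, 5.0])    # hypothetical per-variable means mu_n
sigmas = np.array([0.5, 1.0, 2.0])  # hypothetical per-variable standard deviations

y_hat, contributions = predict(p, mus)

# Sanity check: the weighted sum equals the mean of the mixture distribution
rng = np.random.default_rng(4)
z = rng.choice(len(p), size=200_000, p=p)
samples = rng.normal(mus[z], sigmas[z])
print(y_hat, contributions, samples.mean())
```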