Exploring Interpretable LSTM Neural Networks over MultiVariable Data

This post contains my reading notes on the paper [1]. The paper proposes a new LSTM structure that can be interpreted with the help of mixture attention, which provides both variable importance and temporal importance.

Background

RNNs trained over multi-variable data capture the nonlinear correlation between historical values of the target and exogenous variables and future target values. However, current RNNs fall short on interpretability for multi-variable data because their hidden states are opaque: each hidden unit mixes information from all input variables. Existing work on the interpretability of recurrent neural networks rarely touches the internal structure of RNNs to overcome this opacity on multi-variable data. This paper aims for a unified framework that delivers both accurate forecasting and importance interpretation.

Proposed Model

The model does two things:

First, it explores the internal structure of the LSTM so that hidden states encode individual variables.
Then, mixture attention is designed to summarize these variable-wise hidden states for prediction.

Let's first define some mathematical symbols.

The IMV-LSTM Structure

The idea of IMV-LSTM is to use a hidden state matrix, together with an associated update scheme, such that each element (e.g. row) of the hidden matrix encapsulates information exclusively from one variable of the input.
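To make the row-per-variable idea concrete, here is a minimal NumPy sketch of a variable-wise candidate update. It is not the authors' code: the function name `variable_wise_update`, the weight shapes, and the assumption that each variable contributes one scalar per time step are all illustrative choices. The point is only that row `n` of the output is computed from row `n` of the previous hidden matrix and from input variable `n` alone.

```python
import numpy as np

def variable_wise_update(W, U, b, h_prev, x_t):
    """Candidate update for a hidden-state matrix with one row per variable.

    W:      (N, d, d) per-variable hidden-to-hidden weights (hypothetical layout)
    U:      (N, d)    per-variable input-to-hidden weights, scalar input assumed
    b:      (N, d)    per-variable bias
    h_prev: (N, d)    previous hidden-state matrix
    x_t:    (N,)      current input, one scalar per variable
    """
    # Row n only ever sees W[n], h_prev[n], U[n] and x_t[n].
    recurrent = np.einsum("nij,nj->ni", W, h_prev)
    from_input = U * x_t[:, None]
    return np.tanh(recurrent + from_input + b)

rng = np.random.default_rng(0)
N, d = 3, 4
W = rng.normal(size=(N, d, d))
U = rng.normal(size=(N, d))
b = rng.normal(size=(N, d))
h_prev = rng.normal(size=(N, d))
x_t = rng.normal(size=N)

j_t = variable_wise_update(W, U, b, h_prev, x_t)

# Perturb only variable 1 and recompute the update.
x_perturbed = x_t.copy()
x_perturbed[1] += 5.0
j_perturbed = variable_wise_update(W, U, b, h_prev, x_perturbed)

# Indices of the rows that differ between the two results.
changed_rows = np.flatnonzero(np.any(j_t != j_perturbed, axis=1))
print(changed_rows)
```

Only the row belonging to the perturbed variable should change, which is the property that later lets each row be read as the encoding of a single variable.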

The hidden state update function is constructed in a similar manner to the regular LSTM

The authors propose two approaches

The difference between the two approaches is that equation set 1 first flattens the matrices into vectors and then restores them to matrices, whereas equation set 2 extends the regular LSTM with tensor operations and works on the matrices directly. The goal of both approaches is the same: keep the hidden state organized per variable during propagation.
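The contrast between the two equation sets can be sketched as follows. This is a rough illustration under assumptions, not the paper's exact equations: the names `full_style_step` and `tensor_style_step`, the weight layouts, and the scalar-per-variable input are hypothetical, and the candidate update `j_t` is passed in as a ready-made matrix to keep the example short.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def full_style_step(W_g, b_g, j_t, c_prev, h_prev, x_t):
    """Gates on flattened vectors, then reshape back to a matrix (set 1 style)."""
    N, d = h_prev.shape
    z = np.concatenate([x_t, h_prev.reshape(-1)])   # input joined with vec(h_prev)
    i, f, o = np.split(sigmoid(W_g @ z + b_g), 3)   # three gate vectors of size N*d
    c = f * c_prev + i * j_t.reshape(-1)            # cell state stays a vector
    h = (o * np.tanh(c)).reshape(N, d)              # restore the matrix form
    return h, c

def tensor_style_step(W_g, U_g, b_g, j_t, c_prev, h_prev, x_t):
    """Gates computed per variable, directly on matrices (set 2 style)."""
    pre = np.einsum("gnij,nj->gni", W_g, h_prev) + U_g * x_t[None, :, None] + b_g
    i, f, o = sigmoid(pre)                          # each gate has shape (N, d)
    c = f * c_prev + i * j_t                        # cell state stays a matrix
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
N, d = 3, 4
D = N * d
x_t = rng.normal(size=N)
h_prev = rng.normal(size=(N, d))
j_t = np.tanh(rng.normal(size=(N, d)))   # stand-in for the candidate update

h_full, c_full = full_style_step(
    rng.normal(size=(3 * D, N + D)), rng.normal(size=3 * D),
    j_t, rng.normal(size=D), h_prev, x_t)

h_tensor, c_tensor = tensor_style_step(
    rng.normal(size=(3, N, d, d)), rng.normal(size=(3, N, d)),
    rng.normal(size=(3, N, d)), j_t, rng.normal(size=(N, d)), h_prev, x_t)

print(h_full.shape, c_full.shape, h_tensor.shape, c_tensor.shape)
```

In this sketch the vector-based variant computes its gates from all variables at once and the gates only rescale the variable-wise quantities elementwise, while the tensor-based variant never mixes rows at all. Both return a hidden state with one row per variable.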

Mixture Attention

Mixture attention is what makes the IMV-LSTM model interpretable. The mixture attention is formulated as

The notations are defined by

The loss function is defined by

Lemma 3.3 of the paper ensures that, during the EM algorithm, this loss function upper-bounds the negative log-likelihood.

Therefore, minimizing the loss, Eq. (9) in the paper, learns the network parameters and the importance vectors simultaneously, with no need for post-processing of the trained network.
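The upper-bound property can be checked numerically on a toy mixture. The sketch below is a generic illustration, not the paper's Eq. (9): it assumes Gaussian components with a shared standard deviation and softmax mixture weights, and its `surrogate` function contains only the expected component and weight log-terms, omitting any further terms the paper's loss may include. All numbers are random placeholders.

```python
import numpy as np

def gaussian_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

rng = np.random.default_rng(1)
N = 4                                    # number of input variables
y = 0.3                                  # observed target value (placeholder)
means = rng.normal(size=N)               # one predictive mean per variable
std = 1.0                                # shared standard deviation (assumption)
weights = softmax(rng.normal(size=N))    # mixture weights, one per variable

# Negative log-likelihood of the mixture density at y.
component = gaussian_pdf(y, means, std)
nll = -np.log(np.sum(component * weights))

def surrogate(q):
    """Expected negative log of component density times weight under q."""
    return -np.sum(q * (np.log(component) + np.log(weights)))

# q can be the posterior over the latent variable (the E-step choice) ...
q_post = component * weights / np.sum(component * weights)
# ... or any other strictly positive distribution over the N variables.
q_any = softmax(rng.normal(size=N))

print(nll, surrogate(q_post), surrogate(q_any))
```

By Jensen's inequality and the non-negativity of the entropy of a discrete distribution, both surrogate values are at least as large as `nll`, so pushing the surrogate down also pushes the negative log-likelihood down.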

Interpretation

After training, a simple closed-form solution of the variable importance vector $I$ can be derived

where

And the temporal importance vector can also be derived

where
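The closed-form expressions themselves are not reproduced in these notes. As a minimal sketch, assume (as an assumption to verify against the paper) that the variable importance is the average over training samples of the per-sample posterior over variables, and that the temporal importance of a variable is the average of its per-sample temporal attention weights. The function names, array layouts and random inputs below are hypothetical.

```python
import numpy as np

def variable_importance(posteriors):
    """Average per-sample posteriors over variables.

    posteriors: (M, N), each row sums to 1. Returns a vector of shape (N,).
    """
    return posteriors.mean(axis=0)

def temporal_importance(attention):
    """Average per-sample temporal attention weights for each variable.

    attention: (M, N, T), each attention[m, n, :] sums to 1. Returns (N, T).
    """
    return attention.mean(axis=0)

rng = np.random.default_rng(2)
M, N, T = 100, 3, 5   # samples, variables, time steps (placeholders)

# Random stand-ins for quantities a trained model would produce.
posteriors = rng.dirichlet(np.ones(N), size=M)        # shape (M, N)
attention = rng.dirichlet(np.ones(T), size=(M, N))    # shape (M, N, T)

I = variable_importance(posteriors)
T_imp = temporal_importance(attention)

print(I, I.sum())
print(T_imp.sum(axis=1))
```

Because an average of probability vectors is again a probability vector, both importance measures stay normalized and can be compared directly across variables or time steps.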

Prediction

In the prediction phase, the prediction of $y_{T+1}$ is obtained as the weighted sum of means:

where $\mu_n$ is the predictive mean associated with variable $n$
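The weighted sum of means can be sketched in a few lines. This is an illustration under assumptions: the weights are taken to be a softmax over per-variable scores, and the names `predict`, `means` and `logits` are hypothetical stand-ins for what a trained model would output.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def predict(means, logits):
    """Weighted sum of per-variable means, with softmax-normalized weights."""
    weights = softmax(np.asarray(logits, dtype=float))
    return float(np.dot(weights, np.asarray(means, dtype=float)))

# With equal scores every variable gets the same weight,
# so the prediction reduces to the plain average of the means.
y_hat_uniform = predict(means=[1.0, 2.0, 3.0], logits=[0.0, 0.0, 0.0])

# A much larger score for the last variable pulls the prediction toward its mean.
y_hat_skewed = predict(means=[1.0, 2.0, 3.0], logits=[0.0, 0.0, 10.0])

print(y_hat_uniform, y_hat_skewed)
```

Since the weights are non-negative and sum to one, the prediction always lies between the smallest and the largest per-variable mean.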

[1] Guo, Tian, Tao Lin, and Nino Antulov-Fantulin. "Exploring Interpretable LSTM Neural Networks over MultiVariable Data." International Conference on Machine Learning (ICML), 2019.
