This article explains how to visualize an attention LSTM built with the keras-self-attention package. The question and recommended answer below should be a useful reference for anyone facing the same problem.

Problem description

I'm using (keras-self-attention) to implement an attention LSTM in Keras. How can I visualize the attention part after training the model? This is a time series forecasting case.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Flatten
from keras_self_attention import SeqSelfAttention  # SeqSelfAttention is the layer used below

model = Sequential()
# TrainD[0]: training inputs of shape (samples, timesteps, features)
model.add(LSTM(units=200, activation='tanh', return_sequences=True,
               input_shape=(TrainD[0].shape[1], TrainD[0].shape[2])))
model.add(SeqSelfAttention())
model.add(Flatten())
model.add(Dense(1, activation='relu'))

model.compile(optimizer='adam', loss='mse')

Recommended answer

One approach is to fetch the outputs of SeqSelfAttention for a given input and organize them so as to display predictions per channel (see below). For something more advanced, have a look at the iNNvestigate library (usage examples included).

Update: I can also recommend See RNN, a package I wrote.

Explanation: show_features_1D fetches the outputs of the layer matching layer_name (which can be a substring) and shows per-channel predictions (labeled), with timesteps along the x-axis and output values along the y-axis (an illustrative call follows the parameter list below).

  • input_data = single batch of data of shape (1, input_shape)
  • prefetched_outputs = already-acquired layer outputs; overrides input_data
  • max_timesteps = max # of timesteps to show
  • max_col_subplots = max # of subplots along the horizontal
  • equate_axes = force all x- and y-axes to be equal (recommended for fair comparison)
  • show_y_zero = whether to show y=0 as a red line
  • channel_axis = layer features dimension (e.g. units for LSTM, which is last)
  • scale_width, scale_height = scale displayed image width & height
  • dpi = image quality (dots per inch)
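
For instance, the arguments above map onto a call like the one below. This call is illustrative only (not part of the original answer); it uses the model, X, and show_features_1D defined in the code further down, with arbitrary values for the display knobs:

# Illustrative call exercising the parameters described above; model, X and
# show_features_1D come from the code blocks below, and the values are examples.
show_features_1D(model, layer_name='lstm', input_data=X[0:1],
                 max_timesteps=60, max_col_subplots=6,
                 equate_axes=True, show_y_zero=True,
                 scale_width=1.2, scale_height=1.0, dpi=96)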

Explanation of the visuals (below):

  • The first is useful for seeing the shapes of the extracted features, regardless of magnitude - giving information about e.g. frequency contents.
  • The second is useful for seeing feature relationships - e.g. relative magnitudes, biases, and frequencies. The result below stands in stark contrast with the image above it: running print(outs_1) reveals that all magnitudes are very small and don't vary much, so including the y=0 point and equating axes yields a line-like visual, which can be interpreted as self-attention being bias-oriented.
  • The third is useful for visualizing features too numerous to be visualized as above; defining the model with batch_shape instead of input_shape removes all ? from the printed shapes, and we can see that the first output's shape is (10, 60, 240), the second's (10, 240, 240). In other words, the first output returns LSTM channel attention, and the second a "timesteps attention". The heatmap result below can be interpreted as showing attention "cooling down" w.r.t. timesteps.

SeqWeightedAttention is a lot easier to visualize, but there isn't much to visualize; you'll need to get rid of the Flatten above to make it work. The attention's output shapes then become (10, 60) and (10, 240) - for which you can use a simple histogram, plt.hist (just make sure you exclude the batch dimension - i.e. feed (60,) or (240,)).

from keras.layers import Input, Dense, LSTM, Flatten, concatenate
from keras.models import Model
from keras.optimizers import Adam
from keras import backend as K               # needed by get_layer_outputs below
from keras_self_attention import SeqSelfAttention
import numpy as np
import matplotlib.pyplot as plt              # needed by the plotting helpers below

ipt   = Input(shape=(240,4))
x     = LSTM(60, activation='tanh', return_sequences=True)(ipt)
x     = SeqSelfAttention(return_attention=True)(x)
x     = concatenate(x)
x     = Flatten()(x)
out   = Dense(1, activation='sigmoid')(x)
model = Model(ipt,out)
model.compile(Adam(lr=1e-2), loss='binary_crossentropy')

X = np.random.rand(10,240,4) # dummy data
Y = np.random.randint(0,2,(10,1)) # dummy labels
model.train_on_batch(X, Y)

outs = get_layer_outputs(model, 'seq', X[0:1], 1)
outs_1 = outs[0]
outs_2 = outs[1]

show_features_1D(model,'lstm',X[0:1],max_timesteps=100,equate_axes=False,show_y_zero=False)
show_features_1D(model,'lstm',X[0:1],max_timesteps=100,equate_axes=True, show_y_zero=True)
show_features_2D(outs_2[0])  # [0] for 2D since 'outs_2' is 3D

def show_features_1D(model=None, layer_name=None, input_data=None,
                     prefetched_outputs=None, max_timesteps=100,
                     max_col_subplots=10, equate_axes=False,
                     show_y_zero=True, channel_axis=-1,
                     scale_width=1, scale_height=1, dpi=76):
    if prefetched_outputs is None:
        layer_outputs = get_layer_outputs(model, layer_name, input_data, 1)[0]
    else:
        layer_outputs = prefetched_outputs
    n_features    = layer_outputs.shape[channel_axis]

    for _int in range(1, max_col_subplots+1):
      if (n_features/_int).is_integer():
        n_cols = int(n_features/_int)
    n_rows = int(n_features/n_cols)

    fig, axes = plt.subplots(n_rows,n_cols,sharey=equate_axes,dpi=dpi)
    fig.set_size_inches(24*scale_width,16*scale_height)

    subplot_idx = 0
    for row_idx in range(axes.shape[0]):
      for col_idx in range(axes.shape[1]):
        subplot_idx += 1
        feature_output = layer_outputs[:,subplot_idx-1]
        feature_output = feature_output[:max_timesteps]
        ax = axes[row_idx,col_idx]

        if show_y_zero:
            ax.axhline(0,color='red')
        ax.plot(feature_output)

        ax.axis(xmin=0,xmax=len(feature_output))
        ax.axis('off')

        ax.annotate(str(subplot_idx),xy=(0,.99),xycoords='axes fraction',
                    weight='bold',fontsize=14,color='g')
    if equate_axes:
        y_new = []
        for row_axis in axes:
            y_new += [np.max(np.abs([col_axis.get_ylim() for
                                     col_axis in row_axis]))]
        y_new = np.max(y_new)
        for row_axis in axes:
            [col_axis.set_ylim(-y_new,y_new) for col_axis in row_axis]
    plt.show()
def show_features_2D(data, cmap='bwr', norm=None,
                     scale_width=1, scale_height=1):
    if norm is not None:
        vmin, vmax = norm
    else:
        vmin, vmax = None, None  # scale automatically per min-max of 'data'

    plt.imshow(data, cmap=cmap, vmin=vmin, vmax=vmax)
    plt.xlabel('Timesteps', weight='bold', fontsize=14)
    plt.ylabel('Attention features', weight='bold', fontsize=14)
    plt.colorbar(fraction=0.046, pad=0.04)  # works for any size plot

    plt.gcf().set_size_inches(8*scale_width, 8*scale_height)
    plt.show()
def get_layer_outputs(model, layer_name, input_data, learning_phase=1):
    outputs   = [layer.output for layer in model.layers if layer_name in layer.name]
    layers_fn = K.function([model.input, K.learning_phase()], outputs)
    return layers_fn([input_data, learning_phase])
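
Note: get_layer_outputs relies on K.function and K.learning_phase(), i.e. the TF1-style standalone Keras backend this answer was written against. If your model is instead built with tf.keras under TensorFlow 2 (an assumption, not covered by the original answer), a rough equivalent is to build a sub-model over the same outputs; a minimal sketch:

# Sketch of a TF2/tf.keras equivalent of get_layer_outputs (assumes the model
# is built with tensorflow.keras layers rather than standalone keras).
import numpy as np
from tensorflow.keras.models import Model

def get_layer_outputs_tf2(model, layer_name, input_data, learning_phase=1):
    # Collect the output tensor(s) of every layer whose name contains layer_name;
    # a layer may expose one tensor or a list (e.g. return_attention=True).
    outputs = []
    for layer in model.layers:
        if layer_name in layer.name:
            out = layer.output
            outputs += list(out) if isinstance(out, (list, tuple)) else [out]
    extractor = Model(model.input, outputs)
    # training=True plays the role of learning_phase=1 (keeps dropout etc. active)
    results = extractor(input_data, training=bool(learning_phase))
    if not isinstance(results, (list, tuple)):
        results = [results]
    return [np.asarray(r) for r in results]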

SeqWeightedAttention example, per request:

from keras_self_attention import SeqWeightedAttention  # not imported above

ipt   = Input(batch_shape=(10,240,4))
x     = LSTM(60, activation='tanh', return_sequences=True)(ipt)
x     = SeqWeightedAttention(return_attention=True)(x)
x     = concatenate(x)
out   = Dense(1, activation='sigmoid')(x)
model = Model(ipt,out)
model.compile(Adam(lr=1e-2), loss='binary_crossentropy')

X = np.random.rand(10,240,4) # dummy data
Y = np.random.randint(0,2,(10,1)) # dummy labels
model.train_on_batch(X, Y)

outs = get_layer_outputs(model, 'seq', X, 1)
outs_1 = outs[0][0] # additional index since using batch_shape
outs_2 = outs[1][0]

plt.hist(outs_1, bins=500); plt.show()
plt.hist(outs_2, bins=500); plt.show()
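
Since the second output here is one attention weight per timestep, you could also overlay those weights on the input series itself, which is often handy in a forecasting case. A sketch reusing X and outs_2 from the block above (not part of the original answer; the channel choice is arbitrary):

# Sketch: overlay SeqWeightedAttention's per-timestep weights on one input channel.
import matplotlib.pyplot as plt

channel = 0                                   # illustrative input channel
fig, ax_in = plt.subplots(figsize=(10, 4))
ax_in.plot(X[0, :, channel], color='gray')    # one input sequence, one channel
ax_in.set_xlabel('Timesteps')
ax_in.set_ylabel('Input value')

ax_att = ax_in.twinx()                        # second y-axis for the attention weights
ax_att.plot(outs_2, color='red', alpha=0.7)   # per-timestep attention weights
ax_att.set_ylabel('Attention weight')
plt.show()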

This concludes the article on how to visualize an attention LSTM using the keras-self-attention package. We hope the recommended answer helps, and thank you for your support!
