How to combine two LSTM layers with different input sizes in Keras?

Problem Description

I have two types of input sequences, where input1 contains 50 values and input2 contains 25 values. I tried to combine these two sequence types using an LSTM model in the functional API. However, since the lengths of my two input sequences are different, I am wondering whether what I am currently doing is the right way. My code is as follows:

from tensorflow.keras.layers import Input, LSTM, Dense, concatenate
from tensorflow.keras.models import Model

input1 = Input(shape=(50,1))
x1 = LSTM(100)(input1)       # (None, 100): only the last hidden state
input2 = Input(shape=(25,1))
x2 = LSTM(50)(input2)        # (None, 50)

x = concatenate([x1, x2])    # (None, 150)
x = Dense(200)(x)
output = Dense(1, activation='sigmoid')(x)

model = Model(inputs=[input1, input2], outputs=output)

More specifically, I want to know how to combine two LSTM layers that have different input lengths (i.e. 50 and 25 in my case). I am happy to provide more details if needed.

Recommended Answer

Actually, your problem is quite normal in tasks like NLP where sequences have different lengths. In your code you discard all of the previous outputs by using return_sequences=False (the default), which is not common practice and normally results in a low-performance model.
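To make the difference concrete, here is a minimal shape comparison (the shapes in the comments follow from the Input definition; this snippet is illustrative, not part of the original answer):

from tensorflow.keras.layers import Input, LSTM

inp = Input(shape=(50, 1))
last_state = LSTM(100)(inp)                         # (None, 100): only the final hidden state
all_states = LSTM(100, return_sequences=True)(inp)  # (None, 50, 100): one hidden state per timestep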

Note: there is no ultimate solution in neural network architecture design.

Here are my suggestions.

Approach 1 (no custom layer required)

You can use the same latent dimension in both LSTMs, stack (concatenate) their outputs along the timestep dimension, and treat the result as one big hidden tensor.

input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)   # (None, 50, 100)
input2 = Input(shape=(25,1))
x2 = LSTM(100, return_sequences=True)(input2)   # (None, 25, 100)
x = concatenate([x1, x2], axis=1)

# output dimension = (None, 75, 100)
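One way to turn this concatenated tensor into a prediction is sketched below. The pooling layer and the Dense sizes are my own assumptions, not part of the original answer; any reduction over the timestep axis (pooling, flattening, another LSTM) would work.

from tensorflow.keras.layers import Input, LSTM, Dense, GlobalAveragePooling1D, concatenate
from tensorflow.keras.models import Model

input1 = Input(shape=(50, 1))
x1 = LSTM(100, return_sequences=True)(input1)   # (None, 50, 100)
input2 = Input(shape=(25, 1))
x2 = LSTM(100, return_sequences=True)(input2)   # (None, 25, 100)

x = concatenate([x1, x2], axis=1)               # (None, 75, 100)
x = GlobalAveragePooling1D()(x)                 # (None, 100): average over the 75 timesteps
x = Dense(200, activation='relu')(x)
output = Dense(1, activation='sigmoid')(x)

model = Model(inputs=[input1, input2], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')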

If you do not want the same latent dimension, another common option is to add one more part, usually called a mapping layer, which consists of a stack of Dense layers that projects both outputs to a common size. This approach has more parameters, which means the model is harder to train.

input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)   # (None, 50, 100)
input2 = Input(shape=(25,1))
x2 = LSTM(50, return_sequences=True)(input2)    # (None, 25, 50)

# normally we have more than 1 hidden layer
Map_x1 = Dense(75)(x1)                          # (None, 50, 75)
Map_x2 = Dense(75)(x2)                          # (None, 25, 75)
x = concatenate([Map_x1, Map_x2], axis=1)

# output dimension = (None, 75, 75)

Or flatten both outputs:

input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)   # (None, 50, 100)
input2 = Input(shape=(25,1))
x2 = LSTM(50, return_sequences=True)(input2)    # (None, 25, 50)

# normally we have more than 1 hidden layer
flat_x1 = Flatten()(x1)                         # (None, 5000)
flat_x2 = Flatten()(x2)                         # (None, 1250)
x = concatenate([flat_x1, flat_x2], axis=1)

# output dimension = (None, 6250)

Approach 2 (custom layer required)

Create your own custom layer that uses an attention mechanism to produce an attention vector, and use that attention vector as the representation of your LSTM output tensor. What others do, and what achieves better performance, is to use the last hidden state of the LSTM (the only output you currently use in your model) together with the attention vector as the representation.

Note: according to research, different types of attention give almost the same performance, so I recommend Scaled Dot-Product Attention because it is faster to compute.

input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)   # (None, 50, 100)
input2 = Input(shape=(25,1))
x2 = LSTM(50, return_sequences=True)(input2)    # (None, 25, 50)

# custom_layer() is your attention layer that pools each sequence into a single vector
rep_x1 = custom_layer()(x1)
rep_x2 = custom_layer()(x2)
x = concatenate([rep_x1, rep_x2], axis=1)

# output dimension = (None, length of rep_x1 + length of rep_x2)
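As a rough illustration of what such a custom layer could look like, here is a minimal sketch of a scaled dot-product attention pooling layer. The class name AttentionPooling, the choice of the mean over timesteps as the query, and all shapes in the comments are my own assumptions, not from the original answer.

import tensorflow as tf
from tensorflow.keras.layers import Layer

class AttentionPooling(Layer):
    """Pools an LSTM output of shape (batch, timesteps, units) into a single
    vector of shape (batch, units) using scaled dot-product attention."""

    def call(self, inputs):
        # query: (batch, 1, units), built from the mean of all timesteps (an assumption)
        query = tf.reduce_mean(inputs, axis=1, keepdims=True)
        d_k = tf.cast(tf.shape(inputs)[-1], inputs.dtype)
        # scaled dot-product scores over timesteps: (batch, 1, timesteps)
        scores = tf.matmul(query, inputs, transpose_b=True) / tf.sqrt(d_k)
        weights = tf.nn.softmax(scores, axis=-1)
        # weighted sum of timesteps: (batch, 1, units) -> (batch, units)
        context = tf.matmul(weights, inputs)
        return tf.squeeze(context, axis=1)

# usage (hypothetical), in place of custom_layer() above:
# rep_x1 = AttentionPooling()(x1)   # (None, 100)
# rep_x2 = AttentionPooling()(x2)   # (None, 50)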
