This article shows how to estimate high-resolution images from lower-resolution ones with a ConvLSTM2D-based Keras model. It should be a useful reference for anyone tackling the same problem.

Problem Description

I'm trying to use the following ConvLSTM2D architecture to estimate high-resolution image sequences from low-resolution ones:

import numpy as np, scipy.ndimage, matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, ConvLSTM2D, MaxPooling2D, UpSampling2D
from sklearn.metrics import accuracy_score, confusion_matrix, cohen_kappa_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler
np.random.seed(123)

raw = np.arange(96).reshape(8,3,4)
data1 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=1, mode='nearest') #low res
print (data1.shape)
#(8, 300, 400)

data2 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=3, mode='nearest') #high res
print (data2.shape)
#(8, 300, 400)

X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)
#(samples,time, rows, cols, channels)

model = Sequential()
input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1)
#samples, time, rows, cols, channels
model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))

print (model.summary())

model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(X_train, Y_train,
          batch_size=1, epochs=10, verbose=1)

x,y = model.evaluate(X_train, Y_train, verbose=0)
print (x,y)
 

This declaration results in the following ValueError:

How can I correct this ValueError? I think the problem is with the input shapes, but I could not figure out what exactly is wrong.
Note that the output should also be a sequence of images, not a classification result.

Solution

This is happening because LSTMs require temporal data, but your first ConvLSTM2D was declared as a many-to-one layer, which outputs a tensor of shape (batch_size, 300, 400, 16), that is, batches of images:

model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
 

You want the output to be a tensor of shape (batch_size, 8, 300, 400, 16) (i.e. a sequence of images), so it can be consumed by the second LSTM. The way to fix this is to add return_sequences=True to the first LSTM definition:

model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                     return_sequences=True))
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
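
To see the difference concretely, here is a minimal sketch (assuming the same input_shape as in the question) that compares the output shapes of both variants:

from keras.models import Sequential
from keras.layers import ConvLSTM2D

input_shape = (8, 300, 400, 1)  # (time, rows, cols, channels), as in the question

many_to_one = Sequential()
many_to_one.add(ConvLSTM2D(16, kernel_size=(3,3), padding='same',
                           input_shape=input_shape))
print(many_to_one.output_shape)  # (None, 300, 400, 16): the time axis is gone

seq_to_seq = Sequential()
seq_to_seq.add(ConvLSTM2D(16, kernel_size=(3,3), padding='same',
                          input_shape=input_shape, return_sequences=True))
print(seq_to_seq.output_shape)   # (None, 8, 300, 400, 16): the time axis is kept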
 

You mentioned classification. If what you intend is to classify entire sequences, then you need a classifier at the end:

from keras.layers import GlobalAveragePooling2D  # not among the question's imports

model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                     return_sequences=True))
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
model.add(GlobalAveragePooling2D())
model.add(Dense(10, activation='softmax'))  # output shape: (None, 10)
 

But if you are trying to classify each image within the sequence, then you can simply reapply the classifier to every time step using TimeDistributed:

from keras.models import Model
from keras.layers import Input, TimeDistributed  # also missing from the question's imports

x = Input(shape=(300, 400, 8))
y = GlobalAveragePooling2D()(x)
y = Dense(10, activation='softmax')(y)
classifier = Model(inputs=x, outputs=y)

x = Input(shape=(data1.shape[0], data1.shape[1], data1.shape[2], 1))
y = ConvLSTM2D(16, kernel_size=(3, 3),
               activation='sigmoid',
               padding='same',
               return_sequences=True)(x)
y = ConvLSTM2D(8, kernel_size=(3, 3),
               activation='sigmoid',
               padding='same',
               return_sequences=True)(y)
y = TimeDistributed(classifier)(y)  # output shape: (None, 8, 10)

model = Model(inputs=x, outputs=y)
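
As a quick check (a hypothetical usage sketch, fed with a dummy batch shaped like the question's data), the per-frame classifier yields one prediction per time step:

dummy = np.zeros((1, 8, 300, 400, 1))  # (batch, time, rows, cols, channels)
preds = model.predict(dummy)
print(preds.shape)  # (1, 8, 10): one 10-way softmax per frame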
 

Finally, take a look at the examples in the keras repository. There's one for a generative model using ConvLSTM2D.


Edit: to estimate data2 from data1...

If I got it right this time, X_train should be 1 sample of a stack of 8 images of shape (300, 400, 1), not 8 samples of a stack of 1 image of shape (300, 400, 1).
If that's the case, then:

X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)
 

should be updated to:

X_train = data1.reshape(1, data1.shape[0], data1.shape[1], data1.shape[2], 1)
Y_train = data2.reshape(1, data2.shape[0], data2.shape[1], data2.shape[2], 1)
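
A quick sanity check of the resulting shapes (assuming data1 and data2 from the question):

print(X_train.shape)  # (1, 8, 300, 400, 1): 1 sample, 8 time steps, 1 channel
print(Y_train.shape)  # (1, 8, 300, 400, 1)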
 

Also, accuracy doesn't usually make sense when your loss is mse. You can use other metrics such as mae.
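
For example, a minimal sketch of the compile call with mae as a metric:

model.compile(loss='mse', optimizer='adam', metrics=['mae'])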

Now you just need to update your model to return sequences and to have a single unit in the last layer (because the images you are trying to estimate have a single channel):

model = Sequential()
input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1)
model.add(ConvLSTM2D(16, kernel_size=(3, 3), activation='sigmoid', padding='same',
                     input_shape=input_shape,
                     return_sequences=True))
model.add(ConvLSTM2D(1, kernel_size=(3, 3), activation='sigmoid', padding='same',
                     return_sequences=True))

model.compile(loss='mse', optimizer='adam')
 

After that, model.fit(X_train, Y_train, ...) will start training normally:

Using TensorFlow backend.
(8, 300, 400)
(8, 300, 400)
Epoch 1/10

1/1 [==============================] - 5s 5s/step - loss: 2993.8701
Epoch 2/10

1/1 [==============================] - 5s 5s/step - loss: 2992.4492
Epoch 3/10

1/1 [==============================] - 5s 5s/step - loss: 2991.4536
Epoch 4/10

1/1 [==============================] - 5s 5s/step - loss: 2989.8523
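
One caveat about the log above: the loss stays large because the final sigmoid constrains predictions to [0, 1], while the target values run up to roughly 95. A hypothetical preprocessing sketch, reusing the MinMaxScaler already imported in the question, brings the targets into that range:

scaler = MinMaxScaler()  # from sklearn.preprocessing, imported in the question
Y_train_scaled = scaler.fit_transform(
    Y_train.reshape(-1, 1)        # flatten to (n_values, 1) for the scaler
).reshape(Y_train.shape)          # restore (1, 8, 300, 400, 1)
model.fit(X_train, Y_train_scaled, batch_size=1, epochs=10, verbose=1)  # X_train can be scaled the same way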
 

That concludes this article on estimating high-resolution images from lower-resolution ones with a ConvLSTM2D-based Keras model. We hope the answer above helps.
