Converting a TensorFlow 2.0 BatchDataset to a NumPy array

Question

I have this code:

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)

print(train_dataset, type(train_dataset), test_dataset, type(test_dataset))

I want to convert these two BatchDataset variables to NumPy arrays. Is there an easy way to do it? I am using TF 2.0, but I have only found code for converting tf.data with TF 1.0.

Recommended answer

After a dataset is batched, the last batch may have a different shape from the other batches. For example, if the dataset has 100 elements in total and you batch with a size of 6, the last batch will contain only 4 elements (100 = 6 * 16 + 4).
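The arithmetic in that example can be checked with a few lines of plain Python. This is only a sketch of the counting; the names `num_elements`, `batch_size`, and `batch_sizes` are made up for illustration and nothing here touches TensorFlow.

```python
# Batch-size arithmetic for the example above: 100 elements, batch size 6.
num_elements = 100
batch_size = 6

# divmod gives the number of full batches and the size of the leftover batch.
full_batches, remainder = divmod(num_elements, batch_size)

# Sizes of all batches, including the smaller final one (if any).
batch_sizes = [batch_size] * full_batches + ([remainder] if remainder else [])

print(full_batches, remainder)             # 16 4
print(len(batch_sizes), batch_sizes[-1])   # 17 4
```

So batching yields 17 batches in total, and only the 17th is short.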

In such cases you cannot convert the dataset to a single NumPy array directly, because the batches do not all have the same shape. For that reason, set the drop_remainder parameter of the batch method to True. It drops the last batch if it contains fewer elements than the batch size.
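To see why the uneven last batch matters, here is a small sketch that uses plain NumPy slices as stand-ins for the batches a tf.data pipeline would yield (an assumption made to keep the example small; it does not call TensorFlow). Stacking batches of unequal length fails, while stacking only the full batches, which is the effect drop_remainder=True has, succeeds.

```python
import numpy as np

data = np.arange(100)
batch_size = 6

# Plain NumPy slices standing in for the batches of a batched dataset.
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

# np.stack needs every array to have the same shape, so the short
# last batch (4 elements) makes it raise a ValueError.
try:
    np.stack(batches)
    stack_failed = False
except ValueError:
    stack_failed = True

# Emulate drop_remainder=True by keeping only the full-size batches.
full_batches = [b for b in batches if len(b) == batch_size]
stacked = np.stack(full_batches)

print(stack_failed, stacked.shape)  # True (16, 6)
```

Note that the last 4 elements are discarded in the stacked result.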

With that in place, the following code converts the datasets to NumPy arrays.

import tensorflow as tf
import numpy as np

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

TRAIN_BUF = 1000
BATCH_SIZE = 64

# drop_remainder=True discards the final, smaller batch so that every batch has the same shape.
train_dataset = (
    tf.data.Dataset.from_tensor_slices(train_images)
    .shuffle(TRAIN_BUF)
    .batch(BATCH_SIZE, drop_remainder=True)
)
test_dataset = (
    tf.data.Dataset.from_tensor_slices(test_images)
    .shuffle(TRAIN_BUF)
    .batch(BATCH_SIZE, drop_remainder=True)
)

# print(train_dataset, type(train_dataset), test_dataset, type(test_dataset))

# Iterating a dataset yields one tensor per batch; np.stack joins them along a new leading axis.
train_np = np.stack(list(train_dataset))
test_np = np.stack(list(test_dataset))
print(type(train_np), train_np.shape)
print(type(test_np), test_np.shape)

Output:

<class 'numpy.ndarray'> (937, 64, 28, 28)
<class 'numpy.ndarray'> (156, 64, 28, 28)
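If dropping the leftover samples is not acceptable, one possible alternative (not part of the original answer) is to concatenate the batches along the first axis instead of stacking them. The sketch below assumes eager execution, as in TF 2.x, and uses a small array of zeros as a stand-in for the MNIST images; `images`, `batches`, and `all_images` are illustrative names.

```python
import numpy as np
import tensorflow as tf

# Small stand-in data: 100 fake 28x28 "images" (not the real MNIST set).
images = np.zeros((100, 28, 28), dtype=np.uint8)

# No drop_remainder here, so the last batch holds only 4 elements.
dataset = tf.data.Dataset.from_tensor_slices(images).batch(6)

# In eager mode each batch is a tensor with a .numpy() method.
batches = [batch.numpy() for batch in dataset]

# Concatenating along axis 0 merges the batch dimension away,
# so batches of different sizes are not a problem.
all_images = np.concatenate(batches, axis=0)

print(len(batches), batches[-1].shape, all_images.shape)
# 17 (4, 28, 28) (100, 28, 28)
```

The result has one row per sample rather than a separate batch axis, so no elements are lost. Later TF 2.x releases also provide Dataset.as_numpy_iterator(), which should yield the same NumPy batches without calling .numpy() by hand.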
