


Is it possible to retrieve class distribution information when using TensorFlow Datasets (tfds)? If I pass with_info=True to tfds.load, I get a tfds.core.DatasetInfo object, which has some information about the dataset (splits, labels, etc.).


However, I'm curious whether it's possible to determine the exact class-wise distribution from what this object contains.


From what I've seen, it's not possible without first iterating over the dataset. Just wanted to double-check this hypothesis and inquire as to what the fastest/best way would be to extract this information from the dataset.


NOTE: For the purposes of this question I am only considering datasets for image classification.



I wasn't able to find a way to get the label distribution from the info object either. Here's an alternative that iterates over the labels once:

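import numpy as np
import tensorflow_datasets as tfds

# as_supervised=True yields (image, label) tuples.
ds = tfds.load('mnist', split='train', as_supervised=True)

# Keep only the labels and collect them as integers rather than floats.
labels = np.fromiter(ds.map(lambda x, y: y).as_numpy_iterator(), dtype=np.int64)
vals, counts = np.unique(labels, return_counts=True)

for val, count in zip(vals, counts):
    print(val, count)

Output:

0 5923
1 6742
2 5958
3 6131
4 5842
5 5421
6 5918
7 6265
8 5851
9 5949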
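
If you'd rather see class names than integer ids, the info object can at least supply the label names, even though the counts still have to come from a pass over the data. Below is a minimal sketch of that idea, not a definitive implementation. It assumes the label feature is called 'label' (true for mnist, other datasets may differ); count_labels and print_distribution are hypothetical helper names, not part of tfds.

```python
from collections import Counter


def count_labels(labels):
    """Count occurrences of each integer class id in an iterable of labels."""
    counts = Counter(int(label) for label in labels)
    # Return the counts ordered by class id.
    return dict(sorted(counts.items()))


def print_distribution(name='mnist', split='train'):
    """Print 'class_name count' for every class in the given split."""
    # Imported lazily so count_labels can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(name, split=split, as_supervised=True, with_info=True)
    # Assumption: the label feature is named 'label', as it is for mnist.
    label_feature = info.features['label']
    # Drop the images and iterate over the labels as NumPy scalars.
    labels = ds.map(lambda image, label: label).as_numpy_iterator()
    for class_id, count in count_labels(labels).items():
        print(label_feature.int2str(class_id), count)
```

Calling print_distribution('mnist', 'train') should print one line per class. count_labels only needs an iterable of integer-like labels, so it can be checked on a small list without loading any dataset.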


09-27 15:16