Improving real-world results of a neural network trained on the MNIST dataset

Question

I've built a neural network with Keras using the MNIST dataset, and now I'm trying to use it on photos of actual handwritten digits. Of course I don't expect the results to be perfect, but the results I currently get leave a lot of room for improvement.

For starters I test it with some photos of individual digits written in my clearest handwriting. They are square and they have the same dimensions and color as the images in the mnist dataset. They are saved in a folder called individual_test like this for example: 7(2)_digit.jpg.

The network is often very sure of a wrong result. Here is an example:

The results I get for this picture are the following:

result:  3 . probabilities:  [1.9963557196245318e-10, 7.241294497362105e-07, 0.02658148668706417, 0.9726449251174927, 2.5416460047722467e-08, 2.6078915027483163e-08, 0.00019745019380934536, 4.8302300825753264e-08, 0.0005754049634560943, 2.8358477788259506e-09]

So the network is 97% sure this is a 3, and this picture is far from the only case. Out of 38 pictures, only 16 were recognised correctly. What shocks me is that the network is so sure of its result even though it couldn't be farther from the correct one.

EDIT
After adding a threshold to prepare_image (img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]) the performance has slightly improved. It now gets 19 out of 38 pictures right but for some images including the one shown above it still is pretty sure of the wrong result. This is what I get now:

result:  3 . probabilities:  [1.0909866760000497e-11, 1.1584616004256532e-06, 0.27739930152893066, 0.7221096158027649, 1.900260038212309e-08, 6.555900711191498e-08, 4.479645940591581e-05, 6.455550760620099e-07, 0.0004443934594746679, 1.0013242457418414e-09]

So it now is only 72% sure of its result which is better but still ...



What can I do to improve the performance? Can I prepare my images better? Or should I add my own images to the training data? And if so, how would I do such a thing?

EDIT

This is what the picture displayed above looks like after applying prepare_image to it:

After using threshold this is what the same picture looks like:

In comparison: This is one of the pictures provided by the mnist dataset:

They look fairly similar to me. How can I improve this?
Here's my code (including threshold):

# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
# numpy is necessary since keras uses numpy arrays
import numpy as np

# imports for pictures
import matplotlib.pyplot as plt
import PIL.Image  # a bare "import PIL" does not load the PIL.Image submodule
import cv2

# imports for tests
import random
import os

class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = to_categorical(y_train)
        y_test = to_categorical(y_test)
        num_classes = y_test.shape[1]


        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # Compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test


    def predict_result(self, img, show = False):
        """ predicts the number in a picture (vector) """
        assert type(img) == np.ndarray and img.shape == (784,)

        if show:
            img = img.reshape((28, 28))
            # show the picture
            plt.imshow(img, cmap='Greys')
            plt.show()
            img = img.reshape(img.shape[0] * img.shape[1])

        num_pixels = img.shape[0]
        # the probabilities (run the model once and reuse the output)
        res_probabilities = self.model.predict(img.reshape(-1,num_pixels))
        # the actual number
        res_number = np.argmax(res_probabilities, axis = 1)

        return (res_number[0], res_probabilities.tolist()[0])    # we only need the first element since they only have one


    def prepare_image(self, img, show = False):
        """ prepares the partial images used in partial_img_rec by transforming them
            into numpy arrays that the network will be able to process """
        # convert to greyscale
        img = img.convert("L")
        # rescale image to 28 *28 dimension (LANCZOS is the filter formerly named ANTIALIAS, which Pillow 10 removed)
        img = img.resize((28,28), PIL.Image.LANCZOS)
        # inverse colors since the training images have a black background
        #img =  PIL.ImageOps.invert(img)
        # transform to vector
        img = np.asarray(img, "float32")
        img = img / 255.
        img[img < 0.5] = 0.

        img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]

        if show:
            plt.imshow(img, cmap = "Greys")
            plt.show()

        # flatten image to 28*28 = 784 vector
        num_pixels = img.shape[0] * img.shape[1]
        img = img.reshape(num_pixels)

        return img


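    def partial_img_rec(self, image, upper_left, lower_right, results=None, show = False):
        """ partial is a part of an image """
        # avoid a mutable default argument that would be shared between calls
        if results is None:
            results = []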
        left_x, left_y = upper_left
        right_x, right_y = lower_right

        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results

        partial = image.crop((left_x, left_y, right_x, right_y))
        if show:
            partial.show()
        partial = self.prepare_image(partial)

        step = height // 10

        # is there a number in this part of the image?
        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is at least 50% sure
        if prop[res] >= 0.5:
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height)
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if there is no number found we take smaller steps
            step = height // 20
        print("step: ", step)
        # recursive call with modified positions ( move on step variables )
        return self.partial_img_rec(image, (left_x + step, left_y), (right_x + step, right_y), results = results)

    def individual_digits(self, img):
        """ uses partial_img_rec to predict individual digits in square images """
        assert isinstance(img, PIL.Image.Image)  # JPEG and PNG image files are subclasses of PIL.Image.Image

        return self.partial_img_rec(img, (0,0), (img.size[0], img.size[1]), results=[])

    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (shape: square)
            saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir("individual_test")

        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            image = PIL.Image.open(os.path.join("individual_test", imageName)).convert("L")
            # only square images in this test
            if image.size[0]  != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size,". It has to be a square.")
                continue
            predicted_res = self.individual_digits(image)

            if predicted_res == []:
                print("No prediction possible for ", imageName)
            else:
                predicted_res = predicted_res[0]

            if predicted_res != correct_res:
                print("error in partial_img-rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                cnt_wrong += 1
            else:
                cnt_right += 1
                print("correctly predicted ",imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong," digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100," %.")

    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """

        #assert type(img) == myImage
        width, height = img.size
        # start with the first square part of the image
        res_list = self.partial_img_rec(img, (0,0),(height ,height), results = [])
        res_str = ""
        for elem in res_list:
            res_str += str(elem)
        return res_str

    def test_multiple_digits(self):
        """ tests the function 'multiple_digits' using some images saved in the folder 'multi_test'.
            These images contain multiple handwritten digits without much whitespace surrounding them.
            The correct solutions are saved in the files' names followed by the character '_'. """

        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir("multi_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            image = PIL.Image.open(os.path.join("multi_test", imageName)).convert("L")

            correct_res = imageName.split("_")[0]
            predicted_res = self.multiple_digits(image)
            if correct_res == predicted_res:
                cnt_right += 1
            else:
                cnt_wrong += 1
                print("Error in multiple_digits! The network predicted ", predicted_res, " but the correct result would have been ", correct_res)

        print("The network predicted correctly ", cnt_right, " out of ", cnt_right + cnt_wrong, " pictures. That's a success rate of ", cnt_right / (cnt_right + cnt_wrong) * 100, "%.")

network = mnist_network()
# this is the image shown above
result = network.individual_digits(PIL.Image.open(os.path.join("individual_test", "7(2)_digit.jpg")))

Recommended answer

Update:

You have three options to achieve better performance in this particular task:

  1. Use a convolutional network, since it performs better on tasks with spatial data such as images, and generalizes better as a classifier for a task like this one.
  2. Use, create and/or generate more pictures of your own kind and train your network with them, so that it can learn them too.
  3. Preprocess your images so that they are better aligned with the original MNIST images on which you trained your network.
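For option 2, a minimal sketch of how your own labelled digits could be mixed into the MNIST training arrays before calling fit. It assumes your images have already been preprocessed exactly like the training data (flattened to 784 values, scaled to 0-1) and that the labels are one-hot encoded. The function name extend_training_set and the repeat parameter are hypothetical, not part of Keras:

```python
import numpy as np

def extend_training_set(X_train, y_train, X_own, y_own, repeat=1, seed=0):
    """Append own samples to the training arrays and shuffle the result.

    repeat > 1 duplicates the own samples so that a handful of photos is
    not drowned out by the 60,000 MNIST images (a crude form of oversampling).
    """
    X_extra = np.repeat(X_own, repeat, axis=0)
    y_extra = np.repeat(y_own, repeat, axis=0)
    X_all = np.concatenate([X_train, X_extra], axis=0)
    y_all = np.concatenate([y_train, y_extra], axis=0)
    # shuffle images and labels with the same permutation so pairs stay aligned
    order = np.random.default_rng(seed).permutation(len(X_all))
    return X_all[order], y_all[order]
```

The returned arrays can be passed to model.fit in place of X_train and y_train. It is worth keeping some of your own photos out of the training set, otherwise there is nothing left to measure the improvement on.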

I've just run an experiment. I looked at MNIST images of one digit at a time, then took your images and applied some of the preprocessing I proposed to you earlier:

1. Applied a threshold, but only downwards, to eliminate the background noise, because the original MNIST data has only a minimal threshold for the blank background:

image[image < 0.1] = 0.
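To illustrate what "only downwards" means, here is a small sketch contrasting this noise floor with a full binary threshold. It assumes the image is already inverted (bright ink on a dark background) and scaled to 0-1; the helper names suppress_background and binarize are made up for this example:

```python
import numpy as np

def suppress_background(image, floor=0.1):
    """Zero out faint background pixels but keep the grey levels of the stroke."""
    cleaned = image.copy()
    cleaned[cleaned < floor] = 0.0
    return cleaned

def binarize(image, threshold=0.1):
    """Full binary threshold: every pixel becomes exactly 0 or 1."""
    return (image >= threshold).astype(image.dtype)

# one row of pixels crossing a stroke: noise, soft edge, stroke core, soft edge, noise
row = np.array([0.02, 0.08, 0.35, 0.90, 0.40, 0.05], dtype=np.float32)
soft = suppress_background(row)  # edges stay at 0.35 and 0.40
hard = binarize(row)             # edges are pushed up to 1.0
```

MNIST digits have soft, anti-aliased edges, so the first variant keeps the input closer to what the network saw during training, while the second makes every stroke look thicker and harder than a typical training sample.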

2. Surprisingly, the size of the digit inside the image proved to be crucial, so I scaled the digit down inside the 28 x 28 image, i.e. we have more padding around the digit.
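A possible way to do this rescaling programmatically is sketched below. According to the MNIST description, the digits were size-normalized to fit a 20 x 20 pixel box inside the 28 x 28 image, so the sketch uses those numbers as defaults. It assumes a 2-D array with bright ink on a zero background; fit_digit_to_box is a hypothetical helper, and the nearest-neighbour resampling is a simplification to keep the example NumPy-only (a PIL or OpenCV resize would give smoother strokes):

```python
import numpy as np

def fit_digit_to_box(image, box=20, canvas=28):
    """Crop the digit to its bounding box, scale it so the longer side is
    `box` pixels, and paste it into the middle of an empty canvas."""
    rows = np.flatnonzero(np.any(image > 0, axis=1))
    cols = np.flatnonzero(np.any(image > 0, axis=0))
    out = np.zeros((canvas, canvas), dtype=image.dtype)
    if rows.size == 0:
        return out  # empty image, nothing to place
    digit = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = digit.shape
    scale = box / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # nearest-neighbour resampling: pick a source row/column for each target one
    row_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    col_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = digit[row_idx][:, col_idx]
    y0 = (canvas - new_h) // 2
    x0 = (canvas - new_w) // 2
    out[y0:y0 + new_h, x0:x0 + new_w] = resized
    return out
```

A digit that originally filled almost the whole frame ends up with at least 4 pixels of padding on its longer axis, which matches the padding of the training images more closely.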

3. Inverted the images, since the MNIST data from Keras is inverted as well (light digits on a dark background).

image = ImageOps.invert(image)

4. Finally, scaled the data the same way as we did for training:

image = image / 255.
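Steps 3 and 4 can also be done directly on a NumPy array instead of a PIL image. The sketch below assumes an 8-bit greyscale array with dark ink on light paper and returns the flat vector shape that the question's predict_result expects; to_network_input is a hypothetical name:

```python
import numpy as np

def to_network_input(gray_uint8):
    """Invert an 8-bit greyscale image, scale it to 0-1 and flatten it."""
    # convert before subtracting so the arithmetic is done in float, not uint8
    inverted = 255.0 - gray_uint8.astype(np.float32)
    scaled = inverted / 255.0
    return scaled.reshape(-1)
```

For a 28 x 28 input this yields a vector of 784 values in which white paper maps to 0.0 and black ink to 1.0.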

After the preprocessing I trained the model on the MNIST dataset with the parameters epochs=12, batch_size=200, and got these results:

Result: 1 , probability: 0.6844741106033325

result:  1 . probabilities:  [2.0584749904628552e-07, 0.9875971674919128, 5.821426839247579e-06, 4.979299319529673e-07, 0.012240586802363396, 1.1566483948399764e-07, 2.382085284580171e-08, 0.00013023221981711686, 9.620113416985987e-08, 2.5273093342548236e-05]

Result: 6 , probability: 0.9221984148025513

result:  6 . probabilities:  [9.130864782491699e-05, 1.8290626258021803e-07, 0.00020504613348748535, 2.1564576968557958e-07, 0.0002401985548203811, 0.04510130733251572, 0.9221984148025513, 1.9014490248991933e-07, 0.03216308355331421, 3.323434683011328e-08]

Result: 7 with probabilities: 0.7105212807655334 Note:

result:  7 . probabilities:  [1.0372193770535887e-08, 7.988557626958936e-06, 0.00031014863634482026, 0.0056108818389475346, 2.434678014751057e-09, 3.2280522077599016e-07, 1.4190952857262573e-09, 0.9940618872642517, 1.612859932720312e-06, 7.102244126144797e-06]

Your number 9 was a bit tricky:

As far as I can tell, the model trained on the MNIST dataset picked up two main "features" of a 9: the upper part and the lower part. An upper part with a nice round shape, as in your image, does not read as a 9 but mostly as a 3 for a model trained on the MNIST dataset. The lower part of a 9 in MNIST is mostly a straightened curve. So basically your perfectly shaped 9 will always be a 3 for your model because of the MNIST samples, unless you retrain the model with a sufficient number of samples of your style of 9. To check this idea I ran a sub-experiment with 9s:

My 9 with a skewed upper part (mostly OK for a 9 as per MNIST) but with a slightly curly bottom (not OK for a 9 as per MNIST):

Result: 9 , probability: 0.5365301370620728

My 9 with a skewed upper part (mostly OK for a 9 as per MNIST) and with a straight bottom (OK for a 9 as per MNIST):

Result: 9 , probability: 0.923724353313446

Your 9 with the misinterpreted shape properties:

Result: 3 , probability: 0.8158268928527832

result:  3 . probabilities:  [9.367801249027252e-05, 3.9978775021154433e-05, 0.0001467708352720365, 0.8158268928527832, 0.0005801069783046842, 0.04391581565141678, 6.44062723154093e-08, 7.099170943547506e-06, 0.09051419794559479, 0.048875387758016586]


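And finally, just as proof of the importance of image scaling (padding), which I described as crucial above:

Result: 3 , probability: 0.9845736622810364

Result: 9 , probability: 0.923724353313446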

So we can see that our model picked up some features that it always classifies as a 3 when the shape inside the image is oversized and has little padding.

I think that we can get a better performance with CNN, but the way of sampling and preprocessing is always crucial for getting the best performance in an ML task.
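One reason a CNN tends to cope better with this kind of input: a convolutional filter is applied at every position of the image, so its response moves with the stroke, whereas a dense unit is tied to fixed pixel positions. The NumPy sketch below only illustrates that property on a toy image and is not a trainable network. The horizontal-line kernel and the helper names are chosen for this example:

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out

def shift_down(image, rows):
    """Move the image content down by `rows` pixels, filling the top with zeros."""
    out = np.zeros_like(image)
    out[rows:] = image[:-rows]
    return out

# toy "stroke": a horizontal bar, and the same bar moved 5 pixels down
image = np.zeros((28, 28), dtype=np.float32)
image[8, 10:18] = 1.0
shifted = shift_down(image, 5)

# a dense unit whose weights match the original bar pixel by pixel
dense_weights = image.ravel()
dense_original = float(dense_weights @ image.ravel())
dense_shifted = float(dense_weights @ shifted.ravel())

# a small convolutional filter that responds to horizontal lines anywhere
kernel = np.array([[-1, -1, -1],
                   [ 2,  2,  2],
                   [-1, -1, -1]], dtype=np.float32)
conv_original = float(correlate2d_valid(image, kernel).max())
conv_shifted = float(correlate2d_valid(shifted, kernel).max())
```

The dense response collapses from 8.0 to 0.0 once the bar moves, while the strongest filter response is 6.0 in both cases. Followed by pooling, this is what makes a CNN less sensitive to where exactly the digit sits in the frame.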

Hope it helps.

Update 2:

I found another issue, which I also checked and confirmed: the placement of the digit inside the image is crucial as well, which makes sense for this type of NN. A good example is the digits 7 and 9, which are placed off-center in the MNIST dataset, near the bottom of the image; placing a new digit in the center of the image resulted in harder or false classification. I tested the theory by shifting the 7s and 9s towards the bottom, leaving more space at the top of the image, and the result was almost 100% accuracy. As this is a spatial type of problem, I guess we could eliminate it more effectively with a CNN. However, it would be better if MNIST were aligned to the center, or we can do the alignment programmatically to avoid the issue.
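One way to do the alignment programmatically is to position the digit by its center of mass instead of its bounding box. The MNIST description says the digits were centered that way, which may explain why top-heavy digits such as 7 and 9 appear to sit low in the frame. A minimal sketch, assuming a 2-D array with bright ink on a zero background; center_by_mass is a hypothetical helper:

```python
import numpy as np

def center_by_mass(image):
    """Translate the image so that the intensity-weighted centroid lands on
    the middle of the frame. Pixels shifted outside the frame are dropped."""
    total = image.sum()
    if total == 0:
        return image.copy()  # empty image, nothing to align
    h, w = image.shape
    ys, xs = np.indices(image.shape)
    cy = (ys * image).sum() / total
    cx = (xs * image).sum() / total
    dy = int(round((h - 1) / 2 - cy))
    dx = int(round((w - 1) / 2 - cx))
    out = np.zeros_like(image)
    # matching source and destination windows for a shift by (dy, dx)
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out
```

Applying this to each photo after scaling the digit, and before flattening it, should place 7s and 9s roughly where the network saw them during training, without shifting them by hand.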
