This article covers how to convert images of handwritten notes to text.

Question

I have hundreds of images of handwritten notes. They were written by different people, but they are in sequence, so you know that, for example, person1 wrote img1.jpg -> img100.jpg. The style of handwriting varies a lot from person to person, but some parts of the notes are always fixed. I imagine that could help an algorithm (it helps me!).

I tried tesseract and it failed pretty badly at recognizing the text. Since each person has about 100 images, is there an algorithm I can train by feeding it a small number of examples, like 5 or fewer, so that it can learn from them? Or would that not be enough data? From searching around, it looks like I need to implement a CNN (e.g. this paper).

My knowledge of AI is limited, though. Is this something that I could still do using a library and some studying? If so, what should I do going forward?

Recommended answer

This is called OCR (optical character recognition), and there has been progress in the field. Here is an example of how simple it is to parse an image file to text using tesseract:

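from PIL import Image  # Pillow, used to load the image file
import pytesseract     # Python wrapper; needs the Tesseract binary installed


def ocr_core(file):
    """Return the text Tesseract recognizes in the image at path `file`."""
    # Open with Pillow so an unreadable file fails here, not inside Tesseract.
    with Image.open(file) as image:
        return pytesseract.image_to_string(image)


print(ocr_core('sample.png'))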
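
Since the question involves hundreds of images in sequence, the single-file call above could be wrapped in a loop. The following is only a sketch: the helper names (image_number, writer_of, ocr_folder), the img*.jpg naming pattern, and the block size of 100 images per writer are assumptions taken from the example in the question, not something the original answer specifies.

```python
import re
from pathlib import Path


def image_number(path):
    """Extract the integer from a name like 'img12.jpg' (assumed naming scheme)."""
    match = re.search(r"(\d+)", Path(path).stem)
    return int(match.group(1)) if match else -1


def writer_of(path, images_per_writer=100):
    """Map img1..img100 to writer 1, img101..img200 to writer 2 (assumed block size)."""
    return (image_number(path) - 1) // images_per_writer + 1


def ocr_folder(folder, ocr, pattern="img*.jpg"):
    """Run `ocr` on every matching image in numeric order.

    Returns a dict of {filename: (writer, recognized_text)}.
    """
    results = {}
    for path in sorted(Path(folder).glob(pattern), key=image_number):
        results[path.name] = (writer_of(path), ocr(str(path)))
    return results
```

With the function from the example above, a call might look like ocr_folder('notes', ocr_core), where 'notes' is a hypothetical folder name. Keeping the writer number next to each result makes it easy to see whether recognition quality differs from one person's handwriting to another's.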

However, I am not sure that it can recognize different styles of handwriting. You can give it a try yourself to find out. To run the Python example you need the pytesseract package, but first install Tesseract itself on your OS and add it to your PATH.
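
One quick way to confirm the PATH step worked is to look up the executable from Python before calling pytesseract. This is a minimal sketch using only the standard library; the helper name find_tesseract is hypothetical, and the binary name "tesseract" is assumed to be the default on your platform.

```python
import shutil


def find_tesseract(binary="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it or add its folder to PATH")
else:
    print(f"tesseract found at {path}")
```

If the binary is installed but cannot be added to PATH, pytesseract also lets you point at it directly by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.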
