How to find word coordinates in a PDF page on iPhone using CGPDFScanner?

Question

I am parsing a PDF page using CGPDFScanner, but I am not able to find the coordinates of the search results.

In my callback void Tm1(CGPDFScannerRef scanner, void *info), I only get coordinates for some words, not for every word in the PDF.

How can I find the coordinates, e.g. (x, y), of every word on a PDF page?

Recommended answer

You're drastically underestimating the complexity of converting PDF to text. I made that mistake as well, and it took months to write a text extraction engine that works with most PDFs. My code is commercial, but just to give you an idea:

Td, TD, Tm, T*, d0, and d1 can all be involved when text is drawn, so hooking Tm alone will miss words. (d0 and d1 are for Type3 fonts, which are less common, but Microsoft Word really likes them.) The same goes for any content inside XObjects, which can nest recursively. You also need to parse the fonts, since many PDFs attach CMaps to fonts that translate "random numbers" into a character (or several characters, since PDF can have ligatures as well). Beware: XObjects might also contain fonts, and it's critical to parse them in the right order, since fonts can have parent fonts.
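To illustrate why a Tm callback alone only yields some coordinates, here is a minimal sketch in plain C of the bookkeeping the positioning operators require. The type and function names (TextState, op_Td, and so on) are hypothetical and not part of Core Graphics. The sketch assumes the usual PDF semantics for BT, Tm, Td, TD, TL, and T*. It deliberately ignores the glyph advance after text is shown, the current transformation matrix, and Type3 fonts, all of which a real extractor would need.

```c
#include <assert.h>
#include <stddef.h>

/* PDF matrix [a b c d e f]; (e, f) is the translation part. */
typedef struct { double a, b, c, d, e, f; } Matrix;

/* Hypothetical text state: just enough to follow line positioning. */
typedef struct {
    Matrix tm;       /* text matrix */
    Matrix tlm;      /* text line matrix */
    double leading;  /* set by TL or TD, used by T* */
} TextState;

static const Matrix kIdentity = { 1, 0, 0, 1, 0, 0 };

void text_state_init(TextState *ts)
{
    assert(ts != NULL);
    ts->tm = kIdentity;
    ts->tlm = kIdentity;
    ts->leading = 0.0;
}

/* BT: both matrices start at identity; leading is left unchanged. */
void op_BT(TextState *ts)
{
    assert(ts != NULL);
    ts->tm = kIdentity;
    ts->tlm = kIdentity;
}

/* Tm: replaces the text matrix and the text line matrix outright. */
void op_Tm(TextState *ts, double a, double b, double c,
           double d, double e, double f)
{
    assert(ts != NULL);
    Matrix m = { a, b, c, d, e, f };
    ts->tm = m;
    ts->tlm = m;
}

/* Td: moves to the next line, offset (tx, ty) from the line start. */
void op_Td(TextState *ts, double tx, double ty)
{
    assert(ts != NULL);
    ts->tlm.e += tx * ts->tlm.a + ty * ts->tlm.c;
    ts->tlm.f += tx * ts->tlm.b + ty * ts->tlm.d;
    ts->tm = ts->tlm;
}

/* TL: sets the leading. */
void op_TL(TextState *ts, double leading)
{
    assert(ts != NULL);
    ts->leading = leading;
}

/* TD: like Td, but also sets the leading to -ty. */
void op_TD(TextState *ts, double tx, double ty)
{
    op_TL(ts, -ty);
    op_Td(ts, tx, ty);
}

/* T*: same as "0 -leading Td". */
void op_Tstar(TextState *ts)
{
    assert(ts != NULL);
    op_Td(ts, 0.0, -ts->leading);
}

/* Origin of the next glyph in user space, before the CTM is applied. */
void text_origin(const TextState *ts, double *x, double *y)
{
    assert(ts != NULL && x != NULL && y != NULL);
    *x = ts->tm.e;
    *y = ts->tm.f;
}
```

With CGPDFScanner, each of these would presumably be driven from its own callback registered through CGPDFOperatorTableSetCallback, with the operands read via CGPDFScannerPopNumber (the last operand comes off the stack first). A page that sets Tm once and then advances with TD or T* moves the text position on every line, which a Tm-only callback never sees.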

Adobe's ToUnicode PDF gives you some idea of how to start, but be warned: the spec is very incomplete. There's a bit more in the official PDF reference, but you will still find documents that should not work (going by the spec) yet DO work (when you try them in Adobe Acrobat).
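As a rough idea of what the "random numbers to characters" step looks like, the sketch below reads the beginbfchar ... endbfchar sections of a ToUnicode CMap that has already been decoded to text. The names CodeMapping, read_hex_token, and parse_bfchar are hypothetical. It assumes entries of the form <source> <destination> written as hex strings, keeps only destinations of at most four hex digits (one UTF-16 unit), and skips longer ones such as ligatures. bfrange sections, hex strings with embedded whitespace, and parent fonts are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One character code mapped to one UTF-16 code unit. */
typedef struct { unsigned code; unsigned unicode; } CodeMapping;

/* Reads one <hex> token before `end`. On success, advances *pp past the
   closing '>' and returns the number of hex digits; otherwise returns -1. */
static int read_hex_token(const char **pp, const char *end, unsigned *value)
{
    const char *p = *pp;
    unsigned v = 0;
    int digits = 0;

    while (p < end && isspace((unsigned char)*p)) p++;
    if (p >= end || *p != '<') return -1;
    p++;
    while (p < end && isxdigit((unsigned char)*p)) {
        int c = tolower((unsigned char)*p);
        v = v * 16u + (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
        digits++;
        p++;
    }
    if (p >= end || *p != '>') return -1;
    *pp = p + 1;
    *value = v;
    return digits;
}

/* Collects up to `cap` mappings from every bfchar section in `cmap`.
   Returns the number of mappings stored in `out`. */
size_t parse_bfchar(const char *cmap, CodeMapping *out, size_t cap)
{
    size_t count = 0;
    const char *p = cmap;

    assert(cmap != NULL && out != NULL);
    while ((p = strstr(p, "beginbfchar")) != NULL) {
        const char *end;
        p += strlen("beginbfchar");
        end = strstr(p, "endbfchar");
        if (end == NULL) break;            /* malformed section: stop */
        for (;;) {
            unsigned src, dst;
            int dst_digits;
            if (read_hex_token(&p, end, &src) < 0) break;
            dst_digits = read_hex_token(&p, end, &dst);
            if (dst_digits < 0) break;
            /* Skip multi-unit destinations (ligatures, surrogate pairs). */
            if (dst_digits <= 4 && count < cap) {
                out[count].code = src;
                out[count].unicode = dst;
                count++;
            }
        }
        p = end;
    }
    return count;
}
```

Given a section containing <0003> <0020> and <0024> <0041>, the code 0x0003 maps to a space and 0x0024 to "A". Looking up each code shown by a text operator in such a table is what turns the raw bytes into searchable words.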


10-29 19:29