Problem description

I am trying to remove a set of contiguous paragraphs from a Microsoft Word document, using Apache POI.

From what I understand, a paragraph can be deleted by removing all of its runs, like this:

/*
 * Deletes the given paragraph.
 */
public static void deleteParagraph(XWPFParagraph p) {
    if (p != null) {
        List<XWPFRun> runs = p.getRuns();
        //Delete all the runs
        for (int i = runs.size() - 1; i >= 0; i--) {
            p.removeRun(i);
        }
        p.setPageBreak(false); //Remove any page break
    }
}

In fact, it works, but there's something strange: the block of removed paragraphs does not disappear from the document. Instead, it turns into a set of empty lines, as if each paragraph had been converted into a blank line.

When I print the paragraphs' content from code, I see a space for each removed paragraph. When I look at the document directly with formatting marks shown, I see a vertical column of ¶ marks exactly where the block of deleted paragraphs used to be.

Any idea how to fix this? I'd like my paragraphs to be removed completely.

I also tried replacing the text (with setText()) and removing any spacing or indentation that might have been added automatically, like this:

p.setSpacingAfter(0);
p.setSpacingAfterLines(0);
p.setSpacingBefore(0);
p.setSpacingBeforeLines(0);
p.setIndentFromLeft(0);
p.setIndentFromRight(0);
p.setIndentationFirstLine(0);
p.setIndentationLeft(0);
p.setIndentationRight(0);

But no luck.

Recommended answer

I would delete the paragraphs themselves, not just the runs inside them. Deleting paragraphs is not directly part of the Apache POI high-level API. However, XWPFDocument.getDocument().getBody() returns the low-level CTBody, which has a removeP(int i) method.
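
The difference shows up at the XML level. In the WordprocessingML inside a .docx, each paragraph is a w:p element and its runs are w:r children. Stripping the runs therefore still leaves an empty w:p, and Word displays that as a lone ¶. The sketch below illustrates this with the JDK's built-in DOM API instead of Apache POI. The class name RunsVsParagraph and the simplified two-paragraph body are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RunsVsParagraph {

    // WordprocessingML main namespace, as used in word/document.xml
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, heavily simplified body: two paragraphs with one run each
    static final String BODY_XML =
        "<w:body xmlns:w=\"" + W + "\">"
        + "<w:p><w:r><w:t>delete me</w:t></w:r></w:p>"
        + "<w:p><w:r><w:t>keep me</w:t></w:r></w:p>"
        + "</w:body>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse sample XML", e);
        }
    }

    static int countParagraphs(Document doc) {
        return doc.getElementsByTagNameNS(W, "p").getLength();
    }

    // What the question does: remove every run but keep the w:p element
    static void removeRuns(Element p) {
        NodeList runs = p.getElementsByTagNameNS(W, "r");
        // The NodeList is live, so remove from the end to keep indexes valid
        for (int i = runs.getLength() - 1; i >= 0; i--) {
            Node run = runs.item(i);
            run.getParentNode().removeChild(run);
        }
    }

    // What the answer does: remove the w:p element itself
    static void removeParagraph(Element p) {
        p.getParentNode().removeChild(p);
    }

    // Returns {paragraphs left after removing runs, paragraphs left after removing the w:p}
    static int[] paragraphCountsAfterEachStep() {
        Document doc = parse(BODY_XML);
        Element first = (Element) doc.getElementsByTagNameNS(W, "p").item(0);
        removeRuns(first);
        int afterRuns = countParagraphs(doc);       // the empty <w:p/> is still there, shown as a lone ¶
        removeParagraph(first);
        int afterParagraph = countParagraphs(doc);  // now the paragraph mark is gone as well
        return new int[] {afterRuns, afterParagraph};
    }

    public static void main(String[] args) {
        int[] counts = paragraphCountsAfterEachStep();
        System.out.println("after removing runs:    " + counts[0]);
        System.out.println("after removing the w:p: " + counts[1]);
    }
}
```

Run as written, it prints 2 after only the runs are removed and 1 after the w:p element itself is removed.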

Example:

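import java.awt.Desktop;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class WordRemoveParagraph {

 /*
  * Deletes the given paragraph, including its paragraph mark (¶).
  */
 public static void deleteParagraph(XWPFParagraph p) {
  XWPFDocument doc = p.getDocument();
  // Position of the paragraph among all body elements (paragraphs and tables)
  int pPos = doc.getPosOfParagraph(p);
  // Low-level alternative: doc.getDocument().getBody().removeP(...) also removes the <w:p>,
  // but it expects the index among paragraphs only and does not update doc.getParagraphs()
  doc.removeBodyElement(pPos);
 }

 public static void main(String[] args) throws IOException, InvalidFormatException {

  // try-with-resources closes the input stream and the document even if an exception occurs
  try (FileInputStream in = new FileInputStream("source.docx");
       XWPFDocument doc = new XWPFDocument(in)) {

   // Walk backwards so that removing a paragraph does not shift the indexes still to be visited
   int pNumber = doc.getParagraphs().size() - 1;
   while (pNumber >= 0) {
    XWPFParagraph p = doc.getParagraphs().get(pNumber);
    if (p.getParagraphText().contains("delete")) {
     deleteParagraph(p);
    }
    pNumber--;
   }

   try (FileOutputStream out = new FileOutputStream("result.docx")) {
    doc.write(out);
   }
  }

  System.out.println("Done");
  Desktop.getDesktop().open(new File("result.docx"));
 }
}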

This deletes all paragraphs from the document source.docx where the text contains "delete" and saves the result in result.docx.
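
The main method walks the paragraph list from the last index to the first. Each removal shifts every later paragraph down by one position, so a forward loop would skip the paragraph that comes right after a deleted one. The sketch below shows this effect on a plain List<String> that stands in for doc.getParagraphs(). It uses no POI classes, and the class name BackwardDeletion is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BackwardDeletion {

    // Forward walk: after removing index i, the next element slides into i and is never checked
    static List<String> deleteForward(List<String> paragraphs, String marker) {
        List<String> ps = new ArrayList<>(paragraphs);
        for (int i = 0; i < ps.size(); i++) {
            if (ps.get(i).contains(marker)) {
                ps.remove(i);
            }
        }
        return ps;
    }

    // Backward walk, as in WordRemoveParagraph.main: removals never shift indexes still to be visited
    static List<String> deleteBackward(List<String> paragraphs, String marker) {
        List<String> ps = new ArrayList<>(paragraphs);
        for (int i = ps.size() - 1; i >= 0; i--) {
            if (ps.get(i).contains(marker)) {
                ps.remove(i);
            }
        }
        return ps;
    }

    public static void main(String[] args) {
        List<String> paragraphs = List.of("keep", "delete 1", "delete 2", "keep too");
        System.out.println(deleteForward(paragraphs, "delete"));   // prints [keep, delete 2, keep too]
        System.out.println(deleteBackward(paragraphs, "delete"));  // prints [keep, keep too]
    }
}
```

The same reasoning applies to a contiguous block of paragraphs: deleting from the block's last index down to its first leaves the earlier indexes valid.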

Although doc.getDocument().getBody().removeP(pPos); works, it does not update the XWPFDocument's paragraph list. That list is only rebuilt when the document is read again, so after removeP, paragraph iterators and any other access to the list still see the deleted paragraph.

So the better approach is to use doc.removeBodyElement(pPos); instead. When pos points to a paragraph in the document body, removeBodyElement(int pos) performs the same low-level removal as doc.getDocument().getBody().removeP(...), because that paragraph is also a body element. It first converts pos, which also counts tables, into the paragraph's index among the paragraphs. In addition, it updates the XWPFDocument's paragraph list.
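
The stale-list problem can be reproduced with a small model. The hypothetical class StaleParagraphCache below uses the JDK's DOM API, not POI. It keeps an XML body and a paragraph list that is filled once at load time, which is how the answer describes XWPFDocument's list. One method edits only the XML, like getBody().removeP(i). The other edits the XML and the list together, like removeBodyElement for a paragraph. This is a sketch of the idea, not POI's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical toy model: an XML body plus a paragraph list cached once at load time
public class StaleParagraphCache {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String SAMPLE_XML = "<w:body xmlns:w=\"" + W + "\"><w:p/><w:p/><w:p/></w:body>";

    private final Element body;                                  // stands in for the low-level CTBody
    private final List<Element> paragraphs = new ArrayList<>();  // stands in for the high-level list

    StaleParagraphCache(String bodyXml) {
        body = parseBody(bodyXml);
        NodeList ps = body.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < ps.getLength(); i++) {
            paragraphs.add((Element) ps.item(i));  // the cache is filled only here
        }
    }

    private static Element parseBody(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse body XML", e);
        }
    }

    // Analogous to getBody().removeP(i): edits the XML only, and the cache is not told
    void removeFromXmlOnly(int paragraphIndex) {
        body.removeChild(body.getElementsByTagNameNS(W, "p").item(paragraphIndex));
    }

    // Analogous to removeBodyElement for a paragraph: edits the XML and the cache together
    void removeFromXmlAndCache(int paragraphIndex) {
        body.removeChild(paragraphs.remove(paragraphIndex));
    }

    int xmlParagraphCount()    { return body.getElementsByTagNameNS(W, "p").getLength(); }
    int cachedParagraphCount() { return paragraphs.size(); }

    public static void main(String[] args) {
        StaleParagraphCache lowLevel = new StaleParagraphCache(SAMPLE_XML);
        lowLevel.removeFromXmlOnly(0);
        // The cache still lists a paragraph that no longer exists in the XML
        System.out.println("XML: " + lowLevel.xmlParagraphCount()
            + ", cache: " + lowLevel.cachedParagraphCount());

        StaleParagraphCache highLevel = new StaleParagraphCache(SAMPLE_XML);
        highLevel.removeFromXmlAndCache(0);
        System.out.println("XML: " + highLevel.xmlParagraphCount()
            + ", cache: " + highLevel.cachedParagraphCount());
    }
}
```

It prints "XML: 2, cache: 3" for the XML-only removal and "XML: 2, cache: 2" when both are updated together.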

这篇关于删除XWPFParagraph会保留其段落符号(¶)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!