HTML-formatted cell value from Excel using Apache POI

Problem description

I am using Apache POI to read an Excel document. So far it has served my purpose, but I am stuck on one thing: extracting the value of a cell as HTML.

I have one cell in which the user enters some text and applies formatting (bullets, numbering, bold, italic, and so on).

When I read that cell, the content should come back as HTML rather than as the plain string that POI returns.

I have gone through almost the entire POI API without finding anything for this. I want to keep the formatting of one particular column only, not of the whole workbook. By "column" I mean the text entered in that column; I want that text as HTML.

I have also tried Apache Tika. As far as I understand, it can give me the text but not the formatting of the text.

Could someone please guide me? I am running out of options.

Suppose I write "My name is Angel and Demon" in Excel, with "Angel" in bold and "Demon" in italic.

The output I should get in Java is: My name is <b>Angel</b> and <i>Demon</i>

Recommended answer

I pasted the following as Unicode text into cell A1 of an Excel file (.xlsx):

<html><p>This is a test. Will this text be <b>bold</b> or <i>italic</i></p></html>

Excel renders that HTML line as follows (with "bold" shown in bold and "italic" in italics):

This is a test. Will this text be bold or italic

My code:

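import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFFont;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelWithHtml {
    // Cell A1 of the test workbook was filled by pasting this HTML:
    // <html><p>This is a test. Will this text be <b>bold</b> or
    // <i>italic</i></p></html>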

    public static void main(String[] args) throws FileNotFoundException,
            IOException {
        new ExcelWithHtml()
                .readFirstCellOfXSSF("/Users/rcacheira/testeHtml.xlsx");
    }

    boolean inBold = false;
    boolean inItalic = false;

    public void readFirstCellOfXSSF(String filePathName)
            throws FileNotFoundException, IOException {
        // try-with-resources closes the workbook and the stream even on failure
        try (FileInputStream fis = new FileInputStream(filePathName);
                XSSFWorkbook wb = new XSSFWorkbook(fis)) {
            XSSFSheet sheet = wb.getSheetAt(0);

            String cellHtml = getHtmlFormatedCellValueFromSheet(sheet, "A1");

            System.out.println(cellHtml);
        }
    }

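    public String getHtmlFormatedCellValueFromSheet(XSSFSheet sheet,
            String cellName) {

        CellReference cellReference = new CellReference(cellName);
        XSSFRow row = sheet.getRow(cellReference.getRow());
        XSSFCell cell = row.getCell(cellReference.getCol());

        // Fails if the cell holds a non-string value such as a number
        XSSFRichTextString cellText = cell.getRichStringCellValue();
        String text = cellText.getString();

        // A cell without any formatting runs is plain text
        if (cellText.numFormattingRuns() == 0) {
            return text;
        }

        StringBuilder htmlCode = new StringBuilder();

        for (int i = 0; i < cellText.numFormattingRuns(); i++) {
            // Font of run i (a run index, not a character index);
            // null when the run has no font of its own, which is
            // treated here as "no bold, no italic"
            XSSFFont font = cellText.getFontOfFormattingRun(i);
            if (font != null) {
                htmlCode.append(getFormatFromFont(font));
            } else {
                htmlCode.append(closeOpenTags());
            }

            int indexStart = cellText.getIndexOfFormattingRun(i);
            int indexEnd = indexStart + cellText.getLengthOfFormattingRun(i);

            htmlCode.append(text, indexStart, indexEnd);
        }

        // Close whatever is still open after the last run
        htmlCode.append(closeOpenTags());
        return htmlCode.toString();
    }

    // Emits closing tags for any open formatting and resets the flags
    private String closeOpenTags() {
        String closing = "";
        if (inItalic) {
            closing += "</i>";
            inItalic = false;
        }
        if (inBold) {
            closing += "</b>";
            inBold = false;
        }
        return closing;
    }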

    private String getFormatFromFont(XSSFFont font) {
        String formatHtmlCode = "";
        if (font.getItalic() && !inItalic) {
            formatHtmlCode += "<i>";
            inItalic = true;
        } else if (!font.getItalic() && inItalic) {
            formatHtmlCode += "</i>";
            inItalic = false;
        }

        if (font.getBold() && !inBold) {
            formatHtmlCode += "<b>";
            inBold = true;
        } else if (!font.getBold() && inBold) {
            formatHtmlCode += "</b>";
            inBold = false;
        }

        return formatHtmlCode;
    }

}

My output:

This is a test. Will this text be <b>bold</b> or <i>italic</i>

I think this is what you want. I am only showing what is possible here: this is quick code written to produce the output, not an example of best practice.
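
As a possible tidy-up of the quick code above, here is a minimal sketch that avoids the shared `inBold`/`inItalic` flags by wrapping each formatting run on its own, and that escapes HTML special characters in the cell text. It is not part of the original answer. The `Run` class is a hypothetical stand-in for the data POI exposes per run (the substring plus the font's bold and italic flags), so the sketch runs without POI on the classpath; `RunsToHtml`, `toHtml` and `escapeHtml` are likewise names invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class RunsToHtml {

    /** Hypothetical stand-in for one formatting run: its text and two font flags. */
    public static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        public Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Escapes the characters that are special in HTML text content. */
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps every run in its own tags, so the tags are always properly nested. */
    static String toHtml(List<Run> runs) {
        StringBuilder html = new StringBuilder();
        for (Run run : runs) {
            if (run.text.isEmpty()) {
                continue; // nothing to emit for an empty run
            }
            if (run.bold) html.append("<b>");
            if (run.italic) html.append("<i>");
            html.append(escapeHtml(run.text));
            if (run.italic) html.append("</i>");
            if (run.bold) html.append("</b>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // The example from the question, split into runs by hand
        List<Run> runs = Arrays.asList(
                new Run("My name is ", false, false),
                new Run("Angel", true, false),
                new Run(" and ", false, false),
                new Run("Demon", false, true));
        System.out.println(toHtml(runs));
        // prints: My name is <b>Angel</b> and <i>Demon</i>
    }
}
```

With POI, the list of runs would presumably be built in the same loop the answer uses, taking the substring between `getIndexOfFormattingRun(i)` and the run's end and reading `getBold()` and `getItalic()` from the run's font. Because each run opens and closes its own tags, no state survives between runs or between cells.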

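
The question also mentions bullets and numbering, which the answer does not cover. As far as I know, Excel has no list formatting inside a cell, so users usually type a bullet character or "1." at the start of each line and separate lines with Alt+Enter, which is stored as a newline in the cell text. Under that assumption, a rough sketch of turning such text into HTML lists could look like the following. The class name, the accepted markers (`•`, `*`, `-`, and digits followed by `.` or `)`) and the `<p>` wrapping for ordinary lines are all choices made for illustration, and HTML escaping is left out for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellLinesToHtml {

    // Assumed markers: a bullet character, '*' or '-' followed by whitespace
    private static final Pattern BULLET =
            Pattern.compile("^\\s*[\u2022*-]\\s+(.*)$");
    // Assumed numbering style: digits followed by '.' or ')' and whitespace
    private static final Pattern NUMBER =
            Pattern.compile("^\\s*\\d+[.)]\\s+(.*)$");

    static String toHtml(String cellText) {
        StringBuilder html = new StringBuilder();
        String openList = null; // "ul", "ol", or null when no list is open

        for (String line : cellText.split("\\r?\\n")) {
            Matcher bullet = BULLET.matcher(line);
            Matcher number = NUMBER.matcher(line);
            String wanted = bullet.matches() ? "ul"
                    : number.matches() ? "ol" : null;

            // Close the current list when the line type changes
            if (openList != null && !openList.equals(wanted)) {
                html.append("</").append(openList).append(">");
                openList = null;
            }

            if (wanted == null) {
                // Ordinary line: emit a paragraph, skip blank lines
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    html.append("<p>").append(trimmed).append("</p>");
                }
            } else {
                if (openList == null) {
                    html.append("<").append(wanted).append(">");
                    openList = wanted;
                }
                String item = "ul".equals(wanted)
                        ? bullet.group(1) : number.group(1);
                html.append("<li>").append(item.trim()).append("</li>");
            }
        }

        if (openList != null) {
            html.append("</").append(openList).append(">");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        String cell = "Shopping:\n\u2022 milk\n\u2022 eggs\n1. pay\n2. leave";
        System.out.println(toHtml(cell));
    }
}
```

A line such as "- 5 items" would be read as a bullet by this sketch, so the marker patterns need adjusting to whatever convention the spreadsheet's users actually follow.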
