Using Apache Tika's MediaType correctly

Question

I want to use Apache Tika's MediaType class to compare media types.

I first use Tika to detect the MediaType. Then I want to start an action according to the MediaType.

So if the MediaType is an XML type I want to do one action, and if it is a compressed file I want to start another action.

My problem is that there are many XML types, so how do I check whether a given MediaType is XML?

Here is my previous (pre-Tika) implementation:

if (contentType.contains("text/xml") ||
    contentType.contains("application/xml") ||
    contentType.contains("application/x-xml") ||
    contentType.contains("application/atom+xml") ||
    contentType.contains("application/rss+xml")) {
        processXML();
}

else if (contentType.contains("application/gzip") ||
    contentType.contains("application/x-gzip") ||
    contentType.contains("application/x-gunzip") ||
    contentType.contains("application/gzipped") ||
    contentType.contains("application/gzip-compressed") ||
    contentType.contains("application/x-compress") ||
    contentType.contains("gzip/document") ||
    contentType.contains("application/octet-stream")) {
        processGzip();
}

I want to switch it to use Tika, something like the following:

MediaType mediaType = MediaType.parse(contentType);
if (mediaType == APPLICATION_XML) {
    return processXml();
} else if (mediaType == APPLICATION_ZIP || mediaType == OCTET_STREAM) {
    return processGzip();
}

But the problem is that Tika.detect(...) returns many different types which don't have a MediaType constant.

How can I tell whether a MediaType is an XML type, or a compressed type? I need a "parent" type that includes all of its children, perhaps via a method such as "boolean isXML()", which would cover application/xml, text/xml and application/x-xml, or "boolean isCompress()", which would cover all of the zip and gzip types, etc.

Recommended answer

What you'll need to do is walk the type hierarchy until you either find what you want or run out of things to check. That can be done with recursion or with a loop.

The key method you need is MediaTypeRegistry.getSupertype(MediaType).

Your code would want to be something like:

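import org.apache.tika.mime.MediaType;
import org.apache.tika.mime.MediaTypeRegistry;

// Define your media type constants here
MediaType FOO = MediaType.parse("application/foo");

// The registry that knows the type hierarchy
MediaTypeRegistry registry = MediaTypeRegistry.getDefaultRegistry();

// Work out the file's type (detector, stream and metadata come from your own setup)
MediaType type = detector.detect(stream, metadata);

// Walk up the hierarchy until we reach a type we handle
while (type != null && !type.equals(MediaType.OCTET_STREAM)) {
   if (type.equals(MediaType.APPLICATION_XML)) {
       doThingForXML();
       break;
   } else if (type.equals(MediaType.APPLICATION_ZIP)) {
       doThingForZip();
       break;
   } else if (type.equals(FOO)) {
       doThingForFoo();
       break;
   }
   // No match yet: check the parent type
   type = registry.getSupertype(type);
}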
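
The question asked for helpers along the lines of "boolean isXML()" and "boolean isCompress()". The sketch below shows how such helpers could sit on top of a hierarchy walk, using the recursive form rather than the loop. To stay self-contained it does not use Tika at all: the PARENT table, the class name MediaTypeWalk and the method descendsFrom are hypothetical stand-ins, and the table entries are illustrative assumptions rather than Tika's real registry data. In real code the map lookup would be replaced by a call to MediaTypeRegistry.getSupertype.

```java
import java.util.Map;

public class MediaTypeWalk {
    // Hypothetical child -> parent table standing in for Tika's registry.
    // These entries are assumptions for illustration only.
    static final Map<String, String> PARENT = Map.of(
        "application/rss+xml", "application/xml",
        "application/atom+xml", "application/xml",
        "application/xml", "text/plain",
        "text/plain", "application/octet-stream",
        "application/epub+zip", "application/zip",
        "application/zip", "application/octet-stream",
        "application/gzip", "application/octet-stream");

    // Recursive walk: true if type equals target, or any ancestor of type does.
    static boolean descendsFrom(String type, String target) {
        if (type == null) {
            return false;               // ran out of parents to check
        }
        if (type.equals(target)) {
            return true;                // found what we were looking for
        }
        return descendsFrom(PARENT.get(type), target);
    }

    static boolean isXml(String type) {
        return descendsFrom(type, "application/xml");
    }

    static boolean isCompressed(String type) {
        return descendsFrom(type, "application/zip")
            || descendsFrom(type, "application/gzip");
    }

    public static void main(String[] args) {
        System.out.println(isXml("application/rss+xml"));         // true
        System.out.println(isCompressed("application/epub+zip")); // true
        System.out.println(isXml("application/gzip"));            // false
    }
}
```

Tika's MediaTypeRegistry also appears to offer an isInstanceOf(MediaType, MediaType) method that performs this kind of check; it is worth confirming against the Javadoc for your Tika version before writing your own walk.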
