Problem description

I want to use Apache Tika's MediaType class to compare media types.

I first use Tika to detect the MediaType. Then I want to start an action based on that MediaType.

So if the MediaType is an XML type I want to do one action, and if it is a compressed file I want to start another action.

My problem is that there are many XML types, so how do I use MediaType to check whether a file is XML?

Here is my previous (before Tika) implementation:

if (contentType.contains("text/xml") ||
    contentType.contains("application/xml") ||
    contentType.contains("application/x-xml") ||
    contentType.contains("application/atom+xml") ||
    contentType.contains("application/rss+xml")) {
        processXML();
}

else if (contentType.contains("application/gzip") ||
    contentType.contains("application/x-gzip") ||
    contentType.contains("application/x-gunzip") ||
    contentType.contains("application/gzipped") ||
    contentType.contains("application/gzip-compressed") ||
    contentType.contains("application/x-compress") ||
    contentType.contains("gzip/document") ||
    contentType.contains("application/octet-stream")) {
        processGzip();
}
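
Even without Tika, the check above can be made exact instead of substring-based. The following is a minimal sketch (Java 9+ for Set.of; the class ContentTypes and its methods baseType, isXml and isGzip are hypothetical names): it strips parameters such as "; charset=UTF-8", lowercases the type, and looks it up in sets built from the lists above. Exact matching avoids accidental hits: "application/xml-dtd".contains("application/xml") is true, so the original check would treat a DTD as XML.

```java
import java.util.Locale;
import java.util.Set;

// Sketch: exact matching on the base media type instead of contains(...).
final class ContentTypes {
    // Lists copied from the contains(...) checks in the original code.
    static final Set<String> XML_TYPES = Set.of(
            "text/xml", "application/xml", "application/x-xml",
            "application/atom+xml", "application/rss+xml");
    static final Set<String> GZIP_TYPES = Set.of(
            "application/gzip", "application/x-gzip", "application/x-gunzip",
            "application/gzipped", "application/gzip-compressed",
            "application/x-compress", "gzip/document", "application/octet-stream");

    private ContentTypes() {}

    // "Text/XML; charset=UTF-8" -> "text/xml": drop parameters, trim, lowercase.
    static String baseType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    static boolean isXml(String contentType) {
        return XML_TYPES.contains(baseType(contentType));
    }

    static boolean isGzip(String contentType) {
        return GZIP_TYPES.contains(baseType(contentType));
    }
}
```

This still requires listing every alias by hand, which is exactly the limitation the question is about.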

I want to switch it to use Tika, something like the following:

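MediaType mediaType = MediaType.parse(contentType);
// Compare with equals(), not ==, since MediaType is a value object
if (MediaType.APPLICATION_XML.equals(mediaType)) {
    return processXml();
} else if (MediaType.APPLICATION_ZIP.equals(mediaType) || MediaType.OCTET_STREAM.equals(mediaType)) {
    return processGzip();
}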

But the problem is that Tika.detect(...) returns many different types which don't have a MediaType constant.

How can I tell whether a MediaType is an XML type? Or a compressed type? I need a "parent" type that includes all of its children, for example a method "boolean isXML()" that covers application/xml, text/xml and application/x-xml, or "boolean isCompress()" that covers all the zip and gzip types, etc.

Recommended answer

What you'll need to do is walk the type hierarchy until you either find what you want or run out of things to check. That can be done with recursion or with a loop.
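
A minimal sketch of both variants. So that it runs without Tika, it uses a small hand-written parent map as a stand-in for the registry's supertype lookup; the map entries, the class name TypeTree and the helpers supertype, isARecursive and isALoop are all hypothetical. In real code the parent of each type would come from Tika instead.

```java
import java.util.Map;

// Toy type hierarchy: child -> parent. Entries are illustrative, not Tika's data.
final class TypeTree {
    static final String ROOT = "application/octet-stream";
    static final Map<String, String> PARENT = Map.of(
            "application/rss+xml", "application/xml",
            "application/atom+xml", "application/xml",
            "application/xml", "text/plain",
            "text/plain", ROOT);

    private TypeTree() {}

    // Stand-in for a supertype lookup: unknown types fall back to ROOT, ROOT has no parent.
    static String supertype(String type) {
        if (type == null || type.equals(ROOT)) {
            return null;
        }
        return PARENT.getOrDefault(type, ROOT);
    }

    // Recursive walk: is `type` equal to `ancestor`, or somewhere below it?
    static boolean isARecursive(String type, String ancestor) {
        if (type == null) {
            return false;
        }
        return type.equals(ancestor) || isARecursive(supertype(type), ancestor);
    }

    // The same walk written as a loop.
    static boolean isALoop(String type, String ancestor) {
        for (String t = type; t != null; t = supertype(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }
}
```

Both walks stop once they pass the root (whose parent is null), so they terminate as long as the parent chain contains no cycles.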

The key method you need is MediaTypeRegistry.getSupertype(MediaType).

Your code would look something like this:

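import org.apache.tika.mime.MediaType;
import org.apache.tika.mime.MediaTypeRegistry;

// Define your media type constants here
MediaType FOO = MediaType.parse("application/foo");

// The registry knows the parent (supertype) of each media type
MediaTypeRegistry registry = MediaTypeRegistry.getDefaultRegistry();

// Work out the file's type (detector, stream and metadata come from your own code)
MediaType type = detector.detect(stream, metadata);

// Walk up the tree until we reach a type we handle, or run out of parents
while (type != null && !type.equals(MediaType.OCTET_STREAM)) {
   if (type.equals(MediaType.APPLICATION_XML)) {
       doThingForXML();
       break;
   } else if (type.equals(MediaType.APPLICATION_ZIP)) {
       doThingForZip();
       break;
   } else if (type.equals(FOO)) {
       doThingForFoo();
       break;
   } else {
       // Check parent
       type = registry.getSupertype(type);
   }
}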
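
Compressed formats may be a case where there is no single parent type to compare against (I am assuming here that gzip, zip and compress do not share a common parent in the registry). One way to get an isCompress()-style check is to keep a set of "family" types and walk up from the detected type until some ancestor is in that set. A minimal sketch, using String types and a caller-supplied parent function so it runs without Tika; the names Families, COMPRESSED and inFamily, and the set's contents, are hypothetical:

```java
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch of a family check for types that have no single common parent.
final class Families {
    // Hypothetical family: compressed formats the application wants to treat alike.
    static final Set<String> COMPRESSED = Set.of(
            "application/gzip", "application/zip", "application/x-compress");

    private Families() {}

    // Walk from `type` through its ancestors; true if any of them is in `family`.
    // `supertype` must return null once there is no further parent.
    static boolean inFamily(String type, Set<String> family, UnaryOperator<String> supertype) {
        for (String t = type; t != null; t = supertype.apply(t)) {
            if (family.contains(t)) {
                return true;
            }
        }
        return false;
    }
}
```

With Tika, the same loop would work on MediaType values, using registry.getSupertype(...) as the step and a Set<MediaType> such as Set.of(MediaType.APPLICATION_ZIP, MediaType.parse("application/gzip")).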

05-18 01:05