Regular expression to retrieve domain.tld


Question

I need a regular expression in Java that I can use to retrieve the domain.tld part from any URL. So https://foo.com/bar, http://www.foo.com#bar, and http://bar.foo.com should all return foo.com.

I wrote this regex, but it matches the whole URL:

Pattern.compile("[.]?.*[.x][a-z]{2,3}");

I'm not sure I'm matching the "." character correctly. I tried "." but I get an error from NetBeans.
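
A note on matching the dot: in a regex an unescaped . matches any character, so a literal dot has to be written as \. or [.]. Inside a Java string literal the backslash must itself be doubled, giving "\\."; a single backslash before the dot is an illegal escape sequence and the compiler rejects it, which may be the error reported above (an assumption, since the exact attempt is not shown). A minimal sketch, with the class name DotEscapeDemo chosen purely for illustration:

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // An unescaped dot is a wildcard: it matches any single character.
    static final Pattern ANY_CHAR = Pattern.compile(".");
    // "\\." in Java source is the regex \. which matches only a literal dot.
    static final Pattern ESCAPED_DOT = Pattern.compile("\\.");
    // A character class is an alternative that needs no backslash at all.
    static final Pattern CLASS_DOT = Pattern.compile("[.]");

    public static void main(String[] args) {
        System.out.println(ANY_CHAR.matcher("x").matches());    // true
        System.out.println(ESCAPED_DOT.matcher("x").matches()); // false
        System.out.println(ESCAPED_DOT.matcher(".").matches()); // true
        System.out.println(CLASS_DOT.matcher(".").matches());   // true
    }
}
```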

Update:

The TLD is not limited to 2 or 3 characters, and http://www.foo.co.uk/bar should return foo.co.uk.

Recommended answer

I would use the java.net.URI class to extract the host name, and then use a regex to extract the last two dot-separated parts of the host name.

import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunIt {

    public static void main(String[] args) throws URISyntaxException {
        Pattern p = Pattern.compile(".*?([^.]+\\.[^.]+)");

        String[] urls = new String[] {
                "https://foo.com/bar",
                "http://www.foo.com#bar",
                "http://bar.foo.com"
        };

        for (String url:urls) {
            URI uri = new URI(url);
            //eg: uri.getHost() will return "www.foo.com"
            Matcher m = p.matcher(uri.getHost());
            if (m.matches()) {
                System.out.println(m.group(1));
            }
        }
    }
}

Prints:

foo.com
foo.com
foo.com
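
The regex above always keeps the last two labels, so for the case in the question's update, http://www.foo.co.uk/bar, it would yield co.uk instead of foo.co.uk. No regex can tell that co.uk is a suffix while foo.com is not; that needs a list of known suffixes. The following is only a rough sketch of that idea: the class name RegistrableDomain, the method fromUrl, and the three-entry suffix set are all hypothetical, and a real solution would rely on a complete list such as the Public Suffix List.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class RegistrableDomain {
    // Hypothetical, deliberately tiny sample of two-label suffixes.
    private static final Set<String> TWO_LABEL_SUFFIXES =
            new HashSet<>(Arrays.asList("co.uk", "org.uk", "com.au"));

    static String fromUrl(String url) throws URISyntaxException {
        String host = new URI(url).getHost();
        if (host == null) {
            return null; // the URL has no parsable host component
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        int n = labels.length;
        if (n <= 2) {
            return String.join(".", labels); // already domain.tld or shorter
        }
        String lastTwo = labels[n - 2] + "." + labels[n - 1];
        // Keep one extra label when the last two form a known suffix.
        if (TWO_LABEL_SUFFIXES.contains(lastTwo)) {
            return labels[n - 3] + "." + lastTwo;
        }
        return lastTwo;
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(fromUrl("http://www.foo.com#bar"));   // foo.com
        System.out.println(fromUrl("http://www.foo.co.uk/bar")); // foo.co.uk
    }
}
```

Splitting on the literal dot replaces the regex here because the suffix lookup needs the individual labels anyway.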
