Problem description
I need a regular expression in Java that I can use to retrieve the domain.tld part from any URL. So https://foo.com/bar, http://www.foo.com#bar, and http://bar.foo.com should all return foo.com.
I wrote this regex, but it matches the whole URL:
Pattern.compile("[.]?.*[.x][a-z]{2,3}");
I'm not sure I'm matching the "." character correctly. I tried "\." but I get an error from NetBeans.
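On matching a literal ".": inside a Java string literal the backslash has to be doubled, so the regex \. is written "\\." in source code, while a single-backslash "\." is rejected by the compiler as an illegal escape. The "[.]" form also matches a literal dot. A minimal sketch of the difference follows; the class name DotEscapeDemo and the method containsLiteralDot are illustrative names, not part of the original question.

```java
import java.util.regex.Pattern;

final class DotEscapeDemo {
    // "\\." in Java source is the two-character regex \. (a literal dot).
    private static final Pattern ESCAPED_DOT = Pattern.compile("\\.");
    // A dot inside a character class is also literal.
    private static final Pattern CLASS_DOT = Pattern.compile("[.]");
    // An unescaped dot matches any character except line terminators.
    private static final Pattern ANY_CHAR = Pattern.compile(".");

    // Hypothetical helper: true if the input contains a literal dot.
    static boolean containsLiteralDot(String s) {
        return ESCAPED_DOT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteralDot("a.b"));          // true
        System.out.println(containsLiteralDot("ab"));           // false
        System.out.println(CLASS_DOT.matcher("ab").find());     // false
        System.out.println(ANY_CHAR.matcher("ab").find());      // true
    }
}
```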
Update:
The TLD is not limited to 2 or 3 characters, and http://www.foo.co.uk/bar should return foo.co.uk.
Recommended answer
I would use the java.net.URI class to extract the host name, and then use a regex to extract the last two parts of the host name.
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunIt {

    public static void main(String[] args) throws URISyntaxException {
        // Lazily skip any leading labels, then capture the last two dot-separated labels.
        Pattern p = Pattern.compile(".*?([^.]+\\.[^.]+)");

        String[] urls = new String[] {
            "https://foo.com/bar",
            "http://www.foo.com#bar",
            "http://bar.foo.com"
        };

        for (String url : urls) {
            URI uri = new URI(url);
            // e.g. uri.getHost() will return "www.foo.com"
            Matcher m = p.matcher(uri.getHost());
            if (m.matches()) {
                System.out.println(m.group(1));
            }
        }
    }
}
Prints:
foo.com
foo.com
foo.com
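Note that a pattern keeping only the last two labels returns co.uk for http://www.foo.co.uk/bar, not the foo.co.uk requested in the update. A regex alone cannot tell which suffixes span more than one label, so some list of known suffixes is needed. The following is a minimal sketch under that assumption: MULTI_LABEL_SUFFIXES is a tiny hard-coded, hypothetical stand-in for a real list (such as the Public Suffix List), and the names DomainSketch and registrableDomain are illustrative only.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DomainSketch {
    // Hypothetical, deliberately tiny stand-in for a real suffix list.
    private static final Set<String> MULTI_LABEL_SUFFIXES =
            new HashSet<>(Arrays.asList("co.uk", "org.uk", "com.au"));

    // Returns the last two labels of the host, or the last three when the
    // last two form a known multi-label suffix. Returns null if there is no host.
    static String registrableDomain(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null; // e.g. a relative or opaque URI
        }
        host = host.toLowerCase(Locale.ROOT);
        String[] labels = host.split("\\.");
        int n = labels.length;
        if (n < 2) {
            return host; // single-label host such as "localhost"
        }
        String lastTwo = labels[n - 2] + "." + labels[n - 1];
        if (n >= 3 && MULTI_LABEL_SUFFIXES.contains(lastTwo)) {
            return labels[n - 3] + "." + lastTwo;
        }
        return lastTwo;
    }

    public static void main(String[] args) {
        System.out.println(registrableDomain("https://foo.com/bar"));      // foo.com
        System.out.println(registrableDomain("http://www.foo.com#bar"));   // foo.com
        System.out.println(registrableDomain("http://www.foo.co.uk/bar")); // foo.co.uk
    }
}
```

This sketch treats every host as a dotted name, so an IP-address host would need separate handling, and any suffix missing from the set falls back to the last two labels.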