This article covers what to do when an HTML file fetched with 'wget' is reported by 'less' as a binary file.

Question

If I use wget to download this page:

wget http://www.aqr.com/ResearchDetails.htm -O page.html

and then try to view the page in less, less reports the file as binary:

less page.html 
"page.html" may be a binary file.  See it anyway? 

Here are the response headers:

Accept-Ranges:bytes
Cache-Control:private
Content-Encoding:gzip
Content-Length:8295
Content-Type:text/html
Cteonnt-Length:44064
Date:Sun, 25 Sep 2011 12:15:53 GMT
ETag:"c0859e4e785ecc1:6cd"
Last-Modified:Fri, 19 Aug 2011 14:00:09 GMT
Server:Microsoft-IIS/6.0
X-Powered-By:ASP.NET

Opening the file in vim works fine.

Any clues as to why less cannot handle it?

Recommended answer

It's a UTF-16 encoded file (you can check with the W3C Validator). You can convert it to UTF-8 with this command:

wget http://www.aqr.com/ResearchDetails.htm -q -O - | iconv -f utf-16 -t utf-8 > page.html
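If you already have page.html on disk, a small check before converting can help. The sketch below reads the first two bytes of the file and only runs iconv when they form a UTF-16 byte order mark (FF FE or FE FF). The function name to_utf8_if_utf16 is hypothetical. The sketch assumes the page begins with a BOM, so a UTF-16 file without one would be passed through unchanged. It relies only on head, od, tr, iconv and cat.

```shell
# Hypothetical helper: print a file as UTF-8, converting from UTF-16
# only when it starts with a UTF-16 byte order mark (FF FE or FE FF).
to_utf8_if_utf16() {
    # First two bytes as lowercase hex with spaces/newlines removed, e.g. "fffe".
    bom=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        fffe|feff)
            # With "utf-16" as the source, iconv uses the BOM to pick the byte order.
            iconv -f utf-16 -t utf-8 "$1" ;;
        *)
            # No UTF-16 BOM: pass the file through unchanged.
            cat "$1" ;;
    esac
}

# Usage (assumes page.html was saved with the original wget command):
# to_utf8_if_utf16 page.html | less
```

The BOM is checked here rather than the response headers, because the Content-Type header shown in the question is just text/html with no charset.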

less usually handles UTF-8.
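The following is my interpretation, not part of the original answer. UTF-16 stores each ASCII character in two bytes, and one of those bytes is 0x00. less inspects the beginning of a file and warns that it "may be a binary file" when it finds several non-printable control bytes like these. The snippet below makes the NUL bytes visible by encoding a short string as UTF-16LE. The string '<html>' is only an example.

```shell
# Encode a short ASCII string as UTF-16LE (no BOM) and dump the bytes.
# Each character is followed by a NUL byte, which od -c prints as \0.
printf '<html>' | iconv -f utf-8 -t utf-16le | od -An -c

# Keep only the NUL bytes and count them: one per ASCII character.
printf '<html>' | iconv -f utf-8 -t utf-16le | tr -cd '\000' | wc -c

# To view page.html without writing a converted copy (assumes the
# file is UTF-16 with a BOM, as the answer above reports):
# iconv -f utf-16 -t utf-8 page.html | less
```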

Edit

As @Stephen C reported, the less shipped with Red Hat supports UTF-16. It looks to me as if Red Hat patched less to add UTF-16 support. On the official less site, UTF-16 support is currently an open issue (ref. number 282).

10-29 17:55