Regex String.match() on Chinese strings works differently in IE8 than in IE9/Chrome/Firefox/…

Problem description

When I use the regex \s to match the full-width ideographic space used in Chinese text (　, U+3000) in IE8, it returns false, which I took to mean that it is a two-byte character. On the other hand, in IE9 (and later versions of IE, Chrome, Firefox, Safari, etc.), it returns true, which would mean that 　 is a one-byte character. Actually, it is a two-byte character.
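A quick way to reproduce the IE9+ side of this in a modern (ES5 or later) engine, assuming Node.js or a current browser console. It also shows that a \s match says nothing about byte width: JavaScript strings are sequences of UTF-16 code units, and the number of bytes a character takes depends on the encoding chosen (TextEncoder always encodes UTF-8).

```javascript
// Full-width ideographic space, U+3000
const ideographicSpace = "\u3000";

// ES5+ engines put every Unicode Zs character in \s, so this is true;
// per the question, IE8's engine returned false here.
const matchesWhitespace = /\s/.test(ideographicSpace);

// The regex result is unrelated to byte width: U+3000 is a single
// UTF-16 code unit in a JavaScript string.
const utf16Units = ideographicSpace.length;

// Its byte size depends on the encoding; TextEncoder produces UTF-8.
const utf8Bytes = new TextEncoder().encode(ideographicSpace).length;

console.log(matchesWhitespace, utf16Units, utf8Bytes); // true 1 3
```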

I want to know whether there is any difference between IE8's regex engine and those of newer browsers. If so, what is it?

The code is as follows:

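function OneByteCharCheck(value) {
    // Accept only ASCII letters, digits and symbols, whitespace matched by \s,
    // and half-width katakana and punctuation (U+FF61 to U+FF9F).
    if (value.match(/^[a-zA-Z0-9@\;\:\[\]\{\}\|\^\=\/\!\*\`\"\#\$\+\%\&\'\(\)\,\.\<\>\-\_\?\\\s()ｧ-ﾝﾞﾟ ｡｢｣､･ｦ~]*$/)) {
        return true;
    }
    return false;
}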
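The sketch below exercises OneByteCharCheck in a modern engine (assuming Node.js or a current browser; the sample inputs are illustrative). Because the character class contains \s, the full-width space U+3000 is accepted, which is the IE9+ behavior described above, while other full-width characters are still rejected.

```javascript
// The question's check, with the redundant outer (?:...)* group removed
// (the set of accepted strings is the same).
function OneByteCharCheck(value) {
    if (value.match(/^[a-zA-Z0-9@\;\:\[\]\{\}\|\^\=\/\!\*\`\"\#\$\+\%\&\'\(\)\,\.\<\>\-\_\?\\\s()ｧ-ﾝﾞﾟ ｡｢｣､･ｦ~]*$/)) {
        return true;
    }
    return false;
}

console.log(OneByteCharCheck("abc 123"));  // true: ASCII letters, digits, space
console.log(OneByteCharCheck("ｱｲｳ"));      // true: half-width katakana
console.log(OneByteCharCheck("あいう"));    // false: full-width hiragana
console.log(OneByteCharCheck("a\u3000b")); // true in ES5+ engines, because \s matches U+3000
```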

Recommended answer

The difference is due to the poor conformance of IE8 and earlier to many web standards, even at the time of its release.

Running my test page on BrowserStack shows that IE8 only manages to match 1 of the 17 characters in the Unicode Space Separator (Zs) category, while IE9 successfully matches all 17 (including U+3000). This is most likely due to Microsoft's effort to conform to web standards from IE9 onward.

Even IE8 (released March 2009) only managed to match 1 character, probably the space (U+0020), which means its JavaScript engine is not even compliant with ECMA-262 3rd edition (released December 1999). Section 7.2 White Space of the 3rd edition explicitly lists U+00A0 NO-BREAK SPACE, along with any other Unicode "space separator" (category Zs), and \s is defined to match every WhiteSpace and LineTerminator character. While not all Zs code points existed from the beginning, U+00A0, U+2000 to U+200A, and U+3000 were already assigned in Unicode 2.0 (July 1996).
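To make the 1/17 versus 17/17 result concrete, the sketch below counts how many of the test page's Zs characters two classes accept: the running engine's own \s, and a hypothetical LEGACY_WHITESPACE class. That class assumes IE8's \s covered only ASCII whitespace (space, form feed, line feed, carriage return, tab, vertical tab); the assumption is consistent with the single match reported for IE8, but it is an approximation, not IE8's actual engine.

```javascript
// The 17 Zs code points used by the test page.
const Zs = ["\u0020", "\u00a0", "\u1680", "\u2000", "\u2001",
            "\u2002", "\u2003", "\u2004", "\u2005", "\u2006",
            "\u2007", "\u2008", "\u2009", "\u200a", "\u202f",
            "\u205f", "\u3000"];

// Assumption: an ASCII-only class approximating IE8's \s.
const LEGACY_WHITESPACE = /^[ \f\n\r\t\v]$/;
// The running engine's own \s (ES5 and later include every Zs character).
const MODERN_WHITESPACE = /^\s$/;

// No g/y flag, so test() keeps no state between calls.
function countMatches(re, chars) {
  return chars.filter((ch) => re.test(ch)).length;
}

console.log(countMatches(LEGACY_WHITESPACE, Zs) + "/" + Zs.length); // 1/17
console.log(countMatches(MODERN_WHITESPACE, Zs) + "/" + Zs.length); // 17/17
```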

As bobince mentioned in a comment, it is best to list all the characters explicitly instead of using shorthand character classes such as \s. This ensures consistent behavior across old and new browsers.
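Applied to the question's function, that advice could look like the sketch below. It assumes the intent was to allow printable ASCII (U+0020 to U+007E, which covers every ASCII symbol listed in the original class), ASCII whitespace only, and half-width katakana and punctuation (U+FF61 to U+FF9F); oneByteCharCheckExplicit is a hypothetical name. It sticks to var and ES3-era regex syntax so that, in principle, old and new engines see the same character set.

```javascript
// Hypothetical rewrite of OneByteCharCheck that spells out every allowed
// character instead of relying on \s:
//   \u0020-\u007E  printable ASCII (space, letters, digits, symbols)
//   \t\n\v\f\r     ASCII whitespace only (assumed intent)
//   \uFF61-\uFF9F  half-width katakana and half-width CJK punctuation
var ONE_BYTE_ONLY = /^[\u0020-\u007E\t\n\v\f\r\uFF61-\uFF9F]*$/;

function oneByteCharCheckExplicit(value) {
  return ONE_BYTE_ONLY.test(value);
}

console.log(oneByteCharCheckExplicit("abc 123"));  // true
console.log(oneByteCharCheckExplicit("ｱｲｳ"));      // true
console.log(oneByteCharCheckExplicit("a\u3000b")); // false: U+3000 is not listed
```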

This is the source code of my test page:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
</head>
<body>
<script type="text/javascript">
  // The 17 code points in the Unicode Zs (Space_Separator) category
  var Zs = ['\u0020', '\u00a0', '\u1680', '\u2000', '\u2001',
            '\u2002', '\u2003', '\u2004', '\u2005', '\u2006',
            '\u2007', '\u2008', '\u2009', '\u200a', '\u202f',
            '\u205f', '\u3000'];

  // Count how many of them this engine's \s recognizes
  // (test() returns a boolean, which += coerces to 0 or 1)
  var count = 0;
  for (var i = 0; i < Zs.length; i++) {
      count += /^\s$/.test(Zs[i]);
  }

  document.write("<h2>" + count + "/" + Zs.length + " in Unicode Zs category passed the test</h2>");
</script>
</body>
</html>

Windows 7 IE8 screenshot (via BrowserStack)
Windows 7 IE9 screenshot (via BrowserStack)
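The hard-coded Zs array in the test page can be cross-checked against an engine's own Unicode data. The sketch below assumes an engine with Unicode property escapes (\p{...} with the u flag, added in ES2018), such as a current Node.js; IE does not support them at all, so this only confirms the list and does not replace the test page.

```javascript
// Matches a single character whose general category is Zs.
const ZS_CHAR = /^\p{Zs}$/u;

// Scan the Basic Multilingual Plane; in current Unicode versions every
// Zs character lies in U+0000 to U+FFFF.
const zsCodePoints = [];
for (let cp = 0; cp <= 0xffff; cp++) {
  if (ZS_CHAR.test(String.fromCharCode(cp))) {
    zsCodePoints.push(cp);
  }
}

// Format as U+XXXX to compare with the hard-coded list.
const zsLabels = zsCodePoints.map(
  (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0")
);

console.log(zsLabels.length); // 17 with Unicode 6.3 or later
console.log(zsLabels.join(" "));
```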
