Regex: Find groups of lowercase letters between HTML tags

I'm attempting to develop a regular expression that can be run in Sigil, the ePub 2 editor.

Small caps are a well-known problem within the current ePub reader ecosystem. Many readers, such as Adobe Digital Editions, do not support "font-variant: small-caps". After trying several different workarounds, I've settled on creating fake small caps: I transform the text to uppercase and set the previously lowercase letters to "font-size: 0.75em".

This process is extremely tedious, especially in books that have lots of endnotes citing other books.

Say that I have a bunch of phrases in an HTML page tagged with an "SC" class. I've created a test phrase:

Hello World! Testing: one tWo thrEE & W.T.F. Don't touch me!

The goal is to write a regex that matches any lowercase letters within the "SC" span tag only, and replaces them with the same letters, uppercased and wrapped in a new "FSC" span:

LETTERS

I can match and replace the letters in the first word, "Hello", but everything breaks down after that. Here's what I've got so far:

Find: (.*?)([a-z]+)(.*)
Replace: \1\U\2\E\3

The tricky part is continuing to find the rest of the lowercase letters within that tag, now that a new "FSC" (Fake Small Caps) span tag has been introduced. Running the same regex again gives "span" and then "class" the FSC treatment. Ideally, I'd like to be able to keep hitting the "Replace All" button until no more matches are found.

The above example would look like this when finished:

HELLO WORLD! TESTING: ONE TWO THREE & W.T.F. Don't touch me!

It's not pretty, but it works on every ePub reader I've tested it on.

If you google "epub small caps regex", you'll come across a MobileRead wiki article that I edited to include the following regex, which I've decided is not satisfactory:

((?:.+?.+?)*[\.|,|:|;|-|–|—|!|\?]? ?(?:&)? ?[A-Z]+)([a-z'’\. ]+)(.*?)

It ends up miniaturizing a bunch of punctuation and sometimes stops in the middle of a phrase.
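The failure of the first Find/Replace attempt can be reproduced outside Sigil. The sketch below uses Python's re module and is only an illustration under assumptions: the scraped question lost its HTML, so the pattern anchored on an opening `<span class="SC">` tag and the `<span class="FSC">` replacement markup are reconstructions, and because re.sub has no \U...\E, a callback uppercases the captured run instead.

```python
import re

# Assumed reconstruction of the first attempt: anchor on the opening SC tag,
# lazily skip ahead, capture the first run of lowercase letters, keep the rest.
NAIVE = re.compile(r'(<span class="SC">.*?)([a-z]+)(.*)')


def naive_pass(html: str) -> str:
    """One 'Replace' click: wrap the first lowercase run in an (assumed) FSC span."""
    return NAIVE.sub(
        lambda m: f'{m.group(1)}<span class="FSC">{m.group(2).upper()}</span>{m.group(3)}',
        html,
        count=1,
    )


phrase = '<span class="SC">Hello World</span>'
first = naive_pass(phrase)
second = naive_pass(first)
print(first)   # <span class="SC">H<span class="FSC">ELLO</span> World</span>
print(second)  # <span class="SC">H<<span class="FSC">SPAN</span> class="FSC">ELLO</span> World</span>
```

The second pass lands on the "span" of the tag the first pass just inserted, because the lazy `.*?` happily walks into the new markup looking for the next lowercase run.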
I started over, thinking there was probably a better solution that doesn't try to plan for every single possibility up front. If someone comes up with a better solution to this, you'll be the hero of the entire ePub publishing industry.

Update

I've added the accepted (and only) answer to the MobileRead wiki. Please note that this regex has been altered specifically for use in Sigil; YMMV in other environments.

Solution

Perfect use case for "Collapse and Capture a Repeating Pattern in a Single Regex Expression". Modified for your case:

((?:(?!<\/span>)(?:[^a-z&]|&[^;]+;))*|(?!^)\G(?:(?!<\/span>)(?:[^a-z&]|&[^;]+;))*)([a-z]+)

Replace with: \1\U\2\E

The \G branch resumes exactly where the previous match ended, so with the global modifier each new match picks up inside the same span; (?!^) stops \G from matching at the very start of the text, (?!<\/span>) keeps a match from running past the closing tag, and &[^;]+; steps over entities such as &amp; so their letters are not uppercased.

Here's the regex explained: http://regex101.com/r/jU6bA5

This is a solution for "Replace All", as it works via the regex global modifier /g.
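Python's built-in re module has no \G anchor (the third-party regex package does), so the sketch below swaps in a different technique rather than porting the answer directly: it matches each SC span first, then rewrites the lowercase runs inside it with a callback, leaving entities such as &amp; alone. It assumes SC spans contain plain text with no nested tags, and that the FSC markup is `<span class="FSC">...</span>`; both are assumptions, not part of the answer.

```python
import re

# Assumed markup: <span class="SC">...</span> with no nested spans inside.
SC_SPAN = re.compile(r'(<span class="SC">)(.*?)(</span>)', re.DOTALL)

# Inside a span, match either an HTML entity (kept verbatim, like the answer's
# &[^;]+; branch) or a run of lowercase ASCII letters (to be converted).
ENTITY_OR_LOWER = re.compile(r'(&[^;]+;)|([a-z]+)')


def _wrap_lowercase(content: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return m.group(1)  # leave entities such as &amp; untouched
        return f'<span class="FSC">{m.group(2).upper()}</span>'

    return ENTITY_OR_LOWER.sub(repl, content)


def fake_small_caps(html: str) -> str:
    """Convert every SC span in one call, like a single 'Replace All'."""
    return SC_SPAN.sub(
        lambda m: m.group(1) + _wrap_lowercase(m.group(2)) + m.group(3), html
    )


sample = ('<span class="SC">Hello World! Testing: one tWo thrEE &amp; W.T.F.</span>'
          " Don't touch me!")
print(fake_small_caps(sample))
```

On the question's test phrase this wraps "ello", "orld", "esting", "one", "t", "o" and "thr" in FSC spans, keeps &amp; intact, and leaves the text after the closing tag untouched. Like the answer, it would also touch the letters of any nested tag names, hence the no-nested-tags assumption.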