How to replace specific text with hyperlinks without modifying existing <img> and <a> tags?

Problem description

This is the error I want to fix:

<img class="lazy_responsive" title="<a href='kathryn-kuhlman-language-en-topics-718-page-1' title='Kathryn Kuhlman'>Kathryn Kuhlman</a> - iUseFaith.com" src="ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="<a href='kathryn-kuhlman-language-en-topics-718-page-1' title='Kathryn Kuhlman'>Kathryn Kuhlman</a> - iUseFaith.com" width="1600" height="517">

If you look carefully at the code above, you will see that the text in the alt and title attributes was replaced with a link because that text contained a keyword. As a result, the image's tooltip shows link markup instead of just a name.

Problem: I have an array of keywords in which each keyword has its own URL, which will serve as its link, like this:

$keywords["Kathryn Kuhlman"] = "https://www.iusefaith.com/en-354";
$keywords["Max KANTCHEDE"] = "https://www.iusefaith.com/MaxKANTCHEDE";

I have a text with images and links in which those keywords may be found.

$text='Meet God\'s General Kathryn Kuhlman. <br>
<img class="lazy_responsive" title="Kathryn Kuhlman - iUseFaith.com" src="https://www.iusefaith.com/ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="Kathryn Kuhlman - iUseFaith.com" width="1600" height="517" />
<br>
Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>
<br>
Max KANTCHEDE
';

I want to replace each keyword with a full link to that keyword, including a title attribute, without replacing the content of any href, alt, or title attribute that is already in the text. I did this:

$lien_existants = array();

$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

if(preg_match_all("/$regexp/siU", $text, $matches, PREG_SET_ORDER))
{
    foreach($matches as $match)
    {
        $lien_actuels_existant = filter_var($match[3], FILTER_SANITIZE_STRING);
        $lien_existants [] = trim($lien_actuels_existant);

        // $match[2] = link address
        // $match[3] = link text

        echo $match[2], '', $match[3], '<br>';
    }
}

foreach(@$keywords as $name => $value)
{
    if(!in_array($name, $lien_existants)&&!preg_match("/'/i", $name)&&!preg_match('/"/i', $name))
    {
        $text =  trim(preg_replace('~(\b'. $name.'\b)~ui', "<a href='$value' title='$name'>$1</a>", $text));
    }
    else
    {
        $name = addslashes($name);
        $text =  trim(preg_replace('~(\b'. $name.'\b)~ui', "<a href='$value' title='$name'>$1</a>", $text));
    }
    #########################################
}

This replaces the words with links, but it also replaces them inside the image's alt and title attributes.

How can I prevent it from replacing the text in alt, title, and href?

Note: I have tried all the other solutions I have found on S.O., so if you think one of them works, kindly apply it to my code above and show me how it should be done. If I knew how to make it work, I would not be asking here.

Recommended answer

I think @Jiwoks' answer was on the right path in using DOM parsing calls to isolate the qualifying text nodes.

While his answer works on the OP's sample data, I was dissatisfied to find that his solution fails when more than one string needs to be replaced in a single text node.

I've crafted my own solution with the goal of accommodating case-insensitive matching, word boundaries, multiple replacements in a single text node, and the insertion of fully qualified nodes (not merely new strings that look like child nodes).

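Code: (Demo #1, with 2 replacements in a text node) (Demo #2, with the OP's text)
(After receiving fuller, more realistic text from the OP: Demo #3, without trimming saveHTML())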

$html = <<<HTML
Meet God's General Kathryn Kuhlman. <br>
<img class="lazy_responsive" title="Kathryn Kuhlman - iUseFaith.com" src="https://www.iusefaith.com/ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="Kathryn Kuhlman - iUseFaith.com" width="1600" height="517" />
<br>
Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>
<br>
Max KANTCHEDE & Kathryn Kuhlman
HTML;

$keywords = [
    'Kathryn Kuhlman' => 'https://www.example.com/en-354',
    'Max KANTCHEDE' => 'https://www.example.com/MaxKANTCHEDE',
    'eneral' => 'https://www.example.com/this-is-not-used',
];

// Silence libxml warnings; the sample markup has loose text and no root element.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// Skip the implied <html>/<body> wrappers and the doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

// Build a lowercased keyword => URL map and the escaped keywords for the regex.
$lookup = [];
$regexNeedles = [];
foreach ($keywords as $name => $link) {
    $lookup[strtolower($name)] = $link;
    $regexNeedles[] = preg_quote($name, '~');
}
$pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~i';

// Visit only text nodes whose parent is neither <img> nor <a>.
foreach($xpath->query('//*[not(self::img or self::a)]/text()') as $textNode) {
    $newNodes = [];
    $hasReplacement = false;
    // Split into keyword and non-keyword fragments, keeping the keywords.
    foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
        $fragmentLower = strtolower($fragment);
        if (isset($lookup[$fragmentLower])) {
            // Keyword: build a real <a> element.
            $hasReplacement = true;
            $a = $dom->createElement('a');
            $a->setAttribute('href', $lookup[$fragmentLower]);
            $a->setAttribute('title', $fragment);
            $a->nodeValue = $fragment;
            $newNodes[] = $a;
        } else {
            // Anything else stays plain text.
            $newNodes[] = $dom->createTextNode($fragment);
        }
    }
    // Swap the original text node for the new nodes only if something changed.
    if ($hasReplacement) {
        $newFragment = $dom->createDocumentFragment();
        foreach ($newNodes as $newNode) {
            $newFragment->appendChild($newNode);
        }
        $textNode->parentNode->replaceChild($newFragment, $textNode);
    }
}
// Strip the <p> and </p> that the parser wrapped around the loose text.
echo substr(trim($dom->saveHTML()), 3, -4);

Output:

Meet God's General <a href="https://www.example.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>. <br>
<img class="lazy_responsive" title="Kathryn Kuhlman - iUseFaith.com" src="https://www.iusefaith.com/ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="Kathryn Kuhlman - iUseFaith.com" width="1600" height="517">
<br>
Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>
<br>
<a href="https://www.example.com/MaxKANTCHEDE" title="Max KANTCHEDE">Max KANTCHEDE</a> &amp; <a href="https://www.example.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>

Some explanatory points:

  • I am using some DOMDocument silencing and flags because the sample input is missing a parent tag to contain all of the text. (There is nothing wrong with @Jiwoks' technique; this is just a different one. Choose whichever you like.)
  • A lookup array with lowercased keys is declared to allow case-insensitive translations on qualifying text.
  • The regex pattern is constructed dynamically, so each keyword should be passed through preg_quote() to ensure that the pattern logic is upheld. \b is a word boundary metacharacter that prevents matching a substring inside a longer word. Notice that eneral is not replaced in General in the output. The case-insensitive flag i allows greater flexibility for this application and future applications.
  • My xpath query is identical to @Jiwoks'; I see no reason to change it. It seeks text nodes that are not the children of <img> or <a> tags.
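
To make the effect of that XPath query concrete, here is a minimal sketch. The markup in $html is made up for illustration (it is not the OP's text), and it puts the same keyword in loose text, in attribute values, and inside an existing link.

```php
<?php
// Assumed sample markup: the keyword appears in plain text, in <img>
// attributes, and as the text of an existing <a>.
$html = '<p>Hello Kathryn Kuhlman <img alt="Kathryn Kuhlman" src="x.jpg"> '
      . 'and <a href="#" title="Kathryn Kuhlman">Kathryn Kuhlman</a>!</p>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Collect the value of every text node whose parent is not <img> or <a>.
$selected = [];
foreach ($xpath->query('//*[not(self::img or self::a)]/text()') as $textNode) {
    $selected[] = $textNode->nodeValue;
}

var_export($selected);
```

Only the text nodes that sit directly inside the <p> are collected. The link text is skipped because its parent is an <a>, and attribute values are never text nodes, so alt, title, and href are out of reach of any later replacement.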

...now it gets a little fiddly... Now that we are dealing with isolated text nodes, regex can be used to differentiate qualifying strings from non-qualifying strings.
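
As a small illustration of how the dynamically built pattern separates qualifying from non-qualifying strings, the sketch below runs it against a single made-up sentence. The variable names $needles and $text and the sentence itself are assumptions for this example; the pattern is built the same way as in the answer's code.

```php
<?php
$keywords = [
    'Kathryn Kuhlman' => 'https://www.example.com/en-354',
    'eneral'          => 'https://www.example.com/this-is-not-used',
];

$lookup = [];
$needles = [];
foreach ($keywords as $name => $link) {
    $lookup[strtolower($name)] = $link;   // lowercased key for case-insensitive lookup
    $needles[] = preg_quote($name, '~');  // escape metacharacters and the ~ delimiter
}
$pattern = '~\b(' . implode('|', $needles) . ')\b~i';

// Assumed test sentence: mixed-case keyword plus "General", which contains "eneral".
$text = "Meet God's General KATHRYN kuhlman.";
preg_match_all($pattern, $text, $matches);

foreach ($matches[1] as $hit) {
    // Look the match up by its lowercased form to find the URL.
    echo $hit, ' => ', $lookup[strtolower($hit)], PHP_EOL;
}
```

The mixed-case name is matched thanks to the i flag and resolved through the lowercased lookup key, while eneral inside General is rejected because there is no word boundary between G and e.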

  • preg_split() creates a flat, indexed array of non-empty substrings. Substrings that qualify for translation are isolated as their own elements, and any non-qualifying substrings between them are isolated as elements too.

  • The final text node in my sample will generate 4 elements:

0 => '
',                                 // non-qualifying newline
1 => 'Max KANTCHEDE',              // translatable string
2 => ' & ',                        // non-qualifying text
3 => 'Kathryn Kuhlman'             // translatable string
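
The four elements above can be reproduced in isolation. This is a sketch that hard-codes the pattern and the text node's value rather than building them from $keywords, and it also shows what happens when PREG_SPLIT_DELIM_CAPTURE is left out.

```php
<?php
// Hard-coded for the example; the answer builds this pattern from $keywords.
$pattern = '~\b(Max KANTCHEDE|Kathryn Kuhlman)\b~i';

// Value of the final text node in the sample: a newline, then the last line.
$nodeValue = "\nMax KANTCHEDE & Kathryn Kuhlman";

// With DELIM_CAPTURE the matched keywords are kept as their own elements.
$fragments = preg_split(
    $pattern,
    $nodeValue,
    0,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

// Without DELIM_CAPTURE the keywords are consumed as delimiters and lost.
$withoutCapture = preg_split($pattern, $nodeValue, 0, PREG_SPLIT_NO_EMPTY);

var_export($fragments);
echo PHP_EOL;
var_export($withoutCapture);
```

The capturing group combined with PREG_SPLIT_DELIM_CAPTURE is what lets a single call return both the text to keep and the keywords to turn into links, in document order. PREG_SPLIT_NO_EMPTY drops the empty string that would otherwise follow the last keyword.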

For translatable strings, new <a> nodes are created and filled with the appropriate attributes and text, then pushed into a temporary array.

For non-translatable strings, text nodes are created, then pushed into the same temporary array.

If any translations/replacements have been made, the DOM is updated; otherwise, no mutation of the document is necessary.

In the end, the finalized HTML document is echoed. Because your sample input has some text that is not inside any tag, the temporary leading <p> and trailing </p> tags that DOMDocument added for stability must be removed to restore the structure to its original form. If all text is enclosed in tags, you can just use saveHTML() without any hacking at the string.
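
If trimming with substr() feels too fragile, one possible alternative (not part of the original answer) is to supply your own wrapper element and then serialize only its children. The sketch below assumes a <div> wrapper and a made-up ASCII input string; the keyword replacement loop would run between loading and serializing.

```php
<?php
// Assumed input with loose text that is not inside any tag.
$html = 'Loose text <br> and <b>bold</b> text';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
// The <div> wrapper is an assumption of this sketch: it gives the loose text
// a parent element, so the parser does not need to add a <p> of its own.
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// ... the text node replacement loop would go here ...

// Serialize each child of the wrapper, leaving the wrapper itself out.
$out = '';
foreach ($dom->documentElement->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}

echo $out;
```

This avoids depending on the exact length of the tags the parser adds, at the cost of assuming that the input does not itself contain a stray closing </div>.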


08-12 09:56