本文介绍了AHO Corasick算法的状态转换表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述 请帮助我了解Aho-Corasick算法中多个模式的状态转换表的构造。Please help me to understand the construction of state transition table for more than one patterns in the Aho-Corasick algorithm.请给出简单详细的解释,以便我能理解。Please give a simple and detailed explanation so that I could understand.我正在关注本文和此处是动画。谢谢。推荐答案 第1阶段 创建关键字树:Starting at the root, follow the path labeled by chars of Pi If the path ends before Pi, continue it by adding new edges and ... nodes for the remaining characters of P Store identifier i of Pi at the terminal node of the path我通过一个示例进行演示:I demonstrate it by an example :假设我们有一组有限的模式P = {他,她,他的,她的}。Suppose we have a finite set of patterns P = {he, she, his, hers}. 接下来,我们将关键字树扩展为自动机,以支持线性时间匹配,如下所示。Next we extend the keyword tree into an automaton,to support linear-time matching as follows.显然,自动机由两部分组成:Obviously the automaton consists of two parts : 状态 转换 状态:正是由关键字树实现的节点;并且初始状态=树的根。States : Exactly the nodes achieved by the keyword tree; and initial state = root of the tree. 转换:使用 goto(q,a) 函数Transitions : using goto(q,a) function as follows./*This function gives the state entered from current state q by matching target char a*/- If edge (q,v) in the tree is labeled by a, then g(q,a)=v; - g(0,a)=0 for each a that does not label an edge out of the root//the automton stays at the initial state while scanning non-matching characters - O.W. g(q,a)={} 使用自动机:SearchofTarget T[1..m] //considering the target string T as an array of chars indexed from 1 to mq:= 0; // initial state(root)for i:= 1 to m do while g(q,T[i]) == {} do q:= fail(q); // follow a fail q:= goto(q,T[i]); // follow a goto if output(q) != {} then print i, out(q);endfor;如上面的伪代码所示,它调用了两个函数: fail(q) 和 output(q) As you see in the above pseudo-code it calls two functions : fail(q) and output(q) 失败(q) : //对于q!= 0给出在不匹配状态下输入的状态fail(q) : // for q !=0 gives the state entered at a mismatchfailure(q) is the node labeled by the longest proper suffix w of L(q) ... s.t. w is a prefix of some pattern//L(q) is the label of node q as the concatenation of edge labels ... on the path from the root to q 输出(q) :output(q) :gives the set of patterns recognized when entering state q计算完这两个函数后,自动机看起来像这样:After computation of these two functions, the automaton looks like this : 现在您可能想知道何时计算这些方法,因此请查看以下官方形式:Now you may wonder when to compute these methods, so please look at these for more official form : 希望获得帮助,如果仍有疑问,请随时询问。Hope this help and if still something is ambiguous please don't hesitate to ask. 这篇关于AHO Corasick算法的状态转换表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!
10-31 14:28