问题描述
我需要找到一个字符串中最长的非重叠重复子字符串。我有可用的字符串的后缀树和后缀数组。
I need to find the longest non-overlapping repeated substring in a String. I have the suffix tree and suffix array of the string available.
允许重叠时,答案很简单(后缀树中最深的父节点)。
When overlapping is allowed, the answer is trivial (deepest parent node in suffix tree).
例如String = acaca
For example for String = "acaca"
如果允许重叠,则答案为 aca,但不允许重叠允许,答案是 ac或 ca。
If overlapping is allowed, the answer is "aca" but when overlapping is not allowed, the answer is "ac" or "ca".
我只需要算法或高级思路。
I need the algorithm or high level idea only.
PS:我尝试过,但是没有明确的答案可以在网上找到。
P.S.: I tried but there is no clear answer I can find on web.
推荐答案
生成后缀数组并按O(nlogn).ps排序:有更有效的算法,例如DC3和Ukkonen算法。
示例:
Generate suffix array and sort in O(nlogn).ps: There is more effective algorithm like DC3 and Ukkonen algorithm.example:
字符串:ababc
后缀数组:
子字符串的起始索引| substring
0-ababc
2-abc
1-babc
3-bc
4- c
String : ababcSuffix array:start-index of substring | substring
0 - ababc
2 - abc
1 - babc
3 - bc
4 - c
比较每个连续的两个子字符串,并获得具有以下约束的公共前缀:
说 i1 是子字符串 ababc 的索引: 0
说 i2 是子字符串 abc : 2
的公共前缀为 ab ,公共前缀的长度为 len
compare each two consecutive substring and get common prefix with following constraint:
Say i1 is index of substring "ababc": 0
say i2 is index of substring "abc":2
common prefix is "ab" , length of common prefix is len
abs(i1-i2)> len //避免重叠
abs(i1-i2) >len //avoid overlap
通过带有解决方案的后缀数组,您将获得 ababc的结果,即 ab ;
go through suffix array with solution, and you will get the result of "ababc", which is "ab";
整个解决方案将运行O(nlogn)
The whole solution will run O(nlogn)
但是,会有一个特例: aaaaa,该解决方案无法彻底解决。
欢迎讨论并提出O(nlogn)而不是O(n ^ 2)的解决方案
However, there will be a special case: "aaaaa" that this solution can't solve thoroughly.
Welcome to discuss and come up to a solution of O(nlogn) but not O(n^2)
这篇关于使用后缀树/数组的最长非重叠重复子串(仅算法)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!