本文介绍了使用后缀树/数组的最长非重叠重复子串(仅算法)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要找到一个字符串中最长的非重叠重复子字符串。我有可用的字符串的后缀树和后缀数组。

I need to find the longest non-overlapping repeated substring in a String. I have the suffix tree and suffix array of the string available.

允许重叠时,答案很简单(后缀树中最深的父节点)。

When overlapping is allowed, the answer is trivial (deepest parent node in suffix tree).

例如String = acaca

For example for String = "acaca"

如果允许重叠,则答案为 aca,但不允许重叠允许,答案是 ac或 ca。

If overlapping is allowed, the answer is "aca" but when overlapping is not allowed, the answer is "ac" or "ca".

我只需要算法或高级思路。

I need the algorithm or high level idea only.

PS:我尝试过,但是没有明确的答案可以在网上找到。

P.S.: I tried but there is no clear answer I can find on web.

推荐答案

生成后缀数组并按O(nlogn).ps排序:有更有效的算法,例如DC3和Ukkonen算法。
示例:

Generate suffix array and sort in O(nlogn).ps: There is more effective algorithm like DC3 and Ukkonen algorithm.example:

字符串:ababc
后缀数组:
子字符串的起始索引| substring

0-ababc

2-abc

1-babc

3-bc

4- c

String : ababcSuffix array:start-index of substring | substring
0 - ababc
2 - abc
1 - babc
3 - bc
4 - c

比较每个连续的两个子字符串,并获得具有以下约束的公共前缀:

i1 是子字符串 ababc 的索引: 0

i2 是子字符串 abc 2

的公共前缀为 ab ,公共前缀的长度为 len

compare each two consecutive substring and get common prefix with following constraint:
Say i1 is index of substring "ababc": 0
say i2 is index of substring "abc":2
common prefix is "ab" , length of common prefix is len

abs(i1-i2)> len //避免重叠

abs(i1-i2) >len //avoid overlap

通过带有解决方案的后缀数组,您将获得 ababc的结果,即 ab ;

go through suffix array with solution, and you will get the result of "ababc", which is "ab";

整个解决方案将运行O(nlogn)

The whole solution will run O(nlogn)

但是,会有一个特例: aaaaa,该解决方案无法彻底解决。

欢迎讨论并提出O(nlogn)而不是O(n ^ 2)的解决方案

However, there will be a special case: "aaaaa" that this solution can't solve thoroughly.
Welcome to discuss and come up to a solution of O(nlogn) but not O(n^2)

这篇关于使用后缀树/数组的最长非重叠重复子串(仅算法)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-14 18:43