本文介绍了检测序列是否在Python中的子序列的倍数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的零和一元组,例如:

(1,0,1,1,1,0,1,1,1,0,1,1)
 

事实证明:

(1,0,1,1,1,0,1,1,1,0,1,1)==(1,0,1,1)* 3
 

我想要一个函数 F 例如,如果取值是零和一的非空的元组, F(S)是最短的子元组研究,使得取值== R *ñ为正整数 N

因此​​,例如,

 F((1,0,1,1,1,0,1,1,1,0,1,1))==(1,0,1,1 )
 

什么是华而不实的方式来用Python语言编写的函数 F

编辑:

我目前用天真的方法

 高清F(S):
  对于i在范围(1,LEN(多个)):
    如果len(S)%我== 0和s == S [我] *(LEN(S)/ I):
      返回小号[我]
 

解决方案

我相信我有一个O(n)时间的解决方案(实际上是2N + R,N是元组的长度,r为子tuplle)不使用后缀树木,但使用字符串匹配算法(KMP一样,你应该找到现成的)。

我们使用下面的鲜为人知的定理:

 如果x,y是对一些字母串,

然后XY = YX当且仅当x = Z ^ k和Y = Z ^升一些串z和整数K,L。
 

我现在声称,对于我们的问题而言,这意味着我们需要做的是确定给定的元组/列表(或字符串)是其自身的循环移位!

要确定一个字符串是其自身的循环移位,我们与其自身串联(它甚至没有成为一个真正的CONCAT,只是一个虚拟的也可以),然后检查字符串匹配(与自己)。

有关的一个证明,假设该字符串是其自身的循环移位。

在我们有一个给定的字符串Y = UV =似曾相识。由于UV =似曾相识,我们必须有,美= Z ^ K和V = Z ^升,从上面的定理,因此Y = Z ^ {K + L〕。另一个方向是容易证明

下面是蟒蛇code。该方法被称为powercheck

 高清powercheck(LST):
    数= 0
    位置= 0
    在KnuthMorrisPratt(双(LST),LST)PO​​S:
        数+ = 1
        位置= POS
        如果count == 2:
            打破

    返回LST [:位置]


高清的两倍(LST):
    因为我在范围(1,3):
        对于ELEM在LST:
            产量ELEM

高清的main():
    打印powercheck([1,0,1,1,1,0,1,1,1,0,1,1])

如果__name__ ==__main__:
    主要()
 

这里是我以前(由于大卫​​Eppstein的)的KMP code。

 #克努特 - 莫里斯 - 普拉特字符串匹配
#大卫Eppstein的,加州大学欧文分校,2002年3月1日

高清KnuthMorrisPratt(文字,图案):

    '''产量在文本模式拷贝所有的起始位置。
调用约定类似于string.find,但它的参数可以是
列表或迭代器,不只是字符串,它返回所有的比赛,而不仅仅是
第一个,它并不立刻需要在存储器中的整个文本。
每当它的产量,它会读取文本完全直至并包括
导致产量的比赛。'''

    #允许索引到的模式和产量在防止变
    模式=表(样式)

    #构建的偏移量表
    偏移= [1] *(LEN(图案)+1)
    移= 1
    在范围(LEN(模式))POS:
        而换挡< = POS和模式[POS] =模式[POS-SHIFT]!
            Shift + =移[POS-SHIFT]
        移[POS + 1] =移

    #做的实际搜索
    startPos = 0
    matchLen = 0
    对于C文本:
        而matchLen == LEN(模式)或\
              matchLen> = 0和模式[matchLen] = C:!
            startPos + =移[matchLen]
            matchLen  -  =移[matchLen]
        matchLen + = 1
        如果matchLen == LEN(图案):
            产量startPos
 

有关样品此输出

  [1,0,1,1]
 

如预期。

余相比,这个反对shx2的code(不是numpy的一个),通过产生一个随机50比特串,然后复制,使总长度为100万美元。这是输出(十进制数是time.time的输出())

  1362988461.75

(50,[1,1,1,0,0,1,0,1,0,0,1,0,0,1,1,1,0,0,0,0,0,0,1, 1,1,1,0,0,0,1,1,0,1,1,1,1,1,1,1,0,0,1,1,1,0,0,0,0, 0,1])

1362988465.96

50 [1,1,1,0,0,1,0,1,0,0,1,0,0,1,1,1,0,0,0,0,0,0,1,1, 1,1,0,0,0,1,1,0,1,1,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0, 1]

1362988487.14
 

上面的方法了〜4秒,而shx2的方法了〜21秒!

下面是定时code。 (shx2的方法被称为powercheck2)。

 高清rand_bitstring(N):
    兰特= random.SystemRandom()
    LST = []
    对于j在范围(1,N + 1):
        R = rand.randint(1,2)
        若R == 2:
            lst.append(0)
        其他:
            lst.append(1)

    回报LST

高清的main():
    LST = rand_bitstring(50)* 20万
    打印time.time()
    打印powercheck(LST)
    打印time.time()
    powercheck2(LST)
    打印time.time()
 

I have a tuple of zeros and ones, for instance:

(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)

It turns out:

(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) == (1, 0, 1, 1) * 3

I want a function f such that if s is a non-empty tuple of zeros and ones, f(s) is the shortest subtuple r such that s == r * n for some positive integer n.

So for instance,

f( (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) ) == (1, 0, 1, 1)

What is a slick way to write the function f in Python?

Edit:

The naive method I am currently using

def f(s):
  for i in range(1,len(s)):
    if len(s)%i == 0 and s == s[:i] * (len(s)/i):
      return s[:i]
解决方案

I believe I have an O(n) time solution (actually 2n+r, n is length of tuple, r is sub tuplle) which does not use suffix trees, but uses a string matching algorithm (like KMP, which you should find off-the shelf).

We use the following little known theorem:

If x,y are strings over some alphabet,

then xy = yx if and only if x = z^k and y = z^l for some string z and integers k,l.

I now claim that, for the purposes of our problem, this means that all we need to do is determine if the given tuple/list (or string) is a cyclic shift of itself!

To determine if a string is a cyclic shift of itself, we concatenate it with itself (it does not even have to be a real concat, just a virtual one will do) and check for a substring match (with itself).

For a proof of that, suppose the string is a cyclic shift of itself.

The we have that the given string y = uv = vu.Since uv = vu, we must have that u = z^k and v= z^l and hence y = z^{k+l} from the above theorem. The other direction is easy to prove.

Here is the python code. The method is called powercheck.

def powercheck(lst):
    count = 0
    position = 0
    for pos in KnuthMorrisPratt(double(lst), lst):
        count += 1
        position = pos
        if count == 2:
            break

    return lst[:position]


def double(lst):
    for i in range(1,3):
        for elem in lst:
            yield elem

def main():
    print powercheck([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1])

if __name__ == "__main__":
    main()

And here is the KMP code which I used (due to David Eppstein).

# Knuth-Morris-Pratt string matching
# David Eppstein, UC Irvine, 1 Mar 2002

def KnuthMorrisPratt(text, pattern):

    '''Yields all starting positions of copies of the pattern in the text.
Calling conventions are similar to string.find, but its arguments can be
lists or iterators, not just strings, it returns all matches, not just
the first one, and it does not need the whole text in memory at once.
Whenever it yields, it will have read the text exactly up to and including
the match that caused the yield.'''

    # allow indexing into pattern and protect against change during yield
    pattern = list(pattern)

    # build table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos-shift]:
            shift += shifts[pos-shift]
        shifts[pos+1] = shift

    # do the actual search
    startPos = 0
    matchLen = 0
    for c in text:
        while matchLen == len(pattern) or \
              matchLen >= 0 and pattern[matchLen] != c:
            startPos += shifts[matchLen]
            matchLen -= shifts[matchLen]
        matchLen += 1
        if matchLen == len(pattern):
            yield startPos

For your sample this outputs

[1,0,1,1]

as expected.

I compared this against shx2's code(not the numpy one), by generating a random 50 bit string, then replication to make the total length as 1 million. This was the output (the decimal number is the output of time.time())

1362988461.75

(50, [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1])

1362988465.96

50 [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]

1362988487.14

The above method took ~4 seconds, while shx2's method took ~21 seconds!

Here was the timing code. (shx2's method was called powercheck2).

def rand_bitstring(n):
    rand = random.SystemRandom()
    lst = []
    for j in range(1, n+1):
        r = rand.randint(1,2)
        if r == 2:
            lst.append(0)
        else:
            lst.append(1)

    return lst

def main():
    lst = rand_bitstring(50)*200000
    print time.time()
    print powercheck(lst)
    print time.time()
    powercheck2(lst)
    print time.time()

这篇关于检测序列是否在Python中的子序列的倍数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-20 10:18