This article shows how, in Python, to get the substrings of length n that are common to every element of a list of strings.

Problem description

My problem may be similar to this one, but the situation is different. Consider this input list:

['ACCCACCCGTGG','AATCCC','CCCTGAGG']

The other input is n, a number giving the length of the substring that must be common to every element of the list. The output should be the most frequently occurring such substring together with its number of occurrences, similar to this:

{'CCC' : 4}

4 because 'CCC' appears twice in the first element of the list and once in each of the other two strings. CCC because it is the substring of length 3 that occurs at least once in every string. I started this way:

def get_n_repeats_list(n, seq_list):
    max_substring = {}
    list_seq = list(seq_list)
    for i in range(0, len(list_seq)):
        if i + 1 < len(list_seq):
            # Idea: get the elements in common by comparing two strings at a time
            # in_common = set(list_seq[i]) - set(list_seq[i + 1])
            # max_substring...
            pass  # placeholder: the comparison is not implemented yet
    return max_substring
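The comments in the stub compare two strings at a time using sets of characters. A sketch of one way to finish that idea, assuming we intersect sets of length-n windows (substrings) rather than sets of single characters, and do it across the whole list at once; `common_ngrams` is a hypothetical helper name, not part of the question:

```python
def common_ngrams(strings: list[str], n: int) -> set[str]:
    """Return the length-n substrings that occur at least once in every string."""
    # For each string, collect every length-n window (overlapping windows included).
    ngram_sets = [{s[i:i + n] for i in range(len(s) - n + 1)} for s in strings]
    # Keep only the windows present in every set; an empty input gives an empty set.
    return set.intersection(*ngram_sets) if ngram_sets else set()


print(common_ngrams(['ACCCACCCGTGG', 'AATCCC', 'CCCTGAGG'], 3))  # {'CCC'}
```

This only finds which substrings are shared; counting their occurrences is a separate step.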

Maybe there is a solution here.
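Whatever method finds the shared substring, the expected count of 4 is the total number of occurrences across all strings, and in general that total should include overlapping matches. A minimal sketch of that counting rule (the helper `count_overlapping` is a hypothetical name). Note that Python's built-in `str.count` counts only non-overlapping matches, so `'CCCC'.count('CCC')` returns 1 rather than 2:

```python
def count_overlapping(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlaps (e.g. 'CCC' twice in 'CCCC')."""
    return sum(1 for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i))


strings = ['ACCCACCCGTGG', 'AATCCC', 'CCCTGAGG']
per_string = [count_overlapping(s, 'CCC') for s in strings]
print(per_string)       # [2, 1, 1]
print(sum(per_string))  # 4
```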

Recommended answer

So this is my take on it. It is definitely not the prettiest thing on the planet, but it should work just fine.

a = ['ACCCACCCGTGG', 'AATCCC', 'CCCTGAGG']

def occur(the_list, a_substr):
    """Count every (possibly overlapping) occurrence of a_substr in all strings of the_list."""
    i_found = 0
    for a_string in the_list:
        # Slide a window of len(a_substr) over a_string and compare it to a_substr.
        for i_str in range(len(a_string) - len(a_substr) + 1):
            if a_substr == a_string[i_str:i_str + len(a_substr)]:
                i_found += 1
    return i_found

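def found_str(original_List, n):
    result_dict = {}
    if n > min(map(len, original_List)):
        # Raise instead of calling exit(), so the caller can handle the error.
        raise ValueError("The substring cannot be longer than the shortest string!")
    # Join the strings with a separator that does not occur in the data, so that
    # windows spanning two different strings can be detected and skipped.
    specialChar = '|'
    b = specialChar.join(original_List)
    str_list = []
    # There are len(b) - n + 1 windows; range(len(b) - n) would skip the last one.
    for i in range(len(b) - n + 1):
        currStr = b[i:i+n]
        if specialChar not in currStr:
            str_list.append(currStr)
    str_list = set(str_list)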

    # Keep only the candidates that appear in every string, with their total count.
    for sub_strs in str_list:
        if all(sub_strs in strs for strs in original_List):
            result_dict[sub_strs] = occur(original_List, sub_strs)

    if not result_dict:
        print("No common substrings of length {} were found".format(n))

    return result_dict

end = found_str(a, 3)
print(end)

Output: {'CCC': 4}
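`found_str` returns every length-n substring shared by all strings, while the question asks only for the most frequent one. A more compact sketch using `collections.Counter` that returns just that one, in the `{substring: count}` form from the question; the function name `most_common_shared_substring` and the alphabetical tie-break are assumptions, not part of the original answer:

```python
from collections import Counter


def most_common_shared_substring(strings, n):
    """Return {substring: total_count} for the most frequent length-n substring
    that occurs in every string, or {} if there is none."""
    if n <= 0 or not strings:
        return {}
    # All length-n windows of each string (overlapping windows included).
    windows = [[s[i:i + n] for i in range(len(s) - n + 1)] for s in strings]
    # Total count of each window across every string.
    totals = Counter(w for ws in windows for w in ws)
    # Windows that appear at least once in every string.
    shared = set.intersection(*(set(ws) for ws in windows))
    if not shared:
        return {}
    # Highest total wins; ties are broken alphabetically for a deterministic result.
    best = min(shared, key=lambda sub: (-totals[sub], sub))
    return {best: totals[best]}


print(most_common_shared_substring(['ACCCACCCGTGG', 'AATCCC', 'CCCTGAGG'], 3))
# {'CCC': 4}
```

Unlike `found_str`, this sketch returns `{}` instead of raising or printing when `n` is longer than some string or no substring is shared.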

10-20 21:17