Problem description

Given an array of file names, the simplest way to sort it by file extension is this:

Array.Sort(fileNames,
    (x, y) => Path.GetExtension(x).CompareTo(Path.GetExtension(y)));

The problem is that on a very long list (~800k entries) this takes a long time to sort, while sorting by the whole file name is a couple of seconds faster!

In theory, there is a way to optimize it: instead of calling Path.GetExtension() and comparing the newly created extension-only strings, we can provide a Comparison that compares the existing file name strings starting from LastIndexOf('.'), without creating new strings.

Now, suppose I have found the LastIndexOf('.'). I want to reuse .NET's native StringComparer and apply it only to the part of the string after the LastIndexOf('.'), to preserve all culture considerations. I haven't found a way to do that.

Any ideas?

Edit:

Using tanascius's idea of the char.CompareTo() method, I came up with my Uber-Fast-File-Extension-Comparer; it now sorts by extension 3x faster! It is even faster than all the methods that use Path.GetExtension() in some manner. What do you think?

Edit 2:

I found that this implementation does not take culture into account, since the char.CompareTo() method is not culture-aware, so this is not a perfect solution.

Any ideas?

    public static int CompareExtensions(string filePath1, string filePath2)
    {
        if (filePath1 == null && filePath2 == null)
        {
            return 0;
        }
        else if (filePath1 == null)
        {
            return -1;
        }
        else if (filePath2 == null)
        {
            return 1;
        }

        int i = filePath1.LastIndexOf('.');
        int j = filePath2.LastIndexOf('.');

        if (i == -1)
        {
            i = filePath1.Length;
        }
        else
        {
            i++;
        }

        if (j == -1)
        {
            j = filePath2.Length;
        }
        else
        {
            j++;
        }

        for (; i < filePath1.Length && j < filePath2.Length; i++, j++)
        {
            int compareResults = filePath1[i].CompareTo(filePath2[j]);

            if (compareResults != 0)
            {
                return compareResults;
            }
        }

        if (i >= filePath1.Length && j >= filePath2.Length)
        {
            return 0;
        }
        else if (i >= filePath1.Length)
        {
            return -1;
        }
        else
        {
            return 1;
        }
    }

Solution

You can write a comparer that compares each character of the extension. char has a CompareTo() method, too.

Basically, you loop until at least one string has no characters left or a CompareTo() call returns a value != 0.
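
As a rough sketch of that idea (not the code from this answer), the character loop can also be delegated to string.CompareOrdinal with start offsets, which compares the two substrings by char value without allocating. The class name OrdinalExtensionComparer and the helper ExtensionStart are made up for illustration, and the sketch assumes a file without a dot should be treated as having an empty extension, as in the question's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical name; a sketch of an ordinal (culture-insensitive) extension comparer.
public sealed class OrdinalExtensionComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        // Nulls sort first, two nulls are equal.
        if (x == null || y == null)
        {
            if (x == null && y == null) return 0;
            return x == null ? -1 : 1;
        }

        int i = ExtensionStart(x);
        int j = ExtensionStart(y);

        // Compare at most as many chars as the longer extension has.
        int maxLength = Math.Max(x.Length - i, y.Length - j);

        // Compares x[i..] with y[j..] by char value, no substrings created.
        return string.CompareOrdinal(x, i, y, j, maxLength);
    }

    // Index of the first extension char, or path.Length when there is no dot
    // (assumption: "no dot" means "empty extension").
    private static int ExtensionStart(string path)
    {
        int dot = path.LastIndexOf('.');
        return dot < 0 ? path.Length : dot + 1;
    }
}
```

It could be used as Array.Sort(fileNames, new OrdinalExtensionComparer()). Like char.CompareTo(), this ordering ignores culture.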

EDIT: In response to the edits of the OP

The performance of your comparer method can be significantly improved. See the following code. Additionally, I added the line

string.Compare( filePath1[i].ToString(), filePath2[j].ToString(), 
                m_CultureInfo, m_CompareOptions );

to enable the use of CultureInfo and CompareOptions. However, this slows everything down by about a factor of 2 compared to a version using a plain char.CompareTo(). But according to my own SO question, this seems to be the way to go.

public sealed class ExtensionComparer : IComparer<string>
{
    private readonly CultureInfo m_CultureInfo;
    private readonly CompareOptions m_CompareOptions;

    public ExtensionComparer() : this( CultureInfo.CurrentUICulture, CompareOptions.None ) {}

    public ExtensionComparer( CultureInfo cultureInfo, CompareOptions compareOptions )
    {
        m_CultureInfo = cultureInfo;
        m_CompareOptions = compareOptions;
    }

    public int Compare( string filePath1, string filePath2 )
    {
        if( filePath1 == null || filePath2 == null )
        {
            if( filePath1 != null )
            {
                return 1;
            }
            if( filePath2 != null )
            {
                return -1;
            }
            return 0;
        }

        var i = filePath1.LastIndexOf( '.' ) + 1;
        var j = filePath2.LastIndexOf( '.' ) + 1;

        if( i == 0 || j == 0 )
        {
            if( i != 0 )
            {
                return 1;
            }
            return j != 0 ? -1 : 0;
        }

        while( true )
        {
            if( i == filePath1.Length || j == filePath2.Length )
            {
                if( i != filePath1.Length )
                {
                    return 1;
                }
                return j != filePath2.Length ? -1 : 0;
            }
            var compareResults = string.Compare( filePath1[i].ToString(), filePath2[j].ToString(), m_CultureInfo, m_CompareOptions );
            //var compareResults = filePath1[i].CompareTo( filePath2[j] );
            if( compareResults != 0 )
            {
                return compareResults;
            }
            i++;
            j++;
        }
    }
}

Usage:

fileNames1.Sort( new ExtensionComparer( CultureInfo.GetCultureInfo( "sv-SE" ),
                    CompareOptions.StringSort ) );
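
A possible alternative, which is not part of the original answer: CompareInfo.Compare has an overload that takes a start offset for each string, so the culture-aware comparison can run directly on the tail of each path without the per-character ToString() calls. The sketch below assumes that a file without a dot has an empty extension (as in the question's code, unlike the class above), and the names OffsetExtensionComparer and ExtensionStart are hypothetical. Comparing whole substrings may order some inputs differently from a char-by-char comparison, for example where a culture treats several characters as one sorting unit, so results should be checked against the cultures you care about.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical name; a sketch of a culture-aware extension comparer.
public sealed class OffsetExtensionComparer : IComparer<string>
{
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _options;

    public OffsetExtensionComparer(CultureInfo culture, CompareOptions options)
    {
        _compareInfo = culture.CompareInfo;
        _options = options;
    }

    public int Compare(string x, string y)
    {
        // Nulls sort first, two nulls are equal.
        if (x == null || y == null)
        {
            if (x == null && y == null) return 0;
            return x == null ? -1 : 1;
        }

        // Compares x from its extension start to its end with the same
        // section of y, using the culture's rules, without new strings.
        return _compareInfo.Compare(x, ExtensionStart(x), y, ExtensionStart(y), _options);
    }

    // Index of the first extension char, or path.Length when there is no dot
    // (assumption: "no dot" means "empty extension").
    private static int ExtensionStart(string path)
    {
        int dot = path.LastIndexOf('.');
        return dot < 0 ? path.Length : dot + 1;
    }
}
```

Usage mirrors the class above, for example fileNames1.Sort(new OffsetExtensionComparer(CultureInfo.GetCultureInfo("sv-SE"), CompareOptions.StringSort)). Whether it is faster than the per-character version on an 800k list would need to be measured.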
