Problem description

Given an array of file names, the simplest way to sort it by file extension is this:

Array.Sort(fileNames,
    (x, y) => Path.GetExtension(x).CompareTo(Path.GetExtension(y)));

The problem is that on a very long list (~800k entries) this takes a long time to sort, while sorting by the whole file name is a couple of seconds faster!

In theory, there is a way to optimize it: instead of calling Path.GetExtension() and comparing the newly created extension-only strings, we can provide a Comparison that compares the existing file name strings starting from LastIndexOf('.'), without creating new strings.

Now, suppose I have found the LastIndexOf('.'). I want to reuse .NET's native StringComparer and apply it only to the part of the string after the LastIndexOf('.'), to preserve all culture considerations. I didn't find a way to do that.
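A minimal sketch of that idea, assuming non-null file names and a plain ordinal (code unit) ordering rather than a culture-aware one. The class name OrdinalExtensionComparison is hypothetical; string.CompareOrdinal with start indexes is a standard .NET overload that compares in place without allocating substrings.

```csharp
using System;

public static class OrdinalExtensionComparison
{
    // Hypothetical helper: compares the text after the last '.' of each name
    // by UTF-16 code unit value, without creating new strings.
    // Assumes both arguments are non-null.
    public static int Compare(string a, string b)
    {
        // LastIndexOf returns -1 when there is no dot; treat that as an empty extension.
        int i = a.LastIndexOf('.');
        int j = b.LastIndexOf('.');
        i = i < 0 ? a.Length : i + 1;
        j = j < 0 ? b.Length : j + 1;

        // Allow up to the longer of the two extensions; CompareOrdinal
        // stops at the end of whichever string runs out first.
        int length = Math.Max(a.Length - i, b.Length - j);
        return string.CompareOrdinal(a, i, b, j, length);
    }
}
```

It could be passed directly as the comparison, for example Array.Sort(fileNames, OrdinalExtensionComparison.Compare). Note that LastIndexOf('.') also matches a dot in a directory name when the file itself has no extension, which Path.GetExtension() would not.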

Any ideas?

Edit:

Using tanascius's idea of the char.CompareTo() method, I came up with my Uber-Fast-File-Extension-Comparer, and sorting by extension is now 3x faster! It is even faster than all the methods that use Path.GetExtension() in some manner. What do you think?

Edit 2:

I found that this implementation does not take culture into account, since the char.CompareTo() method does not consider culture, so this is not a perfect solution.
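To illustrate the difference, here is a small sketch comparing the same two letters both ways. The class and method names are hypothetical. char.CompareTo() compares UTF-16 code unit values, so 'B' (66) sorts before 'a' (97), whereas a culture-sensitive comparison typically places "a" before "B" (the exact result depends on the runtime's globalization data).

```csharp
using System;
using System.Globalization;

public static class OrdinalVersusCulture
{
    // Positive: 'a' has a larger code unit value than 'B'.
    public static int Ordinal() => 'a'.CompareTo('B');

    // Typically negative with full globalization data, because
    // linguistic ordering puts "a" before "B".
    public static int Cultural() =>
        string.Compare("a", "B", CultureInfo.InvariantCulture, CompareOptions.None);
}
```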

Any ideas?

    public static int CompareExtensions(string filePath1, string filePath2)
    {
        if (filePath1 == null && filePath2 == null)
        {
            return 0;
        }
        else if (filePath1 == null)
        {
            return -1;
        }
        else if (filePath2 == null)
        {
            return 1;
        }

        int i = filePath1.LastIndexOf('.');
        int j = filePath2.LastIndexOf('.');

        if (i == -1)
        {
            i = filePath1.Length;
        }
        else
        {
            i++;
        }

        if (j == -1)
        {
            j = filePath2.Length;
        }
        else
        {
            j++;
        }

        for (; i < filePath1.Length && j < filePath2.Length; i++, j++)
        {
            int compareResults = filePath1[i].CompareTo(filePath2[j]);

            if (compareResults != 0)
            {
                return compareResults;
            }
        }

        if (i >= filePath1.Length && j >= filePath2.Length)
        {
            return 0;
        }
        else if (i >= filePath1.Length)
        {
            return -1;
        }
        else
        {
            return 1;
        }
    }

Solution

You can write a comparer that compares the extensions character by character. char has a CompareTo() method, too.

Basically, you loop until at least one string has no characters left or one CompareTo() call returns a value != 0.

EDIT: In response to the OP's edits

The performance of your comparer method can be significantly improved. See the following code. Additionally, I added the line

string.Compare( filePath1[i].ToString(), filePath2[j].ToString(), 
                m_CultureInfo, m_CompareOptions );

to enable the use of CultureInfo and CompareOptions. However, this slows everything down by about a factor of 2 compared to a version using plain char.CompareTo(). But according to my own SO question, this seems to be the way to go.

using System.Collections.Generic;
using System.Globalization;

public sealed class ExtensionComparer : IComparer<string>
{
    private readonly CultureInfo m_CultureInfo;
    private readonly CompareOptions m_CompareOptions;

    public ExtensionComparer() : this( CultureInfo.CurrentUICulture, CompareOptions.None ) {}

    public ExtensionComparer( CultureInfo cultureInfo, CompareOptions compareOptions )
    {
        m_CultureInfo = cultureInfo;
        m_CompareOptions = compareOptions;
    }

    public int Compare( string filePath1, string filePath2 )
    {
        if( filePath1 == null || filePath2 == null )
        {
            if( filePath1 != null )
            {
                return 1;
            }
            if( filePath2 != null )
            {
                return -1;
            }
            return 0;
        }

        var i = filePath1.LastIndexOf( '.' ) + 1;
        var j = filePath2.LastIndexOf( '.' ) + 1;

        if( i == 0 || j == 0 )
        {
            if( i != 0 )
            {
                return 1;
            }
            return j != 0 ? -1 : 0;
        }

        while( true )
        {
            if( i == filePath1.Length || j == filePath2.Length )
            {
                if( i != filePath1.Length )
                {
                    return 1;
                }
                return j != filePath2.Length ? -1 : 0;
            }
            var compareResults = string.Compare( filePath1[i].ToString(), filePath2[j].ToString(), m_CultureInfo, m_CompareOptions );
            //var compareResults = filePath1[i].CompareTo( filePath2[j] );
            if( compareResults != 0 )
            {
                return compareResults;
            }
            i++;
            j++;
        }
    }
}

Usage:

fileNames1.Sort( new ExtensionComparer( CultureInfo.GetCultureInfo( "sv-SE" ),
                    CompareOptions.StringSort ) );
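
As a possible alternative, CompareInfo.Compare has an overload that takes an offset and a length for each string, so the two extensions can be compared as whole substrings without allocating and without converting each char to a string. The sketch below is an assumption about how that could look, not a measured improvement; the class name CultureExtensionComparer and the helper ExtensionStart are hypothetical. Comparing whole substrings may also differ from a per-character loop in cultures where several characters sort as one unit.

```csharp
using System.Collections.Generic;
using System.Globalization;

public sealed class CultureExtensionComparer : IComparer<string>
{
    private readonly CompareInfo m_CompareInfo;
    private readonly CompareOptions m_CompareOptions;

    public CultureExtensionComparer(CultureInfo cultureInfo, CompareOptions compareOptions)
    {
        m_CompareInfo = cultureInfo.CompareInfo;
        m_CompareOptions = compareOptions;
    }

    public int Compare(string filePath1, string filePath2)
    {
        // Nulls sort first, as in ExtensionComparer.
        if (filePath1 == null || filePath2 == null)
        {
            if (filePath1 != null) return 1;
            return filePath2 != null ? -1 : 0;
        }

        int i = ExtensionStart(filePath1);
        int j = ExtensionStart(filePath2);
        int length1 = filePath1.Length - i;
        int length2 = filePath2.Length - j;

        // An empty or missing extension sorts before any non-empty one.
        if (length1 == 0 || length2 == 0)
        {
            return length1.CompareTo(length2);
        }

        // Culture-aware comparison of the two extension ranges, in place.
        return m_CompareInfo.Compare(filePath1, i, length1,
                                     filePath2, j, length2,
                                     m_CompareOptions);
    }

    // Index of the first character after the last '.', or Length if there is no dot.
    private static int ExtensionStart(string path)
    {
        int dot = path.LastIndexOf('.');
        return dot < 0 ? path.Length : dot + 1;
    }
}
```

It is used the same way as ExtensionComparer, for example fileNames1.Sort( new CultureExtensionComparer( CultureInfo.GetCultureInfo( "sv-SE" ), CompareOptions.StringSort ) ). Whether it beats the per-character version would need to be measured on the real list.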
