Problem description
Given an array of file names, the simplest way to sort it by file extension is this:
Array.Sort(fileNames,
(x, y) => Path.GetExtension(x).CompareTo(Path.GetExtension(y)));
The problem is that on a very long list (~800k entries) this takes a long time to sort, while sorting by the whole file name is faster by a couple of seconds!
In theory, there is a way to optimize it: instead of calling Path.GetExtension() and comparing the newly created extension-only strings, we can provide a Comparison that compares the existing file name strings starting from LastIndexOf('.'), without creating new strings.
Now, suppose I have found the LastIndexOf('.'). I want to reuse .NET's native StringComparer and apply it only to the part of the string after the LastIndexOf('.'), to preserve all culture considerations. I didn't find a way to do that.
Any ideas?
Edit:
Using tanascius's idea of the char.CompareTo() method, I came up with my Uber-Fast-File-Extension-Comparer; sorting by extension is now 3x faster! It is even faster than every method that uses Path.GetExtension() in some manner. What do you think?
Edit 2:
I found that this implementation does not take culture into account, since the char.CompareTo() method is not culture-aware, so it is not a perfect solution.
Any ideas?
public static int CompareExtensions(string filePath1, string filePath2)
{
    if (filePath1 == null && filePath2 == null)
    {
        return 0;
    }
    else if (filePath1 == null)
    {
        return -1;
    }
    else if (filePath2 == null)
    {
        return 1;
    }

    int i = filePath1.LastIndexOf('.');
    int j = filePath2.LastIndexOf('.');

    if (i == -1)
    {
        i = filePath1.Length;
    }
    else
    {
        i++;
    }

    if (j == -1)
    {
        j = filePath2.Length;
    }
    else
    {
        j++;
    }

    for (; i < filePath1.Length && j < filePath2.Length; i++, j++)
    {
        int compareResults = filePath1[i].CompareTo(filePath2[j]);

        if (compareResults != 0)
        {
            return compareResults;
        }
    }

    if (i >= filePath1.Length && j >= filePath2.Length)
    {
        return 0;
    }
    else if (i >= filePath1.Length)
    {
        return -1;
    }
    else
    {
        return 1;
    }
}
You can write a comparer that compares each character of the extension. char has a CompareTo() method, too.
Basically, you loop until at least one string has no characters left or one CompareTo() call returns a value != 0.
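As a compact illustration of that loop, here is a minimal sketch that lets string.CompareOrdinal do the character-by-character work on the extension tails, so no substrings are created. The class name OrdinalExtensionComparison and the helper ExtensionStart are made up for this example, and it assumes that a file name without a dot should be treated as having an empty extension. Like char.CompareTo(), this is an ordinal comparison and ignores culture.

```csharp
using System;

public static class OrdinalExtensionComparison
{
    // Hypothetical helper: index of the first character after the last '.',
    // or the string length when there is no dot (empty extension).
    private static int ExtensionStart(string path)
    {
        int dot = path.LastIndexOf('.');
        return dot < 0 ? path.Length : dot + 1;
    }

    public static int Compare(string a, string b)
    {
        // Nulls sort before non-null values.
        if (a == null || b == null)
        {
            return a == null ? (b == null ? 0 : -1) : 1;
        }

        int i = ExtensionStart(a);
        int j = ExtensionStart(b);

        // Pass the longer remainder as the maximum length so that neither
        // extension is cut short by the length argument.
        int length = Math.Max(a.Length - i, b.Length - j);

        // Ordinal (culture-insensitive) comparison of the two tails.
        return string.CompareOrdinal(a, i, b, j, length);
    }
}
```

It can be passed directly as a Comparison, for example Array.Sort(fileNames, OrdinalExtensionComparison.Compare).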
EDIT: In response to the OP's edits
The performance of your comparer method can be significantly improved. See the following code. Additionally, I added the line
string.Compare( filePath1[i].ToString(), filePath2[j].ToString(),
                m_CultureInfo, m_CompareOptions );
to enable the use of CultureInfo and CompareOptions. However, this slows everything down by about a factor of 2 compared to a version using a plain char.CompareTo(). But according to my own SO question, this seems to be the way to go.
using System.Collections.Generic;
using System.Globalization;

public sealed class ExtensionComparer : IComparer<string>
{
    private readonly CultureInfo m_CultureInfo;
    private readonly CompareOptions m_CompareOptions;

    public ExtensionComparer() : this( CultureInfo.CurrentUICulture, CompareOptions.None ) {}

    public ExtensionComparer( CultureInfo cultureInfo, CompareOptions compareOptions )
    {
        m_CultureInfo = cultureInfo;
        m_CompareOptions = compareOptions;
    }

    public int Compare( string filePath1, string filePath2 )
    {
        // Nulls sort before non-null values.
        if( filePath1 == null || filePath2 == null )
        {
            if( filePath1 != null )
            {
                return 1;
            }
            if( filePath2 != null )
            {
                return -1;
            }
            return 0;
        }

        // Index of the first extension character; 0 means "no dot found".
        var i = filePath1.LastIndexOf( '.' ) + 1;
        var j = filePath2.LastIndexOf( '.' ) + 1;

        // Names without a dot sort before names with one.
        if( i == 0 || j == 0 )
        {
            if( i != 0 )
            {
                return 1;
            }
            return j != 0 ? -1 : 0;
        }

        while( true )
        {
            // When one extension is exhausted, the shorter one sorts first.
            if( i == filePath1.Length || j == filePath2.Length )
            {
                if( i != filePath1.Length )
                {
                    return 1;
                }
                return j != filePath2.Length ? -1 : 0;
            }

            // Culture-aware comparison of one character at a time.
            var compareResults = string.Compare( filePath1[i].ToString(), filePath2[j].ToString(), m_CultureInfo, m_CompareOptions );
            //var compareResults = filePath1[i].CompareTo( filePath2[j] );
            if( compareResults != 0 )
            {
                return compareResults;
            }

            i++;
            j++;
        }
    }
}
Usage:
fileNames1.Sort( new ExtensionComparer( CultureInfo.GetCultureInfo( "sv-SE" ),
CompareOptions.StringSort ) );
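For comparison, here is a hedged sketch of a variant that avoids the per-character ToString() allocations by handing the whole extension range to CompareInfo.Compare, which accepts an offset and a length for each string. The class name OffsetExtensionComparer is made up for this example. Note two behavioural differences from the comparer above, both assumptions of this sketch: a name without a dot is treated as having an empty extension (so "a" and "a." compare equal), and the extensions are compared as whole substrings rather than one character at a time, which may order some culture-specific character sequences differently. No timing claims are made here; it would need measuring on the real data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class OffsetExtensionComparer : IComparer<string>
{
    private readonly CompareInfo m_CompareInfo;
    private readonly CompareOptions m_CompareOptions;

    public OffsetExtensionComparer( CultureInfo cultureInfo, CompareOptions compareOptions )
    {
        m_CompareInfo = cultureInfo.CompareInfo;
        m_CompareOptions = compareOptions;
    }

    public int Compare( string filePath1, string filePath2 )
    {
        // Nulls sort before non-null values.
        if( filePath1 == null || filePath2 == null )
        {
            return filePath1 == null ? ( filePath2 == null ? 0 : -1 ) : 1;
        }

        // Index of the first extension character.
        var i = filePath1.LastIndexOf( '.' ) + 1;
        var j = filePath2.LastIndexOf( '.' ) + 1;

        // Assumption of this sketch: no dot means an empty extension.
        if( i == 0 ) { i = filePath1.Length; }
        if( j == 0 ) { j = filePath2.Length; }

        // Culture-aware comparison of the two ranges, without creating substrings.
        return m_CompareInfo.Compare( filePath1, i, filePath1.Length - i,
                                      filePath2, j, filePath2.Length - j,
                                      m_CompareOptions );
    }
}
```

It is used the same way as the comparer above, for example fileNames1.Sort( new OffsetExtensionComparer( CultureInfo.GetCultureInfo( "sv-SE" ), CompareOptions.StringSort ) );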