Problem description
Given an array of file names, the simplest way to sort it by file extension is this:
Array.Sort(fileNames, (x, y) => Path.GetExtension(x).CompareTo(Path.GetExtension(y)));
The problem is that on a very long list (~800k) it takes a very long time to sort, while sorting by the whole file name is a couple of seconds faster!
In theory, there is a way to optimize it: instead of using Path.GetExtension() and comparing the newly created extension-only strings, we can provide a Comparison that compares the existing file name strings starting from the LastIndexOf('.'), without creating new strings.
Now, suppose I have found the LastIndexOf('.'). I want to reuse .NET's native StringComparer and apply it only to the part of the string after the LastIndexOf('.'), to preserve all culture considerations. I didn't find a way to do that. Any ideas?
Edit:
Using tanascius's idea of the char.CompareTo() method, I came up with my Uber-Fast-File-Extension-Comparer, which now sorts by extension 3x faster! It is even faster than all the methods that use Path.GetExtension() in some manner. What do you think?
Edit 2:
I found that this implementation does not take culture into account, since the char.CompareTo() method does not, so this is not a perfect solution. Any ideas?
public static int CompareExtensions(string filePath1, string filePath2)
{
    // Nulls sort before everything else.
    if (filePath1 == null && filePath2 == null)
    {
        return 0;
    }
    else if (filePath1 == null)
    {
        return -1;
    }
    else if (filePath2 == null)
    {
        return 1;
    }

    // Position of the first extension character; no dot means an empty extension.
    int i = filePath1.LastIndexOf('.');
    int j = filePath2.LastIndexOf('.');
    if (i == -1)
    {
        i = filePath1.Length;
    }
    else
    {
        i++;
    }
    if (j == -1)
    {
        j = filePath2.Length;
    }
    else
    {
        j++;
    }

    // Ordinal, character-by-character comparison of the two extensions.
    for (; i < filePath1.Length && j < filePath2.Length; i++, j++)
    {
        int compareResults = filePath1[i].CompareTo(filePath2[j]);
        if (compareResults != 0)
        {
            return compareResults;
        }
    }

    // One extension is a prefix of the other: the shorter one sorts first.
    if (i >= filePath1.Length && j >= filePath2.Length)
    {
        return 0;
    }
    else if (i >= filePath1.Length)
    {
        return -1;
    }
    else
    {
        return 1;
    }
}
Solution
You can write a comparer that compares each character of the extension. char has a CompareTo() method, too. Basically, you loop until there are no more chars left in at least one of the strings, or until one CompareTo() returns a value != 0.
EDIT: In response to the edits of the OP
The performance of your comparer method can be significantly improved. See the following code. Additionally, I added the line
string.Compare( filePath1[i].ToString(), filePath2[j].ToString(), m_CultureInfo, m_CompareOptions );
to enable the use of CultureInfo and CompareOptions. However, this slows everything down compared to a version using a plain char.CompareTo() (by about a factor of 2). But, according to my own SO question, this seems to be the way to go.
using System.Collections.Generic;
using System.Globalization;

public sealed class ExtensionComparer : IComparer<string>
{
    private readonly CultureInfo m_CultureInfo;
    private readonly CompareOptions m_CompareOptions;

    public ExtensionComparer() : this( CultureInfo.CurrentUICulture, CompareOptions.None ) {}

    public ExtensionComparer( CultureInfo cultureInfo, CompareOptions compareOptions )
    {
        m_CultureInfo = cultureInfo;
        m_CompareOptions = compareOptions;
    }

    public int Compare( string filePath1, string filePath2 )
    {
        // Nulls sort before everything else.
        if( filePath1 == null || filePath2 == null )
        {
            if( filePath1 != null )
            {
                return 1;
            }
            if( filePath2 != null )
            {
                return -1;
            }
            return 0;
        }

        // Index of the first extension character; 0 means the path has no dot.
        var i = filePath1.LastIndexOf( '.' ) + 1;
        var j = filePath2.LastIndexOf( '.' ) + 1;

        // Paths without a dot sort before paths with one.
        if( i == 0 || j == 0 )
        {
            if( i != 0 )
            {
                return 1;
            }
            return j != 0 ? -1 : 0;
        }

        while( true )
        {
            // End of at least one extension: the shorter one sorts first.
            if( i == filePath1.Length || j == filePath2.Length )
            {
                if( i != filePath1.Length )
                {
                    return 1;
                }
                return j != filePath2.Length ? -1 : 0;
            }

            // Culture-aware comparison of the current pair of characters.
            var compareResults = string.Compare( filePath1[i].ToString(), filePath2[j].ToString(), m_CultureInfo, m_CompareOptions );
            //var compareResults = filePath1[i].CompareTo( filePath2[j] );
            if( compareResults != 0 )
            {
                return compareResults;
            }
            i++;
            j++;
        }
    }
}
Usage:
fileNames1.Sort( new ExtensionComparer( CultureInfo.GetCultureInfo( "sv-SE" ), CompareOptions.StringSort ) );
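The question asks for a way to apply a culture-aware comparison only to the part of the string after the last dot, without creating new strings. One possible sketch, not part of the original answer and not benchmarked here, uses the CompareInfo.Compare overload that takes an offset and a length for each string. The class name OffsetExtensionComparer is hypothetical. Unlike the comparer above, it treats a path without a dot and a path ending in a dot as both having an empty extension.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical name; a sketch only, not the original answer's implementation.
public sealed class OffsetExtensionComparer : IComparer<string>
{
    private readonly CompareInfo m_CompareInfo;
    private readonly CompareOptions m_CompareOptions;

    public OffsetExtensionComparer(CultureInfo cultureInfo, CompareOptions compareOptions)
    {
        m_CompareInfo = cultureInfo.CompareInfo;
        m_CompareOptions = compareOptions;
    }

    public int Compare(string filePath1, string filePath2)
    {
        // Nulls sort before everything else.
        if (filePath1 == null || filePath2 == null)
        {
            if (filePath1 != null) return 1;
            return filePath2 != null ? -1 : 0;
        }

        // Offset of the first extension character; no dot means an empty extension.
        int i = filePath1.LastIndexOf('.') + 1;
        if (i == 0) i = filePath1.Length;
        int j = filePath2.LastIndexOf('.') + 1;
        if (j == 0) j = filePath2.Length;

        int length1 = filePath1.Length - i;
        int length2 = filePath2.Length - j;

        // Assumption of this sketch: empty extensions sort first and are
        // handled here rather than passed to CompareInfo.
        if (length1 == 0 || length2 == 0)
        {
            if (length1 != 0) return 1;
            return length2 != 0 ? -1 : 0;
        }

        // Culture-aware comparison of the two extension ranges in place,
        // without creating substrings.
        return m_CompareInfo.Compare(filePath1, i, length1,
                                     filePath2, j, length2,
                                     m_CompareOptions);
    }
}
```

It would be used the same way as the comparer above, for example Array.Sort(fileNames, new OffsetExtensionComparer(CultureInfo.CurrentCulture, CompareOptions.None)). Because whole extensions are compared in one call, culture rules that span several characters are applied, which a per-character loop cannot do.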