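The script should copy files and calculate their hash sums.
My goal is a function that reads the file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy), to minimize network load.
So I tried to read a chunk of the file, update the MD5 hash with it, and write the chunk out to several places.
File sizes may range from 100 MB to 2 TB, possibly more. There is no need to verify that the copies are identical at this point; I only need the hash of the original file.
I am stuck on calculating the hash sum: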
$ifile = "C:\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
$md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$bufferSize = 10mb
$stream = [System.IO.File]::OpenRead($ifile)
$makenew = [System.IO.File]::OpenWrite($ofile)
$makenew2 = [System.IO.File]::OpenWrite($ofile2)
$buffer = new-object Byte[] $bufferSize
while ( $stream.Position -lt $stream.Length ) {
    $bytesRead = $stream.Read($buffer, 0, $bufferSize)
    $makenew.Write($buffer, 0, $bytesread)
    $makenew2.Write($buffer, 0, $bytesread)
    # I am stuck here
    $hash = [System.BitConverter]::ToString($md5.ComputeHash($buffer)) -replace "-",""
}
$stream.Close()
$makenew.Close()
$makenew2.Close()
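How can I accumulate the chunks so that the hash covers the whole file? And a bonus question: is it possible to compute the hash and write out the data in parallel? Especially considering that workflow {parallel{}} is not supported as of PS version 6? Many thanks.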
Best answer
If you want to handle the input buffering manually, you need to use the TransformBlock / TransformFinalBlock methods exposed by $md5:
while($bytesRead = $stream.Read($buffer, 0, $bufferSize))
{
    # Write to file copies
    $makenew.Write($buffer, 0, $bytesread)
    $makenew2.Write($buffer, 0, $bytesread)
    # Feed next chunk to MD5 CSP
    $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
}
# Complete the hashing routine
$md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
# Grab hash value from CSP
$hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
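For reference, the pieces above can be assembled into one self-contained function. This is only a sketch under some assumptions: the name Copy-FileWithHash and its parameters are hypothetical (not from the question), it uses [System.IO.File]::Create() rather than OpenWrite() so that an existing, longer destination file is truncated, and it assumes PowerShell 3.0 or later.

```powershell
# Hypothetical helper (name and parameters are illustrative, not from the post):
# reads the source once, writes every chunk to all destinations, and feeds the
# same chunk to MD5, so the file is only read a single time.
function Copy-FileWithHash {
    param(
        [Parameter(Mandatory)] [string]   $Path,
        [Parameter(Mandatory)] [string[]] $Destination,
        [int] $BufferSize = 10MB
    )

    $md5     = [System.Security.Cryptography.MD5]::Create()
    $source  = [System.IO.File]::OpenRead($Path)
    $targets = @()
    try {
        foreach ($d in $Destination) {
            # Create() truncates an existing file; OpenWrite() would not
            $targets += [System.IO.File]::Create($d)
        }

        $buffer = New-Object byte[] $BufferSize
        while (($bytesRead = $source.Read($buffer, 0, $buffer.Length)) -gt 0) {
            foreach ($t in $targets) { $t.Write($buffer, 0, $bytesRead) }
            # Update the running hash with exactly the bytes that were read
            $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
        }

        # Finalize with an empty block, then emit the hash as a hex string
        $null = $md5.TransformFinalBlock($buffer, 0, 0)
        [System.BitConverter]::ToString($md5.Hash).Replace('-', '')
    }
    finally {
        foreach ($t in $targets) { $t.Dispose() }
        $source.Dispose()
        $md5.Dispose()
    }
}
```

With the paths from the question, it could be called as $hash = Copy-FileWithHash -Path $ifile -Destination $ofile, $ofile2. Pass full paths: the .NET file methods resolve relative paths against the process working directory, which may differ from the current PowerShell location.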
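I'm not entirely sure what you mean by network load. If the source file is on a remote file share but the new copies go to the local file system, you can minimize network load by copying the source file once and then using that local copy as the source for both the second copy and the hash calculation: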
$ifile = "\\remoteMachine\c$\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
# Copy remote -> local
Copy-Item -Path $ifile -Destination $ofile
# Copy local -> local
Copy-Item -Path $ofile -Destination $ofile2
# Hash local file stream
$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$stream = [System.IO.File]::OpenRead($ofile)
$hash = [BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-','')
# Release the file handle once the hash has been computed
$stream.Close()
FWIW, passing the file stream object directly to $md5.ComputeHash($stream) may be faster than buffering the input manually.