本文介绍了优化 Haskell 中的数值数组性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为类似 MineCraft 的世界开发地形生成算法.目前,我正在使用基于论文 'Simplex NoiseDemystified' [PDF],因为单纯形噪声应该比 Perlin 噪声更快并且具有更少的伪影.这看起来相当不错(见图),但到目前为止它也很慢.

I'm working on a terrain generation algorithm for a MineCraft-like world. Currently, I'm using simplex noise based on the implementation in the paper 'Simplex Noise Demystified' [PDF], since simplex noise is supposed to be faster and to have fewer artifacts than Perlin noise. This looks fairly decent (see image), but so far it's also pretty slow.

运行噪声函数 10 次(我需要具有不同波长的噪声来处理地形高度、温度、树木位置等),每个块(16x16x128 块)的噪声为 3 个八度音阶,即大约 100 万总共调用噪声函数,大约需要 700-800 毫秒.尽管算法中没有明显的昂贵操作(至少对我而言),但对于以任何不错的速度生成地形来说,这至少是一个数量级太慢了.只是地板,模,一些数组查找和基本算术.下面列出了算法(用 Haskell 编写).SCC 评论用于分析.我省略了 2D 噪声函数,因为它们的工作方式相同.

Running the noise function 10 times (I need noise with different wavelengths for things like terrain height, temperature, tree location, etc.) with 3 octaves of noise for each block in a chunk (16x16x128 blocks), or about 1 million calls to the noise function in total, takes about 700-800 ms. This is at least an order of magnitude too slow for the purposes of generating terrain with any decent kind of speed, despite the fact that there are no obvious expensive operations in the algorithm (at least to me). Just floor, modulo, some array lookups and basic arithmetic. The algorithm (written in Haskell) is listed below. The SCC comments are for profiling. I've omitted the 2D noise functions, since they work the same way.

g3 :: (Floating a, RealFrac a) => a
g3 = 1/6

{-# INLINE int #-}
int :: (Integral a, Num b) => a -> b
int = fromIntegral

grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a)
grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
                     (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
                     (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]

{-# INLINE dot3 #-}
dot3 :: Num a => (a, a, a) -> a -> a -> a -> a
dot3 (a,b,c) x y z = a * x + b * y + c * z

{-# INLINE fastFloor #-}
fastFloor :: RealFrac a => a -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)

--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed

--Generate 3D noise between -0.5 and 0.5
simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a
simplex3D p x y z = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
    (i,j,k) = {-# SCC "ijk" #-} (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
    (x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
    (i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0
        then if y0 >= z0 then (1,0,0,1,1,0) else
             if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
        else if y0 <  z0 then (0,0,1,0,1,1) else
             if x0 <  z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
    xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 +   g3, y0 - int j1 +   g3, z0 - int k1 +   g3)
    xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
    xyz3 = {-# SCC "xyz3" #-} (x0 - 1      + 3*g3, y0 - 1      + 3*g3, z0 - 1      + 3*g3)
    (ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255)
    gi0 = {-# SCC "gi0" #-} mod (p V.! (ii +      p V.! (jj +      p V.!  kk      ))) 12
    gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12
    gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12
    gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1  + p V.! (jj + 1  + p V.! (kk + 1 )))) 12
    {-# INLINE n #-}
    n gi (x',y',z') = {-# SCC "n" #-} (a -> if a < 0 then 0 else
        a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'

harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
    f 0 = 0
    f o = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1)

--Generate harmonic 3D noise between -0.5 and 0.5
harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a
harmonicNoise3D p octaves l x y z = harmonic octaves
    (f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))

对于分析,我使用了以下代码,

For profiling, I used the following code,

q _ = let p = perm 0 in
      sum [harmonicNoise3D p 3 l x y z :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]]

main = do start <- getCurrentTime
          print $ q ()
          end <- getCurrentTime
          print $ diffUTCTime end start

产生以下信息:

COST CENTRE                    MODULE               %time %alloc

simplex3D                      Main                  18.8   21.0
n                              Main                  18.0   19.6
out                            Main                  10.1    9.2
harmonicNoise3D                Main                   9.8    4.5
harmonic                       Main                   6.4    5.8
int                            Main                   4.0    2.9
gi3                            Main                   4.0    3.0
xyz2                           Main                   3.5    5.9
gi1                            Main                   3.4    3.4
gi0                            Main                   3.4    2.7
fastFloor                      Main                   3.2    0.6
xyz1                           Main                   2.9    5.9
ijk                            Main                   2.7    3.5
gi2                            Main                   2.7    3.3
xyz3                           Main                   2.6    4.1
iijjkk                         Main                   1.6    2.5
dot3                           Main                   1.6    0.7

为了比较,我还将算法移植到 C#.那里的性能快了大约 3 到 4 倍,所以我想我一定是做错了什么.但即便如此,它也没有我想要的那么快.所以我的问题是:谁能告诉我是否有任何方法可以加快我的实现和/或一般算法的速度,或者是否有人知道具有更好性能特征但外观相似的不同噪声算法?

To compare, I also ported the algorithm to C#. Performance there was about 3 to 4 times faster, so I imagine I must be doing something wrong. But even then it's not nearly as fast as I would like. So my question is this: can anyone tell me if there are any ways to speed up my implementation and/or the algorithm in general or does anyone know of a different noise algorithm that has better performance characteristics but a similar look?

更新:

在遵循下面提供的一些建议后,代码现在如下所示:

After following some of the suggestions offered below, the code now looks as follows:

module Noise ( Permutation, perm
             , noise3D, simplex3D
             ) where

import Data.Bits
import qualified Data.Vector.Unboxed as UV
import System.Random
import System.Random.Shuffle

type Permutation = UV.Vector Int

g3 :: Double
g3 = 1/6

{-# INLINE int #-}
int :: Int -> Double
int = fromIntegral

grad3 :: UV.Vector (Double, Double, Double)
grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
                     (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
                     (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]

{-# INLINE dot3 #-}
dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double
dot3 (a,b,c) x y z = a * x + b * y + c * z

{-# INLINE fastFloor #-}
fastFloor :: Double -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)

--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed

--Generate 3D noise between -0.5 and 0.5
noise3D :: Permutation -> Double -> Double -> Double -> Double
noise3D p x y z = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
    (i,j,k) = (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
    (x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
    (i1,j1,k1,i2,j2,k2) = if x0 >= y0
        then if y0 >= z0 then (1,0,0,1,1,0) else
             if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
        else if y0 <  z0 then (0,0,1,0,1,1) else
             if x0 <  z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
    xyz1 = (x0 - int i1 +   g3, y0 - int j1 +   g3, z0 - int k1 +   g3)
    xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
    xyz3 = (x0 - 1      + 3*g3, y0 - 1      + 3*g3, z0 - 1      + 3*g3)
    (ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255)
    gi0 = rem (UV.unsafeIndex p (ii +      UV.unsafeIndex p (jj +      UV.unsafeIndex p  kk      ))) 12
    gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12
    gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12
    gi3 = rem (UV.unsafeIndex p (ii + 1  + UV.unsafeIndex p (jj + 1  + UV.unsafeIndex p (kk + 1 )))) 12
    {-# INLINE n #-}
    n gi (x',y',z') = (a -> if a < 0 then 0 else
        a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'

harmonic :: Int -> (Double -> Double) -> Double
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
    f 0 = 0
    f o = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1)

--3D simplex noise
--syntax: simplex3D permutation number_of_octaves wavelength x y z
simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double
simplex3D p octaves l x y z = harmonic octaves
    (f -> noise3D p (x * f / l) (y * f / l) (z * f / l))

连同将我的块大小减少到 8x8x128,生成新的地形块现在以大约 10-20 fps 的速度发生,这意味着现在四处移动几乎不像以前那样成问题.当然,仍然欢迎任何其他性能改进.

Together with reducing my chunk size to 8x8x128, generating new terrain chunks now occurs at about 10-20 fps, which means moving around is now not nearly as problematic as before. Of course, any other performance improvements are still welcome.

推荐答案

最初突出的是你的代码是高度多态的.您应该将浮点类型统一专门化为 Double,以便 GHC(和 LLVM)有机会应用更积极的优化.

The thing that stands out initially is that your code is highly polymorphic. You should specialize your floating point type uniformly to Double, so GHC (and LLVM) have a chance of applying more aggressive optimizations.

注意,对于那些试图复制的人,此代码导入:

Note, for those trying to reproduce, this code imports:

import qualified Data.Vector as V
import Data.Bits
import Data.Time.Clock
import System.Random
import System.Random.Shuffle

type Permutation = V.Vector Int

好的.您可以尝试很多方法来改进此代码.


Ok. There's lots of things you can try to improve this code.

改进

数据表示

  • 专门用于具体的浮点类型,而不是多态浮点函数
  • 将元组 (a,a,a) 替换为未装箱的三元组 T !Double !Double !Double
  • Permutations
  • Data.Array 切换到 Data.Array.Unboxed
  • repa 包中的多维未装箱数组替换三元组的装箱数组
  • Specialize to a concrete floating point type, instead of polymorphic floating point functions
  • Replace tuple (a,a,a) with unboxed triple T !Double !Double !Double
  • Switch from Data.Array to Data.Array.Unboxed for Permutations
  • Replace use of boxed array of triples with multidimensional unboxed array from repa package

编译器标志

  • 使用 -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native 编译(或使用 -fllvm 等效)
  • 提高规格约束阈值 -- -fspec-constr-count=16
  • Compile with -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native (or equivalent with -fllvm)
  • Increase spec constr threshold -- -fspec-constr-count=16

更高效的库函数

  • 使用 mersenne-random 代替 StdGen 生成随机数
  • mod 替换为 rem
  • V.! 索引替换为未经检查的索引 VU.unsafeIndex(移至 Data.Vector.Unboxed
  • Use mersenne-random instead of StdGen to generate randoms
  • Replace mod with rem
  • Replace V.! indexing with unchecked indexing VU.unsafeIndex (after moving to Data.Vector.Unboxed

运行时设置

  • 增加默认分配区域:-A20M-H

另外,请检查您的算法是否与 C# 相同,并且您使用的是相同的数据结构.

Also, check your algorithm is identical to the C# one, and you're using the same data structures.

这篇关于优化 Haskell 中的数值数组性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

05-19 07:39