1. Introduction to Audio Signal Processing

  • x kHz, y-bit, n seconds occupies how many bytes: $x \times 1000 \times y / 8 \times n$ bytes
  • (Analog-to-digital) With a system sampling rate of 20 kHz, the highest sound frequency that can be sampled is 20/2 kHz
  • (Analog-to-digital) For a sound frequency of 20 kHz, the system sampling rate must be 20*2 kHz
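
A quick sanity check of the byte-count formula, as a minimal Python sketch (the 16 kHz / 16-bit / 2 s values are arbitrary examples):

```python
def pcm_bytes(rate_khz, bits, seconds):
    """Uncompressed PCM size: rate*1000 samples/s, bits/8 bytes per sample."""
    return rate_khz * 1000 * bits / 8 * seconds

print(pcm_bytes(16, 16, 2))  # 16 kHz, 16-bit, 2 s -> 64000.0 bytes
```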

2. 音频信号预处理

  • frame blocking

    • frame size (ms) + overlapping rate → samples per frame = sampling rate (Hz) * frame size (ms) / 1000
  • windowing (window size = N)

    • $\tilde{s}(k) = s(k)\cdot W(k)$, with the (Hamming) window $W(k) = 0.54 - 0.46\cos\left(\frac{2\pi k}{N-1}\right)$, $0 \le k \le N-1$
  • Fourier Transform

    • $X_m = \sum_{k=0}^{N-1} s(k)\cdot e^{-i\,2\pi k m/N} = \sum_{k=0}^{N-1} s(k)\cdot\left(\cos\left(\frac{2\pi k m}{N}\right) - i\sin\left(\frac{2\pi k m}{N}\right)\right)$, using Euler's formula $e^{i\theta} = \cos(\theta) + i\sin(\theta)$

    • $energy = (X_m.\mathrm{real})^2 + (X_m.\mathrm{imaginary})^2$

    • $magnitude = \sqrt{energy}$

  • Inverse Fourier Transform

    • $s(k) = \frac{1}{N}\sum_{m=0}^{N-1} X_m\cdot e^{i\,2\pi k m/N}$
  • Compute the start and end positions of the 7th frame, with frame size N = 256 and frame shift m = 243 (worked in the sketch below)

    • q = 7: start = 243*6 = 1458
    • q = 7: end = 243*6 + 256 - 1 = 1713
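
A minimal NumPy sketch tying the steps above together: frame extraction, Hamming windowing, and DFT energy/magnitude. The waveform is a random placeholder:

```python
import numpy as np

N, m = 256, 243                      # frame size, frame shift
signal = np.random.randn(5000)       # placeholder waveform

q = 7                                # 7th frame (1-indexed)
start = m * (q - 1)                  # 243 * 6 = 1458
end = start + N - 1                  # 1458 + 255 = 1713
frame = signal[start:end + 1]

# Hamming window: W(k) = 0.54 - 0.46*cos(2*pi*k/(N-1))
k = np.arange(N)
window = 0.54 - 0.46 * np.cos(2 * np.pi * k / (N - 1))
windowed = frame * window

# DFT, then per-bin energy and magnitude
X = np.fft.fft(windowed)
energy = X.real**2 + X.imag**2
magnitude = np.sqrt(energy)
print(start, end, magnitude[:5])
```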

3. Features

  • Mel Scale

    • $m = 2595\log_{10}\left(1 + \frac{f}{700}\right)$
    • $\Delta m = 2595\log_{10}\left(\frac{700 + f_1}{700 + f_2}\right)$
  • LPC (Linear Predictive Coding) filter

    • windowing → pre-emphasis → autocorrelation → LPC → cepstral coefficients (see the sketch at the end of this section)

    • pre-emphasis

      • $s'(k) = s(k) - \tilde{a}\cdot s(k-1)$, with $s'(0) = s'(1)$

      • $\tilde{a}$ is a given value

    • LPC of order $p$

      • First compute $r_0$ through $r_p$ (the autocorrelation values)

        • $r_i = \sum_{n=0}^{N-1-i} s_n\cdot s_{n+i}$

        • Example with p = 4 and N = 6; each row lists the index pairs multiplied for one $r_i$:

          • $r_0$: 00 11 22 33 44 55
          • $r_1$: 01 12 23 34 45
          • $r_2$: 02 13 24 35
          • $r_3$: 03 14 25
          • $r_4$: 04 15
      • Form the matrix and vector to solve for $a_1$ through $a_p$

        • $\begin{bmatrix} r_0 & r_1 & r_2 & \cdots & r_{p-1} \\ r_1 & r_0 & r_1 & \cdots & r_{p-2} \\ r_2 & r_1 & r_0 & \cdots & r_{p-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0 \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_p \end{bmatrix}$
      • $a = A^{-1}b$

        • To invert by hand, augment the matrix with the identity on the right and row-reduce the left side to the identity (Gauss-Jordan elimination)
  • Cepstrum

    • s'(k) = window(s(k))
    • |X(m)| = dft(s’(k))
    • Log(|X(m)|)
    • C(n) = idft(Log(|X(m)|))
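
A minimal NumPy sketch of the feature steps above (Hz-to-Mel conversion, pre-emphasis, autocorrelation, and solving the LPC system); the frame data and $\tilde{a} = 0.95$ are placeholder assumptions:

```python
import numpy as np

def hz_to_mel(f):
    """Mel scale: m = 2595 * log10(1 + f/700)."""
    return 2595 * np.log10(1 + f / 700)

def lpc(frame, p, a_tilde=0.95):
    """LPC coefficients a_1..a_p for one frame, via autocorrelation."""
    # Pre-emphasis: s'(k) = s(k) - a~ * s(k-1), then s'(0) = s'(1)
    s = frame - a_tilde * np.concatenate(([0.0], frame[:-1]))
    s[0] = s[1]
    # Autocorrelation r_0..r_p: r_i = sum_n s_n * s_{n+i}
    N = len(s)
    r = np.array([s[:N - i] @ s[i:] for i in range(p + 1)])
    # Toeplitz system A a = b, with A[i][j] = r_|i-j| and b = (r_1, ..., r_p)
    A = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(A, r[1:])     # same as a = A^{-1} b

print(hz_to_mel(1000))                   # ~1000 mel
frame = np.random.randn(256)             # placeholder frame
print(lpc(frame, p=10))
```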

4. Feature Reconstruction

  • Vector Quantization
    • An LPC feature vector has one dimension per parameter; if the clusters are separable, the mean (centroid) of the LPC vectors of one phoneme can represent that sound
  • Standard K-means
  • Binary-split K-means (see the sketch at the end of this section)
  • 10 kHz, 8-bit, frame size = 25 ms, no overlapping, LPC order 10: find the compression ratio
    • Raw frame: 10*1000 samples/s * 25/1000 s = 250 samples * 1 byte (8-bit) = 250 Bytes
    • One float is 4 Bytes, so 10 coefficients take 10*4 = 40 Bytes
    • ratio = 250/40 = 6.25
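
A minimal sketch of binary-split k-means as commonly described (start from one centroid, split every centroid by perturbing it by ±ε, refine with standard k-means, repeat); the data and ε are placeholders:

```python
import numpy as np

def kmeans_refine(X, centroids, iters=20):
    """Standard k-means refinement: assign to nearest centroid, recompute means."""
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def binary_split_kmeans(X, target_k, eps=1e-3):
    centroids = X.mean(axis=0, keepdims=True)      # start with one centroid
    while len(centroids) < target_k:
        # Split each centroid into c(1+eps) and c(1-eps), then refine
        centroids = np.vstack([centroids * (1 + eps), centroids * (1 - eps)])
        centroids = kmeans_refine(X, centroids)
    return centroids

X = np.random.randn(200, 10)                       # e.g. 10-dim LPC vectors
print(binary_split_kmeans(X, target_k=4).shape)    # (4, 10)
```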

5. Speech Recognition

  • end-point detection → pre-emphasis → frame blocking and windowing → LPC/MFCC → distortion
  • end-point detection
    • A frame marks a speech end-point when its energy and zero-crossing rate exceed their thresholds
  • frame blocking and windowing
    • This yields two sets of feature vectors (one per utterance)
    • Compute distances between vectors pairwise: for [1,2,3] and [2,3,4] the distance is 1+1+1 = 3, the sum of squared differences with no square root, used as the distance between the two vectors
  • distortion (take this slowly, step by step; see the sketch after this list)
    • Comparing two audio clips yields a single value
    • The pairwise values among n audio clips form a confusion matrix
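
A minimal sketch of the pairwise squared distance and the resulting n × n distortion (confusion) matrix. Frame alignment is simplified to a plain frame-by-frame sum (a real system would align with dynamic time warping), and all values are placeholders:

```python
import numpy as np

def sq_dist(u, v):
    """Sum of squared differences, no square root: [1,2,3] vs [2,3,4] -> 3."""
    return ((np.asarray(u) - np.asarray(v)) ** 2).sum()

def distortion(frames_a, frames_b):
    """One value per pair of utterances: mean frame-to-frame squared distance."""
    T = min(len(frames_a), len(frames_b))
    return np.mean([sq_dist(frames_a[t], frames_b[t]) for t in range(T)])

clips = [np.random.randn(30, 10) for _ in range(4)]   # 4 clips, 30 frames, 10-dim
confusion = np.array([[distortion(a, b) for b in clips] for a in clips])
print(confusion.round(2))
```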

6. AdaBoost


7. Face Recognition

8. Neural Networks

  • Forward propagation

  • Backpropagation

    • $f(u_i) = x_i$, where $f$ is the activation function

    • Between the hidden layer and the output layer

      • $\Delta w_{j,i} = -\eta\,\frac{\partial\varepsilon}{\partial w_{j,i}} = -\eta\,[(x_i - t_i)\cdot f(u_i)\left(1 - f(u_i)\right)]\cdot x_j$

      • $t_i$ is the correct target value supplied with the training input

    • Between hidden layers

      • $\Delta w_{k,j} = -\eta\,\frac{\partial\varepsilon}{\partial w_{k,j}} = -\eta\left(\sum_{i=0}^{I}(s_i\cdot w_{j,i})\right)\cdot\left[f(u_j)\cdot\left(1 - f(u_j)\right)\right]\cdot x_k$

      • $s_i$ is the error term back-propagated from unit $i$ of the next layer; for an output unit it equals $(x_i - t_i)\cdot f(u_i)(1 - f(u_i))$, the bracketed factor in the previous rule
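
A minimal NumPy sketch of these two update rules for one hidden layer with sigmoid activations; the layer sizes, learning rate η = 0.1, and the training pair are placeholder assumptions:

```python
import numpy as np

def f(u):                      # sigmoid activation; f' = f(u)*(1 - f(u))
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
eta = 0.1
W1 = rng.normal(size=(3, 4))   # input (3) -> hidden (4), weights w_{k,j}
W2 = rng.normal(size=(4, 2))   # hidden (4) -> output (2), weights w_{j,i}

x_in = np.array([0.5, -0.2, 0.8])   # placeholder input x_k
t = np.array([1.0, 0.0])            # target values t_i

# Forward pass
x_hidden = f(x_in @ W1)             # x_j = f(u_j)
x_out = f(x_hidden @ W2)            # x_i = f(u_i)

# Error terms (computed before any weight changes)
s_out = (x_out - t) * x_out * (1 - x_out)               # (x_i - t_i) f'(u_i)
s_hidden = (s_out @ W2.T) * x_hidden * (1 - x_hidden)   # (sum_i s_i w_{j,i}) f'(u_j)

# Hidden -> output: dw_{j,i} = -eta * s_i * x_j
W2 += -eta * np.outer(x_hidden, s_out)
# Input -> hidden: dw_{k,j} = -eta * s_j * x_k
W1 += -eta * np.outer(x_in, s_hidden)
print(x_out)
```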

9. Convolutional Neural Networks

  • Convolution
    • Each convolution kernel has one bias
    • feature map size = (N - m + 2p)/s + 1, where N is the input size, m the kernel size, p the padding, and s the stride
  • Pooling (subsampling)
    • No bias
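
A quick check of the feature-map size formula (the values are arbitrary examples):

```python
def feature_map_size(N, m, p=0, s=1):
    """Output size of a convolution or pooling layer: (N - m + 2p)/s + 1."""
    return (N - m + 2 * p) // s + 1

print(feature_map_size(28, 5))         # 24: 5x5 conv on a 28x28 input, stride 1
print(feature_map_size(24, 2, s=2))    # 12: 2x2 pooling with stride 2
```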

10. Auto-Encoder

  • Both the traditional and the newer Auto-Encoder use the same dimensionality for input and output
  • The exam tested conversion between two distributions; review this carefully!
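
A minimal NumPy sketch illustrating the dimensionality point: the input and the reconstruction share the same size while the code layer is smaller. The sizes and random weights are placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
W_enc = rng.normal(size=(8, 3))    # encoder: 8-dim input -> 3-dim code
W_dec = rng.normal(size=(3, 8))    # decoder: 3-dim code -> 8-dim output

x = rng.normal(size=8)             # input vector
code = np.tanh(x @ W_enc)          # compressed representation
x_hat = code @ W_dec               # reconstruction

print(x.shape, x_hat.shape)        # (8,) (8,): dimensions match
```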

11. Recurrent Neural Networks and LSTM

  • RNN

    • $\tanh(W_{hx}(1,:)\cdot X_t + W_{hh}(1,:)\cdot h_t + bias(1)) = h_{t+1}(1)$

    • In matrix form:

      $\tanh(W_{hx}\cdot X_t + W_{hh}\cdot h_t + bias) = h_{t+1}$

    • $y\_out = W_{hy}\cdot h_t$

    • $softmax\_y\_out = \mathrm{Softmax}(y\_out)$

  • LSTM

    • Counting the number of weights (checked in the sketch below)
      • cells per layer = m, input size = n, output size = y, number of hidden layers = l
      • $W = 4m(m+n) + 4m(l-1)(m+m) + ym$
      • $B = 4lm$
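
A minimal NumPy sketch of one vanilla RNN step and a direct evaluation of the LSTM weight-count formulas above; all sizes are placeholder assumptions:

```python
import numpy as np

# --- One RNN step: h_{t+1} = tanh(Whx @ x_t + Whh @ h_t + bias) ---
rng = np.random.default_rng(2)
n_in, n_hid, n_out = 5, 8, 3
Whx = rng.normal(size=(n_hid, n_in))
Whh = rng.normal(size=(n_hid, n_hid))
Why = rng.normal(size=(n_out, n_hid))
bias = np.zeros(n_hid)

x_t, h_t = rng.normal(size=n_in), np.zeros(n_hid)
h_next = np.tanh(Whx @ x_t + Whh @ h_t + bias)
y_out = Why @ h_next
softmax_y_out = np.exp(y_out) / np.exp(y_out).sum()

# --- LSTM counts: W = 4m(m+n) + 4m(l-1)(m+m) + ym, B = 4lm ---
def lstm_params(m, n, y, l):
    W = 4 * m * (m + n) + 4 * m * (l - 1) * (m + m) + y * m
    B = 4 * l * m
    return W, B

print(lstm_params(m=8, n=5, y=3, l=2))   # (952, 64): 4 gates per cell per layer
```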

12. Word Representation

  • BOW (Bag Of Words) cosine similarity

  • TF-IDF cosine similarity

    • The product of the term's proportion within the sentence (TF) and the log of the inverse of the fraction of sentences containing it (IDF)
  • cosine similarity (see the sketch at the end of this section)

    • Dot product of the two vectors divided by the product of their lengths: $\cos(\theta) = \frac{a\cdot b}{\|a\|\,\|b\|}$
  • Word2Vec

    • N-gram, skip-gram
    • 3-skip-2-gram
      • includes plain 2-grams
      • includes 2-skip and 1-skip 2-grams
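
A minimal sketch of TF-IDF weighting followed by cosine similarity over a toy corpus; the sentences are placeholders:

```python
import numpy as np

docs = [["the", "cat", "sat"], ["the", "dog", "sat", "down"], ["a", "cat", "ran"]]
vocab = sorted({w for d in docs for w in d})
N = len(docs)

def tfidf(doc):
    vec = np.zeros(len(vocab))
    for i, w in enumerate(vocab):
        tf = doc.count(w) / len(doc)       # proportion within the sentence
        df = sum(w in d for d in docs)     # sentences containing the term
        vec[i] = tf * np.log10(N / df)     # TF * log(N/df)
    return vec

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

v1, v2 = tfidf(docs[0]), tfidf(docs[1])
print(cosine(v1, v2))
```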

13. Decision Trees

  • GINI_index

    • $Gini_{index} = 1 - \sum_i p_i^2$

    • $GINI_{split} = \frac{|S_1|}{|S|}GINI(S_1) + \frac{|S_2|}{|S|}GINI(S_2)$

  • Entropy

    • $Entropy = \sum_i -p_i\log_2(p_i)$
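
A quick check of both impurity measures on a toy split (the class counts are arbitrary examples):

```python
import numpy as np

def gini(counts):
    p = np.asarray(counts) / sum(counts)
    return 1 - (p ** 2).sum()

def entropy(counts):
    p = np.asarray(counts) / sum(counts)
    p = p[p > 0]                              # treat 0*log(0) as 0
    return -(p * np.log2(p)).sum()

S1, S2 = [6, 2], [1, 5]                       # class counts in the two children
n1, n2 = sum(S1), sum(S2)
gini_split = n1 / (n1 + n2) * gini(S1) + n2 / (n1 + n2) * gini(S2)
print(gini(S1), entropy(S1), gini_split)
```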