A Closer Look at Feedforward Networks, CNNs, RNNs, and Hugging Face's Transformers!

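1. Overview

This post gives a brief summary of the neural network models used in natural language processing, covering feedforward networks, CNNs, RNNs, Transformers, and Hugging Face's Transformers library.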

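2. Basic Feedforward Neural Network

Let's break down a basic feedforward neural network, also known as a multilayer perceptron (MLP). The code example below will:

Define the architecture of the network.
Initialize the weights and biases.
Implement forward propagation with the sigmoid activation function.
Implement backpropagation for training, using a mean squared error loss.
Demonstrate training on a simple dataset (the XOR problem).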

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases with random values
        self.weights1 = np.random.randn(input_size, hidden_size)
        self.weights2 = np.random.randn(hidden_size, output_size)
        self.bias1 = np.random.randn(1, hidden_size)
        self.bias2 = np.random.randn(1, output_size)
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
        # x is expected to be a sigmoid output s, so the derivative is s * (1 - s)
        return x * (1 - x)
    
    def forward(self, X):
        self.hidden = self.sigmoid(np.dot(X, self.weights1) + self.bias1)
        output = self.sigmoid(np.dot(self.hidden, self.weights2) + self.bias2)
        return output
    
    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            # Forward propagation
            output = self.forward(X)
            
            # Compute error
            error = y - output
            
            # Backward propagation
            d_output = error * self.sigmoid_derivative(output)
            error_hidden = d_output.dot(self.weights2.T)
            d_hidden = error_hidden * self.sigmoid_derivative(self.hidden)
            
            # Update weights and biases
            self.weights2 += self.hidden.T.dot(d_output) * learning_rate
            self.bias2 += np.sum(d_output, axis=0, keepdims=True) * learning_rate
            self.weights1 += X.T.dot(d_hidden) * learning_rate
            self.bias1 += np.sum(d_hidden, axis=0, keepdims=True) * learning_rate

            # Print the error at every 1000 epochs
            if epoch % 1000 == 0:
                print(f"Epoch {epoch}, Error: {np.mean(np.abs(error))}")

# Sample data for XOR problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Create neural network instance and train
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

# Test the neural network
print("Predictions after training:")
for data in X:
    print(f"{data} => {nn.forward(data)}")
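
One detail in the class above is easy to miss: sigmoid_derivative is applied to values that have already passed through sigmoid (the stored activations), so it computes s * (1 - s) rather than a derivative of the raw input. The check below is a minimal sketch, using only the standard library, that compares this shortcut with a finite-difference estimate. The helper names and the test point 0.7 are illustrative assumptions, not part of the original code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative_from_output(s):
    # s is assumed to be sigmoid(x), not x itself
    return s * (1.0 - s)

def numerical_derivative(f, x, eps=1e-6):
    # Central finite difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = 0.7  # arbitrary test point
analytic = sigmoid_derivative_from_output(sigmoid(x))
numeric = numerical_derivative(sigmoid, x)

# The two estimates should agree to many decimal places
print(abs(analytic - numeric) < 1e-8)
```

Passing the raw input x instead of sigmoid(x) to the shortcut would give a different, incorrect value, which is why the training loop feeds it output and self.hidden.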

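3. Convolutional Neural Networks (CNN)

3.1 Basic CNN structure:

Below is a more complete implementation of a basic convolutional neural network (CNN) using the TensorFlow and Keras libraries. This example will:

Load the MNIST dataset, a common dataset for handwritten digit recognition.
Preprocess the data.
Define a basic CNN architecture.
Compile the model with an optimizer, a loss function, and metrics.
Train the CNN on the MNIST dataset.
Evaluate the accuracy of the trained CNN on the test data.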
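
The preprocessing step in the list above comes down to two operations: scaling pixel intensities from the 0-255 range into [0, 1], and turning each integer label into a one-hot vector. Here is a minimal pure-Python sketch of both. The helper names scale_pixels and one_hot are hypothetical, and the toy values stand in for real MNIST data.

```python
def scale_pixels(pixels):
    # Map 8-bit intensities (0..255) to floats in [0, 1]
    return [p / 255 for p in pixels]

def one_hot(label, num_classes=10):
    # Vector of zeros with a single 1.0 at the label's index
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

toy_pixels = [0, 128, 255]  # made-up values, not real MNIST pixels
toy_label = 3               # made-up digit label

print(scale_pixels(toy_pixels))
print(one_hot(toy_label))
```

One-hot labels are what the categorical cross-entropy loss expects when the output layer is a 10-way softmax.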

3.2 Code implementation

# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess the dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Define the CNN architecture
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=64)

# Evaluate the model's accuracy on the test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc}')
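
To see why the Flatten layer in the model above feeds 576 values into the first Dense layer, it helps to trace the spatial size through the stack. The sketch below assumes the Keras defaults used in the code: 'valid' padding and stride 1 for Conv2D, and non-overlapping 2x2 windows for MaxPooling2D. The helper names conv_out and pool_out are illustrative.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling windows; leftover rows/columns are dropped
    return size // pool

size = 28  # MNIST images are 28x28
trace = [size]
for step in (conv_out, pool_out, conv_out, pool_out, conv_out):
    size = step(size)
    trace.append(size)

flattened = size * size * 64  # 64 filters in the last Conv2D layer
print(trace)      # spatial size after each layer
print(flattened)  # number of inputs to the first Dense layer
```

Calling model.summary() on the Keras model is a quick way to compare these numbers with what the framework reports.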

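4. Recurrent Neural Networks (RNN)

4.1 Basic RNN structure:

Let's build a basic recurrent neural network (RNN) with TensorFlow and Keras. This example will:

Load a sequence dataset (we use the IMDB sentiment analysis dataset).
Preprocess the data.
Define a simple RNN architecture.
Compile the model with an optimizer, a loss function, and metrics.
Train the RNN on the dataset.
Evaluate the accuracy of the trained RNN on the test data.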
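
Preprocessing here mostly means forcing every review to the same length. By default, Keras's pad_sequences pads with zeros at the front and, for sequences that are too long, drops items from the front as well. The following is a minimal pure-Python sketch of that default behavior. The function name pad_pre is hypothetical, and the token IDs are made up.

```python
def pad_pre(seq, maxlen, value=0):
    # Keep only the last `maxlen` items, then left-pad with `value`.
    # Assumes maxlen > 0.
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

short_review = [11, 42, 7]             # made-up token IDs
long_review = [5, 9, 13, 21, 34, 55]   # made-up token IDs

print(pad_pre(short_review, 5))
print(pad_pre(long_review, 4))
```

Padding at the front keeps the real tokens closest to the final time step, which suits an RNN that classifies from its last hidden state.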

4.2 Code implementation

# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Constants
VOCAB_SIZE = 10000
MAX_LEN = 500
EMBEDDING_DIM = 32

# Load and preprocess the dataset
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=VOCAB_SIZE)

# Pad sequences to the same length
train_data = pad_sequences(train_data, maxlen=MAX_LEN)
test_data = pad_sequences(test_data, maxlen=MAX_LEN)

# Define the RNN architecture
model = tf.keras.Sequential([
    # input_length is optional and deprecated in recent Keras versions, so it is omitted
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    tf.keras.layers.SimpleRNN(32, return_sequences=True),
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_data, train_labels, epochs=10, batch_size=128, validation_split=0.2)

# Evaluate the model's accuracy on the test data
test_loss, test_acc = model.evaluate(test_data, test_labels)
print(f'Test accuracy: {test_acc}')
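
What a SimpleRNN layer computes at each time step can be written in a few lines of NumPy: the new hidden state is tanh applied to a weighted sum of the current input and the previous hidden state. The sketch below is an illustration under stated assumptions, not the Keras implementation. The weight names W_x, W_h, b and the toy sizes are chosen for readability.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a tanh RNN over one sequence of shape (timesteps, features)."""
    h = np.zeros(W_h.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:
        # Combine the current input with the previous hidden state
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
timesteps, features, units = 5, 3, 4  # toy sizes, not the ones used above
inputs = rng.normal(size=(timesteps, features))
W_x = rng.normal(size=(features, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

states = simple_rnn_forward(inputs, W_x, W_h, b)
print(states.shape)      # every hidden state, as with return_sequences=True
print(states[-1].shape)  # only the final hidden state, as with the default
```

This also shows why the first SimpleRNN layer in the model sets return_sequences=True: the second recurrent layer needs one input vector per time step, not just the final state.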

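5. Transformers

5.1 Transformer example (using Hugging Face's Transformers library):

Hugging Face's Transformers library makes it very easy to work with Transformer architectures such as BERT and GPT-2. Let's build a basic example that will:

Load a pretrained BERT model for text classification.
Tokenize a few input sentences.
Pass the tokenized inputs through the BERT model.
Output the predicted class probabilities.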

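5.2 Code implementation

In this demo we use a BERT model for sequence classification. The bert-base-uncased checkpoint does not include a trained classification head, so that layer is randomly initialized here and the printed probabilities only become meaningful after fine-tuning.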

# Installation (if you haven't done it yet)
#!pip install transformers

# Import required libraries
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pretrained model and tokenizer
model_name = 'bert-base-uncased'
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)  # For binary classification
tokenizer = BertTokenizer.from_pretrained(model_name)

# Tokenize input data
input_texts = ["I love using transformers!", "This library is difficult to understand."]
inputs = tokenizer(input_texts, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Forward pass: get model predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.nn.functional.softmax(logits, dim=-1)

# Display predicted class probabilities
print(probabilities)
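
The last step above, turning logits into class probabilities and then into a predicted label, does not depend on BERT at all. Here is a minimal standard-library sketch of softmax followed by an argmax. The logit values are made up for illustration and are not real model output.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [[0.2, -0.1], [-1.3, 0.8]]  # made-up values, one row per sentence
for row in toy_logits:
    probs = softmax(row)
    # Index of the largest probability is the predicted class
    predicted = max(range(len(probs)), key=probs.__getitem__)
    print(probs, predicted)
```

With the PyTorch tensors from the example, the equivalent of the argmax step would be torch.argmax(probabilities, dim=-1).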

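6. Conclusion

The world of deep learning is vast and, as the examples above show, its algorithms can become complex depending on the application area. Thanks to high-level libraries such as TensorFlow and Hugging Face's Transformers, however, working with these algorithms keeps getting easier.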

无水先生