SentenceTransformer使用多GPU加速向量化

文章目录

前言
代码

前言

当我们需要对大规模的数据向量化以存到向量数据库中时，且服务器上有多个GPU可以支配，我们希望同时利用所有的GPU来并行这一过程，加速向量化。

代码

就几行代码，不废话了

from sentence_transformers import SentenceTransformer

#Important, you need to shield your code with if __name__. Otherwise, CUDA runs into issues when spawning new processes.
if __name__ == '__main__':

    #Create a large list of 100k sentences
    sentences = ["This is sentence {}".format(i) for i in range(100000)]

    #Define the model
    model = SentenceTransformer('all-MiniLM-L6-v2')

    #Start the multi-process pool on all available CUDA devices
    pool = model.start_multi_process_pool()

    #Compute the embeddings using the multi-process pool
    emb = model.encode_multi_process(sentences, pool)
    print("Embeddings computed. Shape:", emb.shape)

    #Optional: Stop the proccesses in the pool
    model.stop_multi_process_pool(pool)

注意：一定要加if __name__ == '__main__':这一句，不然报如下错：

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

其实官方已经给出代码啦，我只不过复制粘贴了一下，代码位置：computing_embeddings_multi_gpu.py

官方还给出了流式encode的例子，也是多GPU并行的，如下：

from sentence_transformers import SentenceTransformer, LoggingHandler
import logging
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm

logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])

#Important, you need to shield your code with if __name__. Otherwise, CUDA runs into issues when spawning new processes.
if __name__ == '__main__':
    #Set params
    data_stream_size = 16384  #Size of the data that is loaded into memory at once
    chunk_size = 1024  #Size of the chunks that are sent to each process
    encode_batch_size = 128  #Batch size of the model
    

    #Load a large dataset in streaming mode. more info: https://huggingface.co/docs/datasets/stream
    dataset = load_dataset('yahoo_answers_topics', split='train', streaming=True)
    dataloader = DataLoader(dataset.with_format("torch"), batch_size=data_stream_size)

    #Define the model
    model = SentenceTransformer('all-MiniLM-L6-v2')

    #Start the multi-process pool on all available CUDA devices
    pool = model.start_multi_process_pool()

    for i, batch in enumerate(tqdm(dataloader)):
        #Compute the embeddings using the multi-process pool
        sentences = batch['best_answer']
        batch_emb = model.encode_multi_process(sentences, pool, chunk_size=chunk_size, batch_size=encode_batch_size)
        print("Embeddings computed for 1 batch. Shape:", batch_emb.shape)

    #Optional: Stop the proccesses in the pool
    model.stop_multi_process_pool(pool)

官方案例：computing_embeddings_streaming.py

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800-SXM...  On   | 00000000:23:00.0 Off |                    0 |
| N/A   58C    P0   297W / 400W |  75340MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800-SXM...  On   | 00000000:29:00.0 Off |                    0 |
| N/A   71C    P0   352W / 400W |  80672MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800-SXM...  On   | 00000000:52:00.0 Off |                    0 |
| N/A   68C    P0   398W / 400W |  75756MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800-SXM...  On   | 00000000:57:00.0 Off |                    0 |
| N/A   58C    P0   341W / 400W |  75994MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A800-SXM...  On   | 00000000:8D:00.0 Off |                    0 |
| N/A   56C    P0   319W / 400W |  70084MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A800-SXM...  On   | 00000000:92:00.0 Off |                    0 |
| N/A   70C    P0   354W / 400W |  76314MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A800-SXM...  On   | 00000000:BF:00.0 Off |                    0 |
| N/A   73C    P0   360W / 400W |  75876MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A800-SXM...  On   | 00000000:C5:00.0 Off |                    0 |
| N/A   57C    P0   364W / 400W |  80404MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

嘎嘎快啊

ToTensor

SentenceTransformer使用多GPU加速向量化

文章目录

前言

代码