This article explains how to get the intermediate layer outputs of a pretrained BERT model with the HuggingFace Transformers library. The question and recommended answer below should be a useful reference for anyone facing the same problem.

Problem description

I'm following this PyTorch tutorial about BERT word embeddings, and in the tutorial the author accesses the intermediate layers of the BERT model.

What I want is to access the last few layers (let's say the last 4) of the BERT model for a single input token, in TensorFlow 2 using HuggingFace's Transformers library. Because each layer outputs a vector of length 768, the last 4 layers concatenated will have a shape of 4*768=3072 (for each token).

How can I implement this in TF/Keras/TF2 to get the intermediate layers of the pretrained model for an input token? (Later I will try to do this for each token in a sentence, but for now one token is enough.)

I'm using HuggingFace's BERT model:

!pip install transformers
from transformers import (TFBertModel, BertTokenizer)

bert_model = TFBertModel.from_pretrained("bert-base-uncased")  # Automatically loads the config
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentence_marked = "hello"
tokenized_text = bert_tokenizer.tokenize(sentence_marked)
indexed_tokens = bert_tokenizer.convert_tokens_to_ids(tokenized_text)

print(indexed_tokens)
>> prints [7592]

The output is a single token id ([7592]), which should be the input for the BERT model.

Recommended answer

The third element of the BERT model's output is a tuple which consists of the output of the embedding layer as well as the hidden states of the intermediate layers. From the documentation:

hidden_states (tuple(tf.Tensor), optional, returned when config.output_hidden_states=True): Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

For the bert-base-uncased model, config.output_hidden_states is True by default. Therefore, to access the hidden states of the 12 intermediate layers, you can do the following:

outputs = bert_model(input_ids, attention_mask)  # int tensors of shape (batch_size, sequence_length)
hidden_states = outputs[2][1:]                   # drop the embedding output, keep the 12 layer outputs
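
To make this concrete with the single token from the question, here is a minimal end-to-end sketch. It assumes the tuple-style outputs used in this answer (in recent Transformers versions you would read outputs.hidden_states instead) and that the model is configured to return hidden states:

import tensorflow as tf

# Wrap the single token id from the question in a batch dimension: shape (1, 1)
input_ids = tf.constant([indexed_tokens])

outputs = bert_model(input_ids)
all_hidden_states = outputs[2]          # embedding output + one tensor per layer (13 in total)
hidden_states = all_hidden_states[1:]   # the 12 intermediate layers only

print(len(all_hidden_states))           # 13
print(hidden_states[-1].shape)          # (1, 1, 768): (batch_size, sequence_length, hidden_size)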

There are 12 elements in the hidden_states tuple, corresponding to all the layers from the first to the last, and each of them is a tensor of shape (batch_size, sequence_length, hidden_size). So, for example, to access the hidden state of the third layer for the fifth token of all the samples in the batch, you can do: hidden_states[2][:,4].
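
Building on that indexing, the question's goal (a 4*768=3072-dimensional vector per token from the last 4 layers) could be sketched like this; last_four and token_vector are hypothetical names introduced here for illustration:

import tensorflow as tf

# Hidden state of the third layer for the fifth token of every sample in the batch
# (this requires an input with sequence_length >= 5)
third_layer_fifth_token = hidden_states[2][:, 4]           # shape (batch_size, 768)

# Concatenate the last 4 layers along the hidden dimension: 4 * 768 = 3072 per token
last_four = tf.concat(list(hidden_states[-4:]), axis=-1)   # (batch_size, sequence_length, 3072)

# 3072-dimensional vector for the first token of the first sample
token_vector = last_four[0, 0]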

Note that if the model you are loading does not return the hidden states by default, then you can load the config using the BertConfig class and pass the output_hidden_states=True argument, like this:

from transformers import BertConfig, TFBertModel

config = BertConfig.from_pretrained("name_or_path_of_model",
                                    output_hidden_states=True)

bert_model = TFBertModel.from_pretrained("name_or_path_of_model",
                                         config=config)
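
As a quick sanity check after reloading the model with this config (again assuming tuple-style outputs), the hidden states should now be present as the third element of the output:

outputs = bert_model(input_ids)
print(len(outputs[2]))   # 13: the embedding output plus one tensor per layer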

This concludes the article on how to get the intermediate layer outputs of a pretrained BERT model in the HuggingFace Transformers library. Hopefully the recommended answer above is helpful.
