Inferring

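Inferring covers tasks in which the model takes a text as input and performs some kind of analysis on it, such as extracting labels, extracting names, or determining the sentiment of the text.
For example, to classify a piece of text as positive or negative in a traditional machine learning workflow, you would have to collect a labeled dataset, train a model, work out how to deploy it in the cloud, and then run inference. That is a lot of work, and every task (sentiment, name extraction, and so on) needs its own separately trained and deployed model.
With a large language model, a similar task only requires writing a prompt, and results start coming back right away. You can also use one model and one API for many different tasks, without having to figure out how to train and deploy many different models.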
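
As a rough illustration of the one-model, many-tasks idea, the sketch below builds a different prompt for each task from a single table of instructions. The names `TASK_PROMPTS` and `build_prompt` and the instruction wording are hypothetical and not part of the original lesson; each resulting string would simply be sent to the same model.

```python
# Hypothetical instructions: one entry per task, all meant for the same model.
TASK_PROMPTS = {
    "sentiment": "What is the sentiment of the following text?",
    "names": "List the names of people mentioned in the following text.",
    "labels": "Suggest up to three topic labels for the following text.",
}


def build_prompt(task: str, text: str) -> str:
    """Return the instruction for `task` followed by the delimited text."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return f"{TASK_PROMPTS[task]}\n\nText: '''{text}'''"


# The same text, three different tasks, no task-specific model.
sample = "Lumina shipped my lamp quickly."
prompts = {task: build_prompt(task, sample) for task in TASK_PROMPTS}
```

Adding a new task in this setup means adding one more instruction string, not training and deploying another model.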

Environment setup

As in part ① (Guidelines), the environment needs to be set up first.

import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.getenv('OPENAI_API_KEY')

# Note: openai.ChatCompletion is the legacy interface of the openai package (versions before 1.0).
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

Sentiment analysis

Take a review of a lamp as an example.

lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

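Let's write a prompt to classify the sentiment of this review. The prompt only needs to ask "What is the sentiment of the following product review", followed by the usual delimiters and the review text. Then run it.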

prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple quotes?

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
"""
The sentiment of the product review is positive.
"""

The answer can also be restricted to a single word, either positive or negative.

prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple quotes?

Give your answer as a single word, either "positive" \
or "negative".

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
"""
positive
"""

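A single-word answer is easy to post-process. The following is a minimal sketch, assuming the reply is a plain string such as the `positive` shown above; `normalize_sentiment` is a hypothetical helper, not part of the original lesson. It trims whitespace and punctuation, lowercases the word, and rejects anything outside the two allowed labels.

```python
ALLOWED_SENTIMENTS = {"positive", "negative"}


def normalize_sentiment(raw: str) -> str:
    """Map a model reply such as ' Positive.' to 'positive'."""
    label = raw.strip().strip(".!\"'").lower()
    if label not in ALLOWED_SENTIMENTS:
        # A full sentence or any other word is treated as an error.
        raise ValueError(f"unexpected sentiment label: {raw!r}")
    return label


print(normalize_sentiment("positive"))      # positive
print(normalize_sentiment(" Positive.\n"))  # positive
```
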
The model can also be asked to identify a list of emotions that the review's author is expressing, with no more than five items.

prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
"""
happy, satisfied, grateful, impressed, content
"""

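Once the emotions come back as a comma-separated list, they can be checked against emotions of interest. This is only a sketch: `WATCHED_EMOTIONS` and `flagged_emotions` are hypothetical names, and the watch list is an assumption chosen for illustration.

```python
# Hypothetical watch list; adjust it to whatever emotions matter to you.
WATCHED_EMOTIONS = {"angry", "frustrated", "disappointed"}


def flagged_emotions(reply: str) -> set:
    """Return the watched emotions that appear in a comma-separated reply."""
    found = {word.strip().lower() for word in reply.split(",")}
    return found & WATCHED_EMOTIONS


print(flagged_emotions("happy, satisfied, grateful, impressed, content"))  # set()
print(sorted(flagged_emotions("Frustrated, angry, hopeful")))  # ['angry', 'frustrated']
```
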
We can also ask whether the customer is expressing one particular emotion, such as anger.

prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple quotes. \
Give your answer as either yes or no.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
"""
No
"""

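Information extraction

Information extraction is the part of NLP (natural language processing) concerned with pulling the specific pieces of information you care about out of a text.
In the following example, the model is asked to identify the item purchased (Item) and the company that made it (Brand).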

prompt = f"""
Identify the following items from the review text: 
Item purchased by reviewer
Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys. 
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
"""
{
  "Item": "lamp",
  "Brand": "Lumina"
}
"""

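Several fields can also be extracted in a single prompt. The example below writes one prompt that identifies the sentiment, determines whether the reviewer is angry, and extracts the item and the brand.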

prompt = f"""
Identify the following items from the review text: 
Sentiment (positive or negative)
Is the reviewer expressing anger? (true or false)
Item purchased by reviewer
Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
"""
{
  "Sentiment": "positive",
  "Anger": false,
  "Item": "lamp with additional storage",
  "Brand": "Lumina"
}
"""

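Because the reply is formatted as JSON, it can be loaded straight into a Python dictionary. The sketch below assumes the reply looks like the output above; `parse_review_json` and `EXPECTED_KEYS` are hypothetical names, and converting "unknown" to `None` is a design choice made here, not something the original lesson does.

```python
import json

EXPECTED_KEYS = {"Sentiment", "Anger", "Item", "Brand"}


def parse_review_json(raw: str) -> dict:
    """Parse the model's JSON reply and check that the expected keys exist."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    # Replace the "unknown" placeholder with None so callers can test for it.
    return {k: (None if v == "unknown" else v) for k, v in data.items()}


reply = """
{
  "Sentiment": "positive",
  "Anger": false,
  "Item": "lamp with additional storage",
  "Brand": "Lumina"
}
"""
info = parse_review_json(reply)
print(info["Anger"])  # False
print(info["Brand"])  # Lumina
```

Asking for a boolean in the prompt pays off here: the JSON `false` becomes the Python value `False` with no string comparison needed.
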
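Inferring topics

The next example starts from a long text: a fictional newspaper article about how government workers feel about the agencies they work for.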

story = """
In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

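The prompt asks the model to determine five topics discussed in the text, to make each item one or two words long, and to format the response as a comma-separated list. Running it yields a list of topics.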

prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple quotes.

Make each item one or two words long. 

Format your response as a list of items separated by commas.

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response.split(sep=','))
"""
['government survey',
 ' job satisfaction',
 ' NASA',
 ' Social Security Administration',
 ' employee concerns']
"""

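Splitting on the comma alone leaves a leading space on every item after the first, as the output above shows. A minimal cleanup sketch, assuming the reply is the comma-separated string from that output (`clean_topics` is a hypothetical helper):

```python
def clean_topics(raw: str) -> list:
    """Split a comma-separated reply and strip surrounding whitespace."""
    return [topic.strip() for topic in raw.split(",") if topic.strip()]


reply = ("government survey, job satisfaction, NASA, "
         "Social Security Administration, employee concerns")
print(clean_topics(reply))
# ['government survey', 'job satisfaction', 'NASA', 'Social Security Administration', 'employee concerns']
```
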
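Given an article, the model can also judge whether it covers particular topics. Here the topics being tracked are NASA, local government, engineering, employee satisfaction, and federal government, and the model is asked to answer with 0 or 1 for each topic.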

# A list (not a set) keeps the topics in a fixed order when they are joined below.
topic_list = [
    "nasa", "local government", "engineering",
    "employee satisfaction", "federal government"
]
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple quotes.

Give your answer as list with 0 or 1 for each topic.\

List of topics: {", ".join(topic_list)}

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)
"""
nasa: 1
local government: 0
engineering: 0
employee satisfaction: 1
federal government: 1
"""

If the article covers NASA, we can print an alert such as "New NASA story".

topic_dict = {i.split(': ')[0]: int(i.split(': ')[1]) for i in response.split(sep='\n')}
if topic_dict['nasa'] == 1:
    print("ALERT: New NASA story!")
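
The one-line dictionary comprehension above works when every line of the reply has the exact form `topic: 0` or `topic: 1`; a blank line or an extra sentence in the reply would raise an error. A more tolerant sketch, assuming the same reply format (`parse_topic_flags` is a hypothetical helper, not part of the original lesson):

```python
def parse_topic_flags(raw: str) -> dict:
    """Parse lines such as 'nasa: 1' into {'nasa': 1}, skipping other lines."""
    flags = {}
    for line in raw.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or value not in {"0", "1"}:
            continue  # blank line, preamble, or unexpected format
        flags[name.strip().lower()] = int(value)
    return flags


reply = """nasa: 1
local government: 0
engineering: 0
employee satisfaction: 1
federal government: 1
"""
topic_flags = parse_topic_flags(reply)
if topic_flags.get("nasa") == 1:
    print("ALERT: New NASA story!")
```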