This article explains how to extract metadata for more than 20,000 videos from a channel using the YouTube Data API v3; hopefully it serves as a useful reference for anyone facing the same problem.

Problem Description

I want to use the YouTube Data API v3 to extract video metadata (especially the title and publish date) for all videos in a channel. Currently I'm only able to extract details for the most recent 20,000 videos using the playlistItems() endpoint. Is there a way to extract metadata for more than 20,000 videos from a single channel?

Here's the Python code I'm using to extract metadata for those 20,000 videos.

from googleapiclient.discovery import build

youtube = build('youtube', 'v3', developerKey="YOUTUBE_API_KEY")
channelId = "CHANNEL_ID"

# getting all video details from the channel's "uploads" playlist
contentdata = youtube.channels().list(id=channelId, part='contentDetails').execute()
playlist_id = contentdata['items'][0]['contentDetails']['relatedPlaylists']['uploads']
videos = []
next_page_token = None

while True:
    res = youtube.playlistItems().list(playlistId=playlist_id, part='snippet', maxResults=50, pageToken=next_page_token).execute()
    videos += res['items']
    next_page_token = res.get('nextPageToken')
    if next_page_token is None:
        break

# getting the video id for each video
video_ids = list(map(lambda x: x['snippet']['resourceId']['videoId'], videos))
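The loop above only collects video IDs; since the question asks for titles and publish dates, a possible follow-up (not part of the original post) is to fetch snippet data for those IDs in batches of 50 with videos().list. A minimal sketch, reusing the youtube client and video_ids from above:

# Sketch: fetch title and publish date for the collected IDs,
# 50 at a time (the maximum per videos().list call).
video_metadata = []
for i in range(0, len(video_ids), 50):
    batch = video_ids[i:i + 50]
    res = youtube.videos().list(id=','.join(batch), part='snippet').execute()
    for item in res['items']:
        video_metadata.append({
            'videoId': item['id'],
            'title': item['snippet']['title'],
            'publishedAt': item['snippet']['publishedAt'],
        })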

A solution to this problem could be either forcing the API to return metadata for more than 20,000 videos from a channel, or specifying a time period during which the videos were uploaded. That way, the code could be run repeatedly over multiple time periods to extract metadata for all videos, as in the sketch below.
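To illustrate the second idea, here is a hedged sketch of windowing by upload date with search().list. It assumes the same youtube client and channelId as above; note that search().list costs far more quota and can itself cap the results returned per query, so this is only a rough workaround:

# Sketch: collect video IDs uploaded within one time window,
# then call this repeatedly with consecutive windows.
def videos_in_window(youtube, channel_id, published_after, published_before):
    ids = []
    page_token = None
    while True:
        res = youtube.search().list(
            channelId=channel_id,
            part='id',
            type='video',
            order='date',
            publishedAfter=published_after,    # e.g. '2020-01-01T00:00:00Z'
            publishedBefore=published_before,  # e.g. '2020-07-01T00:00:00Z'
            maxResults=50,
            pageToken=page_token,
        ).execute()
        ids += [item['id']['videoId'] for item in res['items']]
        page_token = res.get('nextPageToken')
        if page_token is None:
            return ids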

Recommended Answer

I tried that without success.

My solution to the failing YouTube API backend is to use the following Python script:

It works by faking the requests made when browsing the "Videos" tab on a YouTube channel.

import urllib.request, json, subprocess
from urllib.error import HTTPError

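# Download a URL and return the response body as text, even when the server answers with an HTTP error.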
def getURL(url):
    res = ""
    try:
        res = urllib.request.urlopen(url).read()
    except HTTPError as e:
        res = e.read()
    return res.decode('utf-8')

def exec(cmd):
    return subprocess.check_output(cmd, shell = True)

youtuberId = 'UCF_UozwQBJY4WHZ7yilYkjA'
videosIds = []
errorsCount = 0

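# Collect every 11-character videoId found in the raw page or JSON content, skipping duplicates.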
def retrieveVideosFromContent(content):
    global videosIds
    wantedPattern = '"videoId":"'
    content = content.replace('"videoId": "', wantedPattern).replace("'videoId': '", wantedPattern)
    contentParts = content.split(wantedPattern)
    contentPartsLen = len(contentParts)
    for contentPartsIndex in range(contentPartsLen):
        contentPart = contentParts[contentPartsIndex]
        contentPartParts = contentPart.split('"')
        videoId = contentPartParts[0]
        videoIdLen = len(videoId)
        if not videoId in videosIds and videoIdLen == 11:
            videosIds += [videoId]

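# Fetch one page of the channel's "Videos" tab through the youtubei browse endpoint,
# record its video IDs and return the next continuation token ('' when there are no more pages).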
def scrap(token):
    global errorsCount, data
    # YOUR_KEY can be obtained by browsing a videos channel section (like https://www.youtube.com/c/BenjaminLoison/videos) while checking your "Network" tab using for instance Ctrl+Shift+E
    cmd = 'curl -s \'https://www.youtube.com/youtubei/v1/browse?key=YOUR_KEY\' -H \'Content-Type: application/json\' --data-raw \'{"context":{"client":{"clientName":"WEB","clientVersion":"2.20210903.05.01"}},"continuation":"' + token + '"}\''
    cmd = cmd.replace('"', '\\"').replace("\'", '"')
    content = exec(cmd).decode('utf-8')

    retrieveVideosFromContent(content)

    data = json.loads(content)
    if not 'onResponseReceivedActions' in data:
        print('no token found let\'s try again')
        errorsCount += 1
        return scrap(token)
    entry = data['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems'][-1]
    if not 'continuationItemRenderer' in entry:
        return ''
    newToken = entry['continuationItemRenderer']['continuationEndpoint']['continuationCommand']['token']
    return newToken

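# Load the channel's "Videos" tab and parse the initial data embedded in the page.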
url = 'https://www.youtube.com/channel/' + youtuberId + '/videos'
content = getURL(url)
content = content.split('var ytInitialData = ')[1].split(";</script>")[0]
dataFirst = json.loads(content)

retrieveVideosFromContent(content)

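# Extract the first continuation token from the initial page data.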
token = dataFirst['contents']['twoColumnBrowseResultsRenderer']['tabs'][1]['tabRenderer']['content']['sectionListRenderer']['contents'][0]['itemSectionRenderer']['contents'][0]['gridRenderer']['items'][-1]['continuationItemRenderer']['continuationEndpoint']['continuationCommand']['token']

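# Keep requesting pages until no continuation token is returned.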
while True:
    videosIdsLen = len(videosIds)
    print(videosIdsLen, token)
    if token == '':
        break
    newToken = scrap(token)
    token = newToken

print(videosIdsLen, videosIds)

Pay attention to modifying the "youtuberId" and "YOUR_KEY" values. Also make sure the curl command is available from your shell.
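If curl is not an option, the same browse request could in principle be issued with Python's standard library instead of shelling out. A hypothetical replacement for the exec/curl call inside scrap(), with YOUR_KEY still to be filled in as described above:

# Sketch: POST the continuation request without relying on curl.
import urllib.request, json

def browse(token):
    url = 'https://www.youtube.com/youtubei/v1/browse?key=YOUR_KEY'
    payload = {
        'context': {'client': {'clientName': 'WEB', 'clientVersion': '2.20210903.05.01'}},
        'continuation': token,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
    )
    return urllib.request.urlopen(req).read().decode('utf-8')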

