This article looks at how to handle a .csv file that cannot be parsed in Python.

Problem description

I am doing a lab on the website LinuxAcademy.com. The course is Automating AWS with Lambda, Python, and Boto3, and the specific lab I am having trouble with is Lecture: Importing CSV Files into DynamoDB.

In this lab we upload a .csv file to a specified S3 bucket; the upload generates an S3 event, which kicks off the Lambda function shown below. The function parses the .csv and then uploads the contents to DynamoDB.

I was originally having issues with Line 23:

items = read_csv(download_file)

because the name download_file was not defined. After changing it to:

items = read_csv(download_path)

I was able to get past that error.

Now I am having an issue with Line 26:

for item in items:

The new error for line 26, as reported in CloudWatch, is as follows:

[ERROR] TypeError: 'NoneType' object is not iterable
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 26, in lambda_handler
    for item in items:

Here is the code:

import csv
import os

import tempfile
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Movies')
s3 = boto3.client('s3')


def lambda_handler(event, context):
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        with tempfile.TemporaryDirectory() as tmpdir:
            download_path = os.path.join(tmpdir, key)
            s3.download_file(source_bucket, key, download_path)
            items = read_csv(download_path)

            with table.batch_writer() as batch:
                for item in items:  # line 26: the TypeError is raised here
                    batch.put_item(Item=item)


def read_csv(file):
    items=[]
    with open(file) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            data = {}
            data['Meta'] = {}
            data['Year'] = int(row['Year'])
            data['Title'] = row['Title'] or none
            data['Meta']['Length'] = int(row['Length'] or 0)
            #data['Meta']['Length'] = int(row['Length'] or 0)
            data['Meta']['Subject'] = row['Subject'] or None
            data['Meta']['Actor'] = row['Actor'] or None
            data['Meta']['Actress'] = row['Actress'] or None
            data['Meta']['Director'] = row['Director'] or None
            data['Meta']['Popularity'] = row['Popularity'] or None
            data['Meta']['Awards'] = row['Awards'] == 'Yes'
            data['Meta']['Image'] = row['Image'] or None
            data['Meta'] = {k: v for k,
                            v in data['Meta'].items() if v is not None}

I'm starting to think that this is related to the function not reading the .csv properly. The .csv is a small test file, contents below.

Year,Length,Title,Subject,Actor,Actress,Director,Popularity,Awards,Image
1990,111,Tie Me Up, Comedy,"Banderas, Antonio","April, Victoria","Al, Pedreo",68,No,NicholasCage.png
1991,112,Tie Me Up2, Comedy2,"Banderas, Antonio2","April, Victoria2","Al, Pedreo2",682,No2,NicholasCage2.png
1993,113,Tie Me Up3, Comedy3,"Banderas, Antonio3","April, Victoria3","Al, Pedreo3",683,No3,NicholasCage3.png
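
One way to test the suspicion that the .csv is not being read properly is to feed the same rows to csv.DictReader outside Lambda. The sketch below is only a local check: it inlines the sample file as a string (the SAMPLE name is made up for illustration) instead of downloading it from S3.

```python
import csv
import io

# The sample file from the question, inlined so the check runs without S3.
SAMPLE = '''Year,Length,Title,Subject,Actor,Actress,Director,Popularity,Awards,Image
1990,111,Tie Me Up, Comedy,"Banderas, Antonio","April, Victoria","Al, Pedreo",68,No,NicholasCage.png
1991,112,Tie Me Up2, Comedy2,"Banderas, Antonio2","April, Victoria2","Al, Pedreo2",682,No2,NicholasCage2.png
1993,113,Tie Me Up3, Comedy3,"Banderas, Antonio3","April, Victoria3","Al, Pedreo3",683,No3,NicholasCage3.png
'''

# DictReader uses the first line as the header and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

print(len(rows))           # number of data rows
print(rows[0]['Actor'])    # quoted field that contains a comma
print(rows[0]['Subject'])  # note the leading space kept from the file
```

If this prints three rows with the quoted names intact, the file itself parses fine and the problem is more likely in what read_csv does with the rows.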

Solution

You can use pandas. Reading the file with it gives you a DataFrame, which you can loop through easily. See this tutorial on reading CSV files with pandas: https://www.datacamp.com/community/tutorials/pandas-read-csv
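
Separately from switching to pandas, the traceback is consistent with something visible in the posted code: read_csv builds each data dict but never appends it to items and has no return statement, so the call evaluates to None and the for loop on line 26 fails. A minimal sketch of a standard-library fix follows, assuming the column names from the sample file. The temporary movies.csv file at the end is a hypothetical local check, not part of the lab.

```python
import csv
import os
import tempfile


def read_csv(file):
    """Parse the movie CSV into a list of item dicts for batch.put_item."""
    items = []
    with open(file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            meta = {
                'Length': int(row['Length'] or 0),
                'Subject': row['Subject'] or None,
                'Actor': row['Actor'] or None,
                'Actress': row['Actress'] or None,
                'Director': row['Director'] or None,
                'Popularity': row['Popularity'] or None,
                'Awards': row['Awards'] == 'Yes',
                'Image': row['Image'] or None,
            }
            data = {
                'Year': int(row['Year']),
                'Title': row['Title'] or None,  # None, not lowercase none
                # Drop empty attributes, as the original comprehension did.
                'Meta': {k: v for k, v in meta.items() if v is not None},
            }
            items.append(data)  # missing in the posted code
    return items  # missing in the posted code, so the caller received None


# Local check with a one-row file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'movies.csv')
    with open(path, 'w', newline='') as f:
        f.write('Year,Length,Title,Subject,Actor,Actress,Director,'
                'Popularity,Awards,Image\n')
        f.write('1990,111,Tie Me Up, Comedy,"Banderas, Antonio",'
                '"April, Victoria","Al, Pedreo",68,No,NicholasCage.png\n')
    items = read_csv(path)

print(len(items), items[0]['Title'])
```

With the append and return in place, lambda_handler receives a list and the batch_writer loop can iterate over it. The same two lines would be needed in a pandas version too, since the caller still expects a list of items back.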

07-05 04:57