This article covers how to handle a syntax error hit while defining a schema for a Spark SQL DataFrame.

Problem description

My pyspark console tells me that I have invalid syntax on the line following my for loop. The console doesn't execute the for loop until it reaches the schema = StructType(fields) line, where it reports the SyntaxError, but the for loop looks fine to me...

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
sqlContext = SQLContext(sc)

lines = sc.textFile('file:///home/w205/hospital_compare/surveys_responses.csv')
parts = lines.map(lambda l: l.split(','))
surveys_responses = parts.map(lambda p: (p[0:33]))
schemaString = 'Provider Number, Hospital Name, Address, City, State, ZIP Code, County Name, Communication with Nurses Achievement Points, Communication with Nurses Improvement Points, Communication with Nurses Dimension Score, Communication with Doctors Achievement Points, Communication with Doctors Improvement Points, Communication with Doctors Dimension Score, Responsiveness of Hospital Staff Achievement Points, Responsiveness of Hospital Staff Improvement Points, Responsiveness of Hospital Staff Dimension Score, Pain Management Achievement Points, Pain Management Improvement Points, Pain Management Dimension Score, Communication about Medicines Achievement Points, Communication about Medicines Improvement Points, Communication about Medicines Dimension Score, Cleanliness and Quietness of Hospital Environment Achievement Points, Cleanliness and Quietness of Hospital Environment Improvement Points, Cleanliness and Quietness of Hospital Environment Dimension Score, Discharge Information Achievement Points, Discharge Information Improvement Points, Discharge Information Dimension Score, Overall Rating of Hospital Achievement Points, Overall Rating of Hospital Improvement Points, Overall Rating of Hospital Dimension Score, HCAHPS Base Score, HCAHPS Consistency Score'
fields = []
for field_name in schemaString.split(", "):
    if field_name != ("HCAHPS Base Score" | "HCAHPS Consistency Score"):
        fields.append(StructField(field_name, StringType(), True))
    else:
        fields.append(StructField(field_name, IntegerType(), True))
schema = StructType(fields)

Recommended answer

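The | inside the != condition is the problem. Python does not define | for two strings, so that expression raises a TypeError, and even if it produced a value, != would compare field_name against that single value instead of checking it against both names. Test membership in a tuple with not in instead: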

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
sqlContext = SQLContext(sc)

lines = sc.textFile('file:///home/w205/hospital_compare/surveys_responses.csv')
parts = lines.map(lambda l: l.split(','))
surveys_responses = parts.map(lambda p: (p[0:33]))
schemaString = 'Provider Number, Hospital Name, Address, City, State, ZIP Code, County Name, Communication with Nurses Achievement Points, Communication with Nurses Improvement Points, Communication with Nurses Dimension Score, Communication with Doctors Achievement Points, Communication with Doctors Improvement Points, Communication with Doctors Dimension Score, Responsiveness of Hospital Staff Achievement Points, Responsiveness of Hospital Staff Improvement Points, Responsiveness of Hospital Staff Dimension Score, Pain Management Achievement Points, Pain Management Improvement Points, Pain Management Dimension Score, Communication about Medicines Achievement Points, Communication about Medicines Improvement Points, Communication about Medicines Dimension Score, Cleanliness and Quietness of Hospital Environment Achievement Points, Cleanliness and Quietness of Hospital Environment Improvement Points, Cleanliness and Quietness of Hospital Environment Dimension Score, Discharge Information Achievement Points, Discharge Information Improvement Points, Discharge Information Dimension Score, Overall Rating of Hospital Achievement Points, Overall Rating of Hospital Improvement Points, Overall Rating of Hospital Dimension Score, HCAHPS Base Score, HCAHPS Consistency Score'
fields = []
for field_name in schemaString.split(", "):
    if field_name not in ("HCAHPS Base Score", "HCAHPS Consistency Score"):
        fields.append(StructField(field_name, StringType(), True))
    else:
        fields.append(StructField(field_name, IntegerType(), True))

schema = StructType(fields)
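The corrected condition can be checked in plain Python without a Spark session. A minimal sketch with hypothetical helper names (INTEGER_COLUMNS, is_string_column, or_on_strings_error): the first helper applies the answer's not in test, and the second shows that | applied to two strings raises a TypeError instead of producing something != could compare against.

```python
# Plain-Python check of the condition used in the answer's loop;
# no Spark session is needed.

# Hypothetical name: the columns that get IntegerType() in the answer's schema.
INTEGER_COLUMNS = ("HCAHPS Base Score", "HCAHPS Consistency Score")


def is_string_column(field_name):
    """Return True unless field_name is one of INTEGER_COLUMNS."""
    # 'not in' checks field_name against every name in the tuple.
    return field_name not in INTEGER_COLUMNS


def or_on_strings_error():
    """Return the error message raised by applying '|' to two strings, or None."""
    try:
        _ = "HCAHPS Base Score" | "HCAHPS Consistency Score"
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(is_string_column("Hospital Name"))      # True
    print(is_string_column("HCAHPS Base Score"))  # False
    print(or_on_strings_error())                  # the TypeError message
```

In the real loop, the same test picks the type: StructField(field_name, StringType() if field_name not in INTEGER_COLUMNS else IntegerType(), True).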

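The SyntaxError the question reports is most likely a console issue rather than a problem with the loop itself: the interactive Python console reads an indented block such as the for loop as one statement and runs it only once the block is closed by an empty line, so an unindented line typed straight after the loop body is rejected as invalid syntax. A minimal sketch of that rule, assuming Python's built-in compile() in "single" mode (the mode for one interactive statement) is a fair stand-in for the console; the helper console_accepts is hypothetical.

```python
def console_accepts(source):
    """Return True if source compiles as one interactive statement."""
    try:
        # Names such as StructType are not resolved at compile time,
        # so only the syntax is checked here.
        compile(source, "<stdin>", "single")
    except SyntaxError:
        return False
    return True


# The loop on its own is one complete statement.
loop_only = (
    "for field_name in names:\n"
    "    fields.append(field_name)\n"
)

# The loop immediately followed by the next top-level line, as typed
# into the console without an empty line in between.
loop_then_schema = loop_only + "schema = StructType(fields)\n"

if __name__ == "__main__":
    print(console_accepts(loop_only))         # True
    print(console_accepts(loop_then_schema))  # False
```

In the console, entering an empty line after the loop body lets the loop run; schema = StructType(fields) can then be entered on its own.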

10-19 17:09