scala - Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException

I keep getting this error.

I have set up a standalone Spark cluster and am trying to run this code on my master node.

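import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("Recommendation Engine1")
  .set("spark.executor.memory", "1g")
  .set("spark.driver.memory", "4g")

val sc = new SparkContext(conf)
// Read the file from HDFS and keep a 5% sample, without replacement
val rawUserArtistData = sc.textFile("hdfs:/user_artist_data.txt").sample(false, 0.05)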

I run it from my terminal.

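These are the various things I have tried:

I replaced hdfs:/filename.txt with the fs.defaultFS path from my core-site.xml file.

I replaced hdfs:/filename.txt with hdfs:// (in case it made any difference).

I replaced hdfs:/ with file:// and then with file:/// to read the file from the local drive.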

None of these seem to work. What else could be going wrong?
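To see how the forms tried above differ, here is a minimal sketch that uses only java.net.URI from the JDK to print the scheme, authority (host:port) and path of each one. The object name UriForms and the helper parts are made up for this illustration. As far as I understand, when an hdfs URI has no authority, Hadoop falls back to whatever fs.defaultFS the driver's configuration provides, so a missing host is worth ruling out first.

```scala
import java.net.URI

object UriForms {
  // Returns (scheme, authority, path) exactly as java.net.URI parses them.
  // The authority is None when the URI names no host.
  def parts(s: String): (String, Option[String], String) = {
    val u = new URI(s)
    (u.getScheme, Option(u.getAuthority), u.getPath)
  }

  def main(args: Array[String]): Unit = {
    Seq(
      "hdfs:/user_artist_data.txt",                 // no authority at all
      "hdfs://localhost:8020/user_artist_data.txt", // authority is localhost:8020
      "file:///user_artist_data.txt"                // empty authority, local path
    ).foreach(s => println(s"$s -> ${parts(s)}"))
  }
}
```

Only the second form names the machine and port to contact. The first and third leave that to the configuration.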

If I run hadoop fs -ls:

[screenshot: output of hadoop fs -ls]

This is where my file is located.

Best answer

Usually, the path is:
hdfs://name-nodeIP:8020/path/to/file
In your case,
hdfs://localhost:8020/user_artist_data.txt
or
hdfs://machinename:8020/user_artist_data.txt
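As a rough sketch of how such a path could be assembled rather than typed by hand, the helper below joins a NameNode address (the value of fs.defaultFS) with an absolute file path. The names HdfsPath and qualify are hypothetical, and localhost:8020 is only the example host and port from above; use whatever your own core-site.xml actually says.

```scala
import java.net.URI

object HdfsPath {
  // Builds hdfs://host:port/path from an fs.defaultFS value and an absolute path.
  // Hypothetical helper for illustration; it is not part of Spark or Hadoop.
  def qualify(defaultFS: String, path: String): String = {
    val base = new URI(defaultFS)
    require(base.getScheme == "hdfs", s"expected an hdfs:// address, got $defaultFS")
    require(base.getAuthority != null, s"no host in $defaultFS")
    require(path.startsWith("/"), s"path must be absolute, got $path")
    // Reuse the scheme and host:port of defaultFS, and swap in the file path.
    new URI(base.getScheme, base.getAuthority, path, null, null).toString
  }

  def main(args: Array[String]): Unit = {
    println(qualify("hdfs://localhost:8020", "/user_artist_data.txt"))
  }
}
```

The resulting string can then be passed to sc.textFile in place of hdfs:/user_artist_data.txt.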

Regarding scala - Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39158613/
