This article covers how to feed Twitter data into HDFS through Flume from behind a proxy server.

Problem description




I have installed Flume and am trying to feed Twitter data into an HDFS folder.

My flume.conf file looks like this:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <required>
TwitterAgent.sources.Twitter.consumerSecret = <required>
TwitterAgent.sources.Twitter.accessToken = <required>
TwitterAgent.sources.Twitter.accessTokenSecret = <required>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, china, india.
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

When I start the agent, I get the following error:

2014-11-03 02:00:49,834 (Twitter Stream consumer-1[Establishing connection]) [DEBUG -  twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] User-Agent: twitter4j http://twitter4j.org/ /2.2.6
2014-11-03 02:00:49,834 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Connection: close
2014-11-03 02:00:49,835 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] X-Twitter-Client-Version: 2.2.6
2014-11-03 02:00:49,835 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] X-Twitter-Client-URL: http://twitter4j.org/en/twitter4j-2.2.6.xml
2014-11-03 02:00:49,836 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Accept-Encoding: gzip
2014-11-03 02:00:49,836 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] X-Twitter-Client: Twitter4J
2014-11-03 02:00:49,837 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:75)] Post Params: count=0&track=hadoop%2Cbig%20data%2Canalytics%2Cbigdata%2Ccloudera%2Cdata%20science&include_entities=true
2014-11-03 02:00:49,843 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Connection refused
2014-11-03 02:00:49,843 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Waiting for 2000 milliseconds
2014-11-03 02:00:49,843 (Twitter Stream consumer-1[Waiting for 2000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Twitter Stream consumer-1[Waiting for 2000 milliseconds]
2014-11-03 02:00:51,843 (Twitter Stream consumer-1[Waiting for 2000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Connection refused
2014-11-03 02:00:51,844 (Twitter Stream consumer-1[Waiting for 2000 milliseconds]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.

My college network sits behind a proxy server, and I think the proxy server is causing the problem.

How can I use a proxy with flume?

Solution

Build the jar from https://github.com/cloudera/cdh-twitter-example

Unzip it, then open this file inside the extracted folder:

/cdh-twitter-example-master/flume-sources/src/main/java/com/cloudera/flume/source/TwitterSource.java

and add these lines, where cb is the twitter4j ConfigurationBuilder:

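cb.setHttpProxyHost("your proxy");  // proxy host name or IP address
cb.setHttpProxyPort(8080);          // proxy port
cb.setHttpProxyUser("");            // proxy user name, if the proxy needs authentication
cb.setHttpProxyPassword("");        // proxy password, if the proxy needs authentication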

$ cd flume-sources

$ mvn package

Then copy the jar from the target folder into Flume's lib folder.
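
The four cb.setHttpProxy... calls above take a host, a port, a user name and a password. If the proxy address is only known as a single URL string, those four values have to be split out first. The following is a minimal sketch of one way to do that with the JDK alone. ProxySettings and parse are hypothetical names that are not part of cdh-twitter-example, and the fallback port 8080 is an assumption taken from the port used in the answer.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not part of cdh-twitter-example): splits a proxy URL into its parts. */
final class ProxySettings {
    final String host;
    final int port;
    final String user;      // empty string when the URL carries no user name
    final String password;  // empty string when the URL carries no password

    private ProxySettings(String host, int port, String user, String password) {
        this.host = host;
        this.port = port;
        this.user = user;
        this.password = password;
    }

    /** Parses a URL such as "http://alice:secret@proxy.example.edu:3128". */
    static ProxySettings parse(String proxyUrl) {
        URI uri;
        try {
            uri = new URI(proxyUrl);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid proxy URL: " + proxyUrl, e);
        }
        if (uri.getHost() == null) {
            // Happens, for example, when the scheme ("http://") is missing.
            throw new IllegalArgumentException("Proxy URL has no host: " + proxyUrl);
        }
        // Assumption: fall back to 8080 when the URL does not name a port.
        int port = uri.getPort() == -1 ? 8080 : uri.getPort();

        String user = "";
        String password = "";
        String userInfo = uri.getUserInfo(); // "user:password", "user", or null
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            user = colon < 0 ? userInfo : userInfo.substring(0, colon);
            password = colon < 0 ? "" : userInfo.substring(colon + 1);
        }
        return new ProxySettings(uri.getHost(), port, user, password);
    }
}
```

With a value p returned by ProxySettings.parse, the lines added to TwitterSource.java would become cb.setHttpProxyHost(p.host), cb.setHttpProxyPort(p.port), cb.setHttpProxyUser(p.user) and cb.setHttpProxyPassword(p.password). Where the URL string comes from (a constant, an environment variable, a Flume context property) is left open here.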


09-16 11:03