Apache Flume Moving Tomcat Logs to HDFS

posted on Nov 20th, 2016

Apache Flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

Prerequisites

1) A machine running the Ubuntu 14.04 LTS operating system

2) Apache Hadoop installed (How to install Hadoop on Ubuntu 14.04)

3) Apache Tomcat installed (How to install Apache Tomcat on Ubuntu 14.04)

4) Apache Flume 1.6.0 installed (How to install Flume on Ubuntu 14.04)

Apache Flume Moving Tomcat Logs to HDFS

Now we will see how to move Apache Tomcat logs into HDFS.

Step 1 - Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 - Start all Hadoop daemons.

$ ./start-all.sh

Step 3 - Run jps (the Java Virtual Machine Process Status Tool) to check that the Hadoop daemons are running. Note that jps only reports on JVMs for which it has access permissions.

$ jps
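If you would rather not read the jps output by eye, a small helper can list the daemons that are missing. This is only a sketch: the function name missing_daemons is made up for this post, and the daemon list assumes a single-node Hadoop 2.x setup where start-all.sh brings up both HDFS and YARN.

```shell
# Print the names of expected Hadoop daemons that do not appear in jps output.
# Reads jps output on stdin. The daemon list is an assumption (single-node Hadoop 2.x).
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>", so compare against the second field exactly
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Usage: jps | missing_daemons
```

No output means every daemon in the list was found.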

Step 4 - Create a /flumedata folder in HDFS. This is the folder the Flume HDFS sink will write to.

$ hdfs dfs -mkdir hdfs://localhost:9000/flumedata
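Running the mkdir command a second time fails because the folder already exists. The sketch below checks first and only creates the folder when needed. The function name ensure_hdfs_dir is hypothetical; it relies on the standard hdfs dfs -test -d and hdfs dfs -mkdir -p options.

```shell
# Create an HDFS directory only if it is not already there.
# ensure_hdfs_dir is a hypothetical helper name used for illustration.
ensure_hdfs_dir() {
    dir="$1"
    if hdfs dfs -test -d "$dir"; then
        echo "exists: $dir"
    else
        hdfs dfs -mkdir -p "$dir" && echo "created: $dir"
    fi
}

# Usage: ensure_hdfs_dir hdfs://localhost:9000/flumedata
```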

Step 5 - Change the directory to the Tomcat bin folder (/usr/local/tomcat/bin in my case).

$ cd $CATALINA_HOME/bin

Step 6 - Start the Tomcat web server.

$ ./startup.sh

Step 7 - Check that Tomcat is running. Open a browser and go to the following URL.

http://127.0.0.1:8080
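On a machine without a browser, the same check can be done from the terminal. This is a minimal sketch assuming curl is installed; tomcat_status is a made-up name. If the access log is enabled in your Tomcat setup, each request should also add a line to it, which gives Flume something to pick up later.

```shell
# Print the HTTP status code returned by the Tomcat front page.
# tomcat_status is a hypothetical helper; the default URL is the one from Step 7.
tomcat_status() {
    curl -s -o /dev/null -w '%{http_code}' "${1:-http://127.0.0.1:8080}"
}

# Usage: [ "$(tomcat_status)" = "200" ] && echo "Tomcat is up"
```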

Step 8 - Change the directory to the Flume home folder (/usr/local/flume in my case).

$ cd $FLUME_HOME

Step 9 - Configuration File

Given below is an example of the configuration file. Copy this content and save it as flume.conf. In my case, flume.conf is in the /usr/local/flume/conf/ folder.

Don't forget to change this line to point to your own Tomcat log file

agent.sources.tail-source.command = tail -F /usr/local/tomcat/logs/access_log.2015-12-26.txt

flume.conf

# Name the components of this agent
agent.sources = tail-source
agent.channels = memoryChannel
agent.sinks = hdfs-sink

# Exec source: follow the Tomcat access log as new lines are written
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /usr/local/tomcat/logs/access_log.2015-12-26.txt
#agent.sources.tail-source.batchSize = 10
agent.sources.tail-source.channels = memoryChannel

# Memory channel: buffers events between the source and the sink
agent.channels.memoryChannel.type = memory
#agent.channels.memoryChannel.capacity = 100000
#agent.channels.memoryChannel.transactionCapacity = 10000
#agent.channels.memoryChannel.keep-alive = 2

# HDFS sink: write events as plain text files under /flumedata
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memoryChannel
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:9000/flumedata/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
#agent.sinks.hdfs-sink.hdfs.filePrefix = access_%y-%m-%d-%H-%M
agent.sinks.hdfs-sink.hdfs.fileSuffix = .txt

#agent.sinks.hdfs-sink.hdfs.batchSize = 10
#agent.sinks.hdfs-sink.hdfs.rollSize = 0
#agent.sinks.hdfs-sink.hdfs.rollCount = 10
#agent.sinks.hdfs-sink.hdfs.rollInterval = 30
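
Before starting the agent, it can help to confirm that the log file named in the exec source command really exists, since the date in the file name will differ on your machine. The helpers below are only a sketch: tailed_file and check_tailed_file are made-up names, and they assume the file path is the last word on the first uncommented command line.

```shell
# Print the file path used by the exec source in a Flume config.
# Assumes the path is the last field of the first uncommented ".command =" line.
tailed_file() {
    grep -E '^[^#]*\.command[[:space:]]*=' "$1" | head -n 1 | awk '{print $NF}'
}

# Report whether that file exists and is readable.
check_tailed_file() {
    path=$(tailed_file "$1")
    if [ -r "$path" ]; then
        echo "ok: $path"
    else
        echo "missing or unreadable: $path"
    fi
}

# Usage: check_tailed_file conf/flume.conf
```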

Step 10 - Execution

$ bin/flume-ng agent -c ./conf -f conf/flume.conf --name agent -Dflume.root.logger=INFO,console
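
Once the agent is running and Tomcat has served a few requests, you can check from a second terminal that files are arriving in HDFS. The sketch below uses made-up function names and defaults to the hdfs.path from flume.conf. With the default Flume settings, a file that is still being written carries a .tmp suffix, so the *.txt pattern only picks up files that have already been closed.

```shell
# List the files the HDFS sink has written so far.
# list_flume_output and show_flume_output are hypothetical helper names.
list_flume_output() {
    hdfs dfs -ls "${1:-hdfs://localhost:9000/flumedata}"
}

# Print the contents of every closed .txt file in the sink folder.
# The pattern is quoted so that HDFS, not the local shell, expands it.
show_flume_output() {
    hdfs dfs -cat "${1:-hdfs://localhost:9000/flumedata}/*.txt"
}

# Usage: list_flume_output
#        show_flume_output | tail
```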
