Apache Flume SeqGen Agent Configuration

posted on Nov 20th, 2016

Apache Flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.

Prerequisites

1) A machine with the Ubuntu 14.04 LTS operating system

2) Apache Hadoop pre-installed (How to install Hadoop on Ubuntu 14.04)

3) Apache Flume 1.6.0 pre-installed (How to install Flume on Ubuntu 14.04)
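Before going further, it can help to confirm that the required tools are reachable from the shell. The following is a minimal sketch; `require_cmd` is a hypothetical helper name, and it assumes the Hadoop and Flume bin directories have been added to PATH during installation.

```shell
# Hypothetical helper: reports whether a command is available on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Usage, assuming the Hadoop and Flume bin directories are on PATH:
#   require_cmd java && require_cmd hadoop && require_cmd flume-ng
```

If any of the three is reported as missing, revisit the corresponding installation guide before continuing.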

Apache Flume SeqGen Agent Configuration

Now let us see how to fetch data from the sequence generator source and store it in HDFS.

Step 1 - Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 - Start all Hadoop daemons.

$ start-all.sh

Step 3 - Check that the Hadoop daemons are running with jps (Java Virtual Machine Process Status Tool). Note that jps only reports JVMs for which it has access permissions.

$ jps
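Reading the jps listing by eye is easy to get wrong, so the check can be scripted. This is only a sketch: `missing_daemons` is a hypothetical helper, and the daemon names it looks for assume a Hadoop 2.x single-node setup started with start-all.sh. Adjust the list if your installation differs.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints every expected
# Hadoop daemon that is NOT listed. The daemon list is an assumption
# (Hadoop 2.x, single node).
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the name against the whole second field
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage (requires a JDK on PATH):
#   jps | missing_daemons
```

Empty output means all the expected daemons were found.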

Step 4 - Change the directory to the Flume home directory (/usr/local/flume in this setup)

$ cd $FLUME_HOME

Step 5 - Configuration File

Given below is an example of the configuration file. Copy this content and save it as seq_gen.conf. In my case, the seq_gen.conf file is in the /home/hduser/Desktop/FLUME/ folder.

seq_gen.conf

SeqGenAgent.sources = SeqSource
SeqGenAgent.channels = MemChannel
SeqGenAgent.sinks = HDFS

# Describing/Configuring the source
SeqGenAgent.sources.SeqSource.type = seq

# Describing/Configuring the sink
SeqGenAgent.sinks.HDFS.type = hdfs
SeqGenAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/hduser/flumedata/
SeqGenAgent.sinks.HDFS.hdfs.filePrefix = seq
SeqGenAgent.sinks.HDFS.hdfs.rollInterval = 0
SeqGenAgent.sinks.HDFS.hdfs.rollCount = 10000
SeqGenAgent.sinks.HDFS.hdfs.fileType = DataStream

# Describing/Configuring the channel
SeqGenAgent.channels.MemChannel.type = memory
SeqGenAgent.channels.MemChannel.capacity = 1000
SeqGenAgent.channels.MemChannel.transactionCapacity = 100

# Binding the source and sink to the channel
SeqGenAgent.sources.SeqSource.channels = MemChannel
SeqGenAgent.sinks.HDFS.channel = MemChannel
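A common mistake in this kind of file is the binding section: a source is bound with the plural key `channels`, while a sink is bound with the singular key `channel`. The sketch below checks for that and for the three top-level declarations. `check_agent_conf` is a hypothetical helper, not part of Flume, and it only looks at key names, not at whether the values are valid.

```shell
# Hypothetical helper: checks that a Flume properties file declares sources,
# channels and sinks for the given agent, and that at least one source and
# one sink are bound to a channel.
check_agent_conf() {
    conf=$1
    agent=$2
    status=0
    for key in sources channels sinks; do
        if ! grep -Eq "^${agent}\.${key}[[:space:]]*=" "$conf"; then
            echo "missing: ${agent}.${key}"
            status=1
        fi
    done
    # A source binds with the plural "channels", a sink with the singular "channel".
    if ! grep -Eq "^${agent}\.sources\.[^.]+\.channels[[:space:]]*=" "$conf"; then
        echo "missing: source channel binding"
        status=1
    fi
    if ! grep -Eq "^${agent}\.sinks\.[^.]+\.channel[[:space:]]*=" "$conf"; then
        echo "missing: sink channel binding"
        status=1
    fi
    return $status
}

# Usage, with the path used in this post:
#   check_agent_conf /home/hduser/Desktop/FLUME/seq_gen.conf SeqGenAgent
```

The agent name passed as the second argument must be the same prefix used on every line of the file.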

Step 6 - Execution

$ bin/flume-ng agent -c /home/hduser/Desktop/FLUME/ -f /home/hduser/Desktop/FLUME/seq_gen.conf --name SeqGenAgent -Dflume.root.logger=INFO,console
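Since the command is long, it may be easier to assemble it from variables and print it first as a dry run. The sketch below uses the long option names `--conf` and `--conf-file`, which are equivalent to `-c` and `-f`. The variable names and the `flume_command` function are hypothetical, and the paths are simply the ones used in this post.

```shell
# Assumed locations, taken from the steps above; adjust for your machine.
CONF_DIR=/home/hduser/Desktop/FLUME
CONF_FILE=$CONF_DIR/seq_gen.conf
AGENT_NAME=SeqGenAgent   # must match the prefix used in seq_gen.conf

# Prints the flume-ng command line instead of running it (a dry run).
flume_command() {
    printf 'bin/flume-ng agent --conf %s --conf-file %s --name %s -Dflume.root.logger=INFO,console\n' \
        "$CONF_DIR" "$CONF_FILE" "$AGENT_NAME"
}

flume_command
```

Run the printed line from the Flume home directory. The sequence generator keeps producing events, so stop the agent with Ctrl+C once enough data has been written.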

Step 7 - Verify

$ hdfs dfs -ls /user/hduser/flumedata/

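$ hdfs dfs -cat /user/hduser/flumedata/FlumeData.1476018771029.txt

The file name above is only an example. Use a name shown by the listing command; with hdfs.filePrefix = seq, the generated file names begin with seq.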
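The sequence generator source emits a counter that starts at 0 and increases by 1 per event, so a file written by this agent should contain consecutive numbers. The following sketch checks that. `check_sequence` is a hypothetical helper, and it assumes each event was written on its own line, which is what I would expect from the DataStream file type with the default text serializer. A file does not have to start at 0, because the sink rolls over to new files as it goes.

```shell
# Hypothetical helper: reads numbers on stdin, one per line, and succeeds
# only if every line is exactly one greater than the line before it.
check_sequence() {
    awk '
        NR > 1 && $1 != prev + 1 {
            printf "gap at line %d: %s follows %s\n", NR, $1, prev
            bad = 1
            exit
        }
        { prev = $1 }
        END { exit bad + 0 }
    '
}

# Usage, replacing <file> with a name from the directory listing:
#   hdfs dfs -cat /user/hduser/flumedata/<file> | check_sequence
```

A reported gap is not necessarily a configuration error: the memory channel keeps events in RAM, so events can be lost if the agent is killed while the channel still holds data.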
