Apache Flume Netcat Agent Configuration

posted on Nov 20th, 2016

Apache Flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

Prerequisites

1) A machine with Ubuntu 14.04 LTS operating system

2) Apache Hadoop preinstalled (How to install Hadoop on Ubuntu 14.04)

3) Apache Flume 1.6.0 preinstalled (How to install Flume on Ubuntu 14.04)

Flume Netcat Agent Configuration

Now we will see how to generate events and log them to the console.

Step 1 - Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 - Start all hadoop daemons.

$ start-all.sh

Step 3 - Run jps (the Java Virtual Machine Process Status Tool) to check that the Hadoop daemons are running. Note that jps only reports on JVMs for which it has access permissions.

$ jps
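A quick way to confirm that the daemons came up is to compare the jps output against the names you expect. The sketch below assumes a single-node Hadoop 2.x setup, where start-all.sh normally launches NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The helper name missing_daemons is hypothetical, and the list should be adjusted to your installation.

```shell
# Print every expected Hadoop daemon that is absent from the jps output on stdin.
# The daemon list is an assumption (single-node Hadoop 2.x); edit it as needed.
missing_daemons() {
  awk -v want="NameNode DataNode SecondaryNameNode ResourceManager NodeManager" '
    { seen[$2] = 1 }                 # jps prints "<pid> <main class name>"
    END {
      n = split(want, names, " ")
      for (i = 1; i <= n; i++)
        if (!(names[i] in seen)) print names[i]
    }'
}

# Usage on a live machine (prints nothing when all daemons are running):
#   jps | missing_daemons
```

The match is done on the whole second column, so SecondaryNameNode is not mistaken for NameNode.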

Step 4 - Change the directory to the Flume home directory ($FLUME_HOME, in my case /usr/local/flume)

$ cd $FLUME_HOME

Step 5 - Configuration File

Given below is an example of the configuration file. Copy this content and save it as net.conf. In my case, the net.conf file is in the /home/hduser/Desktop/FLUME/ folder.

net.conf

# Naming the components on the current agent

NetcatAgent.sources = Netcat
NetcatAgent.channels = MemChannel
NetcatAgent.sinks = LoggerSink

# Describing/Configuring the source

NetcatAgent.sources.Netcat.type = netcat
NetcatAgent.sources.Netcat.bind = localhost
NetcatAgent.sources.Netcat.port = 56565
NetcatAgent.sources.Netcat.channels = MemChannel

# Describing/Configuring the sink

NetcatAgent.sinks.LoggerSink.type = logger
NetcatAgent.sinks.LoggerSink.channel = MemChannel

# Describing/Configuring the channel

NetcatAgent.channels.MemChannel.type = memory
NetcatAgent.channels.MemChannel.capacity = 1000
NetcatAgent.channels.MemChannel.transactionCapacity = 100
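A common mistake in a file like this is a source or sink that points to a channel name the agent never declares. The sketch below is one way to catch that before starting the agent. check_channels is a hypothetical helper, not part of Flume, and it only understands the simple "key = value" lines used in this post.

```shell
# Report channels that a source or sink references but the agent never declares.
# Exits with a non-zero status when at least one undeclared channel is found.
check_channels() {
  awk -F'[ \t]*=[ \t]*' '
    /^[ \t]*(#|$)/ { next }                          # skip comments and blank lines
    $1 ~ /^[^.]+\.channels$/ {                       # <agent>.channels = c1 c2 ...
      n = split($2, c, /[ \t]+/)
      for (i = 1; i <= n; i++) if (c[i] != "") declared[c[i]] = 1
      next
    }
    $1 ~ /\.(sources|sinks)\.[^.]+\.channels?$/ {    # per-component channel binding
      n = split($2, c, /[ \t]+/)
      for (i = 1; i <= n; i++) if (c[i] != "") used[c[i]] = $1
    }
    END {
      bad = 0
      for (name in used)
        if (!(name in declared)) {
          print "undeclared channel: " name " (" used[name] ")"
          bad = 1
        }
      exit bad
    }' "$1"
}

# Usage:
#   check_channels /home/hduser/Desktop/FLUME/net.conf && echo "channel wiring looks consistent"
```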

Step 6 - Execution

$ bin/flume-ng agent -c /home/hduser/Desktop/FLUME/ -f /home/hduser/Desktop/FLUME/net.conf --name NetcatAgent -Dflume.root.logger=INFO,console
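In this command, -c points to the configuration directory, -f to the agent configuration file, and --name to the agent to start. The value given to --name has to match the prefix used in the configuration file (NetcatAgent here); as far as I know, Flume otherwise starts without any source, channel or sink. The sketch below lists the agent names found in a file so you can check this first. agent_names is a hypothetical helper, not a Flume command.

```shell
# List the agent names declared in a Flume properties file, one per line.
# A name is taken from lines of the form "<agent>.sources|channels|sinks = ...".
agent_names() {
  awk -F'[.= \t]' '/^[^#.= \t]+\.(sources|channels|sinks)[ \t]*=/ { print $1 }' "$1" | sort -u
}

# Usage:
#   agent_names /home/hduser/Desktop/FLUME/net.conf
```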

Step 7 - Open a new terminal (Ctrl + Alt + T). To pass data to the Netcat source, connect to the port given in the configuration file (56565) using telnet.

$ telnet localhost 56565

Step 8 - Now type anything.
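The Netcat source turns every newline-terminated line it receives into one event, so instead of typing by hand you can also pipe generated lines into the port. The sketch below assumes the nc tool is installed; gen_events is a hypothetical helper, and the event text is made up for illustration.

```shell
# Print N numbered test lines; each line becomes one Flume event when sent to the source.
gen_events() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'test event %d\n' "$i"
    i=$((i + 1))
  done
}

# Usage (assumes nc is available and the agent from Step 6 is running):
#   gen_events 5 | nc localhost 56565
```

Some nc variants keep the connection open after the input ends; in that case press Ctrl + C once the events show up in the agent console.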

Step 9 - You can see the message in the console of the terminal where the agent is running.

Now we will see how to generate events and store them in HDFS.

Step 1 - Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 - Start all hadoop daemons.

$ start-all.sh

Step 3 - Run jps (the Java Virtual Machine Process Status Tool) to check that the Hadoop daemons are running. Note that jps only reports on JVMs for which it has access permissions.

$ jps

Step 4 - Create a /user/hduser/flumedata folder in HDFS.

$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/hduser/flumedata

Step 5 - Change the directory to the Flume home directory ($FLUME_HOME, in my case /usr/local/flume)

$ cd $FLUME_HOME

Step 6 - Configuration File

Given below is an example of the configuration file. Copy this content and save it as nethd.conf. In my case, the nethd.conf file is in the /home/hduser/Desktop/FLUME/ folder.

nethd.conf

# Naming the components on the current agent

NetcatAgent.sources = Netcat
NetcatAgent.channels = MemChannel
NetcatAgent.sinks = hdfs-sink

# Describing/Configuring the source

NetcatAgent.sources.Netcat.type = netcat
NetcatAgent.sources.Netcat.bind = localhost
NetcatAgent.sources.Netcat.port = 56563
NetcatAgent.sources.Netcat.channels = MemChannel

# Describing/Configuring the channel

NetcatAgent.channels.MemChannel.type = memory
NetcatAgent.channels.MemChannel.capacity = 1000

# Describing/Configuring the HDFS sink and connecting it to the channel MemChannel

NetcatAgent.sinks.hdfs-sink.type = hdfs
NetcatAgent.sinks.hdfs-sink.channel = MemChannel
NetcatAgent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:9000/user/hduser/flumedata/
NetcatAgent.sinks.hdfs-sink.hdfs.fileType = DataStream
NetcatAgent.sinks.hdfs-sink.hdfs.writeFormat = Text
NetcatAgent.sinks.hdfs-sink.hdfs.filePrefix=
NetcatAgent.sinks.hdfs-sink.hdfs.fileSuffix=.txt

Step 7 - Execution

$ bin/flume-ng agent -c /home/hduser/Desktop/FLUME/ -f /home/hduser/Desktop/FLUME/nethd.conf --name NetcatAgent -Dflume.root.logger=INFO,console

Step 8 - Open a new terminal (Ctrl + Alt + T). To pass data to the Netcat source, connect to the port given in the configuration file (56563) using telnet.

$ telnet localhost 56563

Step 9 - Now type anything.

Step 10 - You can see that all the events are being stored in HDFS.

Step 11 - Verify. List the folder, then print a file using the name shown in the listing (the number in the file name will differ on your machine).

$ hdfs dfs -ls /user/hduser/flumedata/

$ hdfs dfs -cat /user/hduser/flumedata/FlumeData.1476018771029.txt
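Because the file names change on every run, it can be handy to take them from the listing instead of typing them. The sketch below filters the output of hdfs dfs -ls down to the paths of finished files. It assumes the default HDFS sink behaviour, where a file that is still being written carries a .tmp suffix; closed_files is a hypothetical helper name.

```shell
# Read "hdfs dfs -ls" output on stdin and print the paths of regular files
# that are not still open (.tmp is the HDFS sink's default in-use suffix).
closed_files() {
  awk '$1 ~ /^-/ && $NF !~ /\.tmp$/ { print $NF }'
}

# Usage:
#   hdfs dfs -ls /user/hduser/flumedata/ | closed_files | while read -r f; do
#     hdfs dfs -cat "$f"
#   done
```

The filter keeps only lines whose permission column starts with "-", which skips the "Found N items" header and any subdirectories.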
