
Massive Data Aggregation, Monitoring, Processing and Visualization with Apache Flume, ElasticSearch and D3.js

Israel Ekpo (Walt Disney Parks and Resorts Online)
Hadoop & Beyond Grand Ballroom West
Tutorial
Please note: to attend, your registration must include Tutorials on Monday.
Average rating: 1.19 (47 ratings)

Continuously aggregating, processing and making sense of rapidly generated data from hundreds or sometimes thousands of sources can be inefficient, expensive, stressful and at times flat-out intimidating.

This 3-hour hands-on tutorial will begin by talking about the various sources and types of data we can collect such as:
  1. Application-Generated Data
  2. Network Traffic
  3. Graph Data representing connections between entities
  4. Data from Social Media Sources like Twitter, Google+ and Facebook
  5. Data from Email Sources such as Mailing Lists

We will then talk about the general architecture of Apache Flume, briefly describing the use of each of its components and their place within the Flume NG architecture.

Then we will walk through various Source components within Flume that can be used to capture the data.

Here we will show participants how to configure the sources to capture, analyze and filter the data in real time.

We will also show code samples on how to create custom sources to capture data from virtually any source compatible with the architecture.

We will then illustrate how to configure the channels within Flume to temporarily store the captured events from the Sources until they can be picked up by the Sinks.
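
The hand-off between sources, channels and sinks described above can be pictured with a small stand-alone sketch. This is not Flume code: the `Channel` class, its `put` and `take` methods, the `drain` helper and the capacity value are hypothetical names chosen for illustration, and only the Python standard library is used. It shows a bounded in-memory buffer sitting between a producer (the source) and a consumer (the sink).

```python
import queue


class Channel:
    """Hypothetical bounded buffer standing in for a Flume-style channel."""

    def __init__(self, capacity):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event):
        """Called by a source; returns False when the channel is full."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            return False

    def take(self):
        """Called by a sink; returns None when nothing is waiting."""
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None


def drain(channel, sink):
    """Move every buffered event from the channel into the sink (a list here)."""
    while True:
        event = channel.take()
        if event is None:
            break
        sink.append(event)
    return sink


# A source offers three events to a channel that can hold only two.
channel = Channel(capacity=2)
accepted = [channel.put({"body": line}) for line in ["a", "b", "c"]]

# A sink later picks up whatever the channel managed to buffer.
delivered = drain(channel, [])
```

The third event is rejected because the buffer is full. That is the kind of trade-off a channel's capacity setting controls: a larger buffer absorbs bigger bursts at the cost of more memory or disk.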

We will go through the advantages and disadvantages of each channel type and how to create a custom channel of your own to suit your needs.

Once we are done with the channels used for temporary storage of captured events, we will discuss the various sinks available within Flume.

In this tutorial we will show how the events from the channels are picked up and sent to the Sinks.

We will discuss how to configure and use a variety of Sinks including but not limited to the following:

  1. ElasticSearchSink
  2. HDFS Sink

We will also talk about how to create custom sinks to set up centralized storage with virtually any compatible backend datastore.

This section will focus on how to configure the sinks.

Once the data is in the sinks, we will discuss strategies for processing the data stored in HDFS, ElasticSearch and Neo4j.

We will then focus on how to search and query the data stored in ElasticSearch and Neo4j.
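
As a rough illustration of the ElasticSearch side of this step only, the sketch below builds the JSON body for a basic full-text `match` query, the kind of body that is sent to an index's `_search` endpoint. The helper name `build_search_body`, the field name `message` and the search text are assumptions made up for this example. They do not come from the tutorial materials.

```python
import json


def build_search_body(field, text, size=10):
    """Return a JSON string for a basic full-text `match` query."""
    body = {
        "query": {"match": {field: text}},  # match `text` against one field
        "size": size,                        # maximum number of hits to return
    }
    return json.dumps(body, sort_keys=True)


# Hypothetical example: look for events whose "message" field mentions "timeout".
request_body = build_search_body("message", "timeout", size=5)
```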

A picture is worth 1024 words.

Once the query results are retrieved, we will process and format them into a structure that simplifies presentation.
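
One possible shape for that formatting step, sketched under assumptions: the input follows the usual ElasticSearch search response layout (`hits.hits[]._source`), and the output is a flat list of label/value records, a form that a D3.js data join can consume directly. The function name `counts_by_field`, the `level` field and the sample response are hypothetical.

```python
from collections import Counter


def counts_by_field(response, field):
    """Count hits per distinct value of `field` and return chart-ready records."""
    hits = response.get("hits", {}).get("hits", [])
    counts = Counter(
        hit["_source"][field]
        for hit in hits
        if field in hit.get("_source", {})  # skip hits that lack the field
    )
    # Sort by label so the chart order is stable between refreshes.
    return [{"label": label, "value": value} for label, value in sorted(counts.items())]


# Hypothetical search response containing three log events.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"level": "ERROR"}},
            {"_id": "2", "_source": {"level": "WARN"}},
            {"_id": "3", "_source": {"level": "ERROR"}},
        ]
    }
}

chart_data = counts_by_field(sample_response, "level")
```

Serialized as JSON, `chart_data` could back a simple bar chart with one bar per label.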

The processed data will then be visualized using D3.js, SVG and CSS.

We will also show how to stream the processed data in real time to a modern browser using HTML5 WebSockets.


Israel Ekpo

Walt Disney Parks and Resorts Online

Israel Ekpo is an experienced software engineer, computer scientist, big data enthusiast and data science practitioner. He uses and contributes to a variety of open source projects, including but not limited to Apache Lucene, Apache Solr, ElasticSearch, Apache Flume, Mahout, Hadoop, HBase, MongoDB, CouchBase, Neo4j, and Apache Hive.

Comments on this page are now closed.

Comments

11/01/2013 12:28pm EDT

Is it possible to have the slides? Or the code samples in the git repository?

10/30/2013 8:36pm EDT

Would it be possible to post the slides here, like the other speakers have?

10/27/2013 9:21pm EDT

Will a VM with all the software be available?

Israel Ekpo
10/26/2013 11:51pm EDT

The list of software that we will be using during the tutorial is available here

https://github.com/israelekpo/strataconf-ny-2013

You should be able to install and use the software on a Mac, Linux or Windows Laptop.

10/26/2013 2:41pm EDT

It's Oct 26. Where can we find the detailed list of software?

10/26/2013 2:16am EDT

Hello, do we need to do any pre-reads for this session? I'll be bringing a Mac laptop. Hope that doesn't pose any challenges.

Israel Ekpo
10/25/2013 6:19pm EDT

Thanks for the question, Rajesh.

You will have to install a few software packages on your laptop in preparation for the session.

I will post a detailed list on Sat, October 26.

10/25/2013 6:15pm EDT

Do we have to pre-install any software for this lab, or will we get VMs?

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts