Schedule: Hadoop: Tools & Technology sessions

Hadoop: Tools & Technology, Murray Hill (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Mark Fei (Cloudera)
Average rating: 4.17 (18 ratings)
Apache Hadoop is enabling companies across many different industries that need to process and analyze large data sets. In this tutorial you will learn why and how people are using Hadoop and related technologies like Hive, Pig and HBase.
Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Tom Wheeler (Cloudera, Inc.)
Average rating: 4.62 (8 ratings)
This tutorial will explore the tools and techniques you need to ensure that your MapReduce applications are both correct and efficient. You'll learn how to do unit testing, integration testing and performance testing for your Hadoop jobs, as well as how to interpret diagnostic information to isolate and solve problems in your code.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Amandeep Khurana (Cloudera), Matteo Bertozzi (Cloudera)
Average rating: 2.50 (10 ratings)
HBase is one of the more popular open source NoSQL databases that have cropped up over the last few years. Building applications that use HBase effectively is challenging. This tutorial is geared towards teaching the basics of building applications using HBase and covers concepts that a developer should know while using HBase as a backend store for their application.
Hadoop: Tools & Technology, Sutton Center / Sutton South (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Ed Kohlwey (Booz Allen Hamilton), Stephanie Beben (Booz Allen Hamilton)
Average rating: 3.43 (7 ratings)
In this tutorial, we’ll provide an introduction to an open source Map/Reduce library for R called RHadoop that makes Map/Reduce programming convenient and easy to understand for statistical modeling users. The session will cover the basics of RHadoop, common techniques and best practices, and some interactive real-world examples.
Hadoop: Case Studies, Hadoop: Tools & Technology, Murray Hill (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Sewook Wee (Accenture), Ryan Tabora (Think Big Analytics), Jason Rutherglen (Datastax)
Average rating: 1.80 (5 ratings)
This tutorial will help participants understand why distributed search is important and teach them how to use the landscape of available tools. Based on our hands-on experience at NetApp, we will teach participants how to set up and use search technologies such as Apache Solr and Lucene to enable real-time Big Data analytics with Hadoop, HBase, and other NoSQL databases.
Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Dean Wampler (Typesafe)
Average rating: 3.75 (4 ratings)
This hands-on tutorial teaches you how to set up and use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
Hadoop: Tools & Technology, Regent Parlor (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Hari Shreedharan (Cloudera Inc.), Will McQueen (Cloudera Inc.), Arvind Prabhakar (Cloudera), Prasad Mujumdar (Cloudera Inc.), Mike Percy (Cloudera)
Average rating: 3.00 (4 ratings)
Apache Flume (incubating) is a scalable, reliable, fault-tolerant, distributed system designed to collect and transfer massive amounts of event data from disparate systems into a storage tier such as Hadoop HDFS. In this tutorial we show how to easily build a large-scale data collection and transfer system using Flume NG, the next generation of Flume.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Ben Werther (Platfora), Kevin Beyer (Platfora)
Average rating: 3.33 (6 ratings)
With traditional ETL (extract-transform-load), you need to decide how you want to transform and store the data before it arrives. Hadoop allows a much more agile pipeline: store the raw data, add a little metadata, and iteratively pull from it at whatever level of detail the application needs right now. We'll explore this approach and show you how you can start using it today.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Eric Sammer (ScalingData)
Average rating: 4.25 (8 ratings)
While many of the necessary building blocks for data processing exist within the Hadoop ecosystem, it can be a challenge to assemble them into a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
Hadoop: Case Studies, Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
Average rating: 3.54 (13 ratings)
ODS is Facebook's internal large-scale monitoring system. HBase turns out to be a good fit for its workload and solves some manageability and scalability challenges of the previous MySQL-based setup. We would like to share a series of valuable lessons learned from building this large-scale, real-time system on HBase.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Arun Murthy (Hortonworks Inc.)
Average rating: 3.40 (5 ratings)
Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric that supports MapReduce and other application paradigms. This recasts Hadoop as a much more powerful data-processing system, very different from what it was 12 months ago. Ever wonder what it might look like in 12 months, 24 months, or longer?
Hadoop: Tools & Technology, Grand East (NY Hilton)
Matt Winkler (Microsoft)
Average rating: 4.29 (7 ratings)
In this session we’ll discuss our experience extending Hadoop development to new platforms and languages, and key aspects of using non-JVM languages in the Hadoop environment.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Sanjay Radia (Hortonworks), Todd Lipcon (Cloudera, Inc.)
Average rating: 4.57 (7 ratings)
Hadoop 2.0 offers significant HDFS improvements: a new append pipeline, federation, wire compatibility, NameNode HA, performance improvements, and more. We describe the new features and their benefits, and our plans for HDFS over the next year, which include snapshots, disaster recovery, RAID, and further performance improvements. We conclude with some common misconceptions and myths about HDFS.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Aaron Myers (Cloudera, Inc.), Todd Lipcon (Cloudera, Inc.)
Average rating: 3.67 (3 ratings)
The initial implementation of a highly available HDFS NameNode successfully removed all single points of failure from HDFS. This talk discusses further improvements to this work, including automatic failure detection and failover initiation, as well as removing the dependency on an HA NFS filer.
Nilesh Jain (Intel Corp)
Average rating: 4.50 (2 ratings)
The exponential growth of graph-based data analysis is fueling the need for machine learning. Recently, frameworks have emerged to perform these computations at large scale, but feeding data to these frameworks is a challenge in itself. This talk introduces the GraphBuilder library for Hadoop, which makes the job easier for programmers. Several case studies showcase the library's utility.
Data Science, Hadoop: Tools & Technology, Grand East (NY Hilton)
Aaron Kimball (Magnify Consulting), Kiyan Ahmadizadeh (WibiData, Inc.)
Average rating: 4.33 (3 ratings)
Performing investigative analysis on data stored in HBase is challenging. Most tools operate on files stored in HDFS and interact poorly with HBase's data model. This talk will describe characteristics of data in HBase and exploratory analysis patterns. We will describe best practices for modeling this data efficiently and survey tools and techniques appropriate for data science teams.
Paul Kent (SAS)
Average rating: 4.33 (3 ratings)
To unlock the value of Big Data, analytics must be applied. Some enterprises hire platoons of data analysts, but many others can't afford to bring on such skilled and expensive resources. How do those businesses uncover opportunity and insight within Big Data assets? They use analytic tools that offload some data discovery to business professionals, or they deploy intelligent analytic applications.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Jonathan Hsieh (Cloudera, Inc)
Average rating: 4.50 (2 ratings)
As Apache HBase matures, the community has augmented it with new features that are considered hard requirements for many enterprises. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator minimize downtime, monitor performance, control access to the system, and geo-replicate data across data centers.
Data Science, Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)
Justin Erickson (Cloudera), Marcel Kornacker (Cloudera, Inc.)
Average rating: 4.00 (4 ratings)
This talk will cover which tools and techniques work well, and which don't, for data scientists working on Hadoop today, and how Cloudera Impala increases the productivity of data science and analysis on Hadoop. Cloudera Impala builds on experience and leading-edge technology from big data systems at Facebook, Google, and Yahoo.
Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)
Michael Segel (Segel & Associates)
Average rating: 3.00 (3 ratings)
This presentation discusses how cluster design affects performance. It covers several design options and their trade-offs in terms of performance and cost, as well as tuning options based on the underlying hardware.
Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)
Ron Bodkin (Think Big Analytics)
Average rating: 4.50 (2 ratings)
There has been a lot of excitement lately about streaming approaches to handling Big Data such as Storm, S4, SQLStream, and InfoStreams. But many use cases can be better handled by low-latency access with NoSQL databases and search indexing, backed by scoring with batch analytics in Hadoop. We compare such integrated Big Data systems with streaming systems and look to the future.
Hadoop: Tools & Technology, Gramercy West (NY Hilton)
Josh Patterson (Cloudera), Michael Katzenellenbogen (Cloudera)
Average rating: 4.00 (2 ratings)
In this session, we will introduce “Knitting Boar”, an open-source Java library for performing distributed online learning on a Hadoop cluster under YARN. We will give an overview of how Woven Wabbit works and examine the lessons learned from YARN application construction.
Hadoop: Tools & Technology, Murray West (NY Hilton)
Thejas Madhavan Nair (Hortonworks Inc), Jianyong Dai (Hortonworks)
Average rating: 2.00 (1 rating)
Apache Pig makes Apache Hadoop easier to use thanks to its high-level data flow language, Pig Latin. In this talk, we will discuss common data analysis tasks, the choices one can make while writing a query, and the impact of each choice on performance. The core principles behind the optimization recommendations shared in this presentation apply to all MapReduce applications.
Hadoop & Beyond, Hadoop: Tools & Technology, Gramercy West (NY Hilton)
Avi Bryant (Stripe)
Average rating: 5.00 (3 ratings)
Start on low heat with a base of Hadoop; map, then reduce. Flavor, to taste, with Scala's concise, functional syntax and collections library. Simmer with some Pig bones: a tuple model and high-level join and aggregation operators. Mix in Cascading to hold everything together and boil until it's very, very hot, and you get Scalding, a new API for MapReduce out of Twitter.

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com.

Media Partner Opportunities

For information on trade opportunities, contact Kathy Yu at mediapartners@oreilly.com.

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata contacts.