Petabyte Scale, Automated Support for Remote Devices

Ron Bodkin (Think Big Analytics), Kumar Palaniappan (NetApp)
NetApp is a fast-growing provider of storage technology. Its devices “phone home” regularly, sending unstructured auto-support log and configuration data back to centralized data centers. This data is collected, organized, and analyzed to provide timely support, improve sales, and plan product improvements. The system currently ingests 5 TB of compressed data per week, a volume that is growing 40% per year.

NetApp previously stored flat files on disk volumes and kept summary data in relational databases. Today, NetApp is working with Accenture to design, build, and implement the enterprise transformation project for next-generation auto-support, with Think Big Analytics as a partner and expert in Big Data solutions. The new system uses Hadoop, HBase, and related technologies to ingest, organize, transform, and present auto-support data. This will enable business users to make decisions and respond promptly, and will enable automated responses based on predictive models. Key requirements include:
  • Make data queryable within 5 minutes of event occurrence, with queries returning in seconds.
  • Execute complex ad hoc queries to investigate issues and plan accordingly.
  • Build models that predict support issues and capacity limits, so that action can be taken before issues arise.
  • Build models for cross-sell opportunities.
  • Expose data to applications through REST interfaces.
In this session we look at the lessons learned while designing and implementing a system to:
  • Collect 1,000 compressed messages of 20 MB each per minute.
  • Store 2 PB of incoming support events by 2015.
  • Provide low latency access to support information and configuration changes in HBase at scale within 5 minutes of event arrival.
  • Support complex ad hoc queries that join diverse, large-scale structured and unstructured data sets.
  • Operate efficiently at scale.
  • Integrate with a data warehouse in Oracle.
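
The figures quoted above can be put side by side with some back-of-envelope arithmetic. The sketch below is not part of the session material: the function names are hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and treating the 1,000-messages-per-minute figure as a peak design target, as opposed to a sustained average, is only one possible reading of the abstract.

```python
# Back-of-envelope sizing from the figures quoted in the abstract.
# Assumption: decimal units (1 GB = 1000 MB, 1 TB = 1000 GB).
MB_PER_GB = 1000
GB_PER_TB = 1000
MINUTES_PER_WEEK = 7 * 24 * 60


def ingest_gb_per_minute(messages_per_minute, mb_per_message):
    """Ingest rate in GB/minute for a given message rate and message size."""
    return messages_per_minute * mb_per_message / MB_PER_GB


def average_gb_per_minute(tb_per_week):
    """Average GB/minute implied by a weekly ingest volume."""
    return tb_per_week * GB_PER_TB / MINUTES_PER_WEEK


def cumulative_tb(tb_per_week, annual_growth, years):
    """Total TB ingested over `years`, assuming 52 weeks per year and
    a weekly rate that steps up by `annual_growth` once per year."""
    total = 0.0
    rate = tb_per_week
    for _ in range(years):
        total += rate * 52
        rate *= 1 + annual_growth
    return total


# Figures from the abstract: 1,000 messages/minute at 20 MB each,
# and 5 TB/week of compressed data growing 40% per year.
target_rate = ingest_gb_per_minute(1000, 20)
current_average = average_gb_per_minute(5)

if __name__ == "__main__":
    print(f"target rate:      {target_rate:.1f} GB/min")
    print(f"current average:  {current_average:.2f} GB/min")
    print(f"ratio:            {target_rate / current_average:.0f}x")
    print(f"3-year cumulative compressed ingest: {cumulative_tb(5, 0.4, 3):.1f} TB")
```

Under these assumptions, the collection target is roughly 40 times the current weekly average, which is consistent with sizing for bursts and future growth. The cumulative figure covers compressed ingest only, so it is not directly comparable with the 2 PB storage goal, which may include uncompressed or replicated data.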

Ron Bodkin

Think Big Analytics

Ron founded Think Big Analytics to help customers leverage new data processing technologies such as Hadoop, NoSQL databases, and R for statistical analysis. He works with customers to identify opportunities and rapidly develop solutions that integrate data and extract information.

Previously, Ron was the VP of Engineering for Quantcast. Each day, Quantcast ingests 10 billion events and processes two petabytes of data using Hadoop. The Quantcast MapReduce stack handles production data processing, ad hoc analysis, data mining, and machine learning. Prior to that, Ron was a founder of enterprise consulting companies C-bridge and New Aspects.

Kumar Palaniappan

NetApp

Kumar Palaniappan is an Enterprise Architect at NetApp, where he leads efforts to adopt Hadoop technologies for strategic applications. Previously, Kumar was an architect at Cisco Systems, responsible for large-scale, mission-critical architectures.

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com.

For information on trade opportunities with O'Reilly conferences, contact Kathy Yu at mediapartners@oreilly.com.

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com.
