High-Volume Data Collection and Real Time Analytics Using Redis

C. Aaron Cois (Carnegie Mellon University, Software Engineering Institute), Tim Palko (Carnegie Mellon University, Software Engineering Institute)
Track: Beyond Hadoop; Location: Great America Ballroom J
Average rating: 4.67 (3 ratings)

In this talk, we describe how we use Redis, an open source, in-memory key/value store, to capture a high volume of continuous data from numerous remote environmental sensors while simultaneously querying the same database for real-time monitoring and analytics in a production environment. NoSQL data stores have attracted widespread attention in recent years, in part because they can handle high volumes of data flexibly and efficiently, without the overhead of traditional structured database systems. As these technologies mature, their potential applications to big data collection and analytics continue to grow.

The two biggest I/O bottlenecks in distributed applications are typically network I/O and filesystem I/O. Our use case required large numbers of remote client deployments over whose network infrastructure we had no control, so we were always at the mercy of network latency. We could, however, combat the filesystem I/O bottleneck by writing incoming data to an in-memory database, which let us scale data collection rates to meet our requirements. Our use case required not only continuous collection of large volumes of data but also collection in small, 300-byte chunks, which produces a proportionally large number of inserts per second. We chose Redis to collect all incoming data from our various remote deployments. We found that Redis could not only handle data collection at a high rate but also serve real-time analytics queries at the same time, a task that the traditional databases we tested within our system could not handle.
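The abstract does not say how incoming chunks were written to Redis. One common way to keep a high rate of small inserts from being dominated by per-command round trips is pipelining, where commands are queued on the client and sent to Redis together. The sketch below assumes a per-sensor Redis list appended to with RPUSH under a hypothetical `sensor:<id>:readings` key; `store_batch`, `reading_key`, and the `FakeRedis` stand-in are illustrative names, not part of the talk.

```python
"""Batch many small sensor readings into one Redis round trip (sketch).

Assumptions (not from the talk): each reading is appended to a per-sensor
list with RPUSH, and `client` is a redis-py client such as redis.Redis().
A tiny in-memory stand-in is included so the sketch runs without a server.
"""
from collections import defaultdict
from typing import Dict, List


def reading_key(sensor_id: str) -> str:
    # Hypothetical key naming scheme: one Redis list per sensor.
    return f"sensor:{sensor_id}:readings"


def store_batch(client, sensor_id: str, readings: List[bytes]) -> List[int]:
    """Queue one RPUSH per reading, then send them all in a single round trip.

    transaction=False skips MULTI/EXEC: the goal is fewer network round
    trips, not atomicity.
    """
    pipe = client.pipeline(transaction=False)
    key = reading_key(sensor_id)
    for reading in readings:
        pipe.rpush(key, reading)
    return pipe.execute()  # one reply per queued command: the list length


class FakeRedis:
    """Minimal in-memory stand-in for the redis-py calls used above."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[bytes]] = defaultdict(list)

    def pipeline(self, transaction: bool = True) -> "FakePipeline":
        return FakePipeline(self)


class FakePipeline:
    def __init__(self, store: FakeRedis) -> None:
        self.store = store
        self.queued = []

    def rpush(self, key: str, *values: bytes) -> "FakePipeline":
        # Like redis-py, queue the command and return the pipeline.
        self.queued.append((key, values))
        return self

    def execute(self) -> List[int]:
        replies = []
        for key, values in self.queued:
            self.store.lists[key].extend(values)
            replies.append(len(self.store.lists[key]))
        self.queued = []
        return replies


if __name__ == "__main__":
    client = FakeRedis()  # swap for redis.Redis() against a real server
    chunk = b"x" * 300    # one 300-byte reading, the chunk size from the abstract
    print(store_batch(client, "42", [chunk, chunk, chunk]))  # [1, 2, 3]
```

With a real server, redis-py's `pipeline()`, `rpush()`, and `execute()` behave as used here; RPUSH replies with the list length after the push, which is why three appends to an empty list yield 1, 2, 3.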

In implementing such a system, there are several important factors to consider, including:

  • How should your data be structured within the key/value store to maximize collection efficiency?
  • What are the real-time analytics requirements, and how should the data be structured to most efficiently serve them?
  • What are the data persistence needs of your system?
  • What are the long-term scalability requirements and expectations of the system?
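As one illustration of how the first two questions can pull against each other, here is a sketch under our own assumptions rather than the speakers' design: each sensor's readings are bucketed into one Redis list per UTC hour. Collection always appends to the current bucket, and an analytics query over a time window only needs to read a small, predictable set of keys. The key format, the hourly granularity, and the helper names `bucket_key` and `keys_for_window` are all hypothetical.

```python
"""Time-bucketed key scheme for collection plus windowed analytics (sketch).

Assumptions (not from the talk): one Redis list per sensor per UTC hour,
named sensor:<id>:<YYYYMMDDHH>. Only key computation is shown here.
"""
from datetime import datetime, timedelta, timezone
from typing import List

BUCKET_FORMAT = "%Y%m%d%H"  # hypothetical granularity: one bucket per hour


def _to_utc(ts: datetime) -> datetime:
    # Require timezone-aware datetimes so buckets never depend on local time.
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return ts.astimezone(timezone.utc)


def bucket_key(sensor_id: str, ts: datetime) -> str:
    """Key that a reading taken at `ts` would be appended to (e.g. RPUSH)."""
    return f"sensor:{sensor_id}:{_to_utc(ts).strftime(BUCKET_FORMAT)}"


def keys_for_window(sensor_id: str, start: datetime, end: datetime) -> List[str]:
    """All hourly bucket keys a query over [start, end] must read."""
    # Floor the start to the beginning of its hour so its bucket is included.
    t = _to_utc(start).replace(minute=0, second=0, microsecond=0)
    end_utc = _to_utc(end)
    keys = []
    while t <= end_utc:
        keys.append(bucket_key(sensor_id, t))
        t += timedelta(hours=1)
    return keys


if __name__ == "__main__":
    start = datetime(2013, 2, 27, 9, 30, tzinfo=timezone.utc)
    end = datetime(2013, 2, 27, 11, 5, tzinfo=timezone.utc)
    print(keys_for_window("42", start, end))
    # ['sensor:42:2013022709', 'sensor:42:2013022710', 'sensor:42:2013022711']
```

For the persistence question, a bucket whose hour has fully passed could be archived to durable storage and then removed with Redis's EXPIRE command. Coarser buckets mean fewer keys per query but larger lists to scan, so the granularity would need tuning against the actual analytics queries.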

We will walk through our system architecture, highlighting design choices made based on the above considerations, with a specific focus on considerations that may be at odds with each other, such as designing a data model to meet both collection efficiency and real-time analytics needs. We will also present lessons learned through our production deployments and provide an introspective view of our solutions, along with proposed enhancements for future iterations and divergent requirements.


C. Aaron Cois

Carnegie Mellon University, Software Engineering Institute

Aaron is a software engineer currently located in Pittsburgh, PA. He received his Ph.D. in 2007 for work on algorithms and software for 3D medical image analysis. He currently leads a software development team at Carnegie Mellon University, focusing on web application development and cloud systems.

Aaron is a polyglot programmer, with a keen interest in open source technologies. Some favorite technologies at the moment include Node.js, Python/Django, MongoDB, and Redis.

Tim Palko

Carnegie Mellon University, Software Engineering Institute

Tim celebrates software development in many languages and frameworks, giving less weight to past experience when choosing technologies. Spring MVC, Hibernate, Rails, .NET MVC, Django, and the variety of languages that come with them are in his L1 cache. Among other endeavors that keep him sharp, he currently writes software solutions for the Software Engineering Institute at CMU.

Tim received a B.S. in Computer Engineering in 2003 and resides in Pittsburgh, PA.


Comments

C. Aaron Cois
03/05/2013 6:01am PST

Slides from our presentation are now live! Please feel free to contact me with any further questions or comments.

