Schedule: Real-time sessions

Location: Sutton South
Alasdair Allan (Babilim Light Industries)

In the last few years, the ubiquitous availability of high-bandwidth networks has changed the way both robotic and non-robotic telescopes operate. Single isolated telescopes are being integrated into expanding smart telescope networks that can span continents and respond to transient events in seconds. At the same time, the rise of data warehousing has made data mining more practical, and correlations between new and existing data can be drawn in real time. These changes have led to fundamental shifts in the way astronomers pursue their science. Astronomy, once a data-poor science, has become data-rich.

For many applications it is practical to extend data warehousing to real-time assets such as telescopes. There are few real intrinsic differences between a database and a telescope other than the access time for your data and the time stamps on the data itself. Within astronomy, architectures are emerging that present both static and real-time data resources through the same interface, one derived from a superset of the functionality possessed by both types of resource.

In these architectures all the components of the system, including the software controlling the science programmes, are thought of as agents. A negotiation takes place between these agents in which each of the resources bids to carry out the work, with the science agent scheduling the work with the agent embedded at the resource that promises to return the best result.
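The negotiation described above can be sketched in a few lines of Python. This is only an illustrative toy, not the protocol used by any real telescope network: the class names, the `bid` and `schedule` functions, the resource names, and the single numeric `score` are all assumptions made for the example. It also shows a static archive and a telescope answering through the same interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bid:
    resource: str  # name of the bidding resource agent (hypothetical)
    score: float   # promised quality of the result; higher is better (assumed scale)


class ResourceAgent:
    """Hypothetical agent embedded at a resource (telescope or archive)."""

    def __init__(self, name: str, quality: float, available: bool = True):
        self.name = name
        self.quality = quality
        self.available = available

    def bid(self, request: str) -> Optional[Bid]:
        # A resource that cannot carry out the work declines to bid.
        if not self.available:
            return None
        return Bid(self.name, self.quality)


def schedule(request: str, agents: list[ResourceAgent]) -> Optional[str]:
    """Science-agent side: collect bids and choose the best promised result."""
    bids = [b for a in agents if (b := a.bid(request)) is not None]
    if not bids:
        return None
    return max(bids, key=lambda b: b.score).resource


agents = [
    ResourceAgent("telescope_north", quality=0.6),
    ResourceAgent("telescope_south", quality=0.9, available=False),
    ResourceAgent("archive", quality=0.4),
]
winner = schedule("follow up transient", agents)
```

Here `telescope_south` would promise the best result but is unavailable, so it declines and the work goes to `telescope_north`. Each resource keeps its autonomy because it decides for itself whether and how to bid.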

Effectively, these architectures can be viewed as a general way to co-ordinate distributed (sensor) platforms, preserving inherent platform autonomy and using collective decision making to allocate resources. Such architectures are applicable to many (geographically) distributed sensor problems, or more generally to problems where you must optimise output from a distributed system in the face of scarce resources.

This talk explores the emergence of these architectures in the astronomical community from the viewpoint of one of the people intimately involved in the process. The talk will walk attendees through the pitfalls faced by developers hoping to implement such novel architectures and discuss how the deployment of these architectures in the field has prompted the interesting and increasing use of scientists as mechanical turks by their own software.

Location: Sutton South
Tim Moreton (Acunu)

At the heart of every system that harnesses big data is a pipeline that comprises collecting large volumes of raw data, extracting value from it through analytics or data transformations, then delivering that condensed set of results back out, potentially to millions of users.

This talk examines the challenges of building manageable, robust pipelines—a great simplifying paradigm that will help participants looking to architect their own big data systems.

I’ll look at what you want from each of these stages—using Google Analytics as a canonical big data example, as well as case studies of systems deployed at LinkedIn. I’ll look at how collecting, analyzing and serving data pose conflicting demands on the storage and compute components of the underlying hardware. I’ll talk about what available tools do to address these challenges.

I’ll move on to consider two holy grails: real-time analytics, and dual data center support. The pipeline metaphor highlights a challenge in deriving real-time value from huge datasets: I’ll explore what happens when you compose multiple, segregated platforms into a single pipeline, and how you can dodge the issue with a ‘fast’ and ‘slow’ two-tier architecture. Then I’ll look at how you can figure dual data center support into the design, particularly important for highly available deployments on the cloud.
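As a rough illustration of a 'fast' and 'slow' two-tier design, the sketch below counts page views, in the spirit of the analytics example. It is one possible reading of the idea, not the architecture presented in the talk: the class `TwoTierCounter` and its methods are hypothetical, and an in-memory list stands in for real storage. The slow tier periodically recomputes a view from all raw events, the fast tier keeps incremental counts for events that arrived since the last batch run, and queries merge the two.

```python
from collections import Counter


class TwoTierCounter:
    """Toy two-tier pipeline: collect raw events, analyze, serve counts."""

    def __init__(self):
        self.log = []                # raw events (collection stage)
        self.batch_view = Counter()  # slow tier: rebuilt from the full log
        self.recent = Counter()      # fast tier: events since the last batch

    def collect(self, page: str) -> None:
        # Every event is stored for the slow tier and counted by the fast tier.
        self.log.append(page)
        self.recent[page] += 1

    def run_batch(self) -> None:
        # Slow, thorough recomputation over all raw data; once it completes,
        # the fast tier's counts are already covered and can be discarded.
        self.batch_view = Counter(self.log)
        self.recent.clear()

    def query(self, page: str) -> int:
        # Serving stage: merge the batch view with the real-time delta.
        return self.batch_view[page] + self.recent[page]


counter = TwoTierCounter()
for page in ["home", "home", "about"]:
    counter.collect(page)
before_batch = counter.query("home")

counter.run_batch()
counter.collect("home")
after_batch = counter.query("home")
```

The trade-off this sketch tries to show is that queries stay up to date between batch runs without the batch job itself having to run in real time.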

In summary, this talk will present a useful metaphor for architecting big data systems and describe, using deployed examples, how to fit the available tools together to suit a range of settings.
