Strata + Hadoop World Speaker Slides & Video

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for—it might be available later! (However, please note some speakers choose not to share their presentations.)

Roy Hyunjin Han (CrossCompute)
Python is the language of choice when it comes to integrating analytical components. We will present a series of concepts and walkthroughs that illustrate how easy scientific computing is in Python, from machine learning and time series to spatial relationships and network analysis.
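For a taste of how compact such an analysis can be, here is a minimal sketch in Python with scikit-learn; the dataset and model choice are illustrative assumptions, not the session's actual walkthrough.

    # Train and evaluate a classifier in a few lines of Python.
    # (Illustrative only; not the speakers' material.)
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))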
Mark Fei (Cloudera)
Apache Hadoop is enabling companies across many different industries that need to process and analyze large data sets. In this tutorial you will learn why and how people are using Hadoop and related technologies like Hive, Pig and HBase.
Marshall Sponder (WebMetricsGuru INC)
Nobody knows data like a web analyst. That’s because everything we do online leaves a digital breadcrumb trail that’s easy to track and mine. The real world is less well instrumented—but that’s changing. Noted analyst Marshall Sponder takes us on a tour of some applications that blend real-world sensors with deep analytics.
Ilya Grigorik (Google), Brian Doll (GitHub)
Presentation: external link
Open-source developers all over the world contribute to millions of projects every day on GitHub: writing and reviewing code, filing bug reports and updating docs. Data from these events provides an amazing window into open source trends: project momentum, language adoption, community demographics, and more.
Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community)
Samantha Ravich, former National Security Advisor to Vice President Richard Cheney, will discuss the challenges that face strategic decision makers from the wealth of data now provided by advances in technology.
Robert Grossman (Open Data Group), Collin Bennett (Open Data Group)
A successful big data analytic project is not just about selecting the right algorithm for building a predictive model, but also about how to deploy the model efficiently into operational systems, how to evaluate the effectiveness of the model, and how to continuously improve it. In this tutorial we cover best practices for each of these phases in the life cycle of a predictive model.
Chang She (DataPad)
Proper tooling and good habits that maximize reproducibility are essential to being productive as a data scientist. From management of raw data to model version control, the entire workflow must be carefully controlled from end-to-end to produce quality research that scales with the quantity and complexity of data being analyzed.
Doug Cutting (Cloudera)
Hadoop started as an offline, batch-processing system. It made it practical to store and process much larger datasets than before. Subsequently, more interactive, online systems emerged, integrating with Hadoop.
Mike Driscoll (Metamarkets), Eric Tschetter (Metamarkets)
Hadoop is considered THE technology for addressing Big Data. While it shines as a processing platform, it does not respond at anywhere close to "human time". In developing our solution, we needed the ability to query across billions of rows in seconds. Hear how and why we developed Druid, our distributed, in-memory OLAP data store, after investigating various commercial and open source alternatives.
Mike Olson (Cloudera)
Society confronts enormous challenges today: How will we feed nine billion people? How can we diagnose and treat diseases better, and more cheaply? How will we produce more energy, more cleanly, than ever before? Big questions like these demand new approaches, and "Big Data" is a crucial part of the toolkit we will use over the coming years to answer them.
Kevin Foster (IBM)
In this session, Kevin Foster, IBM Big Data Solution Architect, will provide an overview of big data analytic accelerators and how they are being used by organizations to speed up deployments and solve big data problems sooner.
Yekesa Kosuru (Nokia), Jim Tommaney (InfiniDB)
Nokia’s Big Data analytics service is a strategic multi-tenant, multi-petabyte platform that executes 10,000 jobs each day. It is made up of technologies that provide location content processing, ETL, ad-hoc SQL, dashboards and advanced analytics, including Calpont InfiniDB for SQL, Scribe, REST, Hadoop, and R. This talk discusses the platform, motivations behind design choices, and challenges.
Sharmila Shahani-Mulligan (ClearStory Data)
In recent years, "Big Data" has matured from a vague description of massive corporate data to a household term that refers to not just volume but the diversity of data and velocity of change. Today, there's a wealth of data trapped in corporate data repositories, new platforms like Hadoop, a new generation of data marketplaces and volumes generated hourly on the Web.
Jim Adler (Metanautix)
Presentation: external link
Since the first human scrawled an image on a cave wall, the brain has been processing petabytes of data. Today, we're passing through an historical threshold where big data is leaching out of our braincases into the disembodied cloud. For the first time in human existence, we can "think" outside of our brains. What does this mean for privacy, morality, ethics, and the law?
Krish Krishnan (Sixth Sense Advisors Inc)
If you are a manager on the IT team in your organization, chances are there is already a lot of buzz around big data. If you are wondering if this hype amounts to just another IT project, wait. Big data affects the whole enterprise, and requires more business ownership and drive than just IT.
Amy O'Connor (Nokia), Danielle Dean (Nokia)
Amy O'Connor, Sr. Director of Nokia Analytics, together with her daughter and Nokia intern, Danielle Dean, will share what makes a great data scientist and their different paths to acquiring the diverse skill sets that are needed. Finally, Amy will discuss how to spot, attract and train emerging data scientists in what is quickly becoming a heated market.
Tim Estes (Digital Reasoning)
The onset of the Big Data phenomenon has created a unique opportunity, but the challenge ahead of us is to move beyond Big Data infrastructure to morally and practically useful applications. This requires new technologies that close the "Understanding Gap" and, by doing so, can make great strides to prevent evil, reduce suffering, and create more actualized human potential.
Hari Shreedharan (Cloudera), Will McQueen (Cloudera), Arvind Prabhakar (Cloudera), Prasad Mujumdar (Cloudera), Mike Percy (Cloudera)
Apache Flume (incubating) is a scalable, reliable, fault-tolerant, distributed system designed to collect and transfer massive amounts of event data from disparate systems into a storage tier such as Hadoop HDFS. In this tutorial we show how to build a large-scale data collection and transfer system using Flume NG, the next generation of Flume.
Wes McKinney (DataPad Inc.)
Data manipulation, cleaning, integration, and preparation can be one of the most time consuming parts of the data science process. In this talk I will discuss key points in the design and implementation of data structures and algorithms for structured data manipulation. It is an accumulation of lessons learned and experience building pandas, a widely-used Python data analysis toolkit.
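As a small, hypothetical illustration of the kind of structured-data manipulation pandas is built for (cleaning, imputation, and group aggregation; not material from the talk):

    import numpy as np
    import pandas as pd

    # Hypothetical messy data: a missing key and a missing measurement.
    df = pd.DataFrame({
        "city": ["NYC", "NYC", "SF", "SF", None],
        "sales": [100.0, np.nan, 80.0, 95.0, 60.0],
    })

    df = df.dropna(subset=["city"])                       # drop rows with no key
    df["sales"] = df["sales"].fillna(df["sales"].mean())  # impute missing values
    print(df.groupby("city")["sales"].agg(["count", "mean"]))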
Camille Fournier (Rent the Runway)
Big data isn’t just about volume. It’s also about speed—making decisions in real time, and enabling interactive exploration—and richness of information. While we talk a lot about how large organizations can benefit from mining and connecting data sets, Big Data can help the little guy, too.
David Boyle (EMI Group)
In this case study, David Boyle will look at how EMI changed itself, and the music industry, by moving from gut instinct and opinions to a data-informed business.
Roberto Medri (Etsy, Inc.)
Online marketplace Etsy cares a lot about its customers, and what they’re worth. How do we understand the value they bring to an organization? In this session, Roberto Medri shows us how Etsy thinks about Customer Lifetime Value, and how this data is used to target your best customers, mitigate churn, and even arrive at a meaningful measure of your company’s worth.
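For flavor, a generic, textbook-style discounted CLV calculation is sketched below; this is an illustrative assumption, not Etsy's actual model.

    # Expected margin per period, decayed by the probability the
    # customer is still active and by a discount rate.
    # (Textbook-style formula; not Etsy's model.)
    def lifetime_value(margin, retention_rate, discount_rate, periods=20):
        return sum(
            margin * (retention_rate ** t) / ((1 + discount_rate) ** t)
            for t in range(periods)
        )

    print(lifetime_value(margin=25.0, retention_rate=0.8, discount_rate=0.1))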
Paul Kent (SAS)
In this rapid-fire keynote, we’ll introduce how virtually every new technology trend is inextricably linked – or should be, to attain maximum leverage. We’ll discuss how you can use technologies such as cloud and mobility to spread the value of analytics pervasively across your virtual organization, and how that positively impacts your employees, customers and partners.
Russ Kennedy (Cleversafe)
This session will delve into the MapReduce computation paradigm, introduced by Google and widely adopted via the open-source Hadoop platform, combined with commodity hardware to execute computation at the storage node where data exists.
Michael Radwin (Intuit)
Imagine the social graph where personal relationships are replaced by commercial relationships based on real financial data. Imagine the possibilities for small businesses to grow, connect, transact and prosper.
Communicating Data Clearly describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. Graphs for one, two, three, and many variables are covered as well as general principles for creating effective graphs.
Michael Stringer (Datascope Analytics)
An effective data science team looks a lot like an effective design team: brainstorming creative ideas, making prototypes, receiving feedback, telling stories, and deeply understanding the needs of others.
Ed Kohlwey (Booz Allen Hamilton), Stephanie Beben (Booz Allen Hamilton)
In this tutorial, we’ll provide an introduction to an open source Map/Reduce library for R called RHadoop that makes Map/Reduce programming convenient and easy to understand for statistical modeling users. The session will cover the basics of RHadoop, common techniques and best practices, and some interactive real-world examples.
Erik Shilts (Opower)
How does Opower deliver insights to millions of households with big (and getting bigger) data? I discuss how to effectively use Hadoop, integrate it with R and Python, and harness an engaged workforce to solve data science and efficiency problems.
This workshop is a jumpstart lesson on how to get from a blank page and a pile of data to a useful data visualization. We'll focus on the design process, not specific tools. Bring your sample data and paper or a laptop; leave with new visualization ideas.
Bitsy Bentley (GfK Custom Research)
An increasing number of organizations are embracing data to drive intelligent decisions. For many industries, this is a monumental shift in method and culture. Data communication strategies come in many flavors, from static metric reports to immersive data experiences. In this session I present a user-centered framework for designing or evaluating data delivery methods.
Jacob Rapp (Cisco Systems), Eric Sammer (Cloudera)
In this joint session, experts from Cisco and Cloudera reveal the fundamental design considerations of Hadoop in the Enterprise Data Center. Drawing from lessons learned in the real world, they'll share best practices from deployments of Cloudera's Hadoop distribution alongside Cisco's networking components.
Data is getting bigger faster than ever, and visualization is emerging as the preeminent tool for gaining insight, gleaning answers, and making decisions informed by your mountain of data. Unfortunately, most of what's being presented visually these days is, at best, more style than substance, and at worst, wildly misleading.
Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
ODS is Facebook's internal large-scale monitoring system. HBase turns out to be a good fit for its workload, and it solves some manageability and scalability challenges of the previous MySQL-based setup. We would like to share a series of valuable lessons learned from building this large-scale, real-time system on HBase.
Irfan Khan (SAP)
You need more than a database 'hammer' for today's Big Data projects. Organizations need a 'data platform' providing integrated tools to capture, store, process and present data. Without one, companies can achieve volume, velocity, or variety – but not all three. Join us to learn the extreme capabilities needed to distill new business signals from big data.
Jesse Anderson (Cloudera)
Can a million monkeys on a million typewriters eventually recreate Shakespeare? Great minds since Aristotle have pondered this question. In 2011, Jesse Anderson randomly recreated Shakespeare using Hadoop. Here's why you should care.
Nilesh Jain (Intel Corp)
The exponential growth of graph-based data analysis is fueling the need for machine learning. Recently, frameworks have emerged to perform these computations at large scale. But feeding data to these frameworks is a challenge in itself. This talk introduces the GraphBuilder library for Hadoop, which makes the job easier for programmers. Several case studies showcase the utility of the library.
Dean Wampler (Typesafe)
Presentation: external link
This hands-on tutorial teaches you how to set up and use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
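As a sketch of what HiveQL looks like in practice, the snippet below issues a query from Python via the PyHive client; the client library, table, and columns are assumptions for illustration, not part of the tutorial.

    # HiveQL reads like SQL, but Hive compiles it into MapReduce jobs
    # over data in HDFS. (Table and columns are hypothetical.)
    from pyhive import hive

    cursor = hive.connect(host="localhost", port=10000).cursor()
    cursor.execute("""
        SELECT page, COUNT(*) AS hits
        FROM weblogs
        WHERE dt = '2012-10-24'
        GROUP BY page
        ORDER BY hits DESC
        LIMIT 10
    """)
    for page, hits in cursor.fetchall():
        print(page, hits)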
Jim Walker (Hortonworks)
With the rise of Apache Hadoop, a next-generation enterprise data architecture is emerging that connects the systems powering business transactions and business analytics.
Ryan Brush (Cerner Corporation)
A look at using Hadoop, HBase and other technologies to bring together and process health data from many sources in real time. This includes techniques for dealing with data that's incomplete or out-of-order when it arrives, merging bulk and real-time data sets, and creating search indexes and data models to enable better health care.
John Schroeder (MapR Technologies)
This session will provide insights into how the combination of scale, efficiency, and analytic flexibility creates the power to expand the applications for Hadoop to transform companies as well as entire industries.
Siraj Khaliq (The Climate Corporation)
Big Data takes on the planet’s toughest challenge by analyzing weather’s complex behavior. Using hundreds of terabytes of data and trillions of simulation datapoints, The Climate Corporation models weather’s impact on crops to create customized insurance for farmers facing the financial impact of extreme weather.
Oscar Padilla (Entravision Communications), Franklin Rios (Luminar), Vineet Tyagi (Impetus Technologies)
How a traditional Spanish-language media company made the strategic decision to build a robust analytics intelligence division to more effectively target the Hispanic market. Attendees will walk away with insights on how this traditional media company implemented big data and MapReduce operations from the ground up.
Patrick Shumate (Comcast Cable), Raanan Dagan (Splunk)
How do you keep up with the velocity and variety of data streaming in from the operational systems that power your business? What about getting analytics on your data even before you persist and replicate it?
Mary Ludloff (PatternBuilders), Terence Craig (PatternBuilders)
We’ve looked at the many uses of data, and what governance is required. But are our expectations of privacy realistic? In this session, Terence Craig and Mary Ludloff, authors of Privacy and Big Data, ask (and answer) the question: What level of privacy do we really have in the digital age?
Michael Gold (Farsite), Ryan McClarren (Farsite)
Big data initiatives often begin with a pilot project. This can generate internal support to invest in larger big data initiatives. Nevertheless, executing pilot projects can be difficult, and many pilots don’t convert into larger big data projects. In this session we’ll explore the challenges of big data pilots and suggest ways to plan and execute a successful pilot.
Kim Rees (Periscopic)
Data has been locked in a mindset of rows and columns. Our brains are trapped by database schemas. To get out of that predisposition and communicate visually requires new thinking. This session covers techniques for reframing our thoughts about data, how to describe data, forming a narrative, and coming up with visual solutions.
Michael Segel (Segel & Associates)
This presentation examines how cluster design impacts performance. It will cover several different design options and the trade-offs in terms of performance and cost, as well as some of the tuning options based on the underlying hardware.
Eric Sammer (Cloudera)
Presentation: external link
While many of the necessary building blocks for data processing exist within the Hadoop ecosystem, it can be a challenge to assemble them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
Deborah Cooper (Enterprise Data Science)
In this session, Deborah Cooper will show how organizations can use one of the largest free data sets around, along with privately collected data, to provide business insights and drive market strategy. She’ll introduce some of the available datasets, show how to organize them, and provide examples that integrate public and private datasets together for practical decision making.
Richard Brath (Oculus), Noah Schwartz (Bloomberg Sports)
MLB captures 10 TB of game data every year. While the data is valuable, the team quickly learned that effective use of it required different visual front-ends for fans, players, coaches and scouts. The ability to adapt to and address different audiences helped the success of this project and can help other big data projects.
Thejas Madhavan Nair (Hortonworks Inc), Jianyong Dai (Hortonworks)
Apache Pig makes Apache Hadoop easier to use thanks to its high-level data flow language, Pig Latin. In this talk, we will discuss common data analysis tasks, the choices one can make while writing a query, and the impact of each on performance. The core principles behind the optimization recommendations shared during this presentation are applicable to all MapReduce applications.
Donald Miner (ClearEdge IT Solutions)
The Hadoop and data science communities have matured to the point that common design patterns across domains are beginning to emerge. As Hadoop matures and momentum gains in the user base, experienced users can start documenting design patterns that can be shared. In this talk, we'll discuss what makes up a MapReduce design pattern and give some examples.
Michael Flowers (NYC Mayor's Office of Policy and Strategic Planning)
New York City is a complex, thriving organism. Hear how data science has played a surprising and effective role in helping the city government provide services to over 8 million people, from preventing public safety catastrophes to improving New Yorkers' quality of life.
Anne Milgram (NYU Center on the Administration of Criminal Law)
Anne Milgram, Senior Fellow at the NYU Center on the Administration of Criminal Law.
Sheridan Hitchens (Auction.com)
A lot has been presented on the tools, models and technologies you should employ in Big Data, but this talk will focus on the critical strategies and tactics you need to employ. In particular, it will address topics such as finding and hiring the right talent, designing the right roles and responsibilities, building the right processes into your group, and molding the culture of the whole organization.
Kurt Brown (Netflix)
Our Data Science tech stack has shifted from best-of-breed, "classic" business intelligence technologies to a hybrid environment, fully leveraging Hadoop and other Big Data solutions. Our philosophy has also evolved, now distilled in thinking and practice into "data science as a service". Why did we do it? What does it look like? What are the benefits? Come find out.
Joe Hellerstein (Trifacta and UC Berkeley)
The story of Big Data technology has centered on engines, algorithms, and statistical methods for data analysis. Less has been said – and too little has been done – regarding technology to improve the lives of data analysts.
Aaron Kimball (Magnify Consulting), Kiyan Ahmadizadeh (WibiData, Inc.)
Performing investigative analysis on data stored in HBase is challenging. Most tools operate on files stored in HDFS, and interact poorly with HBase's data model. This talk will describe characteristics of data in HBase and exploratory analysis patterns. We will describe best practices for modeling this data efficiently and survey tools and techniques appropriate for data science teams.
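To make the access patterns concrete, here is a minimal exploratory scan using the happybase Python client; the library choice, table name, and row-key scheme are assumptions, not the speakers'.

    # HBase reads are efficient over contiguous row-key ranges, which
    # is why row-key design dominates HBase data modeling.
    import happybase

    connection = happybase.Connection("localhost")  # assumes a Thrift gateway
    table = connection.table("user_events")

    # Scan all events for one (hypothetical) user via a key range.
    for row_key, columns in table.scan(row_start=b"user42|",
                                       row_stop=b"user42~",
                                       columns=[b"event:type"]):
        print(row_key, columns)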
Daniel Goroff (Alfred P. Sloan Foundation)
All the data in the world won’t make a difference if we can’t change people’s minds. There’s overwhelming evidence that we don’t behave rationally, and that small changes in how information is shared or tainted have huge impacts on its effects.
Ron Bodkin (Think Big Analytics)
There has been a lot of excitement lately about streaming approaches to handling Big Data such as Storm, S4, SQLStream, and InfoStreams. But many use cases can be better handled by low latency access with NoSQL databases and search indexing backed by scoring with batch analytics in Hadoop. We compare such integrated Big Data with streaming systems and look to the future.
Gabriel Eisbruch (MercadoLibre.com), Luis Darío Simonassi (MercadoLibre.com), Jonathan Leibiusky (MercadoLibre.com)
The quantity of digital information collected and processed every day is growing at an exponential rate. To make sense of this mountain of data we can no longer afford the delays of batch processing systems. In this talk we'll introduce Storm, a new real-time analytics framework, and show how to use it to massively parallelize information analysis and get instant results from your data.
Mark Madsen (Third Nature), Marc Demarest (Noumenal, Inc.)
To kick off Bridge to Big Data Day, we present two views of big data. Is it truly something new, or just an evolution of what we have already? Join us for an interesting and entertaining talk that will help frame your thinking on big data.
Sewook Wee (Accenture), Ryan Tabora (Think Big Analytics), Jason Rutherglen (Datastax)
This tutorial will help participants understand why distributed search is important and teach them how to use the landscape of tools available. Based on our hands-on experience at NetApp, we will lead a tutorial session that will teach participants how to set up and use search technologies such as Apache Solr and Lucene to enable real-time Big Data analytics with Hadoop, HBase, and other NoSQL systems.
Charles Schmitt (Renaissance Computing Institute)
Your DNA, written out as a string of G, A, T, and C, is about three and a half gigabytes long. That string is about 99.9% identical to an arbitrary Reference Genome. Practically all of those differences are harmless, but a tiny fraction can cause disease, contribute to disease, or just change how your body reacts to drugs. We're using Hadoop to find the variants that actually matter.
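The core comparison is conceptually simple; what demands Hadoop is applying it, plus alignment and error modeling, across roughly three billion bases. A toy sketch:

    # Compare a sample sequence against a reference and report the
    # positions that differ. (Toy data; real pipelines work on ~3
    # billion bases with alignment and sequencing-error models.)
    reference = "GATTACAGATTACA"
    sample    = "GATTACAGCTTACA"

    variants = [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]
    print(variants)  # [(8, 'A', 'C')]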
Steve Yun (Allstate), Joseph Rickert (Revolution Analytics)
Building analytical models is a process of trial and error. Often it makes sense to sample down a data set so that numerous methods and new variables can be tried quickly. Consider moving to the entire data set with Hadoop only after the lessons gleaned from the failures have been incorporated into a few candidate models.
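A minimal sketch of this sample-first workflow in Python with pandas; the file name and sampling fraction are hypothetical.

    import pandas as pd

    df = pd.read_csv("claims.csv")                 # full extract (hypothetical)
    pilot = df.sample(frac=0.01, random_state=42)  # 1% sample for fast iteration

    # Try methods and new variables on `pilot`, discard the failures,
    # then rerun only the surviving candidate models on the full data
    # set, e.g., on Hadoop.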
Tom Phillips (m6d)
The accepted wisdom from the very beginning of the Web was that the internet would change everything—media, marketing, commerce, communications. So why do marketers still try to find audiences using the clumsy tools of traditional media? In this session, Tom Phillips will challenge current marketing methodologies, arguing that in an era of big data, it’s time for the machines to take over.
Tom Wheeler (Cloudera, Inc.)
This tutorial will explore the tools and techniques you need to ensure that your MapReduce applications are both correct and efficient. You'll learn how to do unit testing, integration testing and performance testing for your Hadoop jobs, as well as how to interpret diagnostic information to isolate and solve problems in your code.
Claudia Perlich (Dstillery)
Building a reliable data-driven solution to a complex business problem is like designing a pocket watch from scratch. At the heart of successful analytics is the art of decomposing the looming big objective into smaller components, each of which may have its own data feed, modeling technique and runtime constraint. We showcase this process using the example of M6D’s online display advertising.
Rich Hickey (Datomic)
While moving away from single powerful servers, distributed databases still tend to be monolithic solutions. But key-value storage, for example, is rapidly becoming a commodity service on which richer databases might be built. What are the implications?
Paul Groom (Kognitio)
Business users' attitude to data is changing rapidly – remember when building an EDW was all-consuming? Now Big Data is edging the EDW to the side, or likely into obscurity. Is this good or bad? How do you bring the values and software investment surrounding the EDW to the wild west of Big Data?
James Markarian (Informatica)
Data integration for Big Data projects can consume up to 80% of the development effort, and yet too many developers reinvent the wheel by hand-coding custom connectors, data parsers, and data integration transformations. A metadata-driven, codeless IDE with pre-built transformations and data quality rules has proven to be up to 10X more productive than hand coding and easier to maintain.
Ben Werther (Platfora)
Hadoop is scalable, inexpensive and can store near-infinite amounts of data. But driving it requires exotic skills and hours of batch processing to answer straightforward questions. Learn how everything is about to change.
Rick Smolan (Against All Odds Productions)
Over the past two decades, Rick Smolan, creator of the best selling "Day in the Life" books, has produced a series of ambitious global projects in collaboration with hundreds of the world’s leading photographers, writers, and graphic designers. This year Smolan invited more than 100 journalists around the globe to explore the world of Big Data.
Joe Lamantia (Oracle Endeca)
Presentation: external link
This session presents a simple analytical and generative toolkit for interface design. It provides designers with an effective starting point for creating satisfying and relevant user experiences for Big Data and discovery interfaces. The toolkit helps designers understand and describe users' activities and needs, and then define and design the interactions and interfaces necessary.
Cathy O'Neil (Intent Media), Julie Steele (O'Reilly Media, Inc.)
A fireside chat with Cathy O'Neil about why universities can't make data scientists. Lots of companies want to hire data scientists, and there aren't enough to go around. Some universities are adding data science graduate departments, but they're facing an uphill battle, thanks to a lack of good data for academics, political infighting, and scalability issues.
Annika Jimenez (Pivotal), Anthony Goldbloom (Kaggle)
Data science is a team sport. Collaboration inside and outside your organization is the ultimate Big Data technique. Success depends on having a collaboration platform and solving the number one problem of the Big Data era: the gap between the supply of and demand for data scientists. Learn how you can take action today to accelerate the success of your data science efforts.
David Blair (Akamai Technologies)
Trecul is a dataflow system that powers Akamai's Online Advertising business, processing billions of events hourly. Trecul is built on top of HDFS & Hadoop Pipes to achieve fantastic runtime performance. We'll talk about its use of LLVM-based JIT compilation, so everything runs as native C++ code – no Java and no runtime interpreter. Akamai has open-sourced Trecul and it is available on GitHub.
Rob Coneybeer (Shasta Ventures)
We’re on the verge of a sea change of connectivity, as we instrument the world around us, a movement known as the Internet of Things. In this session, Rob Coneybeer looks at the many factors behind this transformation, and how they’re creating a wide range of new products and opportunities for business and technology.
Jonathan Hsieh (Cloudera, Inc)
As Apache HBase matures, the community has augmented it with new features that many enterprises consider hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these requirements by introducing new features that will help the administrator minimize downtime, monitor performance, control access to the system, and geo-replicate data across data centers.
Jonathan Alexander (Vocalocity, Inc.)
Jonathan Alexander, VP Engineering at Vocalocity and the author of Codermetrics (O’Reilly 2011) and Moneyball for Software Engineering (O’Reilly Radar 2011/2012) presents new ideas on how to gather data and use analytics to create more effective software development teams.
Amandeep Khurana (Cloudera), Matteo Bertozzi (Cloudera)
HBase is one of the more popular open source NoSQL databases that have cropped up over the last few years. Building applications that use HBase effectively is challenging. This tutorial is geared towards teaching the basics of building applications using HBase and covers concepts that a developer should know while using HBase as a backend store for their application.
Lee Feinberg (DecisionViz)
Attendees will learn, through practical examples, how to build a collaborative environment that accelerates the value of big data, with the goal of “making data part of every conversation.”
Lynn Cherny (Ghostweather Research & Design, LLC)
As data scientists, we encounter large networks all the time. Recommendations, social ties, transactions, and other types of data are naturally represented as networks. To understand these networks, metrics help, but visualization is crucial. This talk will focus on tools, techniques, and frameworks to visualize networks cleanly, avoiding or at least minimizing “hairballs”.
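As a minimal starting point, the sketch below uses the networkx and matplotlib libraries; the graph and layout are illustrative choices, not the speaker's.

    # Force-directed layouts are where "hairballs" come from on large
    # graphs, so filter or aggregate before drawing. (Toy graph.)
    import matplotlib.pyplot as plt
    import networkx as nx

    G = nx.karate_club_graph()         # small built-in social network
    pos = nx.spring_layout(G, seed=7)  # force-directed layout

    nx.draw_networkx(G, pos, node_size=120, font_size=6)
    plt.axis("off")
    plt.show()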
Micheline Casey (Federal Reserve Board)
If you’re going to transform your business by infusing it with data and analysis, you’re going to have to pay attention to what you use and how you use it. In this session, Micheline Casey provides an overview of data governance and data management principles that should be applied to big data projects.
