Schedule: Machine Data sessions
The torrent of data from devices, factories, and data centers poses unique challenges to data scientists and engineers. Fortunately, we have new tools for collecting, storing, and analyzing it all. Distributed computing platforms find underlying patterns, hidden anomalies, and powerful correlations at massive scale. In this track, we look at how to harness the flood of information from the Internet of Things.
David Andrzejewski is Lead Data Sciences Engineer at Sumo Logic, which he joined in 2011 after a postdoctoral research position working on knowledge discovery at Lawrence Livermore National Laboratory (LLNL). He is interested in developing tools that combine the power of machine learning with human insights, and has published work applying these ideas to problems in biomedical text mining, information retrieval and software behavior. David completed his PhD in Computer Sciences at the University of Wisconsin-Madison in 2010, where he had also previously received an M.S. in Computer Sciences and a B.S. in Computer Engineering, Mathematics and Computer Sciences.
Organizations of all types and sizes are experiencing an explosion of machine log data whose literally inhuman diversity and scale overwhelm traditional analysis tools and techniques. We will discuss how machine learning can complement human expertise, enabling the extraction of valuable and actionable insights from log data.
This presentation will introduce Big Data in the context of the Industrial Internet, describe some of the unique software and analytics opportunities, and present several current research topics that are making the Industrial Internet a reality.
Smart meters may be the most visible element of the so-called smart grid, but how smart is it if the plants producing the energy are dumb?
To ensure the integrity of the grid, every stage of our electrical power infrastructure – including generation, transmission and distribution – has to get “smart.” Sophisticated sensors connected to big data analytics are key to keeping the power flowing.
In this talk, we discuss the challenges associated with data center operations management and provide details on how the CloudPhysics big data platform solves these problems and enables capabilities that were not previously possible.
With increased road congestion around the globe and growing amounts of car data, we need more intelligent analytical methods to beat the traffic. This talk presents our work on traffic velocity and travel disruption analytics. We describe our approach in detail, how we went from idea to implemented algorithm, and how our methods can be applied to gain deep insight into influential factors.
After running hundreds of machine learning projects, I've seen many common pitfalls that can derail a project. These gremlins include data leakage, overfitting, poor data quality, solving the wrong problem, and many more. In this talk, I'll go through these gremlins with detailed examples and discuss ways to avoid them, so your machine learning application can be successful and create value.