Data science goes far beyond computation and visualization. The data scientist needs to explore data interactively, share computations with collaborators, provide textual and mathematical explanations of those computations and present all of this information to a wide range of audiences. Data scientists need software tools that span these diverse activities, without compromising the centrality of code and data.
In this talk, I will introduce the IPython Notebook and describe how it tackles these challenges to provide a foundation for data science that is interactive, repeatable, documented and sharable.
The IPython project provides open source tools for interactive computing in Python. Historically, IPython has provided an enhanced interactive shell for Python. Over the last decade, this shell has become the de facto working environment for scientific and technical computing in Python.
More recently, the IPython development team has expanded its efforts to develop the IPython Notebook, a web-based interactive computing environment for Python, R, shell scripts and other languages. At its core, the IPython Notebook is a first-class environment for writing and running code in a web browser. The conveniences of a modern IDE are present: syntax highlighting, tab completion, integrated help, access to the system shell, etc. However, the Notebook goes beyond mere code by enabling users to build documents that combine live, runnable code with visualizations, text, LaTeX formulas, images and videos.
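As a small illustration of how code and formulas can live side by side, the sketch below defines a hypothetical `Polynomial` class with a `_repr_latex_` method. IPython's rich display system looks for methods of this kind, and in the Notebook the returned string would be typeset as a formula. The class name and its formatting logic are assumptions made for this example, not part of IPython.

```python
class Polynomial:
    """Hypothetical example class: a polynomial with numeric coefficients."""

    def __init__(self, coeffs):
        # coeffs[i] is the coefficient of x**i
        self.coeffs = list(coeffs)

    def __call__(self, x):
        # Evaluate with Horner's rule.
        result = 0
        for c in reversed(self.coeffs):
            result = result * x + c
        return result

    def _repr_latex_(self):
        # Rich display hook: return LaTeX source wrapped in $...$.
        terms = []
        for power, c in enumerate(self.coeffs):
            if c == 0:
                continue
            if power == 0:
                terms.append(str(c))
            elif power == 1:
                terms.append(f"{c} x")
            else:
                terms.append(f"{c} x^{{{power}}}")
        return "$" + " + ".join(terms) + "$" if terms else "$0$"


p = Polynomial([1, 0, 3])   # represents 1 + 3x^2
print(p(2))                 # plain numeric evaluation
print(p._repr_latex_())     # the LaTeX source the Notebook would typeset
```

Outside the Notebook the method is just an ordinary string-returning method, so the same object works unchanged in a plain Python session.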
The Notebook document format has been carefully designed to support collaboration. When kept under version control (Git, Mercurial, Subversion, etc.), these documents preserve a full historical record of a computation, its results and accompanying material, including embedded images and visualizations. To enhance the dissemination of results, Notebook documents can be exported to a wide range of formats, including LaTeX, PDF, Markdown and HTML. This allows, for example, easy sharing of technical content on blogs, with posts generated directly from the original source with its code, prose and figures. Finally, Notebook documents can be converted to live presentations with the click of a button, so the presenter can engage the audience with a document that contains live computations and not just statically pre-rendered figures and media.
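To make the version-control point concrete, here is a minimal sketch of a notebook document built as plain JSON using only the Python standard library. The field layout is assumed to follow version 4 of the notebook format, which may be newer than the version this talk refers to. The helper names (`make_notebook`, `markdown_cell`, `code_cell`) are hypothetical and exist only for this example.

```python
import json


def make_notebook(cells):
    """Build a minimal notebook dict (layout assumed to follow nbformat 4)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": cells,
    }


def markdown_cell(text):
    # Prose and LaTeX formulas are stored as Markdown source.
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    # outputs is left empty here; an executed notebook stores results
    # next to the code that produced them.
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


nb = make_notebook([
    markdown_cell("Mean of a sample: $\\bar{x} = \\frac{1}{n}\\sum_i x_i$"),
    code_cell("sum([1, 2, 3]) / 3"),
])

# Indented JSON with a stable key order keeps line-based diffs small.
text = json.dumps(nb, indent=1, sort_keys=True)
print(text)

# Round trip: the serialized text loads back to the same structure.
assert json.loads(text) == nb
```

Because the whole document is a single text file, ordinary diff and merge tools can track changes to code, prose and stored results together.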
In addition to its core features, the Notebook has powerful parallel computing support that scales computations from multi-core CPUs to the cloud. The Notebook architecture decouples the computations from the interactive user interface: the Notebook server can run anywhere, from a laptop to servers in the cloud, while the user interface is a web application in the browser. As a result, users can ensure that computations run in close proximity to the data.
In this talk, I will introduce the IPython Notebook and describe its usage in the context of data science. In particular, I will describe how it is ideally suited for the challenges of big data and how its usage dramatically improves the daily work of the data scientist.
Brian Granger is an Assistant Professor of Physics at Cal Poly State University in San Luis Obispo, CA. He has a background in theoretical atomic, molecular and optical physics, with a Ph.D. from the University of Colorado. His current research interests include quantum computing, parallel and distributed computing, and interactive computing environments for scientific and technical computing. He is a core developer of the IPython project and an active contributor to a number of other open source projects focused on scientific computing in Python.