Spreadsheets are used almost everywhere, for almost everything. Researchers from Delft University of Technology have studied spreadsheet users and their spreadsheets to learn more about how exactly they are built, maintained and migrated. In this session we present a case study analyzing 3 million spreadsheets.
Does pre-competitive collaboration ease the pain of adopting disruptive big-data technologies? This question is tackled using the example of managing and analysing large genomic sequence data sets, and their role in the development of personalised medicine.
Late last summer, Etsy made a seemingly innocuous change to its search engine that had far reaching impact. The change was coordinated with three major data-driven product launches, from search to advertising to analytics. Big data can cause big changes, and this talk focuses on big data from an end-to-end product view, ranging from the underlying technology to understanding longer-term impacts.
What will Big Data mean to us as users, consumers and organisations? And will it really be a big deal? In this presentation Mikael Bisgaard-Bohr will provide a fascinating view into where the Big Data wave is taking us, and why it is about so much more than just data.
Massive analytics has emerged as an offshoot of Big Data with tremendous upside potential for businesses that can figure out how to manage that data. CIOs must reduce the TCO of supporting massive data computation while enhancing analysis workflows. This session explores new capabilities for combined storage and computation with Hadoop MapReduce to solve today’s Big Data challenges.
Strategy has changed. The step-change in data abundance, speed and competition means that static business plans striving for that 'perfect answer' are obsolete. We'll demonstrate how the Data Science underpinning the race strategy engines used in Formula One to plan, track and update strategy in real time is enabling Fortune 500 companies to be more agile, and creating a new way of strategy planning.
Social games are the poster children of metrics-driven design. The way that analytics is used to optimise design for games has lessons which are transferable to other domains. But even poster children have problems. We look at the landscape of analytical tools designed to support game design refinement, identify the main pitfalls involved in practice, and suggest workarounds.
Establishing cause and effect from observational data is extremely difficult. However, by introducing randomization, or better still, controlled experiments, it becomes possible to establish true causality. This talk will survey the difficulties and pitfalls of establishing cause and effect from observed data, and explain ways to introduce experimentation.
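The talk itself gives no code, but the core point can be sketched in a small simulation (all names and numbers here are illustrative, not from the talk): a hidden confounder biases the naive observational comparison, while random assignment recovers the true effect.

```python
import random
import statistics

def simulate(n=100_000, true_effect=2.0, seed=1):
    """Compare a naive observational estimate against a randomized one.

    A hidden confounder ("health") drives both who takes the treatment
    and the outcome, so the observed difference is biased upward.
    """
    rng = random.Random(seed)
    health = [rng.gauss(0, 1) for _ in range(n)]

    # Observational data: healthier people opt in to treatment more often.
    obs_treated = [h > 0.5 for h in health]
    obs_outcome = [h + true_effect * t + rng.gauss(0, 1)
                   for h, t in zip(health, obs_treated)]
    naive = (statistics.mean(y for y, t in zip(obs_outcome, obs_treated) if t)
             - statistics.mean(y for y, t in zip(obs_outcome, obs_treated) if not t))

    # Randomized experiment: a coin flip decides treatment,
    # breaking the link between health and treatment assignment.
    rct_treated = [rng.random() < 0.5 for _ in range(n)]
    rct_outcome = [h + true_effect * t + rng.gauss(0, 1)
                   for h, t in zip(health, rct_treated)]
    rct = (statistics.mean(y for y, t in zip(rct_outcome, rct_treated) if t)
           - statistics.mean(y for y, t in zip(rct_outcome, rct_treated) if not t))
    return naive, rct
```

With a true effect of 2.0, the randomized estimate lands close to 2.0, while the naive observational difference is substantially inflated by the confounder.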
A practical step-by-step description of how the LAMP-based Top10 Alpha was turned into a fully data-driven product. Built around a real-time data processing pipeline and an asynchronous stack, Top10's infrastructure now hinges on Akka, along with Scala, Node.js and a host of other technologies. This has enabled interesting uses of the data and new, exciting user-facing features.
For all of our machine learning algorithms and big data tools, so many of the problems we solve day-to-day are decidedly "first world": figuring out how to get the biggest ROI on ad dollars or crafting personalized movie recommendations. Can we use our skills as data scientists to solve social problems as well, helping people find clean water as easily as they can find good restaurants?
Readers and preparers of graphs: Learn to recognize and avoid some common graphical mistakes to understand your data better and make better decisions from data.
The real challenge ahead of us is not accumulating more information, or processing more information, or analytics, or replacing relational databases, or scaling data (i.e. not the 3 Vs). The real challenge is solving the information glut problem.
Logic programming has recently gained renewed interest among people processing large data volumes with Hadoop. This talk demonstrates the basic concepts using Cascalog.
Liam Maxwell, Executive Director of the IT Reform Group in the Cabinet Office
Big data often doesn't sit well with companies that want to move fast. Technologies like Hadoop can be expensive to setup, slow to produce results, and time consuming to maintain. Streaming algorithms provide an alternative. They are simple to implement, very efficient, and give real-time results. In this talk I will describe several key streaming algorithms, and give examples of their use.
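The talk doesn't list its algorithms here; as a minimal illustration of the streaming style it describes, here is reservoir sampling (Vitter's Algorithm R), which maintains a uniform random sample of a stream of unknown length in O(k) memory, one pass, no Hadoop cluster required. The function name and signature are this sketch's own.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of
    unknown length, using O(k) memory (Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i+1 replaces a random slot with probability k/(i+1),
            # which keeps every item equally likely to be in the sample.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample
```

The same one-pass, bounded-memory shape underlies other streaming sketches (approximate counts, distinct-element estimates), which is what makes them cheap to run and real-time by construction.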
This presentation will give an overview of MapReduce-based algorithms described in recent papers by academic and industrial researchers. Areas covered: AI/machine learning, bioinformatics, information retrieval. The focus will be on patterns of problems and the corresponding MapReduce solution patterns. Some background material:
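As a reference point for the "solution pattern" framing, the canonical MapReduce shape (map, shuffle/group-by-key, reduce) can be sketched in plain Python; the function names here are illustrative, not from any specific paper.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Mapper: emit a (key, value) pair per token — here (word, 1).
    return [(w.lower(), 1) for w in doc.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # Reducer: combine each key's values — here, a sum gives word counts.
    return {k: sum(vs) for k, vs in groups.items()}

docs = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
```

Many of the surveyed algorithms are variations on this template: the problem-specific work lives in what the mapper emits and how the reducer combines it.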
Jeni Tennison, Technical Director of the newly formed Open Data Institute, will describe the ODI’s twin aims of helping data owners achieve their organisational objectives through publishing open data, and helping those who reuse that data to add value responsibly and effectively, thereby turning open data dreams into reality.
In this talk Shaun Connolly, VP Corporate Strategy for Hortonworks, will look at Hadoop's opportunity and the value it can unlock. Along the way he will discuss the kind of efforts required from the community, the solution ecosystem, and the enterprise in order to solidify Hadoop's place within the enterprise.
Quantifying one's self, a growing trend, is about self-awareness, pattern spotting and behaviour change. What is missing is "data literacy", i.e. data expertise at the individual level, not just for businesses and institutions. Uncovering hidden cause and effect in one's behaviour increases an individual's autonomy, and for that we need access to analytical tools and raw data. How and where do we get them?
(Goldsmiths University of London/Fun & Plausible Solutions)
When constructing a music recommender system, which is more important: a musicological understanding of the catalog of music in a system, or the number of times two particular songs were played one after the other and were 'liked'? Even better, if a system knows the latter, does the former even matter? Do machines that predict behavior need to learn to listen? Or is observing behavior enough?
Nobody knows statistics. They are as esoteric as chemical compounds are to chemistry. Yet data visualizations often incorporate a logarithmic scale, density traces, or seasonally adjusted numbers among other things. If this is the data deluge, we're bound to find everyone swept downstream. How do we prepare the average data consumer?
Mapping real-world correspondence to data structures populating a storage matrix currently expanding by some 5 trillion bits per second is the challenge that brings us here.
It's 1951 and you've got the world's first business computer and you've just been handed a Big Data problem. Go!
Everyone uses the term big data, but no one can agree on what it means, or even whether it's novel. However, the label is useful to describe the radically new ways that the world interacts with information, for which the public, policymakers, and even data geeks are unprepared.
Alexandra Deschamps-Sonsino, Founder of Good Night Lamp / Founder of Designswarm
Big data isn't just multi-terabyte datasets hidden inside eventually-consistent distributed databases in the cloud. It’s also about the hidden data you carry with you all the time, data that is generated for you and about you, but not necessarily by you. Hidden data, your data, carrying on its secret life without your knowledge, but with your implicit and implied consent.
Data provides critical insight into the way government works. When the UK government published every item of spending over £25,000, the data was hard to parse. The UK Guardian Datablog cleaned it up and asked readers to help pore through the numbers, making everyone a data journalist. We’ll cover the technologies the Guardian uses to analyze, visualize, and share data with the world.
In this talk, I will look at the next step in big data in general and open data in particular: transparency of insight and how the intelligent transformation of data into narratives can bring to light the stories within it and enable the higher level of understanding and insight needed to support evidence-based decision-making.
Data is great. Data is powerful. But when some data is missing, bias can be introduced, distorting the overall picture.