The amount of digital data available online has exploded in recent years: users generate content on blogs and micro-blogs, and shopping sites publish product reviews and detailed descriptions. With so much data at their fingertips, software developers need, more than ever, a scalable, easy-to-use framework for extracting knowledge from it. Apache Mahout offers scalable implementations of algorithms for data mining and machine learning.
Scalable here means a “scalable community”: the project is built on a sustainable community. The number of possible use cases is scalable as well, because the library is available under a commercially friendly license. Of course, scalable also means scalable in terms of the amount of data to process: Apache Mahout is easy to get started with but scales to increasing data volumes thanks to its use of Apache Hadoop.
After motivating the need for machine learning, the talk gives an overview of Apache Mahout, including a deep dive into one of its algorithms. It shows the tremendous improvements made in the recent past, including several new algorithms and performance improvements. Last but not least, Apache Mahout graduated to a top-level project this year.
Isabel Drost is a member of the Apache Software Foundation. She is the organiser of the Apache Hadoop Get Together in Berlin and was co-organiser of the first European NoSQL meetup as well as the Berlin Buzzwords conference. She co-founded Apache Mahout and is an active Apache Mahout committer. Isabel is actively engaged with the communities of various Apache projects, e.g. Apache Lucene and Apache Hadoop. She is a regular speaker at renowned conferences on topics related to free software development, scalability, Apache Lucene, Apache Hadoop and Apache Mahout.