In this hands-on tutorial, you will learn why distributed search matters, drawing on our industry experience and a specific example. In particular, we’ll introduce an architecture that incorporates distributed search techniques and share the pain points we experienced and the lessons we learned. Building on that, we’ll survey the landscape of distributed search tools and their future directions. For the hands-on part of the tutorial, you will learn how to install and use Apache Solr for real-time Big Data analytics, search, and reporting. You’ll also learn some tricks of the trade and how to handle known issues.
We’ll email instructions to you before the tutorial so you can come prepared with the necessary tools installed and ready to go. This prior preparation will let us use the whole tutorial time to learn some of the fundamentals of the Lucene query language and other important topics. At the beginning of the tutorial we’ll show you how to use these tools.
We’ll spend most of the tutorial using a series of hands-on exercises with actual Lucene queries, so you can learn by doing. We’ll go over all the main features of Lucene’s query language, and how Lucene works with data in Hadoop.
This section will cover advanced topics such as relevance ranking, facets, group by, sort by, and other important features for Big Data search projects. Lucene enables many types of customizations of the underlying technology.
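As a rough illustration of how these features fit together, the sketch below builds a Solr select request that combines a Lucene phrase query with relevance scores, sorting, faceting, and result grouping. The request parameters (`q`, `fl`, `sort`, `facet`, `facet.field`, `group`, `group.field`, `rows`, `wt`) are standard Solr query parameters; the host, the collection name `events`, and the field names are hypothetical and are not taken from the tutorial materials.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr endpoint and collection name, for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/events/select"


def build_search_url(text, facet_field, group_field, sort_field, rows=10):
    """Build a Solr select URL with a phrase query, facets, grouping, and sorting."""
    # Escape backslashes and double quotes so the text stays inside the phrase.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", f'message:"{escaped}"'),    # Lucene phrase query on a hypothetical field
        ("fl", "*,score"),                # return stored fields plus the relevance score
        ("sort", f"{sort_field} desc"),   # sort by a field instead of by score
        ("facet", "true"),                # enable faceting
        ("facet.field", facet_field),     # count matching documents per field value
        ("group", "true"),                # enable result grouping ("group by")
        ("group.field", group_field),
        ("rows", str(rows)),
        ("wt", "json"),                   # ask for a JSON response
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("disk failure", "host", "service", "timestamp")
parsed = parse_qs(urlsplit(url).query)
print(url)
```

The function only constructs the URL, so it can be inspected without a running Solr instance; sending it with any HTTP client against a real collection would return the matching documents along with the facet counts and groups.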
We’ll conclude with a discussion of Lucene’s place in the Hadoop ecosystem, such as how it compares to other available tools. We’ll discuss installation and configuration issues that ensure the best performance and ease of use in a real production cluster. In particular, we’ll discuss how to create an efficient Lucene secondary index on data stored in HBase, Cassandra, and other NoSQL databases.
Sewook Wee is an R&D manager at Accenture Technology Labs. His research is grounded in distributed systems, with a current emphasis on Big Data platform technologies. Recently, he led a Hadoop deployment comparison study that compared a bare-metal Hadoop cluster with a Hadoop service (Amazon EMR) at the total cost of ownership level, using three real-world workloads. Previously, he led various R&D projects, including hybrid NoSQL approaches that layer graph data management capability on column-oriented datastores; a MapReduce-based data transformation framework; a next-generation software architecture that maximizes the benefits of cloud; MonteCloudo, an elastic Monte Carlo simulation architecture using cloud; and a web server farm architecture on AWS EC2. Along with leading R&D projects, he publishes academic papers and business white papers, files patents, presents at both academic and industry conferences, and builds relationships with business partners and clients. He received MS and PhD degrees from Stanford University, and his alma mater is Seoul National University in South Korea.
Ryan is a data engineer at Think Big Analytics. He leads technical consulting projects for big data implementations at Fortune 500 clients. He has in-depth experience working with Solr/Lucene and the Hadoop stack.
Jason is a Sr. Architect at Think Big Analytics. He has many years of experience writing Java application software, most recently for Hadoop-based applications.