Manning Early Access Program
Hadoop in Action
EARLY ACCESS EDITION

Chuck Lam

MEAP Began: February 2009
Softbound print: May 2010 (est.) | 325 pages
ISBN: 9781935182191

Pre-Order options*
Order now and start reading Hadoop in Action today through MEAP
  MEAP + Ebook only - $27.50
  MEAP + Print book (includes Ebook) when available - $44.99
* For more information, please see the MEAP FAQs page.

Table of Contents, MEAP Chapters & Resources

 1. Introducing Hadoop - FREE
 2. Starting Hadoop - AVAILABLE
 3. Components of Hadoop - AVAILABLE
 4. Writing Basic MapReduce Programs - AVAILABLE
 5. Advanced MapReduce - AVAILABLE
 6. Programming Practices - AVAILABLE
 7. Cookbook - AVAILABLE
 8. Managing Hadoop - AVAILABLE
 9. Running Hadoop in the Cloud - AVAILABLE
10. Programming with Pig - AVAILABLE

Appendix A HDFS File Commands - AVAILABLE
 

DESCRIPTION

Hadoop is an open source framework implementing the MapReduce algorithm behind Google's approach to querying the distributed data sets that constitute the internet. This definition naturally leads to an obvious question: "What are 'maps' and why do they need to be 'reduced'?"

Massive data sets can be extremely difficult to analyze and query using traditional mechanisms, especially when the queries themselves are quite complicated. In effect, MapReduce splits the data set into parts and applies the same operation to each part in parallel; that's the "mapping." The intermediate results are then grouped and combined, or "reduced," to return the final answer quickly.
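
To make the two steps concrete, here is a minimal sketch of a word-frequency count written in the map-and-reduce style. It is plain, single-machine Java and does not use the Hadoop API; the class and method names (WordCountSketch, map, reduce, wordCount) are hypothetical and chosen only for illustration. A real Hadoop job would distribute the map and reduce calls across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates map, group-by-key, and reduce in one JVM.
public class WordCountSketch {

    // "Map" step: turn one line of text into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum all the counts emitted for a single word.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by word, then reduce each group.
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "the quick brown fox", "the lazy dog", "The fox");
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

Because each call to map looks at only one line and each call to reduce looks at only one word, the calls are independent of one another, which is what allows a framework such as Hadoop to run them in parallel.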

Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.

The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.

Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.

This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.

WHAT'S INSIDE

About the Author

Chuck Lam is a Senior Engineer at RockYou!. Chuck received his B.S. from San Jose State University and his Ph.D. in Electrical Engineering from Stanford University, where his thesis topic was computational data acquisition.

About the Early Access Version

This Early Access version of Hadoop in Action enables you to receive new chapters as they are being written. You can also interact with the author to ask questions, provide feedback and errata, and help shape the final manuscript in the Author Online forum.

Want to Learn More?

Sign up to read more content when it is released and to receive news about this book.