Hadoop in Action
Chuck Lam
MEAP Began: February 2009
Softbound print: May 2010 (est.) | 325 pages
ISBN: 9781935182191
Pre-Order options*
Order today and start reading Hadoop in Action today through MEAP
- MEAP + Ebook only - $27.50
- MEAP + Print book (includes Ebook) when available - $44.99
* For more information, please see the MEAP FAQs page.
Table of Contents, MEAP Chapters & Resources
1. Introducing Hadoop - FREE
2. Starting Hadoop - AVAILABLE
3. Components of Hadoop - AVAILABLE
4. Writing Basic MapReduce Programs - AVAILABLE
5. Advanced MapReduce - AVAILABLE
6. Programming Practices - AVAILABLE
7. Cookbook - AVAILABLE
8. Managing Hadoop - AVAILABLE
9. Running Hadoop in the Cloud - AVAILABLE
10. Programming with Pig - AVAILABLE
Appendix A HDFS File Commands - AVAILABLE
DESCRIPTION
Hadoop is an open source framework implementing the MapReduce algorithm behind Google's approach to querying the distributed data sets that constitute the internet. This definition naturally leads to an obvious question: what are "maps," and why do they need to be "reduced"?
Massive data sets can be extremely difficult to analyze and query using traditional mechanisms, especially when the queries themselves are quite complicated. In effect, the MapReduce algorithm breaks up both the query and the data set into constituent parts; that's the "mapping." The mapped components can be processed simultaneously, and their intermediate results are then combined, or "reduced," to rapidly return results.
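The map-then-reduce idea described above can be sketched in a few lines of plain Python. This is only an illustration of the programming model, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in a single process, whereas Hadoop would distribute the map and reduce calls across a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, standing in for the step between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # Each document is mapped independently, so in a real cluster these
    # calls could run on different machines; here we simply loop.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
```

Because each call to `map_phase` depends only on its own document, and each call to `reduce_phase` depends only on one key's values, both phases can in principle run in parallel. That independence is what lets the model scale to large data sets.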
Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.
The book begins by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents, which makes the basic ideas of Hadoop and MapReduce easier to grasp. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.
Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.
This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.
WHAT'S INSIDE
- A thorough introduction to MapReduce concepts and the Hadoop framework
- Numerous hands-on examples to illustrate abstract ideas
- Coverage of Hadoop Streaming and Pipes, used for languages other than Java
- Using Hadoop with other libraries, such as Nutch
About the Author
Chuck Lam is a Senior Engineer at RockYou!. Chuck received his B.S. from San Jose State University and his Ph.D. in Electrical Engineering from Stanford University, where his thesis topic was computational data acquisition.
About the Early Access Version
This Early Access version of Hadoop in Action enables you to receive new chapters as they are being written. You can also interact with the author to ask questions, provide feedback and errata, and help shape the final manuscript on the Author Online.


