Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it’s in bookstores.

5 of 13 chapters available

Big Data Warehousing

Karthik Ramachandran, Istvan Szegedi, and Richard L. Saltzer

MEAP began June 2015
This book is in development

ISBN 9781633430280
425 pages (estimated)

printed in black & white

We regret that we will not be publishing this title.

Big Data Warehousing teaches you new techniques for common data warehousing tasks such as data ingest, SQL queries, and report generation in a big data environment. You’ll get a quick tour of using Hive and Impala to query and analyze large semi-structured datasets and learn how to build an Extract, Transform, and Load (ETL) workflow. You’ll explore data extraction with Sqoop and address the practical question of schemas for modeling and transforming big data. As you progress through the book, you’ll survey data governance with Falcon, dataflow construction with Oozie, approaches to data processing, query writing with SparkSQL, and data security using Apache Sentry and Knox.

about the technology

Data warehouses, once the exclusive domain of large enterprises, are becoming increasingly commonplace as businesses shift to data-driven decision making. However, the traditional tools and approaches to building data warehouses can no longer cost-effectively handle the amount of data that even a modest-sized business can capture. The new ecosystem of big data tools surrounding Spark and Hadoop, on the other hand, not only handles these data volumes but is also accessible to a wide range of users with diverse needs, including business analysts, data scientists, and application developers.

what's inside

Querying Big Data with Hive and Impala
ETL with Hadoop
Shaping the data lifecycle with Oozie and Falcon
Securing data with Knox and Sentry
Modeling data within Hadoop

about the reader

This book assumes you're familiar with SQL-based data warehousing technologies and patterns. You do not need to know Java or Scala programming, but it helps.

about the authors

Karthik Ramachandran is a software engineer and big data expert who makes big data technologies and machine learning accessible to business users. He has extensive experience with both traditional enterprise data warehousing solutions and the Hadoop ecosystem. Istvan Szegedi is a senior technical solutions architect working with enterprise data technologies and Hadoop. Richard L. Saltzer is a software engineer on Cloudera's internal data platform team, where he builds scalable ingestion pipelines with Impala.

