Big Data Warehousing

Karthik Ramachandran, Istvan Szegedi, and Richard L. Saltzer
  • ISBN 9781633430280
  • 425 pages (estimated)
  • printed in black & white
We regret that we will not be publishing this title.

Big Data Warehousing teaches you new techniques for common data warehousing tasks such as data ingest, SQL queries, and report generation in a big data environment. You’ll get a quick tour of using Hive and Impala to query and analyze large semi-structured datasets and learn how to build an Extract, Transform, and Load (ETL) workflow. You’ll explore data extraction with Sqoop and address the practical question of schemas for modeling and transforming big data. As you progress through the book, you’ll survey data governance with Falcon, dataflow construction with Oozie, approaches to data processing, query writing with Spark SQL, and data security using Apache Sentry and Knox.

about the technology

Data warehouses, once the exclusive domain of large enterprises, are becoming increasingly commonplace as businesses shift to data-driven decision making. However, the traditional tools and approaches to building data warehouses can no longer cost-effectively handle the amount of data that even a modest-sized business can capture. In contrast, the new ecosystem of big data tools surrounding Spark and Hadoop not only handles these data volumes but is also accessible to a wide range of users with diverse needs, including business analysts, data scientists, and application developers.

what's inside

  • Querying Big Data with Hive and Impala
  • ETL with Hadoop
  • Shaping the data lifecycle with Oozie and Falcon
  • Securing data with Knox and Sentry
  • Modeling data within Hadoop

about the reader

This book assumes you're familiar with SQL-based data warehousing technologies and patterns. You do not need to know Java or Scala programming, but it helps.

about the authors

Karthik Ramachandran is a software engineer and big data expert who makes big data technologies and machine learning accessible to business users. He has extensive experience with both traditional enterprise data warehousing solutions and the Hadoop ecosystem. Istvan Szegedi is a senior technical solutions architect working with enterprise data technologies and Hadoop. Richard Saltzer is a software engineer on Cloudera's internal data platform team, where he builds scalable ingestion pipelines with Impala.

choose your plan

team

monthly
annual
$49.99
$499.99
only $41.67 per month
  • five seats for your team
  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose another free product every time you renew
  • choose twelve free products per year
  • exclusive 50% discount on all purchases
  • Big Data Warehousing ebook for free