Nextstellar Corp is a streaming media company that generates huge amounts of data from its customers. They want to move their data infrastructure from on-prem solutions that analyze only a sample of data to a modern cloud solution that gives them full access to all the data they produce. That’s where you come in! In this series of liveProjects, you’ll build an Extract, Transform, and Load (ETL) solution that can transfer data from numerous existing sources to the AWS cloud. You’ll learn how to code raw data transformation logic; use AWS Glue jobs to normalize and transform data and validate it against data quality rules; coordinate jobs into seamless workflows with AWS Step Functions; and more.
The content vastly exceeded my expectations and I think it’s truly excellent.
The Nextstellar Corp media service has a lot of data, too much data to handle on-prem! To properly analyze all the data generated by their streaming customers, they’re migrating to the cloud, and in this liveProject, you’ll be helping them. You’ll tackle the common challenge of transferring on-prem data to AWS using the handy AWS DataSync tool. You’ll use Infrastructure-as-Code to create Landing Zone Amazon S3 buckets, automate data migration, and finally prepare a summary of likely infrastructure costs for your boss to review.
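To make the Infrastructure-as-Code step concrete, here is a minimal sketch of a Landing Zone bucket defined with AWS CDK for Python. The tooling choice is an assumption (the project could equally use CloudFormation or Terraform), and the stack and bucket names are purely illustrative:

```python
# Minimal AWS CDK (v2, Python) sketch of a Landing Zone bucket that a
# DataSync task could target. Assumes aws-cdk-lib is installed and AWS
# credentials are configured; all names are hypothetical.
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class LandingZoneStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # The bucket DataSync will write on-prem files into.
        s3.Bucket(
            self,
            "LandingZoneBucket",
            bucket_name="nextstellar-landing-zone",  # hypothetical name
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            removal_policy=RemovalPolicy.RETAIN,  # keep data if stack is deleted
        )

app = App()
LandingZoneStack(app, "LandingZoneStack")
app.synth()
```

Running `cdk deploy` on a stack like this produces the bucket that a DataSync task can then use as its destination location.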
Nextstellar Corp is a media company with huge amounts of data to analyze. Some of that data sits in a PostgreSQL database used for authentication management and decision-making, as well as for maintaining user preferences and feedback. Your boss doesn’t want that data sitting in the database; he wants it in the cloud! Moving it is exactly what you’ll be doing in this liveProject. You’ll use AWS Database Migration Service to enrich Nextstellar’s data lake with data from the PostgreSQL database so the company can take full advantage of both modern data architecture and the AWS ecosystem.
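As a rough illustration of what driving AWS Database Migration Service from code looks like, the boto3 sketch below defines a replication task that copies the public schema of a PostgreSQL source into the data lake. It assumes the source and target endpoints and a replication instance already exist; every ARN and identifier is a placeholder:

```python
# Hedged boto3 sketch of an AWS DMS replication task; all ARNs and
# identifiers below are placeholders, not values from the project.
import json

import boto3

dms = boto3.client("dms")

# Select every table in the public schema of the PostgreSQL source.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public-schema",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="postgres-to-datalake",             # hypothetical
    SourceEndpointArn="arn:aws:dms:region:account:endpoint/src",  # placeholder
    TargetEndpointArn="arn:aws:dms:region:account:endpoint/tgt",  # placeholder
    ReplicationInstanceArn="arn:aws:dms:region:account:rep/inst", # placeholder
    MigrationType="full-load-and-cdc",  # initial copy plus ongoing changes
    TableMappings=json.dumps(table_mappings),
)
```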
Media company Nextstellar Corp has completed the migration of their data to the cloud; now they need to analyze it! That’s where you come in. You’ll take on the challenge of processing and integrating file and transactional data into a new cloud database solution that uses modern data architecture. You’ll use AWS Glue to automate the whole process, from creating your crawlers and database to building your repository, Glue jobs, and triggers, and establishing monitoring, troubleshooting, and scaling.
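For a sense of how that automation fits together, here is a hedged boto3 sketch that creates a Glue Data Catalog database and a crawler over the landing bucket; the database, crawler, role, and bucket names are all illustrative:

```python
# Minimal boto3 sketch of Glue cataloging: create a database, point a
# crawler at the landing bucket, and run it. Names and the IAM role ARN
# are hypothetical.
import boto3

glue = boto3.client("glue")

# Catalog database that will hold the crawled table definitions.
glue.create_database(DatabaseInput={"Name": "nextstellar_raw"})

# Crawler that infers schemas from the files delivered to the landing zone.
glue.create_crawler(
    Name="raw-landing-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
    DatabaseName="nextstellar_raw",
    Targets={"S3Targets": [{"Path": "s3://nextstellar-landing-zone/"}]},
)

glue.start_crawler(Name="raw-landing-crawler")
```

A scheduled or event-based Glue trigger can then launch downstream jobs once the crawler has refreshed the catalog.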
Nextstellar Corp is very excited; they’re taking all their data to the cloud! They’ve recently migrated and have an early-stage data lake solution. The CEO has approached you to deliver the next step of their cloud data process: using AWS Glue to apply the transformation logic and store the curated data in Amazon S3. You’ll use Jupyter notebooks to curate and standardize your datasets, crafting, organizing, and managing them so they’re easily accessible and usable, then design a CI/CD pipeline that tests and deploys your code with a single push.
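To ground that, here’s a minimal Glue job script in PySpark showing the general shape of such transformation logic. The database, table, and bucket names are assumptions, and the real curation rules would be the ones developed in the notebooks:

```python
# Minimal AWS Glue (PySpark) job sketch: read a raw table from the Data
# Catalog, apply a simple standardization step, and write curated Parquet
# to S3. Database, table, and bucket names are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw events table that the crawler registered.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="nextstellar_raw", table_name="events"  # hypothetical names
)

# Example standardization step: resolve ambiguous column types.
curated = raw.resolveChoice(choice="make_struct")

glue_context.write_dynamic_frame.from_options(
    frame=curated,
    connection_type="s3",
    connection_options={"path": "s3://nextstellar-curated/events/"},  # placeholder
    format="parquet",
)

job.commit()
```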
Nextstellar Corp has recently migrated to the cloud, and for the first time they can analyze 100% of their company’s data. But there’s a problem: your CEO isn’t confident in the data’s quality. He wants to add more data sources, collect more user behavior information, and ensure these new sources are top-notch by using the Python- (or Scala-) based Deequ library. Your task is to use Jupyter notebooks with AWS Glue Studio to experiment with PyDeequ for data quality assessment. Next, you’ll enhance Nextstellar’s assessment capabilities by employing AWS Glue jobs to react and take action on data quality issues. Finally, you’ll monitor data quality using CloudWatch so that issues are caught promptly and data reliability is maintained.
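The snippet below is a small PyDeequ sketch of the kind of quality check you’d prototype in those notebooks. It assumes a Spark session that can fetch the Deequ jar, and the S3 path and column names are illustrative:

```python
# Hedged PyDeequ sketch: verify completeness and uniqueness on a curated
# dataset. Assumes pydeequ is installed; the S3 path and column names are
# illustrative, not from the project.
import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate()
)

df = spark.read.parquet("s3://nextstellar-curated/events/")  # placeholder path

check = Check(spark, CheckLevel.Error, "events quality")
result = (
    VerificationSuite(spark)
    .onData(df)
    .addCheck(check.isComplete("user_id").isUnique("event_id"))
    .run()
)

# Inspect which constraints passed or failed.
VerificationResult.checkResultsAsDataFrame(spark, result).show()
```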
Nextstellar Corp needs you to tackle a big challenge: completing their cloud migration by rebuilding their historical data lake as a data layer in the cloud. You’ll implement an effective, automated data orchestration framework for the ingestion, transformation, and curation layers, using Infrastructure-as-Code best practices to automate your data layer. Finally, you’ll establish a monitoring system that will automatically alert you to any issues that crop up.
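One common way to build such an orchestration framework is with AWS Step Functions; the hedged boto3 sketch below wires two Glue jobs into a sequential state machine. The job names, state machine name, and role ARN are placeholders:

```python
# Hedged boto3 sketch: a Step Functions state machine that runs an ingest
# Glue job, then a curate Glue job, waiting for each to finish. All names
# and the role ARN are placeholders.
import json

import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "StartAt": "IngestJob",
    "States": {
        "IngestJob": {
            "Type": "Task",
            # .sync makes Step Functions wait until the Glue job run completes.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "ingest-raw"},  # hypothetical job name
            "Next": "CurateJob",
        },
        "CurateJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "curate-events"},  # hypothetical job name
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="nextstellar-data-pipeline",  # hypothetical
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder
)
```

A CloudWatch alarm on the state machine’s ExecutionsFailed metric is one way to get the automatic alerting the project calls for.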
The project series has a great structure, starting from the data sources and encompassing all relevant AWS services and Python libraries.
This liveProject series is for engineers who want to build a data lake Lambda architecture using fully managed AWS services. You will need to know: