Nextstellar Corp is a streaming media company that generates huge amounts of data from its customers. They want to move their data infrastructure from on-prem solutions that analyze only a sample of data to a modern cloud solution that gives them full access to all the data they produce. That’s where you come in! In this series of liveProjects, you’ll build an Extract, Transform, and Load (ETL) solution that can transfer data from numerous existing sources to the AWS cloud. You’ll learn how to code raw data transformation logic; use AWS Glue jobs to normalize and transform data and validate it against data quality rules; coordinate jobs into seamless workflows with AWS Step Functions; and more.
The content vastly exceeded my expectations and I think it’s truly excellent.
The Nextstellar Corp media service has a lot of data—too much data to handle on prem! In order to properly analyze all the data generated by their streaming customers, they’re migrating to the cloud. In this liveProject, you’ll be helping them. You’ll tackle the common challenge of transferring on-prem data to AWS using the handy AWS DataSync tool. You’ll use Infrastructure-as-Code to create Landing Zone Amazon S3 buckets, automate data migration, and finally prepare a summary of likely infrastructure costs for your boss to review.
Nextstellar Corp is a media company with huge amounts of data to analyze. Some of that data is sitting in a PostgreSQL database, which is used for authentication management and decision-making, as well as for maintaining user preferences and feedback. Your boss doesn’t want that data sitting in the database; he wants it in the cloud! Moving it is exactly what you’ll be doing in this liveProject. You’ll use AWS Database Migration Service to enrich Nextstellar’s data lake with data from the PostgreSQL database so the company can take full advantage of both modern data architecture and the AWS ecosystem.
Media company Nextstellar Corp has completed the migration of their data to the cloud, and now they need to analyze it! That’s where you come in. You’ll take on the challenge of processing and integrating file and transactional data into a new cloud database solution that uses modern data architecture. You’ll use the AWS Glue tool to automate the whole process, from creating your crawlers and database to building your repository, Glue jobs, and triggers, and then establishing monitoring, troubleshooting, and scaling.
Nextstellar Corp is very excited: they’re taking all their data to the cloud! They’ve recently migrated and have an early-stage data lake solution in place. The CEO has approached you to deliver the next step of their cloud data process: using AWS Glue to apply the transformation logic and store the curated data in Amazon S3. You’ll use Jupyter Notebooks to curate and standardize your datasets (crafting, organizing, and managing them so they are easily accessible and usable), then design a CI/CD pipeline to test and deploy code with a single push.
Nextstellar Corp has recently migrated to the cloud, and for the first time, they can analyze 100% of their company’s data. But there’s a problem: your CEO isn’t confident in your data’s quality. He wants to add more data sources and collect more user behavior information, and to ensure these new sources are top-notch by using the Deequ library from Python (or Scala). Your task is to use Jupyter Notebooks with AWS Glue Studio to experiment with PyDeequ for data quality assessment. Next, you’ll enhance Nextstellar’s assessment capabilities by employing AWS Glue jobs to detect and act on data quality issues. Finally, you’ll monitor data quality using CloudWatch so you can respond promptly and keep the data reliable.
Nextstellar Corp needs you to tackle a big challenge: completing their cloud migration by rebuilding their historical data lake as a data layer in the cloud. You’ll implement an effective, automated data orchestration framework for the ingestion, transformation, and curation layers, using Infrastructure-as-Code best practices to automate your data layer. Finally, you’ll establish a monitoring system that automatically alerts you to any issues that crop up.
The project series has a great structure, starting from the data sources and encompassing all relevant AWS services and Python libraries.
This liveProject is for engineers who want to build a data lake with a lambda architecture using fully managed AWS services. You will need to know: