Welcome to Free Power Corporation Limited (FPCL), a London-based energy company looking for a solution to deal with surging energy costs. FPCL has installed Smart Meters, which generate energy readings every thirty minutes, in households across London. As a data engineer for FPCL, you’ll create a Kafka cluster and ingest the real-time Smart Meter data into it. You’ll use Spark to read, clean, join, and process the data, adding logic to handle potential real-world problems like data loss and duplicate data. To meet the different business requirements of various FPCL teams, you’ll also perform advanced stream processing on the data streams. By the end of this series of liveProjects, you’ll have the experience and skills to ingest large amounts of data and perform complex analysis on it in real time using Apache Kafka and Spark.
The project takes a well-paced, progressive approach, bringing the user from the basics to advanced topics while covering the foundations of Kafka.
As a first step in dealing with surging energy prices, Free Power Corporation Limited (FPCL) has installed Smart Meters, which generate energy readings every thirty minutes, in households across London in order to analyze consumers’ energy usage. As a new data engineer for the power company, your task is to ingest the data from the Smart Meter readings and stream it to FPCL data centers for processing. Using the Kafka command-line tool, you’ll create topics in a Kafka cluster for storing the data, and you’ll create partitions for distributing the load within the topics. You’ll add logic to deal with potential problems such as data loss and duplicate records, and you’ll add a method to convert the energy readings to the widely used, easy-to-parse JSON format before the final step of ingesting the data. When you’re finished, FPCL will have pertinent data for analyzing energy consumption patterns, and you’ll have practical experience using Kafka to ingest large amounts of data.
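The JSON conversion and duplicate handling described above can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the project's actual code: the `Reading` fields (`meter_id`, `timestamp`, `kwh`) and the choice of `(meter_id, timestamp)` as the duplicate key are assumptions, and no Kafka client is involved.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Reading:
    # Hypothetical shape of one Smart Meter reading (field names are assumed).
    meter_id: str
    timestamp: str  # ISO 8601 string for the half-hour slot
    kwh: float


def to_json(reading):
    """Serialize a reading to a JSON string with a stable key order."""
    return json.dumps(asdict(reading), sort_keys=True)


def deduplicate(readings):
    """Keep the first reading seen for each (meter_id, timestamp) pair."""
    seen = set()
    unique = []
    for r in readings:
        key = (r.meter_id, r.timestamp)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

In a real pipeline, each JSON string would be the value of a record sent to a Kafka topic, with the meter ID as a natural candidate for the record key so that readings from one meter land in the same partition.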
As part of an endeavor to better handle surging energy prices, Free Power Corporation Limited (FPCL) has a Kafka cluster that ingests large amounts of consumer energy data. As a data engineer for FPCL, you’re already familiar with the data, so the London-based power company has tasked you with building a streaming solution that processes the data as soon as it’s available. Using Apache Spark, you’ll create an application to read the data from the Kafka streams, and you’ll save the streams to a data lake. Using a Spark API, you’ll prepare the data for analysis by performing aggregation on the fly. You’ll join the real-time stream with the static data, enriching it with customer details and enabling FPCL’s research team to gain insights about customer energy consumption patterns. When you’re done, FPCL will be better equipped to deal with rising energy costs, and you’ll have hands-on experience building a real-time data processing solution using Apache Spark and Kafka.
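The idea of enriching a stream with static customer data and aggregating on the fly can be illustrated without a cluster. The sketch below is plain Python rather than Spark, and every name in it (`enrich`, `total_kwh_by`, the `borough` field, the dictionary-based customer table) is a hypothetical stand-in chosen for illustration.

```python
from collections import defaultdict


def enrich(readings, customers):
    """Join each reading with static customer details, keyed by meter_id.

    Readings with no matching customer are skipped (inner-join behavior).
    """
    for r in readings:
        customer = customers.get(r["meter_id"])
        if customer is not None:
            yield {**r, **customer}


def total_kwh_by(records, field):
    """Sum energy usage per value of the given field."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[field]] += rec["kwh"]
    return dict(totals)
```

Because `enrich` is a generator, records flow through one at a time, which loosely mirrors how a stream is joined against a static lookup table without first collecting the whole stream.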
You’re the star data engineer at Free Power Corporation Limited (FPCL). The London-based power company is interested in gaining insight into its customers’ energy usage patterns, and it’s up to you to deliver a data-rich solution that satisfies the requirements of FPCL’s various teams. You’ll create a streaming Spark application to read the consumer event stream from Kafka, you’ll add information that helps the teams determine when data was generated, ingested, and processed, and you’ll write logic to reorder any late or out-of-order data. To provide vital household energy consumption statistics to the sales and electrical engineering teams, you’ll join Kafka data streams and perform complex computations on the resulting stream. To be sure your solution is ready for the teams to use, you’ll test it on the local Spark cluster. When you’ve finished, you’ll have learned advanced stream processing skills that empower you to meet the different business requirements of various enterprise departments.
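Reordering late or out-of-order data usually relies on a bound for how late an event may arrive. The following is a minimal sketch of that idea in plain Python, not the Spark logic used in the project: the `reorder` function, the `max_delay` parameter, and the integer event times (minutes) are all assumptions made for illustration.

```python
import heapq


def reorder(events, max_delay):
    """Yield (event_time, payload) pairs in event-time order.

    Events are buffered until the watermark (largest event time seen minus
    max_delay) passes them. Events that arrive behind the watermark are
    dropped as too late.
    """
    buffer = []      # min-heap ordered by event time
    max_seen = None  # largest event time observed so far
    seq = 0          # tie-breaker so payloads are never compared
    for event_time, payload in events:
        if max_seen is not None and event_time < max_seen - max_delay:
            continue  # arrived too late to be placed in order
        heapq.heappush(buffer, (event_time, seq, payload))
        seq += 1
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - max_delay
        while buffer and buffer[0][0] <= watermark:
            t, _, p = heapq.heappop(buffer)
            yield t, p
    # Input is exhausted: flush whatever is still buffered, in order.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        yield t, p
```

A larger `max_delay` tolerates later arrivals at the cost of holding events in the buffer longer before they are emitted.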
It's a very good project to learn Spark Streaming with Kafka. Very well executed with simple steps.
For me, it is a definite game changer as I can now say I have real-life experience with event streaming as my current company (due to regulatory constraints) cannot adopt new technologies on a whim.
These liveProjects are for intermediate Scala developers and data engineers with basic knowledge of distributed computing technologies such as Apache Spark. To begin these liveProjects you’ll need to be familiar with the following:
TOOLS