Four-Project Series

Math for Machine Learning

prerequisites: intermediate Python • basic understanding of vectors and matrices from linear algebra • basic understanding of probability from statistics • basic understanding of derivatives from calculus
skills learned: manipulate NumPy arrays and pandas DataFrames • eigenvalues and eigenvectors • singular value decomposition • principal component analysis • backpropagation • Bayes' theorem • fine-tune large language models

Nicole Königstein

4 weeks · 6-8 hours per week average · INTERMEDIATE

Included with a Manning Online subscription

whole series

$69.99 $41.99

you save $28.00 (40%)

Put on your data scientist hat for this series of liveProjects, where you’ll work at Finative, an analytics company that uses environmental, social, and governance (ESG) factors to measure companies’ sustainability, a brand-new, eco-focused trend that's changing the way businesses think about investing. In each liveProject, you’ll focus on different machine learning (ML) and deep learning (DL) mathematical approaches, including Bayes' theorem, principal component analysis (PCA), cosine similarity, latent semantic analysis, and backpropagation, as you help Finative accomplish its goal of increasing its own sustainability.

You’ll develop a method to reduce the runtime of ML models, and you’ll save digital storage space by finding relevant keywords in order to determine whether documents should be discarded or saved. To increase efficiency, you’ll save training time by using a pre-trained language model to classify a sustainability report. Then, you’ll analyze the sentiment of tweets in order to detect greenwashing, the practice of spreading disinformation about a company’s sustainability. When you’re finished with these liveProjects, you’ll have a solid understanding of the mathematical basics of machine learning, strong programming and data science skills, and familiarity with sustainability.

These projects are designed for learning purposes and are not complete, production-ready applications or solutions.

The takeaway from the author is very insightful. In general, I would say this project is very helpful and we can learn the theory and practice at the same time.

Zhiwei Cheng, Analyst, Allianz

here's what's included

Project 1 Principal Component Analysis

Step into the role of data scientist at Finative, an analytics company that uses environmental, social, and governance (ESG) factors to measure companies’ sustainability, a brand-new, eco-focused trend that's changing the way businesses think about investing. To provide its clients with the valuable insights they need in order to develop their investment strategies, Finative analyzes a high volume of data using advanced natural language processing (NLP) techniques.

Recently, your CEO has decided that Finative should increase its own sustainability. Your task is to develop a method to optimize the runtime for the company’s machine learning models. You’ll apply principal component analysis (PCA) to the data in order to speed up the ML models. To classify handwritten digits and prove your theory that PCA speeds up ML algorithms, you’ll implement logistic regression with scikit-learn. You’ll use the explained variance ratio to gain an understanding of the trade-offs between speed and accuracy. When you’re done, you’ll be able to present your CEO with proof of PCA’s efficiency in optimizing runtime.
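
To make the trade-off between speed and accuracy concrete, here is a minimal sketch of PCA and the explained variance ratio. It uses plain NumPy and a small synthetic dataset rather than the project's handwritten digits and scikit-learn, and the helper names `pca_explained_variance` and `components_needed` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def pca_explained_variance(X):
    """Return the principal axes and explained variance ratio of X (n_samples, n_features)."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes,
    # ordered from largest to smallest singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    return Vt, variances / variances.sum()


def components_needed(ratio, threshold):
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    cumulative = np.cumsum(ratio)
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(ratio))


# Synthetic stand-in data (an assumption): 10 features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

axes, ratio = pca_explained_variance(X)
k = components_needed(ratio, 0.95)

# Project onto the first k principal axes to get the reduced feature matrix.
X_reduced = (X - X.mean(axis=0)) @ axes[:k].T
```

Keeping fewer components shrinks the feature matrix that a downstream classifier such as logistic regression has to process. Lowering the variance threshold trades some accuracy for more speed.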

$29.99 $17.99

Project 2 Latent Semantic Analysis for NLP

At Finative, an environmental, social, and governance (ESG) analytics company, you’re a data scientist who helps measure the sustainability of publicly traded companies by analyzing ESG factors so Finative can report back to its clients. Recently, the CEO has decided that Finative should increase its own sustainability. You’ve been assigned the task of saving digital storage space by storing only relevant data. You’ll test different methods, including keyword retrieval with TF-IDF, computing cosine similarity, and latent semantic analysis, to find relevant keywords in documents and determine whether the documents should be discarded or saved for use in training your ML models.
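
As a rough illustration of two of these methods, the sketch below computes TF-IDF vectors and cosine similarity using only the Python standard library. The sample documents and the function names `tf_idf_vectors` and `cosine_similarity` are assumptions made for this example, and the weighting is the plain textbook form (term frequency times log(N / document frequency)). Latent semantic analysis, which additionally factorizes the term-document matrix, is not shown.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents: two on the same topic, one unrelated.
documents = [
    "carbon emissions report for the energy sector",
    "energy sector carbon emissions fell this year",
    "quarterly cafeteria menu and parking notice",
]
vocabulary, vectors = tf_idf_vectors(documents)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

Here the two emissions documents share weighted terms and score above zero, while the unrelated notice shares none and scores zero. Libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ from this sketch.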

$29.99 $17.99

Project 3 Analyze Reports with Hugging Face

You’re a data scientist at Finative, an environmental, social, and governance (ESG) analytics company that analyzes a high volume of data using advanced natural language processing (NLP) techniques in order to provide its clients insights for sustainable investing. Recently, your CEO has decided that Finative should increase its own financial sustainability. Your task is to classify sustainability reports of a publicly traded company in an efficient and sustainable way.

You’ll learn the fundamental mathematics—including backpropagation, matrix multiplication, and attention mechanisms—of Transformers, empowering you to optimize your model’s performance, improve its efficiency, and handle undesirable model predictions. You’ll use Python’s pdfplumber library to extract text from a sustainability report for quick delivery to your CEO. To further increase efficiency, you’ll save training time by using a language model that’s been pre-trained with ESG data to build a pipeline for the model and classify the sustainability report.
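
To give a flavor of the matrix multiplication behind attention, here is a minimal NumPy sketch of scaled dot-product attention, the standard building block of Transformers. It is not taken from the project materials: the shapes, the random inputs, and the function names are assumptions for illustration, and it leaves out multiple heads, masking, and learned projections.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for queries Q, keys K, and values V."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row of weights is a probability distribution over the keys.
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted average of the value vectors.
    return weights @ V, weights


# Random stand-in inputs (an assumption): 4 queries, 6 keys/values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 16))

output, weights = scaled_dot_product_attention(Q, K, V)
```

Each query produces one output row, and each row of `weights` sums to 1, which is what lets you read the weights as how much attention a token pays to every other token.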

$29.99 $17.99

Project 4 Detect Sentiment with Transformers

Finative, the environmental, social, and governance (ESG) analytics company you work for, analyzes a high volume of data using advanced natural language processing (NLP) techniques to provide its clients with valuable insights about their sustainability. Your CEO has concerns that some of the companies Finative analyzes may be greenwashing: spreading disinformation about their sustainability in order to appear more environmentally conscious than they actually are.

As a data scientist for Finative, your task is to validate your sustainability reports by creating and analyzing them. You’ll compute conditional probability by hand with Bayes’ theorem to better understand your model’s performance through metrics such as recall and precision. You’ll learn an efficient way to prepare your data from different sources and merge it into one dataset, which you’ll use to prepare tweets. To successfully classify the tweets, you’ll use a pre-trained large language model and fine-tune it using the Hugging Face ecosystem as well as hyperopt and Ray Tune. You’ll use TensorBoard and Weights & Biases to analyze and track your experiments, and you’ll analyze the tweets to determine whether enough negative sentiment exists to indicate that the company you analyzed has been greenwashing its data.
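
The link between Bayes' theorem and these metrics can be sketched in a few lines of Python. Precision is the conditional probability P(actually positive | predicted positive), which Bayes' theorem expresses in terms of recall, the false positive rate, and the share of positives in the data. The confusion-matrix counts below are made up for illustration, and `precision_from_bayes` is a hypothetical helper name.

```python
def precision_from_bayes(recall, false_positive_rate, prevalence):
    """P(actual positive | predicted positive) via Bayes' theorem."""
    # Total probability of a positive prediction:
    # P(pred+) = P(pred+ | actual+) P(actual+) + P(pred+ | actual-) P(actual-)
    p_predicted_positive = (
        recall * prevalence + false_positive_rate * (1 - prevalence)
    )
    if p_predicted_positive == 0:
        raise ValueError("the model never predicts the positive class")
    return recall * prevalence / p_predicted_positive


# Hypothetical confusion-matrix counts (not from the project data).
tp, fp, fn, tn = 40, 10, 20, 130
total = tp + fp + fn + tn

recall = tp / (tp + fn)               # P(pred+ | actual+)
false_positive_rate = fp / (fp + tn)  # P(pred+ | actual-)
prevalence = (tp + fn) / total        # P(actual+)

precision_bayes = precision_from_bayes(recall, false_positive_rate, prevalence)
precision_direct = tp / (tp + fp)     # the usual definition, for comparison
```

For these counts, both routes give a precision of 0.8 (up to floating-point rounding), which is a useful check when you work through the calculation by hand.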

$29.99 $17.99

book resources

When you start each of the projects in this series, you'll get full access to the following books for 90 days.

The series walks you through the basic math employed in different aspects of ML.

Maxim Volgin, Quantitative Marketing Manager, KLM

project author

Nicole Königstein

Nicole Königstein currently works as data science and technology lead at impactvise, an ESG analytics company, and as a quantitative researcher and technology lead at Quantmate, an innovative FinTech startup that leverages alternative data as part of its predictive modeling strategy. She’s a regular speaker, sharing her expertise at conferences such as ODSC Europe. In addition, she teaches Python, machine learning, and deep learning, and holds workshops at conferences including the Women in Tech Global Conference.

Prerequisites

These liveProjects are for ML engineers, intermediate-level Python programmers, and early-stage data scientists. To begin these liveProjects you’ll need to be familiar with the following:

TOOLS

Python, particularly the pandas, NumPy, scikit-learn, Matplotlib, and seaborn libraries

TECHNIQUES

Intermediate linear algebra
Intermediate calculus
Intermediate statistics and probability

features

Self-paced: You choose the schedule and decide how much time to invest as you build your project.
Project roadmap: Each project is divided into several achievable steps.
Get help: Within the liveProject platform, get help from fellow participants, plus additional help through paid sessions with our expert mentors.
Compare with others: For each step, compare your deliverable to the solutions by the author and other participants.
Book resources: Get full access to select books for 90 days. Permanent access to excerpts from Manning products is also included, as well as references to other resources.