Real-Time Data Pipeline Monitoring Using AWS Lambda

The post discusses the evolution of a data pipeline, highlighting the integration of an API-driven layer for enhanced observability. This new functionality allows authorized users to access real-time operational status without manual checks across AWS services. The approach improves transparency, accountability, and agility, and lays the groundwork for proactive monitoring and automated responses in future enhancements.

Training and Evaluating ML Models with AWS Glue

This post details the development of a machine learning pipeline for demand forecasting. Using AWS Glue and PySpark, it covers training and evaluating Linear Regression and Random Forest models on an engineered feature dataset. Results show that Random Forest slightly outperforms Linear Regression, with results stable and reliable enough for deployment.

Mastering Feature Engineering for Machine Learning

The Feature Engineering stage follows Exploratory Data Analysis, preparing the dataset for machine learning. It generates temporal and statistical features, encodes categorical identifiers, and ensures schema consistency. Implemented in AWS Glue, it enables reproducibility and scalability for model training, enhancing forecasting accuracy by incorporating lag and rolling average features.
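As a rough illustration of the lag and rolling average features mentioned above, here is a minimal sketch in plain Python rather than AWS Glue or PySpark. The function name `add_lag_and_rolling`, the sample values, and the choice to exclude the current observation from the rolling mean are all assumptions made for this example, not details taken from the post.

```python
from collections import deque


def add_lag_and_rolling(sales, lag=1, window=3):
    """Build one feature row per observation (hypothetical helper).

    lag: the value `lag` steps earlier, or None if not yet available.
    rolling_mean: mean of the previous `window` values, excluding the
    current one so the feature does not leak the value being predicted.
    """
    rows = []
    recent = deque(maxlen=window)  # holds only the last `window` values
    for i, value in enumerate(sales):
        lag_value = sales[i - lag] if i >= lag else None
        if len(recent) == window:
            rolling_mean = sum(recent) / window
        else:
            rolling_mean = None  # not enough history yet
        rows.append({"sales": value, "lag": lag_value, "rolling_mean": rolling_mean})
        recent.append(value)
    return rows


# Made-up daily sales figures, for illustration only.
features = add_lag_and_rolling([10, 12, 14, 16, 18], lag=1, window=3)
for row in features:
    print(row)
```

Early rows carry None where there is not enough history; a real pipeline would need to decide whether to drop or fill them.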

Enhancing Your ETL Pipeline with AWS Glue and PySpark

The post details enhancements made to a serverless ETL pipeline using AWS Glue and PySpark for retail sales data. Improvements include explicit column type conversions, missing value imputation, normalization of sales data, and integration of logging for observability. These changes aim to create a production-ready, machine-learning-friendly preprocessing layer for effective data analysis.
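The imputation and normalization steps can be sketched in plain Python as follows. The excerpt does not say which methods the post uses, so mean imputation and min-max scaling are assumptions here, as are the function names and the sample values.

```python
def impute_mean(values):
    """Replace None entries with the mean of the present values (assumed strategy)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute: no non-missing values")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Scale values into [0, 1] (assumed strategy); constant input maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sales column with one missing entry, for illustration only.
raw = [10.0, None, 30.0, 20.0]
filled = impute_mean(raw)
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```

Imputing before scaling keeps the filled value inside the observed range, so it cannot shift the minimum or maximum used for normalization.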

Building an ETL Pipeline for Retail Demand Data

This project aims to develop a demand forecasting solution for retail using historical sales data from Kaggle. A data pipeline built with AWS Glue and PySpark will preprocess the data by cleaning it and splitting it into training and testing sets. The objective is to improve inventory management and customer satisfaction.

AWS EC2 Setup for GPU CUDA Programming

Last weekend, I explored GPU CUDA programming using AWS. Despite initial service quota issues, I successfully launched an EC2 instance equipped with an NVIDIA GPU. After setting up the environment, I compiled and ran a CUDA program that ran 151 times faster on the GPU than on the CPU.

How to Configure AWS EC2 with NVIDIA GPU for CUDA Development

The author explores CUDA programming on AWS using an NVIDIA GPU and runs into vCPU quota limits that prevent launching an EC2 instance. After diagnosing the issue, they submit a quota increase request through the Service Quotas console. The experience highlights the importance of checking AWS service limits when setting up GPU instances.

Setting Up a Simple Distributed File Service on AWS

In this blog, we’ll build a simple distributed file service on AWS. The setup will have two file servers — Server 1 and Server 2.…

Building a Real-Time Aircraft Tracking System with AWS Lambda, Kinesis, and DynamoDB

Aviation data has always been fascinating. Planes crisscross the globe. Each one sends out tiny bursts of information as it soars through the sky. Thanks…

Building a Real-Time GPS Data Processing System on AWS: A Step-by-Step Guide

In today's interconnected world, real-time location tracking plays a crucial role in many industries. These include logistics and fleet management. It is also vital to…

Set Up a Hadoop Cluster on AWS EMR: A Step-by-Step Guide

Hadoop is a powerful framework that enables distributed processing of large datasets. It follows the MapReduce paradigm. Computation is broken down into independent map and…

How to Fix AWS SignatureDoesNotMatch Error

The "SignatureDoesNotMatch" error often occurs when uploading files to AWS S3 and the request signature, which is derived from the secret access key, does not match the one AWS computes. The author shares a step-by-step guide to troubleshooting this issue, which includes verifying IAM user credentials, configuring access keys, and retrying the upload after resolving permissions.

Step-by-Step Guide: Filezilla Setup for AWS EC2

Recently I have been doing some source code compilation on an AWS EC2 instance. However, after the compilation I encountered a problem.…
