AI/ML & Data Engineering – Tech For Talk

March 13, 2026March 13, 2026

Build an AI-Powered Exam Marking Tool

The project outlines the creation of an AI-based examiner tool that automates the marking of handwritten GCSE exams. Teachers can upload scanned PDFs, and in 20-30 seconds, receive detailed feedback reports formatted as .docx files. Built using Python, Flask, and Gemini AI, it offers an efficient marking solution while ensuring data privacy.

October 25, 2025November 21, 2025

Automating AWS Glue Workflows with EventBridge

The blog discusses the integration of Amazon EventBridge to automate AWS Glue workflows every two minutes, enhancing operational efficiency in data engineering and machine learning tasks. It details steps to create and configure EventBridge rules, set permissions, and verify workflows, emphasizing improvements in responsiveness, agility, and DataOps maturity.

October 24, 2025November 21, 2025

Mastering DataOps: Orchestrating AWS Glue Workflows

The implemented stages of ingestion, preprocessing, EDA, and feature engineering have transitioned to automation and monitoring, forming a cohesive DataOps layer. By introducing orchestration, the independent Glue jobs become an automated, reliable workflow. Testing confirmed successful execution, paving the way for regular automations to enhance operations and insights from data.

October 22, 2025November 21, 2025

Real-Time Data Pipeline Monitoring Using AWS Lambda

The post discusses the evolution of a data pipeline, highlighting the integration of an API-driven layer for enhanced observability. This new functionality allows authorized users to access real-time operational status without manual checks across AWS services. The approach improves transparency, accountability, and agility while enabling proactive monitoring and automated responses in future enhancements.

October 21, 2025November 21, 2025

Training and Evaluating ML Models with AWS Glue

This post details the development of a Machine Learning Pipeline for demand forecasting. Utilizing AWS Glue and PySpark, it covers training and evaluating Linear Regression and Random Forest models using an engineered feature dataset. Results show Random Forest slightly outperforms Linear Regression, demonstrating effective model stability and reliability for deployment.

October 19, 2025November 21, 2025

Mastering EDA for Demand Forecasting on AWS

This article expands on a previous post about building a serverless ETL pipeline on AWS by focusing on Exploratory Data Analysis (EDA). It details how to establish the EDA environment using AWS Glue and PySpark after cleaning the dataset. Key insights include sales trends, store and item performance, and correlation analysis, laying the groundwork for a demand forecasting model.

October 19, 2025November 21, 2025

Enhancing Your ETL Pipeline with AWS Glue and PySpark

The post details enhancements made to a serverless ETL pipeline using AWS Glue and PySpark for retail sales data. Improvements include explicit column type conversions, missing value imputation, normalization of sales data, and integration of logging for observability. These changes aim to create a production-ready, machine-learning-friendly preprocessing layer for effective data analysis.

October 18, 2025November 21, 2025

Building an ETL Pipeline for Retail Demand Data

This project aims to develop a demand forecasting solution for retail using historical sales data from Kaggle. A data pipeline employing AWS Glue and PySpark will preprocess the data by cleaning and splitting it into training and testing sets. The objective is to maximize inventory management and customer satisfaction.

October 5, 2025November 21, 2025

How to Configure AWS EC2 with NVIDIA GPU for CUDA Development

The author explores CUDA programming on AWS using an NVIDIA GPU, facing vCPU quota limitations that prevent launching an EC2 instance. After diagnosing the issue, they submitted a request for a quota increase through the Service Quotas console. The experience highlights the importance of checking AWS service limits when setting up GPU instances.

September 13, 2025November 21, 2025

How Did I Run and Containerise My First Flask App?

The article discusses the challenges of consistent application behavior in software development and how Docker addresses these issues. It outlines the creation of a simple Flask app, its containerization using Docker, and steps to ensure accessibility from outside the container. Troubleshooting and cleanup procedures are also covered, emphasizing a portable setup.

September 6, 2025November 21, 2025

Mastering RESTful API Basics for Developers

APIs are the backbone of modern applications. Among the different approaches to building them, one of the most widely used is the RESTful API. In…

Continue reading → Mastering RESTful API Basics for Developers

August 31, 2025November 21, 2025

Setting Up a Simple Distributed File Service on AWS

In this blog, we’ll build a simple distributed file service on AWS. The setup will have two file servers — Server 1 and Server 2.…

Continue reading → Setting Up a Simple Distributed File Service on AWS

March 16, 2025November 21, 2025

Setting Up and Running a Hadoop MapReduce Job on a Standalone Cluster: A Step-by-Step Guide

Hadoop’s MapReduce framework is a powerful tool for processing large-scale data in a distributed fashion. In this guide, we walk through setting up a Hadoop…

Continue reading → Setting Up and Running a Hadoop MapReduce Job on a Standalone Cluster: A Step-by-Step Guide