Building an AI-Powered Examiner: Automated GCSE Marking with Gemini and Python
A complete walkthrough of a local web app that reads handwritten student answers and produces professional feedback reports end to end
Introduction
Marking a class set of handwritten GCSE exam papers is one of the most time-consuming tasks a teacher faces. Every answer needs to be read against a multi-page mark scheme, level descriptors need to be applied consistently, and feedback needs to be written in a form the student can actually act on. For a class of thirty, that is several hours of focused work repeated across every topic, every mock, every term.
This project automates that pipeline. A teacher uploads a scanned PDF of a student’s handwritten answer to a local web app. Within 20-30 seconds, a professionally formatted .docx feedback report downloads automatically, complete with the student’s answer transcribed verbatim, a gap analysis table, a level assessment, a mark out of the available total, a confidence rating, and two actionable improvement sentences written directly to the student.
The tool is built on Google Gemini (via the google-generativeai Python SDK), a Flask web server, and python-docx for report generation. It runs entirely locally inside Docker: no data leaves your machine except for the Gemini API call itself.
This post walks through every layer of the system: what the app does, how it does it, why the key design decisions were made, and exactly how to replicate it from scratch.
What You Will Build
By the end of this post you will have a running local web application that:
- Accepts scanned PDF uploads of handwritten student answers via a drag-and-drop browser UI
- Converts each PDF page to an image using poppler-utils at 200 DPI
- Sends the page images to Gemini with a detailed, examiner-grade marking prompt embedded in the request
- Parses Gemini’s structured response and renders it into a formatted .docx feedback report using python-docx
- Auto-downloads the report to the teacher’s machine: no login, no cloud storage, no manual steps
- Ships as a single Docker Compose stack that starts with one command
The marking prompt embedded in this project is pre-loaded for the OCR GCSE Economics J205/01 paper, covering Questions 21, 22 and 23 in full, including 1-2 mark questions, 6-mark Analyse questions, and 6-mark Evaluate questions with a three-part supported judgement test. The architecture is subject-agnostic: swapping in a different mark scheme is a single variable edit in app.py.
How It Works: The End-to-End Pipeline
Before diving into the code, it helps to understand the full data flow. A single marking request goes through five distinct stages:
1. The teacher uploads a scanned PDF via the browser UI
2. pdftoppm converts each page to a PNG at 200 DPI
3. The page images are sent to gemini-2.5-flash as base64 inline data, followed by the marking prompt
4. python-docx renders Gemini’s structured response into a formatted .docx
5. The report auto-downloads in the browser

The whole round trip for a three-page PDF typically takes 15-30 seconds, dominated by Gemini’s inference time on the handwriting images.
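The five stages can be sketched as a single function with each stage injected as a callable, so the flow reads independently of Gemini or poppler. This is an illustrative sketch, not the project's actual code; the real app.py calls the concrete functions directly inside the Flask route.

```python
def mark_paper(pdf_path, convert, mark, render, output_path):
    """Pipeline sketch: each stage is passed in as a function so the
    data flow can be read (and tested) without poppler or an API key."""
    images = convert(pdf_path)      # stages 1-2: PDF pages -> PNG bytes
    feedback = mark(images)         # stage 3: images + prompt -> Gemini text
    render(feedback, output_path)   # stage 4: structured text -> .docx
    return output_path              # stage 5: served back as a download
```

In the real app the three callables correspond to pdf_to_images(), mark_with_gemini() and create_docx_report(), each covered in its own section of this post.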
Project Structure
The project is deliberately minimal. Everything lives in six files:
gcse_economic_ai_examiner/
├── app.py ← Flask app + Gemini call + docx generator + marking prompt
├── templates/
│ └── index.html ← Single-page drag-and-drop UI
├── requirements.txt ← Python dependencies
├── Dockerfile ← Container definition (Python 3.11 + poppler)
├── docker-compose.yml ← One-command start/stop
└── .env.example ← API key template (copy to .env)
There is no database, no session state, no background worker queue. Each request is fully self-contained. The only external dependency at runtime is the Gemini API.
Prerequisites
You need three things before you start:
- Docker Desktop (or Docker Engine + Compose plugin) installed and running
- A free Google Gemini API key: get one at aistudio.google.com → “Get API key”
- Git to clone the repository
No Python installation is required on your host machine; everything runs inside the container.
Step-by-Step Setup
Clone the repository
git clone https://github.com/vivekbhadra/gcse_economic_ai_examiner.git
cd gcse_economic_ai_examiner
Create your .env file
The app reads your Gemini API key from a .env file. Copy the template and fill it in:
cp .env.example .env
# Open .env in any editor and replace "your_api_key_here" with your real key
The .env file is not committed to Git (it is listed in .gitignore). Your key stays on your machine.
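Inside the container, Docker Compose’s env_file setting injects the key as an environment variable. A minimal sketch of the kind of startup check app.py can perform (the helper name read_api_key is illustrative, not the project’s actual code):

```python
import os

def read_api_key(env=os.environ) -> str:
    """Return the Gemini key, rejecting the untouched template placeholder.
    'env' is injectable purely to make the helper easy to test."""
    key = env.get("GEMINI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY not set - check your .env file")
    return key
```

Failing fast here means a missing or template key surfaces as a clear error at startup rather than as a confusing Gemini API failure mid-request.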
Build and start the container
docker compose up --build
This builds the Docker image (installs Python packages and poppler-utils) and starts the Flask server. The first build takes 1-2 minutes. Subsequent starts are instant.
Open the app
Navigate to http://localhost:5000 in your browser. You should see the marking interface.
Upload a student answer and mark it
Drag and drop (or click to browse) a scanned PDF of a student’s handwritten answer. Click Mark Answer. The .docx feedback report will download automatically within 15-30 seconds.
To stop the app, press Ctrl+C in the terminal running Docker, then run docker compose down to remove the container. To rebuild after editing app.py, run docker compose up --build again.
The Marking Prompt: The Brain of the System
The most important part of the project is not the Flask server or the docx formatter; it is the MARKING_PROMPT string in app.py. This is what Gemini reads to understand how to mark. Its quality directly determines the quality of the feedback.
The prompt in this project is the OCR GCSE Economics J205/01 Master Marking Prompt v3.0. It runs to approximately 300 lines and is structured in eight parts:
Prompt Architecture (v3.0)
- Part 1 Identity and limitations: tells Gemini it is a standardised OCR examiner, defines the three highest risk areas (handwriting misreads, level boundary judgements, unusual responses), and frames the output as a “reliable first-pass diagnostic tool, not a substitute for human oversight”
- Part 2 The full mark scheme: all marking criteria for Q21, Q22 and Q23, including indicative content bullet points for every 6-mark question and calculation working for Q23(b)
- Part 3 Model answer training: worked examples showing exactly what a Level 1, Level 2 and Level 3 answer looks like, and why each level boundary was applied
- Part 4 Marking protocol: a mandatory seven-step process for every answer: restate the question, quote the student verbatim, confirm the mark scheme section, complete the gap analysis table, assess the level, run calibration checks, self-audit
- Part 5 Output format: exact schema for the feedback document (question, type, verbatim quote, gap analysis, level assessment, mark, confidence, justification, feedback, improvement sentences)
- Part 6 Reliability standard: defines when marking is correct and when it is not, to guide Gemini’s self-review before outputting
- Part 7 (reserved)
- Part 8 Protocol for out-of-scope questions: a five-step decision tree for handling questions not in the embedded mark scheme, preventing Gemini from inventing indicative content
The key design principle is that every mark awarded or withheld must be directly traceable to a verbatim student quote and a specific mark scheme criterion. The prompt instructs Gemini to refuse marks it cannot justify in this way, and to flag borderline decisions explicitly rather than silently round up.
To adapt the tool for another subject, open app.py, find the MARKING_PROMPT variable, and replace the content of Parts 2 and 3 with your own mark scheme and model answers. The protocol in Parts 4-8 is generic and can stay as-is for any GCSE subject.
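As an illustration of the shape (not the actual 300-line string), the variable looks roughly like this, with Parts 2 and 3 being the subject-specific sections you would swap out:

```python
# Illustrative skeleton only; the real MARKING_PROMPT in app.py runs to
# roughly 300 lines, with full mark scheme text and worked model answers.
MARKING_PROMPT = """
PART 1 - IDENTITY AND LIMITATIONS: You are a standardised OCR examiner...
PART 2 - MARK SCHEME: <subject-specific: paste the official mark scheme>
PART 3 - MODEL ANSWERS: <subject-specific: worked Level 1/2/3 examples>
PART 4 - MARKING PROTOCOL: restate, quote verbatim, gap analysis, ...
PART 5 - OUTPUT FORMAT: question, type, quote, mark, confidence, ...
PART 6 - RELIABILITY STANDARD: when a mark counts as correct, self-review
PART 8 - OUT-OF-SCOPE PROTOCOL: never invent indicative content; ...
"""
```

Only Parts 2 and 3 carry subject knowledge; everything else encodes examiner behaviour, which is why a single variable edit is enough to retarget the tool.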
Deep Dive: The Flask Application (app.py)
The entire backend logic lives in app.py. It has three main sections: PDF-to-image conversion, Gemini API call, and docx report generation.
PDF to Images: pdf_to_images()
Gemini’s vision API works on images, not PDFs. We use pdftoppm (part of poppler-utils) to convert each page of the uploaded PDF to a PNG file at 200 DPI: high enough for Gemini to read handwriting reliably, yet low enough to keep request sizes manageable:
import os
import subprocess
import tempfile
from pathlib import Path

def pdf_to_images(pdf_path: str) -> list:
    with tempfile.TemporaryDirectory() as tmpdir:
        out_prefix = os.path.join(tmpdir, "page")
        result = subprocess.run(
            ["pdftoppm", "-png", "-r", "200", pdf_path, out_prefix],
            capture_output=True, text=True
        )
        if result.returncode != 0:
            raise RuntimeError(f"pdftoppm failed: {result.stderr}")
        images = []
        for img_path in sorted(Path(tmpdir).glob("page-*.png")):
            images.append(img_path.read_bytes())
        return images
The function returns a list of raw PNG bytes one item per page. All temporary files are created inside a TemporaryDirectory context manager and cleaned up automatically when it exits.
Sending to Gemini: mark_with_gemini()
The Gemini API accepts a parts list that can mix images and text. We build the list by encoding each page image as base64, then append the marking prompt as the final text part:
import base64
import google.generativeai as genai

def mark_with_gemini(image_bytes_list: list) -> str:
    model = genai.GenerativeModel("gemini-2.5-flash")
    parts = []
    for img_bytes in image_bytes_list:
        parts.append({
            "inline_data": {
                "mime_type": "image/png",
                "data": base64.b64encode(img_bytes).decode()
            }
        })
    parts.append({"text": MARKING_PROMPT})
    response = model.generate_content(parts)
    return response.text
The model used is gemini-2.5-flash, the fastest Gemini model with strong vision capability. The images come first in the parts list, so Gemini sees the handwriting before it sees the instructions, which empirically produces better transcription accuracy.
Generating the Report: create_docx_report()
Gemini returns its feedback as structured text using Markdown conventions: ## headings, pipe-delimited table rows, **bold** inline markers, and - bullet lists. The create_docx_report() function parses this line by line and translates each element into native python-docx formatting:
| Gemini output | docx output |
|---|---|
| ## Heading | Bold blue heading with underline border |
| # Heading | Smaller bold blue heading |
| Pipe-delimited table rows | Real Word table with shaded header row |
| - bullet point | List Bullet paragraph style |
| **bold** | Bold run within paragraph |
| Question: / Mark: labels | Blue bold label + normal text run |
| --- | Horizontal rule (top border on empty paragraph) |
The report opens with a styled header block and closes with a footer disclaimer reminding the teacher that AI-generated marks on borderline decisions should be verified before being returned to students.
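The line-by-line dispatch can be sketched as a small classifier. The element names below are illustrative rather than the project’s internal API, but the matching rules mirror the table above:

```python
def classify_line(line: str) -> str:
    """Map one line of Gemini's Markdown-style output to a report element."""
    s = line.strip()
    if s.startswith("## "):
        return "heading"       # bold blue heading with underline border
    if s.startswith("# "):
        return "subheading"    # smaller bold blue heading
    if s.startswith("|") and s.endswith("|"):
        return "table_row"     # becomes a row in a real Word table
    if s.startswith("- "):
        return "bullet"        # List Bullet paragraph style
    if s == "---":
        return "rule"          # top border on an empty paragraph
    return "paragraph"         # plain text; **bold** is handled per-run
```

A dispatcher like this keeps the Markdown parsing separate from the python-docx rendering calls, which makes the format mapping easy to extend.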
The Flask Routes
There are only two routes. GET / serves the upload UI. POST /mark handles the full marking pipeline:
import os
import tempfile
from flask import request, send_file

@app.route('/mark', methods=['POST'])
def mark():
    pdf_file = request.files['pdf']
    # 1. Save to temp file
    with tempfile.NamedTemporaryFile(suffix='.pdf', delete=False) as tmp:
        pdf_file.save(tmp.name)
    # 2. Convert pages to images
    images = pdf_to_images(tmp.name)
    os.unlink(tmp.name)
    # 3. Send to Gemini
    feedback = mark_with_gemini(images)
    # 4. Build .docx
    output_path = tempfile.mktemp(suffix='.docx')
    create_docx_report(feedback, output_path)
    # 5. Return as download
    return send_file(output_path, as_attachment=True,
                     download_name='marking_feedback.docx')
Flask’s send_file() with as_attachment=True triggers the browser’s download dialogue automatically. No JavaScript is needed to initiate the download on the client side; the browser handles it from the response headers.
The Frontend: index.html
The entire UI is a single HTML file served by Flask’s render_template(). It has no JavaScript framework dependencies: just vanilla JS and a small block of CSS.
Key UI behaviours:
- Drag-and-drop zone with dragover/drop event listeners that accept .pdf files only
- File info bar showing filename and size, with a remove button
- Animated progress bar that steps through labelled stages (uploading, converting, sending to AI, generating report) while the request is in flight; the actual API call is not trackable mid-flight, so the steps are time-based approximations
- Status banner that shows a green success message or red error message once the request completes
- Automatic download: on success, the response blob is turned into an object URL and clicked programmatically, triggering the browser download without any redirect
Docker Setup
The Dockerfile
The container uses python:3.11-slim as the base image, a minimal Debian build that keeps the image small. The only system package installed is poppler-utils, which provides pdftoppm:
FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
poppler-utils \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
COPY templates/ ./templates/
EXPOSE 5000
CMD ["python", "app.py"]
Python packages are installed before copying the application code. This order ensures Docker’s layer cache is used: rebuilding after an app.py change does not reinstall packages.
Docker Compose
The Compose file wires the container to a .env file for the API key, exposes port 5000, and mounts app.py as a volume for live reload during development:
services:
  marker-app:
    build: .
    ports:
      - "5000:5000"
    env_file:
      - .env
    volumes:
      - ./app.py:/app/app.py   # live reload during development
Mounting app.py as a volume means changes to the file on your host are reflected inside the running container immediately, without a full rebuild. To pick up changes, just restart the container with docker compose restart rather than docker compose up --build.
Python Dependencies
| Package | Version | Purpose |
|---|---|---|
| flask | 3.0.3 | Web framework: routes, file upload, response handling |
| google-generativeai | 0.7.2 | Gemini API SDK: model configuration and content generation |
| Werkzeug | 3.0.3 | Flask dependency: WSGI utilities |
| python-docx | 1.1.2 | Word document creation: paragraphs, tables, styles, borders |
Confidence Levels in the Feedback Report
One of the most important features of the marking prompt is its three-tier confidence system. Every marked question in the report carries one of three ratings:
| Rating | Meaning | Action required |
|---|---|---|
| High | Every mark traces directly to a verbatim student quote and a clear mark scheme criterion | None; safe to return to student |
| Moderate | One borderline decision was made that could reasonably go either way | Teacher verification recommended before returning the mark |
| Low | Transcription uncertainty or unusual response; Gemini flagged words as [UNCLEAR] | Teacher must verify before communicating the mark to the student |
This matters because the alternative, a tool that always returns a confident mark regardless of legibility or ambiguity, is actively dangerous in an educational context. The confidence system makes the AI’s uncertainty visible rather than hiding it behind a number.
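Because the output schema includes a labelled confidence rating per question, a batch triage script could pull the ratings straight out of the feedback text before it is rendered to docx. A sketch, assuming lines of the form "Confidence: High" as specified in Part 5 of the prompt:

```python
import re

def extract_confidence(feedback: str) -> list:
    """Collect per-question confidence ratings for quick triage,
    e.g. to flag which reports need teacher review first."""
    return re.findall(r"Confidence:\s*(High|Moderate|Low)", feedback)
```

A teacher marking a full class set could then sort reports so that Low-confidence scripts are reviewed first.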
Limitations and Honest Caveats
This tool is a reliable first-pass diagnostic assistant. It is not a replacement for a trained examiner, and the marking prompt says so explicitly. Key limitations to be aware of:
- Handwriting quality matters. Poor scan resolution or very unclear writing will produce more [UNCLEAR] flags and Low confidence ratings. 200 DPI is the minimum; 300 DPI produces better results for difficult handwriting.
- Level boundary judgements are approximations. The Level 2 / Level 3 boundary is particularly difficult to judge consistently even for human examiners. The AI’s judgements at this boundary should be treated as indicative, not definitive.
- The mark scheme is embedded, not live. The OCR J205/01 mark scheme embedded in this project was current at time of writing. Official mark schemes are updated periodically; check the OCR website for the latest version before relying on this tool for formal assessments.
- One paper, one model. The embedded prompt covers Q21-Q23 of J205/01 only. Questions from other papers require the mark scheme to be updated in the prompt.
- API costs. Gemini Flash has a generous free tier as of writing. Very high volume usage (hundreds of papers per day) may incur costs; check the current Gemini API pricing on Google AI Studio.
Extending the Project
The architecture is intentionally simple so it is easy to adapt. Some natural next steps:
- Different subjects or papers: Replace the MARKING_PROMPT variable with a different mark scheme. The protocol in Parts 4-8 is generic and reusable.
- Batch marking: Modify the upload UI to accept multiple PDFs and queue them for sequential processing, writing all feedback reports into a single ZIP file for download.
- Class summary view: After marking a batch, extract the marks from each report and generate a summary spreadsheet showing marks by question for the whole class.
- Higher DPI for difficult handwriting: Change -r 200 in the pdftoppm call to -r 300. This increases image size and slightly slows the API call, but can significantly improve transcription accuracy for poor handwriting.
- Gemini model upgrade: Replace gemini-2.5-flash with gemini-2.5-pro for more thorough reasoning on complex 6-mark answers. Expect longer response times and higher API usage.
- Authentication: If deploying on a school network rather than localhost, add Flask-Login or a simple API key gate to prevent unauthorised access.
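The batch-marking extension is mostly plumbing; bundling per-student reports into one ZIP is a few lines of the standard library. A sketch, not part of the current app (the function name and dict shape are assumptions):

```python
import io
import zipfile

def reports_to_zip(reports: dict) -> bytes:
    """Bundle {filename: docx_bytes} into a single downloadable ZIP,
    built entirely in memory so no temp files are left behind."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in reports.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

The returned bytes can be handed straight to Flask's send_file() wrapped in an io.BytesIO, with mimetype "application/zip".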
Quick Reference
| Command | What it does |
|---|---|
| docker compose up --build | Build image and start the app (first run or after code changes) |
| docker compose up | Start without rebuilding (faster, uses cached image) |
| docker compose down | Stop and remove the container |
| docker compose restart | Restart after editing app.py (uses volume mount, no rebuild needed) |
| docker compose logs -f | Stream container logs (useful for debugging API errors) |
Troubleshooting
| Error | Cause and fix |
|---|---|
| GEMINI_API_KEY not set | The .env file is missing or the key is still set to “your_api_key_here”. Check that .env exists in the project root and contains your real key. |
| pdftoppm failed | Should not occur inside Docker as poppler-utils is installed in the Dockerfile. If running app.py locally (outside Docker), install poppler: brew install poppler on Mac, apt install poppler-utils on Linux. |
| No file uploaded (400 error) | The uploaded file was not received. Check the file is a valid .pdf and under 32MB (the Flask MAX_CONTENT_LENGTH limit). |
| Very slow response (>60s) | Normal for a 5+ page PDF. Gemini processes each page image sequentially. Reduce the PDF to only the relevant answer pages before uploading. |
| Blank or garbled docx | Gemini returned an unexpected response format. Run docker compose logs -f to see the raw response in the Flask output and check for API error messages. |
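The PDF-only check and the 32MB limit from the table above can also be mirrored in a small pre-flight helper. A sketch only; the real app relies on Flask's MAX_CONTENT_LENGTH setting and the drop zone's file filter, and validate_upload is a hypothetical name:

```python
MAX_CONTENT_LENGTH = 32 * 1024 * 1024  # matches the Flask limit cited above

def validate_upload(filename: str, size_bytes: int):
    """Return an error message, or None if the upload looks acceptable."""
    if not filename.lower().endswith(".pdf"):
        return "Not a PDF: the app accepts .pdf files only"
    if size_bytes > MAX_CONTENT_LENGTH:
        return "File too large: the limit is 32MB"
    return None
```

Checking before the expensive conversion step means an oversized or wrong-type file fails in milliseconds rather than after a Gemini round trip.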
Conclusion
This project demonstrates that a genuinely useful, professionally structured AI marking tool can be built with a small amount of Python and a carefully engineered prompt. The technology stack is not exotic: Flask, a vision-capable LLM, and a document library. What makes the tool effective is the quality and structure of the marking prompt: the effort invested in encoding the mark scheme, the level descriptors, the model answer examples, and the calibration checks is what separates diagnostic feedback from generic AI commentary.
If you use this for your own teaching, adapt it for a different subject, or build on it, I would be glad to hear how it goes. Feel free to open an issue or pull request on the GitHub repository.