Datawarehouse Past Question Paper and Sample Answers

Model Answers with Explanations

Q1(A) – Why must an OLTP system have a Normalized Schema? [5 Marks]

The Core Idea

OLTP = Online Transaction Processing. Think of it as the system your bank uses every time you swipe your card. It handles thousands of tiny, real-time transactions every second – inserts, updates, deletes.

Normalization means removing redundancy: each piece of data is stored in exactly ONE place.

An OLTP system is designed to process day-to-day business transactions such as insert, update, delete, and short retrieval operations. Because of this, it must use a highly normalized schema to reduce data redundancy and maintain data consistency.

In a normalized structure, the same piece of information is stored only once. This prevents update anomalies, insertion anomalies, and deletion anomalies. This is very important in transactional systems because the data changes frequently and must always remain accurate.

A normalized schema also improves integrity by enforcing relationships through keys and constraints. OLTP applications handle many concurrent users performing transactions. A compact and structured schema helps maintain correctness. It also enhances efficiency in transaction processing. It also reduces storage wastage because duplicate data is minimized.

In contrast, a data warehouse is designed mainly for analysis and querying rather than frequent updates. Therefore, it is often denormalised into star or snowflake schema. This allows complex analytical queries to run faster and more easily. So, OLTP requires normalization for operational efficiency and integrity, whereas DWH prefers denormalisation for analytical performance.

5 Key Reasons for Normalization

1. Avoid Data Anomalies

Insertion Anomaly: Without normalization, you can’t add a new customer unless they have placed an order in the past. Insertion anomaly means the table structure forces you to enter unrelated data together. You cannot store a new customer’s details unless that customer has already placed an order. The table is designed to save customer information and order information in the same row. Even if you want to add a customer now, you must wait because the database requires an existing order.
Update Anomaly: A customer’s city stored in 1000 rows – change one, risk mismatch.
Deletion Anomaly: Deleting the last order could accidentally delete the customer record.

2. Data Integrity

Each fact lives in one place. When a salary is updated, it updates everywhere automatically – no stale copies floating around.

3. Efficient Writes (INSERT/UPDATE/DELETE)

OLTP is write-heavy. Normalized tables are smaller, so writes touch fewer rows and fewer pages – faster!

4. Reduces Storage

No duplicate data = smaller disk usage. Huge benefit when you’re storing millions of daily transactions.

5. Supports Concurrent Users

Normalization leads to finer-grained locking. Multiple users can update different rows of different tables simultaneously without stepping on each other’s toes.

Q2(A) – Why do companies deploy Data Warehousing systems? [2 Marks]

Companies use Data Warehouses (DWH) to answer strategic questions that OLTP systems can’t handle efficiently:

Historical Analysis: OLTP only keeps current state. DWH stores years of data for trend analysis.
Fast Reporting: Running a complex sales report on OLTP would slow down live transactions. DWH is optimized for heavy reads.
Unified View: Data comes from multiple sources (CRM, ERP, website) – DWH integrates it all into one clean dataset.
Decision Making: Managers use it for dashboards, KPIs, and forecasts – not possible with raw OLTP data.

💡 Memory Hook: OLTP asks ‘What happened just now?’ – DWH asks ‘What has been happening over the past 5 years?’

Q2(B) – What is a Staging Area and its Purpose? [3 Marks]

What is it?

The Staging Area is a temporary storage zone between the source systems (OLTP databases) and the Data Warehouse. Think of it as a ‘holding room’ or ‘kitchen prep counter’ before data is served to the warehouse.

Purpose / Why it matters:

Data Extraction Without Impact: Data is first pulled from source OLTP systems into the staging area. This way, the OLTP is not burdened by complex transformation queries.
Data Cleansing: Here, dirty, inconsistent, or duplicate data gets cleaned. E.g., ‘M’/’Male’/’m’ → standardized to ‘Male’.
Data Transformation: Format changes happen here – date formats, currency conversions, code lookups.
Data Integration: Multiple source systems are combined here before loading into the DWH.
Auditability & Recovery: If the DWH load fails, staging data can be reused without re-extracting from source.

💡 Analogy: Think of a staging area as an airport transit lounge. Passengers (data) wait here before they continue to their final destination (DWH).

Q3 – Hospital Data Warehouse (Star Schema + OLAP) [5 Marks]

(a) Star Schema Diagram – Described

3. Suppose that a hospital data warehouse consists of the three dimensions time, doctor, and patient and a measure for fees charged for a visit?

(a) Draw a schema diagram for the above data warehouse. [2marks]

(b) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004? [3marks]

How to build the STAR schema

For a question like this, build the star schema in three simple steps.

First, identify the measure.
Here, the measure is fees charged for a visit. The measure always goes into the fact table.

Second, identify the dimensions given in the question.
Here, the dimensions are time, doctor, and patient. Each of these becomes a separate dimension table.

Third, connect all dimension tables to the central fact table.
So the fact table will contain the foreign keys of all dimensions plus the measure.

Visit_Fact

Time_ID
Doctor_ID
Patient_ID
Fees_Charged

Then create the dimension tables:

Time_Dim with attributes like Date, Month, Quarter, Year
Doctor_Dim with attributes like Doctor_ID, Doctor_Name, Specialisation
Patient_Dim with attributes like Patient_ID, Patient_Name, Gender, Age

So the rule is:

Dimensions become surrounding dimension tables
Numeric value to be analyzed becomes the fact table measure
Fact table stays in the center and links to all dimensions

That is the reason for its name, star schema. One central fact table sits with dimension tables around it, resembling a star.

For this question, the final idea is:

Visit_Fact(Time_ID, Doctor_ID, Patient_ID, Fees_Charged)
connected to
Time_Dim, Doctor_Dim, and Patient_Dim.

Sample Snowflake Schema Question

The hospital data warehouse includes three dimensions: time, doctor, and patient. It also measures the fees charged for a visit. If the dimensions are further normalized such that time is divided into date, month, quarter, and year, and doctor is divided into doctor, specialization, and department, then:

(a) Draw a snowflake schema diagram for the above data warehouse.
(b) Identify the fact table, dimension tables, and the measure.

For this hospital data warehouse, the fact table is:

Visit_Fact(Time_ID, Doctor_ID, Patient_ID, Fees_Charged)

Fees_Charged is the measure. Time_ID, Doctor_ID, and Patient_ID are foreign keys. They link the fact table to the dimensions. The correct main dimension tables are:

Date_Dim(Time_ID, Date, Month_ID)
Doctor_Dim(Doctor_ID, Doctor_Name, Specialization_ID, Department_ID)
Patient_Dim(Patient_ID, Patient_Name, Gender, Age)

These are the tables that should connect directly to Visit_Fact.

Then the normalized branches should be shown as follows:

Date_Dim -> Month_Dim(Month_ID, Month, Quarter_ID)
Month_Dim -> Quarter_Dim(Quarter_ID, Quarter, Year_ID)
Quarter_Dim -> Year_Dim(Year_ID, Year)

and

Doctor_Dim -> Specialisation_Dim(Specialisation_ID, Specialisation_Name)
Doctor_Dim -> Department_Dim(Department_ID, Department_Name)

The Patient_Dim table can remain as a single dimension table. The question does not require it to be further normalized.

So the correct structure is:

Visit_Fact -> Date_Dim -> Month_Dim -> Quarter_Dim -> Year_Dim
Visit_Fact -> Doctor_Dim -> Specialisation_Dim
Visit_Fact -> Doctor_Dim -> Department_Dim
Visit_Fact -> Patient_Dim

(b) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?

OLAP Operations to list total fee per doctor in 2004

Starting from base cuboid: [day, doctor, patient]

Step 1: Roll-Up on Time

Roll up from ‘day’ to ‘year’. This collapses daily data into annual totals.

Result: [year, doctor, patient]

Step 2: Slice on Time = 2004

Fix the year dimension to 2004. This filters only the year 2004 records.

Result: [doctor, patient] for year = 2004

Step 3: Roll-Up on Patient (Drill Across / Aggregate)

Sum the fees across all patients for each doctor. This aggregates patient dimension away.

Final Result: [doctor] → total_fees for 2004

💡 Memory Hook: Roll Up = Zooming Out (day → year). Slice = Applying a filter. Aggregate = Summing up values.

Q4 – Slowly Changing Dimensions (SCD) [5 Marks]

Initial Dimension Table Structure

Surrogate Key	Emp_ID	Name	Salary	Designation	Joining_Date	Office_Location	Start_Date	End_Date	Is_Current
1	E101	Alice	10,000	Developer	2020-01-01	Mumbai	2020-01-01	NULL	Y

Change (A): Alice’s Office Location changes from Mumbai to Delhi

Type: SCD Type 1 – Overwrite

Why? Office location is not historically significant. Companies don’t need to track ‘where was the employee 3 years ago.’

How? Simply UPDATE the existing row. No new row is added.

Surrogate Key	Emp_ID	Name	Office_Location	Note
BEFORE	E101	Alice	Mumbai
AFTER	E101	Alice	Delhi	← Row overwritten

Change (B): Alice gets promoted from Developer to Lead

Type: SCD Type 2 – Add New Row

Why? Promotion is historically important. HR needs to know: ‘What was Alice’s role when she worked on Project X in 2021?’ The old row must be preserved.

How? Mark the old row as expired (Is_Current = N, set End_Date). Insert a new row with new designation and updated salary.

Before and After (showing salary: Developer=10k, Lead=40k):

Surr_Key	Emp_ID	Name	Salary	Designation	Start_Date	End_Date	Is_Current
1	E101	Alice	10,000	Developer	2020-01-01	2025-09-19	N
2	E101	Alice	40,000	Lead	2025-09-20	NULL	Y

💡 Memory Hook: SCD Type 1 = Forget the past (overwrite). SCD Type 2 = Remember the past (add a new row with versioning).

Q5 – Distinguish Between Concepts [5 Marks]

(a) OLTP vs OLAP [1 Mark]

Aspect	OLTP	OLAP
Purpose	Day-to-day operations	Business analysis & reporting
Query Type	Simple: INSERT, UPDATE, DELETE	Complex: GROUP BY, aggregations
Data Volume	Current data (MB-GB)	Historical data (GB-TB)
Users	Clerks, customers, employees	Analysts, managers, executives
Schema	Normalized (3NF)	Denormalized (Star/Snowflake)

(b) Data Warehouse (DWH) vs Data Mart (DM) [2 Marks]

Aspect	Data Warehouse	Data Mart
Scope	Enterprise-wide, all departments	Subject-specific (e.g., Sales only)
Size	Very large (TB-PB)	Smaller subset
Users	All business analysts	Specific department (e.g., Finance team)
Build Time	Long (months)	Faster (weeks)
Example	Central corporate DWH	Sales DM, HR DM, Finance DM

💡 Analogy: DWH = A large central library. DM = A specialized section within the library (e.g., just the Science section).

(c) ROLAP vs MOLAP [2 Marks]

Aspect	ROLAP (Relational OLAP)	MOLAP (Multidimensional OLAP)
Storage	Relational DB (SQL tables)	Multidimensional cube (array)
Query Speed	Slower (SQL joins at runtime)	Very fast (pre-aggregated cubes)
Scalability	High – handles huge datasets	Limited – cube size can explode
Flexibility	More flexible SQL queries	Less flexible but faster
Example	Oracle BI, Teradata	Microsoft SSAS, IBM TM1

💡 Memory Hook: ROLAP = Real-time SQL. MOLAP = Pre-baked cubes for speed.

Q6 – Role of Metadata in DWH and its Types [5 Marks]

What is Metadata?

Metadata is ‘data about data.’ In a Data Warehouse, it acts as the central nervous system. Without it, no one knows what data exists. People don’t know where it came from or how to use it.

Analogy: Think of metadata as the index and labels in a library. Without them, you have thousands of books (data) but no way to find or understand them.

Role of Metadata in DWH:

Guides the ETL process – tells the system what to extract, how to transform, and where to load.
Helps business users understand the meaning of data – what does ‘Total Revenue’ actually include?
Enables data lineage tracking – where did this number come from?
Supports query optimization – tells the query engine about indexes, partitions, and structure.
Facilitates data governance and auditing.

Types of Metadata:

Type	What it Contains	Example
Technical Metadata	Physical structure of DWH objects	Table names, column names, data types, indexes, ETL schedules
Business Metadata	Business meaning and rules	‘Revenue’ = Gross Sales minus Returns; KPI definitions
Operational Metadata	Runtime info – how and when processes ran	ETL job start/end time, rows processed, errors logged
Process Metadata	Tracks data transformation steps	Source-to-target mappings, transformation logic used in ETL

💡 Memory Hook: Technical = What it is. Business = What it means. Operational = What happened when. Process = How it got here.

Metadata is typically stored in a Metadata Repository and accessed by BI tools and ETL engines alike. Without it, a DWH is essentially a black box.

Prepared as intuitive model answers for CC ZG515 – Data Warehousing | BITS Pilani

Datawarehouse Past Question Paper and Sample Answers

Q1(A) – Why must an OLTP system have a Normalized Schema? [5 Marks]

Q2(A) – Why do companies deploy Data Warehousing systems? [2 Marks]

Q2(B) – What is a Staging Area and its Purpose? [3 Marks]

Q3 – Hospital Data Warehouse (Star Schema + OLAP) [5 Marks]

How to build the STAR schema

Sample Snowflake Schema Question

OLAP Operations to list total fee per doctor in 2004

Step 1: Roll-Up on Time

Step 2: Slice on Time = 2004

Step 3: Roll-Up on Patient (Drill Across / Aggregate)

Q4 – Slowly Changing Dimensions (SCD) [5 Marks]

Change (A): Alice’s Office Location changes from Mumbai to Delhi

Change (B): Alice gets promoted from Developer to Lead

Q5 – Distinguish Between Concepts [5 Marks]

Q6 – Role of Metadata in DWH and its Types [5 Marks]

Related

1 Comment

Leave a ReplyCancel reply

Q1(A) – Why must an OLTP system have a Normalized Schema? [5 Marks]

Q2(A) – Why do companies deploy Data Warehousing systems? [2 Marks]

Q2(B) – What is a Staging Area and its Purpose? [3 Marks]

Q3 – Hospital Data Warehouse (Star Schema + OLAP) [5 Marks]

How to build the STAR schema

Sample Snowflake Schema Question

OLAP Operations to list total fee per doctor in 2004

Step 1: Roll-Up on Time

Step 2: Slice on Time = 2004

Step 3: Roll-Up on Patient (Drill Across / Aggregate)

Q4 – Slowly Changing Dimensions (SCD) [5 Marks]

Change (A): Alice’s Office Location changes from Mumbai to Delhi

Change (B): Alice gets promoted from Developer to Lead

Q5 – Distinguish Between Concepts [5 Marks]

Q6 – Role of Metadata in DWH and its Types [5 Marks]

Related

1 Comment

Leave a ReplyCancel reply

Discover more from Tech For Talk