M3: OLAP – Exam Notes

A relational database stores multidimensional data as a flat table. Every unique combination of dimension values becomes one row. Notice how MODEL repeats for every color — this repetition is the cost of storing multidimensional data relationally.

1.2 The same data as a 2D multidimensional structure

Model	Color	Sales volume
Hatchback	Blue	6
Hatchback	Red	5
Hatchback	White	4
SUV	Blue	3
SUV	Red	5
SUV	White	5
Sedan	Blue	4
Sedan	Red	3
Sedan	White	2

The same 9 values can be arranged as a grid where each axis is a dimension and each cell holds the measurement. This is exactly what a data cube does.

2. Adding a Third Dimension (Dealer)

2.1 Why adding a dimension explodes the relational table

When the Dealer dimension (Mitra, Patel, Singh) is added, the relational table grows from 9 rows to 27 rows (3 × 3 × 3). Each combination of Model + Color + Dealer becomes one row.

2.2 The same data as a 3D cube

With 3 dimensions the structure becomes a physical cube. Each face/layer of the cube is one dealer's data. The front face (Mitra's layer) shows Hatchback-Blue = 2, Hatchback-Red = 5, Hatchback-White = 3 etc.

3. Data Cube — Concepts

3.1 Anatomy of a data cube (Revision example)

Model	Color	Dealership	Volume
Hatchback	Blue	Mitra	6
Hatchback	Blue	Patel	6
Hatchback	Blue	Singh	2
Hatchback	Red	Mitra	3
Hatchback	Red	Patel	5
Hatchback	Red	Singh	5
Hatchback	White	Mitra	2
Hatchback	White	Patel	4
Hatchback	White	Singh	3
… 18 more rows for SUV and Sedan …

The data cube below has 3 dimensions: Time (M T W Th F S S), Product (Juice, Milk, Coke, Cream, Soap, Bread), Location (Pilani, Dubai, Goa). A cell holds a measurement — e.g., "56 units of Bread sold in Pilani on Monday".

Roll-up to week means collapsing the 7 daily columns into one weekly total — aggregating along the Time hierarchy from Day → Week.

4. OLAP — Definition & Architecture

4.1 Definition

4.2 OLAP system architecture (Source → Warehouse → Cube → User)

Data flows from multiple source databases into a central Data Warehouse. The OLAP server then processes the warehouse data to produce OLAP Cubes, which users query through analytical applications.

4.3 OLAP Cube architecture

Stage	Component	Role
1	Source data (SQL Server, Excel, MySQL)	Original operational databases
2	Data Warehouse	Consolidated, cleaned, integrated store
3	OLAP Cube (via processing)	Pre-aggregated multidimensional structure
4	End-user tools	Analytics apps, reporting, data mining, business modelling

The Data Warehouse (relational dimensional model) is loaded into the OLAP server, which creates Physical Multi-Dimensional Cubes. Users then access these cubes through:

5. OLAP Operations

There are 7 operations you must know: Roll-up, Drill-down, Slice, Dice, Pivot, Drill-across, Drill-through.

Roll-up (drill-up)

Climbs up a concept hierarchy, aggregating data at a coarser level. One or more dimensions are removed from the cube.

Example: Location hierarchy = street < city < region < country. Rolling up from city to country aggregates individual city data into national totals.

Before: Chicago=440, New York=1560, Toronto=395, Vancouver=… (city level)
After: USA=2000, Canada=1000 (country level)

Drill-down (reverse of roll-up)

Steps down a concept hierarchy or introduces a new dimension. One or more dimensions are added. Navigates from less detailed → highly detailed data.

Example: Time hierarchy = day < month < quarter < year. Drilling down from quarter to month: Q1 splits into Jan, Feb, Mar etc.

Two ways to drill-down:

Step down a concept hierarchy for a dimension
Introduce a new dimension

Slice

Selects one particular value on one dimension, producing a new sub-cube with one fewer dimension.

Example: Slice on Time = "Q1" → fixes the Time dimension at Q1, producing a 2D grid (Location × Item).

Criterion: time = "Q1"
Result: sub-cube retaining Chicago=605, New York=1560, Toronto=395, Vancouver=… for Q1 only

Dice

Selects two or more dimensions and two or more values per dimension, producing a smaller sub-cube (all original dimensions retained, but restricted).

Example:

location = "Toronto" OR "Vancouver"
time = "Q1" OR "Q2"
item = "Mobile" OR "Modem"

Result: a sub-cube of 2×2×2 = 8 cells, with Chicago Q1 Mobile = 605.

Pivot (rotate)

Rotates the data axes to provide alternative presentations of the data. Like rotating a spreadsheet — rows become columns and vice versa.

Drill-across

Accesses more than one fact table linked by shared dimensions (looks across multiple cubes).

Drill-through

Goes all the way to the bottom-most level of a dimension, often reaching the raw underlying relational data. Easiest in ROLAP.

Roll-up visual (from city → country on location dimension)

Slice visual (time = Q1 fixed)

6. CUBE Operator in SQL

Typical sequence for cube computation

7. Star-Net Query Model

A model for understanding multi-dimensional queries. Each dimension radiates outwards from a central fact with hierarchy levels shown as dots. Each circle on a spoke is called a footprint.

8. OLTP vs OLAP

9. OLAP Implementation Types

Overview

Dimension	Hierarchy levels (footprints)
Customer Orders	Contracts → Order
Shipping Method	Air-Express → Truck
Time	Annually → Qtrly → Daily
Product	Product Line → Product Item → Product Group
Location	City → District → Region → Division → Country
Promotion	(footprint)
Organization	Sales Person → Division
Customer	(footprint)

Feature	OLTP (Online Transaction Processing)	OLAP (Online Analytical Processing)
Source of data	Operational data; OLTP is the original source	Consolidation data; comes from various OLTP DBs
Purpose	Control and run fundamental business tasks	Planning, problem solving, decision support
What data reveals	Snapshot of ongoing business processes	Multi-dimensional views of business activities
Inserts/Updates	Short, fast inserts/updates by end users	Periodic long-running batch jobs refresh data
Queries	Standardised, simple; few records returned	Often complex, involving aggregations
Processing speed	Typically very fast	Depends on data volume; batch jobs may take hours; indexes improve speed
Space requirements	Relatively small if historical data archived	Larger due to aggregation structures and history data; more indexes needed
Database design	Highly normalised with many tables	Typically de-normalised; star/snowflake schemas
Backup and recovery	Backup religiously — data loss = financial/legal risk	May simply reload OLTP data as recovery method

MOLAP — Multidimensional OLAP

Data stored in a proprietary multidimensional structure (MDDB — multidimensional database), optimised for multidimensional analysis.

+ Pro: Very fast query response — data is pre-calculated.

− Con: Practical size limit — pre-calculation time and storage space grow fast.

ROLAP — Relational OLAP

Standard relational database with a star or snowflake schema (multidimensional model on top of RDBMS).

+ Pro: More scalable solution; no data size limit beyond the RDBMS limit.

− Con: Query performance limited by SQL complexity and join sizes.

HOLAP — Hybrid OLAP

Mix of MOLAP and ROLAP. Higher levels implemented as MOLAP, lower levels as ROLAP.

Large amounts of data in RDBMS; frequently accessed summary data in multidimensional system.

DOLAP — Desktop OLAP

Server-side OLAP downloaded to the desktop. Users quickly pull small cubes onto their laptops/desktops.

9.1 MOLAP — detailed

Architecture: RDBMS Server → Data Warehouse → MOLAP engine creates and stores Data Cubes → MDDBs (proprietary store) → Presentation Layer → Client Desktop.

9.2 ROLAP — detailed

MOLAP advantages	MOLAP disadvantages
Excellent performance — cubes built for fast retrieval, optimal for slice/dice	Limited data volume — all calculations pre-generated at cube build time; only summary-level data fits in the cube
Complex calculations achievable and fast — pre-generated at build time	Requires additional investment — cube technology is often proprietary; human and capital resources needed

Architecture: Data Warehouse → RDBMS Server → ROLAP engine (uses complex SQL, dynamically creates data cubes) → cloud/virtual cubes → Client Desktop.

9.3 HOLAP — detailed

9.4 Side-by-side architecture comparison (MOLAP vs ROLAP)

In MOLAP: OLAP engine sits on a special server with an MDDB storing data as multidimensional hypercubes. In ROLAP: OLAP services dynamically create cubes from a conventional RDBMS + Data Warehouse via complex SQL — no proprietary MDDB required.

10. Browsing a Data Cube

11. Revision Question (5 marks)

Model answer

ROLAP advantages	ROLAP disadvantages
Can handle large amounts of data — no limitation beyond the underlying RDBMS	Performance can be slow — each report = one or more SQL queries; slow on large data
Leverages all functionalities of the relational database	Limited by SQL functionalities — complex calculations are difficult; vendors add built-in functions to mitigate

Aspect	MOLAP	ROLAP
Data storage	Relational tables in warehouse + various summaries in proprietary MDDBs. Moderate volumes. Summary data from MDB, detail from warehouse.	Relational tables in warehouse only. Detailed and light summary data. Very large volumes. All access from warehouse.
Technology	Pre-fabricated cubes by MOLAP engine. Proprietary multidimensional arrays. High-speed matrix retrieval. Sparse matrix technology.	Complex SQL from warehouse. ROLAP engine creates cubes on the fly. Multidimensional views by presentation layer.
Functions & features	Faster access. Large library for complex calculations. Easy analysis across any number of dimensions. Extensive drill-down & slice-dice.	Known environment, many tools available. Limitations on complex analysis. Drill-through to lowest level easier. Drill-across not always easy.
Cube browsing speed	Fastest — even without aggregations, compressed multidimensional format is quicker than RDBMS	Very slow
Processing time	Faster at query time (pre-built)	Slower, especially at higher aggregation levels
Storage space	More space — data is copied; at low aggregation levels takes more room than ROLAP	Almost no extra space — data not duplicated; but ROLAP aggregations take more space than MOLAP/HOLAP aggregations
Data availability offline	All data in cube; can be viewed without original source being available	Cannot view data unless connected to data source
Data volume limit	Very limited — all data must fit in cube	No limit beyond RDBMS limit

Grain (1 mark): Individual call attended by support staff. Measure = call duration OR call count.

Dimensions (2 marks): Product (refrigerator model), Customer, Time, Support Staff, Service Centre.

Schema (2 marks): Star schema with a central fact table (Calls) connected to each dimension table. Attributes should include fields that directly answer the three management queries:

Notes compiled from BITS Pilani presentation · Dr. S. Geetha · All slides covered

M3: OLAP — Exam-Ready Notes

1. Relational Form vs Multidimensional Structure

1.1 Sales data in relational form (Eg 3)