M3: OLAP — Exam-Ready Notes

Dr. S. Geetha · BITS Pilani · Complete slide coverage including all diagrams

1. Relational Form vs Multidimensional Structure

1.1 Sales data in relational form (Eg 3)

A relational database stores multidimensional data as a flat table. Every unique combination of dimension values becomes one row. Notice how MODEL repeats for every color — this repetition is the cost of storing multidimensional data relationally.

Model	Color	Sales volume
Hatchback	Blue	6
Hatchback	Red	5
Hatchback	White	4
SUV	Blue	3
SUV	Red	5
SUV	White	5
Sedan	Blue	4
Sedan	Red	3
Sedan	White	2

Key concept: 2 dimensions (Model, Color) × 3 positions each = 9 rows in the relational table. The measurement (Sales Volume) is stored in each row.

1.2 The same data as a 2D multidimensional structure

The same 9 values can be arranged as a grid where each axis is a dimension and each cell holds the measurement. This is exactly what a data cube does.

COLOR → dimension Blue Red White

MODEL ↓ dimension

Hatchback SUV Sedan

6 5 4

3 5 5

4 3 2

← Measurement (Sales volume)

Positions → Blue Red White

Positions ↑ Hatchback / SUV / Sedan

Each cell = one measurement. Each axis = one dimension. The positions along each axis are the dimension’s values (Hatchback, SUV, Sedan = positions on the MODEL dimension).

2. Adding a Third Dimension (Dealer)

2.1 Why adding a dimension explodes the relational table

When the Dealer dimension (Mitra, Patel, Singh) is added, the relational table grows from 9 rows to 27 rows (3 × 3 × 3). Each combination of Model + Color + Dealer becomes one row.

Rows = dim1_positions × dim2_positions × dim3_positions = 3 × 3 × 3 = 27

A sample of the 27-row relational table (all combinations of Hatchback shown):

Model	Color	Dealership	Volume
Hatchback	Blue	Mitra	6
Hatchback	Blue	Patel	6
Hatchback	Blue	Singh	2
Hatchback	Red	Mitra	3
Hatchback	Red	Patel	5
Hatchback	Red	Singh	5
Hatchback	White	Mitra	2
Hatchback	White	Patel	4
Hatchback	White	Singh	3
… 18 more rows for SUV and Sedan …

2.2 The same data as a 3D cube

With 3 dimensions the structure becomes a physical cube. Each face/layer of the cube is one dealer’s data. The front face (Mitra’s layer) shows Hatchback-Blue = 2, Hatchback-Red = 5, Hatchback-White = 3 etc.

2 5 3 n n n n n n

Hatchback SUV Sedan MODEL

Blue Red White COLOR

Mitra Patel Singh DEALER

← depth = Dealer dimension →

Generalisation: If each dimension has n positions, the relational table needs n^k records where k = number of dimensions. With 10 positions and 3 dimensions: 10 × 10 × 10 = 1,000 rows.

Records in relational table = n^k (n = positions per dim, k = no. of dimensions)

3. Data Cube — Concepts

3.1 Anatomy of a data cube (Revision example)

The data cube below has 3 dimensions: Time (M T W Th F S S), Product (Juice, Milk, Coke, Cream, Soap, Bread), Location (Pilani, Dubai, Goa). A cell holds a measurement — e.g., “56 units of Bread sold in Pilani on Monday”.

Dimensions: Time, Product, Location
Attributes: Product (upc, price…), Location (city, region…)
Hierarchies:

Product→Brand→…

Day→Week→Quarter

City→Region→Country

Roll-up to week means collapsing the 7 daily columns into one weekly total — aggregating along the Time hierarchy from Day → Week.

Hierarchies are key: Roll-up ALWAYS goes UP a hierarchy (finer → coarser). Drill-down goes DOWN (coarser → finer).

4. OLAP — Definition & Architecture

4.1 Definition

OLAP (Online Analytical Processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. It allows analysis of database information from multiple database systems at one time.

4.2 OLAP system architecture (Source → Warehouse → Cube → User)

Data flows from multiple source databases into a central Data Warehouse. The OLAP server then processes the warehouse data to produce OLAP Cubes, which users query through analytical applications.

Stage	Component	Role
1	Source data (SQL Server, Excel, MySQL)	Original operational databases
2	Data Warehouse	Consolidated, cleaned, integrated store
3	OLAP Cube (via processing)	Pre-aggregated multidimensional structure
4	End-user tools	Analytics apps, reporting, data mining, business modelling

4.3 OLAP Cube architecture

The Data Warehouse (relational dimensional model) is loaded into the OLAP server, which creates Physical Multi-Dimensional Cubes. Users then access these cubes through:

Analytics applications (Business Objects, Cognos)
Business reporting (Dashboard Manager)
Data mining (SQL Server Mining, SAS Mining)
Business modelling

5. OLAP Operations

There are 7 operations you must know: Roll-up, Drill-down, Slice, Dice, Pivot, Drill-across, Drill-through.

Roll-up (drill-up)

Climbs up a concept hierarchy, aggregating data at a coarser level. One or more dimensions are removed from the cube.

Example: Location hierarchy = street < city < region < country. Rolling up from city to country aggregates individual city data into national totals.

Before: Chicago=440, New York=1560, Toronto=395, Vancouver=… (city level)
After: USA=2000, Canada=1000 (country level)

Drill-down (reverse of roll-up)

Steps down a concept hierarchy or introduces a new dimension. One or more dimensions are added. Navigates from less detailed → highly detailed data.

Example: Time hierarchy = day < month < quarter < year. Drilling down from quarter to month: Q1 splits into Jan, Feb, Mar etc.

Two ways to drill-down:

Step down a concept hierarchy for a dimension
Introduce a new dimension

Slice

Selects one particular value on one dimension, producing a new sub-cube with one fewer dimension.

Example: Slice on Time = “Q1” → fixes the Time dimension at Q1, producing a 2D grid (Location × Item).

Criterion: time = “Q1”
Result: sub-cube retaining Chicago=605, New York=1560, Toronto=395, Vancouver=… for Q1 only

Dice

Selects two or more dimensions and two or more values per dimension, producing a smaller sub-cube (all original dimensions retained, but restricted).

Example:

location = “Toronto” OR “Vancouver”
time = “Q1” OR “Q2”
item = “Mobile” OR “Modem”

Result: a sub-cube of 2×2×2 = 8 cells, with Chicago Q1 Mobile = 605.

Pivot (rotate)

Rotates the data axes to provide alternative presentations of the data. Like rotating a spreadsheet — rows become columns and vice versa.

Drill-across

Accesses more than one fact table linked by shared dimensions (looks across multiple cubes).

Drill-through

Goes all the way to the bottom-most level of a dimension, often reaching the raw underlying relational data. Easiest in ROLAP.

Slice fixes ONE dimension at ONE value (reduces dimensions). Dice fixes MULTIPLE dimensions at MULTIPLE values (sub-cube, same number of dimensions). Roll-up REMOVES dimensions; drill-down ADDS them.

Roll-up visual (from city → country on location dimension)

Before: city level

Chicago New York Toronto Vancouver

440 1560 395 …

Mobile Modem Phone Security item (types)

Q1 Q2 Q3 Q4

roll-up on location (cities→countries)

After: country level USA 2000 Canada 1000 Q1 Q2 Q3 Q4 Mobile Modem Phone Security

Slice visual (time = Q1 fixed)

Original cube Chicago New York Toronto Vancouver 605 825 14 400 Q1 Q2 Q3 Q4 Time (quarter) Mobile Modem Phone Security

slice for time=”Q1″

Sub-cube (Q1 only) Chicago 605 New York 1560 Toronto 395 Vancouver … Mobile Modem Phone Security

6. CUBE Operator in SQL

Aggregates facts at every level of each dimension in a given OLAP schema.
Data cubes are not geometrically “cubes” — they rarely have equal sides; most have more than 3 dimensions.
Major SQL vendors (Oracle, SQL Server, etc.) include the CUBE operator.

Typical sequence for cube computation

Identify physical sources of data
Specify logical views built upon physical sources
Build cube for specified measures and dimensions

7. Star-Net Query Model

A model for understanding multi-dimensional queries. Each dimension radiates outwards from a central fact with hierarchy levels shown as dots. Each circle on a spoke is called a footprint.

Dimension	Hierarchy levels (footprints)
Customer Orders	Contracts → Order
Shipping Method	Air-Express → Truck
Time	Annually → Qtrly → Daily
Product	Product Line → Product Item → Product Group
Location	City → District → Region → Division → Country
Promotion	(footprint)
Organization	Sales Person → Division
Customer	(footprint)

8. OLTP vs OLAP

Feature	OLTP (Online Transaction Processing)	OLAP (Online Analytical Processing)
Source of data	Operational data; OLTP is the original source	Consolidation data; comes from various OLTP DBs
Purpose	Control and run fundamental business tasks	Planning, problem solving, decision support
What data reveals	Snapshot of ongoing business processes	Multi-dimensional views of business activities
Inserts/Updates	Short, fast inserts/updates by end users	Periodic long-running batch jobs refresh data
Queries	Standardised, simple; few records returned	Often complex, involving aggregations
Processing speed	Typically very fast	Depends on data volume; batch jobs may take hours; indexes improve speed
Space requirements	Relatively small if historical data archived	Larger due to aggregation structures and history data; more indexes needed
Database design	Highly normalised with many tables	Typically de-normalised; star/snowflake schemas
Backup and recovery	Backup religiously — data loss = financial/legal risk	May simply reload OLTP data as recovery method

OLTP = day-to-day operations (transactions), normalised, fast, few rows per query. OLAP = analysis and decision support, de-normalised, complex queries, many rows aggregated.

9. OLAP Implementation Types

Overview

MOLAP — Multidimensional OLAP

Data stored in a proprietary multidimensional structure (MDDB — multidimensional database), optimised for multidimensional analysis.

+ Pro: Very fast query response — data is pre-calculated.

− Con: Practical size limit — pre-calculation time and storage space grow fast.

ROLAP — Relational OLAP

Standard relational database with a star or snowflake schema (multidimensional model on top of RDBMS).

+ Pro: More scalable solution; no data size limit beyond the RDBMS limit.

− Con: Query performance limited by SQL complexity and join sizes.

HOLAP — Hybrid OLAP

Mix of MOLAP and ROLAP. Higher levels implemented as MOLAP, lower levels as ROLAP.

Large amounts of data in RDBMS; frequently accessed summary data in multidimensional system.

DOLAP — Desktop OLAP

Server-side OLAP downloaded to the desktop. Users quickly pull small cubes onto their laptops/desktops.

9.1 MOLAP — detailed

Architecture: RDBMS Server → Data Warehouse → MOLAP engine creates and stores Data Cubes → MDDBs (proprietary store) → Presentation Layer → Client Desktop.

MOLAP advantages	MOLAP disadvantages
Excellent performance — cubes built for fast retrieval, optimal for slice/dice	Limited data volume — all calculations pre-generated at cube build time; only summary-level data fits in the cube
Complex calculations achievable and fast — pre-generated at build time	Requires additional investment — cube technology is often proprietary; human and capital resources needed

9.2 ROLAP — detailed

Architecture: Data Warehouse → RDBMS Server → ROLAP engine (uses complex SQL, dynamically creates data cubes) → cloud/virtual cubes → Client Desktop.

ROLAP advantages	ROLAP disadvantages
Can handle large amounts of data — no limitation beyond the underlying RDBMS	Performance can be slow — each report = one or more SQL queries; slow on large data
Leverages all functionalities of the relational database	Limited by SQL functionalities — complex calculations are difficult; vendors add built-in functions to mitigate

9.3 HOLAP — detailed

Combines standardisation and large-data capability of ROLAP with query speed of MOLAP.
Largest data volumes stored in RDBMS.
Frequently accessed data stored in multidimensional system.
If multidimensional store can’t answer a query, system transparently accesses the RDBMS layer.

9.4 Side-by-side architecture comparison (MOLAP vs ROLAP)

In MOLAP: OLAP engine sits on a special server with an MDDB storing data as multidimensional hypercubes. In ROLAP: OLAP services dynamically create cubes from a conventional RDBMS + Data Warehouse via complex SQL — no proprietary MDDB required.

Aspect	MOLAP	ROLAP
Data storage	Relational tables in warehouse + various summaries in proprietary MDDBs. Moderate volumes. Summary data from MDB, detail from warehouse.	Relational tables in warehouse only. Detailed and light summary data. Very large volumes. All access from warehouse.
Technology	Pre-fabricated cubes by MOLAP engine. Proprietary multidimensional arrays. High-speed matrix retrieval. Sparse matrix technology.	Complex SQL from warehouse. ROLAP engine creates cubes on the fly. Multidimensional views by presentation layer.
Functions & features	Faster access. Large library for complex calculations. Easy analysis across any number of dimensions. Extensive drill-down & slice-dice.	Known environment, many tools available. Limitations on complex analysis. Drill-through to lowest level easier. Drill-across not always easy.
Cube browsing speed	Fastest — even without aggregations, compressed multidimensional format is quicker than RDBMS	Very slow
Processing time	Faster at query time (pre-built)	Slower, especially at higher aggregation levels
Storage space	More space — data is copied; at low aggregation levels takes more room than ROLAP	Almost no extra space — data not duplicated; but ROLAP aggregations take more space than MOLAP/HOLAP aggregations
Data availability offline	All data in cube; can be viewed without original source being available	Cannot view data unless connected to data source
Data volume limit	Very limited — all data must fit in cube	No limit beyond RDBMS limit

HOLAP = best of both: MOLAP speed for summaries + ROLAP scalability for detail. DOLAP = server OLAP pulled to the desktop for offline use.

10. Browsing a Data Cube

Data cube browsing combines three capabilities:

Visualisation — graphical display of multidimensional data
OLAP capabilities — roll-up, drill-down, slice, dice, pivot
Interactive manipulation — user-driven exploration

11. Revision Question (5 marks)

Question: A call centre supports customers who purchased refrigerators. Management wants to know:

Which models require maximum support?
Which staff respond to most calls?
Which service centres give unsatisfactory after-sales service?

Design a Data Warehouse that helps management. (5 marks)

Model answer

Grain (1 mark): Individual call attended by support staff. Measure = call duration OR call count.

Dimensions (2 marks): Product (refrigerator model), Customer, Time, Support Staff, Service Centre.

Schema (2 marks): Star schema with a central fact table (Calls) connected to each dimension table. Attributes should include fields that directly answer the three management queries:

Product dimension → model, type, features (answers Q1)
Support Staff dimension → staff ID, name, team (answers Q2)
Service Centre dimension → centre ID, location, region (answers Q3)
Customer dimension → customer ID, purchase date, location
Time dimension → date, week, month, quarter, year

Always state the grain first. Dimensions determine what slicing/dicing is possible. Schema diagram should show FK relationships from fact table to each dimension.

Notes compiled from BITS Pilani presentation · Dr. S. Geetha · All slides covered