M3: OLAP — Exam-Ready Notes

Dr. S. Geetha · BITS Pilani · Complete slide coverage including all diagrams

1. Relational Form vs Multidimensional Structure

1.1 Sales data in relational form (Eg 3)

A relational database stores multidimensional data as a flat table. Every unique combination of dimension values becomes one row. Notice how MODEL repeats for every color — this repetition is the cost of storing multidimensional data relationally.

Model Color Sales volume
Hatchback Blue 6
Hatchback Red 5
Hatchback White 4
SUV Blue 3
SUV Red 5
SUV White 5
Sedan Blue 4
Sedan Red 3
Sedan White 2
Key concept: 2 dimensions (Model, Color) × 3 positions each = 9 rows in the relational table. The measurement (Sales Volume) is stored in each row.

1.2 The same data as a 2D multidimensional structure

The same 9 values can be arranged as a grid where each axis is a dimension and each cell holds the measurement. This is exactly what a data cube does.

COLOR → dimension Blue Red White

MODEL dimension

Hatchback SUV Sedan

6 5 4

3 5 5

4 3 2

← Measurement (Sales volume)

Positions → Blue Red White

Positions ↑ Hatchback / SUV / Sedan

Each cell = one measurement. Each axis = one dimension. The positions along each axis are the dimension’s values (Hatchback, SUV, Sedan = positions on the MODEL dimension).

2. Adding a Third Dimension (Dealer)

2.1 Why adding a dimension explodes the relational table

When the Dealer dimension (Mitra, Patel, Singh) is added, the relational table grows from 9 rows to 27 rows (3 × 3 × 3). Each combination of Model + Color + Dealer becomes one row.

Rows = dim1_positions × dim2_positions × dim3_positions = 3 × 3 × 3 = 27

A sample of the 27-row relational table (all combinations of Hatchback shown):

Model Color Dealership Volume
Hatchback Blue Mitra 6
Hatchback Blue Patel 6
Hatchback Blue Singh 2
Hatchback Red Mitra 3
Hatchback Red Patel 5
Hatchback Red Singh 5
Hatchback White Mitra 2
Hatchback White Patel 4
Hatchback White Singh 3
… 18 more rows for SUV and Sedan …

2.2 The same data as a 3D cube

With 3 dimensions the structure becomes a physical cube. Each face/layer of the cube is one dealer’s data. The front face (Mitra’s layer) shows Hatchback-Blue = 2, Hatchback-Red = 5, Hatchback-White = 3 etc.

2 5 3 n n n n n n

Hatchback SUV Sedan MODEL

Blue Red White COLOR

Mitra Patel Singh DEALER

← depth = Dealer dimension →

Generalisation: If each dimension has n positions, the relational table needs nk records where k = number of dimensions. With 10 positions and 3 dimensions: 10 × 10 × 10 = 1,000 rows.
Records in relational table = nk  (n = positions per dim, k = no. of dimensions)

3. Data Cube — Concepts

3.1 Anatomy of a data cube (Revision example)

The data cube below has 3 dimensions: Time (M T W Th F S S), Product (Juice, Milk, Coke, Cream, Soap, Bread), Location (Pilani, Dubai, Goa). A cell holds a measurement — e.g., “56 units of Bread sold in Pilani on Monday”.

Dimensions: Time, Product, Location
Attributes: Product (upc, price…), Location (city, region…)
Hierarchies:

ProductBrand
DayWeekQuarter
CityRegionCountry

Roll-up to week means collapsing the 7 daily columns into one weekly total — aggregating along the Time hierarchy from Day → Week.

Hierarchies are key: Roll-up ALWAYS goes UP a hierarchy (finer → coarser). Drill-down goes DOWN (coarser → finer).

4. OLAP — Definition & Architecture

4.1 Definition

OLAP (Online Analytical Processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. It allows analysis of database information from multiple database systems at one time.

4.2 OLAP system architecture (Source → Warehouse → Cube → User)

Data flows from multiple source databases into a central Data Warehouse. The OLAP server then processes the warehouse data to produce OLAP Cubes, which users query through analytical applications.

Stage Component Role
1 Source data (SQL Server, Excel, MySQL) Original operational databases
2 Data Warehouse Consolidated, cleaned, integrated store
3 OLAP Cube (via processing) Pre-aggregated multidimensional structure
4 End-user tools Analytics apps, reporting, data mining, business modelling

4.3 OLAP Cube architecture

The Data Warehouse (relational dimensional model) is loaded into the OLAP server, which creates Physical Multi-Dimensional Cubes. Users then access these cubes through:

  • Analytics applications (Business Objects, Cognos)
  • Business reporting (Dashboard Manager)
  • Data mining (SQL Server Mining, SAS Mining)
  • Business modelling

5. OLAP Operations

There are 7 operations you must know: Roll-up, Drill-down, Slice, Dice, Pivot, Drill-across, Drill-through.

Roll-up (drill-up)

Climbs up a concept hierarchy, aggregating data at a coarser level. One or more dimensions are removed from the cube.

Example: Location hierarchy = street < city < region < country. Rolling up from city to country aggregates individual city data into national totals.

  • Before: Chicago=440, New York=1560, Toronto=395, Vancouver=… (city level)
  • After: USA=2000, Canada=1000 (country level)

Drill-down (reverse of roll-up)

Steps down a concept hierarchy or introduces a new dimension. One or more dimensions are added. Navigates from less detailed → highly detailed data.

Example: Time hierarchy = day < month < quarter < year. Drilling down from quarter to month: Q1 splits into Jan, Feb, Mar etc.

Two ways to drill-down:

  • Step down a concept hierarchy for a dimension
  • Introduce a new dimension

Slice

Selects one particular value on one dimension, producing a new sub-cube with one fewer dimension.

Example: Slice on Time = “Q1” → fixes the Time dimension at Q1, producing a 2D grid (Location × Item).

  • Criterion: time = “Q1”
  • Result: sub-cube retaining Chicago=605, New York=1560, Toronto=395, Vancouver=… for Q1 only

Dice

Selects two or more dimensions and two or more values per dimension, producing a smaller sub-cube (all original dimensions retained, but restricted).

Example:

  • location = “Toronto” OR “Vancouver”
  • time = “Q1” OR “Q2”
  • item = “Mobile” OR “Modem”

Result: a sub-cube of 2×2×2 = 8 cells, with Chicago Q1 Mobile = 605.

Pivot (rotate)

Rotates the data axes to provide alternative presentations of the data. Like rotating a spreadsheet — rows become columns and vice versa.

Drill-across

Accesses more than one fact table linked by shared dimensions (looks across multiple cubes).

Drill-through

Goes all the way to the bottom-most level of a dimension, often reaching the raw underlying relational data. Easiest in ROLAP.

Slice fixes ONE dimension at ONE value (reduces dimensions). Dice fixes MULTIPLE dimensions at MULTIPLE values (sub-cube, same number of dimensions). Roll-up REMOVES dimensions; drill-down ADDS them.

Roll-up visual (from city → country on location dimension)

Before: city level

Chicago New York Toronto Vancouver

440 1560 395

Mobile Modem Phone Security item (types)

Q1 Q2 Q3 Q4

roll-up on location (cities→countries)

After: country level USA 2000 Canada 1000 Q1 Q2 Q3 Q4 Mobile Modem Phone Security

Slice visual (time = Q1 fixed)

Original cube Chicago New York Toronto Vancouver 605 825 14 400 Q1 Q2 Q3 Q4 Time (quarter) Mobile Modem Phone Security

slice for time=”Q1″

Sub-cube (Q1 only) Chicago 605 New York 1560 Toronto 395 Vancouver Mobile Modem Phone Security

6. CUBE Operator in SQL

  • Aggregates facts at every level of each dimension in a given OLAP schema.
  • Data cubes are not geometrically “cubes” — they rarely have equal sides; most have more than 3 dimensions.
  • Major SQL vendors (Oracle, SQL Server, etc.) include the CUBE operator.

Typical sequence for cube computation

  1. Identify physical sources of data
  2. Specify logical views built upon physical sources
  3. Build cube for specified measures and dimensions

7. Star-Net Query Model

A model for understanding multi-dimensional queries. Each dimension radiates outwards from a central fact with hierarchy levels shown as dots. Each circle on a spoke is called a footprint.

Dimension Hierarchy levels (footprints)
Customer Orders Contracts → Order
Shipping Method Air-Express → Truck
Time Annually → Qtrly → Daily
Product Product Line → Product Item → Product Group
Location City → District → Region → Division → Country
Promotion (footprint)
Organization Sales Person → Division
Customer (footprint)

8. OLTP vs OLAP

Feature OLTP (Online Transaction Processing) OLAP (Online Analytical Processing)
Source of data Operational data; OLTP is the original source Consolidation data; comes from various OLTP DBs
Purpose Control and run fundamental business tasks Planning, problem solving, decision support
What data reveals Snapshot of ongoing business processes Multi-dimensional views of business activities
Inserts/Updates Short, fast inserts/updates by end users Periodic long-running batch jobs refresh data
Queries Standardised, simple; few records returned Often complex, involving aggregations
Processing speed Typically very fast Depends on data volume; batch jobs may take hours; indexes improve speed
Space requirements Relatively small if historical data archived Larger due to aggregation structures and history data; more indexes needed
Database design Highly normalised with many tables Typically de-normalised; star/snowflake schemas
Backup and recovery Backup religiously — data loss = financial/legal risk May simply reload OLTP data as recovery method
OLTP = day-to-day operations (transactions), normalised, fast, few rows per query. OLAP = analysis and decision support, de-normalised, complex queries, many rows aggregated.

9. OLAP Implementation Types

Overview

MOLAP — Multidimensional OLAP

Data stored in a proprietary multidimensional structure (MDDB — multidimensional database), optimised for multidimensional analysis.

+ Pro: Very fast query response — data is pre-calculated.

− Con: Practical size limit — pre-calculation time and storage space grow fast.

ROLAP — Relational OLAP

Standard relational database with a star or snowflake schema (multidimensional model on top of RDBMS).

+ Pro: More scalable solution; no data size limit beyond the RDBMS limit.

− Con: Query performance limited by SQL complexity and join sizes.

HOLAP — Hybrid OLAP

Mix of MOLAP and ROLAP. Higher levels implemented as MOLAP, lower levels as ROLAP.

Large amounts of data in RDBMS; frequently accessed summary data in multidimensional system.

DOLAP — Desktop OLAP

Server-side OLAP downloaded to the desktop. Users quickly pull small cubes onto their laptops/desktops.

9.1 MOLAP — detailed

Architecture: RDBMS Server → Data Warehouse → MOLAP engine creates and stores Data Cubes → MDDBs (proprietary store) → Presentation Layer → Client Desktop.

MOLAP advantages MOLAP disadvantages
Excellent performance — cubes built for fast retrieval, optimal for slice/dice Limited data volume — all calculations pre-generated at cube build time; only summary-level data fits in the cube
Complex calculations achievable and fast — pre-generated at build time Requires additional investment — cube technology is often proprietary; human and capital resources needed

9.2 ROLAP — detailed

Architecture: Data Warehouse → RDBMS Server → ROLAP engine (uses complex SQL, dynamically creates data cubes) → cloud/virtual cubes → Client Desktop.

ROLAP advantages ROLAP disadvantages
Can handle large amounts of data — no limitation beyond the underlying RDBMS Performance can be slow — each report = one or more SQL queries; slow on large data
Leverages all functionalities of the relational database Limited by SQL functionalities — complex calculations are difficult; vendors add built-in functions to mitigate

9.3 HOLAP — detailed

  • Combines standardisation and large-data capability of ROLAP with query speed of MOLAP.
  • Largest data volumes stored in RDBMS.
  • Frequently accessed data stored in multidimensional system.
  • If multidimensional store can’t answer a query, system transparently accesses the RDBMS layer.

9.4 Side-by-side architecture comparison (MOLAP vs ROLAP)

In MOLAP: OLAP engine sits on a special server with an MDDB storing data as multidimensional hypercubes. In ROLAP: OLAP services dynamically create cubes from a conventional RDBMS + Data Warehouse via complex SQL — no proprietary MDDB required.

Aspect MOLAP ROLAP
Data storage Relational tables in warehouse + various summaries in proprietary MDDBs. Moderate volumes. Summary data from MDB, detail from warehouse. Relational tables in warehouse only. Detailed and light summary data. Very large volumes. All access from warehouse.
Technology Pre-fabricated cubes by MOLAP engine. Proprietary multidimensional arrays. High-speed matrix retrieval. Sparse matrix technology. Complex SQL from warehouse. ROLAP engine creates cubes on the fly. Multidimensional views by presentation layer.
Functions & features Faster access. Large library for complex calculations. Easy analysis across any number of dimensions. Extensive drill-down & slice-dice. Known environment, many tools available. Limitations on complex analysis. Drill-through to lowest level easier. Drill-across not always easy.
Cube browsing speed Fastest — even without aggregations, compressed multidimensional format is quicker than RDBMS Very slow
Processing time Faster at query time (pre-built) Slower, especially at higher aggregation levels
Storage space More space — data is copied; at low aggregation levels takes more room than ROLAP Almost no extra space — data not duplicated; but ROLAP aggregations take more space than MOLAP/HOLAP aggregations
Data availability offline All data in cube; can be viewed without original source being available Cannot view data unless connected to data source
Data volume limit Very limited — all data must fit in cube No limit beyond RDBMS limit
HOLAP = best of both: MOLAP speed for summaries + ROLAP scalability for detail. DOLAP = server OLAP pulled to the desktop for offline use.

10. Browsing a Data Cube

Data cube browsing combines three capabilities:

  • Visualisation — graphical display of multidimensional data
  • OLAP capabilities — roll-up, drill-down, slice, dice, pivot
  • Interactive manipulation — user-driven exploration

11. Revision Question (5 marks)

Question: A call centre supports customers who purchased refrigerators. Management wants to know:

  1. Which models require maximum support?
  2. Which staff respond to most calls?
  3. Which service centres give unsatisfactory after-sales service?

Design a Data Warehouse that helps management. (5 marks)

Model answer

Grain (1 mark): Individual call attended by support staff. Measure = call duration OR call count.

Dimensions (2 marks): Product (refrigerator model), Customer, Time, Support Staff, Service Centre.

Schema (2 marks): Star schema with a central fact table (Calls) connected to each dimension table. Attributes should include fields that directly answer the three management queries:

  • Product dimension → model, type, features (answers Q1)
  • Support Staff dimension → staff ID, name, team (answers Q2)
  • Service Centre dimension → centre ID, location, region (answers Q3)
  • Customer dimension → customer ID, purchase date, location
  • Time dimension → date, week, month, quarter, year
Always state the grain first. Dimensions determine what slicing/dicing is possible. Schema diagram should show FK relationships from fact table to each dimension.

Notes compiled from BITS Pilani presentation · Dr. S. Geetha · All slides covered

Leave a Reply