M3: OLAP — Exam-Ready Notes

Dr. S. Geetha · BITS Pilani · Complete slide coverage including all diagrams

1. Relational Form vs Multidimensional Structure

1.1 Sales data in relational form (Eg 3)

A relational database stores multidimensional data as a flat table. Every unique combination of dimension values becomes one row. Notice how MODEL repeats for every color — this repetition is the cost of storing multidimensional data relationally.

ModelColorSales volume
HatchbackBlue6
HatchbackRed5
HatchbackWhite4
SUVBlue3
SUVRed5
SUVWhite5
SedanBlue4
SedanRed3
SedanWhite2
Key concept: 2 dimensions (Model, Color) × 3 positions each = 9 rows in the relational table. The measurement (Sales Volume) is stored in each row.

1.2 The same data as a 2D multidimensional structure

The same 9 values can be arranged as a grid where each axis is a dimension and each cell holds the measurement. This is exactly what a data cube does.

COLOR → dimension Blue Red White MODEL dimension Hatchback SUV Sedan 6 5 4 3 5 5 4 3 2 ← Measurement (Sales volume) Positions → Blue Red White Positions ↑ Hatchback / SUV / Sedan
Each cell = one measurement. Each axis = one dimension. The positions along each axis are the dimension's values (Hatchback, SUV, Sedan = positions on the MODEL dimension).

2. Adding a Third Dimension (Dealer)

2.1 Why adding a dimension explodes the relational table

When the Dealer dimension (Mitra, Patel, Singh) is added, the relational table grows from 9 rows to 27 rows (3 × 3 × 3). Each combination of Model + Color + Dealer becomes one row.

Rows = dim1_positions × dim2_positions × dim3_positions = 3 × 3 × 3 = 27

A sample of the 27-row relational table (all combinations of Hatchback shown):

ModelColorDealershipVolume
HatchbackBlueMitra6
HatchbackBluePatel6
HatchbackBlueSingh2
HatchbackRedMitra3
HatchbackRedPatel5
HatchbackRedSingh5
HatchbackWhiteMitra2
HatchbackWhitePatel4
HatchbackWhiteSingh3
… 18 more rows for SUV and Sedan …

2.2 The same data as a 3D cube

With 3 dimensions the structure becomes a physical cube. Each face/layer of the cube is one dealer's data. The front face (Mitra's layer) shows Hatchback-Blue = 2, Hatchback-Red = 5, Hatchback-White = 3 etc.

2 5 3 n n n n n n Hatchback SUV Sedan MODEL Blue Red White COLOR Mitra Patel Singh DEALER ← depth = Dealer dimension →
Generalisation: If each dimension has n positions, the relational table needs nk records where k = number of dimensions. With 10 positions and 3 dimensions: 10 × 10 × 10 = 1,000 rows.
Records in relational table = nk  (n = positions per dim, k = no. of dimensions)

3. Data Cube — Concepts

3.1 Anatomy of a data cube (Revision example)

The data cube below has 3 dimensions: Time (M T W Th F S S), Product (Juice, Milk, Coke, Cream, Soap, Bread), Location (Pilani, Dubai, Goa). A cell holds a measurement — e.g., "56 units of Bread sold in Pilani on Monday".

Dimensions: Time, Product, Location
Attributes: Product (upc, price…), Location (city, region…)
Hierarchies:
ProductBrand
DayWeekQuarter
CityRegionCountry

Roll-up to week means collapsing the 7 daily columns into one weekly total — aggregating along the Time hierarchy from Day → Week.

Hierarchies are key: Roll-up ALWAYS goes UP a hierarchy (finer → coarser). Drill-down goes DOWN (coarser → finer).

4. OLAP — Definition & Architecture

4.1 Definition

OLAP (Online Analytical Processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. It allows analysis of database information from multiple database systems at one time.

4.2 OLAP system architecture (Source → Warehouse → Cube → User)

Data flows from multiple source databases into a central Data Warehouse. The OLAP server then processes the warehouse data to produce OLAP Cubes, which users query through analytical applications.

StageComponentRole
1Source data (SQL Server, Excel, MySQL)Original operational databases
2Data WarehouseConsolidated, cleaned, integrated store
3OLAP Cube (via processing)Pre-aggregated multidimensional structure
4End-user toolsAnalytics apps, reporting, data mining, business modelling

4.3 OLAP Cube architecture

The Data Warehouse (relational dimensional model) is loaded into the OLAP server, which creates Physical Multi-Dimensional Cubes. Users then access these cubes through:

5. OLAP Operations

There are 7 operations you must know: Roll-up, Drill-down, Slice, Dice, Pivot, Drill-across, Drill-through.

Roll-up (drill-up)

Climbs up a concept hierarchy, aggregating data at a coarser level. One or more dimensions are removed from the cube.

Example: Location hierarchy = street < city < region < country. Rolling up from city to country aggregates individual city data into national totals.

Drill-down (reverse of roll-up)

Steps down a concept hierarchy or introduces a new dimension. One or more dimensions are added. Navigates from less detailed → highly detailed data.

Example: Time hierarchy = day < month < quarter < year. Drilling down from quarter to month: Q1 splits into Jan, Feb, Mar etc.

Two ways to drill-down:

Slice

Selects one particular value on one dimension, producing a new sub-cube with one fewer dimension.

Example: Slice on Time = "Q1" → fixes the Time dimension at Q1, producing a 2D grid (Location × Item).

Dice

Selects two or more dimensions and two or more values per dimension, producing a smaller sub-cube (all original dimensions retained, but restricted).

Example:

Result: a sub-cube of 2×2×2 = 8 cells, with Chicago Q1 Mobile = 605.

Pivot (rotate)

Rotates the data axes to provide alternative presentations of the data. Like rotating a spreadsheet — rows become columns and vice versa.

Drill-across

Accesses more than one fact table linked by shared dimensions (looks across multiple cubes).

Drill-through

Goes all the way to the bottom-most level of a dimension, often reaching the raw underlying relational data. Easiest in ROLAP.

Slice fixes ONE dimension at ONE value (reduces dimensions). Dice fixes MULTIPLE dimensions at MULTIPLE values (sub-cube, same number of dimensions). Roll-up REMOVES dimensions; drill-down ADDS them.

Roll-up visual (from city → country on location dimension)

Before: city level Chicago New York Toronto Vancouver 440 1560 395 Mobile Modem Phone Security item (types) Q1 Q2 Q3 Q4 roll-up on location (cities→countries) After: country level USA 2000 Canada 1000 Q1 Q2 Q3 Q4 Mobile Modem Phone Security

Slice visual (time = Q1 fixed)

Original cube Chicago New York Toronto Vancouver 605 825 14 400 Q1 Q2 Q3 Q4 Time (quarter) Mobile Modem Phone Security slice for time="Q1" Sub-cube (Q1 only) Chicago 605 New York 1560 Toronto 395 Vancouver Mobile Modem Phone Security

6. CUBE Operator in SQL

Typical sequence for cube computation

  1. Identify physical sources of data
  2. Specify logical views built upon physical sources
  3. Build cube for specified measures and dimensions

7. Star-Net Query Model

A model for understanding multi-dimensional queries. Each dimension radiates outwards from a central fact with hierarchy levels shown as dots. Each circle on a spoke is called a footprint.

DimensionHierarchy levels (footprints)
Customer OrdersContracts → Order
Shipping MethodAir-Express → Truck
TimeAnnually → Qtrly → Daily
ProductProduct Line → Product Item → Product Group
LocationCity → District → Region → Division → Country
Promotion(footprint)
OrganizationSales Person → Division
Customer(footprint)

8. OLTP vs OLAP

Feature OLTP (Online Transaction Processing) OLAP (Online Analytical Processing)
Source of dataOperational data; OLTP is the original sourceConsolidation data; comes from various OLTP DBs
PurposeControl and run fundamental business tasksPlanning, problem solving, decision support
What data revealsSnapshot of ongoing business processesMulti-dimensional views of business activities
Inserts/UpdatesShort, fast inserts/updates by end usersPeriodic long-running batch jobs refresh data
QueriesStandardised, simple; few records returnedOften complex, involving aggregations
Processing speedTypically very fastDepends on data volume; batch jobs may take hours; indexes improve speed
Space requirementsRelatively small if historical data archivedLarger due to aggregation structures and history data; more indexes needed
Database designHighly normalised with many tablesTypically de-normalised; star/snowflake schemas
Backup and recoveryBackup religiously — data loss = financial/legal riskMay simply reload OLTP data as recovery method
OLTP = day-to-day operations (transactions), normalised, fast, few rows per query. OLAP = analysis and decision support, de-normalised, complex queries, many rows aggregated.

9. OLAP Implementation Types

Overview

MOLAP — Multidimensional OLAP

Data stored in a proprietary multidimensional structure (MDDB — multidimensional database), optimised for multidimensional analysis.

+ Pro: Very fast query response — data is pre-calculated.

− Con: Practical size limit — pre-calculation time and storage space grow fast.

ROLAP — Relational OLAP

Standard relational database with a star or snowflake schema (multidimensional model on top of RDBMS).

+ Pro: More scalable solution; no data size limit beyond the RDBMS limit.

− Con: Query performance limited by SQL complexity and join sizes.

HOLAP — Hybrid OLAP

Mix of MOLAP and ROLAP. Higher levels implemented as MOLAP, lower levels as ROLAP.

Large amounts of data in RDBMS; frequently accessed summary data in multidimensional system.

DOLAP — Desktop OLAP

Server-side OLAP downloaded to the desktop. Users quickly pull small cubes onto their laptops/desktops.

9.1 MOLAP — detailed

Architecture: RDBMS Server → Data Warehouse → MOLAP engine creates and stores Data Cubes → MDDBs (proprietary store) → Presentation Layer → Client Desktop.

MOLAP advantagesMOLAP disadvantages
Excellent performance — cubes built for fast retrieval, optimal for slice/dice Limited data volume — all calculations pre-generated at cube build time; only summary-level data fits in the cube
Complex calculations achievable and fast — pre-generated at build time Requires additional investment — cube technology is often proprietary; human and capital resources needed

9.2 ROLAP — detailed

Architecture: Data Warehouse → RDBMS Server → ROLAP engine (uses complex SQL, dynamically creates data cubes) → cloud/virtual cubes → Client Desktop.

ROLAP advantagesROLAP disadvantages
Can handle large amounts of data — no limitation beyond the underlying RDBMS Performance can be slow — each report = one or more SQL queries; slow on large data
Leverages all functionalities of the relational database Limited by SQL functionalities — complex calculations are difficult; vendors add built-in functions to mitigate

9.3 HOLAP — detailed

9.4 Side-by-side architecture comparison (MOLAP vs ROLAP)

In MOLAP: OLAP engine sits on a special server with an MDDB storing data as multidimensional hypercubes. In ROLAP: OLAP services dynamically create cubes from a conventional RDBMS + Data Warehouse via complex SQL — no proprietary MDDB required.

AspectMOLAPROLAP
Data storage Relational tables in warehouse + various summaries in proprietary MDDBs. Moderate volumes. Summary data from MDB, detail from warehouse. Relational tables in warehouse only. Detailed and light summary data. Very large volumes. All access from warehouse.
Technology Pre-fabricated cubes by MOLAP engine. Proprietary multidimensional arrays. High-speed matrix retrieval. Sparse matrix technology. Complex SQL from warehouse. ROLAP engine creates cubes on the fly. Multidimensional views by presentation layer.
Functions & features Faster access. Large library for complex calculations. Easy analysis across any number of dimensions. Extensive drill-down & slice-dice. Known environment, many tools available. Limitations on complex analysis. Drill-through to lowest level easier. Drill-across not always easy.
Cube browsing speed Fastest — even without aggregations, compressed multidimensional format is quicker than RDBMS Very slow
Processing time Faster at query time (pre-built) Slower, especially at higher aggregation levels
Storage space More space — data is copied; at low aggregation levels takes more room than ROLAP Almost no extra space — data not duplicated; but ROLAP aggregations take more space than MOLAP/HOLAP aggregations
Data availability offline All data in cube; can be viewed without original source being available Cannot view data unless connected to data source
Data volume limit Very limited — all data must fit in cube No limit beyond RDBMS limit
HOLAP = best of both: MOLAP speed for summaries + ROLAP scalability for detail. DOLAP = server OLAP pulled to the desktop for offline use.

10. Browsing a Data Cube

Data cube browsing combines three capabilities:

11. Revision Question (5 marks)

Question: A call centre supports customers who purchased refrigerators. Management wants to know:
  1. Which models require maximum support?
  2. Which staff respond to most calls?
  3. Which service centres give unsatisfactory after-sales service?
Design a Data Warehouse that helps management. (5 marks)

Model answer

Grain (1 mark): Individual call attended by support staff. Measure = call duration OR call count.

Dimensions (2 marks): Product (refrigerator model), Customer, Time, Support Staff, Service Centre.

Schema (2 marks): Star schema with a central fact table (Calls) connected to each dimension table. Attributes should include fields that directly answer the three management queries:

Always state the grain first. Dimensions determine what slicing/dicing is possible. Schema diagram should show FK relationships from fact table to each dimension.

Notes compiled from BITS Pilani presentation · Dr. S. Geetha · All slides covered