M3: OLAP — Exam-Ready Notes
Dr. S. Geetha · BITS Pilani · Complete slide coverage including all diagrams
1. Relational Form vs Multidimensional Structure
1.1 Sales data in relational form (Eg 3)
A relational database stores multidimensional data as a flat table. Every unique combination of dimension values becomes one row. Notice how MODEL repeats for every color — this repetition is the cost of storing multidimensional data relationally.
| Model | Color | Sales volume |
|---|---|---|
| Hatchback | Blue | 6 |
| Hatchback | Red | 5 |
| Hatchback | White | 4 |
| SUV | Blue | 3 |
| SUV | Red | 5 |
| SUV | White | 5 |
| Sedan | Blue | 4 |
| Sedan | Red | 3 |
| Sedan | White | 2 |
1.2 The same data as a 2D multidimensional structure
The same 9 values can be arranged as a grid where each axis is a dimension and each cell holds the measurement. This is exactly what a data cube does.
2. Adding a Third Dimension (Dealer)
2.1 Why adding a dimension explodes the relational table
When the Dealer dimension (Mitra, Patel, Singh) is added, the relational table grows from 9 rows to 27 rows (3 × 3 × 3). Each combination of Model + Color + Dealer becomes one row.
A sample of the 27-row relational table (all combinations of Hatchback shown):
| Model | Color | Dealership | Volume |
|---|---|---|---|
| Hatchback | Blue | Mitra | 6 |
| Hatchback | Blue | Patel | 6 |
| Hatchback | Blue | Singh | 2 |
| Hatchback | Red | Mitra | 3 |
| Hatchback | Red | Patel | 5 |
| Hatchback | Red | Singh | 5 |
| Hatchback | White | Mitra | 2 |
| Hatchback | White | Patel | 4 |
| Hatchback | White | Singh | 3 |
| … 18 more rows for SUV and Sedan … | |||
2.2 The same data as a 3D cube
With 3 dimensions the structure becomes a physical cube. Each face/layer of the cube is one dealer’s data. The front face (Mitra’s layer) shows Hatchback-Blue = 2, Hatchback-Red = 5, Hatchback-White = 3 etc.
3. Data Cube — Concepts
3.1 Anatomy of a data cube (Revision example)
The data cube below has 3 dimensions: Time (M T W Th F S S), Product (Juice, Milk, Coke, Cream, Soap, Bread), Location (Pilani, Dubai, Goa). A cell holds a measurement — e.g., “56 units of Bread sold in Pilani on Monday”.
Attributes: Product (upc, price…), Location (city, region…)
Hierarchies:
Roll-up to week means collapsing the 7 daily columns into one weekly total — aggregating along the Time hierarchy from Day → Week.
4. OLAP — Definition & Architecture
4.1 Definition
4.2 OLAP system architecture (Source → Warehouse → Cube → User)
Data flows from multiple source databases into a central Data Warehouse. The OLAP server then processes the warehouse data to produce OLAP Cubes, which users query through analytical applications.
| Stage | Component | Role |
|---|---|---|
| 1 | Source data (SQL Server, Excel, MySQL) | Original operational databases |
| 2 | Data Warehouse | Consolidated, cleaned, integrated store |
| 3 | OLAP Cube (via processing) | Pre-aggregated multidimensional structure |
| 4 | End-user tools | Analytics apps, reporting, data mining, business modelling |
4.3 OLAP Cube architecture
The Data Warehouse (relational dimensional model) is loaded into the OLAP server, which creates Physical Multi-Dimensional Cubes. Users then access these cubes through:
- Analytics applications (Business Objects, Cognos)
- Business reporting (Dashboard Manager)
- Data mining (SQL Server Mining, SAS Mining)
- Business modelling
5. OLAP Operations
There are 7 operations you must know: Roll-up, Drill-down, Slice, Dice, Pivot, Drill-across, Drill-through.
Roll-up (drill-up)
Climbs up a concept hierarchy, aggregating data at a coarser level. One or more dimensions are removed from the cube.
Example: Location hierarchy = street < city < region < country. Rolling up from city to country aggregates individual city data into national totals.
- Before: Chicago=440, New York=1560, Toronto=395, Vancouver=… (city level)
- After: USA=2000, Canada=1000 (country level)
Drill-down (reverse of roll-up)
Steps down a concept hierarchy or introduces a new dimension. One or more dimensions are added. Navigates from less detailed → highly detailed data.
Example: Time hierarchy = day < month < quarter < year. Drilling down from quarter to month: Q1 splits into Jan, Feb, Mar etc.
Two ways to drill-down:
- Step down a concept hierarchy for a dimension
- Introduce a new dimension
Slice
Selects one particular value on one dimension, producing a new sub-cube with one fewer dimension.
Example: Slice on Time = “Q1” → fixes the Time dimension at Q1, producing a 2D grid (Location × Item).
- Criterion: time = “Q1”
- Result: sub-cube retaining Chicago=605, New York=1560, Toronto=395, Vancouver=… for Q1 only
Dice
Selects two or more dimensions and two or more values per dimension, producing a smaller sub-cube (all original dimensions retained, but restricted).
Example:
- location = “Toronto” OR “Vancouver”
- time = “Q1” OR “Q2”
- item = “Mobile” OR “Modem”
Result: a sub-cube of 2×2×2 = 8 cells, with Chicago Q1 Mobile = 605.
Pivot (rotate)
Rotates the data axes to provide alternative presentations of the data. Like rotating a spreadsheet — rows become columns and vice versa.
Drill-across
Accesses more than one fact table linked by shared dimensions (looks across multiple cubes).
Drill-through
Goes all the way to the bottom-most level of a dimension, often reaching the raw underlying relational data. Easiest in ROLAP.
Roll-up visual (from city → country on location dimension)
Slice visual (time = Q1 fixed)
6. CUBE Operator in SQL
- Aggregates facts at every level of each dimension in a given OLAP schema.
- Data cubes are not geometrically “cubes” — they rarely have equal sides; most have more than 3 dimensions.
- Major SQL vendors (Oracle, SQL Server, etc.) include the CUBE operator.
Typical sequence for cube computation
- Identify physical sources of data
- Specify logical views built upon physical sources
- Build cube for specified measures and dimensions
7. Star-Net Query Model
A model for understanding multi-dimensional queries. Each dimension radiates outwards from a central fact with hierarchy levels shown as dots. Each circle on a spoke is called a footprint.
| Dimension | Hierarchy levels (footprints) |
|---|---|
| Customer Orders | Contracts → Order |
| Shipping Method | Air-Express → Truck |
| Time | Annually → Qtrly → Daily |
| Product | Product Line → Product Item → Product Group |
| Location | City → District → Region → Division → Country |
| Promotion | (footprint) |
| Organization | Sales Person → Division |
| Customer | (footprint) |
8. OLTP vs OLAP
| Feature | OLTP (Online Transaction Processing) | OLAP (Online Analytical Processing) |
|---|---|---|
| Source of data | Operational data; OLTP is the original source | Consolidation data; comes from various OLTP DBs |
| Purpose | Control and run fundamental business tasks | Planning, problem solving, decision support |
| What data reveals | Snapshot of ongoing business processes | Multi-dimensional views of business activities |
| Inserts/Updates | Short, fast inserts/updates by end users | Periodic long-running batch jobs refresh data |
| Queries | Standardised, simple; few records returned | Often complex, involving aggregations |
| Processing speed | Typically very fast | Depends on data volume; batch jobs may take hours; indexes improve speed |
| Space requirements | Relatively small if historical data archived | Larger due to aggregation structures and history data; more indexes needed |
| Database design | Highly normalised with many tables | Typically de-normalised; star/snowflake schemas |
| Backup and recovery | Backup religiously — data loss = financial/legal risk | May simply reload OLTP data as recovery method |
9. OLAP Implementation Types
Overview
MOLAP — Multidimensional OLAP
Data stored in a proprietary multidimensional structure (MDDB — multidimensional database), optimised for multidimensional analysis.
+ Pro: Very fast query response — data is pre-calculated.
− Con: Practical size limit — pre-calculation time and storage space grow fast.
ROLAP — Relational OLAP
Standard relational database with a star or snowflake schema (multidimensional model on top of RDBMS).
+ Pro: More scalable solution; no data size limit beyond the RDBMS limit.
− Con: Query performance limited by SQL complexity and join sizes.
HOLAP — Hybrid OLAP
Mix of MOLAP and ROLAP. Higher levels implemented as MOLAP, lower levels as ROLAP.
Large amounts of data in RDBMS; frequently accessed summary data in multidimensional system.
DOLAP — Desktop OLAP
Server-side OLAP downloaded to the desktop. Users quickly pull small cubes onto their laptops/desktops.
9.1 MOLAP — detailed
Architecture: RDBMS Server → Data Warehouse → MOLAP engine creates and stores Data Cubes → MDDBs (proprietary store) → Presentation Layer → Client Desktop.
| MOLAP advantages | MOLAP disadvantages |
|---|---|
| Excellent performance — cubes built for fast retrieval, optimal for slice/dice | Limited data volume — all calculations pre-generated at cube build time; only summary-level data fits in the cube |
| Complex calculations achievable and fast — pre-generated at build time | Requires additional investment — cube technology is often proprietary; human and capital resources needed |
9.2 ROLAP — detailed
Architecture: Data Warehouse → RDBMS Server → ROLAP engine (uses complex SQL, dynamically creates data cubes) → cloud/virtual cubes → Client Desktop.
| ROLAP advantages | ROLAP disadvantages |
|---|---|
| Can handle large amounts of data — no limitation beyond the underlying RDBMS | Performance can be slow — each report = one or more SQL queries; slow on large data |
| Leverages all functionalities of the relational database | Limited by SQL functionalities — complex calculations are difficult; vendors add built-in functions to mitigate |
9.3 HOLAP — detailed
- Combines standardisation and large-data capability of ROLAP with query speed of MOLAP.
- Largest data volumes stored in RDBMS.
- Frequently accessed data stored in multidimensional system.
- If multidimensional store can’t answer a query, system transparently accesses the RDBMS layer.
9.4 Side-by-side architecture comparison (MOLAP vs ROLAP)
In MOLAP: OLAP engine sits on a special server with an MDDB storing data as multidimensional hypercubes. In ROLAP: OLAP services dynamically create cubes from a conventional RDBMS + Data Warehouse via complex SQL — no proprietary MDDB required.
| Aspect | MOLAP | ROLAP |
|---|---|---|
| Data storage | Relational tables in warehouse + various summaries in proprietary MDDBs. Moderate volumes. Summary data from MDB, detail from warehouse. | Relational tables in warehouse only. Detailed and light summary data. Very large volumes. All access from warehouse. |
| Technology | Pre-fabricated cubes by MOLAP engine. Proprietary multidimensional arrays. High-speed matrix retrieval. Sparse matrix technology. | Complex SQL from warehouse. ROLAP engine creates cubes on the fly. Multidimensional views by presentation layer. |
| Functions & features | Faster access. Large library for complex calculations. Easy analysis across any number of dimensions. Extensive drill-down & slice-dice. | Known environment, many tools available. Limitations on complex analysis. Drill-through to lowest level easier. Drill-across not always easy. |
| Cube browsing speed | Fastest — even without aggregations, compressed multidimensional format is quicker than RDBMS | Very slow |
| Processing time | Faster at query time (pre-built) | Slower, especially at higher aggregation levels |
| Storage space | More space — data is copied; at low aggregation levels takes more room than ROLAP | Almost no extra space — data not duplicated; but ROLAP aggregations take more space than MOLAP/HOLAP aggregations |
| Data availability offline | All data in cube; can be viewed without original source being available | Cannot view data unless connected to data source |
| Data volume limit | Very limited — all data must fit in cube | No limit beyond RDBMS limit |
10. Browsing a Data Cube
Data cube browsing combines three capabilities:
- Visualisation — graphical display of multidimensional data
- OLAP capabilities — roll-up, drill-down, slice, dice, pivot
- Interactive manipulation — user-driven exploration
11. Revision Question (5 marks)
- Which models require maximum support?
- Which staff respond to most calls?
- Which service centres give unsatisfactory after-sales service?
Design a Data Warehouse that helps management. (5 marks)
Model answer
Grain (1 mark): Individual call attended by support staff. Measure = call duration OR call count.
Dimensions (2 marks): Product (refrigerator model), Customer, Time, Support Staff, Service Centre.
Schema (2 marks): Star schema with a central fact table (Calls) connected to each dimension table. Attributes should include fields that directly answer the three management queries:
- Product dimension → model, type, features (answers Q1)
- Support Staff dimension → staff ID, name, team (answers Q2)
- Service Centre dimension → centre ID, location, region (answers Q3)
- Customer dimension → customer ID, purchase date, location
- Time dimension → date, week, month, quarter, year
Notes compiled from BITS Pilani presentation · Dr. S. Geetha · All slides covered
Leave a Reply