Datawarehouse Numerical Questions and Worked Examples

1. Range Partitioning Based on Year

Problem:
Consider the SALES table:

Sale_ID | Date | Region | Amount

101 | 2022-01-15 | South | 5000
102 | 2022-05-20 | North | 7000
103 | 2023-02-10 | East | 6000
104 | 2023-07-25 | West | 8000

Partition the table based on the year.

Solution:
The year is extracted from the Date column.

Records belonging to year 2022 are: 101, 102

Records belonging to year 2023 are: 103, 104

Therefore, the partitions are:

P1 (2022 Data): 101, 102
P2 (2023 Data): 103, 104

This is called range partitioning, because the records are divided according to a range of year values.

2. List Partitioning Based on Region

Problem:
Consider the CUSTOMER table:

Cust_ID | Name | Region

Partition the table based on the Region column.

Solution:
Rows are grouped according to the category values in the Region column.

Customers in South region are: 1, 4

Customers in North region are: 2

Customers in East region are: 3

Therefore, the partitions are:

P1 (South Region): 1, 4
P2 (North Region): 2
P3 (East Region): 3

This is called list partitioning, because partitioning is based on specific category values.

3. List Partitioning Based on Customer Name

Problem:
Consider the CUSTOMER table:

Cust_ID | Name | Region

Partition the table based on the Name column.

Solution:
Rows having the same customer name are grouped into the same partition.

For Ravi: 1, 10
For John: 2, 5, 6, 7
For Asha: 3, 8, 9
For Meena: 4

Therefore, the partitions are:

P1 (Ravi): 1, 10
P2 (John): 2, 5, 6, 7
P3 (Asha): 3, 8, 9
P4 (Meena): 4

This is again list partitioning, because grouping is done using fixed values of the Name field.

4. Hash Partitioning on Order_ID Using Odd and Even Rule

Problem:
Consider the ORDERS table:

Order_ID | Customer | Amount

201 | A | 1000
202 | B | 2000
203 | C | 1500
204 | D | 2500

Partition the table by applying hashing on Order_ID using the odd-even rule.

Solution:
Even order IDs go into one partition, and odd order IDs go into another partition.

Even Order_ID values are: 202, 204

Odd Order_ID values are: 201, 203

Therefore, the partitions are:

P1 (Even): 202, 204
P2 (Odd): 201, 203

This is a simple example of hash-style partitioning based on a derived rule.

5. Hashing with Respect to Amount Divided by 100

Problem:
Consider the ORDERS table:

Order_ID | Customer | Amount

201 | A | 1000
202 | B | 2000
203 | C | 1500
204 | D | 2500

Partition the table by considering whether Amount ÷ 100 is odd or even.

Solution:
First divide each amount by 100.

For 201: 1000 ÷ 100 = 10
For 202: 2000 ÷ 100 = 20
For 203: 1500 ÷ 100 = 15
For 204: 2500 ÷ 100 = 25

Now check odd and even:

Even results: 10, 20
Odd results: 15, 25

So the partitions are:

P1 (Even result): 201, 202
P2 (Odd result): 203, 204

6. Composite Partitioning by Year and Region

Problem:
Consider the SALES table:

Sale_ID | Year | Region | Amount

301 | 2023 | South | 5000
302 | 2023 | North | 7000
303 | 2024 | South | 6000

Partition the table using composite partitioning by Year and Region.

Solution:
In composite partitioning, the table is first partitioned using one attribute and then sub-partitioned using another attribute.

First partition by Year:

For year 2023: 301, 302
For year 2024: 303

Then within each year, partition by Region.

Therefore:

P1 (Year 2023)
SP1 (South): 301
SP2 (North): 302

P2 (Year 2024)
SP1 (South): 303

This is called composite partitioning because two partitioning criteria are applied.

7. Partitioning Based on Month Extracted from Date

Problem:
Consider the SALES table:

Sale_ID | Date | Region | Amount

101 | 2022-01-15 | South | 5000
102 | 2022-05-20 | North | 7000
103 | 2023-02-10 | East | 6000
104 | 2023-07-25 | West | 8000
105 | 2022-01-15 | South | 5000
106 | 2022-07-15 | South | 5000
107 | 2022-05-15 | South | 5000
108 | 2022-02-15 | South | 5000
109 | 2022-07-15 | South | 5000

Partition the records based on the month extracted from the Date column.

Solution:
Extract the month from each date.

January entries: 101, 105
February entries: 103, 108
May entries: 102, 107
July entries: 104, 106, 109

Therefore, the partitions are:

P1 (Month = 1, January): 101, 105
P2 (Month = 2, February): 103, 108
P3 (Month = 5, May): 102, 107
P4 (Month = 7, July): 104, 106, 109

This is partitioning based on a value derived from the date field.

8. Storage Requirement Calculation

Problem:
A company stores 2 GB of data per day. Data retention period is 3 years. Find the total storage required.

Solution:
Data stored per day = 2 GB

Retention period = 3 years

Total number of days =
3 × 365 = 1095 days

Total storage required =
1095 × 2 = 2190 GB

Using decimal conversion:

1 TB = 1000 GB

Therefore:

2190 GB = 2.19 TB

Hence, the total storage required is:

2190 GB or approximately 2.19 TB

If binary-style conversion is used:

2190 ÷ 1024 = 2.14 TB approximately

So under binary interpretation, it is about 2.14 TB.

9. Linear Growth from 1 GB in Year 1 to 2 TB in Year 10

Problem:
At the end of Year 1, data is 1 GB. At the end of Year 10, data is 2 TB. Find the yearly increment, assuming linear growth.

Solution:
Using decimal conversion:

2 TB = 2000 GB

Initial value at Year 1 = 1 GB
Final value at Year 10 = 2000 GB

Total increase =
2000 − 1 = 1999 GB

This increase takes place over:

10 − 1 = 9 years

Therefore, yearly increment =
1999 ÷ 9 = 222.11 GB approximately

Hence, the yearly increment is:

222.11 GB per year approximately

If 1 TB = 1024 GB is used, then:

2 TB = 2048 GB

Increase =
2048 − 1 = 2047 GB

Yearly increment =
2047 ÷ 9 = 227.44 GB approximately

10. Data After 5 Years and 7 Years Later

Problem:
At the end of Year 1, data is 1 GB. At the end of Year 10, data is 2 TB. Assuming linear growth, find the data size 5 years later and 7 years later from Year 10.

Solution:
Using decimal conversion:

2 TB = 2000 GB

From the previous result, yearly increment = 222.11 GB approximately

After 5 years from Year 10, that is Year 15:

Data at Year 15 =
2000 + (5 × 222.11)
= 2000 + 1110.55
= 3110.55 GB

Converting to TB:

3110.55 ÷ 1000 = 3.11 TB approximately

After 7 years from Year 10, that is Year 17:

Data at Year 17 =
2000 + (7 × 222.11)
= 2000 + 1554.77
= 3554.77 GB

Converting to TB:

3554.77 ÷ 1000 = 3.55 TB approximately

Therefore:

After 5 years: approximately 3.11 TB
After 7 years: approximately 3.55 TB

11. Construction of Leaf Pages in a B-Tree with Page Size 4

Problem:
Construct the leaf-level arrangement for the following names in a B-Tree, assuming page size = 4:

abe, arnold, james, gita, sam, ragul, siva, anand, john, jaini, rupa, raj, vivekanda, akhila, sidharth, prithvi, srishti

Solution:
First arrange the names in alphabetical order:

abe, akhila, anand, arnold, gita, jaini, james, john, prithvi, ragul, raj, rupa, sam, sidharth, siva, srishti, vivekanda

Since page size is 4, each leaf page can hold at most 4 entries.

Therefore, the leaf nodes are:

Leaf 1: abe, akhila, anand, arnold
Leaf 2: gita, jaini, james, john
Leaf 3: prithvi, ragul, raj, rupa
Leaf 4: sam, sidharth, siva, srishti
Leaf 5: vivekanda

This forms the bottom level of the B-Tree.

12. Inserting JOHN into a Full Leaf Node in a B-Tree

Problem:
In the illustrated B-Tree, the leaf node contains:

HAIG, IGNAR, JONES, KUMAR

If the page size is 4, explain what happens when JOHN is inserted.

Solution:
The name JOHN belongs alphabetically between IGNAR and JONES.

So after insertion, the leaf would become:

HAIG, IGNAR, JOHN, JONES, KUMAR

This gives 5 entries, which exceeds the page size of 4.

Therefore, overflow occurs.

The leaf node must be split into two nodes. One possible split is:

Leaf 1: HAIG, IGNAR
Leaf 2: JOHN, JONES, KUMAR

Then a separator key, typically JOHN, is promoted to the immediate parent.

If the parent also overflows, splitting continues upward. This is how the B-Tree maintains balance.

13. Identification of Immediate Parent in a B-Tree

Problem:
In the given B-Tree, identify the immediate parent of the leaf node containing:

HAIG, IGNAR, JONES, KUMAR

Solution:
The immediate parent is the internal node directly above this leaf node.

That node contains:

A-D, E-G, H-K

Hence, the immediate parent of the leaf node HAIG, IGNAR, JONES, KUMAR is:

A-D, E-G, H-K

This parent node is the first node to be updated if the leaf splits.

14. Levels of the B-Tree

Problem:
Identify the levels of the B-Tree in bottom-up form.

Solution:
In bottom-up interpretation:

Level I: Leaf level
This contains the actual indexed values and pointers to data rows.

Level II: Intermediate or internal level
This contains separator ranges that guide the search.

Level III: Root level
This is the topmost node and starting point of search.

Therefore, in bottom-up order:

Level I = Leaf level
Level II = Intermediate level
Level III = Root level

15. Meaning of Dynamic Nature of B-Tree

Problem:
Explain what is meant by saying that the B-Tree is dynamic in nature.

Solution:
A B-Tree is dynamic because it automatically adjusts its structure when data is inserted or deleted.

When a node becomes full, it is split.
When nodes become too empty after deletion, they may be merged or redistributed.

These changes ensure that:

the tree remains balanced
all leaf nodes stay at the same level
search and update operations remain efficient

Thus, a B-Tree grows and shrinks dynamically according to changes in data.

16. Selectivity in Data Warehouse Columns

Problem:
What can be said about the selectivity of data in a data warehouse? Are most columns highly selective?

Solution:
In a data warehouse, most columns are not highly selective.

This is because many warehouse columns contain repeated business values such as:

region
gender
department
category
month
status

These values appear in many rows, so queries on them often return large numbers of records.

Highly selective columns are usually:

primary key
order ID
customer ID
transaction ID

Hence, the correct conclusion is:

Most columns in a data warehouse are not highly selective. Primary keys and unique identifiers are the main highly selective columns.

17. Duplicate Values in a B-Tree Index

Problem:
Are duplicate values allowed in a B-Tree index?

Solution:
Duplicate values are not allowed only when the B-Tree index is unique, such as a primary key index.

In a non-unique B-Tree index, duplicate values are allowed.

For example:

If a B-Tree index is built on Customer_ID as primary key, duplicates are not allowed.

If a B-Tree index is built on Region, values such as South may appear many times, so duplicates are allowed.

Hence, the correct statement is:

Duplicates are not allowed in a unique B-Tree index, but duplicates are allowed in a non-unique B-Tree index.

If you want, I will now turn this into a more polished single continuous exam notes document with a proper title, short introduction, and better subheading flow.

Datawarehouse Numerical Questions and Worked Examples

1. Range Partitioning Based on Year

2. List Partitioning Based on Region

3. List Partitioning Based on Customer Name

4. Hash Partitioning on Order_ID Using Odd and Even Rule

5. Hashing with Respect to Amount Divided by 100

6. Composite Partitioning by Year and Region

7. Partitioning Based on Month Extracted from Date

8. Storage Requirement Calculation

9. Linear Growth from 1 GB in Year 1 to 2 TB in Year 10

10. Data After 5 Years and 7 Years Later

After 5 years from Year 10, that is Year 15:

After 7 years from Year 10, that is Year 17:

11. Construction of Leaf Pages in a B-Tree with Page Size 4

12. Inserting JOHN into a Full Leaf Node in a B-Tree

13. Identification of Immediate Parent in a B-Tree

14. Levels of the B-Tree

15. Meaning of Dynamic Nature of B-Tree

16. Selectivity in Data Warehouse Columns

17. Duplicate Values in a B-Tree Index

Related

Leave a ReplyCancel reply

1. Range Partitioning Based on Year

2. List Partitioning Based on Region

3. List Partitioning Based on Customer Name

4. Hash Partitioning on Order_ID Using Odd and Even Rule

5. Hashing with Respect to Amount Divided by 100

6. Composite Partitioning by Year and Region

7. Partitioning Based on Month Extracted from Date

8. Storage Requirement Calculation

9. Linear Growth from 1 GB in Year 1 to 2 TB in Year 10

10. Data After 5 Years and 7 Years Later

After 5 years from Year 10, that is Year 15:

After 7 years from Year 10, that is Year 17:

11. Construction of Leaf Pages in a B-Tree with Page Size 4

12. Inserting JOHN into a Full Leaf Node in a B-Tree

13. Identification of Immediate Parent in a B-Tree

14. Levels of the B-Tree

15. Meaning of Dynamic Nature of B-Tree

16. Selectivity in Data Warehouse Columns

17. Duplicate Values in a B-Tree Index

Related

Leave a ReplyCancel reply

Discover more from Tech For Talk