1. Range Partitioning Based on Year

Problem:
Consider the SALES table:

Sale_ID | Date | Region | Amount

101 | 2022-01-15 | South | 5000
102 | 2022-05-20 | North | 7000
103 | 2023-02-10 | East | 6000
104 | 2023-07-25 | West | 8000

Partition the table based on the year.

Solution:
The year is extracted from the Date column.

Records belonging to year 2022 are: 101, 102

Records belonging to year 2023 are: 103, 104

Therefore, the partitions are:

P1 (2022 Data): 101, 102
P2 (2023 Data): 103, 104

This is called range partitioning, because the records are divided according to a range of year values.


2. List Partitioning Based on Region

Problem:
Consider the CUSTOMER table:

Cust_ID | Name | Region

1 | Ravi | South
2 | John | North
3 | Asha | East
4 | Meena | South

Partition the table based on the Region column.

Solution:
Rows are grouped according to the category values in the Region column.

Customers in South region are: 1, 4

Customers in North region are: 2

Customers in East region are: 3

Therefore, the partitions are:

P1 (South Region): 1, 4
P2 (North Region): 2
P3 (East Region): 3

This is called list partitioning, because partitioning is based on specific category values.


3. List Partitioning Based on Customer Name

Problem:
Consider the CUSTOMER table:

Cust_ID | Name | Region

1 | Ravi | South
2 | John | North
3 | Asha | East
4 | Meena | South
5 | John | North
6 | John | North
7 | John | North
8 | Asha | East
9 | Asha | East
10 | Ravi | East

Partition the table based on the Name column.

Solution:
Rows having the same customer name are grouped into the same partition.

For Ravi: 1, 10
For John: 2, 5, 6, 7
For Asha: 3, 8, 9
For Meena: 4

Therefore, the partitions are:

P1 (Ravi): 1, 10
P2 (John): 2, 5, 6, 7
P3 (Asha): 3, 8, 9
P4 (Meena): 4

This is again list partitioning, because grouping is done using fixed values of the Name field.


4. Hash Partitioning on Order_ID Using Odd and Even Rule

Problem:
Consider the ORDERS table:

Order_ID | Customer | Amount

201 | A | 1000
202 | B | 2000
203 | C | 1500
204 | D | 2500

Partition the table by applying hashing on Order_ID using the odd-even rule.

Solution:
Even order IDs go into one partition, and odd order IDs go into another partition.

Even Order_ID values are: 202, 204

Odd Order_ID values are: 201, 203

Therefore, the partitions are:

P1 (Even): 202, 204
P2 (Odd): 201, 203

This is a simple example of hash-style partitioning based on a derived rule.


5. Hashing with Respect to Amount Divided by 100

Problem:
Consider the ORDERS table:

Order_ID | Customer | Amount

201 | A | 1000
202 | B | 2000
203 | C | 1500
204 | D | 2500

Partition the table by considering whether Amount ÷ 100 is odd or even.

Solution:
First divide each amount by 100.

For 201: 1000 ÷ 100 = 10
For 202: 2000 ÷ 100 = 20
For 203: 1500 ÷ 100 = 15
For 204: 2500 ÷ 100 = 25

Now check odd and even:

Even results: 10, 20
Odd results: 15, 25

So the partitions are:

P1 (Even result): 201, 202
P2 (Odd result): 203, 204


6. Composite Partitioning by Year and Region

Problem:
Consider the SALES table:

Sale_ID | Year | Region | Amount

301 | 2023 | South | 5000
302 | 2023 | North | 7000
303 | 2024 | South | 6000

Partition the table using composite partitioning by Year and Region.

Solution:
In composite partitioning, the table is first partitioned using one attribute and then sub-partitioned using another attribute.

First partition by Year:

For year 2023: 301, 302
For year 2024: 303

Then within each year, partition by Region.

Therefore:

P1 (Year 2023)
SP1 (South): 301
SP2 (North): 302

P2 (Year 2024)
SP1 (South): 303

This is called composite partitioning because two partitioning criteria are applied.


7. Partitioning Based on Month Extracted from Date

Problem:
Consider the SALES table:

Sale_ID | Date | Region | Amount

101 | 2022-01-15 | South | 5000
102 | 2022-05-20 | North | 7000
103 | 2023-02-10 | East | 6000
104 | 2023-07-25 | West | 8000
105 | 2022-01-15 | South | 5000
106 | 2022-07-15 | South | 5000
107 | 2022-05-15 | South | 5000
108 | 2022-02-15 | South | 5000
109 | 2022-07-15 | South | 5000

Partition the records based on the month extracted from the Date column.

Solution:
Extract the month from each date.

January entries: 101, 105
February entries: 103, 108
May entries: 102, 107
July entries: 104, 106, 109

Therefore, the partitions are:

P1 (Month = 1, January): 101, 105
P2 (Month = 2, February): 103, 108
P3 (Month = 5, May): 102, 107
P4 (Month = 7, July): 104, 106, 109

This is partitioning based on a value derived from the date field.


8. Storage Requirement Calculation

Problem:
A company stores 2 GB of data per day. Data retention period is 3 years. Find the total storage required.

Solution:
Data stored per day = 2 GB

Retention period = 3 years

Total number of days =
3 × 365 = 1095 days

Total storage required =
1095 × 2 = 2190 GB

Using decimal conversion:

1 TB = 1000 GB

Therefore:

2190 GB = 2.19 TB

Hence, the total storage required is:

2190 GB or approximately 2.19 TB

If binary-style conversion is used:

2190 ÷ 1024 = 2.14 TB approximately

So under binary interpretation, it is about 2.14 TB.


9. Linear Growth from 1 GB in Year 1 to 2 TB in Year 10

Problem:
At the end of Year 1, data is 1 GB. At the end of Year 10, data is 2 TB. Find the yearly increment, assuming linear growth.

Solution:
Using decimal conversion:

2 TB = 2000 GB

Initial value at Year 1 = 1 GB
Final value at Year 10 = 2000 GB

Total increase =
2000 − 1 = 1999 GB

This increase takes place over:

10 − 1 = 9 years

Therefore, yearly increment =
1999 ÷ 9 = 222.11 GB approximately

Hence, the yearly increment is:

222.11 GB per year approximately

If 1 TB = 1024 GB is used, then:

2 TB = 2048 GB

Increase =
2048 − 1 = 2047 GB

Yearly increment =
2047 ÷ 9 = 227.44 GB approximately


10. Data After 5 Years and 7 Years Later

Problem:
At the end of Year 1, data is 1 GB. At the end of Year 10, data is 2 TB. Assuming linear growth, find the data size 5 years later and 7 years later from Year 10.

Solution:
Using decimal conversion:

2 TB = 2000 GB

From the previous result, yearly increment = 222.11 GB approximately

After 5 years from Year 10, that is Year 15:

Data at Year 15 =
2000 + (5 × 222.11)
= 2000 + 1110.55
= 3110.55 GB

Converting to TB:

3110.55 ÷ 1000 = 3.11 TB approximately

After 7 years from Year 10, that is Year 17:

Data at Year 17 =
2000 + (7 × 222.11)
= 2000 + 1554.77
= 3554.77 GB

Converting to TB:

3554.77 ÷ 1000 = 3.55 TB approximately

Therefore:

After 5 years: approximately 3.11 TB
After 7 years: approximately 3.55 TB


11. Construction of Leaf Pages in a B-Tree with Page Size 4

Problem:
Construct the leaf-level arrangement for the following names in a B-Tree, assuming page size = 4:

abe, arnold, james, gita, sam, ragul, siva, anand, john, jaini, rupa, raj, vivekanda, akhila, sidharth, prithvi, srishti

Solution:
First arrange the names in alphabetical order:

abe, akhila, anand, arnold, gita, jaini, james, john, prithvi, ragul, raj, rupa, sam, sidharth, siva, srishti, vivekanda

Since page size is 4, each leaf page can hold at most 4 entries.

Therefore, the leaf nodes are:

Leaf 1: abe, akhila, anand, arnold
Leaf 2: gita, jaini, james, john
Leaf 3: prithvi, ragul, raj, rupa
Leaf 4: sam, sidharth, siva, srishti
Leaf 5: vivekanda

This forms the bottom level of the B-Tree.


12. Inserting JOHN into a Full Leaf Node in a B-Tree

Problem:
In the illustrated B-Tree, the leaf node contains:

HAIG, IGNAR, JONES, KUMAR

If the page size is 4, explain what happens when JOHN is inserted.

Solution:
The name JOHN belongs alphabetically between IGNAR and JONES.

So after insertion, the leaf would become:

HAIG, IGNAR, JOHN, JONES, KUMAR

This gives 5 entries, which exceeds the page size of 4.

Therefore, overflow occurs.

The leaf node must be split into two nodes. One possible split is:

Leaf 1: HAIG, IGNAR
Leaf 2: JOHN, JONES, KUMAR

Then a separator key, typically JOHN, is promoted to the immediate parent.

If the parent also overflows, splitting continues upward. This is how the B-Tree maintains balance.


13. Identification of Immediate Parent in a B-Tree

Problem:
In the given B-Tree, identify the immediate parent of the leaf node containing:

HAIG, IGNAR, JONES, KUMAR

Solution:
The immediate parent is the internal node directly above this leaf node.

That node contains:

A-D, E-G, H-K

Hence, the immediate parent of the leaf node HAIG, IGNAR, JONES, KUMAR is:

A-D, E-G, H-K

This parent node is the first node to be updated if the leaf splits.


14. Levels of the B-Tree

Problem:
Identify the levels of the B-Tree in bottom-up form.

Solution:
In bottom-up interpretation:

Level I: Leaf level
This contains the actual indexed values and pointers to data rows.

Level II: Intermediate or internal level
This contains separator ranges that guide the search.

Level III: Root level
This is the topmost node and starting point of search.

Therefore, in bottom-up order:

Level I = Leaf level
Level II = Intermediate level
Level III = Root level


15. Meaning of Dynamic Nature of B-Tree

Problem:
Explain what is meant by saying that the B-Tree is dynamic in nature.

Solution:
A B-Tree is dynamic because it automatically adjusts its structure when data is inserted or deleted.

When a node becomes full, it is split.
When nodes become too empty after deletion, they may be merged or redistributed.

These changes ensure that:

  • the tree remains balanced
  • all leaf nodes stay at the same level
  • search and update operations remain efficient

Thus, a B-Tree grows and shrinks dynamically according to changes in data.


16. Selectivity in Data Warehouse Columns

Problem:
What can be said about the selectivity of data in a data warehouse? Are most columns highly selective?

Solution:
In a data warehouse, most columns are not highly selective.

This is because many warehouse columns contain repeated business values such as:

  • region
  • gender
  • department
  • category
  • month
  • status

These values appear in many rows, so queries on them often return large numbers of records.

Highly selective columns are usually:

  • primary key
  • order ID
  • customer ID
  • transaction ID

Hence, the correct conclusion is:

Most columns in a data warehouse are not highly selective. Primary keys and unique identifiers are the main highly selective columns.


17. Duplicate Values in a B-Tree Index

Problem:
Are duplicate values allowed in a B-Tree index?

Solution:
Duplicate values are not allowed only when the B-Tree index is unique, such as a primary key index.

In a non-unique B-Tree index, duplicate values are allowed.

For example:

If a B-Tree index is built on Customer_ID as primary key, duplicates are not allowed.

If a B-Tree index is built on Region, values such as South may appear many times, so duplicates are allowed.

Hence, the correct statement is:

Duplicates are not allowed in a unique B-Tree index, but duplicates are allowed in a non-unique B-Tree index.

If you want, I will now turn this into a more polished single continuous exam notes document with a proper title, short introduction, and better subheading flow.

Leave a Reply