1. Range Partitioning Based on Year
Problem:
Consider the SALES table:
Sale_ID | Date | Region | Amount
101 | 2022-01-15 | South | 5000102 | 2022-05-20 | North | 7000103 | 2023-02-10 | East | 6000104 | 2023-07-25 | West | 8000
Partition the table based on the year.
Solution:
The year is extracted from the Date column.
Records belonging to year 2022 are: 101, 102
Records belonging to year 2023 are: 103, 104
Therefore, the partitions are:
P1 (2022 Data): 101, 102
P2 (2023 Data): 103, 104
This is called range partitioning, because the records are divided according to a range of year values.
2. List Partitioning Based on Region
Problem:
Consider the CUSTOMER table:
Cust_ID | Name | Region
1 | Ravi | South2 | John | North3 | Asha | East4 | Meena | South
Partition the table based on the Region column.
Solution:
Rows are grouped according to the category values in the Region column.
Customers in South region are: 1, 4
Customers in North region are: 2
Customers in East region are: 3
Therefore, the partitions are:
P1 (South Region): 1, 4
P2 (North Region): 2
P3 (East Region): 3
This is called list partitioning, because partitioning is based on specific category values.
3. List Partitioning Based on Customer Name
Problem:
Consider the CUSTOMER table:
Cust_ID | Name | Region
1 | Ravi | South2 | John | North3 | Asha | East4 | Meena | South5 | John | North6 | John | North7 | John | North8 | Asha | East9 | Asha | East10 | Ravi | East
Partition the table based on the Name column.
Solution:
Rows having the same customer name are grouped into the same partition.
For Ravi: 1, 10
For John: 2, 5, 6, 7
For Asha: 3, 8, 9
For Meena: 4
Therefore, the partitions are:
P1 (Ravi): 1, 10
P2 (John): 2, 5, 6, 7
P3 (Asha): 3, 8, 9
P4 (Meena): 4
This is again list partitioning, because grouping is done using fixed values of the Name field.
4. Hash Partitioning on Order_ID Using Odd and Even Rule
Problem:
Consider the ORDERS table:
Order_ID | Customer | Amount
201 | A | 1000202 | B | 2000203 | C | 1500204 | D | 2500
Partition the table by applying hashing on Order_ID using the odd-even rule.
Solution:
Even order IDs go into one partition, and odd order IDs go into another partition.
Even Order_ID values are: 202, 204
Odd Order_ID values are: 201, 203
Therefore, the partitions are:
P1 (Even): 202, 204
P2 (Odd): 201, 203
This is a simple example of hash-style partitioning based on a derived rule.
5. Hashing with Respect to Amount Divided by 100
Problem:
Consider the ORDERS table:
Order_ID | Customer | Amount
201 | A | 1000202 | B | 2000203 | C | 1500204 | D | 2500
Partition the table by considering whether Amount ÷ 100 is odd or even.
Solution:
First divide each amount by 100.
For 201: 1000 ÷ 100 = 10
For 202: 2000 ÷ 100 = 20
For 203: 1500 ÷ 100 = 15
For 204: 2500 ÷ 100 = 25
Now check odd and even:
Even results: 10, 20
Odd results: 15, 25
So the partitions are:
P1 (Even result): 201, 202
P2 (Odd result): 203, 204
6. Composite Partitioning by Year and Region
Problem:
Consider the SALES table:
Sale_ID | Year | Region | Amount
301 | 2023 | South | 5000302 | 2023 | North | 7000303 | 2024 | South | 6000
Partition the table using composite partitioning by Year and Region.
Solution:
In composite partitioning, the table is first partitioned using one attribute and then sub-partitioned using another attribute.
First partition by Year:
For year 2023: 301, 302
For year 2024: 303
Then within each year, partition by Region.
Therefore:
P1 (Year 2023)
SP1 (South): 301
SP2 (North): 302
P2 (Year 2024)
SP1 (South): 303
This is called composite partitioning because two partitioning criteria are applied.
7. Partitioning Based on Month Extracted from Date
Problem:
Consider the SALES table:
Sale_ID | Date | Region | Amount
101 | 2022-01-15 | South | 5000102 | 2022-05-20 | North | 7000103 | 2023-02-10 | East | 6000104 | 2023-07-25 | West | 8000105 | 2022-01-15 | South | 5000106 | 2022-07-15 | South | 5000107 | 2022-05-15 | South | 5000108 | 2022-02-15 | South | 5000109 | 2022-07-15 | South | 5000
Partition the records based on the month extracted from the Date column.
Solution:
Extract the month from each date.
January entries: 101, 105
February entries: 103, 108
May entries: 102, 107
July entries: 104, 106, 109
Therefore, the partitions are:
P1 (Month = 1, January): 101, 105
P2 (Month = 2, February): 103, 108
P3 (Month = 5, May): 102, 107
P4 (Month = 7, July): 104, 106, 109
This is partitioning based on a value derived from the date field.
8. Storage Requirement Calculation
Problem:
A company stores 2 GB of data per day. Data retention period is 3 years. Find the total storage required.
Solution:
Data stored per day = 2 GB
Retention period = 3 years
Total number of days =3 × 365 = 1095 days
Total storage required =1095 × 2 = 2190 GB
Using decimal conversion:
1 TB = 1000 GB
Therefore:
2190 GB = 2.19 TB
Hence, the total storage required is:
2190 GB or approximately 2.19 TB
If binary-style conversion is used:
2190 ÷ 1024 = 2.14 TB approximately
So under binary interpretation, it is about 2.14 TB.
9. Linear Growth from 1 GB in Year 1 to 2 TB in Year 10
Problem:
At the end of Year 1, data is 1 GB. At the end of Year 10, data is 2 TB. Find the yearly increment, assuming linear growth.
Solution:
Using decimal conversion:
2 TB = 2000 GB
Initial value at Year 1 = 1 GB
Final value at Year 10 = 2000 GB
Total increase =2000 − 1 = 1999 GB
This increase takes place over:
10 − 1 = 9 years
Therefore, yearly increment =1999 ÷ 9 = 222.11 GB approximately
Hence, the yearly increment is:
222.11 GB per year approximately
If 1 TB = 1024 GB is used, then:
2 TB = 2048 GB
Increase =2048 − 1 = 2047 GB
Yearly increment =2047 ÷ 9 = 227.44 GB approximately
10. Data After 5 Years and 7 Years Later
Problem:
At the end of Year 1, data is 1 GB. At the end of Year 10, data is 2 TB. Assuming linear growth, find the data size 5 years later and 7 years later from Year 10.
Solution:
Using decimal conversion:
2 TB = 2000 GB
From the previous result, yearly increment = 222.11 GB approximately
After 5 years from Year 10, that is Year 15:
Data at Year 15 =2000 + (5 × 222.11)= 2000 + 1110.55= 3110.55 GB
Converting to TB:
3110.55 ÷ 1000 = 3.11 TB approximately
After 7 years from Year 10, that is Year 17:
Data at Year 17 =2000 + (7 × 222.11)= 2000 + 1554.77= 3554.77 GB
Converting to TB:
3554.77 ÷ 1000 = 3.55 TB approximately
Therefore:
After 5 years: approximately 3.11 TB
After 7 years: approximately 3.55 TB
11. Construction of Leaf Pages in a B-Tree with Page Size 4
Problem:
Construct the leaf-level arrangement for the following names in a B-Tree, assuming page size = 4:
abe, arnold, james, gita, sam, ragul, siva, anand, john, jaini, rupa, raj, vivekanda, akhila, sidharth, prithvi, srishti
Solution:
First arrange the names in alphabetical order:
abe, akhila, anand, arnold, gita, jaini, james, john, prithvi, ragul, raj, rupa, sam, sidharth, siva, srishti, vivekanda
Since page size is 4, each leaf page can hold at most 4 entries.
Therefore, the leaf nodes are:
Leaf 1: abe, akhila, anand, arnold
Leaf 2: gita, jaini, james, john
Leaf 3: prithvi, ragul, raj, rupa
Leaf 4: sam, sidharth, siva, srishti
Leaf 5: vivekanda
This forms the bottom level of the B-Tree.
12. Inserting JOHN into a Full Leaf Node in a B-Tree
Problem:
In the illustrated B-Tree, the leaf node contains:
HAIG, IGNAR, JONES, KUMAR
If the page size is 4, explain what happens when JOHN is inserted.
Solution:
The name JOHN belongs alphabetically between IGNAR and JONES.
So after insertion, the leaf would become:
HAIG, IGNAR, JOHN, JONES, KUMAR
This gives 5 entries, which exceeds the page size of 4.
Therefore, overflow occurs.
The leaf node must be split into two nodes. One possible split is:
Leaf 1: HAIG, IGNAR
Leaf 2: JOHN, JONES, KUMAR
Then a separator key, typically JOHN, is promoted to the immediate parent.
If the parent also overflows, splitting continues upward. This is how the B-Tree maintains balance.
13. Identification of Immediate Parent in a B-Tree
Problem:
In the given B-Tree, identify the immediate parent of the leaf node containing:
HAIG, IGNAR, JONES, KUMAR
Solution:
The immediate parent is the internal node directly above this leaf node.
That node contains:
A-D, E-G, H-K
Hence, the immediate parent of the leaf node HAIG, IGNAR, JONES, KUMAR is:
A-D, E-G, H-K
This parent node is the first node to be updated if the leaf splits.
14. Levels of the B-Tree
Problem:
Identify the levels of the B-Tree in bottom-up form.
Solution:
In bottom-up interpretation:
Level I: Leaf level
This contains the actual indexed values and pointers to data rows.
Level II: Intermediate or internal level
This contains separator ranges that guide the search.
Level III: Root level
This is the topmost node and starting point of search.
Therefore, in bottom-up order:
Level I = Leaf level
Level II = Intermediate level
Level III = Root level
15. Meaning of Dynamic Nature of B-Tree
Problem:
Explain what is meant by saying that the B-Tree is dynamic in nature.
Solution:
A B-Tree is dynamic because it automatically adjusts its structure when data is inserted or deleted.
When a node becomes full, it is split.
When nodes become too empty after deletion, they may be merged or redistributed.
These changes ensure that:
- the tree remains balanced
- all leaf nodes stay at the same level
- search and update operations remain efficient
Thus, a B-Tree grows and shrinks dynamically according to changes in data.
16. Selectivity in Data Warehouse Columns
Problem:
What can be said about the selectivity of data in a data warehouse? Are most columns highly selective?
Solution:
In a data warehouse, most columns are not highly selective.
This is because many warehouse columns contain repeated business values such as:
- region
- gender
- department
- category
- month
- status
These values appear in many rows, so queries on them often return large numbers of records.
Highly selective columns are usually:
- primary key
- order ID
- customer ID
- transaction ID
Hence, the correct conclusion is:
Most columns in a data warehouse are not highly selective. Primary keys and unique identifiers are the main highly selective columns.
17. Duplicate Values in a B-Tree Index
Problem:
Are duplicate values allowed in a B-Tree index?
Solution:
Duplicate values are not allowed only when the B-Tree index is unique, such as a primary key index.
In a non-unique B-Tree index, duplicate values are allowed.
For example:
If a B-Tree index is built on Customer_ID as primary key, duplicates are not allowed.
If a B-Tree index is built on Region, values such as South may appear many times, so duplicates are allowed.
Hence, the correct statement is:
Duplicates are not allowed in a unique B-Tree index, but duplicates are allowed in a non-unique B-Tree index.
If you want, I will now turn this into a more polished single continuous exam notes document with a proper title, short introduction, and better subheading flow.
Leave a Reply