Quick-reference notes covering ER modeling, normalization, transactions, concurrency control, and storage/indexing concepts.
| Concept | Description | Example |
|---|---|---|
| Entity | Real-world object with independent existence | Student, Employee, Course |
| Entity Set | Collection of similar entities | All Students, All Courses |
| Weak Entity | Cannot exist without an owner entity; has partial key | Dependent (relies on Employee) |
| Strong Entity | Exists independently with a primary key | Employee (emp_id) |
| Attribute | Property that describes an entity | name, age, salary |
| Key Attribute | Uniquely identifies each entity | emp_id, roll_no |
| Composite Attribute | Attribute made of sub-attributes | Full_Name(First, Middle, Last) |
| Multi-valued | Attribute with multiple values | Phone_Numbers, Email_Addresses |
| Derived Attribute | Computed from other attributes | Age (from DOB), GPA (from marks) |
| Relationship | Association among entities | Works_In, Enrolls_In, Manages |
| Participation | How entities participate in a relationship | Total (must) vs Partial (may) |
| Cardinality | Number of entity occurrences in relationship | 1:1, 1:N, M:N |
| Type | Notation | Example |
|---|---|---|
| Simple | Oval (single line) | age, gender |
| Composite | Oval with sub-ovals | Address(street, city, zip) |
| Multi-valued | Double oval | phone_numbers, hobbies |
| Derived | Dashed oval | age (from DOB) |
| Key | Underlined text in oval | emp_id |
| Stored | Solid oval | date_of_birth |
| Cardinality | Notation | Example |
|---|---|---|
| 1:1 | Arrowheads toward both entities | Person HAS Passport |
| 1:N (One-to-Many) | Arrowhead toward the "one" side | Department HAS Employees |
| M:N (Many-to-Many) | Plain lines (no arrowheads) on either side | Students TAKE Courses |
| Total Participation | Double line to entity | Every Employee MUST belong to a Dept |
| Partial Participation | Single line to entity | Some Employees may manage a Dept |
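These cardinalities map directly onto relational schemas: a 1:N relationship becomes a foreign key on the "many" side, while M:N requires a separate junction table with a composite key. A minimal sketch using Python's sqlite3 (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# 1:N -- Department HAS Employees: the FK lives on the "many" (Employee) side
cur.execute("CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("""CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES Department(dept_id))""")

# M:N -- Students TAKE Courses: needs a junction table with a composite PK
cur.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE Course  (course_id TEXT PRIMARY KEY, title TEXT)")
cur.execute("""CREATE TABLE Takes (
    roll_no   INTEGER REFERENCES Student(roll_no),
    course_id TEXT    REFERENCES Course(course_id),
    PRIMARY KEY (roll_no, course_id))""")

cur.execute("INSERT INTO Student VALUES (101, 'Alice')")
cur.execute("INSERT INTO Course VALUES ('C1', 'DBMS'), ('C2', 'OS')")
cur.execute("INSERT INTO Takes VALUES (101, 'C1'), (101, 'C2')")  # one student, many courses
print(cur.execute("SELECT COUNT(*) FROM Takes").fetchone()[0])
```

The composite primary key on Takes also enforces that a student cannot enroll in the same course twice.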
| Symbol | Represents | Description |
|---|---|---|
| Rectangle | Entity / Entity Set | Strong entity with independent existence |
| Double Rectangle | Weak Entity Set | Depends on identifying/owner entity |
| Diamond | Relationship Set | Association between entities |
| Double Diamond | Identifying Relationship | Links weak entity to owner entity |
| Oval | Attribute | Property of an entity or relationship |
| Double Oval | Multi-valued Attribute | Can hold multiple values (e.g., phones) |
| Dashed Oval | Derived Attribute | Value computed from other attributes |
| Underlined Oval | Key Attribute | Uniquely identifies entity instance |
| Double Line | Total Participation | Every entity must participate |
| Arrow / Directed Line | Cardinality | Points toward the "one" side of a 1:N relationship |
| Dashed Underline | Partial Key (Discriminator) | Identifies weak entities within one owner |
╔══════════════════════════════════════════════════════════╗
║ ER Diagram Example: University Database ║
║ ║
║ ┌──────────┐ ┌──────────┐ ┌──────────┐ ║
║ │ Student │──────│ Enrolls │──────│ Course │ ║
║ └──────────┘ └──────────┘ └──────────┘ ║
║ (roll_no) PK (course_id) PK ║
║ (name) (title) ║
║ (phone) [multi] (credits) ║
║ (age) [derived] (dept_name) FK ║
║ ║
║ ┌──────────┐ ┌──────────┐ ║
║ │ Dept │──────│ Has │──┐ ║
║ └──────────┘ └──────────┘ │ ║
║ (dept_id) PK │ ║
║ (d_name) v ║
║ (location) ┌──────────┐ ║
║ (budget) │Instructor│ ║
║ └──────────┘ ║
║ Cardinalities: (emp_id) PK ║
║ Student-Enrolls: M:N (name) ║
║ Dept-Has-Instructor: 1:N (salary) ║
║ Instructor-Teaches: M:N ║
╚══════════════════════════════════════════════════════════╝
| Type | Description | Example |
|---|---|---|
| Generalization | Bottom-up approach: identify common features from specific entities | {Student, Employee} -> Person (shared attributes lifted up) |
| Specialization | Top-down approach: break a general entity into specific ones | Account -> Savings_Account, Current_Account |
| Overlapping | Entity can belong to multiple specializations | Person can be both Student AND Employee |
| Disjoint | Entity belongs to exactly one specialization | Customer is EITHER Individual OR Corporate |
| Total | Every entity must belong to at least one specialization | Every Account must be Savings or Current |
| Partial | Some entities may not belong to any specialization | Some People are neither Student nor Employee |
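A common way to map a specialization hierarchy to tables is to give each subclass its own table whose primary key is also a foreign key to the superclass. A sketch using the Account example above (sqlite3, illustrative attribute names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Superclass table holds the common attributes
cur.execute("CREATE TABLE Account (acc_no INTEGER PRIMARY KEY, balance REAL)")
# Each subclass table shares the superclass key (1:1 with its parent row)
cur.execute("""CREATE TABLE Savings_Account (
    acc_no INTEGER PRIMARY KEY REFERENCES Account(acc_no),
    interest_rate REAL)""")
cur.execute("""CREATE TABLE Current_Account (
    acc_no INTEGER PRIMARY KEY REFERENCES Account(acc_no),
    overdraft_limit REAL)""")

cur.execute("INSERT INTO Account VALUES (1, 5000.0)")
cur.execute("INSERT INTO Savings_Account VALUES (1, 0.04)")

# Reassembling a full Savings account joins subclass and superclass on the key
row = cur.execute("""SELECT a.balance, s.interest_rate
                     FROM Account a JOIN Savings_Account s USING (acc_no)""").fetchone()
print(row)
```

For a disjoint, total specialization an extra `acc_type` column on Account plus CHECK constraints is often added; for overlapping specializations a row may appear in several subclass tables.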
A weak entity's primary key combines the owner's key with its partial key: Dependent(partial_key: dependent_name) owned by Employee(emp_id) gives PK = (emp_id, dependent_name).
| Key Type | Definition | Example |
|---|---|---|
| Super Key | Set of attributes that uniquely identifies a tuple | {emp_id}, {emp_id, name}, {emp_id, dept, age} |
| Candidate Key | Minimal super key (no proper subset is a super key) | {emp_id}, {email} (if email is unique) |
| Primary Key | Chosen candidate key; identifies tuples uniquely | emp_id (chosen from candidate keys) |
| Foreign Key | References primary key of another table | dept_id in Employee refs Dept(dept_id) |
| Composite Key | Key consisting of 2+ attributes | {emp_id, project_id} in Works_On |
| Alternate Key | Candidate keys not chosen as primary key | email (when emp_id is PK) |
| Surrogate Key | System-generated artificial key | auto-increment id columns |
Candidate Key ⊊ Super Key: Every candidate key is a super key, but not vice-versa. A super key may have extra (redundant) attributes. A candidate key is minimal.
Primary Key ⊊ Candidate Key: The primary key is one specific candidate key chosen for the table. All other candidate keys become alternate keys.
Foreign Key: Creates a relationship between two tables. May or may not be unique. Can be NULL (unless NOT NULL is specified). Must match a value in the referenced primary key or be NULL.
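The foreign key rule above ("must match a referenced primary key value or be NULL") can be demonstrated with sqlite3, which enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection (a sketch; names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite requires this per connection
cur = con.cursor()
cur.execute("CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("""CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES Dept(dept_id))""")

cur.execute("INSERT INTO Dept VALUES (101, 'CS')")
cur.execute("INSERT INTO Employee VALUES (1, 'Alice', 101)")   # matches Dept PK: OK
cur.execute("INSERT INTO Employee VALUES (2, 'Bob', NULL)")    # NULL FK: OK

try:
    cur.execute("INSERT INTO Employee VALUES (3, 'Eve', 999)") # no such dept_id
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # the dangling reference is refused
```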
| Operation | Symbol | Description | Example |
|---|---|---|---|
| Select | σ (sigma) | Returns rows matching condition | σage>30(Student) |
| Project | π (pi) | Returns specific columns | πname,salary(Employee) |
| Union | ∪ | All tuples from both relations (no dupes) | R ∪ S |
| Intersection | ∩ | Common tuples in both relations | R ∩ S |
| Difference | − | Tuples in R but not in S | R − S |
| Cartesian Product | × | All combinations of tuples | R × S |
| Join | ⋈ (bowtie) | Combine tuples matching a condition | R ⋈ R.dept=S.dept S |
| Natural Join | ⋈* | Join on all common attributes (auto-match) | R ⋈* S |
| Left Outer Join | ⟕ | All from R + matching from S (NULL fill) | R ⟕ S |
| Right Outer Join | ⟖ | All from S + matching from R (NULL fill) | R ⟖ S |
| Full Outer Join | ⟗ | All tuples from both (NULL fill) | R ⟗ S |
| Rename | ρ (rho) | Rename a relation or attributes | ρEmp(Employee) renames Employee to Emp |
-- Example Tables
-- Student(roll_no, name, age, dept)
-- Course(course_id, title, credits, dept)
-- Enrollment(roll_no, course_id, grade)
-- 1. SELECT: Find students older than 20
-- σage>20(Student)
-- 2. PROJECT: Get names and departments of all students
-- πname,dept(Student)
-- 3. SELECT + PROJECT: Names of CS students
-- πname(σdept='CS'(Student))
-- 4. NATURAL JOIN: Students with their enrollments
-- Student ⋈* Enrollment
-- 5. EQUI-JOIN: Students enrolled in 'DBMS'
-- πname(Student ⋈ Student.roll_no=Enrollment.roll_no (Enrollment ⋈ Enrollment.course_id=Course.course_id σtitle='DBMS'(Course)))
-- Or using Natural Join:
-- πname(Student ⋈* Enrollment ⋈* (σtitle='DBMS'(Course)))
-- 6. UNION: All departments from Student and Course
-- πdept(Student) ∪ πdept(Course)
-- 7. INTERSECTION: Depts that have both students and courses
-- πdept(Student) ∩ πdept(Course)
-- 8. DIFFERENCE: Depts with students but no courses
-- πdept(Student) − πdept(Course)
-- 9. CARTESIAN PRODUCT: Every student with every course
-- Student × Course
-- 10. LEFT OUTER JOIN: All students, with course info if enrolled
-- Student ⟕ Enrollment
-- 11. DIVISION: Students enrolled in ALL courses
-- πroll_no,course_id(Enrollment) ÷ πcourse_id(Course)
-- SQL equivalent: GROUP BY ... HAVING COUNT(*) = (SELECT COUNT(*) FROM Course)
| Sub-language | Purpose | Commands |
|---|---|---|
| DDL | Define/modify DB structure | CREATE, ALTER, DROP, TRUNCATE, RENAME |
| DML | Manipulate data | INSERT, UPDATE, DELETE |
| DQL | Query data | SELECT |
| DCL | Control access/permissions | GRANT, REVOKE |
| TCL | Manage transactions | COMMIT, ROLLBACK, SAVEPOINT |
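The relational algebra operators listed above can be modeled directly with Python lists of dict-rows, including the division example. A minimal sketch with illustrative data:

```python
Student = [
    {"roll_no": 101, "name": "Alice", "dept": "CS"},
    {"roll_no": 102, "name": "Bob",   "dept": "EE"},
]
Enrollment = [
    {"roll_no": 101, "course_id": "C1"},
    {"roll_no": 101, "course_id": "C2"},
    {"roll_no": 102, "course_id": "C1"},
]
Course = [{"course_id": "C1"}, {"course_id": "C2"}]

def select(rel, pred):                 # sigma_pred(rel)
    return [r for r in rel if pred(r)]

def project(rel, attrs):               # pi_attrs(rel), duplicates removed
    seen = {tuple(r[a] for a in attrs) for r in rel}
    return [dict(zip(attrs, t)) for t in seen]

def natural_join(r, s):                # r join s on all common attributes
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

def divide(r, s):
    """r / s: values of r's extra attributes paired with EVERY tuple of s."""
    s_attrs = list(s[0])
    r_attrs = [a for a in r[0] if a not in s_attrs]
    s_set = {tuple(t[a] for a in s_attrs) for t in s}
    out = []
    for cand in project(r, r_attrs):
        related = {tuple(t[a] for a in s_attrs) for t in r
                   if all(t[x] == cand[x] for x in r_attrs)}
        if s_set <= related:           # related to ALL tuples of s
            out.append(cand)
    return out

cs_names = project(select(Student, lambda r: r["dept"] == "CS"), ["name"])
print(cs_names)                        # names of CS students
print(divide(Enrollment, Course))      # students enrolled in ALL courses
```

Note that `project` removes duplicates (true relation semantics), unlike SQL's `SELECT` without `DISTINCT`.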
| Type | Description | Example |
|---|---|---|
| INT / INTEGER | Whole numbers | 42 |
| FLOAT / DOUBLE | Decimal numbers | 3.14 |
| DECIMAL(p,s) | Fixed-point number | DECIMAL(10,2) = 99999999.99 |
| VARCHAR(n) | Variable-length string | VARCHAR(255) |
| CHAR(n) | Fixed-length string | CHAR(10) |
| DATE | Date (YYYY-MM-DD) | 2024-01-15 |
| TIMESTAMP | Date + time | 2024-01-15 10:30:00 |
| BOOLEAN | True/False | TRUE |
| TEXT / BLOB | Large text / binary data | Article body |
| ENUM | Set of allowed values | ENUM('active','inactive') |
-- ═══ DDL: Data Definition Language ═══
-- CREATE TABLE with all constraints
CREATE TABLE Employee (
emp_id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
age INT CHECK (age >= 18),
salary DECIMAL(12,2) DEFAULT 50000.00,
dept_id INT REFERENCES Department(dept_id),
hire_date DATE DEFAULT CURRENT_DATE,
CONSTRAINT chk_salary CHECK (salary > 0)
);
-- CREATE TABLE with composite primary key
CREATE TABLE Works_On (
emp_id INT REFERENCES Employee(emp_id),
project_id INT REFERENCES Project(project_id),
hours INT CHECK (hours > 0),
PRIMARY KEY (emp_id, project_id)
);
-- ALTER TABLE
ALTER TABLE Employee ADD COLUMN phone VARCHAR(20);
ALTER TABLE Employee ALTER COLUMN salary TYPE DECIMAL(15,2);
ALTER TABLE Employee DROP COLUMN phone;
ALTER TABLE Employee ADD CONSTRAINT uk_email UNIQUE (email);
ALTER TABLE Employee DROP CONSTRAINT chk_salary;
-- DROP TABLE
DROP TABLE Employee; -- fails if referenced
DROP TABLE Employee CASCADE; -- drops dependents too
DROP TABLE IF EXISTS Employee;
-- TRUNCATE (faster than DELETE, resets identity)
TRUNCATE TABLE Employee;
-- ═══ DML: Data Manipulation Language ═══
-- INSERT
INSERT INTO Employee (emp_id, name, email, salary, dept_id)
VALUES (1, 'Alice', 'alice@co.com', 75000, 101);
INSERT INTO Employee VALUES
(2, 'Bob', 'bob@co.com', 60000, 102),
(3, 'Carol', 'carol@co.com', 85000, 101);
INSERT INTO Employee (emp_id, name, email)
VALUES (4, 'Dave', 'dave@co.com'); -- unlisted columns use their DEFAULTs
-- INSERT from SELECT
INSERT INTO Archived_Employee SELECT * FROM Employee WHERE hire_date < '2020-01-01';
-- UPDATE
UPDATE Employee SET salary = salary * 1.10 WHERE dept_id = 101;
UPDATE Employee SET salary = 90000, dept_id = 103 WHERE emp_id = 2;
-- UPDATE with subquery
UPDATE Employee SET dept_id = (
SELECT dept_id FROM Department WHERE dept_name = 'Marketing'
) WHERE emp_id = 3;
-- DELETE
DELETE FROM Employee WHERE emp_id = 99;
DELETE FROM Employee WHERE salary < 30000;
-- DELETE with subquery
DELETE FROM Enrollment
WHERE student_id IN (
SELECT emp_id FROM Employee WHERE dept_id IS NULL
);
-- ═══ DQL: Data Query Language ═══
-- Basic SELECT
SELECT name, salary FROM Employee WHERE dept_id = 101;
-- DISTINCT
SELECT DISTINCT dept_id FROM Employee;
-- ORDER BY
SELECT * FROM Employee ORDER BY salary DESC, name ASC;
-- LIMIT / OFFSET (pagination)
SELECT * FROM Employee ORDER BY emp_id LIMIT 10 OFFSET 20;
-- Aggregate Functions
SELECT
COUNT(*) AS total_emp,
COUNT(DISTINCT dept_id) AS num_depts,
SUM(salary) AS total_salary,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary
FROM Employee;
-- GROUP BY
SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM Employee
GROUP BY dept_id
HAVING COUNT(*) > 5
ORDER BY avg_sal DESC;
-- JOINs
-- INNER JOIN: matching rows from both tables
SELECT e.name, d.dept_name FROM Employee e
INNER JOIN Department d ON e.dept_id = d.dept_id;
-- LEFT JOIN: all from left + matching from right
SELECT e.name, d.dept_name FROM Employee e
LEFT JOIN Department d ON e.dept_id = d.dept_id;
-- RIGHT JOIN: all from right + matching from left
SELECT e.name, d.dept_name FROM Employee e
RIGHT JOIN Department d ON e.dept_id = d.dept_id;
-- FULL OUTER JOIN: all from both
SELECT e.name, d.dept_name FROM Employee e
FULL OUTER JOIN Department d ON e.dept_id = d.dept_id;
-- SELF JOIN: join table with itself
SELECT e1.name AS employee, e2.name AS manager
FROM Employee e1
LEFT JOIN Employee e2 ON e1.manager_id = e2.emp_id;
-- CROSS JOIN: Cartesian product
SELECT * FROM Employee CROSS JOIN Department;
-- SUBQUERY in WHERE
SELECT * FROM Employee
WHERE salary > (SELECT AVG(salary) FROM Employee);
-- Correlated Subquery
SELECT * FROM Employee e
WHERE salary > (SELECT AVG(salary) FROM Employee WHERE dept_id = e.dept_id);
-- EXISTS
SELECT * FROM Department d
WHERE EXISTS (
SELECT 1 FROM Employee e WHERE e.dept_id = d.dept_id
);
-- NOT EXISTS (division-like)
SELECT s.name FROM Student s
WHERE NOT EXISTS (
SELECT c.course_id FROM Course c
WHERE NOT EXISTS (
SELECT 1 FROM Enrollment e
WHERE e.student_id = s.roll_no AND e.course_id = c.course_id
)
);
-- IN / ANY / ALL
SELECT * FROM Employee WHERE dept_id IN (101, 102, 103);
SELECT * FROM Employee WHERE salary > ANY (SELECT salary FROM Manager);
SELECT * FROM Employee WHERE salary > ALL (SELECT salary FROM Intern);
-- UNION / INTERSECT / EXCEPT
SELECT name FROM Employee
UNION
SELECT name FROM Manager;
-- CASE expression
SELECT name, salary,
CASE
WHEN salary > 100000 THEN 'Senior'
WHEN salary > 60000 THEN 'Mid-Level'
ELSE 'Junior'
END AS level
FROM Employee;
-- WINDOW Functions
SELECT name, dept_id, salary,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank,
SUM(salary) OVER (PARTITION BY dept_id) AS dept_total,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS global_rank
FROM Employee;
-- ═══ VIEWS ═══
CREATE VIEW EmpView AS
SELECT e.name, d.dept_name, e.salary
FROM Employee e JOIN Department d ON e.dept_id = d.dept_id
WHERE e.salary > 50000;
CREATE VIEW DeptStats AS
SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM Employee GROUP BY dept_id;
-- Updatable View: must reference one table, no aggregate/group by
CREATE VIEW EmpSalary AS
SELECT name, salary FROM Employee WHERE dept_id = 101;
INSERT INTO EmpSalary VALUES ('New', 60000); -- inserts into Employee (fails if hidden columns are NOT NULL without defaults)
-- ═══ INDEXES ═══
CREATE INDEX idx_emp_dept ON Employee(dept_id);
CREATE UNIQUE INDEX idx_emp_email ON Employee(email);
CREATE INDEX idx_emp_name ON Employee(name, salary); -- composite
DROP INDEX idx_emp_dept;
-- Index types:
-- B-Tree (default): good for range queries
-- Hash: good for exact match (no range)
-- GIN: good for full-text search, arrays (PostgreSQL)
-- ═══ CONSTRAINTS ═══
-- Column-level
CREATE TABLE T (
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
age INT CHECK (age >= 0),
dept_id INT REFERENCES Dept(dept_id)
ON DELETE CASCADE
ON UPDATE SET NULL
);
-- Table-level constraints
CREATE TABLE Enrollment (
student_id INT,
course_id INT,
grade CHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES Student(roll_no),
FOREIGN KEY (course_id) REFERENCES Course(course_id),
CHECK (grade IN ('A','B','C','D','F'))
);
-- Referential Actions:
-- ON DELETE/UPDATE: CASCADE | SET NULL | SET DEFAULT | RESTRICT | NO ACTION
A VIEW is a stored query (virtual table) — always shows latest data. A MATERIALIZED VIEW stores the result physically — faster reads but must be REFRESHed to stay current. Use materialized views for complex, expensive queries that don't need real-time data.
| Concept | Definition | Example |
|---|---|---|
| Functional Dependency (FD) | X -> Y means Y is determined by X | roll_no -> name, dept_id |
| Full FD | X -> Y, no proper subset of X determines Y | (roll_no, course_id) -> grade |
| Partial FD | X -> Y where only part of X determines Y | (roll_no, course_id) -> name (roll_no alone determines name) |
| Transitive FD | X -> Y and Y -> Z, so X -> Z indirectly | emp_id -> dept_id, dept_id -> dept_name => emp_id -> dept_name |
| Closure (X+) | Set of all attributes functionally determined by X | If A->B, B->C, then (A)+ = {A,B,C} |
| Minimal Cover | Simplified equivalent set of FDs | Remove redundant FDs, extraneous attributes |
| Lossless Join | Joining decomposed tables recovers original exactly | R1 ⋈ R2 = R with no spurious tuples |
| Dependency Preservation | All original FDs checkable on decomposed tables | Every FD in F+ is in closure of union of F_i |
| Prime Attribute | Part of any candidate key | Attributes in at least one candidate key |
| Non-prime Attribute | Not part of any candidate key | Attributes not in any candidate key |
| Normal Form | Condition | Handles |
|---|---|---|
| 1NF | All attributes have atomic (indivisible) values; no repeating groups | Multi-valued attributes, composite cells |
| 2NF | In 1NF + no partial dependencies (no non-prime partially dependent on PK) | Partial dependencies on composite keys |
| 3NF | In 2NF + no transitive dependencies (non-prime -> non-prime) | Transitive dependencies |
| BCNF | For every FD X->A, X must be a super key (stronger than 3NF) | All anomalies, even with prime attributes |
| 4NF | In BCNF + no multi-valued dependencies (independent multi-valued attributes) | Multi-valued dependencies |
| 5NF | In 4NF + no join dependencies (cannot be decomposed further without loss) | Join dependencies |
╔══════════════════════════════════════════════════════════════╗
║ STEP-BY-STEP NORMALIZATION EXAMPLE ║
║ Scenario: Student Course Registration ║
╠══════════════════════════════════════════════════════════════╣
UN-NORMALIZED TABLE:
┌─────────────────────────────────────────────────────────────┐
│ Student(roll_no, name, dept, dept_head, courses_taken) │
│ │
│ courses_taken = [(course_id, title, instructor)] │
│ This is a REPEATING GROUP (non-atomic) │
└─────────────────────────────────────────────────────────────┘
Sample Data (0NF):
┌────────┬────────┬──────┬──────────┬──────────────────────────┐
│roll_no │ name │ dept │ dept_head│ courses_taken │
├────────┼────────┼──────┼──────────┼──────────────────────────┤
│ 101 │ Alice │ CS │ Dr.Smith │(C1, DBMS, Prof.J), │
│ │ │ │ │(C2, OS, Prof.K) │
│ 102 │ Bob │ EE │ Dr.Patel │(C1, DBMS, Prof.J) │
└────────┴────────┴──────┴──────────┴──────────────────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: Convert to 1NF ──────────────────────────────────────
Rule: Eliminate repeating groups; make all cells atomic.
Each row must have a single value per column.
Result (1NF):
┌────────┬────────┬──────┬──────────┬────────┬──────────┬───────────┐
│roll_no │ name │ dept │ dept_head│course_id│ title │ instructor│
├────────┼────────┼──────┼──────────┼────────┼──────────┼───────────┤
│ 101 │ Alice │ CS │ Dr.Smith │ C1 │ DBMS │ Prof.J │
│ 101 │ Alice │ CS │ Dr.Smith │ C2 │ OS │ Prof.K │
│ 102 │ Bob │ EE │ Dr.Patel │ C1 │ DBMS │ Prof.J │
└────────┴────────┴──────┴──────────┴────────┴──────────┴───────────┘
PK = (roll_no, course_id) [Composite Key]
FDs identified:
roll_no -> name, dept, dept_head (partial dependency!)
course_id -> title, instructor (partial dependency!)
dept -> dept_head (transitive dependency!)
Problems in 1NF: Redundancy, update anomalies, insertion anomalies
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 2: Convert to 2NF ──────────────────────────────────────
Rule: Remove partial dependencies.
Non-prime attributes must depend on the ENTIRE PK, not part of it.
Decomposition:
R1(roll_no, name, dept, dept_head) PK: roll_no
R2(course_id, title, instructor) PK: course_id
R3(roll_no, course_id) PK: (roll_no, course_id)
Result (2NF):
┌────────┬────────┬──────┬──────────┐
│roll_no │ name │ dept │ dept_head│
├────────┼────────┼──────┼──────────┤
│ 101 │ Alice │ CS │ Dr.Smith │
│ 102 │ Bob │ EE │ Dr.Patel │
└────────┴────────┴──────┴──────────┘
┌─────────┬──────────┬────────────┐
│course_id│ title │ instructor │
├─────────┼──────────┼────────────┤
│ C1 │ DBMS │ Prof.J │
│ C2 │ OS │ Prof.K │
└─────────┴──────────┴────────────┘
┌────────┬───────────┐
│roll_no │ course_id │
├────────┼───────────┤
│ 101 │ C1 │
│ 101 │ C2 │
│ 102 │ C1 │
└────────┴───────────┘
Remaining problem: In R1, dept_head depends on dept, NOT roll_no.
dept -> dept_head is a TRANSITIVE dependency.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 3: Convert to 3NF ──────────────────────────────────────
Rule: Remove transitive dependencies.
Non-prime attributes must depend ONLY on the primary key (directly).
Decompose R1:
R1a(roll_no, name, dept) PK: roll_no, FK: dept
R1b(dept, dept_head) PK: dept
Result (3NF):
┌────────┬────────┬──────┐ ┌────────┬──────────┐
│roll_no │ name │ dept │ │ dept │ dept_head│
├────────┼────────┼──────┤ ├────────┼──────────┤
│ 101 │ Alice │ CS │ │ CS │ Dr.Smith │
│ 102 │ Bob │ EE │ │ EE │ Dr.Patel │
└────────┴────────┴──────┘ └────────┴──────────┘
Final 3NF Schema:
Student(roll_no, name, dept) PK: roll_no
Department(dept, dept_head) PK: dept
Course(course_id, title, instructor) PK: course_id
Enrollment(roll_no, course_id) PK: (roll_no, course_id)
All tables in 3NF:
- No repeating groups (1NF) ✓
- No partial dependencies (2NF) ✓
- No transitive dependencies (3NF) ✓
- Lossless join decomposition ✓
- Dependency preserving ✓
| Property | 3NF | BCNF |
|---|---|---|
| Definition | X->A: either X is super key OR A is prime | X->A: X MUST be a super key |
| Stricter | Less strict | More strict (stronger) |
| Dependency Preservation | Always achievable | May NOT be dependency preserving |
| Lossless Join | Always achievable | Always achievable |
| Redundancy | May retain some FD-based redundancy | Removes all FD-based redundancy (MVDs can still cause redundancy) |
| Example Violation | R(A,B,C) with AB->C, C->B: C->B is allowed (B is prime) | Same R: C is not a super key, so C->B violates BCNF (candidate keys: AB, AC) |
╔══════════════════════════════════════════════════════════════╗
║ ATTRIBUTE CLOSURE & MINIMAL COVER ║
╠══════════════════════════════════════════════════════════════╣
ATTRIBUTE CLOSURE (X+):
Given R(A,B,C,D,E) and F = { A->B, B->C, AD->E }
Compute (A)+:
Start: (A)+ = { A }
A->B: (A)+ = { A, B }
B->C: (A)+ = { A, B, C }
No more apply. Final: { A, B, C }
Compute (AD)+:
Start: (AD)+ = { A, D }
A->B: (AD)+ = { A, B, D }
B->C: (AD)+ = { A, B, C, D }
AD->E: (AD)+ = { A, B, C, D, E }
Final: { A, B, C, D, E } = all attributes
Candidate Keys: AD is a super key. Is any proper subset one too? (A)+ = {A,B,C} — no. (D)+ = {D} — no.
So AD is the only candidate key. Prime attributes: A, D. Non-prime: B, C, E.
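The closure computation above is a short fixed-point loop. A sketch that reproduces both closures and also serves as the redundancy test used for minimal covers:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already determined, absorb the RHS
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = { A->B, B->C, AD->E } from the worked example
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A", "D"}, {"E"})]

print(sorted(closure({"A"}, F)))        # (A)+
print(sorted(closure({"A", "D"}, F)))   # (AD)+ = all attributes, so AD is a super key

def is_redundant(fd, fds):
    """An FD is redundant if the remaining FDs already imply it."""
    rest = [g for g in fds if g is not fd]
    return set(fd[1]) <= closure(fd[0], rest)
```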
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
MINIMAL COVER (Canonical Cover):
Given F = { AB->C, C->BE, B->E }
Step 1: Decompose RHS (split):
F = { AB->C, C->B, C->E, B->E }
Step 2: Remove redundant FDs:
  - Remove AB->C: check if (AB)+ in remaining = {A, B, E}
    C is not in the closure. AB->C is NOT redundant. Keep.
- Remove C->B: check if (C)+ in remaining = {C,E}
B not in closure. C->B is NOT redundant. Keep.
  - Remove C->E: check (C)+ in remaining {AB->C, C->B, B->E}
    (C)+ = {C, B, E} (C->B, then B->E). E IS in the closure.
    C->E IS redundant. Remove it. F = { AB->C, C->B, B->E }
  - Remove B->E: check (B)+ in remaining {AB->C, C->B}
    (B)+ = {B}. E not in closure. B->E is NOT redundant. Keep.
Step 3: Remove extraneous attributes from LHS:
  - AB->C: Is A extraneous? Check if C ∈ (B)+ under F = {AB->C, C->B, B->E}
    (B)+ = {B, E} (only B->E applies). C is NOT in it, so A is NOT extraneous.
    Is B extraneous? (A)+ = {A}; C is not in it, so B is NOT extraneous either.
Minimal Cover: { AB->C, C->B, B->E }
Check: it still implies C->E (via C->B then B->E), so it is equivalent to F.
╔══════════════════════════════════════════════════════════════╗
║ LOSSLESS JOIN DECOMPOSITION CHECK ║
╠══════════════════════════════════════════════════════════════╣
For R decomposed into R1 and R2:
Condition: R1 ∩ R2 -> R1 OR R1 ∩ R2 -> R2
That is, the common attributes between R1 and R2 must
determine ALL attributes of at least one relation.
Example: R(A,B,C,D,E) decomposed into:
R1(A,B,C) and R2(A,D,E)
R1 ∩ R2 = {A} (common attribute)
Check: A -> {A,B,C} (i.e., A is a key for R1)?
If A -> BC holds in F, then this is a lossless decomposition.
If A -> BC does NOT hold, check: A -> {A,D,E} (key for R2)?
If A -> DE holds in F, then it is lossless.
If NEITHER holds, the decomposition is LOSSY (not allowed).
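The pairwise condition above is mechanical once attribute closure is available. A sketch, with attribute sets written as strings of single letters and an assumed FD set F:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2)+ must cover all of R1 or all of R2."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c

# R(A,B,C,D,E) decomposed into R1(A,B,C) and R2(A,D,E), with F = { A->BC }
F = [("A", "BC")]
print(lossless("ABC", "ADE", F))  # common = {A}, and A -> BC makes A a key of R1
print(lossless("BC", "ADE", F))   # common attributes are empty: lossy
```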
For multi-table decomposition R = R1 ∪ R2 ∪ ... ∪ Rn:
Apply lossless check pairwise or use the common attribute
  graph / chase algorithm.
| ACID Property | Description | Example |
|---|---|---|
| Atomicity | Transaction is all-or-nothing; cannot be partially completed | Transfer $100: debit AND credit must both succeed or both fail |
| Consistency | Transaction takes DB from one valid state to another | Total balance before = after (debit + credit = 0) |
| Isolation | Concurrent transactions do not interfere with each other | Two users transferring money simultaneously see correct results |
| Durability | Once committed, changes survive system failures | After COMMIT, power failure does not lose the transaction |
| State | Description | Trigger |
|---|---|---|
| Active | Transaction executing normally | After BEGIN TRANSACTION |
| Partially Committed | Last statement executed, awaiting final commit | After last SQL statement |
| Committed | Transaction successfully completed | After COMMIT (survives failures) |
| Failed | Transaction cannot continue | Error during execution |
| Aborted | Transaction rolled back, changes undone | After ROLLBACK |
| Terminated | Transaction left the system | After commit or abort cleanup |
-- ═══ Transaction in SQL ═══
BEGIN TRANSACTION;
-- Debit Account A
UPDATE Account SET balance = balance - 100
WHERE account_id = 'A';
-- Credit Account B
UPDATE Account SET balance = balance + 100
WHERE account_id = 'B';
-- Check constraint: neither account should go negative
-- If balance would be negative, this triggers an error
-- and the transaction is rolled back
COMMIT; -- both updates permanent
-- If error occurs:
-- ROLLBACK; -- both updates undone
-- ═══ Savepoints ═══
BEGIN TRANSACTION;
UPDATE Account SET balance = balance - 50 WHERE account_id = 'A';
SAVEPOINT sp1;
UPDATE Account SET balance = balance - 30 WHERE account_id = 'A';
SAVEPOINT sp2;
-- If second debit was wrong, rollback to sp1 (keep first debit)
ROLLBACK TO sp1;
COMMIT; -- only first debit is permanent
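The savepoint flow above can be reproduced with Python's sqlite3 module (a sketch; the connection is put in autocommit mode so that BEGIN/COMMIT are issued explicitly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None  # autocommit; we manage the transaction manually
cur = con.cursor()
cur.execute("CREATE TABLE Account (account_id TEXT PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO Account VALUES ('A', 100)")

cur.execute("BEGIN")
cur.execute("UPDATE Account SET balance = balance - 50 WHERE account_id = 'A'")
cur.execute("SAVEPOINT sp1")
cur.execute("UPDATE Account SET balance = balance - 30 WHERE account_id = 'A'")
cur.execute("ROLLBACK TO sp1")   # undo only the second debit
cur.execute("COMMIT")            # first debit becomes permanent

print(cur.execute(
    "SELECT balance FROM Account WHERE account_id='A'").fetchone()[0])
```

After the commit the balance reflects only the first debit, exactly as in the SQL example above.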
-- ═══ Isolation Levels ═══
-- SET TRANSACTION ISOLATION LEVEL <level>;
-- READ UNCOMMITTED: dirty reads allowed (lowest isolation)
-- READ COMMITTED: only committed data visible (prevents dirty read)
-- REPEATABLE READ: same query returns same rows (prevents non-repeatable read)
-- SERIALIZABLE: fully isolated, transactions appear sequential (prevents phantom read)
| Type | Definition | Method |
|---|---|---|
| Conflict Serializable | Schedule can be reordered to a serial schedule by swapping non-conflicting ops | Check if precedence graph is acyclic (DAG) |
| View Serializable | Same initial reads, same writes, same final writes as some serial schedule | More complex; all conflict-serializable are view-serializable but not vice versa |
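The precedence-graph test for conflict serializability is easy to sketch: build an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, then check for a cycle. Schedules are given here as (txn, op, item) triples (illustrative):

```python
def precedence_edges(schedule):
    """Edge (Ti, Tj) if an op of Ti conflicts with a LATER op of Tj.
    Conflict: same item, different txns, at least one write."""
    edges = set()
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            t1, op1, x1 = schedule[i]
            t2, op2, x2 = schedule[j]
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """DFS cycle detection on the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, stack = set(), set()
    def dfs(n):
        visited.add(n); stack.add(n)
        for m in graph.get(n, ()):
            if m in stack or (m not in visited and dfs(m)):
                return True
        stack.discard(n)
        return False
    return any(dfs(n) for n in graph if n not in visited)

# s1: T1 finishes before T2 touches X -- equivalent to serial T1, T2
s1 = [("T1","R","X"), ("T1","W","X"), ("T2","R","X"), ("T2","W","X")]
# s2: the classic lost-update interleaving -- edges in both directions
s2 = [("T1","R","X"), ("T2","R","X"), ("T1","W","X"), ("T2","W","X")]
print(has_cycle(precedence_edges(s1)))  # acyclic -> conflict serializable
print(has_cycle(precedence_edges(s2)))  # cycle   -> not conflict serializable
```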
| Method | How It Works | Pros/Cons |
|---|---|---|
| 2PL (Two-Phase Locking) | Growing phase (acquire locks) then Shrinking phase (release locks) | Simple, guarantees serializability; may cause deadlocks |
| Timestamp Ordering | Assign timestamps; older TXN wins conflicts | No deadlocks; may cause starvation |
| Optimistic (OCC) | Read freely, validate before commit, abort if conflict | Best for low-conflict workloads; wastes work on abort |
| Multi-version (MVCC) | Each TXN sees a snapshot of data | Readers never block writers; PostgreSQL uses this |
| Technique | Description | Use Case |
|---|---|---|
| Undo Log | Records before-images; used to reverse uncommitted changes | Rollback aborted transactions |
| Redo Log | Records after-images; used to replay committed changes | Recover committed transactions after crash |
| Undo/Redo Log | Records both before and after images | Full recovery support (ARIES uses this) |
| WAL (Write-Ahead Logging) | Log must be written to stable storage BEFORE data is written | Ensures atomicity/durability; committed changes survive crashes |
| Checkpointing | Periodically flush dirty pages and write checkpoint record | Reduces recovery time (less log to replay) |
| ARIES | Advanced Recovery: WAL + steal/no-force + 3-pass recovery (analysis, redo, undo) | Industry standard (IBM DB2, SQL Server, PostgreSQL) |
╔══════════════════════════════════════════════════════════════╗
║ ARIES Recovery Algorithm (3 Passes) ║
╠══════════════════════════════════════════════════════════════╣
PRE-RECOVERY CONCEPTS:
- Steal Policy: Uncommitted pages CAN be written to disk (flushed)
- No-Force Policy: Committed pages DON'T have to be written to disk
- This maximizes buffer utilization but requires careful recovery
LOG RECORD FORMAT: [LSN, TXN_ID, PageID, UndoInfo, RedoInfo, PrevLSN]
PASS 1 — ANALYSIS:
Purpose: Find dirty pages and active transactions at crash time.
Steps:
1. Start from last checkpoint record.
2. Scan forward through log.
3. Build:
- Dirty Page Table (DPT): pages modified since checkpoint
- Transaction Table (TT): transactions and their status
- Active transactions at crash -> added to loser list (need undo)
PASS 2 — REDO:
Purpose: Replay ALL logged changes to restore DB to crash state.
Steps:
1. Start from earliest LSN in DPT.
2. Scan forward to end of log.
3. For each redo-able record:
- If page LSN < record LSN, reapply the change.
4. This restores DB to exactly the state at crash time.
Note: Redo is idempotent (safe to reapply).
PASS 3 — UNDO:
Purpose: Reverse effects of uncommitted (aborted/active) transactions.
Steps:
1. Process loser transactions in reverse LSN order.
2. For each record of loser TXN: undo the change.
3. Write CLRs (Compensation Log Records) for undos.
4. CLR allows redo of undos if system crashes during undo.
RESULT: DB is in a consistent state with only committed transactions applied.
COMMIT also requires the commit log record to be on stable storage before acknowledging success.
| Problem | Description | Example |
|---|---|---|
| Lost Update | Two transactions update same data; one overwrites the other | T1 and T2 both read balance=100, T1 writes 90, T2 writes 80 (T1's update lost) |
| Dirty Read | Reading uncommitted data that may be rolled back | T1 updates row but doesn't commit; T2 reads the uncommitted value |
| Unrepeatable Read | Same query returns different data within a transaction | T1 reads row; T2 updates same row and commits; T1 re-reads and gets different value |
| Phantom Read | New rows appear/disappear in repeated range queries | T1 reads 5 rows; T2 inserts a new row; T1 re-reads and gets 6 rows |
| Write Skew | Two transactions read overlapping data and update different rows, violating constraint | T1 reads A and B; T2 reads A and B; T1 writes A, T2 writes B (sum constraint violated) |
| Lock | Compatible With | Purpose |
|---|---|---|
| Shared (S / Read) | S-lock, IS-lock | Multiple readers can hold S-lock simultaneously |
| Exclusive (X / Write) | None (exclusive) | Only one TXN can write; blocks all others |
| Intent Shared (IS) | IS, IX, S, SIX | Intends to read children; placed on parent node |
| Intent Exclusive (IX) | IS, IX | Intends to write children; placed on parent node |
| Six (SIX) | IS only | Holding S-lock + intends to upgrade to X on some children |
| Upgrade (U) | S, IS | Used when trying to upgrade S-lock to X-lock |
| Held \ Requested | S | X | IS | IX | SIX |
|---|---|---|---|---|---|
| S | ✔ | ✘ | ✔ | ✘ | ✘ |
| X | ✘ | ✘ | ✘ | ✘ | ✘ |
| IS | ✔ | ✘ | ✔ | ✔ | ✔ |
| IX | ✘ | ✘ | ✔ | ✔ | ✘ |
| SIX | ✘ | ✘ | ✔ | ✘ | ✘ |
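The compatibility matrix above translates directly into a lookup table; a request is granted only if it is compatible with every lock currently held on the object (a minimal sketch):

```python
# COMPAT[held][requested], transcribed from the matrix above
COMPAT = {
    "S":   {"S": True,  "X": False, "IS": True,  "IX": False, "SIX": False},
    "X":   {"S": False, "X": False, "IS": False, "IX": False, "SIX": False},
    "IS":  {"S": True,  "X": False, "IS": True,  "IX": True,  "SIX": True},
    "IX":  {"S": False, "X": False, "IS": True,  "IX": True,  "SIX": False},
    "SIX": {"S": False, "X": False, "IS": True,  "IX": False, "SIX": False},
}

def can_grant(requested, held_locks):
    """Grant only if the request is compatible with EVERY held lock."""
    return all(COMPAT[h][requested] for h in held_locks)

print(can_grant("S", ["IS", "S"]))    # readers coexist
print(can_grant("X", ["IS"]))         # X conflicts with everything
print(can_grant("IX", ["IS", "IX"]))  # intent locks coexist
```

Incompatible requests go into a wait queue in a real lock manager; this sketch only answers the grant/deny question.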
╔══════════════════════════════════════════════════════════════╗
║ CONCURRENCY PROBLEMS — ILLUSTRATED ║
╠══════════════════════════════════════════════════════════════╣
LOST UPDATE:
T1: read(X) → X = 100
T2: read(X) → X = 100
T1: write(X=90) → X = 90 (T1 deducted 10)
T2: write(X=80) → X = 80 (T2 deducted 20, but used stale value!)
Result: T1's update is LOST. Should be 70.
DIRTY READ:
T1: read(A) → A = 1000
T1: write(A=900) → A = 900 (NOT committed)
T2: read(A) → A = 900 (reads uncommitted data!)
T1: ROLLBACK → A = 1000 (restored)
Result: T2 made decisions based on invalid data (900).
UNREPEATABLE READ:
T1: read(A) → A = 1000
T2: update A=900, COMMIT
T1: read(A) → A = 900 (different from first read!)
Result: T1 sees different data within its own transaction.
PHANTOM READ:
T1: SELECT * FROM Emp WHERE dept='CS' → returns 5 rows
T2: INSERT INTO Emp VALUES (99, 'New', 'CS'), COMMIT
T1: SELECT * FROM Emp WHERE dept='CS' → returns 6 rows!
Result: New "phantom" row appeared between reads.
| Variant | Rule | Deadlock-free? | Guarantee |
|---|---|---|---|
| Basic 2PL | Growing phase: acquire locks. Shrinking phase: release locks. | No | Conflict serializable |
| Strict 2PL | All exclusive locks held until commit/abort. | No | Conflict serializable + no cascading aborts |
| Rigorous 2PL | ALL locks (shared + exclusive) held until commit/abort. | No | Conflict serializable + simplest recovery |
| Conservative 2PL | ALL locks acquired BEFORE any operation starts. | Yes (no wait) | Conflict serializable + deadlock-free |
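The growing/shrinking discipline shared by these variants can be sketched as a toy class (illustrative only; `TwoPhaseTxn` is a hypothetical name, not a library API):

```python
# A minimal sketch of Basic 2PL: once a transaction releases any lock,
# no further acquisitions are allowed.
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # flips True after the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.locks.add(item)     # growing phase

    def release(self, item):
        self.shrinking = True    # growing phase ends permanently
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A"); t.acquire("B")   # growing phase
t.release("A")                   # shrinking phase begins
try:
    t.acquire("C")               # illegal under 2PL
except RuntimeError as e:
    print(e)
```

Strict and Rigorous 2PL simply defer all releases to commit/abort, which makes `shrinking` coincide with transaction end.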
| Technique | Description | Pros/Cons |
|---|---|---|
| Deadlock Prevention | Ensure at least one necessary condition cannot hold | May reduce concurrency; timestamp-based ordering |
| Deadlock Avoidance | Check if granting a lock is safe before granting (Banker's algorithm analog) | Requires advance knowledge of lock needs; overhead |
| Deadlock Detection & Recovery | Allow deadlocks, detect using wait-for graph, then abort one TXN | Better concurrency; rollback cost on detection |
| Timeout | If a TXN waits too long, abort it | Simple but may abort non-deadlocked TXNs |
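Detection via the wait-for graph reduces to cycle detection. A minimal sketch, assuming the graph is given as an adjacency dict (edge T1 → T2 means "T1 waits for a lock held by T2"):

```python
# DFS-based cycle detection over a wait-for graph {txn: [txns it waits for]}.
# A cycle in this graph means deadlock.
def has_deadlock(wait_for):
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for u in wait_for.get(t, []):
            if color.get(u, WHITE) == GRAY:            # back edge → cycle
                return True
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in wait_for)

print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
print(has_deadlock({"T1": ["T2"], "T2": [], "T3": ["T2"]}))      # False
```

On detection, the recovery step aborts one victim transaction (typically the youngest or cheapest to roll back) to break the cycle.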
╔══════════════════════════════════════════════════════════════╗
║ DEADLOCK PREVENTION: Wait-Die vs Wound-Wait ║
╠══════════════════════════════════════════════════════════════╣
Both use timestamps to decide which TXN "wins" in a conflict.
Older TXN (smaller timestamp) has higher priority.
WAIT-DIE (Non-preemptive — older TXN waits, younger dies):
┌─────────────────────────────────────────────────────┐
│ If Ti requests lock held by Tj: │
│ If Ti is OLDER than Tj → Ti WAITS │
│ If Ti is YOUNGER than Tj → Ti ABORTS (dies) │
└─────────────────────────────────────────────────────┘
Property: Only younger TXNs are rolled back.
No starvation: a TXN only dies for older ones, and it ages.
Deadlock-free: cycle would require older waiting for younger.
WOUND-WAIT (Preemptive — older TXN wounds, younger waits):
┌─────────────────────────────────────────────────────┐
│ If Ti requests lock held by Tj: │
│ If Ti is OLDER than Tj → Tj is ABORTED (wounded) │
│ If Ti is YOUNGER than Tj → Ti WAITS │
└─────────────────────────────────────────────────────┘
Property: Only younger TXNs are rolled back.
No starvation: same reasoning as wait-die.
Deadlock-free: same reasoning.
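Both rules reduce to a single timestamp comparison. A minimal sketch (illustrative function names; smaller timestamp = older transaction):

```python
# Ti requests a lock held by Tj. Return the outcome under each scheme.
def wait_die(ts_i, ts_j):
    # Non-preemptive: older requester waits; younger requester dies.
    return "Ti waits" if ts_i < ts_j else "Ti aborts"

def wound_wait(ts_i, ts_j):
    # Preemptive: older requester wounds (aborts) the holder; younger waits.
    return "Tj aborts" if ts_i < ts_j else "Ti waits"

# T5 (older) vs T9 (younger):
print(wait_die(5, 9))    # Ti waits
print(wound_wait(5, 9))  # Tj aborts
print(wait_die(9, 5))    # Ti aborts
print(wound_wait(9, 5))  # Ti waits
```

Note that in both schemes the aborted transaction restarts with its ORIGINAL timestamp, which is what prevents starvation.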
COMPARISON:
- Both prevent deadlocks and ensure no starvation.
- Wait-Die: younger TXN rolls back immediately (wastes less work).
- Wound-Wait: older TXN forces rollback of younger (may be more disruptive).
- In practice, wound-wait tends to cause fewer rollbacks: a younger TXN usually just waits, whereas under wait-die a young TXN may die repeatedly while an old one holds its locks.
LOCKING AND ISOLATION LEVELS:
READ COMMITTED releases S-locks after each row read. REPEATABLE READ holds S-locks until TXN end. SERIALIZABLE uses range locks or predicate locks to prevent phantom reads. PostgreSQL uses snapshot isolation for REPEATABLE READ (true serializability requires SERIALIZABLE).
| Index Type | Description | Use Case |
|---|---|---|
| Primary Index | Index on ordered key field; sparse (one per block) | Main table access on primary key |
| Clustering Index | Index on non-key ordered field; sparse | Physical ordering by search key |
| Secondary Index | Index on non-ordered field; dense (one per record) | Alternate access paths, non-key lookups |
| Dense Index | Entry for every search key value in the file | Fast lookup, more space |
| Sparse Index | Entry only for some search keys (block anchors) | Less space, may need block scan |
| Multilevel Index | Index of indexes (like B+ tree upper levels) | Handle large files efficiently |
╔══════════════════════════════════════════════════════════════╗
║ B-TREE (Order m) ║
╠══════════════════════════════════════════════════════════════╣
PROPERTIES:
- Every node has at most m children.
- Every non-leaf node (except root) has at least ceil(m/2) children.
- Root has at least 2 children (if not a leaf).
- A non-leaf node with k children has k-1 keys.
- All leaves appear at the SAME level (balanced).
- Data pointers are in BOTH internal nodes AND leaf nodes.
B-TREE of Order 3 (each node: 2 keys, 3 pointers max):
            [30 | 60]
           /    |    \
      [10|20] [40|50] [70|80]
INSERTION EXAMPLE (Order 3, max 2 keys per node):
Insert: 10, 20, 30, 40, 50, 60, 70, 80
Step 1: Insert 10 → [10]
Step 2: Insert 20 → [10 | 20]
Step 3: Insert 30 → OVERFLOW! Split:
[10 | 20 | 30] → promote 20 to new root:
              [20]
             /    \
          [10]    [30]
Step 4: Insert 40 → [20], [10], [30|40]
Step 5: Insert 50 → [20], [10], [30|40|50] → Split!
        promote 40:
            [20 | 40]
           /    |    \
        [10]  [30]  [50]
Step 6: Insert 60 → [20|40], [10], [30], [50|60]
Step 7: Insert 70 → [20|40], [10], [30], [50|60|70] → Split!
promote 60:
[20 | 40 | 60] → Root overflow! Split root:
        promote 40:
                [40]
               /    \
            [20]    [60]
           /    \   /    \
        [10] [30] [50] [70]
Step 8: Insert 80 → [40], [20], [60], [10], [30], [50], [70|80]
SEARCH: O(log n) — start at root, compare, go to appropriate child.
Complexity: height h = O(log_m n), where n = number of keys and m = order
(each node has at least ceil(m/2) children, so the tree stays shallow).
╔══════════════════════════════════════════════════════════════╗
║ B+ TREE (Order m) ║
╠══════════════════════════════════════════════════════════════╣
DIFFERENCES FROM B-TREE:
1. Data pointers exist ONLY in leaf nodes (internal = index only).
2. Leaves are linked as a sorted linked list (sequential access).
3. Internal nodes can have MORE keys (only for navigation).
4. Better for range queries and sequential access.
B+ TREE of Order 3:
            [30 | 60]                  ← Internal (index only)
           /    |    \
      [10|20] ↔ [40|50] ↔ [70|80]      ← Leaves (data + pointers), linked
        ↑                    ↑
    First Leaf           Last Leaf
INSERTION (Order 3: max 2 keys per leaf and per internal node):
Insert: 5, 15, 25, 35, 45
Step 1: Insert 5 → Leaf: [5]
Step 2: Insert 15 → Leaf: [5 | 15]
Step 3: Insert 25 → Leaf overflow [5|15|25] → Split!
Leaf 1: [5], Leaf 2: [15|25]
Copy-up 15 to internal: [15]
Step 4: Insert 35 → 35 > 15 → right leaf: [15|25|35] → Split!
        Leaf 2: [15], Leaf 3: [25|35]; copy-up 25 → Internal: [15 | 25]
Step 5: Insert 45 → 45 > 25 → rightmost leaf: [25|35|45] → Split!
        Leaf 3: [25], Leaf 4: [35|45]; copy-up 35 → Internal: [15|25|35] → overflow!
        Split internal, PUSH-up 25 (internal keys move up, not copy):
             [25]
            /    \
         [15]    [35]
        Leaves: [5] ↔ [15] ↔ [25] ↔ [35|45]
SEARCH:
Point Query: Start at root, navigate to leaf. O(log n).
Range Query: Find start key, then follow leaf pointers. O(log n + k).
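The O(log n + k) range cost comes from the leaf chain. A toy sketch (a linked list of sorted leaves, not a full B+ tree; `Leaf` and `range_query` are illustrative names):

```python
# Leaves form a sorted linked list, so a range scan is one descent to the
# start leaf plus a walk along the chain collecting k matching keys.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # pointer to the right sibling leaf

def range_query(first_leaf, lo, hi):
    """Walk the leaf chain, collecting keys in [lo, hi]."""
    out, leaf = [], first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out        # keys are sorted: stop early
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

# Leaves [10|20] ↔ [40|50] ↔ [70|80], as in the diagram above:
a, b, c = Leaf([10, 20]), Leaf([40, 50]), Leaf([70, 80])
a.next, b.next = b, c
print(range_query(a, 15, 75))     # [20, 40, 50, 70]
```

A real B+ tree would first descend from the root to the leaf containing `lo`; here the walk simply starts at the first leaf for brevity.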
DELETION:
1. Find the leaf node containing the key.
2. Remove the key.
3. If underflow (too few keys):
a. Try to redistribute from sibling.
b. If not possible, merge with sibling.
4. Update parent (may cause parent underflow → cascade).
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Pointers | In all nodes | Only in leaf nodes |
| Search | May stop early | Always reaches leaf |
| Key duplication | Each key appears exactly once | Leaf keys are copied up into internal nodes |
| Leaf Links | No linked list | Leaves linked sequentially |
| Range Queries | Slower (inorder traversal) | Fast (follow leaf pointers) |
| Tree Height | Shorter (data in internal) | Slightly taller |
| Disk I/O | More for range queries | Fewer for sequential |
| Usage | General purpose | Databases, file systems (most common) |
| Method | Description | Pros/Cons |
|---|---|---|
| Static Hashing | Fixed number of buckets; hash(key) -> bucket | Fast O(1); fixed size, overflow chains |
| Dynamic Hashing | Buckets grow/shrink as data changes | Adapts to data size; more complex |
| Extendible Hashing | Directory doubles when bucket overflows; local depth | No chaining; directory may grow large |
| Linear Hashing | Buckets split sequentially (round-robin); no directory | Gradual splits; good for growing data |
╔══════════════════════════════════════════════════════════════╗
║ EXTENDIBLE HASHING EXAMPLE ║
╠══════════════════════════════════════════════════════════════╣
Hash Function: h(key) → binary string
Global Depth (gd): number of bits used in directory
Local Depth (ld): number of bits relevant for each bucket
INITIAL STATE (gd=1, ld=1 per bucket):
Directory:
0 → Bucket A
1 → Bucket B
INSERT 6 (h=110): last 1 bit = 0 → Bucket A → [6]
INSERT 1 (h=001): last 1 bit = 1 → Bucket B → [1]
INSERT 11 (h=1011): last 1 bit = 1 → Bucket B → [1, 11]
INSERT 12 (h=1100): last 1 bit = 0 → Bucket A → [6, 12]
Assume bucket capacity = 2, so [6, 12] is full.
INSERT 16 (h=10000): last 1 bit = 0 → Bucket A → OVERFLOW!
Split Bucket A:
- Bucket A local depth: 1 → 2
- Global depth: 1 → 2
- Rehash: 6 (110) last 2 bits = 10, 12 (1100) last 2 bits = 00
- 16 (10000) last 2 bits = 00
Directory (gd=2):
00 → Bucket A' [12, 16]
01 → Bucket B [1, 11]
10 → Bucket A'' [6]
11 → Bucket B [1, 11] (shared pointer)
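The directory addressing used in this example can be sketched in one line: a key is routed by its last `global_depth` bits (here hashing the key to itself, as the example does; `bucket_index` is an illustrative name):

```python
# Route a key to a directory slot using the last `global_depth` bits.
def bucket_index(key, global_depth):
    return key & ((1 << global_depth) - 1)   # mask off the low gd bits

# gd = 2, matching the final directory state above:
for key in (6, 12, 16, 1, 11):
    print(key, "→", format(bucket_index(key, 2), "02b"))
# 6 → 10, 12 → 00, 16 → 00, 1 → 01, 11 → 11
```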
KEY INSIGHT: Directory doubles on overflow, but only the overflowing
bucket splits. Shared pointers keep directory size manageable.
WHEN TO USE WHICH INDEX:
Use a B+ Tree for range queries, ordered access, and when data changes frequently. Use Hash for exact-match queries (=, IN) where O(1) access is critical. Most databases use B+ trees as the default; hash indexes are secondary. PostgreSQL supports both. Hash indexes cannot handle range queries or sorting.
| Organization | Description | Pros / Cons |
|---|---|---|
| Heap (Unordered) | Records stored in insertion order | Fast insert; slow search (full scan) |
| Sequential (Sorted) | Records ordered by key field | Fast search (binary); slow insert (reorder) |
| Hashed | Records placed by hash function | O(1) exact lookup; no range queries |
| Clustered | Physically store related records together | Fast join on cluster key; one table per cluster key |
| B+ Tree Indexed | Heap file + B+ tree index | Balanced: good for both point and range queries |
| RAID | Min Disks | Redundancy | Read Perf | Write Perf | Capacity | Key Feature |
|---|---|---|---|---|---|---|
| 0 | 2 | None | Fast | Fast | n disks | Striping only; no fault tolerance |
| 1 | 2 | Full mirror | Fast | Moderate | n/2 disks | Exact copy; 100% redundancy |
| 2 | 3 | Hamming code (ECC) | Fast | Slow | varies (ECC overhead) | Bit-level striping + dedicated ECC disks |
| 3 | 3 | Byte-level parity | Fast | Slow | n-1 disks | Byte-level + dedicated parity disk |
| 4 | 3 | Block-level parity | Fast | Slow | n-1 disks | Block-level + dedicated parity disk |
| 5 | 3 | Distributed parity | Fast | Moderate | n-1 disks | Parity distributed across all disks |
| 6 | 4 | Dual parity | Fast | Slow | n-2 disks | Can survive 2 disk failures |
| 10 | 4 | Mirrored stripe | Very Fast | Moderate | n/2 disks | RAID 0 + RAID 1; best read performance |
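The parity used by RAID 4/5 is a plain XOR across the data blocks, which is why any single lost block can be rebuilt from the survivors. A minimal sketch:

```python
# RAID 4/5 parity: parity block = XOR of all data blocks. XOR is its own
# inverse, so XOR-ing the survivors with the parity recovers a lost block.
def parity(blocks):
    p = bytes(len(blocks[0]))                     # start with all zeros
    for b in blocks:
        p = bytes(x ^ y for x, y in zip(p, b))
    return p

d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p = parity([d0, d1, d2])

# Disk holding d1 fails: rebuild it from the surviving blocks + parity.
recovered = parity([d0, d2, p])
print(recovered == d1)   # True
```

RAID 6 extends this idea with a second, independently computed syndrome so that two simultaneous failures are survivable.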
| Policy | Description | Use Case |
|---|---|---|
| LRU (Least Recently Used) | Replace page that was least recently accessed | General purpose; good for locality |
| MRU (Most Recently Used) | Replace page that was most recently used | Sequential scans (revisit unlikely) |
| Clock / Second Chance | Circular list with reference bit; similar to LRU | Practical compromise; widely used |
| LFU (Least Frequently Used) | Replace page accessed fewest times | Stable access patterns; slow to adapt |
| Pin / No-Replace | Mark pages as pinned (cannot be replaced) | Active transaction pages, log pages |
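LRU replacement from the table above can be sketched with an ordered dict (a toy buffer pool; `LRUBuffer` is an illustrative name, not a real DBMS API):

```python
# LRU buffer replacement: on access, move the page to the MRU end;
# on overflow, evict from the LRU end.
from collections import OrderedDict

class LRUBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # insertion order = recency order

    def access(self, page_id):
        """Touch a page; return the evicted page id, or None."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # now most recently used
            return None
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            victim, _ = self.pages.popitem(last=False)  # LRU end
            return victim
        return None

buf = LRUBuffer(3)
for p in [1, 2, 3, 1, 4]:   # re-access 1, then 4 forces an eviction
    evicted = buf.access(p)
print(evicted)              # 2 (page 1 was refreshed, so 2 is now LRU)
```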
╔══════════════════════════════════════════════════════════════╗
║ QUERY OPTIMIZATION ║
╠══════════════════════════════════════════════════════════════╣
GOAL: Find the most efficient execution plan for a SQL query.
QUERY PROCESSING PHASES:
1. Parsing & Translation → Relational Algebra expression
2. Optimization → Best equivalent algebra expression
3. Evaluation → Execute the query plan
EQUIVALENCE RULES (transform the relational algebra tree):
┌───────────────────────────────────────────────────────┐
│ 1. Selection Split: │
│ σc1 AND c2(R) = σc1(σc2(R)) │
│ │
│ 2. Selection Commute: │
│ σc1(σc2(R)) = σc2(σc1(R)) │
│ │
│ 3. Selection Pushdown (EARLY SELECTION): │
│ σc(R ⋈ S) = σc(R) ⋈ S (if c uses only R attrs) │
│ This reduces the size of intermediate results! │
│ │
│ 4. Join Commute: │
│ R ⋈ S = S ⋈ R │
│ │
│ 5. Join Associativity: │
│ (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) │
│ │
│ 6. Projection Pushdown (EARLY PROJECTION): │
│ Push π down the tree to reduce column size early │
└───────────────────────────────────────────────────────┘
EXAMPLE:
Query: SELECT name FROM Emp JOIN Dept ON Emp.dept_id = Dept.dept_id
       WHERE Emp.dept = 'CS' AND Emp.salary > 50000
Bad Plan:
1. Cartesian product Emp x Dept
2. Select where conditions
3. Project name
Good Plan (after optimization):
1. Select Emp where dept='CS' (reduces rows early)
2. Select result where salary > 50000
3. Join with Dept (fewer rows to join)
4. Project name
COST ESTIMATION:
Cost = (disk I/O) + (CPU computation) + (memory usage)
For a Selection σattr=value(R) with index on attr:
- B+ Tree index: O(log n) disk accesses
- Hash index: O(1) disk accesses
- No index (full scan): O(b) where b = number of blocks
For a Join R ⋈ S (R has nR tuples, S has nS tuples):
- Nested Loop Join: O(nR * nS) — slowest
- Block Nested Loop: O(bR * bS) — uses buffer blocks
- Sort-Merge Join: O(bR log bR + bS log bS) — good for sorted
- Hash Join: O(bR + bS) — good for equijoin
- Index Nested Loop: O(nR * log nS) — if index on S
| Algorithm | Best When | Cost | Notes |
|---|---|---|---|
| Nested Loop | Small tables, no indexes | O(nR * nS) | Worst but most general |
| Block Nested Loop | Medium tables, limited memory | O(bR * bS / M) | M = buffer blocks available |
| Index Nested Loop | One table has index on join key | O(nR * (log nS)) | Good if index exists on inner table |
| Sort-Merge | Both tables large, output sorted | O(n log n) | Good for sorted output, range joins |
| Hash Join | Equi-join, large tables | O(nR + nS) | Best for large equi-joins; one-pass or two-pass |
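The build/probe structure behind the hash join row above can be sketched in a few lines (an in-memory toy on lists of dicts; real systems partition to disk when the build side exceeds memory):

```python
# In-memory hash join on an equi-join key: build a hash table on the
# smaller relation (O(nR)), then probe with each tuple of the larger (O(nS)).
def hash_join(r, s, r_key, s_key):
    table = {}
    for row in r:                                   # build phase
        table.setdefault(row[r_key], []).append(row)
    out = []
    for row in s:                                   # probe phase
        for match in table.get(row[s_key], []):
            out.append({**match, **row})            # merge matching tuples
    return out

emp = [{"name": "Ann", "dept_id": 1}, {"name": "Bob", "dept_id": 2}]
dept = [{"dept_id": 1, "dept": "CS"}]
print(hash_join(emp, dept, "dept_id", "dept_id"))
# [{'name': 'Ann', 'dept_id': 1, 'dept': 'CS'}]
```

This only works for equality predicates, which is why the table restricts hash join to equi-joins.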
HEURISTIC OPTIMIZATION RULES:
(1) Push SELECT down as early as possible to reduce row count. (2) Push PROJECT down to reduce column width. (3) Perform the most restrictive selections first. (4) Compute CROSS JOIN / Cartesian products last (most expensive). (5) Use EXPLAIN / EXPLAIN ANALYZE to inspect actual query plans in production.
ACID PROPERTIES:
Atomicity: Like a bank transfer — debiting and crediting are one unit. If one fails, both roll back. Ensures all-or-nothing execution using COMMIT and ROLLBACK.
Consistency: Like a Sudoku puzzle — every move must maintain validity. Total debits always equal credits. DB integrity constraints (PK, FK, CHECK) must hold before and after the transaction.
Isolation: Like voting booths — one person at a time, no peeking. Concurrent transactions execute as if serialized. Isolation levels control the tradeoff between consistency and performance.
Durability: Like a notarized contract — once signed (committed), it's permanent. Even if the power goes out, the committed transaction's effects survive. Implemented via WAL (Write-Ahead Logging) and checkpointing.
1NF: Ensure every attribute is atomic (no repeating groups, no arrays/lists in columns). Example: If a student has multiple phone numbers, create separate rows or a related table.
2NF: Remove partial dependencies — non-prime attributes must depend on the entire primary key, not just a part of it. Only applies to tables with composite keys. Example: In Order(order_id, product_id, product_name, qty), product_name depends only on product_id (partial dependency). Decompose into Order and Product.
3NF: Remove transitive dependencies — non-prime attributes must depend directly on the primary key, not through other non-prime attributes. Example: In Employee(emp_id, name, dept_id, dept_name), dept_name depends on dept_id, not on emp_id (transitive). Decompose into Employee and Department.
BCNF: For every FD X->A, X must be a super key. Stronger than 3NF — eliminates all redundancy. However, BCNF decomposition may not preserve all functional dependencies. 3NF is usually sufficient in practice.
Key difference: In a B-Tree, both internal nodes and leaf nodes contain data pointers (actual records/addresses). In a B+ Tree, only leaf nodes contain data; internal nodes contain only keys for navigation (copy-ups, not actual data).
Linked leaves: B+ Tree leaf nodes are connected as a sorted doubly-linked list, enabling efficient sequential and range access (critical for ORDER BY, BETWEEN, > / < queries). B-Trees require tree traversal for range queries.
Practical choice: B+ Tree is almost universally used in databases (MySQL InnoDB, PostgreSQL, Oracle) because of its superior range query performance and the ability to cache internal nodes (which are smaller since they lack data pointers). B-Trees are used in file systems and some NoSQL databases.
Core idea: A transaction has two phases — a growing phase (acquiring locks) and a shrinking phase (releasing locks). Once a lock is released, no new locks can be acquired. This ensures conflict serializability.
Strict 2PL: Exclusive (write) locks are held until COMMIT or ROLLBACK. This prevents cascading aborts (where aborting one TXN forces others to abort because they read uncommitted data). Most databases use this.
Rigorous 2PL: ALL locks (both shared and exclusive) held until end. Simplest recovery but more restrictive concurrency.
Conservative 2PL: All locks acquired before any operation; prevents deadlocks but requires knowing all needed locks in advance.
Trade-off: 2PL guarantees serializability but does not prevent deadlocks. Use deadlock detection (wait-for graph) or prevention (wait-die, wound-wait) to handle deadlocks. Many modern databases use MVCC instead of strict locking for better concurrency.
DBMS (Database Management System): The broad category of any software that manages data. Includes RDBMS, NoSQL, NewSQL, graph databases, etc. Not a specific technology.
RDBMS (Relational DBMS): Stores data in tables with rows and columns. Uses SQL. Enforces ACID via schemas, foreign keys, and constraints. Examples: MySQL, PostgreSQL, Oracle, SQL Server. Use when: data is structured, relationships matter, and consistency is critical (banking, ERP, e-commerce orders).
NoSQL: Non-relational databases that trade ACID consistency for availability, partition tolerance (CAP theorem), and flexible schemas. Types: Document (MongoDB), Key-Value (Redis), Column-Family (Cassandra), Graph (Neo4j). Use when: data is unstructured/semi-structured, massive scale, high write throughput, or a flexible schema is needed (social feeds, IoT, real-time analytics).
Key trade-off: RDBMS = strong consistency + structured schema. NoSQL = high availability + flexible schema + horizontal scalability. NewSQL (CockroachDB, Spanner) attempts to combine both.
Strong Entity -> Table: Each strong entity becomes a table. Attributes become columns. Primary key is the entity's key attribute.
Weak Entity -> Table: Create a table with the weak entity's attributes plus the owner's primary key as a foreign key. The composite key is (owner_PK, partial_key). The identifying relationship is not represented as a separate table.
1:1 Relationship: Add the foreign key of one entity into the other's table (prefer the side with total participation, or the table with fewer rows, to save space).
1:N Relationship: Add the primary key of the "one" side as a foreign key in the "many"-side table. No separate table needed.
M:N Relationship: Create a separate bridge/junction table with the primary keys of both participating entities. The combination of both foreign keys forms the composite primary key. Add any relationship attributes to this table.
ISA Hierarchy: Method 1: Create one table per entity (with shared attributes in parent; child tables reference parent PK via FK). Method 2: Create one table with all attributes + a type discriminator column. Method 1 is preferred for normalization.