Quick-reference notes covering ER modeling, normalization, transactions, concurrency control, and storage/indexing concepts.
| Concept | Description | Example |
|---|---|---|
| Entity | Real-world object with independent existence | Student, Employee, Course |
| Entity Set | Collection of similar entities | All Students, All Courses |
| Weak Entity | Cannot exist without an owner entity; has partial key | Dependent (relies on Employee) |
| Strong Entity | Exists independently with a primary key | Employee (emp_id) |
| Attribute | Property that describes an entity | name, age, salary |
| Key Attribute | Uniquely identifies each entity | emp_id, roll_no |
| Composite Attribute | Attribute made of sub-attributes | Full_Name(First, Middle, Last) |
| Multi-valued | Attribute with multiple values | Phone_Numbers, Email_Addresses |
| Derived Attribute | Computed from other attributes | Age (from DOB), GPA (from marks) |
| Relationship | Association among entities | Works_In, Enrolls_In, Manages |
| Participation | How entities participate in a relationship | Total (must) vs Partial (may) |
| Cardinality | Number of entity occurrences in relationship | 1:1, 1:N, M:N |
| Type | Notation | Example |
|---|---|---|
| Simple | Oval (single line) | age, gender |
| Composite | Oval with sub-ovals | Address(street, city, zip) |
| Multi-valued | Double oval | phone_numbers, hobbies |
| Derived | Dashed oval | age (from DOB) |
| Key | Underlined text in oval | emp_id |
| Stored | Solid oval | date_of_birth |
| Cardinality | Notation | Example |
|---|---|---|
| 1:1 | Arrowheads toward both entities | Person HAS Passport |
| 1:N (One-to-Many) | Arrowhead toward the "one" side | Department HAS Employees |
| M:N (Many-to-Many) | Plain lines (no arrowheads) on either side | Students TAKE Courses |
| Total Participation | Double line to entity | Every Employee MUST belong to a Dept |
| Partial Participation | Single line to entity | Some Employees may manage a Dept |
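These cardinalities map directly onto relational schemas: a 1:N relationship becomes a foreign key on the "many" side, while M:N requires a separate junction table with a composite key. A minimal sketch using Python's sqlite3 (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# 1:N -- Department HAS Employees: the FK lives on the "many" (Employee) side
cur.execute("CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("""CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES Department(dept_id))""")

# M:N -- Students TAKE Courses: needs a junction table with a composite PK
cur.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE Course  (course_id TEXT PRIMARY KEY, title TEXT)")
cur.execute("""CREATE TABLE Takes (
    roll_no   INTEGER REFERENCES Student(roll_no),
    course_id TEXT    REFERENCES Course(course_id),
    PRIMARY KEY (roll_no, course_id))""")

cur.execute("INSERT INTO Student VALUES (101, 'Alice')")
cur.execute("INSERT INTO Course VALUES ('C1', 'DBMS'), ('C2', 'OS')")
cur.execute("INSERT INTO Takes VALUES (101, 'C1'), (101, 'C2')")  # one student, many courses
print(cur.execute("SELECT COUNT(*) FROM Takes").fetchone()[0])
```

The composite primary key on Takes also enforces that a student cannot enroll in the same course twice.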
| Symbol | Represents | Description |
|---|---|---|
| Rectangle | Entity / Entity Set | Strong entity with independent existence |
| Double Rectangle | Weak Entity Set | Depends on identifying/owner entity |
| Diamond | Relationship Set | Association between entities |
| Double Diamond | Identifying Relationship | Links weak entity to owner entity |
| Oval | Attribute | Property of an entity or relationship |
| Double Oval | Multi-valued Attribute | Can hold multiple values (e.g., phones) |
| Dashed Oval | Derived Attribute | Value computed from other attributes |
| Underlined Oval | Key Attribute | Uniquely identifies entity instance |
| Double Line | Total Participation | Every entity must participate |
| Arrow / Directed Line | Cardinality | Points toward the "one" side of a 1:N relationship |
| Dashed Underline | Partial Key (Discriminator) | Identifies weak entities within one owner |
╔══════════════════════════════════════════════════════════╗
║ ER Diagram Example: University Database ║
║ ║
║ ┌──────────┐ ┌──────────┐ ┌──────────┐ ║
║ │ Student │──────│ Enrolls │──────│ Course │ ║
║ └──────────┘ └──────────┘ └──────────┘ ║
║ (roll_no) PK (course_id) PK ║
║ (name) (title) ║
║ (phone) [multi] (credits) ║
║ (age) [derived] (dept_name) FK ║
║ ║
║ ┌──────────┐ ┌──────────┐ ║
║ │ Dept │──────│ Has │──┐ ║
║ └──────────┘ └──────────┘ │ ║
║ (dept_id) PK │ ║
║ (d_name) v ║
║ (location) ┌──────────┐ ║
║ (budget) │Instructor│ ║
║ └──────────┘ ║
║ Cardinalities: (emp_id) PK ║
║ Student-Enrolls: M:N (name) ║
║ Dept-Has-Instructor: 1:N (salary) ║
║ Instructor-Teaches: M:N ║
╚══════════════════════════════════════════════════════════╝
| Type | Description | Example |
|---|---|---|
| Generalization | Bottom-up approach: identify common features from specific entities | {Student, Employee} -> Person (shared attributes lifted up) |
| Specialization | Top-down approach: break a general entity into specific ones | Account -> Savings_Account, Current_Account |
| Overlapping | Entity can belong to multiple specializations | Person can be both Student AND Employee |
| Disjoint | Entity belongs to exactly one specialization | Customer is EITHER Individual OR Corporate |
| Total | Every entity must belong to at least one specialization | Every Account must be Savings or Current |
| Partial | Some entities may not belong to any specialization | Some People are neither Student nor Employee |
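A common way to map a specialization hierarchy to tables is to give each subclass its own table whose primary key is also a foreign key to the superclass. A sketch using the Account example above (sqlite3, illustrative attribute names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Superclass table holds the common attributes
cur.execute("CREATE TABLE Account (acc_no INTEGER PRIMARY KEY, balance REAL)")
# Each subclass table shares the superclass key (1:1 with its parent row)
cur.execute("""CREATE TABLE Savings_Account (
    acc_no INTEGER PRIMARY KEY REFERENCES Account(acc_no),
    interest_rate REAL)""")
cur.execute("""CREATE TABLE Current_Account (
    acc_no INTEGER PRIMARY KEY REFERENCES Account(acc_no),
    overdraft_limit REAL)""")

cur.execute("INSERT INTO Account VALUES (1, 5000.0)")
cur.execute("INSERT INTO Savings_Account VALUES (1, 0.04)")

# Reassembling a full Savings account joins subclass and superclass on the key
row = cur.execute("""SELECT a.balance, s.interest_rate
                     FROM Account a JOIN Savings_Account s USING (acc_no)""").fetchone()
print(row)
```

For a disjoint, total specialization an extra `acc_type` column on Account plus CHECK constraints is often added; for overlapping specializations a row may appear in several subclass tables.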
A weak entity's primary key combines the owner's key with its partial key: Dependent(partial_key: dependent_name) owned by Employee(emp_id) gives PK = (emp_id, dependent_name).
| Key Type | Definition | Example |
|---|---|---|
| Super Key | Set of attributes that uniquely identifies a tuple | {emp_id}, {emp_id, name}, {emp_id, dept, age} |
| Candidate Key | Minimal super key (no proper subset is a super key) | {emp_id}, {email} (if email is unique) |
| Primary Key | Chosen candidate key; identifies tuples uniquely | emp_id (chosen from candidate keys) |
| Foreign Key | References primary key of another table | dept_id in Employee refs Dept(dept_id) |
| Composite Key | Key consisting of 2+ attributes | {emp_id, project_id} in Works_On |
| Alternate Key | Candidate keys not chosen as primary key | email (when emp_id is PK) |
| Surrogate Key | System-generated artificial key | auto-increment id columns |
Candidate Key ⊊ Super Key: Every candidate key is a super key, but not vice-versa. A super key may have extra (redundant) attributes. A candidate key is minimal.
Primary Key ⊊ Candidate Key: The primary key is one specific candidate key chosen for the table. All other candidate keys become alternate keys.
Foreign Key: Creates a relationship between two tables. May or may not be unique. Can be NULL (unless NOT NULL is specified). Must match a value in the referenced primary key or be NULL.
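The foreign key rule above ("must match a referenced primary key value or be NULL") can be demonstrated with sqlite3, which enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection (a sketch; names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite requires this per connection
cur = con.cursor()
cur.execute("CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("""CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES Dept(dept_id))""")

cur.execute("INSERT INTO Dept VALUES (101, 'CS')")
cur.execute("INSERT INTO Employee VALUES (1, 'Alice', 101)")   # matches Dept PK: OK
cur.execute("INSERT INTO Employee VALUES (2, 'Bob', NULL)")    # NULL FK: OK

try:
    cur.execute("INSERT INTO Employee VALUES (3, 'Eve', 999)") # no such dept_id
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # the dangling reference is refused
```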
| Operation | Symbol | Description | Example |
|---|---|---|---|
| Select | σ (sigma) | Returns rows matching condition | σage>30(Student) |
| Project | π (pi) | Returns specific columns | πname,salary(Employee) |
| Union | ∪ | All tuples from both relations (no dupes) | R ∪ S |
| Intersection | ∩ | Common tuples in both relations | R ∩ S |
| Difference | − | Tuples in R but not in S | R − S |
| Cartesian Product | × | All combinations of tuples | R × S |
| Join | ⋈ (bowtie) | Combine tuples matching a condition | R ⋈ R.dept=S.dept S |
| Natural Join | ⋈* | Join on all common attributes (auto-match) | R ⋈* S |
| Left Outer Join | ⟕ | All from R + matching from S (NULL fill) | R ⟕ S |
| Right Outer Join | ⟖ | All from S + matching from R (NULL fill) | R ⟖ S |
| Full Outer Join | ⟗ | All tuples from both (NULL fill) | R ⟗ S |
| Rename | ρ (rho) | Rename a relation or attributes | ρEmp(Employee) renames Employee to Emp |
-- Example Tables
-- Student(roll_no, name, age, dept)
-- Course(course_id, title, credits, dept)
-- Enrollment(roll_no, course_id, grade)
-- 1. SELECT: Find students older than 20
-- σage>20(Student)
-- 2. PROJECT: Get names and departments of all students
-- πname,dept(Student)
-- 3. SELECT + PROJECT: Names of CS students
-- πname(σdept='CS'(Student))
-- 4. NATURAL JOIN: Students with their enrollments
-- Student ⋈* Enrollment
-- 5. EQUI-JOIN: Students enrolled in 'DBMS'
-- πname(Student ⋈ Student.roll_no=Enrollment.roll_no (Enrollment ⋈ Enrollment.course_id=Course.course_id σtitle='DBMS'(Course)))
-- Or using Natural Join:
-- πname(Student ⋈* Enrollment ⋈* (σtitle='DBMS'(Course)))
-- 6. UNION: All departments from Student and Course
-- πdept(Student) ∪ πdept(Course)
-- 7. INTERSECTION: Depts that have both students and courses
-- πdept(Student) ∩ πdept(Course)
-- 8. DIFFERENCE: Depts with students but no courses
-- πdept(Student) − πdept(Course)
-- 9. CARTESIAN PRODUCT: Every student with every course
-- Student × Course
-- 10. LEFT OUTER JOIN: All students, with course info if enrolled
-- Student ⟕ Enrollment
-- 11. DIVISION: Students enrolled in ALL courses
-- πroll_no,course_id(Enrollment) ÷ πcourse_id(Course)
-- SQL equivalent: GROUP BY ... HAVING COUNT(*) = (SELECT COUNT(*) FROM Course)
| Sub-language | Purpose | Commands |
|---|---|---|
| DDL | Define/modify DB structure | CREATE, ALTER, DROP, TRUNCATE, RENAME |
| DML | Manipulate data | INSERT, UPDATE, DELETE |
| DQL | Query data | SELECT |
| DCL | Control access/permissions | GRANT, REVOKE |
| TCL | Manage transactions | COMMIT, ROLLBACK, SAVEPOINT |
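The relational algebra operators listed above can be modeled directly with Python lists of dict-rows, including the division example. A minimal sketch with illustrative data:

```python
Student = [
    {"roll_no": 101, "name": "Alice", "dept": "CS"},
    {"roll_no": 102, "name": "Bob",   "dept": "EE"},
]
Enrollment = [
    {"roll_no": 101, "course_id": "C1"},
    {"roll_no": 101, "course_id": "C2"},
    {"roll_no": 102, "course_id": "C1"},
]
Course = [{"course_id": "C1"}, {"course_id": "C2"}]

def select(rel, pred):                 # sigma_pred(rel)
    return [r for r in rel if pred(r)]

def project(rel, attrs):               # pi_attrs(rel), duplicates removed
    seen = {tuple(r[a] for a in attrs) for r in rel}
    return [dict(zip(attrs, t)) for t in seen]

def natural_join(r, s):                # r join s on all common attributes
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

def divide(r, s):
    """r / s: values of r's extra attributes paired with EVERY tuple of s."""
    s_attrs = list(s[0])
    r_attrs = [a for a in r[0] if a not in s_attrs]
    s_set = {tuple(t[a] for a in s_attrs) for t in s}
    out = []
    for cand in project(r, r_attrs):
        related = {tuple(t[a] for a in s_attrs) for t in r
                   if all(t[x] == cand[x] for x in r_attrs)}
        if s_set <= related:           # related to ALL tuples of s
            out.append(cand)
    return out

cs_names = project(select(Student, lambda r: r["dept"] == "CS"), ["name"])
print(cs_names)                        # names of CS students
print(divide(Enrollment, Course))      # students enrolled in ALL courses
```

Note that `project` removes duplicates (true relation semantics), unlike SQL's `SELECT` without `DISTINCT`.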
| Type | Description | Example |
|---|---|---|
| INT / INTEGER | Whole numbers | 42 |
| FLOAT / DOUBLE | Decimal numbers | 3.14 |
| DECIMAL(p,s) | Fixed-point number | DECIMAL(10,2) = 99999999.99 |
| VARCHAR(n) | Variable-length string | VARCHAR(255) |
| CHAR(n) | Fixed-length string | CHAR(10) |
| DATE | Date (YYYY-MM-DD) | 2024-01-15 |
| TIMESTAMP | Date + time | 2024-01-15 10:30:00 |
| BOOLEAN | True/False | TRUE |
| TEXT / BLOB | Large text / binary data | Article body |
| ENUM | Set of allowed values | ENUM('active','inactive') |
-- ═══ DDL: Data Definition Language ═══
-- CREATE TABLE with all constraints
CREATE TABLE Employee (
emp_id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
age INT CHECK (age >= 18),
salary DECIMAL(12,2) DEFAULT 50000.00,
dept_id INT REFERENCES Department(dept_id),
hire_date DATE DEFAULT CURRENT_DATE,
CONSTRAINT chk_salary CHECK (salary > 0)
);
-- CREATE TABLE with composite primary key
CREATE TABLE Works_On (
emp_id INT REFERENCES Employee(emp_id),
project_id INT REFERENCES Project(project_id),
hours INT CHECK (hours > 0),
PRIMARY KEY (emp_id, project_id)
);
-- ALTER TABLE
ALTER TABLE Employee ADD COLUMN phone VARCHAR(20);
ALTER TABLE Employee ALTER COLUMN salary TYPE DECIMAL(15,2);
ALTER TABLE Employee DROP COLUMN phone;
ALTER TABLE Employee ADD CONSTRAINT uk_email UNIQUE (email);
ALTER TABLE Employee DROP CONSTRAINT chk_salary;
-- DROP TABLE
DROP TABLE Employee; -- fails if referenced
DROP TABLE Employee CASCADE; -- drops dependents too
DROP TABLE IF EXISTS Employee;
-- TRUNCATE (faster than DELETE, resets identity)
TRUNCATE TABLE Employee;
-- ═══ DML: Data Manipulation Language ═══
-- INSERT
INSERT INTO Employee (emp_id, name, email, salary, dept_id)
VALUES (1, 'Alice', 'alice@co.com', 75000, 101);
INSERT INTO Employee VALUES
(2, 'Bob', 'bob@co.com', 60000, 102),
(3, 'Carol', 'carol@co.com', 85000, 101);
INSERT INTO Employee (emp_id, name, email)
VALUES (4, 'Dave', 'dave@co.com'); -- unlisted columns use their DEFAULTs
-- INSERT from SELECT
INSERT INTO Archived_Employee SELECT * FROM Employee WHERE hire_date < '2020-01-01';
-- UPDATE
UPDATE Employee SET salary = salary * 1.10 WHERE dept_id = 101;
UPDATE Employee SET salary = 90000, dept_id = 103 WHERE emp_id = 2;
-- UPDATE with subquery
UPDATE Employee SET dept_id = (
SELECT dept_id FROM Department WHERE dept_name = 'Marketing'
) WHERE emp_id = 3;
-- DELETE
DELETE FROM Employee WHERE emp_id = 99;
DELETE FROM Employee WHERE salary < 30000;
-- DELETE with subquery
DELETE FROM Enrollment
WHERE student_id IN (
SELECT emp_id FROM Employee WHERE dept_id IS NULL
);
-- ═══ DQL: Data Query Language ═══
-- Basic SELECT
SELECT name, salary FROM Employee WHERE dept_id = 101;
-- DISTINCT
SELECT DISTINCT dept_id FROM Employee;
-- ORDER BY
SELECT * FROM Employee ORDER BY salary DESC, name ASC;
-- LIMIT / OFFSET (pagination)
SELECT * FROM Employee ORDER BY emp_id LIMIT 10 OFFSET 20;
-- Aggregate Functions
SELECT
COUNT(*) AS total_emp,
COUNT(DISTINCT dept_id) AS num_depts,
SUM(salary) AS total_salary,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary
FROM Employee;
-- GROUP BY
SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM Employee
GROUP BY dept_id
HAVING COUNT(*) > 5
ORDER BY avg_sal DESC;
-- JOINs
-- INNER JOIN: matching rows from both tables
SELECT e.name, d.dept_name FROM Employee e
INNER JOIN Department d ON e.dept_id = d.dept_id;
-- LEFT JOIN: all from left + matching from right
SELECT e.name, d.dept_name FROM Employee e
LEFT JOIN Department d ON e.dept_id = d.dept_id;
-- RIGHT JOIN: all from right + matching from left
SELECT e.name, d.dept_name FROM Employee e
RIGHT JOIN Department d ON e.dept_id = d.dept_id;
-- FULL OUTER JOIN: all from both
SELECT e.name, d.dept_name FROM Employee e
FULL OUTER JOIN Department d ON e.dept_id = d.dept_id;
-- SELF JOIN: join table with itself
SELECT e1.name AS employee, e2.name AS manager
FROM Employee e1
LEFT JOIN Employee e2 ON e1.manager_id = e2.emp_id;
-- CROSS JOIN: Cartesian product
SELECT * FROM Employee CROSS JOIN Department;
-- SUBQUERY in WHERE
SELECT * FROM Employee
WHERE salary > (SELECT AVG(salary) FROM Employee);
-- Correlated Subquery
SELECT * FROM Employee e
WHERE salary > (SELECT AVG(salary) FROM Employee WHERE dept_id = e.dept_id);
-- EXISTS
SELECT * FROM Department d
WHERE EXISTS (
SELECT 1 FROM Employee e WHERE e.dept_id = d.dept_id
);
-- NOT EXISTS (division-like)
SELECT s.name FROM Student s
WHERE NOT EXISTS (
SELECT c.course_id FROM Course c
WHERE NOT EXISTS (
SELECT 1 FROM Enrollment e
WHERE e.student_id = s.roll_no AND e.course_id = c.course_id
)
);
-- IN / ANY / ALL
SELECT * FROM Employee WHERE dept_id IN (101, 102, 103);
SELECT * FROM Employee WHERE salary > ANY (SELECT salary FROM Manager);
SELECT * FROM Employee WHERE salary > ALL (SELECT salary FROM Intern);
-- UNION / INTERSECT / EXCEPT
SELECT name FROM Employee
UNION
SELECT name FROM Manager;
-- CASE expression
SELECT name, salary,
CASE
WHEN salary > 100000 THEN 'Senior'
WHEN salary > 60000 THEN 'Mid-Level'
ELSE 'Junior'
END AS level
FROM Employee;
-- WINDOW Functions
SELECT name, dept_id, salary,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank,
SUM(salary) OVER (PARTITION BY dept_id) AS dept_total,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS global_rank
FROM Employee;
-- ═══ VIEWS ═══
CREATE VIEW EmpView AS
SELECT e.name, d.dept_name, e.salary
FROM Employee e JOIN Department d ON e.dept_id = d.dept_id
WHERE e.salary > 50000;
CREATE VIEW DeptStats AS
SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM Employee GROUP BY dept_id;
-- Updatable View: must reference one table, no aggregate/group by
CREATE VIEW EmpSalary AS
SELECT name, salary FROM Employee WHERE dept_id = 101;
INSERT INTO EmpSalary VALUES ('New', 60000); -- inserts into Employee (fails if hidden columns are NOT NULL without defaults)
-- ═══ INDEXES ═══
CREATE INDEX idx_emp_dept ON Employee(dept_id);
CREATE UNIQUE INDEX idx_emp_email ON Employee(email);
CREATE INDEX idx_emp_name ON Employee(name, salary); -- composite
DROP INDEX idx_emp_dept;
-- Index types:
-- B-Tree (default): good for range queries
-- Hash: good for exact match (no range)
-- GIN: good for full-text search, arrays (PostgreSQL)
-- ═══ CONSTRAINTS ═══
-- Column-level
CREATE TABLE T (
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
age INT CHECK (age >= 0),
dept_id INT REFERENCES Dept(dept_id)
ON DELETE CASCADE
ON UPDATE SET NULL
);
-- Table-level constraints
CREATE TABLE Enrollment (
student_id INT,
course_id INT,
grade CHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES Student(roll_no),
FOREIGN KEY (course_id) REFERENCES Course(course_id),
CHECK (grade IN ('A','B','C','D','F'))
);
-- Referential Actions:
-- ON DELETE/UPDATE: CASCADE | SET NULL | SET DEFAULT | RESTRICT | NO ACTION
A VIEW is a stored query (virtual table) — always shows latest data. A MATERIALIZED VIEW stores the result physically — faster reads but must be REFRESHed to stay current. Use materialized views for complex, expensive queries that don't need real-time data.
| Concept | Definition | Example |
|---|---|---|
| Functional Dependency (FD) | X -> Y means Y is determined by X | roll_no -> name, dept_id |
| Full FD | X -> Y, no proper subset of X determines Y | (roll_no, course_id) -> grade |
| Partial FD | X -> Y where only part of X determines Y | (roll_no, course_id) -> name (roll_no alone determines name) |
| Transitive FD | X -> Y and Y -> Z, so X -> Z indirectly | emp_id -> dept_id, dept_id -> dept_name => emp_id -> dept_name |
| Closure (X+) | Set of all attributes functionally determined by X | If A->B, B->C, then (A)+ = {A,B,C} |
| Minimal Cover | Simplified equivalent set of FDs | Remove redundant FDs, extraneous attributes |
| Lossless Join | Joining decomposed tables recovers original exactly | R1 ⋈ R2 = R with no spurious tuples |
| Dependency Preservation | All original FDs checkable on decomposed tables | Every FD in F+ is in closure of union of F_i |
| Prime Attribute | Part of any candidate key | Attributes in at least one candidate key |
| Non-prime Attribute | Not part of any candidate key | Attributes not in any candidate key |
| Normal Form | Condition | Handles |
|---|---|---|
| 1NF | All attributes have atomic (indivisible) values; no repeating groups | Multi-valued attributes, composite cells |
| 2NF | In 1NF + no partial dependencies (no non-prime partially dependent on PK) | Partial dependencies on composite keys |
| 3NF | In 2NF + no transitive dependencies (non-prime -> non-prime) | Transitive dependencies |
| BCNF | For every FD X->A, X must be a super key (stronger than 3NF) | All anomalies, even with prime attributes |
| 4NF | In BCNF + no multi-valued dependencies (independent multi-valued attributes) | Multi-valued dependencies |
| 5NF | In 4NF + no join dependencies (cannot be decomposed further without loss) | Join dependencies |
╔══════════════════════════════════════════════════════════════╗
║ STEP-BY-STEP NORMALIZATION EXAMPLE ║
║ Scenario: Student Course Registration ║
╠══════════════════════════════════════════════════════════════╣
UN-NORMALIZED TABLE:
┌─────────────────────────────────────────────────────────────┐
│ Student(roll_no, name, dept, dept_head, courses_taken) │
│ │
│ courses_taken = [(course_id, title, instructor)] │
│ This is a REPEATING GROUP (non-atomic) │
└─────────────────────────────────────────────────────────────┘
Sample Data (0NF):
┌────────┬────────┬──────┬──────────┬──────────────────────────┐
│roll_no │ name │ dept │ dept_head│ courses_taken │
├────────┼────────┼──────┼──────────┼──────────────────────────┤
│ 101 │ Alice │ CS │ Dr.Smith │(C1, DBMS, Prof.J), │
│ │ │ │ │(C2, OS, Prof.K) │
│ 102 │ Bob │ EE │ Dr.Patel │(C1, DBMS, Prof.J) │
└────────┴────────┴──────┴──────────┴──────────────────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: Convert to 1NF ──────────────────────────────────────
Rule: Eliminate repeating groups; make all cells atomic.
Each row must have a single value per column.
Result (1NF):
┌────────┬────────┬──────┬──────────┬────────┬──────────┬───────────┐
│roll_no │ name │ dept │ dept_head│course_id│ title │ instructor│
├────────┼────────┼──────┼──────────┼────────┼──────────┼───────────┤
│ 101 │ Alice │ CS │ Dr.Smith │ C1 │ DBMS │ Prof.J │
│ 101 │ Alice │ CS │ Dr.Smith │ C2 │ OS │ Prof.K │
│ 102 │ Bob │ EE │ Dr.Patel │ C1 │ DBMS │ Prof.J │
└────────┴────────┴──────┴──────────┴────────┴──────────┴───────────┘
PK = (roll_no, course_id) [Composite Key]
FDs identified:
roll_no -> name, dept, dept_head (partial dependency!)
course_id -> title, instructor (partial dependency!)
dept -> dept_head (transitive dependency!)
Problems in 1NF: Redundancy, update anomalies, insertion anomalies
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 2: Convert to 2NF ──────────────────────────────────────
Rule: Remove partial dependencies.
Non-prime attributes must depend on the ENTIRE PK, not part of it.
Decomposition:
R1(roll_no, name, dept, dept_head) PK: roll_no
R2(course_id, title, instructor) PK: course_id
R3(roll_no, course_id) PK: (roll_no, course_id)
Result (2NF):
┌────────┬────────┬──────┬──────────┐
│roll_no │ name │ dept │ dept_head│
├────────┼────────┼──────┼──────────┤
│ 101 │ Alice │ CS │ Dr.Smith │
│ 102 │ Bob │ EE │ Dr.Patel │
└────────┴────────┴──────┴──────────┘
┌─────────┬──────────┬────────────┐
│course_id│ title │ instructor │
├─────────┼──────────┼────────────┤
│ C1 │ DBMS │ Prof.J │
│ C2 │ OS │ Prof.K │
└─────────┴──────────┴────────────┘
┌────────┬───────────┐
│roll_no │ course_id │
├────────┼───────────┤
│ 101 │ C1 │
│ 101 │ C2 │
│ 102 │ C1 │
└────────┴───────────┘
Remaining problem: In R1, dept_head depends on dept, NOT roll_no.
dept -> dept_head is a TRANSITIVE dependency.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 3: Convert to 3NF ──────────────────────────────────────
Rule: Remove transitive dependencies.
Non-prime attributes must depend ONLY on the primary key (directly).
Decompose R1:
R1a(roll_no, name, dept) PK: roll_no, FK: dept
R1b(dept, dept_head) PK: dept
Result (3NF):
┌────────┬────────┬──────┐ ┌────────┬──────────┐
│roll_no │ name │ dept │ │ dept │ dept_head│
├────────┼────────┼──────┤ ├────────┼──────────┤
│ 101 │ Alice │ CS │ │ CS │ Dr.Smith │
│ 102 │ Bob │ EE │ │ EE │ Dr.Patel │
└────────┴────────┴──────┘ └────────┴──────────┘
Final 3NF Schema:
Student(roll_no, name, dept) PK: roll_no
Department(dept, dept_head) PK: dept
Course(course_id, title, instructor) PK: course_id
Enrollment(roll_no, course_id) PK: (roll_no, course_id)
All tables in 3NF:
- No repeating groups (1NF) ✓
- No partial dependencies (2NF) ✓
- No transitive dependencies (3NF) ✓
- Lossless join decomposition ✓
- Dependency preserving ✓
| Property | 3NF | BCNF |
|---|---|---|
| Definition | X->A: either X is super key OR A is prime | X->A: X MUST be a super key |
| Stricter | Less strict | More strict (stronger) |
| Dependency Preservation | Always achievable | May NOT be dependency preserving |
| Lossless Join | Always achievable | Always achievable |
| Redundancy | May retain some FD-based redundancy | Removes all FD-based redundancy (MVDs can still cause redundancy) |
| Example Violation | R(A,B,C) with AB->C, C->B: C->B is allowed (B is prime) | Same R: C is not a super key, so C->B violates BCNF (candidate keys: AB, AC) |
╔══════════════════════════════════════════════════════════════╗
║ ATTRIBUTE CLOSURE & MINIMAL COVER ║
╠══════════════════════════════════════════════════════════════╣
ATTRIBUTE CLOSURE (X+):
Given R(A,B,C,D,E) and F = { A->B, B->C, AD->E }
Compute (A)+:
Start: (A)+ = { A }
A->B: (A)+ = { A, B }
B->C: (A)+ = { A, B, C }
No more apply. Final: { A, B, C }
Compute (AD)+:
Start: (AD)+ = { A, D }
A->B: (AD)+ = { A, B, D }
B->C: (AD)+ = { A, B, C, D }
AD->E: (AD)+ = { A, B, C, D, E }
Final: { A, B, C, D, E } = all attributes
Candidate Keys: AD is a super key. Is any proper subset one too? (A)+ = {A,B,C} — no. (D)+ = {D} — no.
So AD is the only candidate key. Prime attributes: A, D. Non-prime: B, C, E.
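The closure computation above is a short fixed-point loop. A sketch that reproduces both closures and also serves as the redundancy test used for minimal covers:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already determined, absorb the RHS
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = { A->B, B->C, AD->E } from the worked example
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A", "D"}, {"E"})]

print(sorted(closure({"A"}, F)))        # (A)+
print(sorted(closure({"A", "D"}, F)))   # (AD)+ = all attributes, so AD is a super key

def is_redundant(fd, fds):
    """An FD is redundant if the remaining FDs already imply it."""
    rest = [g for g in fds if g is not fd]
    return set(fd[1]) <= closure(fd[0], rest)
```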
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
MINIMAL COVER (Canonical Cover):
Given F = { AB->C, C->BE, B->E }
Step 1: Decompose RHS (split):
F = { AB->C, C->B, C->E, B->E }
Step 2: Remove redundant FDs:
  - Remove AB->C: check if (AB)+ in remaining = {A, B, E}
    C is not in the closure. AB->C is NOT redundant. Keep.
- Remove C->B: check if (C)+ in remaining = {C,E}
B not in closure. C->B is NOT redundant. Keep.
  - Remove C->E: check (C)+ in remaining {AB->C, C->B, B->E}
    (C)+ = {C, B, E} (C->B, then B->E). E IS in the closure.
    C->E IS redundant. Remove it. F = { AB->C, C->B, B->E }
  - Remove B->E: check (B)+ in remaining {AB->C, C->B}
    (B)+ = {B}. E not in closure. B->E is NOT redundant. Keep.
Step 3: Remove extraneous attributes from LHS:
  - AB->C: Is A extraneous? Check if C ∈ (B)+ under F = {AB->C, C->B, B->E}
    (B)+ = {B, E} (only B->E applies). C is NOT in it, so A is NOT extraneous.
    Is B extraneous? (A)+ = {A}; C is not in it, so B is NOT extraneous either.
Minimal Cover: { AB->C, C->B, B->E }
Check: it still implies C->E (via C->B then B->E), so it is equivalent to F.
╔══════════════════════════════════════════════════════════════╗
║ LOSSLESS JOIN DECOMPOSITION CHECK ║
╠══════════════════════════════════════════════════════════════╣
For R decomposed into R1 and R2:
Condition: R1 ∩ R2 -> R1 OR R1 ∩ R2 -> R2
That is, the common attributes between R1 and R2 must
determine ALL attributes of at least one relation.
Example: R(A,B,C,D,E) decomposed into:
R1(A,B,C) and R2(A,D,E)
R1 ∩ R2 = {A} (common attribute)
Check: A -> {A,B,C} (i.e., A is a key for R1)?
If A -> BC holds in F, then this is a lossless decomposition.
If A -> BC does NOT hold, check: A -> {A,D,E} (key for R2)?
If A -> DE holds in F, then it is lossless.
If NEITHER holds, the decomposition is LOSSY (not allowed).
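The pairwise condition above is mechanical once attribute closure is available. A sketch, with attribute sets written as strings of single letters and an assumed FD set F:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2)+ must cover all of R1 or all of R2."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c

# R(A,B,C,D,E) decomposed into R1(A,B,C) and R2(A,D,E), with F = { A->BC }
F = [("A", "BC")]
print(lossless("ABC", "ADE", F))  # common = {A}, and A -> BC makes A a key of R1
print(lossless("BC", "ADE", F))   # common attributes are empty: lossy
```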
For multi-table decomposition R = R1 ∪ R2 ∪ ... ∪ Rn:
Apply lossless check pairwise or use the common attribute
  graph / chase algorithm.
| ACID Property | Description | Example |
|---|---|---|
| Atomicity | Transaction is all-or-nothing; cannot be partially completed | Transfer $100: debit AND credit must both succeed or both fail |
| Consistency | Transaction takes DB from one valid state to another | Total balance before = after (debit + credit = 0) |
| Isolation | Concurrent transactions do not interfere with each other | Two users transferring money simultaneously see correct results |
| Durability | Once committed, changes survive system failures | After COMMIT, power failure does not lose the transaction |
| State | Description | Trigger |
|---|---|---|
| Active | Transaction executing normally | After BEGIN TRANSACTION |
| Partially Committed | Last statement executed, awaiting final commit | After last SQL statement |
| Committed | Transaction successfully completed | After COMMIT (survives failures) |
| Failed | Transaction cannot continue | Error during execution |
| Aborted | Transaction rolled back, changes undone | After ROLLBACK |
| Terminated | Transaction left the system | After commit or abort cleanup |
-- ═══ Transaction in SQL ═══
BEGIN TRANSACTION;
-- Debit Account A
UPDATE Account SET balance = balance - 100
WHERE account_id = 'A';
-- Credit Account B
UPDATE Account SET balance = balance + 100
WHERE account_id = 'B';
-- Check constraint: neither account should go negative
-- If balance would be negative, this triggers an error
-- and the transaction is rolled back
COMMIT; -- both updates permanent
-- If error occurs:
-- ROLLBACK; -- both updates undone
-- ═══ Savepoints ═══
BEGIN TRANSACTION;
UPDATE Account SET balance = balance - 50 WHERE account_id = 'A';
SAVEPOINT sp1;
UPDATE Account SET balance = balance - 30 WHERE account_id = 'A';
SAVEPOINT sp2;
-- If second debit was wrong, rollback to sp1 (keep first debit)
ROLLBACK TO sp1;
COMMIT; -- only first debit is permanent
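The savepoint flow above can be reproduced with Python's sqlite3 module (a sketch; the connection is put in autocommit mode so that BEGIN/COMMIT are issued explicitly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None  # autocommit; we manage the transaction manually
cur = con.cursor()
cur.execute("CREATE TABLE Account (account_id TEXT PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO Account VALUES ('A', 100)")

cur.execute("BEGIN")
cur.execute("UPDATE Account SET balance = balance - 50 WHERE account_id = 'A'")
cur.execute("SAVEPOINT sp1")
cur.execute("UPDATE Account SET balance = balance - 30 WHERE account_id = 'A'")
cur.execute("ROLLBACK TO sp1")   # undo only the second debit
cur.execute("COMMIT")            # first debit becomes permanent

print(cur.execute(
    "SELECT balance FROM Account WHERE account_id='A'").fetchone()[0])
```

After the commit the balance reflects only the first debit, exactly as in the SQL example above.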
-- ═══ Isolation Levels ═══
-- SET TRANSACTION ISOLATION LEVEL <level>;
-- READ UNCOMMITTED: dirty reads allowed (lowest isolation)
-- READ COMMITTED: only committed data visible (prevents dirty read)
-- REPEATABLE READ: same query returns same rows (prevents non-repeatable read)
-- SERIALIZABLE: fully isolated, transactions appear sequential (prevents phantom read)
| Type | Definition | Method |
|---|---|---|
| Conflict Serializable | Schedule can be reordered to a serial schedule by swapping non-conflicting ops | Check if precedence graph is acyclic (DAG) |
| View Serializable | Same initial reads, same writes, same final writes as some serial schedule | More complex; all conflict-serializable are view-serializable but not vice versa |
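The precedence-graph test for conflict serializability is easy to sketch: build an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, then check for a cycle. Schedules are given here as (txn, op, item) triples (illustrative):

```python
def precedence_edges(schedule):
    """Edge (Ti, Tj) if an op of Ti conflicts with a LATER op of Tj.
    Conflict: same item, different txns, at least one write."""
    edges = set()
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            t1, op1, x1 = schedule[i]
            t2, op2, x2 = schedule[j]
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """DFS cycle detection on the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, stack = set(), set()
    def dfs(n):
        visited.add(n); stack.add(n)
        for m in graph.get(n, ()):
            if m in stack or (m not in visited and dfs(m)):
                return True
        stack.discard(n)
        return False
    return any(dfs(n) for n in graph if n not in visited)

# s1: T1 finishes before T2 touches X -- equivalent to serial T1, T2
s1 = [("T1","R","X"), ("T1","W","X"), ("T2","R","X"), ("T2","W","X")]
# s2: the classic lost-update interleaving -- edges in both directions
s2 = [("T1","R","X"), ("T2","R","X"), ("T1","W","X"), ("T2","W","X")]
print(has_cycle(precedence_edges(s1)))  # acyclic -> conflict serializable
print(has_cycle(precedence_edges(s2)))  # cycle   -> not conflict serializable
```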
| Method | How It Works | Pros/Cons |
|---|---|---|
| 2PL (Two-Phase Locking) | Growing phase (acquire locks) then Shrinking phase (release locks) | Simple, guarantees serializability; may cause deadlocks |
| Timestamp Ordering | Assign timestamps; older TXN wins conflicts | No deadlocks; may cause starvation |
| Optimistic (OCC) | Read freely, validate before commit, abort if conflict | Best for low-conflict workloads; wastes work on abort |
| Multi-version (MVCC) | Each TXN sees a snapshot of data | Readers never block writers; PostgreSQL uses this |
| Technique | Description | Use Case |
|---|---|---|
| Undo Log | Records before-images; used to reverse uncommitted changes | Rollback aborted transactions |
| Redo Log | Records after-images; used to replay committed changes | Recover committed transactions after crash |
| Undo/Redo Log | Records both before and after images | Full recovery support (ARIES uses this) |
| WAL (Write-Ahead Logging) | Log must be written to stable storage BEFORE data is written | Ensures atomicity/durability; committed changes survive crashes |
| Checkpointing | Periodically flush dirty pages and write checkpoint record | Reduces recovery time (less log to replay) |
| ARIES | Advanced Recovery: WAL + steal/no-force + 3-pass recovery (analysis, redo, undo) | Industry standard (IBM DB2, SQL Server, PostgreSQL) |
╔══════════════════════════════════════════════════════════════╗
║ ARIES Recovery Algorithm (3 Passes) ║
╠══════════════════════════════════════════════════════════════╣
PRE-RECOVERY CONCEPTS:
- Steal Policy: Uncommitted pages CAN be written to disk (flushed)
- No-Force Policy: Committed pages DON'T have to be written to disk
- This maximizes buffer utilization but requires careful recovery
LOG RECORD FORMAT: [LSN, TXN_ID, PageID, UndoInfo, RedoInfo, PrevLSN]
PASS 1 — ANALYSIS:
Purpose: Find dirty pages and active transactions at crash time.
Steps:
1. Start from last checkpoint record.
2. Scan forward through log.
3. Build:
- Dirty Page Table (DPT): pages modified since checkpoint
- Transaction Table (TT): transactions and their status
- Active transactions at crash -> added to loser list (need undo)
PASS 2 — REDO:
Purpose: Replay ALL logged changes to restore DB to crash state.
Steps:
1. Start from earliest LSN in DPT.
2. Scan forward to end of log.
3. For each redo-able record:
- If page LSN < record LSN, reapply the change.
4. This restores DB to exactly the state at crash time.
Note: Redo is idempotent (safe to reapply).
PASS 3 — UNDO:
Purpose: Reverse effects of uncommitted (aborted/active) transactions.
Steps:
1. Process loser transactions in reverse LSN order.
2. For each record of loser TXN: undo the change.
3. Write CLRs (Compensation Log Records) for undos.
4. CLR allows redo of undos if system crashes during undo.
RESULT: DB is in a consistent state with only committed transactions applied.
COMMIT also requires the commit log record to be on stable storage before acknowledging success.
| Problem | Description | Example |
|---|---|---|
| Lost Update | Two transactions update same data; one overwrites the other | T1 and T2 both read balance=100, T1 writes 90, T2 writes 80 (T1's update lost) |
| Dirty Read | Reading uncommitted data that may be rolled back | T1 updates row but doesn't commit; T2 reads the uncommitted value |
| Unrepeatable Read | Same query returns different data within a transaction | T1 reads row; T2 updates same row and commits; T1 re-reads and gets different value |
| Phantom Read | New rows appear/disappear in repeated range queries | T1 reads 5 rows; T2 inserts a new row; T1 re-reads and gets 6 rows |
| Write Skew | Two transactions read overlapping data and update different rows, violating constraint | T1 reads A and B; T2 reads A and B; T1 writes A, T2 writes B (sum constraint violated) |
| Lock | Compatible With | Purpose |
|---|---|---|
| Shared (S / Read) | S-lock, IS-lock | Multiple readers can hold S-lock simultaneously |
| Exclusive (X / Write) | None (exclusive) | Only one TXN can write; blocks all others |
| Intent Shared (IS) | IS, IX, S, SIX | Intends to read children; placed on parent node |
| Intent Exclusive (IX) | IS, IX | Intends to write children; placed on parent node |
| Six (SIX) | IS only | Holding S-lock + intends to upgrade to X on some children |
| Upgrade (U) | S, IS | Used when trying to upgrade S-lock to X-lock |
| Held \ Requested | S | X | IS | IX | SIX |
|---|---|---|---|---|---|
| S | ✔ | ✘ | ✔ | ✘ | ✘ |
| X | ✘ | ✘ | ✘ | ✘ | ✘ |
| IS | ✔ | ✘ | ✔ | ✔ | ✔ |
| IX | ✘ | ✘ | ✔ | ✔ | ✘ |
| SIX | ✘ | ✘ | ✔ | ✘ | ✘ |
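The compatibility matrix above translates directly into a lookup table; a request is granted only if it is compatible with every lock currently held on the object (a minimal sketch):

```python
# COMPAT[held][requested], transcribed from the matrix above
COMPAT = {
    "S":   {"S": True,  "X": False, "IS": True,  "IX": False, "SIX": False},
    "X":   {"S": False, "X": False, "IS": False, "IX": False, "SIX": False},
    "IS":  {"S": True,  "X": False, "IS": True,  "IX": True,  "SIX": True},
    "IX":  {"S": False, "X": False, "IS": True,  "IX": True,  "SIX": False},
    "SIX": {"S": False, "X": False, "IS": True,  "IX": False, "SIX": False},
}

def can_grant(requested, held_locks):
    """Grant only if the request is compatible with EVERY held lock."""
    return all(COMPAT[h][requested] for h in held_locks)

print(can_grant("S", ["IS", "S"]))    # readers coexist
print(can_grant("X", ["IS"]))         # X conflicts with everything
print(can_grant("IX", ["IS", "IX"]))  # intent locks coexist
```

Incompatible requests go into a wait queue in a real lock manager; this sketch only answers the grant/deny question.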
╔══════════════════════════════════════════════════════════════╗
║ CONCURRENCY PROBLEMS — ILLUSTRATED ║
╠══════════════════════════════════════════════════════════════╣
LOST UPDATE:
T1: read(X) → X = 100
T2: read(X) → X = 100
T1: write(X=90) → X = 90 (T1 deducted 10)
T2: write(X=80) → X = 80 (T2 deducted 20, but used stale value!)
Result: T1's update is LOST. Should be 70.
DIRTY READ:
T1: read(A) → A = 1000
T1: write(A=900) → A = 900 (NOT committed)
T2: read(A) → A = 900 (reads uncommitted data!)
T1: ROLLBACK → A = 1000 (restored)
Result: T2 made decisions based on invalid data (900).
UNREPEATABLE READ:
T1: read(A) → A = 1000
T2: update A=900, COMMIT
T1: read(A) → A = 900 (different from first read!)
Result: T1 sees different data within its own transaction.
PHANTOM READ:
T1: SELECT * FROM Emp WHERE dept='CS' → returns 5 rows
T2: INSERT INTO Emp VALUES (99, 'New', 'CS'), COMMIT
T1: SELECT * FROM Emp WHERE dept='CS' → returns 6 rows!
Result: New "phantom" row appeared between reads.
| Variant | Rule | Deadlock-free? | Guarantee |
|---|---|---|---|
| Basic 2PL | Growing phase: acquire locks. Shrinking phase: release locks. | No | Conflict serializable |
| Strict 2PL | All exclusive locks held until commit/abort. | No | Conflict serializable + no cascading aborts |
| Rigorous 2PL | ALL locks (shared + exclusive) held until commit/abort. | No | Conflict serializable + simplest recovery |
| Conservative 2PL | ALL locks acquired BEFORE any operation starts. | Yes (no wait) | Conflict serializable + deadlock-free |
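The growing/shrinking discipline shared by these variants can be sketched as a toy class (illustrative only; `TwoPhaseTxn` is a hypothetical name, not a library API):

```python
# A minimal sketch of Basic 2PL: once a transaction releases any lock,
# no further acquisitions are allowed.
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # flips True after the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.locks.add(item)     # growing phase

    def release(self, item):
        self.shrinking = True    # growing phase ends permanently
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A"); t.acquire("B")   # growing phase
t.release("A")                   # shrinking phase begins
try:
    t.acquire("C")               # illegal under 2PL
except RuntimeError as e:
    print(e)
```

Strict and Rigorous 2PL simply defer all releases to commit/abort, which makes `shrinking` coincide with transaction end.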
| Technique | Description | Pros/Cons |
|---|---|---|
| Deadlock Prevention | Ensure at least one necessary condition cannot hold | May reduce concurrency; timestamp-based ordering |
| Deadlock Avoidance | Check if granting a lock is safe before granting (Banker's algorithm analog) | Requires advance knowledge of lock needs; overhead |
| Deadlock Detection & Recovery | Allow deadlocks, detect using wait-for graph, then abort one TXN | Better concurrency; rollback cost on detection |
| Timeout | If a TXN waits too long, abort it | Simple but may abort non-deadlocked TXNs |
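Detection via the wait-for graph reduces to cycle detection. A minimal sketch, assuming the graph is given as an adjacency dict (edge T1 → T2 means "T1 waits for a lock held by T2"):

```python
# DFS-based cycle detection over a wait-for graph {txn: [txns it waits for]}.
# A cycle in this graph means deadlock.
def has_deadlock(wait_for):
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for u in wait_for.get(t, []):
            if color.get(u, WHITE) == GRAY:            # back edge → cycle
                return True
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in wait_for)

print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
print(has_deadlock({"T1": ["T2"], "T2": [], "T3": ["T2"]}))      # False
```

On detection, the recovery step aborts one victim transaction (typically the youngest or cheapest to roll back) to break the cycle.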
╔══════════════════════════════════════════════════════════════╗
║ DEADLOCK PREVENTION: Wait-Die vs Wound-Wait ║
╠══════════════════════════════════════════════════════════════╣
Both use timestamps to decide which TXN "wins" in a conflict.
Older TXN (smaller timestamp) has higher priority.
WAIT-DIE (Non-preemptive — older TXN waits, younger dies):
┌─────────────────────────────────────────────────────┐
│ If Ti requests lock held by Tj: │
│ If Ti is OLDER than Tj → Ti WAITS │
│ If Ti is YOUNGER than Tj → Ti ABORTS (dies) │
└─────────────────────────────────────────────────────┘
Property: Only younger TXNs are rolled back.
No starvation: a TXN only dies for older ones, and it ages.
Deadlock-free: cycle would require older waiting for younger.
WOUND-WAIT (Preemptive — older TXN wounds, younger waits):
┌─────────────────────────────────────────────────────┐
│ If Ti requests lock held by Tj: │
│ If Ti is OLDER than Tj → Tj is ABORTED (wounded) │
│ If Ti is YOUNGER than Tj → Ti WAITS │
└─────────────────────────────────────────────────────┘
Property: Only younger TXNs are rolled back.
No starvation: same reasoning as wait-die.
Deadlock-free: same reasoning.
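Both rules reduce to a single timestamp comparison. A minimal sketch (illustrative function names; smaller timestamp = older transaction):

```python
# Ti requests a lock held by Tj. Return the outcome under each scheme.
def wait_die(ts_i, ts_j):
    # Non-preemptive: older requester waits; younger requester dies.
    return "Ti waits" if ts_i < ts_j else "Ti aborts"

def wound_wait(ts_i, ts_j):
    # Preemptive: older requester wounds (aborts) the holder; younger waits.
    return "Tj aborts" if ts_i < ts_j else "Ti waits"

# T5 (older) vs T9 (younger):
print(wait_die(5, 9))    # Ti waits
print(wound_wait(5, 9))  # Tj aborts
print(wait_die(9, 5))    # Ti aborts
print(wound_wait(9, 5))  # Ti waits
```

Note that in both schemes the aborted transaction restarts with its ORIGINAL timestamp, which is what prevents starvation.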
COMPARISON:
- Both prevent deadlocks and ensure no starvation.
- Wait-Die: younger TXN rolls back immediately (wastes less work).
- Wound-Wait: older TXN forces rollback of younger (may be more disruptive).
- In practice, wound-wait tends to cause fewer rollbacks: a younger TXN usually just waits, whereas under wait-die a young TXN may die repeatedly while an old one holds its locks.
LOCKING AND ISOLATION LEVELS:
READ COMMITTED releases S-locks after each row read. REPEATABLE READ holds S-locks until TXN end. SERIALIZABLE uses range locks or predicate locks to prevent phantom reads. PostgreSQL uses snapshot isolation for REPEATABLE READ (true serializability requires SERIALIZABLE).
| Index Type | Description | Use Case |
|---|---|---|
| Primary Index | Index on ordered key field; sparse (one per block) | Main table access on primary key |
| Clustering Index | Index on non-key ordered field; sparse | Physical ordering by search key |
| Secondary Index | Index on non-ordered field; dense (one per record) | Alternate access paths, non-key lookups |
| Dense Index | Entry for every search key value in the file | Fast lookup, more space |
| Sparse Index | Entry only for some search keys (block anchors) | Less space, may need block scan |
| Multilevel Index | Index of indexes (like B+ tree upper levels) | Handle large files efficiently |
╔══════════════════════════════════════════════════════════════╗
║ B-TREE (Order m) ║
╠══════════════════════════════════════════════════════════════╣
PROPERTIES:
- Every node has at most m children.
- Every non-leaf node (except root) has at least ceil(m/2) children.
- Root has at least 2 children (if not a leaf).
- A non-leaf node with k children has k-1 keys.
- All leaves appear at the SAME level (balanced).
- Data pointers are in BOTH internal nodes AND leaf nodes.
B-TREE of Order 3 (each node: 2 keys, 3 pointers max):
            [30 | 60]
           /    |    \
      [10|20] [40|50] [70|80]
INSERTION EXAMPLE (Order 3, max 2 keys per node):
Insert: 10, 20, 30, 40, 50, 60, 70, 80
Step 1: Insert 10 → [10]
Step 2: Insert 20 → [10 | 20]
Step 3: Insert 30 → OVERFLOW! Split:
[10 | 20 | 30] → promote 20 to new root:
              [20]
             /    \
          [10]    [30]
Step 4: Insert 40 → [20], [10], [30|40]
Step 5: Insert 50 → [20], [10], [30|40|50] → Split!
        promote 40:
            [20 | 40]
           /    |    \
        [10]  [30]  [50]
Step 6: Insert 60 → [20|40], [10], [30], [50|60]
Step 7: Insert 70 → [20|40], [10], [30], [50|60|70] → Split!
promote 60:
[20 | 40 | 60] → Root overflow! Split root:
        promote 40:
                [40]
               /    \
            [20]    [60]
           /    \   /    \
        [10] [30] [50] [70]
Step 8: Insert 80 → [40], [20], [60], [10], [30], [50], [70|80]
SEARCH: O(log n) — start at root, compare, go to appropriate child.
Complexity: height h = O(log_m n), where n = number of keys and m = order
(each node has at least ceil(m/2) children, so the tree stays shallow).
╔══════════════════════════════════════════════════════════════╗
║ B+ TREE (Order m) ║
╠══════════════════════════════════════════════════════════════╣
DIFFERENCES FROM B-TREE:
1. Data pointers exist ONLY in leaf nodes (internal = index only).
2. Leaves are linked as a sorted linked list (sequential access).
3. Internal nodes can have MORE keys (only for navigation).
4. Better for range queries and sequential access.
B+ TREE of Order 3:
            [30 | 60]                  ← Internal (index only)
           /    |    \
      [10|20] ↔ [40|50] ↔ [70|80]      ← Leaves (data + pointers), linked
        ↑                    ↑
    First Leaf           Last Leaf
INSERTION (Order 3: max 2 keys per leaf and per internal node):
Insert: 5, 15, 25, 35, 45
Step 1: Insert 5 → Leaf: [5]
Step 2: Insert 15 → Leaf: [5 | 15]
Step 3: Insert 25 → Leaf overflow [5|15|25] → Split!
Leaf 1: [5], Leaf 2: [15|25]
Copy-up 15 to internal: [15]
Step 4: Insert 35 → 35 > 15 → right leaf: [15|25|35] → Split!
        Leaf 2: [15], Leaf 3: [25|35]; copy-up 25 → Internal: [15 | 25]
Step 5: Insert 45 → 45 > 25 → rightmost leaf: [25|35|45] → Split!
        Leaf 3: [25], Leaf 4: [35|45]; copy-up 35 → Internal: [15|25|35] → overflow!
        Split internal, PUSH-up 25 (internal keys move up, not copy):
             [25]
            /    \
         [15]    [35]
        Leaves: [5] ↔ [15] ↔ [25] ↔ [35|45]
SEARCH:
Point Query: Start at root, navigate to leaf. O(log n).
Range Query: Find start key, then follow leaf pointers. O(log n + k).
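The O(log n + k) range cost comes from the leaf chain. A toy sketch (a linked list of sorted leaves, not a full B+ tree; `Leaf` and `range_query` are illustrative names):

```python
# Leaves form a sorted linked list, so a range scan is one descent to the
# start leaf plus a walk along the chain collecting k matching keys.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # pointer to the right sibling leaf

def range_query(first_leaf, lo, hi):
    """Walk the leaf chain, collecting keys in [lo, hi]."""
    out, leaf = [], first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out        # keys are sorted: stop early
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

# Leaves [10|20] ↔ [40|50] ↔ [70|80], as in the diagram above:
a, b, c = Leaf([10, 20]), Leaf([40, 50]), Leaf([70, 80])
a.next, b.next = b, c
print(range_query(a, 15, 75))     # [20, 40, 50, 70]
```

A real B+ tree would first descend from the root to the leaf containing `lo`; here the walk simply starts at the first leaf for brevity.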
DELETION:
1. Find the leaf node containing the key.
2. Remove the key.
3. If underflow (too few keys):
a. Try to redistribute from sibling.
b. If not possible, merge with sibling.
4. Update parent (may cause parent underflow → cascade).
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Pointers | In all nodes | Only in leaf nodes |
| Search | May stop early | Always reaches leaf |
| Key duplication | Each key appears exactly once | Leaf keys are copied up into internal nodes |
| Leaf Links | No linked list | Leaves linked sequentially |
| Range Queries | Slower (inorder traversal) | Fast (follow leaf pointers) |
| Tree Height | Shorter (data in internal) | Slightly taller |
| Disk I/O | More for range queries | Fewer for sequential |
| Usage | General purpose | Databases, file systems (most common) |
| Method | Description | Pros/Cons |
|---|---|---|
| Static Hashing | Fixed number of buckets; hash(key) -> bucket | Fast O(1); fixed size, overflow chains |
| Dynamic Hashing | Buckets grow/shrink as data changes | Adapts to data size; more complex |
| Extendible Hashing | Directory doubles when bucket overflows; local depth | No chaining; directory may grow large |
| Linear Hashing | Buckets split sequentially (round-robin); no directory | Gradual splits; good for growing data |
╔══════════════════════════════════════════════════════════════╗
║ EXTENDIBLE HASHING EXAMPLE ║
╠══════════════════════════════════════════════════════════════╣
Hash Function: h(key) → binary string
Global Depth (gd): number of bits used in directory
Local Depth (ld): number of bits relevant for each bucket
INITIAL STATE (gd=1, ld=1 per bucket):
Directory:
0 → Bucket A
1 → Bucket B
INSERT 6 (h=110): last 1 bit = 0 → Bucket A → [6]
INSERT 1 (h=001): last 1 bit = 1 → Bucket B → [1]
INSERT 11 (h=1011): last 1 bit = 1 → Bucket B → [1, 11]
INSERT 12 (h=1100): last 1 bit = 0 → Bucket A → [6, 12]
Assume bucket capacity = 2, so [6, 12] is full.
INSERT 16 (h=10000): last 1 bit = 0 → Bucket A → OVERFLOW!
Split Bucket A:
- Bucket A local depth: 1 → 2
- Global depth: 1 → 2
- Rehash: 6 (110) last 2 bits = 10, 12 (1100) last 2 bits = 00
- 16 (10000) last 2 bits = 00
Directory (gd=2):
00 → Bucket A' [12, 16]
01 → Bucket B [1, 11]
10 → Bucket A'' [6]
11 → Bucket B [1, 11] (shared pointer)
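The directory addressing used in this example can be sketched in one line: a key is routed by its last `global_depth` bits (here hashing the key to itself, as the example does; `bucket_index` is an illustrative name):

```python
# Route a key to a directory slot using the last `global_depth` bits.
def bucket_index(key, global_depth):
    return key & ((1 << global_depth) - 1)   # mask off the low gd bits

# gd = 2, matching the final directory state above:
for key in (6, 12, 16, 1, 11):
    print(key, "→", format(bucket_index(key, 2), "02b"))
# 6 → 10, 12 → 00, 16 → 00, 1 → 01, 11 → 11
```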
KEY INSIGHT: Directory doubles on overflow, but only the overflowing
bucket splits. Shared pointers keep directory size manageable.
WHEN TO USE WHICH INDEX:
Use a B+ Tree for range queries, ordered access, and when data changes frequently. Use Hash for exact-match queries (=, IN) where O(1) access is critical. Most databases use B+ trees as the default; hash indexes are secondary. PostgreSQL supports both. Hash indexes cannot handle range queries or sorting.
| Organization | Description | Pros / Cons |
|---|---|---|
| Heap (Unordered) | Records stored in insertion order | Fast insert; slow search (full scan) |
| Sequential (Sorted) | Records ordered by key field | Fast search (binary); slow insert (reorder) |
| Hashed | Records placed by hash function | O(1) exact lookup; no range queries |
| Clustered | Physically store related records together | Fast join on cluster key; one table per cluster key |
| B+ Tree Indexed | Heap file + B+ tree index | Balanced: good for both point and range queries |
| RAID | Min Disks | Redundancy | Read Perf | Write Perf | Capacity | Key Feature |
|---|---|---|---|---|---|---|
| 0 | 2 | None | Fast | Fast | n disks | Striping only; no fault tolerance |
| 1 | 2 | Full mirror | Fast | Moderate | n/2 disks | Exact copy; 100% redundancy |
| 2 | 3 | Hamming code (ECC) | Fast | Slow | varies (ECC overhead) | Bit-level striping + dedicated ECC disks |
| 3 | 3 | Byte-level parity | Fast | Slow | n-1 disks | Byte-level + dedicated parity disk |
| 4 | 3 | Block-level parity | Fast | Slow | n-1 disks | Block-level + dedicated parity disk |
| 5 | 3 | Distributed parity | Fast | Moderate | n-1 disks | Parity distributed across all disks |
| 6 | 4 | Dual parity | Fast | Slow | n-2 disks | Can survive 2 disk failures |
| 10 | 4 | Mirrored stripe | Very Fast | Moderate | n/2 disks | RAID 0 + RAID 1; best read performance |
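The parity used by RAID 4/5 is a plain XOR across the data blocks, which is why any single lost block can be rebuilt from the survivors. A minimal sketch:

```python
# RAID 4/5 parity: parity block = XOR of all data blocks. XOR is its own
# inverse, so XOR-ing the survivors with the parity recovers a lost block.
def parity(blocks):
    p = bytes(len(blocks[0]))                     # start with all zeros
    for b in blocks:
        p = bytes(x ^ y for x, y in zip(p, b))
    return p

d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p = parity([d0, d1, d2])

# Disk holding d1 fails: rebuild it from the surviving blocks + parity.
recovered = parity([d0, d2, p])
print(recovered == d1)   # True
```

RAID 6 extends this idea with a second, independently computed syndrome so that two simultaneous failures are survivable.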
| Policy | Description | Use Case |
|---|---|---|
| LRU (Least Recently Used) | Replace page that was least recently accessed | General purpose; good for locality |
| MRU (Most Recently Used) | Replace page that was most recently used | Sequential scans (revisit unlikely) |
| Clock / Second Chance | Circular list with reference bit; similar to LRU | Practical compromise; widely used |
| LFU (Least Frequently Used) | Replace page accessed fewest times | Stable access patterns; slow to adapt |
| Pin / No-Replace | Mark pages as pinned (cannot be replaced) | Active transaction pages, log pages |
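LRU replacement from the table above can be sketched with an ordered dict (a toy buffer pool; `LRUBuffer` is an illustrative name, not a real DBMS API):

```python
# LRU buffer replacement: on access, move the page to the MRU end;
# on overflow, evict from the LRU end.
from collections import OrderedDict

class LRUBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # insertion order = recency order

    def access(self, page_id):
        """Touch a page; return the evicted page id, or None."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # now most recently used
            return None
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            victim, _ = self.pages.popitem(last=False)  # LRU end
            return victim
        return None

buf = LRUBuffer(3)
for p in [1, 2, 3, 1, 4]:   # re-access 1, then 4 forces an eviction
    evicted = buf.access(p)
print(evicted)              # 2 (page 1 was refreshed, so 2 is now LRU)
```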
╔══════════════════════════════════════════════════════════════╗
║ QUERY OPTIMIZATION ║
╠══════════════════════════════════════════════════════════════╣
GOAL: Find the most efficient execution plan for a SQL query.
QUERY PROCESSING PHASES:
1. Parsing & Translation → Relational Algebra expression
2. Optimization → Best equivalent algebra expression
3. Evaluation → Execute the query plan
EQUIVALENCE RULES (transform the relational algebra tree):
┌───────────────────────────────────────────────────────┐
│ 1. Selection Split: │
│ σc1 AND c2(R) = σc1(σc2(R)) │
│ │
│ 2. Selection Commute: │
│ σc1(σc2(R)) = σc2(σc1(R)) │
│ │
│ 3. Selection Pushdown (EARLY SELECTION): │
│ σc(R ⋈ S) = σc(R) ⋈ S (if c uses only R attrs) │
│ This reduces the size of intermediate results! │
│ │
│ 4. Join Commute: │
│ R ⋈ S = S ⋈ R │
│ │
│ 5. Join Associativity: │
│ (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) │
│ │
│ 6. Projection Pushdown (EARLY PROJECTION): │
│ Push π down the tree to reduce column size early │
└───────────────────────────────────────────────────────┘
EXAMPLE:
Query: SELECT name FROM Emp JOIN Dept ON Emp.dept_id = Dept.dept_id
       WHERE Emp.dept = 'CS' AND Emp.salary > 50000
Bad Plan:
1. Cartesian product Emp x Dept
2. Select where conditions
3. Project name
Good Plan (after optimization):
1. Select Emp where dept='CS' (reduces rows early)
2. Select result where salary > 50000
3. Join with Dept (fewer rows to join)
4. Project name
COST ESTIMATION:
Cost = (disk I/O) + (CPU computation) + (memory usage)
For a Selection σattr=value(R) with index on attr:
- B+ Tree index: O(log n) disk accesses
- Hash index: O(1) disk accesses
- No index (full scan): O(b) where b = number of blocks
For a Join R ⋈ S (R has nR tuples, S has nS tuples):
- Nested Loop Join: O(nR * nS) — slowest
- Block Nested Loop: O(bR * bS) — uses buffer blocks
- Sort-Merge Join: O(bR log bR + bS log bS) — good for sorted
- Hash Join: O(bR + bS) — good for equijoin
- Index Nested Loop: O(nR * log nS) — if index on S
| Algorithm | Best When | Cost | Notes |
|---|---|---|---|
| Nested Loop | Small tables, no indexes | O(nR * nS) | Worst but most general |
| Block Nested Loop | Medium tables, limited memory | O(bR * bS / M) | M = buffer blocks available |
| Index Nested Loop | One table has index on join key | O(nR * (log nS)) | Good if index exists on inner table |
| Sort-Merge | Both tables large, output sorted | O(n log n) | Good for sorted output, range joins |
| Hash Join | Equi-join, large tables | O(nR + nS) | Best for large equi-joins; one-pass or two-pass |
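The build/probe structure behind the hash join row above can be sketched in a few lines (an in-memory toy on lists of dicts; real systems partition to disk when the build side exceeds memory):

```python
# In-memory hash join on an equi-join key: build a hash table on the
# smaller relation (O(nR)), then probe with each tuple of the larger (O(nS)).
def hash_join(r, s, r_key, s_key):
    table = {}
    for row in r:                                   # build phase
        table.setdefault(row[r_key], []).append(row)
    out = []
    for row in s:                                   # probe phase
        for match in table.get(row[s_key], []):
            out.append({**match, **row})            # merge matching tuples
    return out

emp = [{"name": "Ann", "dept_id": 1}, {"name": "Bob", "dept_id": 2}]
dept = [{"dept_id": 1, "dept": "CS"}]
print(hash_join(emp, dept, "dept_id", "dept_id"))
# [{'name': 'Ann', 'dept_id': 1, 'dept': 'CS'}]
```

This only works for equality predicates, which is why the table restricts hash join to equi-joins.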
HEURISTIC OPTIMIZATION RULES:
(1) Push SELECT down as early as possible to reduce row count. (2) Push PROJECT down to reduce column width. (3) Perform the most restrictive selections first. (4) Compute CROSS JOIN / Cartesian products last (most expensive). (5) Use EXPLAIN / EXPLAIN ANALYZE to inspect actual query plans in production.
ACID PROPERTIES:
Atomicity: Like a bank transfer — debiting and crediting are one unit. If one fails, both roll back. Ensures all-or-nothing execution using COMMIT and ROLLBACK.
Consistency: Like a Sudoku puzzle — every move must maintain validity. Total debits always equal credits. DB integrity constraints (PK, FK, CHECK) must hold before and after the transaction.
Isolation: Like voting booths — one person at a time, no peeking. Concurrent transactions execute as if serialized. Isolation levels control the tradeoff between consistency and performance.
Durability: Like a notarized contract — once signed (committed), it's permanent. Even if the power goes out, the committed transaction's effects survive. Implemented via WAL (Write-Ahead Logging) and checkpointing.
1NF: Ensure every attribute is atomic (no repeating groups, no arrays/lists in columns). Example: If a student has multiple phone numbers, create separate rows or a related table.
2NF: Remove partial dependencies — non-prime attributes must depend on the entire primary key, not just a part of it. Only applies to tables with composite keys. Example: In Order(order_id, product_id, product_name, qty), product_name depends only on product_id (partial dependency). Decompose into Order and Product.
3NF: Remove transitive dependencies — non-prime attributes must depend directly on the primary key, not through other non-prime attributes. Example: In Employee(emp_id, name, dept_id, dept_name), dept_name depends on dept_id, not on emp_id (transitive). Decompose into Employee and Department.
BCNF: For every FD X->A, X must be a super key. Stronger than 3NF — eliminates all redundancy. However, BCNF decomposition may not preserve all functional dependencies. 3NF is usually sufficient in practice.
Key difference: In a B-Tree, both internal nodes and leaf nodes contain data pointers (actual records/addresses). In a B+ Tree, only leaf nodes contain data; internal nodes contain only keys for navigation (copy-ups, not actual data).
Linked leaves: B+ Tree leaf nodes are connected as a sorted doubly-linked list, enabling efficient sequential and range access (critical for ORDER BY, BETWEEN, > / < queries). B-Trees require tree traversal for range queries.
Practical choice: B+ Tree is almost universally used in databases (MySQL InnoDB, PostgreSQL, Oracle) because of its superior range query performance and the ability to cache internal nodes (which are smaller since they lack data pointers). B-Trees are used in file systems and some NoSQL databases.
Core idea: A transaction has two phases — a growing phase (acquiring locks) and a shrinking phase (releasing locks). Once a lock is released, no new locks can be acquired. This ensures conflict serializability.
Strict 2PL: Exclusive (write) locks are held until COMMIT or ROLLBACK. This prevents cascading aborts (where aborting one TXN forces others to abort because they read uncommitted data). Most databases use this.
Rigorous 2PL: ALL locks (both shared and exclusive) held until end. Simplest recovery but more restrictive concurrency.
Conservative 2PL: All locks acquired before any operation; prevents deadlocks but requires knowing all needed locks in advance.
Trade-off: 2PL guarantees serializability but does not prevent deadlocks. Use deadlock detection (wait-for graph) or prevention (wait-die, wound-wait) to handle deadlocks. Many modern databases use MVCC instead of strict locking for better concurrency.
DBMS (Database Management System): The broad category of any software that manages data. Includes RDBMS, NoSQL, NewSQL, graph databases, etc. Not a specific technology.
RDBMS (Relational DBMS): Stores data in tables with rows and columns. Uses SQL. Enforces ACID via schemas, foreign keys, and constraints. Examples: MySQL, PostgreSQL, Oracle, SQL Server. Use when: data is structured, relationships matter, and consistency is critical (banking, ERP, e-commerce orders).
NoSQL: Non-relational databases that trade ACID consistency for availability, partition tolerance (CAP theorem), and flexible schemas. Types: Document (MongoDB), Key-Value (Redis), Column-Family (Cassandra), Graph (Neo4j). Use when: data is unstructured/semi-structured, massive scale, high write throughput, or a flexible schema is needed (social feeds, IoT, real-time analytics).
Key trade-off: RDBMS = strong consistency + structured schema. NoSQL = high availability + flexible schema + horizontal scalability. NewSQL (CockroachDB, Spanner) attempts to combine both.
Strong Entity -> Table: Each strong entity becomes a table. Attributes become columns. Primary key is the entity's key attribute.
Weak Entity -> Table: Create a table with the weak entity's attributes plus the owner's primary key as a foreign key. The composite key is (owner_PK, partial_key). The identifying relationship is not represented as a separate table.
1:1 Relationship: Add the foreign key of one entity into the other's table (prefer the side with total participation, or the table with fewer rows, to save space).
1:N Relationship: Add the primary key of the "one" side as a foreign key in the "many"-side table. No separate table needed.
M:N Relationship: Create a separate bridge/junction table with the primary keys of both participating entities. The combination of both foreign keys forms the composite primary key. Add any relationship attributes to this table.
ISA Hierarchy: Method 1: Create one table per entity (with shared attributes in parent; child tables reference parent PK via FK). Method 2: Create one table with all attributes + a type discriminator column. Method 1 is preferred for normalization.