SQL fundamentals, joins, subqueries, window functions and interview-ready query patterns.
This cheatsheet uses a consistent sample schema across all sections. All queries work on standard SQL (PostgreSQL/MySQL compatible).
-- ═══════════════════════════════════════════════════
-- SAMPLE SCHEMA (used throughout this cheatsheet)
-- ═══════════════════════════════════════════════════
CREATE TABLE departments (
dept_id INT PRIMARY KEY AUTO_INCREMENT,
dept_name VARCHAR(100) NOT NULL UNIQUE,
location VARCHAR(100),
budget DECIMAL(15,2)
);
CREATE TABLE employees (
emp_id INT PRIMARY KEY AUTO_INCREMENT,
emp_name VARCHAR(100) NOT NULL,
email VARCHAR(150) UNIQUE,
salary DECIMAL(10,2),
dept_id INT,
manager_id INT,
hire_date DATE,
is_active BOOLEAN DEFAULT TRUE,
FOREIGN KEY (dept_id) REFERENCES departments(dept_id),
FOREIGN KEY (manager_id) REFERENCES employees(emp_id)
);
CREATE TABLE salaries (
salary_id INT PRIMARY KEY AUTO_INCREMENT,
emp_id INT NOT NULL,
amount DECIMAL(10,2) NOT NULL,
from_date DATE NOT NULL,
to_date DATE,
FOREIGN KEY (emp_id) REFERENCES employees(emp_id)
);
CREATE TABLE projects (
project_id INT PRIMARY KEY AUTO_INCREMENT,
project_name VARCHAR(100) NOT NULL,
dept_id INT,
start_date DATE,
end_date DATE,
budget DECIMAL(15,2),
FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
-- Sample data
INSERT INTO departments (dept_name, location, budget) VALUES
('Engineering', 'San Francisco', 5000000),
('Marketing', 'New York', 2000000),
('Sales', 'Chicago', 3000000),
('HR', 'Remote', 1000000),
('Finance', 'New York', 2500000);
INSERT INTO employees (emp_name, salary, dept_id, manager_id, hire_date) VALUES
('Alice', 120000, 1, NULL, '2020-01-15'),
('Bob', 95000, 1, 1, '2020-03-20'),
('Charlie', 85000, 2, 1, '2021-06-10'),
('Diana', 110000, 1, 1, '2019-11-01'),
('Eve', 75000, 3, NULL, '2022-01-05'),
('Frank', 90000, 2, 3, '2021-09-15'),
('Grace', NULL, 4, NULL, '2023-02-20'),
('Henry', 105000, 1, 1, '2020-07-01'),
('Ivy', 70000, 5, NULL, '2022-08-10'),
('Jack', 80000, 3, 5, '2023-01-15');-- ═══════════════════════════════════════════════════
-- SELECT & WHERE
-- ═══════════════════════════════════════════════════
-- Basic SELECT
SELECT emp_name, salary FROM employees;
-- SELECT with WHERE clauses
SELECT * FROM employees
WHERE salary > 90000
AND dept_id = 1
AND hire_date >= '2020-01-01';
-- Comparison operators: =, <>, !=, <, >, <=, >=
-- Logical: AND, OR, NOT, BETWEEN, IN, LIKE, IS NULL
SELECT * FROM employees
WHERE salary BETWEEN 80000 AND 100000;
-- IN operator
SELECT * FROM employees
WHERE dept_id IN (1, 2, 3);
-- LIKE pattern matching (% = any chars, _ = one char)
SELECT * FROM employees
WHERE emp_name LIKE 'A%'; -- starts with A
SELECT * FROM employees
WHERE emp_name LIKE '%ar%'; -- contains 'ar'
-- NOT LIKE, NOT IN, NOT BETWEEN
SELECT * FROM employees
WHERE dept_id NOT IN (4, 5);-- ═══════════════════════════════════════════════════
-- ORDER BY, DISTINCT, LIMIT / OFFSET
-- ═══════════════════════════════════════════════════
-- ORDER BY (ASC default, DESC for descending)
SELECT * FROM employees
ORDER BY salary DESC;
-- Multi-column ordering
SELECT * FROM employees
ORDER BY dept_id ASC, salary DESC;
-- DISTINCT (remove duplicates)
SELECT DISTINCT dept_id FROM employees;
-- DISTINCT with multiple columns
SELECT DISTINCT dept_id, manager_id FROM employees;
-- LIMIT (MySQL/PostgreSQL) — pagination
SELECT * FROM employees
ORDER BY salary DESC
LIMIT 5; -- top 5 highest paid
-- LIMIT with OFFSET — page 2 of 5 results
SELECT * FROM employees
ORDER BY salary DESC
LIMIT 5 OFFSET 5; -- rows 6-10
-- Aliases (AS keyword, optional for columns)
SELECT emp_name AS name,
salary AS annual_salary,
salary * 12 AS total_compensation
FROM employees
ORDER BY total_compensation DESC;
-- Table aliases
SELECT e.emp_name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;-- ═══════════════════════════════════════════════════
-- NULL HANDLING
-- ═══════════════════════════════════════════════════
-- NULL is NOT equal to anything, not even NULL
-- Always use IS NULL / IS NOT NULL
SELECT * FROM employees WHERE salary IS NULL;
SELECT * FROM employees WHERE manager_id IS NULL;
-- COALESCE — returns first non-NULL value
SELECT emp_name,
COALESCE(salary, 0) AS salary,
COALESCE(manager_id, -1) AS manager
FROM employees;
-- NULLIF — returns NULL if two values are equal
-- Useful for preventing division by zero
SELECT emp_name,
salary / NULLIF(dept_avg, 0) AS ratio
FROM employees e
JOIN (SELECT dept_id, AVG(salary) AS dept_avg
FROM employees GROUP BY dept_id) d
ON e.dept_id = d.dept_id;
-- IFNULL (MySQL) / COALESCE (standard SQL)
-- IFNULL(salary, 0) -- MySQL only
-- COALESCE(salary, 0) -- works everywhere-- ═══════════════════════════════════════════════════
-- CASE WHEN EXPRESSIONS
-- ═══════════════════════════════════════════════════
-- Simple CASE
SELECT emp_name,
CASE dept_id
WHEN 1 THEN 'Engineering'
WHEN 2 THEN 'Marketing'
WHEN 3 THEN 'Sales'
ELSE 'Other'
END AS department
FROM employees;
-- Searched CASE (more flexible)
SELECT emp_name, salary,
CASE
WHEN salary >= 100000 THEN 'Senior'
WHEN salary >= 80000 THEN 'Mid-Level'
WHEN salary >= 60000 THEN 'Junior'
ELSE 'Intern'
END AS level,
CASE
WHEN salary >= 100000 THEN salary * 0.10
WHEN salary >= 80000 THEN salary * 0.08
ELSE salary * 0.05
END AS bonus
FROM employees;
-- CASE in ORDER BY
SELECT * FROM employees
ORDER BY
CASE dept_id
WHEN 1 THEN 1
WHEN 2 THEN 2
ELSE 3
END,
salary DESC;
-- CASE with aggregation
SELECT dept_id,
COUNT(*) AS total,
SUM(CASE WHEN salary > 90000 THEN 1 ELSE 0 END) AS high_earners,
SUM(CASE WHEN salary IS NULL THEN 1 ELSE 0 END) AS no_salary
FROM employees
GROUP BY dept_id;| Type | Description | Example |
|---|---|---|
| INT / INTEGER | Whole numbers (-2B to 2B) | 42 |
| BIGINT | Large whole numbers | 9007199254740991 |
| DECIMAL(p,s) | Exact fixed-point numbers | DECIMAL(10,2) |
| FLOAT / DOUBLE | Approximate floating point | 3.14159 |
| VARCHAR(n) | Variable-length string | VARCHAR(255) |
| CHAR(n) | Fixed-length string | CHAR(10) |
| TEXT | Unlimited length string | Long text... |
| BOOLEAN | True / False | TRUE / FALSE |
| DATE | Calendar date (no time) | '2024-01-15' |
| TIME | Time of day | '14:30:00' |
| TIMESTAMP | Date + time + timezone | NOW() |
| JSON / JSONB | JSON data (PostgreSQL) | {"key": "value"} |
| UUID | Universally unique identifier | uuid_generate_v4() |
| BLOB / BYTEA | Binary large object | File data |
| Order | Clause | Description |
|---|---|---|
| 1 | FROM / JOIN | Determine source tables |
| 2 | WHERE | Filter rows before grouping |
| 3 | GROUP BY | Group rows for aggregation |
| 4 | HAVING | Filter groups after aggregation |
| 5 | SELECT | Choose columns / compute values |
| 6 | DISTINCT | Remove duplicate rows |
| 7 | ORDER BY | Sort the result set |
| 8 | LIMIT / OFFSET | Paginate the result |
-- ═══════════════════════════════════════════════════
-- INNER JOIN — only matching rows from both tables
-- ═══════════════════════════════════════════════════
SELECT e.emp_name, d.dept_name, e.salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
-- Returns only employees who have a department
-- AND departments that have employees-- ═══════════════════════════════════════════════════
-- LEFT JOIN — all rows from left table + matching right
-- ═══════════════════════════════════════════════════
SELECT e.emp_name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
-- All employees shown, even those without a department
-- d.dept_name will be NULL for unmatched rows
-- LEFT JOIN with filtering for unmatched (no department)
SELECT e.emp_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;
-- Employees who don't belong to any department-- ═══════════════════════════════════════════════════
-- RIGHT JOIN — all rows from right table + matching left
-- ═══════════════════════════════════════════════════
SELECT e.emp_name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
-- All departments shown, even those with no employees
-- ═══════════════════════════════════════════════════
-- FULL OUTER JOIN — all rows from both tables
-- ═══════════════════════════════════════════════════
SELECT e.emp_name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
-- Everything: employees with no dept, depts with no employees
-- ═══════════════════════════════════════════════════
-- CROSS JOIN — Cartesian product (every combination)
-- ═══════════════════════════════════════════════════
SELECT e.emp_name, d.dept_name
FROM employees e
CROSS JOIN departments d;
-- If 10 employees x 5 departments = 50 rows
-- ═══════════════════════════════════════════════════
-- SELF JOIN — join a table with itself
-- ═══════════════════════════════════════════════════
-- Find employees and their managers
SELECT e.emp_name AS employee,
m.emp_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
-- Find employees who earn more than their manager
SELECT e.emp_name AS employee,
e.salary AS emp_salary,
m.emp_name AS manager,
m.salary AS mgr_salary
FROM employees e
JOIN employees m ON e.manager_id = m.emp_id
WHERE e.salary > m.salary;-- ═══════════════════════════════════════════════════
-- MULTI-TABLE JOIN (3+ tables)
-- ═══════════════════════════════════════════════════
SELECT e.emp_name,
d.dept_name,
p.project_name,
s.amount AS latest_salary
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
JOIN projects p ON p.dept_id = d.dept_id
JOIN salaries s ON s.emp_id = e.emp_id
WHERE e.is_active = TRUE
ORDER BY e.emp_name, p.project_name;
-- Join with aggregation
SELECT d.dept_name,
COUNT(e.emp_id) AS employee_count,
AVG(e.salary) AS avg_salary,
MAX(e.salary) AS max_salary
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_name
ORDER BY employee_count DESC;| Join Type | Left Table | Right Table | Result |
|---|---|---|---|
| INNER JOIN | A ∩ B | A ∩ B | Only matches in both |
| LEFT JOIN | All of A | Matched from B | All left + matched right |
| RIGHT JOIN | Matched from A | All of B | All right + matched left |
| FULL OUTER | All of A | All of B | Everything (union) |
| CROSS JOIN | All of A | All of B | Cartesian product |
| SELF JOIN | A matched | with A | Compare rows within table |
| Tip | Why It Matters |
|---|---|
| Add indexes on JOIN columns | Avoids full table scans |
| Filter early with WHERE | Reduces rows before the join |
| Use INNER JOIN over LEFT when possible | Smaller result set, faster |
| Avoid joining too many tables | Query planner gets exponential paths |
| Use EXISTS over IN for subqueries | Stops at first match |
| Prefer equi-joins (=) | Other operators are slower |
INNER JOIN and LEFT JOIN? Inner join returns only rows that match in BOTH tables. Left join returns ALL rows from the left table, with NULLs for non-matching right-side rows. Use left join when you need to find "orphans" (rows in the left table with no match on the right).-- ═══════════════════════════════════════════════════
-- SCALAR SUBQUERY — returns exactly one value
-- ═══════════════════════════════════════════════════
SELECT emp_name, salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- ═══════════════════════════════════════════════════
-- ROW SUBQUERY — returns a single row
-- ═══════════════════════════════════════════════════
SELECT emp_name, salary
FROM employees
WHERE (salary, dept_id) = (
SELECT MAX(salary), dept_id
FROM employees
GROUP BY dept_id
);
-- ═══════════════════════════════════════════════════
-- TABLE SUBQUERY — returns multiple rows/columns
-- Used in FROM clause (derived table / inline view)
-- ═══════════════════════════════════════════════════
SELECT dept_avg.dept_id, dept_avg.avg_salary
FROM (
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
) AS dept_avg
WHERE dept_avg.avg_salary > 90000;-- ═══════════════════════════════════════════════════
-- CORRELATED vs NON-CORRELATED SUBQUERY
-- ═══════════════════════════════════════════════════
-- NON-CORRELATED: runs once, independent of outer query
SELECT emp_name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Subquery runs once, returns one value
-- CORRELATED: runs once per outer row, references outer query
SELECT e.emp_name, e.salary, e.dept_id
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.dept_id = e.dept_id -- references outer e.dept_id
);
-- For each employee, compute their department's average
-- Performance concern: O(n) subquery executions!
-- Correlated subquery in SELECT
SELECT e.emp_name,
e.salary,
(SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.dept_id = e.dept_id) AS dept_avg
FROM employees e;-- ═══════════════════════════════════════════════════
-- EXISTS vs IN
-- ═══════════════════════════════════════════════════
-- IN: checks membership in a list
SELECT emp_name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'New York');
-- EXISTS: returns TRUE if subquery finds at least one row
SELECT emp_name FROM employees e
WHERE EXISTS (
SELECT 1 FROM departments d
WHERE d.dept_id = e.dept_id
AND d.location = 'New York'
);
-- NOT EXISTS: find departments with no employees
SELECT d.dept_name FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e
WHERE e.dept_id = d.dept_id
);
-- NOT IN: same result but different behavior with NULLs!
-- NOT IN returns no rows if subquery contains NULL values
-- NOT EXISTS handles NULLs correctly (safer)
-- Performance: EXISTS is often faster because it
-- short-circuits (stops at first match per row)-- ═══════════════════════════════════════════════════
-- CTE (Common Table Expression) — WITH clause
-- ═══════════════════════════════════════════════════
-- Basic CTE: improves readability over subqueries
WITH dept_averages AS (
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
)
SELECT e.emp_name, e.salary, d.avg_salary
FROM employees e
JOIN dept_averages d ON e.dept_id = d.dept_id
WHERE e.salary > d.avg_salary;
-- Multiple CTEs chained together
WITH dept_stats AS (
SELECT dept_id,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
),
top_depts AS (
SELECT dept_id, avg_salary
FROM dept_stats
WHERE emp_count >= 3
)
SELECT e.emp_name, e.salary, td.avg_salary
FROM employees e
JOIN top_depts td ON e.dept_id = td.dept_id
ORDER BY td.avg_salary DESC;
-- CTE vs Subquery: CTE is reusable, easier to read
-- CTE vs Temp Table: CTE is scoped to single statement,
-- temp table persists across statements in session-- ═══════════════════════════════════════════════════
-- RECURSIVE CTE — for hierarchical / tree data
-- ═══════════════════════════════════════════════════
-- Employee hierarchy (manager chain)
WITH RECURSIVE hierarchy AS (
-- Anchor: top-level employees (no manager)
SELECT emp_id, emp_name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: find direct reports of current rows
SELECT e.emp_id, e.emp_name, e.manager_id, h.level + 1
FROM employees e
JOIN hierarchy h ON e.manager_id = h.emp_id
)
SELECT level, emp_name, manager_id
FROM hierarchy
ORDER BY level, emp_name;
-- Organization tree with full path
WITH RECURSIVE org_tree AS (
SELECT emp_id, emp_name, manager_id,
CAST(emp_name AS VARCHAR(500)) AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.emp_id, e.emp_name, e.manager_id,
CONCAT(ot.path, ' > ', e.emp_name)
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.emp_id
)
SELECT emp_id, emp_name, level, path
FROM org_tree
ORDER BY path;
-- Recursive CTE to generate a number series
WITH RECURSIVE numbers AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT n FROM numbers;
-- Generates numbers 1 through 100EXISTS for correlated conditions (checks per row, short-circuits). Use IN when you have a static or small list. NOT EXISTS is always safer than NOT IN becauseNOT IN returns no rows if the subquery contains NULLs, while NOT EXISTS handles them correctly.-- ═══════════════════════════════════════════════════
-- AGGREGATE FUNCTIONS
-- ═══════════════════════════════════════════════════
-- COUNT(*) counts all rows (including NULLs)
-- COUNT(column) counts non-NULL values in column
SELECT COUNT(*) AS total_employees,
COUNT(salary) AS with_salary, -- excludes NULL salaries
COUNT(DISTINCT dept_id) AS departments,
SUM(salary) AS total_payroll,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees;
-- MIN/MAX work on dates and strings too
SELECT MIN(hire_date) AS earliest_hire,
MAX(hire_date) AS latest_hire
FROM employees;-- ═══════════════════════════════════════════════════
-- GROUP BY — single and multiple columns
-- ═══════════════════════════════════════════════════
-- Single column GROUP BY
SELECT dept_id,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary,
SUM(salary) AS total_salary
FROM employees
WHERE salary IS NOT NULL
GROUP BY dept_id;
-- Multiple columns GROUP BY
SELECT dept_id, manager_id,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id, manager_id
ORDER BY dept_id, emp_count DESC;
-- ═══════════════════════════════════════════════════
-- HAVING — filter after aggregation (WHERE is before)
-- ═══════════════════════════════════════════════════
-- Find departments with more than 2 employees
SELECT dept_id, COUNT(*) AS emp_count
FROM employees
GROUP BY dept_id
HAVING COUNT(*) > 2;
-- Find departments where avg salary > company average
SELECT dept_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
-- WHERE + GROUP BY + HAVING combined
SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM employees
WHERE hire_date >= '2021-01-01' -- filter rows first
GROUP BY dept_id -- then group
HAVING AVG(salary) > 80000; -- then filter groups-- ═══════════════════════════════════════════════════
-- GROUPING SETS, ROLLUP, CUBE
-- ═══════════════════════════════════════════════════
-- GROUPING SETS: multiple GROUP BY in one query
SELECT dept_id, manager_id, COUNT(*) AS emp_count
FROM employees
GROUP BY GROUPING SETS (
(dept_id), -- group by dept only
(manager_id), -- group by manager only
(dept_id, manager_id), -- group by both
() -- grand total
);
-- ROLLUP: hierarchical aggregation (subtotals + total)
-- Equivalent to GROUPING SETS (d), (d, m), (m), ()
SELECT dept_id, manager_id, COUNT(*) AS emp_count,
SUM(salary) AS total_salary
FROM employees
GROUP BY ROLLUP (dept_id, manager_id);
-- Produces: per dept+mgr, per dept subtotal, grand total
-- CUBE: all possible combinations
SELECT dept_id, manager_id, COUNT(*) AS emp_count
FROM employees
GROUP BY CUBE (dept_id, manager_id);
-- Produces: (d,m), (d), (m), () — all 4 combinations
-- GROUPING: identify which columns are aggregated (NULL)
SELECT dept_id, manager_id, COUNT(*) AS cnt,
GROUPING(dept_id) AS dept_grouped,
GROUPING(manager_id) AS mgr_grouped
FROM employees
GROUP BY ROLLUP (dept_id, manager_id);
-- GROUPING returns 1 if the column is a subtotal/total row-- ═══════════════════════════════════════════════════
-- DISTINCT inside aggregates
-- ═══════════════════════════════════════════════════
-- Count unique departments employees belong to
SELECT COUNT(DISTINCT dept_id) AS unique_depts
FROM employees;
-- Average of unique salaries (deduplicated)
SELECT AVG(DISTINCT salary) AS unique_avg_salary
FROM employees;
-- Multiple grouping with different aggregations
SELECT dept_id,
COUNT(*) AS total,
COUNT(DISTINCT manager_id) AS unique_managers,
AVG(salary) AS avg_salary,
SUM(DISTINCT salary) AS sum_unique_salaries
FROM employees
GROUP BY dept_id;| Function | Ignores NULLs | Use Case |
|---|---|---|
| COUNT(*) | No | Count all rows |
| COUNT(col) | Yes | Count non-NULL values |
| COUNT(DISTINCT col) | Yes | Count unique values |
| SUM(col) | Yes | Total of numeric values |
| AVG(col) | Yes | Average (arithmetic mean) |
| MIN(col) | Yes | Smallest value |
| MAX(col) | Yes | Largest value |
| STRING_AGG(col, sep) | Yes | Concatenate strings (PostgreSQL) |
| GROUP_CONCAT(col) | Yes | Concatenate strings (MySQL) |
| Feature | WHERE | HAVING |
|---|---|---|
| Executes | Before GROUP BY | After GROUP BY |
| Filters | Individual rows | Aggregated groups |
| Aggregate allowed | No | Yes |
| Can use aliases | No | Yes (PostgreSQL) |
| Performance | Reduces rows early | Processes all groups first |
| Best practice | Filter as early as possible | Only for group conditions |
-- ═══════════════════════════════════════════════════
-- ROW_NUMBER, RANK, DENSE_RANK
-- ═══════════════════════════════════════════════════
-- ROW_NUMBER: unique sequential number (1,2,3,4,5...)
SELECT emp_name, dept_id, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM employees;
-- RANK: same rank for ties, skips numbers (1,2,2,4,5...)
SELECT emp_name, dept_id, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees;
-- DENSE_RANK: same rank for ties, NO skip (1,2,2,3,4...)
SELECT emp_name, dept_id, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;
-- Example: if salaries are 100, 90, 90, 80
-- ROW_NUMBER: 1, 2, 3, 4
-- RANK: 1, 2, 2, 4 (skips 3)
-- DENSE_RANK: 1, 2, 2, 3 (no skip)
-- PARTITION BY: restart numbering per group
SELECT emp_name, dept_id, salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id
ORDER BY salary DESC
) AS dept_rank
FROM employees;
-- Rank employees within each department-- ═══════════════════════════════════════════════════
-- NTILE, LAG, LEAD
-- ═══════════════════════════════════════════════════
-- NTILE: divide rows into N groups (quartiles, percentiles)
SELECT emp_name, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile,
NTILE(100) OVER (ORDER BY salary DESC) AS percentile
FROM employees;
-- Quartile 1 = highest paid, Quartile 4 = lowest paid
-- LAG: access value from a previous row
SELECT emp_id, emp_name, salary, hire_date,
LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary,
LAG(salary, 1) OVER (ORDER BY hire_date) - salary AS salary_diff
FROM employees
ORDER BY hire_date;
-- LEAD: access value from a next row
SELECT emp_name, salary, hire_date,
LEAD(salary, 1) OVER (ORDER BY hire_date) AS next_salary,
LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS next_hire
FROM employees
ORDER BY hire_date;
-- LAG/LEAD with PARTITION BY
SELECT emp_name, dept_id, salary, hire_date,
LAG(salary) OVER (
PARTITION BY dept_id ORDER BY hire_date
) AS prev_dept_salary
FROM employees;-- ═══════════════════════════════════════════════════
-- FIRST_VALUE, LAST_VALUE
-- ═══════════════════════════════════════════════════
-- FIRST_VALUE: value from first row in window
SELECT emp_name, dept_id, salary,
FIRST_VALUE(emp_name) OVER (
PARTITION BY dept_id
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS highest_paid_in_dept
FROM employees;
-- LAST_VALUE: value from last row in window
-- NOTE: default window frame is ROWS BETWEEN UNBOUNDED PRECEDING
-- AND CURRENT ROW — must expand to get true last value!
SELECT emp_name, dept_id, salary,
LAST_VALUE(emp_name) OVER (
PARTITION BY dept_id
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_paid_in_dept
FROM employees;-- ═══════════════════════════════════════════════════
-- RUNNING TOTALS & MOVING AVERAGES
-- ═══════════════════════════════════════════════════
-- Running total (cumulative sum)
SELECT emp_name, salary, hire_date,
SUM(salary) OVER (
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM employees
ORDER BY hire_date;
-- Moving average (3-month window)
SELECT emp_name, salary, hire_date,
AVG(salary) OVER (
ORDER BY hire_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS moving_avg_3
FROM employees
ORDER BY hire_date;
-- ═══════════════════════════════════════════════════
-- TOP-N PER GROUP (classic interview pattern)
-- ═══════════════════════════════════════════════════
WITH ranked AS (
SELECT emp_name, dept_id, salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id
ORDER BY salary DESC
) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 2;
-- Top 2 earners in each department| Clause | Purpose | Example |
|---|---|---|
| OVER() | Required wrapper | OVER (ORDER BY date) |
| PARTITION BY | Group for calculation | PARTITION BY dept_id |
| ORDER BY | Order within partition | ORDER BY salary DESC |
| ROWS BETWEEN | Define window frame | BETWEEN 1 PRECEDING AND 1 FOLLOWING |
| UNBOUNDED PRECEDING | Start of partition | From the very first row |
| CURRENT ROW | Current row boundary | Up to current row |
| UNBOUNDED FOLLOWING | End of partition | To the very last row |
| Feature | GROUP BY | Window Function |
|---|---|---|
| Rows | 1 row per group | 1 row per input row |
| Aggregates | Yes | Yes |
| Non-aggregate cols | Must be in GROUP BY | Can access all columns |
| Detail preserved | No (collapsed) | Yes (all rows kept) |
| Running totals | No | Yes |
| Ranking | No | Yes (ROW_NUMBER, RANK) |
LAST_VALUE isROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — it only looks at rows up to the current row. You MUST specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the true last value in the partition.-- ═══════════════════════════════════════════════════
-- INDEXES
-- ═══════════════════════════════════════════════════
-- B-Tree Index (default) — great for =, >, <, BETWEEN, LIKE 'prefix%'
CREATE INDEX idx_emp_dept ON employees(dept_id);
CREATE INDEX idx_emp_name ON employees(emp_name);
CREATE INDEX idx_emp_salary ON employees(salary DESC);
-- Composite Index — order matters! (leftmost prefix rule)
-- Works for: (dept_id), (dept_id, salary), (dept_id, salary, hire_date)
-- Does NOT work for: (salary) alone, (hire_date) alone
CREATE INDEX idx_compound ON employees(dept_id, salary, hire_date);
-- Unique Index — enforces uniqueness + speeds up lookups
CREATE UNIQUE INDEX idx_emp_email ON employees(email);
-- Hash Index — only equality (=), no range queries
CREATE INDEX idx_emp_hash ON employees USING HASH (dept_id);
-- Partial Index — index only matching rows (smaller, faster)
CREATE INDEX idx_active_emp ON employees(emp_name)
WHERE is_active = TRUE;
-- Expression Index — index computed value
CREATE INDEX idx_lower_email ON employees(LOWER(email));
-- Covering Index — includes all needed columns (index-only scan)
CREATE INDEX idx_covering ON employees(dept_id) INCLUDE (emp_name, salary);
-- Remove index
DROP INDEX idx_emp_dept;-- ═══════════════════════════════════════════════════
-- QUERY EXECUTION PLAN (EXPLAIN)
-- ═══════════════════════════════════════════════════
-- Basic EXPLAIN — shows the plan without executing
EXPLAIN SELECT * FROM employees WHERE dept_id = 1;
-- EXPLAIN ANALYZE — executes and shows actual timing
EXPLAIN ANALYZE
SELECT e.emp_name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE e.salary > 90000;
-- Key things to look for in EXPLAIN output:
-- Seq Scan = full table scan (bad for large tables)
-- Index Scan = using an index (good)
-- Bitmap Scan = index + sort (good for many matches)
-- Nested Loop = nested iteration (ok for small datasets)
-- Hash Join = build hash table (good for large joins)
-- Merge Join = sorted input merge (good for sorted data-- ═══════════════════════════════════════════════════
-- QUERY OPTIMIZATION TIPS
-- ═══════════════════════════════════════════════════
-- 1. Avoid SELECT * — only fetch needed columns
-- Bad:
SELECT * FROM employees WHERE dept_id = 1;
-- Good:
SELECT emp_id, emp_name, salary FROM employees WHERE dept_id = 1;
-- 2. Use EXISTS instead of IN for large subqueries
-- Bad: IN materializes the full subquery result
SELECT * FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments);
-- Good: EXISTS stops at first match
SELECT * FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d WHERE d.dept_id = e.dept_id);
-- 3. Use LIMIT to reduce result sets
SELECT emp_name FROM employees
WHERE dept_id = 1
ORDER BY salary DESC LIMIT 10;
-- 4. Avoid functions on indexed columns (breaks index)
-- Bad: index on emp_name won't be used
SELECT * FROM employees WHERE LOWER(emp_name) = 'alice';
-- Good: use expression index or store normalized value
SELECT * FROM employees WHERE emp_name = 'Alice';
-- 5. Use JOIN instead of correlated subqueries when possible
-- Bad: O(n) subquery executions
SELECT e.emp_name,
(SELECT dept_name FROM departments WHERE dept_id = e.dept_id)
FROM employees e;
-- Good: single execution with JOIN
SELECT e.emp_name, d.dept_name
FROM employees e JOIN departments d ON e.dept_id = d.dept_id;
-- 6. Avoid leading wildcards in LIKE (breaks index)
-- Bad: full table scan
SELECT * FROM employees WHERE emp_name LIKE '%son';
-- Good: can use index
SELECT * FROM employees WHERE emp_name LIKE 'John%';-- ═══════════════════════════════════════════════════
-- VIEWS (Simple vs Materialized)
-- ═══════════════════════════════════════════════════
-- Simple View: virtual table, query runs every time
CREATE VIEW employee_details AS
SELECT e.emp_name, e.salary, d.dept_name, d.location
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
SELECT * FROM employee_details WHERE salary > 90000;
-- Materialized View: stores result, refreshes manually
CREATE MATERIALIZED VIEW dept_summary AS
SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees GROUP BY dept_id;
-- Refresh materialized view
REFRESH MATERIALIZED VIEW dept_summary;
-- Concurrent refresh (doesn't lock reads)
REFRESH MATERIALIZED VIEW CONCURRENTLY dept_summary;
-- ═══════════════════════════════════════════════════
-- STORED PROCEDURE
-- ═══════════════════════════════════════════════════
CREATE PROCEDURE give_raise(IN dept INT, IN pct DECIMAL(5,2))
BEGIN
UPDATE employees
SET salary = salary * (1 + pct / 100)
WHERE dept_id = dept;
END;
CALL give_raise(1, 10.0);
-- ═══════════════════════════════════════════════════
-- TRIGGER
-- ═══════════════════════════════════════════════════
CREATE TRIGGER log_salary_change
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
IF OLD.salary <> NEW.salary THEN
INSERT INTO salary_audit (emp_id, old_salary, new_salary, changed_at)
VALUES (NEW.emp_id, OLD.salary, NEW.salary, NOW());
END IF;
END;-- ═══════════════════════════════════════════════════
-- TRANSACTIONS (ACID)
-- ═══════════════════════════════════════════════════
-- ACID Properties:
-- Atomicity: all or nothing (no partial commits)
-- Consistency: database remains in valid state
-- Isolation: concurrent transactions don't interfere
-- Durability: committed data survives crashes
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- If anything fails, rollback both
COMMIT; -- or ROLLBACK;
-- ═══════════════════════════════════════════════════
-- ISOLATION LEVELS (from weakest to strongest)
-- ═══════════════════════════════════════════════════
-- Read Uncommitted: can read uncommitted changes (dirty reads)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Read Committed: only read committed data (no dirty reads)
-- But same query can return different data (non-repeatable read)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Repeatable Read: same query returns same data within transaction
-- But new rows can appear (phantom reads)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Serializable: strongest — no dirty/non-repeatable/phantom reads
-- Locks entire ranges, slowest but safest
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ═══════════════════════════════════════════════════
-- CTE vs TEMP TABLE
-- ═══════════════════════════════════════════════════
-- CTE: single query scope, optimized as one unit, no IO
-- Temp Table: session-scoped, can be indexed, can be JOINed
-- multiple times, materialized to disk
-- CTE
WITH filtered AS (
SELECT * FROM employees WHERE salary > 80000
)
SELECT * FROM filtered WHERE dept_id = 1;
-- Temp Table
CREATE TEMPORARY TABLE tmp_high_earners AS
SELECT * FROM employees WHERE salary > 80000;
CREATE INDEX idx_tmp_dept ON tmp_high_earners(dept_id);
SELECT * FROM tmp_high_earners WHERE dept_id = 1;
DROP TABLE tmp_high_earners;| Index Type | Operations | Use Case |
|---|---|---|
| B-Tree (default) | =, >, <, BETWEEN, LIKE | General purpose, most queries |
| Hash | = only | Exact lookups, hash tables |
| Composite | Leftmost prefix | Multi-column WHERE/ORDER BY |
| Partial | Subset of rows | Filtered indexes, smaller size |
| Expression | Computed value | LOWER(), UPPER(), etc. |
| Unique | =, INSERT | Enforce uniqueness + lookup |
| GIN | Full-text, JSON | PostgreSQL text search, JSONB |
| GiST | Spatial, range | Geometric data, ranges |
| Level | Dirty Read | Non-Repeatable | Phantom | Performance |
|---|---|---|---|---|
| Read Uncommitted | Yes | Yes | Yes | Fastest |
| Read Committed | No | Yes | Yes | Fast |
| Repeatable Read | No | No | Yes | Medium |
| Serializable | No | No | No | Slowest |
(a, b, c) supports queries on(a), (a, b), and (a, b, c) — but NOT (b) or(c) alone. This is the leftmost prefix rule. Place the most selective column first.-- ═══════════════════════════════════════════════════════════
-- PATTERN 1: Nth Highest Salary (using DENSE_RANK)
-- ═══════════════════════════════════════════════════════════
WITH ranked AS (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees WHERE salary IS NOT NULL
)
SELECT DISTINCT salary AS nth_highest
FROM ranked WHERE rnk = 3; -- 3rd highest
-- Using LIMIT/OFFSET (MySQL — handles ties differently)
SELECT DISTINCT salary
FROM employees
WHERE salary IS NOT NULL
ORDER BY salary DESC
LIMIT 1 OFFSET 2; -- 3rd highest (0-indexed offset)-- ═══════════════════════════════════════════════════════════
-- PATTERN 2: Department-Wise Highest Salary
-- ═══════════════════════════════════════════════════════════
-- Using window function (cleanest)
WITH ranked AS (
SELECT emp_name, dept_id, salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id ORDER BY salary DESC
) AS rn
FROM employees WHERE salary IS NOT NULL
)
SELECT * FROM ranked WHERE rn = 1;
-- Using correlated subquery (classic approach)
SELECT e.emp_name, e.dept_id, e.salary
FROM employees e
WHERE salary = (
SELECT MAX(e2.salary)
FROM employees e2
WHERE e2.dept_id = e.dept_id
);
-- ═══════════════════════════════════════════════════════════
-- PATTERN 3: Employees With No Manager (top-level)
-- ═══════════════════════════════════════════════════════════
SELECT emp_name, salary, hire_date
FROM employees
WHERE manager_id IS NULL;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 4: Find Duplicate Records
-- ═══════════════════════════════════════════════════════════
SELECT email, COUNT(*) AS dup_count
FROM employees
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1;
-- Get full duplicate rows
SELECT * FROM employees
WHERE email IN (
SELECT email FROM employees
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1
);-- ═══════════════════════════════════════════════════════════
-- PATTERN 5: Consecutive Days Login (hard pattern!)
-- ═══════════════════════════════════════════════════════════
-- Using ROW_NUMBER with date arithmetic
WITH login_data AS (
SELECT emp_id, login_date,
ROW_NUMBER() OVER (
PARTITION BY emp_id ORDER BY login_date
) AS rn
FROM logins
)
SELECT DISTINCT emp_id
FROM login_data l1
WHERE EXISTS (
SELECT 1 FROM login_data l2
WHERE l2.emp_id = l1.emp_id
AND l2.login_date = DATE(l1.login_date, '+1 day')
AND l2.rn = l1.rn + 1
);
-- ═══════════════════════════════════════════════════════════
-- PATTERN 6: Month-Over-Month Revenue Growth
-- ═══════════════════════════════════════════════════════════
WITH monthly AS (
SELECT DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
(revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month) * 100 AS growth_pct
FROM monthly
ORDER BY month;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 7: Top Earners Per Department
-- ═══════════════════════════════════════════════════════════
SELECT * FROM (
SELECT emp_name, dept_id, salary,
DENSE_RANK() OVER (
PARTITION BY dept_id ORDER BY salary DESC
) AS rnk
FROM employees WHERE salary IS NOT NULL
) ranked
WHERE rnk <= 3 -- top 3 per department
ORDER BY dept_id, rnk;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 8: Second Highest Salary WITHOUT LIMIT
-- ═══════════════════════════════════════════════════════════
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- Works because it finds the max salary that is less than the highest-- ═══════════════════════════════════════════════════════════
-- PATTERN 9: Find All Managers
-- ═══════════════════════════════════════════════════════════
SELECT DISTINCT e.emp_name, e.salary, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE e.emp_id IN (SELECT DISTINCT manager_id FROM employees WHERE manager_id IS NOT NULL);
-- ═══════════════════════════════════════════════════════════
-- PATTERN 10: Department With Most Employees
-- ═══════════════════════════════════════════════════════════
SELECT d.dept_name, COUNT(e.emp_id) AS emp_count
FROM departments d
JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_name
ORDER BY emp_count DESC
LIMIT 1;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 11: Salary Rank Across Company
-- ═══════════════════════════════════════════════════════════
SELECT emp_name, salary,
RANK() OVER (ORDER BY salary DESC) AS company_rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees
WHERE salary IS NOT NULL;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 12: Employee Tenure (years/months of service)
-- ═══════════════════════════════════════════════════════════
SELECT emp_name, hire_date,
DATEDIFF(CURRENT_DATE, hire_date) AS total_days,
DATEDIFF(CURRENT_DATE, hire_date) / 365 AS years_of_service,
TIMESTAMPDIFF(MONTH, hire_date, CURRENT_DATE) AS total_months
FROM employees
ORDER BY hire_date;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 13: Pivot Table (rows to columns)
-- ═══════════════════════════════════════════════════════════
SELECT dept_id,
SUM(CASE WHEN job_title = 'Engineer' THEN 1 ELSE 0 END) AS engineers,
SUM(CASE WHEN job_title = 'Manager' THEN 1 ELSE 0 END) AS managers,
SUM(CASE WHEN job_title = 'Analyst' THEN 1 ELSE 0 END) AS analysts
FROM employees
GROUP BY dept_id;-- ═══════════════════════════════════════════════════════════
-- PATTERN 14: Delete Duplicate Rows (keep lowest ID)
-- ═══════════════════════════════════════════════════════════
DELETE FROM employees
WHERE emp_id NOT IN (
SELECT MIN(emp_id)
FROM employees
GROUP BY email
);
-- Using CTE (PostgreSQL)
WITH duplicates AS (
SELECT emp_id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY emp_id) AS rn
FROM employees
)
DELETE FROM employees
WHERE emp_id IN (SELECT emp_id FROM duplicates WHERE rn > 1);
-- ═══════════════════════════════════════════════════════════
-- PATTERN 15: Percentage of Total
-- ═══════════════════════════════════════════════════════════
SELECT dept_id, SUM(salary) AS dept_total,
SUM(salary) / (SELECT SUM(salary) FROM employees) * 100 AS pct_of_total,
SUM(salary) OVER () AS grand_total -- window function approach
FROM employees
WHERE salary IS NOT NULL
GROUP BY dept_id
ORDER BY pct_of_total DESC;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 16: Employees who changed departments
-- ═══════════════════════════════════════════════════════════
SELECT e.emp_id, e.emp_name,
e.dept_id AS old_dept, s.dept_id AS new_dept
FROM employee_history e
JOIN (
SELECT emp_id, dept_id,
LAG(dept_id) OVER (PARTITION BY emp_id ORDER BY changed_at) AS prev_dept
FROM employee_history
) s ON e.emp_id = s.emp_id
WHERE e.dept_id <> s.prev_dept;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 17: Cumulative distribution (percentile rank)
-- ═══════════════════════════════════════════════════════════
SELECT emp_name, salary,
PERCENT_RANK() OVER (ORDER BY salary DESC) AS pct_rank,
CUME_DIST() OVER (ORDER BY salary DESC) AS cumulative_dist
FROM employees WHERE salary IS NOT NULL;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 18: Gaps and Islands (consecutive sequences)
-- ═══════════════════════════════════════════════════════════
-- Find consecutive login streaks
WITH numbered AS (
SELECT emp_id, login_date,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY login_date) AS rn,
DATE(login_date) - INTERVAL '1 day' * ROW_NUMBER()
OVER (PARTITION BY emp_id ORDER BY login_date) AS grp
FROM logins
)
SELECT emp_id, grp, MIN(login_date) AS streak_start,
MAX(login_date) AS streak_end, COUNT(*) AS streak_length
FROM numbered
GROUP BY emp_id, grp
HAVING COUNT(*) >= 3
ORDER BY streak_length DESC;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 19: Median salary
-- ═══════════════════════════════════════════════════════════
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees WHERE salary IS NOT NULL;
-- ═══════════════════════════════════════════════════════════
-- PATTERN 20: Self-referencing hierarchy (salary > all juniors)
-- ═══════════════════════════════════════════════════════════
SELECT e.emp_name AS manager, e.salary AS mgr_salary,
COUNT(r.emp_id) AS reports_count
FROM employees e
LEFT JOIN employees r ON r.manager_id = e.emp_id
GROUP BY e.emp_id, e.emp_name, e.salary
HAVING COUNT(r.emp_id) > 0
ORDER BY reports_count DESC;ROW_NUMBER for no ties, RANK for ties with gaps, DENSE_RANK for ties without gaps.WHERE filters rows before aggregation (GROUP BY). It operates on individual rows and cannot reference aggregate functions like SUM() or COUNT(). HAVING filters groups after aggregation. It can reference aggregate functions. Execution order: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY. Always use WHERE when possible to reduce the number of rows before the expensive grouping operation.
UNION combines result sets and removes duplicates (does a sort + dedup, slower).UNION ALL combines result sets without removing duplicates (faster, no sort). Both require the same number of columns and compatible data types. Use UNION ALLwhen you know there are no duplicates or when you don't care about them — it's always faster. UNION is equivalent toUNION ALL + DISTINCT.
B-Tree (default): Supports equality, range, and prefix searches. Best for general-purpose use.Hash: Only supports equality (=). Faster for exact lookups but useless for ranges.Composite:Multiple columns in one index. Follow the leftmost prefix rule — order matters.Partial: Indexes only rows matching a WHERE clause. Smaller index, faster for filtered queries.Expression: Indexes a computed value (e.g., LOWER(email)). Create an index on frequently used expressions.Full-text (GIN): Optimized for text search in PostgreSQL.
The logical execution order is: (1) FROM / JOIN -> determine source tables and joins.(2) WHERE -> filter individual rows. (3) GROUP BY -> group remaining rows.(4) HAVING -> filter aggregated groups. (5) SELECT -> compute columns and aliases.(6) DISTINCT -> remove duplicates. (7) ORDER BY -> sort the result.(8) LIMIT / OFFSET -> paginate. Note: column aliases defined in SELECT are NOT available in WHERE or GROUP BY but ARE available in ORDER BY and HAVING (in some databases like PostgreSQL).
Step 1: Run EXPLAIN ANALYZE to identify bottlenecks (full table scans, nested loops, etc.).Step 2: Add appropriate indexes on WHERE, JOIN, and ORDER BY columns.Step 3: Replace SELECT * with only needed columns.Step 4: Replace subqueries with JOINs where possible.Step 5: Use EXISTS instead of IN for large subqueries.Step 6: Avoid functions on indexed columns (breaks the index).Step 7: Use LIMIT to reduce result sets.Step 8: Consider materialized views for heavy aggregations.Step 9: Analyze and vacuum tables to update statistics.
From weakest to strongest: (1) Read Uncommitted -> allows dirty reads (reading uncommitted data).(2) Read Committed -> prevents dirty reads but allows non-repeatable reads.(3) Repeatable Read -> prevents dirty + non-repeatable reads but allows phantom reads.(4) Serializable -> prevents all three (strongest, slowest). Most databases default to Read Committed (PostgreSQL, SQL Server) or Repeatable Read (MySQL/InnoDB). Higher isolation levels use more locking, reducing concurrency but preventing more anomalies. The ACID properties (Atomicity, Consistency, Isolation, Durability) guarantee reliable transactions.