AI Fundamentals, ML Algorithms, Deep Learning, LLMs, Prompt Engineering, AI Tools & Ethics — everything for the AI era.
Artificial Intelligence is the broad field of creating machines capable of intelligent behavior. Understanding the hierarchy of AI, ML, DL, and GenAI is foundational.
| Field | Definition | Scope | Examples |
|---|---|---|---|
| AI (Artificial Intelligence) | Machines simulating human intelligence — reasoning, learning, perception | Broadest — umbrella for all intelligent systems | Rule engines, expert systems, robotics |
| ML (Machine Learning) | Systems that learn from data without explicit programming | Subset of AI — data-driven learning | Spam filters, recommendation engines, fraud detection |
| DL (Deep Learning) | Neural networks with many layers learning hierarchical representations | Subset of ML — inspired by brain structure | Image recognition, speech synthesis, GPT |
| GenAI (Generative AI) | Models that create new content (text, images, code, audio, video) | Subset of DL — focused on generation | ChatGPT, Midjourney, Copilot, Sora |
| Type | Data | Goal | Key Algorithms | Use Cases |
|---|---|---|---|---|
| Supervised | Labeled data | Predict outcomes from labeled examples | Linear/Logistic Regression, Decision Trees, SVM, Random Forest, KNN | Classification (spam/not spam), Regression (price prediction) |
| Unsupervised | Unlabeled data | Discover hidden patterns and structure | K-Means, DBSCAN, PCA, Hierarchical Clustering, Autoencoders | Customer segmentation, anomaly detection, topic modeling |
| Reinforcement | Reward signals | Learn optimal actions through trial and reward | Q-Learning, PPO, DQN, A2C, SAC | Game playing (AlphaGo), robotics, recommendation systems |
| Self-Supervised | Unlabeled (creates labels) | Learn representations by predicting parts of input | BERT, GPT, SimCLR, BYOL | Language modeling, pretraining LLMs |
| Semi-Supervised | Mix of labeled + unlabeled | Improve learning with small labeled + large unlabeled data | Label propagation, consistency regularization | Medical imaging, NLP with limited labels |
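To make the first two rows concrete, here is a minimal scikit-learn sketch (the bundled iris dataset is used purely for illustration) that runs a supervised and an unsupervised algorithm on the same features:

```python
# Supervised vs. unsupervised on the same data: illustrative sketch only.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: learn a mapping from features to known labels
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised accuracy:", clf.score(X, y))

# Unsupervised: discover structure without ever seeing the labels
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Cluster sizes:", [int((km.labels_ == c).sum()) for c in range(3)])
```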
The AI tool landscape in 2025-2026 is dominated by large language model chatbots, image generators, coding assistants, and specialized AI platforms.
| Platform | Best Model | Context Window | Key Features | Pricing (Starts At) | Best For |
|---|---|---|---|---|---|
| ChatGPT (OpenAI) | GPT-4o / o3 | 128K tokens | Web browsing, Code Interpreter, vision, DALL-E, plugins, GPTs | Free / Plus $20/mo | General purpose, coding, analysis |
| Claude (Anthropic) | Claude 4 Sonnet/Opus | 200K tokens | Artifacts, projects, large file analysis, vision, extended thinking | Free / Pro $20/mo | Long documents, coding, careful analysis |
| Gemini (Google) | Gemini 2.5 Pro | 1M tokens | Google Search integration, Workspace apps, video/audio understanding | Free / Advanced $20/mo | Research, Google ecosystem, multimodal |
| Copilot (Microsoft) | GPT-4o + internal models | 128K tokens | Office integration, GitHub, Teams, Windows, Enterprise GraphRAG | Free / Pro $20/mo | Microsoft users, enterprise workflows |
| Tool | Developer | Style | Resolution | Key Features | Pricing |
|---|---|---|---|---|---|
| DALL-E 3 | OpenAI | Versatile, follows prompts | 1024x1024, 1792x1024 | ChatGPT integration, inpainting, edits | Included in ChatGPT Plus |
| Midjourney v6.1 | Midjourney | Artistic, photorealistic | Up to 2048x2048 | Style tuning, character reference, zoom, vary | $10-$60/mo subscription |
| Stable Diffusion 3.5 | Stability AI | Open, customizable | Up to 1MP | Open-source, local running, ControlNet, LoRA fine-tuning | Free (open-source) |
| Flux (Black Forest) | Black Forest Labs | Photorealistic, creative | Variable | Fast inference, high quality, open weights | Free tier / API pricing |
| Ideogram 2.0 | Ideogram | Text-in-image, design | Up to 2048x2048 | Best text rendering in images, style controls | Free / Pro plans |
| Adobe Firefly | Adobe | Commercially safe, design | Vector + raster | Photoshop integration, Content Credentials, training on licensed data | Included in Creative Cloud |
| Tool | Type | Models Used | Key Features | IDE Integration | Pricing |
|---|---|---|---|---|---|
| GitHub Copilot | Autocomplete + Chat | Claude 3.5 Sonnet, GPT-4o | Inline suggestions, Copilot Chat, PR summaries, Actions | VS Code, JetBrains, Neovim, Vim | $10/mo (Individual) |
| Cursor | AI-first IDE | Claude 3.5 Sonnet, GPT-4o | Codebase-aware edits, Composer, multi-file edits, tab completion | Built-in (fork of VS Code) | Free / Pro $20/mo |
| Tabnine | Autocomplete | Proprietary + open models | Privacy-first, on-premise option, whole-line completion | VS Code, JetBrains, all major IDEs | Free / Pro $12/mo |
| Windsurf (Codeium) | AI-first IDE | Multiple models | Cascade (multi-step agents), Flow state, inline edits | Built-in | Free / Pro $15/mo |
| Amazon Q Developer | Chat + Agent | Amazon models | Code transformation, security scans, legacy upgrades (Java) | VS Code, JetBrains, CLI | Free tier / Pro $19/mo |
| Tool | Category | Key Capability | Pricing | Best For |
|---|---|---|---|---|
| Sora (OpenAI) | Video generation | Text-to-video up to 60s, realistic physics | ChatGPT Pro $200/mo | Creative video content, ads |
| Runway Gen-3 Alpha | Video generation | Text/image-to-video, motion brush, camera controls | $12-$76/mo | Filmmakers, content creators |
| HeyGen | Avatar video | AI avatars, voice cloning, translation | $24-$180/mo | Corporate training, marketing videos |
| ElevenLabs | Voice / Audio | Voice cloning, text-to-speech, dubbing, sound effects | Free / Pro $5-$22/mo | Voiceovers, podcasts, audiobooks |
| Suno / Udio | Music generation | Full song generation from text prompts | Free / Pro $8-$30/mo | Music creation, content creators |
| Descript | Audio/Video editing | Edit media by editing transcript, AI voice, Eye contact | Free / Pro $24/mo | Podcasters, video editors |
| Whisper (OpenAI) | Speech-to-text | Open-source transcription, 99 languages | Free (open-source) | Transcription, accessibility |
| Kling AI | Video generation | Text/image-to-video, 1080p, 5-10s clips | Free credits / API | Social media, marketing |
| Platform | Focus | What You Can Build | Pricing | Best For |
|---|---|---|---|---|
| Make (Integromat) | AI automation | Automated workflows with AI steps (GPT, vision, classification) | Free / Pro $9-$16/mo | Business automation, data pipelines |
| Zapier AI | AI automation | AI-powered workflows, Chatbots, summarization | Free / Pro $20-$299/mo | Business users, no-code automation |
| Bubble + AI | Full-stack app builder | AI-powered web apps, connect any LLM API | Free / Pro $29-$119/mo | MVPs, SaaS products |
| Flowise | Visual LLM builder | LangChain-based visual chatbots, RAG pipelines | Free (open-source) | Developers wanting visual AI builder |
| Relevance AI | AI workforce | Build AI agents, tools, no-code workflows | Free / Pro plans | Teams building AI-powered tools |
| Glide + AI | Mobile apps | AI-powered mobile apps from spreadsheets | Free / Pro $25-$99/mo | Mobile-first internal tools |
| Hugging Face Spaces | ML demos | Host ML model demos, Gradio / Streamlit apps | Free / Pro $9/mo | ML researchers, demo hosting |
Prompt engineering is the art and science of crafting inputs that elicit the best possible outputs from AI models. It is one of the most valuable skills in the AI era.
Use the CREATE framework to structure prompts, spelling out the role, context, request, audience, tone, and output format, as in this example:
You are an expert Python developer with 10 years of experience in data engineering.
[CONTEXT] I am building a data pipeline that processes CSV files from an S3 bucket,
transforms the data, and loads it into a PostgreSQL database.
[REQUEST] Write a Python script using pandas and psycopg2 that:
1. Reads CSV files from a local directory
2. Validates the schema
3. Inserts valid rows into a PostgreSQL table
[AUDIENCE] This will be used by a junior developer on my team. Include error handling
and inline comments explaining each step.
[TONE] Professional but accessible. Include type hints.
[FORMAT] Return only the Python code with comments. No markdown code fences.
| Technique | Description | When to Use | Example |
|---|---|---|---|
| Zero-shot | Model uses its training knowledge with no examples | Simple, well-defined tasks | Translate this to French: Hello world |
| Few-shot | Provide 2-5 input-output examples in the prompt | Tasks needing specific format or style | big -> small; hot -> ? (model answers: cold) |
| Chain-of-Thought (CoT) | Ask model to think step-by-step before answering | Math, reasoning, multi-step problems | Solve step by step: If a train leaves at... |
| Tree-of-Thought (ToT) | Explore multiple reasoning paths, evaluate, choose best | Complex planning, strategy problems | Consider 3 approaches. Evaluate trade-offs... |
| Role Prompting | Assign a specific role or persona to the model | Domain-specific expertise needed | You are a senior security auditor... |
| Self-Consistency | Generate multiple CoT answers, take majority vote | Reducing errors in reasoning tasks | Solve this 5 different ways and find consensus |
| ReAct | Combine reasoning with acting (tool use, API calls) | Agents, multi-step tool use | Think about what tool you need, then use it. |
| Pattern | Template | Example |
|---|---|---|
| Direct Instruction | Do [task] as [role] in [format] | Summarize this article in 3 bullet points as a project manager |
| Format Specification | Output as [format]: JSON/markdown/table/list | Return results as JSON with keys: name, score, grade |
| Constraint Setting | Rules: no X, must include Y, max Z words | Write a poem. Rules: no rhyming, must mention the moon, under 50 words |
| Example-Based | Input -> Output pairs, then new input | Sentiment: "Great product!" -> Positive; "Terrible" -> Negative; "Meh" -> ? |
| Chain-of-Thought | Solve step by step. Show your work. | Calculate 15% tip on $87.43. Show step-by-step math. |
| Deconstruction | Break [topic] into [N] parts. Explain each. | Break "machine learning pipeline" into 5 steps. |
| Perspective Taking | Explain [topic] to [audience] using [analogy] | Explain neural networks to a 10-year-old using Lego bricks. |
| Iterative Refinement | Draft, then improve based on criteria | Write a tagline. Then rewrite it to be funnier. |
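As a sketch of how a few of these patterns combine in practice (few-shot examples plus a brief chain-of-thought instruction), the snippet below sends a classification prompt through the OpenAI Python client used later in this guide; the model name and example reviews are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Few-shot examples establish the format; the system prompt adds a CoT-style instruction.
few_shot_prompt = """Classify the sentiment of each review.
Review: "Great product, works perfectly!" -> Positive
Review: "Broke after two days." -> Negative
Review: "Does the job, nothing special." -> Neutral
Review: "Absolutely love the battery life." ->"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a careful sentiment classifier. Reason briefly, then answer with a single word."},
        {"role": "user", "content": few_shot_prompt},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```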
The core of machine learning consists of algorithms for supervised and unsupervised learning, evaluation metrics, feature engineering, and techniques to prevent overfitting.
| Algorithm | Type | How It Works | Pros | Cons | Best For |
|---|---|---|---|---|---|
| Linear Regression | Regression | Fits a line y = mx + b minimizing MSE | Simple, interpretable, fast | Assumes linearity, sensitive to outliers | Baseline regression, trend prediction |
| Logistic Regression | Classification | Sigmoid function for binary probability | Interpretable, probabilistic output | Linear decision boundary only | Binary classification, baseline models |
| Decision Tree | Both | Splits data on feature thresholds recursively | Interpretable, handles non-linearity | Prone to overfitting, unstable | Feature importance analysis, interpretable models |
| Random Forest | Both | Ensemble of decision trees with bagging | Robust, handles non-linearity, less overfitting | Less interpretable, slower | Tabular data, feature selection |
| SVM | Both | Finds maximum-margin hyperplane between classes | Effective in high dimensions, kernel trick | Slow on large data, hard to tune | Classification with clear margins, text |
| KNN | Both | Classifies by majority vote of k nearest neighbors | Simple, no training phase, non-parametric | Slow inference, curse of dimensionality | Small datasets, recommendation |
| Naive Bayes | Classification | Applies Bayes theorem with feature independence | Fast, works with small data, text classification | Strong independence assumption | Spam filtering, text classification |
| XGBoost | Both | Gradient boosted decision trees sequentially | State-of-the-art on tabular data, fast | Complex tuning, prone to overfitting | Kaggle competitions, tabular data |
| LightGBM | Both | Gradient boosting with leaf-wise growth | Fastest GBM, handles large data | Can overfit on small data | Large datasets, production ML |
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import pandas as pd
# Load data
df = pd.read_csv("customers.csv")
X = df.drop("churn", axis=1)
y = df["churn"]
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Train model
model = RandomForestClassifier(
n_estimators=200, max_depth=10, min_samples_split=5, random_state=42
)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
# Feature importance
for feat, imp in sorted(
    zip(X.columns, model.feature_importances_), key=lambda x: -x[1]
):
    print(f" {feat}: {imp:.4f}")
| Algorithm | Type | How It Works | Key Params | Best For |
|---|---|---|---|---|
| K-Means | Clustering | Partitions data into k clusters by minimizing within-cluster variance | k (clusters), init method, max_iter | Customer segmentation, image compression |
| DBSCAN | Clustering | Density-based: clusters are dense regions separated by sparse areas | eps (neighborhood radius), min_samples | Anomaly detection, non-spherical clusters |
| PCA | Dimensionality Reduction | Projects data onto principal components of maximum variance | n_components, explained_variance_ratio | Visualization, noise reduction, preprocessing |
| Hierarchical | Clustering | Builds a tree of clusters (agglomerative or divisive) | linkage (ward/complete/average), n_clusters | Taxonomy creation, small datasets |
| t-SNE | Visualization | Non-linear dimensionality reduction for 2D/3D visualization | perplexity, n_iter, learning_rate | Visualizing high-dimensional data |
| UMAP | Visualization / Reduction | Preserves both local and global structure, faster than t-SNE | n_neighbors, min_dist, n_components | Visualization, general-purpose dim reduction |
| Autoencoder | Representation Learning | Neural network that compresses then reconstructs data | latent_dim, layers, activation | Anomaly detection, denoising, feature learning |
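A hedged sketch of two rows from this table, K-Means for segmentation and PCA for a 2-D view, reusing the hypothetical customers.csv from the supervised example above (the numeric columns are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

df = pd.read_csv("customers.csv")            # hypothetical file, as in the earlier example
X = StandardScaler().fit_transform(df.select_dtypes("number"))

# K-Means: in practice, choose k with the elbow method or silhouette score
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(X)
print(df["segment"].value_counts())

# PCA: project to 2 components for plotting or noise reduction
pca = PCA(n_components=2)
coords = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```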
| Metric | Formula / Description | Type | When to Use |
|---|---|---|---|
| Accuracy | (TP + TN) / Total — overall correctness | Classification | Balanced classes (not for imbalanced data) |
| Precision | TP / (TP + FP) — of predicted positive, how many are correct | Classification | When false positives are costly (spam detection) |
| Recall | TP / (TP + FN) — of actual positive, how many were found | Classification | When false negatives are costly (disease detection) |
| F1 Score | 2 * (Precision * Recall) / (Precision + Recall) | Classification | Balance between precision and recall |
| AUC-ROC | Area under Receiver Operating Characteristic curve | Classification | Comparing models across all thresholds |
| Log Loss | -mean(y*log(p) + (1-y)*log(1-p)) | Classification | Probabilistic predictions, model calibration |
| RMSE | sqrt(mean((y_pred - y_actual)^2)) | Regression | Penalizes large errors, standard regression metric |
| MAE | mean(abs(y_pred - y_actual)) | Regression | Robust to outliers, easy to interpret |
| R-squared | 1 - SS_res / SS_tot | Regression | Explained variance, model comparison |
| BLEU | N-gram overlap between prediction and reference | NLP / Gen | Machine translation, text generation quality |
| ROUGE | Recall of n-grams from reference in prediction | NLP / Gen | Summarization evaluation |
| MMLU | Multi-subject accuracy across 57 academic topics | LLM Eval | General LLM knowledge benchmark |
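A quick sketch of how the classification and regression metrics above map onto scikit-learn calls (the arrays are made-up placeholders):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, mean_squared_error, mean_absolute_error, r2_score,
)

# Classification metrics (placeholder labels and probabilities)
y_true = np.array([1, 0, 1, 1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.6, 0.8, 0.4, 0.3])
y_pred = (y_prob >= 0.5).astype(int)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))

# Regression metrics (placeholder values)
y_actual = np.array([3.0, 5.0, 7.5])
y_hat = np.array([2.8, 5.5, 7.0])
print("RMSE:", mean_squared_error(y_actual, y_hat) ** 0.5)
print("MAE :", mean_absolute_error(y_actual, y_hat))
print("R2  :", r2_score(y_actual, y_hat))
```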
| Technique | Description | Example |
|---|---|---|
| Imputation | Fill missing values (mean, median, mode, KNN) | df.fillna(df.median()) |
| Encoding | Convert categorical to numerical (one-hot, label, target) | pd.get_dummies(df, columns=["color"]) |
| Scaling | Normalize feature ranges (standard, min-max, robust) | StandardScaler().fit_transform(X) |
| Binning | Convert continuous to discrete categories | pd.cut(df["age"], bins=[0,18,35,50,100]) |
| Polynomial Features | Create interaction and power features | X^2, X1*X2 from X1 and X2 |
| Log Transform | Reduce skewness of distributions | np.log1p(df["income"]) |
| Date Features | Extract components from datetime | day_of_week, is_weekend, quarter, hour |
| Text Features | TF-IDF, word count, sentiment, embeddings | TfidfVectorizer().fit_transform(texts) |
| Aggregation | Group-by statistics per category | mean/median/std per user_id |
| Target Encoding | Replace category with mean of target | Mean price per neighborhood |
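Several of these transforms in one hedged pandas sketch (the tiny DataFrame and its column names exist only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [22, 35, None, 58],
    "income": [30000, 85000, 52000, 120000],
    "color": ["red", "blue", "red", "green"],
    "signup": pd.to_datetime(["2024-01-05", "2024-03-17", "2024-06-30", "2024-11-02"]),
})

df["age"] = df["age"].fillna(df["age"].median())               # imputation
df["income_log"] = np.log1p(df["income"])                      # log transform
df["age_bin"] = pd.cut(df["age"], bins=[0, 18, 35, 50, 100])   # binning
df["day_of_week"] = df["signup"].dt.dayofweek                  # date features
df["is_weekend"] = df["day_of_week"] >= 5
df = pd.get_dummies(df, columns=["color"])                     # one-hot encoding
print(df.head())
```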
| Technique | Type | How It Works | When to Use |
|---|---|---|---|
| L1 (Lasso) | Regularization | Adds the sum of absolute weights to the loss; drives weights to zero | Feature selection, sparse models |
| L2 (Ridge) | Regularization | Adds sum of weights^2 to loss; shrinks all weights | Preventing large weights, multicollinearity |
| Elastic Net | Regularization | Combines L1 + L2 penalties (alpha, l1_ratio) | Balanced regularization, many features |
| Dropout | Regularization (DL) | Randomly zeros activations during training | Neural networks, deep learning |
| Early Stopping | Training | Stop training when validation loss starts increasing | Prevent over-training on training set |
| Data Augmentation | Data | Create variations of training data (flip, rotate, noise) | Image, text, audio tasks |
| Cross-Validation | Evaluation | K-fold splits to get robust performance estimates | Small datasets, model selection |
| Ensemble Methods | Model | Combine multiple models (bagging, boosting, stacking) | Almost always improves performance |
| Batch Normalization | Regularization (DL) | Normalize activations per mini-batch | Deep networks, faster convergence |
| Weight Decay | Regularization | L2 penalty applied per step during optimization | Most DL training runs as default |
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import cross_val_score
# L1 Regularization (feature selection)
lasso = Lasso(alpha=0.01)
print("Lasso R2:", cross_val_score(lasso, X, y, cv=5).mean())
# L2 Regularization (shrink weights)
ridge = Ridge(alpha=1.0)
print("Ridge R2:", cross_val_score(ridge, X, y, cv=5).mean())
# Elastic Net (L1 + L2)
elastic = ElasticNet(alpha=0.01, l1_ratio=0.5)
print("ElasticNet R2:", cross_val_score(elastic, X, y, cv=5).mean())Deep learning uses multi-layered neural networks to learn hierarchical representations of data. It powers modern computer vision, NLP, speech, and generative AI.
import torch
import torch.nn as nn
class SimpleClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        return self.net(x)
# Usage
model = SimpleClassifier(input_dim=784, hidden_dim=256, num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
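The model, loss, and optimizer defined above still need a training loop; a minimal sketch with synthetic data (shapes, batch size, and epoch count are illustrative) might look like this:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data standing in for a real dataset (e.g., flattened 28x28 images)
X = torch.randn(1024, 784)
y = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

for epoch in range(5):
    model.train()
    total_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)   # forward pass + loss
        loss.backward()                   # backpropagation
        optimizer.step()                  # weight update
        total_loss += loss.item()
    print(f"Epoch {epoch + 1}: avg loss = {total_loss / len(loader):.4f}")
```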
| Function | Formula | Range | Pros | Cons | Common Use |
|---|---|---|---|---|---|
| ReLU | max(0, x) | [0, +inf) | Fast, no vanishing gradient, simple | Dying ReLU (neurons stuck at 0) | Hidden layers (default choice) |
| Sigmoid | 1 / (1 + e^(-x)) | (0, 1) | Output as probability, smooth | Vanishing gradient, not zero-centered | Binary classification output |
| Tanh | (e^x - e^-x) / (e^x + e^-x) | (-1, 1) | Zero-centered, stronger gradients | Still has vanishing gradient | RNNs, hidden states |
| GELU | x * Phi(x) | (~-0.17, +inf) | Smooth, non-monotonic, used in Transformers | Slightly more expensive | Transformer models (GPT, BERT) |
| Swish | x * sigmoid(x) | (~-0.28, +inf) | Smooth, self-gated, often outperforms ReLU | More compute | Deep networks, modern architectures |
| Softmax | e^xi / sum(e^xj) | (0, 1) sum=1 | Probability distribution over classes | Not for multi-label; use sigmoid instead | Multi-class classification output |
| LeakyReLU | max(alpha*x, x) | (-inf, +inf) | No dying ReLU problem | Alpha is a hyperparameter | Variants when ReLU neurons die |
| Optimizer | Key Idea | Learning Rate | Pros | Cons | Best For |
|---|---|---|---|---|---|
| SGD | Gradient descent with momentum | Needs tuning (1e-2) | Simple, often best generalization | Slow convergence, sensitive to LR | Academic research, fine-tuning with patience |
| Adam | Adaptive LR per parameter (1st + 2nd moment) | 1e-3 to 1e-4 | Fast, adaptive, works well out of box | Can converge to sharp minima | Most DL tasks (default choice) |
| AdamW | Adam with decoupled weight decay | 1e-3 to 1e-4 | Better regularization than Adam | Same as Adam + extra hyperparam | Transformer training (LLMs) |
| RMSprop | Adaptive LR based on moving avg of squared grad | 1e-3 | Good for non-stationary objectives | Less popular than Adam | RNNs, older architectures |
| Adagrad | Adaptive LR per parameter, decreasing over time | 1e-2 | Good for sparse gradients | LR decays to near-zero | NLP with sparse features |
| Lion | Adaptive via sign of momentum | 1e-4 to 3e-4 | Less memory, faster than AdamW | Newer, less tested | Large-scale DL training |
| Architecture | Best For | Key Models | How It Works | Year Introduced |
|---|---|---|---|---|
| CNN | Images, spatial data | ResNet, EfficientNet, ConvNeXt, YOLO | Convolutional filters learn spatial hierarchies | 1989 (LeNet) / 2015 (ResNet) |
| RNN / LSTM | Sequences, time series | LSTM, GRU, Bidirectional | Hidden state processes one step at a time with gates | 1997 (LSTM) |
| Transformer | Everything (text, vision, audio) | GPT, BERT, ViT, Whisper, DALL-E | Self-attention computes all-pairs relationships in parallel | 2017 (Attention Is All You Need) |
| Diffusion | Image/audio/video generation | Stable Diffusion, DALL-E, Sora | Gradually denoise from pure noise to generate data | 2020 (DDPM) |
| GAN | Image generation, style transfer | StyleGAN, CycleGAN, Pix2Pix | Generator and discriminator compete in adversarial game | 2014 (GAN) |
| GNN | Graphs, molecules, social networks | GCN, GraphSAGE, GAT | Message passing between connected nodes | 2017 (GCN) |
| Mamba / SSM | Long sequences, efficient LLMs | Mamba, Mamba-2, Jamba | State space models with selective memory | 2023 (Mamba) |
| Framework | Creator | Language | Strengths | Used By | Year |
|---|---|---|---|---|---|
| PyTorch | Meta (Facebook) | Python, C++ | Dynamic graphs, intuitive, huge research community, torch.compile | Meta, OpenAI, most researchers | 2016 |
| TensorFlow / Keras | Google | Python, C++, JS | Production deployment, TPU support, tf.keras high-level API | Google, large enterprises | 2015 |
| JAX | Google | Python | Functional transforms, auto-vectorization (vmap), JIT compilation (jit), GPU/TPU | Google DeepMind, researchers | 2018 |
| Flax | Google | Python (JAX) | Linen API for neural nets on JAX | DeepMind researchers | 2020 |
| Transformers (HF) | Hugging Face | Python | 50K+ pretrained models, easy fine-tuning, datasets, tokenizers | Most ML practitioners | 2019 |
| TensorRT | NVIDIA | Python, C++ | Inference optimization, INT8 quantization, GPU acceleration | Production ML inference | 2017 |
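For the Hugging Face row, the pipeline API is the quickest way to try a pretrained model. A small sketch (the first call downloads default checkpoint weights, and gpt2 is just an example model):

```python
from transformers import pipeline

# Sentiment analysis with the default pretrained checkpoint
classifier = pipeline("sentiment-analysis")
print(classifier("Deep learning frameworks keep getting easier to use."))

# Text generation with a small example model
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers changed NLP because", max_new_tokens=20)[0]["generated_text"])
```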
Large Language Models are transformer-based models trained on massive text corpora, capable of generating text, code, and performing complex reasoning tasks.
| Model | Developer | Parameters | Context Window | Key Strength | License | Pricing |
|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | ~1.8T (MoE) | 128K | Best general purpose, multimodal, function calling | Proprietary | API: $2.50/1M input tokens |
| o3 / o4-mini | OpenAI | Unknown | 200K | Deep reasoning, math, coding, agentic tasks | Proprietary | o3: $2/1M input, o4-mini: cheaper |
| Claude 4 Opus | Anthropic | Unknown | 200K | Long context analysis, careful reasoning, safety | Proprietary | API: $15/1M input tokens |
| Claude 4 Sonnet | Anthropic | Unknown | 200K | Best speed/cost ratio, excellent coding | Proprietary | API: $3/1M input tokens |
| Gemini 2.5 Pro | Google | Unknown | 1M | Massive context, multimodal (video, audio), Google Search | Proprietary | API: $1.25-$10/1M tokens |
| Llama 3.1 405B | Meta | 405B | 128K | Largest open model, strong reasoning | Llama 3.1 License (free) | Self-hosted or cloud |
| Llama 4 Scout/Maverick | Meta | 109B-400B+ | 10M | Largest context, mixture-of-experts | Llama 4 License | Self-hosted or cloud |
| Mistral Large 2 | Mistral AI | 123B | 128K | Strong multilingual, function calling, coding | Mistral License | API: $2/1M input tokens |
| DeepSeek V3 | DeepSeek | 671B (MoE) | 128K | Open-weight, strong coding and math, MoE efficiency | MIT License | Self-hosted or API |
| Qwen 3 235B | Alibaba | 235B (MoE) | 128K | Multilingual, strong coding, thinking mode | Apache 2.0 | Self-hosted or API |
| Command R+ | Cohere | 104B | 128K | RAG-optimized, multilingual, enterprise RAG | CC-BY-NC | Enterprise API pricing |
| Model | Context Window | Approx. Words | Approx. Pages | Notes |
|---|---|---|---|---|
| GPT-4o | 128K tokens | ~96K words | ~320 pages | Standard for most tasks |
| Claude 4 | 200K tokens | ~150K words | ~500 pages | Can analyze full books, large codebases |
| Gemini 2.5 Pro | 1M tokens | ~750K words | ~2500 pages | Analyze entire books, hours of video |
| Llama 4 Maverick | 10M tokens | ~7.5M words | ~25000 pages | Largest context ever in production |
| GPT-4o mini | 128K tokens | ~96K words | ~320 pages | Cost-effective for long context |
| Gemini 2.5 Flash | 1M tokens | ~750K words | ~2500 pages | Fast, cheap, huge context |
| DeepSeek V3 | 128K tokens | ~96K words | ~320 pages | Open-weight option for long context |
| Approach | Description | Cost | Effort | When to Use |
|---|---|---|---|---|
| Prompt Engineering | Craft optimal instructions and examples; no model changes | Lowest | Low | General tasks, quick iteration, non-expert users |
| RAG (Retrieval Augmented) | Retrieve relevant documents, inject into prompt at inference time | Low-Medium | Medium | Up-to-date knowledge, domain-specific data, citeable answers |
| Fine-Tuning (LoRA/QLoRA) | Train lightweight adapters on your data; keeps base model frozen | Medium | Medium | Specific tone/style, domain jargon, consistent formatting |
| Full Fine-Tuning | Retrain all model parameters on your dataset | High | High | Entirely new capabilities, domain adaptation, research |
| Distillation | Train a smaller model to mimic a larger teacher model | Medium | High | Reduce deployment cost, edge deployment |
| Pre-Training | Train a model from scratch on massive data | Very High | Very High | New languages, entirely new domains, large orgs |
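As a hedged sketch of the LoRA row, Hugging Face's peft library wraps a base model with trainable low-rank adapters while the base weights stay frozen; the base model name and target_modules below are assumptions that depend on the architecture you actually fine-tune:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model_name = "meta-llama/Llama-3.1-8B"   # hypothetical choice; any causal LM works
model = AutoModelForCausalLM.from_pretrained(base_model_name)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections; depends on architecture
)

model = get_peft_model(model, lora_config)   # base weights stay frozen
model.print_trainable_parameters()           # typically well under 1% of parameters
```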
import tiktoken
# Tokenize text
enc = tiktoken.encoding_for_model("gpt-4o")
text = "Machine learning is transforming the world!"
tokens = enc.encode(text)
print(f"Tokens: {tokens}")
print(f"Count: {len(tokens)} tokens")
print(f"Decoded: {enc.decode(tokens)}")
# Count tokens for cost estimation
def estimate_cost(text: str, model: str = "gpt-4o") -> float:
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    # GPT-4o pricing: $2.50/1M input, $10/1M output
    input_cost = (n_tokens / 1_000_000) * 2.50
    return round(input_cost, 6)
print(f"Estimated input cost: ${estimate_cost(text)}")| Embedding Model | Dimensions | Max Input | Cost / 1M tokens | Best For |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | 8191 | $0.13 | High-quality retrieval, RAG |
| text-embedding-3-small | 1536 | 8191 | $0.02 | Cost-effective RAG, classification |
| Cohere Embed v3 | 1024 | 128K | $0.10 | Multilingual, RAG, reranking |
| Mistral Embed | 1024 | 32K | $0.10 | European languages, retrieval |
| BGE-large-en-v1.5 | 1024 | 512 | Free | Open-source, local deployment |
| Nomic Embed | 768 | 8192 | Free | Open-source, long context |
| Snowflake Arctic Embed | 384-1024 | 8192 | Free | Open-source, high quality |
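Retrieval with any of these models comes down to cosine similarity between vectors. A small sketch using the OpenAI embeddings endpoint from the table (the example sentences are illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("How do I reset my password?")
relevant = embed("Follow these steps to change your account password.")
unrelated = embed("The weather in Tokyo is mild in spring.")

print("Relevant doc :", cosine(query, relevant))    # expected: high similarity
print("Unrelated doc:", cosine(query, unrelated))   # expected: noticeably lower
```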
| Vector Database | Type | Max Scale | Key Features | Best For |
|---|---|---|---|---|
| Pinecone | Managed | Billions | Serverless, sparse-dense hybrid, metadata filtering | Production RAG, no-ops |
| Weaviate | Self-hosted / Cloud | Billions | Multi-modal, GraphQL/REST, built-in modules | Flexible deployment, hybrid search |
| Qdrant | Self-hosted / Cloud | Billions | Rust-based, fast, gRPC/REST, filtering | High-performance, on-premise |
| Chroma | Self-hosted | Millions | Lightweight, Python-native, perfect for dev | Prototyping, small projects |
| Milvus | Self-hosted | Tens of Billions | Distributed, GPU-accelerated, hybrid search | Enterprise-scale, multi-modal |
| pgvector | PostgreSQL extension | Millions | Runs in PostgreSQL, familiar SQL queries | Teams already using PostgreSQL |
| Elasticsearch 8+ | Self-hosted / Cloud | Billions | Sparse + dense vectors, kNN search, aggregations | Combined keyword + vector search |
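A minimal Chroma sketch showing the basic vector-database workflow (in-memory client with the default embedding function; the documents are placeholders):

```python
import chromadb

client = chromadb.Client()                 # in-memory; use PersistentClient(path=...) for disk
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Our refund policy allows returns within 30 days.",
        "Employees accrue 20 vacation days per year.",
        "The API rate limit is 100 requests per minute.",
    ],
)

results = collection.query(query_texts=["How many days off do I get?"], n_results=1)
print(results["documents"][0])             # expected: the vacation-days document
```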
As AI becomes more powerful and pervasive, understanding ethical considerations, safety practices, and regulatory frameworks is essential for responsible development and deployment.
| Concern | Description | Example | Mitigation |
|---|---|---|---|
| Bias & Fairness | Models reflect and amplify biases in training data | Hiring model favors male candidates due to historical data | Diverse training data, fairness metrics, bias audits, debiasing techniques |
| Transparency | Black-box models make it hard to explain decisions | Loan rejection without clear explanation | Explainable AI (XAI), SHAP values, LIME, decision logs |
| Privacy | Models can memorize and leak training data | LLM regurgitates PII or copyrighted text | Differential privacy, data anonymization, memorization checks |
| Hallucinations | Models generate plausible-sounding but false information | LLM cites non-existent legal cases or research papers | RAG with source grounding, fact-checking layers, confidence thresholds |
| Deepfakes | AI-generated media used for deception | Fake video of a CEO authorizing a wire transfer | Content provenance (C2PA), detection tools, watermarks |
| Misinformation | AI scales creation and spread of false content | Bot networks generating fake news articles | AI detection tools, platform moderation, media literacy |
| Job Displacement | Automation replacing human workers | Copywriters, customer service agents, entry-level programmers | Reskilling programs, UBI discussions, human-AI collaboration |
| Safety Alignment | Models may pursue goals misaligned with human values | Model generates harmful instructions when asked cleverly | RLHF, constitutional AI, red-teaming, safety benchmarks |
| Type | Description | Example | Mitigation Strategy |
|---|---|---|---|
| Factual Hallucination | Confidently states incorrect facts | Invents a book title and author | RAG with verified sources, fact-checking pipeline |
| Reference Hallucination | Cites non-existent sources or links | Fabricates URL or academic paper | Verify all citations against source corpus |
| Arithmetic Error | Wrong calculations presented confidently | Says 17 * 23 = 401 (actual: 391) | Use code interpreter tools, external calculator |
| Logical Error | Flawed reasoning chain leads to wrong conclusion | Correct math with wrong interpretation | Chain-of-thought with verification, step checking |
| Temporal Confusion | Mixes up dates, timelines, events | Claims an event happened in the wrong year | Provide date context in prompt, verify with search |
| Regulation | Region | Status | Key Provisions | Impact on Developers |
|---|---|---|---|---|
| EU AI Act | European Union | Enacted (2024), phased rollout 2025-2027 | Risk-based tiers: Unacceptable, High, Limited, Minimal. High-risk AI needs conformity assessment. | Must classify AI systems, implement risk management, ensure transparency for AI-generated content. |
| US Executive Orders | United States | EO 14110 (Oct 2023), evolving | Safety testing for frontier models, AI watermarking standards, NIST AI RMF | Voluntary commitments for frontier model developers, sector-specific guidance. |
| UK AI Safety | United Kingdom | Pro-innovation approach (2023) | Sector-specific regulation, AI Safety Institute, no single AI law | Existing regulators (FCA, Ofcom) adapt to oversee AI in their domains. |
| China AI Regulations | China | Multiple enacted (2023-2025) | Deep synthesis rules, generative AI measures, algorithmic recommendations | Content moderation required, algorithmic transparency, real-name verification. |
| Canada AIDA | Canada | Proposed (C-27, under revision) | Responsible AI development, high-impact AI systems oversight | Impact assessment for high-impact systems, explainability requirements. |
| Brazil AI Bill | Brazil | In progress (PL 2338/2023) | Risk-based approach inspired by EU AI Act | Rights-based framework, risk classification, mandatory impact assessments. |
| Principle | Description | How to Implement |
|---|---|---|
| Fairness | AI should not discriminate or create unfair outcomes | Bias testing across demographic groups, fairness metrics (disparate impact, equal opportunity) |
| Accountability | Humans should be responsible for AI decisions | Audit trails, model cards, clear ownership, incident response plans |
| Transparency | AI decisions should be explainable | XAI tools (SHAP), model documentation, user-facing explanations |
| Privacy & Security | Protect user data throughout the AI lifecycle | Encryption, access controls, data minimization, regular security audits |
| Safety & Robustness | AI should behave safely and handle edge cases | Red-teaming, adversarial testing, failure mode analysis, graceful degradation |
| Human Oversight | Meaningful human control over AI systems | Human-in-the-loop for high-stakes decisions, override mechanisms, appeal processes |
| Sustainability | AI should consider environmental impact | Efficient model architectures, carbon-aware training, use small models when possible |
# Example: Basic content safety check before processing
import re
def check_input_safety(prompt: str) -> tuple[bool, str]:
"""Basic input validation for AI applications."""
blocked_patterns = [
r"(ignore|bypass|override)s+(previous|safety|system)",
r"(pretend|act|roleplay)s+(as|like)s+(admin|god|no one)",
r"(jailbreak|DAN|hacked)",
]
for pattern in blocked_patterns:
if re.search(pattern, prompt, re.IGNORECASE):
return False, "Input flagged by safety filter."
# Length check
if len(prompt) > 50000:
return False, "Input exceeds maximum allowed length."
return True, "Input passed safety checks."
# RAG answer grounding - require citations
def ground_answer(answer: str, sources: list[str]) -> str:
"""Ensure answer is grounded in provided sources."""
return (
f"{answer}
"
f"Sources used: {', '.join(sources[:3])}
"
f"Note: This answer is based on the provided documents "
f"and may not reflect the most current information."
)Integrating AI into your applications is now a core developer skill. This section covers APIs, frameworks, patterns, and cost optimization for building AI-powered products.
| API | Endpoint | Use Case | Model | Pricing (per 1M tokens) |
|---|---|---|---|---|
| Chat Completions | POST /v1/chat/completions | Conversational AI, tasks, agents | gpt-4o, gpt-4o-mini, o3, o4-mini | $0.15-$2.50 input, $0.60-$10 output |
| Embeddings | POST /v1/embeddings | Text embeddings for search, RAG | text-embedding-3-large/small | $0.02-$0.13 |
| Images | POST /v1/images/generations | Generate images from text | DALL-E 3, gpt-image-1 | $0.04-$0.08/image |
| Audio (TTS) | POST /v1/audio/speech | Text-to-speech | tts-1, tts-1-hd | $15/1M characters |
| Audio (STT) | POST /v1/audio/transcriptions | Speech-to-text | whisper-1 | $0.006/min |
| Moderation | POST /v1/moderations | Content safety filtering | omni-moderation-latest | Free (included) |
| Assistants | POST /v1/assistants | Stateful agents with tools | gpt-4o, gpt-4o-mini | Model pricing + $0.02/assistant/day |
| Batch API | POST /v1/chat/completions (batch file) | 50% cheaper async processing | gpt-4o, gpt-4o-mini | 50% discount on all models |
from openai import OpenAI
import json
client = OpenAI() # uses OPENAI_API_KEY env var
# ── 1. Chat Completion ──────────────────────────────
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain async/await in Python."},
],
temperature=0.7,
max_tokens=500,
)
print(response.choices[0].message.content)
# ── 2. Streaming Response ───────────────────────────
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Write a haiku about code."}],
stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# ── 3. Function Calling (Tool Use) ─────────────────
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name",
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
},
},
"required": ["city"],
},
},
}
]
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Weather in Tokyo?"}],
tools=tools,
tool_choice="auto",
)
# Check if model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Call: {tool_call.function.name}({args})")
# ── 4. Embeddings for RAG ──────────────────────────
embedding = client.embeddings.create(
model="text-embedding-3-small",
input="Machine learning is a subset of AI.",
)
print(f"Dimensions: {len(embedding.data[0].embedding)}")
print(f"First 5: {embedding.data[0].embedding[:5]}")| Framework | Focus | Key Features | Language | Best For |
|---|---|---|---|---|
| LangChain | LLM Orchestration | Chains, agents, tools, memory, RAG pipelines, 700+ integrations | Python / TypeScript | Complex LLM apps, RAG, agents |
| LlamaIndex | RAG-Focused | Data connectors, indexing, query engines, advanced RAG patterns | Python | RAG applications, document Q&A |
| Semantic Kernel | Enterprise AI (Microsoft) | Planners, plugins, connectors, .NET/Python, Azure integration | Python / C# | Microsoft enterprise, .NET shops |
| CrewAI | Multi-Agent Systems | Role-based agents, tasks, crews, collaboration patterns | Python | Multi-agent workflows, team AI |
| AutoGen (Microsoft) | Multi-Agent Conversations | Agent-to-agent chat, human-in-the-loop, code execution | Python | Research, complex multi-step tasks |
| Haystack (deepset) | NLP Pipelines | Document store, retriever, reader, generator pipelines | Python | Production NLP, search, Q&A |
| Vercel AI SDK | Full-Stack AI UI | Streaming UI, Edge runtime, React/Vue/Svelte helpers | TypeScript | Next.js AI features, chat UIs |
| Dify | Visual AI Builder | Visual workflow builder, RAG, agents, model management | Python (self-hosted) | Teams wanting no-code AI app builder |
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
# 1. Load and split documents
loader = PyPDFLoader("company-handbook.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(docs)
# 2. Create vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db",
)
# 3. Create RAG chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
template = """Answer the question based on these context documents.
If unsure, say "I don't have enough information."
Context:
{context}
Question: {question}
Provide a clear, concise answer with source references."""
prompt = ChatPromptTemplate.from_template(template)
def format_docs(docs):
    return "\n\n".join(
        f"[Doc {i+1}] {d.page_content}" for i, d in enumerate(docs)
    )
chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
)
# 4. Query
answer = chain.invoke("What is the vacation policy?")
print(answer.content)
| Feature | Complexity | API Needed | Impact | Implementation Tip |
|---|---|---|---|---|
| Semantic Search | Medium | Embeddings + Vector DB | High | Replace keyword search with embedding similarity for 10x better relevance |
| Chatbot / Assistant | Medium | Chat Completions | High | Start with FAQ RAG, add tools/function calling for actions |
| Content Generation | Low | Chat Completions | Medium | Blog drafts, product descriptions, email templates, summaries |
| Code Review Bot | Medium | Chat Completions + Git API | Medium | Analyze PR diffs, suggest improvements, catch bugs |
| Document Q&A | Medium | Embeddings + RAG | High | Upload docs, ask questions, get cited answers |
| Sentiment Analysis | Low | Embeddings / Classification | Medium | Analyze reviews, support tickets, social media mentions |
| Image Generation | Low | DALL-E / Stable Diffusion | Medium | Product mockups, social media graphics, design iterations |
| Text Summarization | Low | Chat Completions | High | Meeting notes, article summaries, document digests |
| Translation | Low | Chat Completions / GPT-4o | Medium | Multilingual support, localize content, real-time translation |
| Data Extraction | Medium | Chat Completions + Structured Output | High | Extract structured data from invoices, forms, emails (JSON mode) |
| Voice Interface | High | Whisper + TTS + Chat | High | Voice assistants, accessibility, hands-free interaction |
| Recommendation Engine | High | Embeddings / Collaborative Filtering | High | Personalized content, products, or features based on user behavior |
| Strategy | Savings | Description | When to Use |
|---|---|---|---|
| Use Smaller Models | 60-90% | GPT-4o-mini costs 90% less than GPT-4o. Use larger models only when needed. | Drafting, classification, simple tasks, bulk processing |
| Batch API | 50% | Submit async batch jobs for non-urgent work. 50% discount on all models. | Bulk embedding generation, processing large document sets |
| Caching | 80-99% | Cache identical queries and responses. Avoid redundant API calls. | FAQ bots, repeated user queries, leaderboards |
| Prompt Caching | Up to 50% | OpenAI caches repeated system prompts. Keep system prompt static. | Long system prompts, RAG with stable context |
| Token Optimization | 20-40% | Shorten prompts, compress context, use fewer tokens. Every token costs money. | Production systems processing many requests |
| Quantization (Local) | Hardware savings | Use 4-bit or 8-bit quantized models for self-hosted inference. | Self-hosting Llama, Mistral, or other open models |
| Semantic Caching | 70-90% | Cache semantically similar queries, not just identical ones. | Customer support, FAQ, any Q&A system |
| Rate Limiting | Variable | Prevent runaway costs from bugs or abuse. Set daily/monthly budgets. | All production AI applications |
import hashlib
import json
# ── 1. Simple Response Cache ───────────────────────
_cache: dict[str, str] = {}
def cached_completion(prompt: str, model: str = "gpt-4o-mini") -> str:
"""Cache responses to avoid duplicate API calls."""
key = hashlib.md5(f"{model}:{prompt}".encode()).hexdigest()
if key in _cache:
print("Cache hit!")
return _cache[key]
# Actual API call
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=0,
)
result = response.choices[0].message.content
_cache[key] = result
return result
# ── 2. Model Routing by Complexity ─────────────────
def route_model(prompt: str, system_prompt: str) -> str:
"""Use cheap model for simple tasks, expensive for complex."""
# Simple heuristic: use mini for short prompts
is_simple = len(prompt.split()) < 50
from openai import OpenAI
client = OpenAI()
model = "gpt-4o-mini" if is_simple else "gpt-4o"
response = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt},
],
temperature=0,
)
return response.choices[0].message.content
# ── 3. Structured Output (JSON Mode) ──────────────
response_schema = {
"type": "json_schema",
"json_schema": {
"name": "product_review",
"strict": True,
"schema": {
"type": "object",
"properties": {
"sentiment": {
"type": "string",
"enum": ["positive", "negative", "neutral"],
},
"rating": {"type": "number"},
"summary": {"type": "string"},
},
"required": ["sentiment", "rating", "summary"],
"additionalProperties": False,
},
},
}
# Use with response_format parameter in API call
# response = client.chat.completions.create(
# model="gpt-4o-mini",
# messages=[...],
# response_format=response_schema,
# )
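The cost table above also lists semantic caching; a hedged sketch of that idea (the similarity threshold and embedding model are illustrative assumptions) could look like this:

```python
# Sketch: semantic caching reuses a cached answer when a new query is similar enough.
import numpy as np
from openai import OpenAI

client = OpenAI()
_semantic_cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

def _embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def semantic_cached_completion(prompt: str, threshold: float = 0.9) -> str:
    vec = _embed(prompt)
    for cached_vec, cached_answer in _semantic_cache:
        sim = float(np.dot(vec, cached_vec) / (np.linalg.norm(vec) * np.linalg.norm(cached_vec)))
        if sim >= threshold:
            return cached_answer                     # close enough: skip the LLM call
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content
    _semantic_cache.append((vec, answer))
    return answer
```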
| Stage | Duration | Focus Areas | Resources |
|---|---|---|---|
| Foundation | 2-4 weeks | Python, NumPy, Pandas, basic statistics, linear algebra | Khan Academy (math), Python official docs, Kaggle Learn |
| ML Basics | 4-6 weeks | Scikit-learn, supervised/unsupervised learning, evaluation metrics | Andrew Ng Machine Learning course, Hands-On ML book |
| Deep Learning | 4-8 weeks | PyTorch, neural networks, CNNs, RNNs, training techniques | fast.ai, Andrej Karpathy YouTube, Deep Learning book |
| NLP & LLMs | 4-6 weeks | Transformers, Hugging Face, embeddings, RAG, fine-tuning | Hugging Face NLP course, LangChain docs, OpenAI cookbook |
| AI Engineering | 4-6 weeks | APIs, LangChain, vector DBs, production deployment, cost optimization | OpenAI API docs, Vercel AI SDK, production ML blogs |
| Specialization | Ongoing | Agents, multimodal AI, fine-tuning, MLOps, AI safety | Papers, conferences, community, build projects |