AI Fundamentals, ML Algorithms, Deep Learning, LLMs, Prompt Engineering, AI Tools & Ethics — everything for the AI era.
Artificial Intelligence is the broad field of creating machines capable of intelligent behavior. Understanding the hierarchy of AI, ML, DL, and GenAI is foundational.
| Field | Definition | Scope | Examples |
|---|---|---|---|
| AI (Artificial Intelligence) | Machines simulating human intelligence — reasoning, learning, perception | Broadest — umbrella for all intelligent systems | Rule engines, expert systems, robotics |
| ML (Machine Learning) | Systems that learn from data without explicit programming | Subset of AI — data-driven learning | Spam filters, recommendation engines, fraud detection |
| DL (Deep Learning) | Neural networks with many layers learning hierarchical representations | Subset of ML — inspired by brain structure | Image recognition, speech synthesis, GPT |
| GenAI (Generative AI) | Models that create new content (text, images, code, audio, video) | Subset of DL — focused on generation | ChatGPT, Midjourney, Copilot, Sora |
| Type | Data | Goal | Key Algorithms | Use Cases |
|---|---|---|---|---|
| Supervised | Labeled data | Predict outcomes from labeled examples | Linear/Logistic Regression, Decision Trees, SVM, Random Forest, KNN | Classification (spam/not spam), Regression (price prediction) |
| Unsupervised | Unlabeled data | Discover hidden patterns and structure | K-Means, DBSCAN, PCA, Hierarchical Clustering, Autoencoders | Customer segmentation, anomaly detection, topic modeling |
| Reinforcement | Reward signals | Learn optimal actions through trial and reward | Q-Learning, PPO, DQN, A2C, SAC | Game playing (AlphaGo), robotics, recommendation systems |
| Self-Supervised | Unlabeled (creates labels) | Learn representations by predicting parts of input | BERT, GPT, SimCLR, BYOL | Language modeling, pretraining LLMs |
| Semi-Supervised | Mix of labeled + unlabeled | Improve learning with small labeled + large unlabeled data | Label propagation, consistency regularization | Medical imaging, NLP with limited labels |
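To make the first two rows concrete, here is a minimal scikit-learn sketch (the bundled iris dataset is used purely for illustration) that runs a supervised and an unsupervised algorithm on the same features:

```python
# Supervised vs. unsupervised on the same data: illustrative sketch only.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: learn a mapping from features to known labels
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised accuracy:", clf.score(X, y))

# Unsupervised: discover structure without ever seeing the labels
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Cluster sizes:", [int((km.labels_ == c).sum()) for c in range(3)])
```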
The AI tool landscape in 2025-2026 is dominated by large language model chatbots, image generators, coding assistants, and specialized AI platforms.
| Platform | Best Model | Context Window | Key Features | Pricing (Starts At) | Best For |
|---|---|---|---|---|---|
| ChatGPT (OpenAI) | GPT-4o / o3 | 128K tokens | Web browsing, Code Interpreter, vision, DALL-E, plugins, GPTs | Free / Plus $20/mo | General purpose, coding, analysis |
| Claude (Anthropic) | Claude 4 Sonnet/Opus | 200K tokens | Artifacts, projects, large file analysis, vision, extended thinking | Free / Pro $20/mo | Long documents, coding, careful analysis |
| Gemini (Google) | Gemini 2.5 Pro | 1M tokens | Google Search integration, Workspace apps, video/audio understanding | Free / Advanced $20/mo | Research, Google ecosystem, multimodal |
| Copilot (Microsoft) | GPT-4o + internal models | 128K tokens | Office integration, GitHub, Teams, Windows, Enterprise GraphRAG | Free / Pro $20/mo | Microsoft users, enterprise workflows |
| Tool | Developer | Style | Resolution | Key Features | Pricing |
|---|---|---|---|---|---|
| DALL-E 3 | OpenAI | Versatile, follows prompts | 1024x1024, 1792x1024 | ChatGPT integration, inpainting, edits | Included in ChatGPT Plus |
| Midjourney v6.1 | Midjourney | Artistic, photorealistic | Up to 2048x2048 | Style tuning, character reference, zoom, vary | $10-$60/mo subscription |
| Stable Diffusion 3.5 | Stability AI | Open, customizable | Up to 1MP | Open-source, local running, ControlNet, LoRA fine-tuning | Free (open-source) |
| Flux (Black Forest) | Black Forest Labs | Photorealistic, creative | Variable | Fast inference, high quality, open weights | Free tier / API pricing |
| Ideogram 2.0 | Ideogram | Text-in-image, design | Up to 2048x2048 | Best text rendering in images, style controls | Free / Pro plans |
| Adobe Firefly | Adobe | Commercially safe, design | Vector + raster | Photoshop integration, Content Credentials, training on licensed data | Included in Creative Cloud |
| Tool | Type | Models Used | Key Features | IDE Integration | Pricing |
|---|---|---|---|---|---|
| GitHub Copilot | Autocomplete + Chat | Claude 3.5 Sonnet, GPT-4o | Inline suggestions, Copilot Chat, PR summaries, Actions | VS Code, JetBrains, Neovim, Vim | $10/mo (Individual) |
| Cursor | AI-first IDE | Claude 3.5 Sonnet, GPT-4o | Codebase-aware edits, Composer, multi-file edits, tab completion | Built-in (fork of VS Code) | Free / Pro $20/mo |
| Tabnine | Autocomplete | Proprietary + open models | Privacy-first, on-premise option, whole-line completion | VS Code, JetBrains, all major IDEs | Free / Pro $12/mo |
| Windsurf (Codeium) | AI-first IDE | Multiple models | Cascade (multi-step agents), Flow state, inline edits | Built-in | Free / Pro $15/mo |
| Amazon Q Developer | Chat + Agent | Amazon models | Code transformation, security scans, legacy upgrades (Java) | VS Code, JetBrains, CLI | Free tier / Pro $19/mo |
| Tool | Category | Key Capability | Pricing | Best For |
|---|---|---|---|---|
| Sora (OpenAI) | Video generation | Text-to-video up to 60s, realistic physics | ChatGPT Pro $200/mo | Creative video content, ads |
| Runway Gen-3 Alpha | Video generation | Text/image-to-video, motion brush, camera controls | $12-$76/mo | Filmmakers, content creators |
| HeyGen | Avatar video | AI avatars, voice cloning, translation | $24-$180/mo | Corporate training, marketing videos |
| ElevenLabs | Voice / Audio | Voice cloning, text-to-speech, dubbing, sound effects | Free / Pro $5-$22/mo | Voiceovers, podcasts, audiobooks |
| Suno / Udio | Music generation | Full song generation from text prompts | Free / Pro $8-$30/mo | Music creation, content creators |
| Descript | Audio/Video editing | Edit media by editing transcript, AI voice, Eye contact | Free / Pro $24/mo | Podcasters, video editors |
| Whisper (OpenAI) | Speech-to-text | Open-source transcription, 99 languages | Free (open-source) | Transcription, accessibility |
| Kling AI | Video generation | Text/image-to-video, 1080p, 5-10s clips | Free credits / API | Social media, marketing |
| Platform | Focus | What You Can Build | Pricing | Best For |
|---|---|---|---|---|
| Make (Integromat) | AI automation | Automated workflows with AI steps (GPT, vision, classification) | Free / Pro $9-$16/mo | Business automation, data pipelines |
| Zapier AI | AI automation | AI-powered workflows, Chatbots, summarization | Free / Pro $20-$299/mo | Business users, no-code automation |
| Bubble + AI | Full-stack app builder | AI-powered web apps, connect any LLM API | Free / Pro $29-$119/mo | MVPs, SaaS products |
| Flowise | Visual LLM builder | LangChain-based visual chatbots, RAG pipelines | Free (open-source) | Developers wanting visual AI builder |
| Relevance AI | AI workforce | Build AI agents, tools, no-code workflows | Free / Pro plans | Teams building AI-powered tools |
| Glide + AI | Mobile apps | AI-powered mobile apps from spreadsheets | Free / Pro $25-$99/mo | Mobile-first internal tools |
| Hugging Face Spaces | ML demos | Host ML model demos, Gradio / Streamlit apps | Free / Pro $9/mo | ML researchers, demo hosting |
Prompt engineering is the art and science of crafting inputs that elicit the best possible outputs from AI models. It is one of the most valuable skills in the AI era.
Use the CREATE framework to structure prompts, spelling out the role, context, request, audience, tone, and output format, as in this example:
You are an expert Python developer with 10 years of experience in data engineering.
[CONTEXT] I am building a data pipeline that processes CSV files from an S3 bucket,
transforms the data, and loads it into a PostgreSQL database.
[REQUEST] Write a Python script using pandas and psycopg2 that:
1. Reads CSV files from a local directory
2. Validates the schema
3. Inserts valid rows into a PostgreSQL table
[AUDIENCE] This will be used by a junior developer on my team. Include error handling
and inline comments explaining each step.
[TONE] Professional but accessible. Include type hints.
[FORMAT] Return only the Python code with comments. No markdown code fences.
| Technique | Description | When to Use | Example |
|---|---|---|---|
| Zero-shot | Model uses its training knowledge with no examples | Simple, well-defined tasks | Translate this to French: Hello world |
| Few-shot | Provide 2-5 input-output examples in the prompt | Tasks needing specific format or style | big -> small; hot -> ? (model answers: cold) |
| Chain-of-Thought (CoT) | Ask model to think step-by-step before answering | Math, reasoning, multi-step problems | Solve step by step: If a train leaves at... |
| Tree-of-Thought (ToT) | Explore multiple reasoning paths, evaluate, choose best | Complex planning, strategy problems | Consider 3 approaches. Evaluate trade-offs... |
| Role Prompting | Assign a specific role or persona to the model | Domain-specific expertise needed | You are a senior security auditor... |
| Self-Consistency | Generate multiple CoT answers, take majority vote | Reducing errors in reasoning tasks | Solve this 5 different ways and find consensus |
| ReAct | Combine reasoning with acting (tool use, API calls) | Agents, multi-step tool use | Think about what tool you need, then use it. |
| Pattern | Template | Example |
|---|---|---|
| Direct Instruction | Do [task] as [role] in [format] | Summarize this article in 3 bullet points as a project manager |
| Format Specification | Output as [format]: JSON/markdown/table/list | Return results as JSON with keys: name, score, grade |
| Constraint Setting | Rules: no X, must include Y, max Z words | Write a poem. Rules: no rhyming, must mention the moon, under 50 words |
| Example-Based | Input -> Output pairs, then new input | Sentiment: "Great product!" -> Positive; "Terrible" -> Negative; "Meh" -> ? |
| Chain-of-Thought | Solve step by step. Show your work. | Calculate 15% tip on $87.43. Show step-by-step math. |
| Deconstruction | Break [topic] into [N] parts. Explain each. | Break "machine learning pipeline" into 5 steps. |
| Perspective Taking | Explain [topic] to [audience] using [analogy] | Explain neural networks to a 10-year-old using Lego bricks. |
| Iterative Refinement | Draft, then improve based on criteria | Write a tagline. Then rewrite it to be funnier. |
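As a sketch of how a few of these patterns combine in practice (few-shot examples plus a brief chain-of-thought instruction), the snippet below sends a classification prompt through the OpenAI Python client used later in this guide; the model name and example reviews are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Few-shot examples establish the format; the system prompt adds a CoT-style instruction.
few_shot_prompt = """Classify the sentiment of each review.
Review: "Great product, works perfectly!" -> Positive
Review: "Broke after two days." -> Negative
Review: "Does the job, nothing special." -> Neutral
Review: "Absolutely love the battery life." ->"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a careful sentiment classifier. Reason briefly, then answer with a single word."},
        {"role": "user", "content": few_shot_prompt},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```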
The core of machine learning consists of algorithms for supervised and unsupervised learning, evaluation metrics, feature engineering, and techniques to prevent overfitting.
| Algorithm | Type | How It Works | Pros | Cons | Best For |
|---|---|---|---|---|---|
| Linear Regression | Regression | Fits a line y = mx + b minimizing MSE | Simple, interpretable, fast | Assumes linearity, sensitive to outliers | Baseline regression, trend prediction |
| Logistic Regression | Classification | Sigmoid function for binary probability | Interpretable, probabilistic output | Linear decision boundary only | Binary classification, baseline models |
| Decision Tree | Both | Splits data on feature thresholds recursively | Interpretable, handles non-linearity | Prone to overfitting, unstable | Feature importance analysis, interpretable models |
| Random Forest | Both | Ensemble of decision trees with bagging | Robust, handles non-linearity, less overfitting | Less interpretable, slower | Tabular data, feature selection |
| SVM | Both | Finds maximum-margin hyperplane between classes | Effective in high dimensions, kernel trick | Slow on large data, hard to tune | Classification with clear margins, text |
| KNN | Both | Classifies by majority vote of k nearest neighbors | Simple, no training phase, non-parametric | Slow inference, curse of dimensionality | Small datasets, recommendation |
| Naive Bayes | Classification | Applies Bayes theorem with feature independence | Fast, works with small data, text classification | Strong independence assumption | Spam filtering, text classification |
| XGBoost | Both | Gradient boosted decision trees sequentially | State-of-the-art on tabular data, fast | Complex tuning, prone to overfitting | Kaggle competitions, tabular data |
| LightGBM | Both | Gradient boosting with leaf-wise growth | Fastest GBM, handles large data | Can overfit on small data | Large datasets, production ML |
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import pandas as pd
# Load data
df = pd.read_csv("customers.csv")
X = df.drop("churn", axis=1)
y = df["churn"]
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Train model
model = RandomForestClassifier(
n_estimators=200, max_depth=10, min_samples_split=5, random_state=42
)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
# Feature importance
for feat, imp in sorted(
    zip(X.columns, model.feature_importances_), key=lambda x: -x[1]
):
    print(f" {feat}: {imp:.4f}")
| Algorithm | Type | How It Works | Key Params | Best For |
|---|---|---|---|---|
| K-Means | Clustering | Partitions data into k clusters by minimizing within-cluster variance | k (clusters), init method, max_iter | Customer segmentation, image compression |
| DBSCAN | Clustering | Density-based: clusters are dense regions separated by sparse areas | eps (neighborhood radius), min_samples | Anomaly detection, non-spherical clusters |
| PCA | Dimensionality Reduction | Projects data onto principal components of maximum variance | n_components, explained_variance_ratio | Visualization, noise reduction, preprocessing |
| Hierarchical | Clustering | Builds a tree of clusters (agglomerative or divisive) | linkage (ward/complete/average), n_clusters | Taxonomy creation, small datasets |
| t-SNE | Visualization | Non-linear dimensionality reduction for 2D/3D visualization | perplexity, n_iter, learning_rate | Visualizing high-dimensional data |
| UMAP | Visualization / Reduction | Preserves both local and global structure, faster than t-SNE | n_neighbors, min_dist, n_components | Visualization, general-purpose dim reduction |
| Autoencoder | Representation Learning | Neural network that compresses then reconstructs data | latent_dim, layers, activation | Anomaly detection, denoising, feature learning |
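A hedged sketch of two rows from this table, K-Means for segmentation and PCA for a 2-D view, reusing the hypothetical customers.csv from the supervised example above (the numeric columns are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

df = pd.read_csv("customers.csv")            # hypothetical file, as in the earlier example
X = StandardScaler().fit_transform(df.select_dtypes("number"))

# K-Means: in practice, choose k with the elbow method or silhouette score
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(X)
print(df["segment"].value_counts())

# PCA: project to 2 components for plotting or noise reduction
pca = PCA(n_components=2)
coords = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```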
| Metric | Formula / Description | Type | When to Use |
|---|---|---|---|
| Accuracy | (TP + TN) / Total — overall correctness | Classification | Balanced classes (not for imbalanced data) |
| Precision | TP / (TP + FP) — of predicted positive, how many are correct | Classification | When false positives are costly (spam detection) |
| Recall | TP / (TP + FN) — of actual positive, how many were found | Classification | When false negatives are costly (disease detection) |
| F1 Score | 2 * (Precision * Recall) / (Precision + Recall) | Classification | Balance between precision and recall |
| AUC-ROC | Area under Receiver Operating Characteristic curve | Classification | Comparing models across all thresholds |
| Log Loss | -mean(y*log(p) + (1-y)*log(1-p)) | Classification | Probabilistic predictions, model calibration |
| RMSE | sqrt(mean((y_pred - y_actual)^2)) | Regression | Penalizes large errors, standard regression metric |
| MAE | mean(abs(y_pred - y_actual)) | Regression | Robust to outliers, easy to interpret |
| R-squared | 1 - SS_res / SS_tot | Regression | Explained variance, model comparison |
| BLEU | N-gram overlap between prediction and reference | NLP / Gen | Machine translation, text generation quality |
| ROUGE | Recall of n-grams from reference in prediction | NLP / Gen | Summarization evaluation |
| MMLU | Multi-subject accuracy across 57 academic topics | LLM Eval | General LLM knowledge benchmark |
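A quick sketch of how the classification and regression metrics above map onto scikit-learn calls (the arrays are made-up placeholders):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, mean_squared_error, mean_absolute_error, r2_score,
)

# Classification metrics (placeholder labels and probabilities)
y_true = np.array([1, 0, 1, 1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.6, 0.8, 0.4, 0.3])
y_pred = (y_prob >= 0.5).astype(int)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))

# Regression metrics (placeholder values)
y_actual = np.array([3.0, 5.0, 7.5])
y_hat = np.array([2.8, 5.5, 7.0])
print("RMSE:", mean_squared_error(y_actual, y_hat) ** 0.5)
print("MAE :", mean_absolute_error(y_actual, y_hat))
print("R2  :", r2_score(y_actual, y_hat))
```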
| Technique | Description | Example |
|---|---|---|
| Imputation | Fill missing values (mean, median, mode, KNN) | df.fillna(df.median()) |
| Encoding | Convert categorical to numerical (one-hot, label, target) | pd.get_dummies(df, columns=["color"]) |
| Scaling | Normalize feature ranges (standard, min-max, robust) | StandardScaler().fit_transform(X) |
| Binning | Convert continuous to discrete categories | pd.cut(df["age"], bins=[0,18,35,50,100]) |
| Polynomial Features | Create interaction and power features | X^2, X1*X2 from X1 and X2 |
| Log Transform | Reduce skewness of distributions | np.log1p(df["income"]) |
| Date Features | Extract components from datetime | day_of_week, is_weekend, quarter, hour |
| Text Features | TF-IDF, word count, sentiment, embeddings | TfidfVectorizer().fit_transform(texts) |
| Aggregation | Group-by statistics per category | mean/median/std per user_id |
| Target Encoding | Replace category with mean of target | Mean price per neighborhood |
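Several of these transforms in one hedged pandas sketch (the tiny DataFrame and its column names exist only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [22, 35, None, 58],
    "income": [30000, 85000, 52000, 120000],
    "color": ["red", "blue", "red", "green"],
    "signup": pd.to_datetime(["2024-01-05", "2024-03-17", "2024-06-30", "2024-11-02"]),
})

df["age"] = df["age"].fillna(df["age"].median())               # imputation
df["income_log"] = np.log1p(df["income"])                      # log transform
df["age_bin"] = pd.cut(df["age"], bins=[0, 18, 35, 50, 100])   # binning
df["day_of_week"] = df["signup"].dt.dayofweek                  # date features
df["is_weekend"] = df["day_of_week"] >= 5
df = pd.get_dummies(df, columns=["color"])                     # one-hot encoding
print(df.head())
```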
| Technique | Type | How It Works | When to Use |
|---|---|---|---|
| L1 (Lasso) | Regularization | Adds the sum of absolute weights to the loss; drives weights to zero | Feature selection, sparse models |
| L2 (Ridge) | Regularization | Adds sum of weights^2 to loss; shrinks all weights | Preventing large weights, multicollinearity |
| Elastic Net | Regularization | Combines L1 + L2 penalties (alpha, l1_ratio) | Balanced regularization, many features |
| Dropout | Regularization (DL) | Randomly zeros activations during training | Neural networks, deep learning |
| Early Stopping | Training | Stop training when validation loss starts increasing | Prevent over-training on training set |
| Data Augmentation | Data | Create variations of training data (flip, rotate, noise) | Image, text, audio tasks |
| Cross-Validation | Evaluation | K-fold splits to get robust performance estimates | Small datasets, model selection |
| Ensemble Methods | Model | Combine multiple models (bagging, boosting, stacking) | Almost always improves performance |
| Batch Normalization | Regularization (DL) | Normalize activations per mini-batch | Deep networks, faster convergence |
| Weight Decay | Regularization | L2 penalty applied per step during optimization | Most DL training runs as default |
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import cross_val_score
# L1 Regularization (feature selection)
lasso = Lasso(alpha=0.01)
print("Lasso R2:", cross_val_score(lasso, X, y, cv=5).mean())
# L2 Regularization (shrink weights)
ridge = Ridge(alpha=1.0)
print("Ridge R2:", cross_val_score(ridge, X, y, cv=5).mean())
# Elastic Net (L1 + L2)
elastic = ElasticNet(alpha=0.01, l1_ratio=0.5)
print("ElasticNet R2:", cross_val_score(elastic, X, y, cv=5).mean())Deep learning uses multi-layered neural networks to learn hierarchical representations of data. It powers modern computer vision, NLP, speech, and generative AI.
import torch
import torch.nn as nn
class SimpleClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        return self.net(x)
# Usage
model = SimpleClassifier(input_dim=784, hidden_dim=256, num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
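The model, loss, and optimizer defined above still need a training loop; a minimal sketch with synthetic data (shapes, batch size, and epoch count are illustrative) might look like this:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data standing in for a real dataset (e.g., flattened 28x28 images)
X = torch.randn(1024, 784)
y = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

for epoch in range(5):
    model.train()
    total_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)   # forward pass + loss
        loss.backward()                   # backpropagation
        optimizer.step()                  # weight update
        total_loss += loss.item()
    print(f"Epoch {epoch + 1}: avg loss = {total_loss / len(loader):.4f}")
```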
| Function | Formula | Range | Pros | Cons | Common Use |
|---|---|---|---|---|---|
| ReLU | max(0, x) | [0, +inf) | Fast, no vanishing gradient, simple | Dying ReLU (neurons stuck at 0) | Hidden layers (default choice) |
| Sigmoid | 1 / (1 + e^(-x)) | (0, 1) | Output as probability, smooth | Vanishing gradient, not zero-centered | Binary classification output |
| Tanh | (e^x - e^-x) / (e^x + e^-x) | (-1, 1) | Zero-centered, stronger gradients | Still has vanishing gradient | RNNs, hidden states |
| GELU | x * Phi(x) | (~-0.17, +inf) | Smooth, non-monotonic, used in Transformers | Slightly more expensive | Transformer models (GPT, BERT) |
| Swish | x * sigmoid(x) | (~-0.28, +inf) | Smooth, self-gated, often outperforms ReLU | More compute | Deep networks, modern architectures |
| Softmax | e^xi / sum(e^xj) | (0, 1) sum=1 | Probability distribution over classes | Not for multi-label; use sigmoid instead | Multi-class classification output |
| LeakyReLU | max(alpha*x, x) | (-inf, +inf) | No dying ReLU problem | Alpha is a hyperparameter | Variants when ReLU neurons die |
| Optimizer | Key Idea | Learning Rate | Pros | Cons | Best For |
|---|---|---|---|---|---|
| SGD | Gradient descent with momentum | Needs tuning (1e-2) | Simple, often best generalization | Slow convergence, sensitive to LR | Academic research, fine-tuning with patience |
| Adam | Adaptive LR per parameter (1st + 2nd moment) | 1e-3 to 1e-4 | Fast, adaptive, works well out of box | Can converge to sharp minima | Most DL tasks (default choice) |
| AdamW | Adam with decoupled weight decay | 1e-3 to 1e-4 | Better regularization than Adam | Same as Adam + extra hyperparam | Transformer training (LLMs) |
| RMSprop | Adaptive LR based on moving avg of squared grad | 1e-3 | Good for non-stationary objectives | Less popular than Adam | RNNs, older architectures |
| Adagrad | Adaptive LR per parameter, decreasing over time | 1e-2 | Good for sparse gradients | LR decays to near-zero | NLP with sparse features |
| Lion | Adaptive via sign of momentum | 1e-4 to 3e-4 | Less memory, faster than AdamW | Newer, less tested | Large-scale DL training |
| Architecture | Best For | Key Models | How It Works | Year Introduced |
|---|---|---|---|---|
| CNN | Images, spatial data | ResNet, EfficientNet, ConvNeXt, YOLO | Convolutional filters learn spatial hierarchies | 1989 (LeNet) / 2015 (ResNet) |
| RNN / LSTM | Sequences, time series | LSTM, GRU, Bidirectional | Hidden state processes one step at a time with gates | 1997 (LSTM) |
| Transformer | Everything (text, vision, audio) | GPT, BERT, ViT, Whisper, DALL-E | Self-attention computes all-pairs relationships in parallel | 2017 (Attention Is All You Need) |
| Diffusion | Image/audio/video generation | Stable Diffusion, DALL-E, Sora | Gradually denoise from pure noise to generate data | 2020 (DDPM) |
| GAN | Image generation, style transfer | StyleGAN, CycleGAN, Pix2Pix | Generator and discriminator compete in adversarial game | 2014 (GAN) |
| GNN | Graphs, molecules, social networks | GCN, GraphSAGE, GAT | Message passing between connected nodes | 2017 (GCN) |
| Mamba / SSM | Long sequences, efficient LLMs | Mamba, Mamba-2, Jamba | State space models with selective memory | 2023 (Mamba) |
| Framework | Creator | Language | Strengths | Used By | Year |
|---|---|---|---|---|---|
| PyTorch | Meta (Facebook) | Python, C++ | Dynamic graphs, intuitive, huge research community, torch.compile | Meta, OpenAI, most researchers | 2016 |
| TensorFlow / Keras | Google | Python, C++, JS | Production deployment, TPU support, tf.keras high-level API | Google, large enterprises | 2015 |
| JAX | Google | Python | Functional transforms, auto-vectorization (vmap), JIT compilation (jit), GPU/TPU | Google DeepMind, researchers | 2018 |
| Flax | Google | Python (JAX) | Linen API for neural nets on JAX | DeepMind researchers | 2020 |
| Transformers (HF) | Hugging Face | Python | 50K+ pretrained models, easy fine-tuning, datasets, tokenizers | Most ML practitioners | 2019 |
| TensorRT | NVIDIA | Python, C++ | Inference optimization, INT8 quantization, GPU acceleration | Production ML inference | 2017 |
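For the Hugging Face row, the pipeline API is the quickest way to try a pretrained model. A small sketch (the first call downloads default checkpoint weights, and gpt2 is just an example model):

```python
from transformers import pipeline

# Sentiment analysis with the default pretrained checkpoint
classifier = pipeline("sentiment-analysis")
print(classifier("Deep learning frameworks keep getting easier to use."))

# Text generation with a small example model
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers changed NLP because", max_new_tokens=20)[0]["generated_text"])
```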
Large Language Models are transformer-based models trained on massive text corpora, capable of generating text, code, and performing complex reasoning tasks.
| Model | Developer | Parameters | Context Window | Key Strength | License | Pricing |
|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | ~1.8T (MoE) | 128K | Best general purpose, multimodal, function calling | Proprietary | API: $2.50/1M input tokens |
| o3 / o4-mini | OpenAI | Unknown | 200K | Deep reasoning, math, coding, agentic tasks | Proprietary | o3: $2/1M input, o4-mini: cheaper |
| Claude 4 Opus | Anthropic | Unknown | 200K | Long context analysis, careful reasoning, safety | Proprietary | API: $15/1M input tokens |
| Claude 4 Sonnet | Anthropic | Unknown | 200K | Best speed/cost ratio, excellent coding | Proprietary | API: $3/1M input tokens |
| Gemini 2.5 Pro | Google | Unknown | 1M | Massive context, multimodal (video, audio), Google Search | Proprietary | API: $1.25-$10/1M tokens |
| Llama 3.1 405B | Meta | 405B | 128K | Largest open model, strong reasoning | Llama 3.1 License (free) | Self-hosted or cloud |
| Llama 4 Scout/Maverick | Meta | 109B-400B+ | 10M | Largest context, mixture-of-experts | Llama 4 License | Self-hosted or cloud |
| Mistral Large 2 | Mistral AI | 123B | 128K | Strong multilingual, function calling, coding | Mistral License | API: $2/1M input tokens |
| DeepSeek V3 | DeepSeek | 671B (MoE) | 128K | Open-weight, strong coding and math, MoE efficiency | MIT License | Self-hosted or API |
| Qwen 3 235B | Alibaba | 235B (MoE) | 128K | Multilingual, strong coding, thinking mode | Apache 2.0 | Self-hosted or API |
| Command R+ | Cohere | 104B | 128K | RAG-optimized, multilingual, enterprise RAG | CC-BY-NC | Enterprise API pricing |
| Model | Context Window | Approx. Words | Approx. Pages | Notes |
|---|---|---|---|---|
| GPT-4o | 128K tokens | ~96K words | ~320 pages | Standard for most tasks |
| Claude 4 | 200K tokens | ~150K words | ~500 pages | Can analyze full books, large codebases |
| Gemini 2.5 Pro | 1M tokens | ~750K words | ~2500 pages | Analyze entire books, hours of video |
| Llama 4 Maverick | 10M tokens | ~7.5M words | ~25000 pages | Largest context ever in production |
| GPT-4o mini | 128K tokens | ~96K words | ~320 pages | Cost-effective for long context |
| Gemini 2.5 Flash | 1M tokens | ~750K words | ~2500 pages | Fast, cheap, huge context |
| DeepSeek V3 | 128K tokens | ~96K words | ~320 pages | Open-weight option for long context |
| Approach | Description | Cost | Effort | When to Use |
|---|---|---|---|---|
| Prompt Engineering | Craft optimal instructions and examples; no model changes | Lowest | Low | General tasks, quick iteration, non-expert users |
| RAG (Retrieval Augmented) | Retrieve relevant documents, inject into prompt at inference time | Low-Medium | Medium | Up-to-date knowledge, domain-specific data, citeable answers |
| Fine-Tuning (LoRA/QLoRA) | Train lightweight adapters on your data; keeps base model frozen | Medium | Medium | Specific tone/style, domain jargon, consistent formatting |
| Full Fine-Tuning | Retrain all model parameters on your dataset | High | High | Entirely new capabilities, domain adaptation, research |
| Distillation | Train a smaller model to mimic a larger teacher model | Medium | High | Reduce deployment cost, edge deployment |
| Pre-Training | Train a model from scratch on massive data | Very High | Very High | New languages, entirely new domains, large orgs |
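As a hedged sketch of the LoRA row, Hugging Face's peft library wraps a base model with trainable low-rank adapters while the base weights stay frozen; the base model name and target_modules below are assumptions that depend on the architecture you actually fine-tune:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model_name = "meta-llama/Llama-3.1-8B"   # hypothetical choice; any causal LM works
model = AutoModelForCausalLM.from_pretrained(base_model_name)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections; depends on architecture
)

model = get_peft_model(model, lora_config)   # base weights stay frozen
model.print_trainable_parameters()           # typically well under 1% of parameters
```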
import tiktoken
# Tokenize text
enc = tiktoken.encoding_for_model("gpt-4o")
text = "Machine learning is transforming the world!"
tokens = enc.encode(text)
print(f"Tokens: {tokens}")
print(f"Count: {len(tokens)} tokens")
print(f"Decoded: {enc.decode(tokens)}")
# Count tokens for cost estimation
def estimate_cost(text: str, model: str = "gpt-4o") -> float:
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    # GPT-4o pricing: $2.50/1M input, $10/1M output
    input_cost = (n_tokens / 1_000_000) * 2.50
    return round(input_cost, 6)
print(f"Estimated input cost: ${estimate_cost(text)}")| Embedding Model | Dimensions | Max Input | Cost / 1M tokens | Best For |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | 8191 | $0.13 | High-quality retrieval, RAG |
| text-embedding-3-small | 1536 | 8191 | $0.02 | Cost-effective RAG, classification |
| Cohere Embed v3 | 1024 | 128K | $0.10 | Multilingual, RAG, reranking |
| Mistral Embed | 1024 | 32K | $0.10 | European languages, retrieval |
| BGE-large-en-v1.5 | 1024 | 512 | Free | Open-source, local deployment |
| Nomic Embed | 768 | 8192 | Free | Open-source, long context |
| Snowflake Arctic Embed | 384-1024 | 8192 | Free | Open-source, high quality |
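Retrieval with any of these models comes down to cosine similarity between vectors. A small sketch using the OpenAI embeddings endpoint from the table (the example sentences are illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("How do I reset my password?")
relevant = embed("Follow these steps to change your account password.")
unrelated = embed("The weather in Tokyo is mild in spring.")

print("Relevant doc :", cosine(query, relevant))    # expected: high similarity
print("Unrelated doc:", cosine(query, unrelated))   # expected: noticeably lower
```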
| Vector Database | Type | Max Scale | Key Features | Best For |
|---|---|---|---|---|
| Pinecone | Managed | Billions | Serverless, sparse-dense hybrid, metadata filtering | Production RAG, no-ops |
| Weaviate | Self-hosted / Cloud | Billions | Multi-modal, GraphQL/REST, built-in modules | Flexible deployment, hybrid search |
| Qdrant | Self-hosted / Cloud | Billions | Rust-based, fast, gRPC/REST, filtering | High-performance, on-premise |
| Chroma | Self-hosted | Millions | Lightweight, Python-native, perfect for dev | Prototyping, small projects |
| Milvus | Self-hosted | Tens of Billions | Distributed, GPU-accelerated, hybrid search | Enterprise-scale, multi-modal |
| pgvector | PostgreSQL extension | Millions | Runs in PostgreSQL, familiar SQL queries | Teams already using PostgreSQL |
| Elasticsearch 8+ | Self-hosted / Cloud | Billions | Sparse + dense vectors, kNN search, aggregations | Combined keyword + vector search |
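A minimal Chroma sketch showing the basic vector-database workflow (in-memory client with the default embedding function; the documents are placeholders):

```python
import chromadb

client = chromadb.Client()                 # in-memory; use PersistentClient(path=...) for disk
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Our refund policy allows returns within 30 days.",
        "Employees accrue 20 vacation days per year.",
        "The API rate limit is 100 requests per minute.",
    ],
)

results = collection.query(query_texts=["How many days off do I get?"], n_results=1)
print(results["documents"][0])             # expected: the vacation-days document
```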
As AI becomes more powerful and pervasive, understanding ethical considerations, safety practices, and regulatory frameworks is essential for responsible development and deployment.
| Concern | Description | Example | Mitigation |
|---|---|---|---|
| Bias & Fairness | Models reflect and amplify biases in training data | Hiring model favors male candidates due to historical data | Diverse training data, fairness metrics, bias audits, debiasing techniques |
| Transparency | Black-box models make it hard to explain decisions | Loan rejection without clear explanation | Explainable AI (XAI), SHAP values, LIME, decision logs |
| Privacy | Models can memorize and leak training data | LLM regurgitates PII or copyrighted text | Differential privacy, data anonymization, memorization checks |
| Hallucinations | Models generate plausible-sounding but false information | LLM cites non-existent legal cases or research papers | RAG with source grounding, fact-checking layers, confidence thresholds |
| Deepfakes | AI-generated media used for deception | Fake video of a CEO authorizing a wire transfer | Content provenance (C2PA), detection tools, watermarks |
| Misinformation | AI scales creation and spread of false content | Bot networks generating fake news articles | AI detection tools, platform moderation, media literacy |
| Job Displacement | Automation replacing human workers | Copywriters, customer service agents, entry-level programmers | Reskilling programs, UBI discussions, human-AI collaboration |
| Safety Alignment | Models may pursue goals misaligned with human values | Model generates harmful instructions when asked cleverly | RLHF, constitutional AI, red-teaming, safety benchmarks |
| Type | Description | Example | Mitigation Strategy |
|---|---|---|---|
| Factual Hallucination | Confidently states incorrect facts | Invents a book title and author | RAG with verified sources, fact-checking pipeline |
| Reference Hallucination | Cites non-existent sources or links | Fabricates URL or academic paper | Verify all citations against source corpus |
| Arithmetic Error | Wrong calculations presented confidently | Says 17 * 23 = 401 (actual: 391) | Use code interpreter tools, external calculator |
| Logical Error | Flawed reasoning chain leads to wrong conclusion | Correct math with wrong interpretation | Chain-of-thought with verification, step checking |
| Temporal Confusion | Mixes up dates, timelines, events | Claims an event happened in the wrong year | Provide date context in prompt, verify with search |
| Regulation | Region | Status | Key Provisions | Impact on Developers |
|---|---|---|---|---|
| EU AI Act | European Union | Enacted (2024), phased rollout 2025-2027 | Risk-based tiers: Unacceptable, High, Limited, Minimal. High-risk AI needs conformity assessment. | Must classify AI systems, implement risk management, ensure transparency for AI-generated content. |
| US Executive Orders | United States | EO 14110 (Oct 2023), evolving | Safety testing for frontier models, AI watermarking standards, NIST AI RMF | Voluntary commitments for frontier model developers, sector-specific guidance. |
| UK AI Safety | United Kingdom | Pro-innovation approach (2023) | Sector-specific regulation, AI Safety Institute, no single AI law | Existing regulators (FCA, Ofcom) adapt to oversee AI in their domains. |
| China AI Regulations | China | Multiple enacted (2023-2025) | Deep synthesis rules, generative AI measures, algorithmic recommendations | Content moderation required, algorithmic transparency, real-name verification. |
| Canada AIDA | Canada | Proposed (C-27, under revision) | Responsible AI development, high-impact AI systems oversight | Impact assessment for high-impact systems, explainability requirements. |
| Brazil AI Bill | Brazil | In progress (PL 2338/2023) | Risk-based approach inspired by EU AI Act | Rights-based framework, risk classification, mandatory impact assessments. |
| Principle | Description | How to Implement |
|---|---|---|
| Fairness | AI should not discriminate or create unfair outcomes | Bias testing across demographic groups, fairness metrics (disparate impact, equal opportunity) |
| Accountability | Humans should be responsible for AI decisions | Audit trails, model cards, clear ownership, incident response plans |
| Transparency | AI decisions should be explainable | XAI tools (SHAP), model documentation, user-facing explanations |
| Privacy & Security | Protect user data throughout the AI lifecycle | Encryption, access controls, data minimization, regular security audits |
| Safety & Robustness | AI should behave safely and handle edge cases | Red-teaming, adversarial testing, failure mode analysis, graceful degradation |
| Human Oversight | Meaningful human control over AI systems | Human-in-the-loop for high-stakes decisions, override mechanisms, appeal processes |
| Sustainability | AI should consider environmental impact | Efficient model architectures, carbon-aware training, use small models when possible |
# Example: Basic content safety check before processing
import re
def check_input_safety(prompt: str) -> tuple[bool, str]:
"""Basic input validation for AI applications."""
blocked_patterns = [
r"(ignore|bypass|override)s+(previous|safety|system)",
r"(pretend|act|roleplay)s+(as|like)s+(admin|god|no one)",
r"(jailbreak|DAN|hacked)",
]
for pattern in blocked_patterns:
if re.search(pattern, prompt, re.IGNORECASE):
return False, "Input flagged by safety filter."
# Length check
if len(prompt) > 50000:
return False, "Input exceeds maximum allowed length."
return True, "Input passed safety checks."
# RAG answer grounding - require citations
def ground_answer(answer: str, sources: list[str]) -> str:
"""Ensure answer is grounded in provided sources."""
return (
f"{answer}
"
f"Sources used: {', '.join(sources[:3])}
"
f"Note: This answer is based on the provided documents "
f"and may not reflect the most current information."
)Integrating AI into your applications is now a core developer skill. This section covers APIs, frameworks, patterns, and cost optimization for building AI-powered products.
| API | Endpoint | Use Case | Model | Pricing (per 1M tokens) |
|---|---|---|---|---|
| Chat Completions | POST /v1/chat/completions | Conversational AI, tasks, agents | gpt-4o, gpt-4o-mini, o3, o4-mini | $0.15-$2.50 input, $0.60-$10 output |
| Embeddings | POST /v1/embeddings | Text embeddings for search, RAG | text-embedding-3-large/small | $0.02-$0.13 |
| Images | POST /v1/images/generations | Generate images from text | DALL-E 3, gpt-image-1 | $0.04-$0.08/image |
| Audio (TTS) | POST /v1/audio/speech | Text-to-speech | tts-1, tts-1-hd | $15/1M characters |
| Audio (STT) | POST /v1/audio/transcriptions | Speech-to-text | whisper-1 | $0.006/min |
| Moderation | POST /v1/moderations | Content safety filtering | omni-moderation-latest | Free (included) |
| Assistants | POST /v1/assistants | Stateful agents with tools | gpt-4o, gpt-4o-mini | Model pricing + $0.02/assistant/day |
| Batch API | POST /v1/chat/completions (batch file) | 50% cheaper async processing | gpt-4o, gpt-4o-mini | 50% discount on all models |
from openai import OpenAI
import json
client = OpenAI() # uses OPENAI_API_KEY env var
# ── 1. Chat Completion ──────────────────────────────
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain async/await in Python."},
],
temperature=0.7,
max_tokens=500,
)
print(response.choices[0].message.content)
# ── 2. Streaming Response ───────────────────────────
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Write a haiku about code."}],
stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# ── 3. Function Calling (Tool Use) ─────────────────
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name",
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
},
},
"required": ["city"],
},
},
}
]
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Weather in Tokyo?"}],
tools=tools,
tool_choice="auto",
)
# Check if model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Call: {tool_call.function.name}({args})")
# ── 4. Embeddings for RAG ──────────────────────────
embedding = client.embeddings.create(
model="text-embedding-3-small",
input="Machine learning is a subset of AI.",
)
print(f"Dimensions: {len(embedding.data[0].embedding)}")
print(f"First 5: {embedding.data[0].embedding[:5]}")| Framework | Focus | Key Features | Language | Best For |
|---|---|---|---|---|
| LangChain | LLM Orchestration | Chains, agents, tools, memory, RAG pipelines, 700+ integrations | Python / TypeScript | Complex LLM apps, RAG, agents |
| LlamaIndex | RAG-Focused | Data connectors, indexing, query engines, advanced RAG patterns | Python | RAG applications, document Q&A |
| Semantic Kernel | Enterprise AI (Microsoft) | Planners, plugins, connectors, .NET/Python, Azure integration | Python / C# | Microsoft enterprise, .NET shops |
| CrewAI | Multi-Agent Systems | Role-based agents, tasks, crews, collaboration patterns | Python | Multi-agent workflows, team AI |
| AutoGen (Microsoft) | Multi-Agent Conversations | Agent-to-agent chat, human-in-the-loop, code execution | Python | Research, complex multi-step tasks |
| Haystack (deepset) | NLP Pipelines | Document store, retriever, reader, generator pipelines | Python | Production NLP, search, Q&A |
| Vercel AI SDK | Full-Stack AI UI | Streaming UI, Edge runtime, React/Vue/Svelte helpers | TypeScript | Next.js AI features, chat UIs |
| Dify | Visual AI Builder | Visual workflow builder, RAG, agents, model management | Python (self-hosted) | Teams wanting no-code AI app builder |
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
# 1. Load and split documents
loader = PyPDFLoader("company-handbook.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(docs)
# 2. Create vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db",
)
# 3. Create RAG chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
template = """Answer the question based on these context documents.
If unsure, say "I don't have enough information."
Context:
{context}
Question: {question}
Provide a clear, concise answer with source references."""
prompt = ChatPromptTemplate.from_template(template)
def format_docs(docs):
    return "\n\n".join(
        f"[Doc {i+1}] {d.page_content}" for i, d in enumerate(docs)
    )
chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
)
# 4. Query
answer = chain.invoke("What is the vacation policy?")
print(answer.content)
| Feature | Complexity | API Needed | Impact | Implementation Tip |
|---|---|---|---|---|
| Semantic Search | Medium | Embeddings + Vector DB | High | Replace keyword search with embedding similarity for 10x better relevance |
| Chatbot / Assistant | Medium | Chat Completions | High | Start with FAQ RAG, add tools/function calling for actions |
| Content Generation | Low | Chat Completions | Medium | Blog drafts, product descriptions, email templates, summaries |
| Code Review Bot | Medium | Chat Completions + Git API | Medium | Analyze PR diffs, suggest improvements, catch bugs |
| Document Q&A | Medium | Embeddings + RAG | High | Upload docs, ask questions, get cited answers |
| Sentiment Analysis | Low | Embeddings / Classification | Medium | Analyze reviews, support tickets, social media mentions |
| Image Generation | Low | DALL-E / Stable Diffusion | Medium | Product mockups, social media graphics, design iterations |
| Text Summarization | Low | Chat Completions | High | Meeting notes, article summaries, document digests |
| Translation | Low | Chat Completions / GPT-4o | Medium | Multilingual support, localize content, real-time translation |
| Data Extraction | Medium | Chat Completions + Structured Output | High | Extract structured data from invoices, forms, emails (JSON mode) |
| Voice Interface | High | Whisper + TTS + Chat | High | Voice assistants, accessibility, hands-free interaction |
| Recommendation Engine | High | Embeddings / Collaborative Filtering | High | Personalized content, products, or features based on user behavior |
| Strategy | Savings | Description | When to Use |
|---|---|---|---|
| Use Smaller Models | 60-90% | GPT-4o-mini costs 90% less than GPT-4o. Use larger models only when needed. | Drafting, classification, simple tasks, bulk processing |
| Batch API | 50% | Submit async batch jobs for non-urgent work. 50% discount on all models. | Bulk embedding generation, processing large document sets |
| Caching | 80-99% | Cache identical queries and responses. Avoid redundant API calls. | FAQ bots, repeated user queries, leaderboards |
| Prompt Caching | Up to 50% | OpenAI caches repeated system prompts. Keep system prompt static. | Long system prompts, RAG with stable context |
| Token Optimization | 20-40% | Shorten prompts, compress context, use fewer tokens. Every token costs money. | Production systems processing many requests |
| Quantization (Local) | Hardware savings | Use 4-bit or 8-bit quantized models for self-hosted inference. | Self-hosting Llama, Mistral, or other open models |
| Semantic Caching | 70-90% | Cache semantically similar queries, not just identical ones. | Customer support, FAQ, any Q&A system |
| Rate Limiting | Variable | Prevent runaway costs from bugs or abuse. Set daily/monthly budgets. | All production AI applications |
import hashlib
import json
# ── 1. Simple Response Cache ───────────────────────
_cache: dict[str, str] = {}
def cached_completion(prompt: str, model: str = "gpt-4o-mini") -> str:
"""Cache responses to avoid duplicate API calls."""
key = hashlib.md5(f"{model}:{prompt}".encode()).hexdigest()
if key in _cache:
print("Cache hit!")
return _cache[key]
# Actual API call
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=0,
)
result = response.choices[0].message.content
_cache[key] = result
return result
# ── 2. Model Routing by Complexity ─────────────────
def route_model(prompt: str, system_prompt: str) -> str:
"""Use cheap model for simple tasks, expensive for complex."""
# Simple heuristic: use mini for short prompts
is_simple = len(prompt.split()) < 50
from openai import OpenAI
client = OpenAI()
model = "gpt-4o-mini" if is_simple else "gpt-4o"
response = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt},
],
temperature=0,
)
return response.choices[0].message.content
# ── 3. Structured Output (JSON Mode) ──────────────
response_schema = {
"type": "json_schema",
"json_schema": {
"name": "product_review",
"strict": True,
"schema": {
"type": "object",
"properties": {
"sentiment": {
"type": "string",
"enum": ["positive", "negative", "neutral"],
},
"rating": {"type": "number"},
"summary": {"type": "string"},
},
"required": ["sentiment", "rating", "summary"],
"additionalProperties": False,
},
},
}
# Use with response_format parameter in API call
# response = client.chat.completions.create(
# model="gpt-4o-mini",
# messages=[...],
# response_format=response_schema,
# )
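The cost table above also lists semantic caching; a hedged sketch of that idea (the similarity threshold and embedding model are illustrative assumptions) could look like this:

```python
# Sketch: semantic caching reuses a cached answer when a new query is similar enough.
import numpy as np
from openai import OpenAI

client = OpenAI()
_semantic_cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

def _embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def semantic_cached_completion(prompt: str, threshold: float = 0.9) -> str:
    vec = _embed(prompt)
    for cached_vec, cached_answer in _semantic_cache:
        sim = float(np.dot(vec, cached_vec) / (np.linalg.norm(vec) * np.linalg.norm(cached_vec)))
        if sim >= threshold:
            return cached_answer                     # close enough: skip the LLM call
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content
    _semantic_cache.append((vec, answer))
    return answer
```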
| Stage | Duration | Focus Areas | Resources |
|---|---|---|---|
| Foundation | 2-4 weeks | Python, NumPy, Pandas, basic statistics, linear algebra | Khan Academy (math), Python official docs, Kaggle Learn |
| ML Basics | 4-6 weeks | Scikit-learn, supervised/unsupervised learning, evaluation metrics | Andrew Ng Machine Learning course, Hands-On ML book |
| Deep Learning | 4-8 weeks | PyTorch, neural networks, CNNs, RNNs, training techniques | fast.ai, Andrej Karpathy YouTube, Deep Learning book |
| NLP & LLMs | 4-6 weeks | Transformers, Hugging Face, embeddings, RAG, fine-tuning | Hugging Face NLP course, LangChain docs, OpenAI cookbook |
| AI Engineering | 4-6 weeks | APIs, LangChain, vector DBs, production deployment, cost optimization | OpenAI API docs, Vercel AI SDK, production ML blogs |
| Specialization | Ongoing | Agents, multimodal AI, fine-tuning, MLOps, AI safety | Papers, conferences, community, build projects |