🤖 Machine Learning Algorithms Quick Reference
Regression: Linear Regression, Ridge, Lasso, XGBoost
Classification: Logistic Regression, Decision Trees, Random Forest, SVM, KNN, Naive Bayes
Clustering: K-Means, Hierarchical, DBSCAN, GMM
Dimensionality Reduction: PCA, t-SNE, Autoencoders
Regression Algorithms
Linear Regression
Type: Supervised Regression
Ridge Regression (L2 Regularization)
Type: Supervised Regression
Lasso Regression (L1 Regularization)
Type: Supervised Regression
Polynomial Regression
Type: Supervised Regression
Classification Algorithms
Logistic Regression
Type: Supervised Classification
Decision Trees
Type: Supervised Classification
Split criteria:
- Gini Impurity: 1 - Σpᵢ² (classification)
- Entropy: -Σpᵢlog₂(pᵢ) (information gain)
- MSE: Mean squared error (regression)
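The split criteria above can be computed directly from a node's class proportions. A minimal sketch (helper names are illustrative, not from any particular library):

```python
import math

def gini_impurity(proportions):
    """Gini impurity: 1 - sum(p_i^2). 0 = pure node, 0.5 = even binary split."""
    return 1.0 - sum(p ** 2 for p in proportions)

def entropy(proportions):
    """Shannon entropy: -sum(p_i * log2(p_i)). 0 = pure node, 1 = even binary split."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(gini_impurity([0.5, 0.5]))  # 0.5 (maximally impure binary node)
print(entropy([0.5, 0.5]))        # 1.0
print(gini_impurity([1.0, 0.0]))  # 0.0 (pure node)
```

A tree learner picks the split that most reduces the chosen impurity, weighted by child-node sizes.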
Random Forest
Type: Ensemble Classification
Support Vector Machine (SVM)
Type: Supervised Classification
Kernels:
- Linear: K(x, y) = x·y (linearly separable data)
- Polynomial: K(x, y) = (x·y + c)ᵈ (non-linear)
- RBF (Gaussian): K(x, y) = exp(-γ||x-y||²) (most common)
- Sigmoid: K(x, y) = tanh(αx·y + c)
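The kernels above are just scalar functions of two vectors; a from-scratch sketch (default hyperparameter values here are arbitrary, chosen for illustration):

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    return (linear_kernel(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, alpha=0.01, c=0.0):
    return math.tanh(alpha * linear_kernel(x, y) + c)

x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))  # 11.0
print(rbf_kernel(x, x))     # 1.0 (identical points are maximally similar)
```

An RBF value near 1 means the points are close; it decays toward 0 as they separate, which is why γ effectively controls the kernel's "reach".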
Gradient Boosting (XGBoost, LightGBM, CatBoost)
Type: Ensemble Classification
- XGBoost: Most popular, handles missing values, L1/L2 regularization
- LightGBM: Fastest, leaf-wise growth, best for large datasets
- CatBoost: Best for categorical features, robust to overfitting
K-Nearest Neighbors (KNN)
Type: Supervised Classification
Distance metrics:
- Euclidean: √Σ(xᵢ - yᵢ)² (most common)
- Manhattan: Σ|xᵢ - yᵢ|
- Minkowski: (Σ|xᵢ - yᵢ|ᵖ)^(1/p)
- Cosine: 1 - (x·y)/(||x|| ||y||)
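The four distance metrics above, written out in plain Python (a library KNN would accept these as pluggable metrics):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p=3):
    # p=1 recovers Manhattan, p=2 recovers Euclidean
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def cosine_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1 - dot / norm

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

Cosine distance ignores magnitude, so two parallel vectors of different lengths score 0.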
Naive Bayes
Type: Supervised Classification
- Gaussian NB: Continuous features (assumes normal distribution)
- Multinomial NB: Discrete counts (text, word frequencies)
- Bernoulli NB: Binary features (presence/absence)
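A from-scratch Gaussian NB for a single continuous feature, to make the "prior × Gaussian likelihood" idea concrete (function names are illustrative; a real project would use a library implementation, and this sketch assumes nonzero per-class variance):

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, mean, and variance for a 1-D feature."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    stats = {}
    for cls, vals in by_class.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[cls] = (len(vals) / len(X), mean, var)
    return stats

def predict_gaussian_nb(stats, x):
    """Pick the class maximizing log(prior) + log Gaussian likelihood."""
    def log_posterior(cls):
        prior, mean, var = stats[cls]
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))
    return max(stats, key=log_posterior)

# Two well-separated classes on one feature
X = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, 1.1))  # "a"
print(predict_gaussian_nb(model, 4.9))  # "b"
```

Working in log space avoids underflow when many features are multiplied together.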
Neural Networks
Feedforward Neural Network (MLP)
Type: Supervised Deep Learning
Activation functions:
- ReLU: max(0, x) - Most common for hidden layers
- Sigmoid: 1/(1+e⁻ˣ) - Binary classification output
- Softmax: eˣⁱ/Σeˣʲ - Multi-class classification output
- Tanh: (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) - Alternative to sigmoid
- Leaky ReLU: max(0.01x, x) - Fixes dying ReLU problem
Regularization techniques:
- Dropout: Randomly disable neurons during training
- L1/L2: Weight penalty in loss function
- Batch Normalization: Normalize layer inputs
- Early Stopping: Stop when validation loss stops improving
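The activation formulas listed above, written out directly (scalar versions; frameworks apply them elementwise over tensors):

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps gradients alive for x < 0
    return x if x > 0 else alpha * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0))    # 0.0 (the "dying ReLU" region)
print(sigmoid(0.0))  # 0.5
print(sum(softmax([1.0, 2.0, 3.0])))  # ≈ 1.0 (a probability distribution)
```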
Convolutional Neural Network (CNN)
Type: Supervised Deep Learning
Layer types:
- Convolutional: Feature extraction with filters
- Pooling: Downsampling (Max/Average pooling)
- Fully Connected: Classification layer
Recurrent Neural Network (RNN/LSTM/GRU)
Type: Supervised Deep Learning
- Simple RNN: Suffers from the vanishing gradient problem on long sequences
- LSTM: Long Short-Term Memory (solves vanishing gradient)
- GRU: Gated Recurrent Unit (faster than LSTM)
Clustering Algorithms
K-Means Clustering
Type: Unsupervised Clustering
Algorithm:
- Initialize k random centroids
- Assign each point to nearest centroid
- Update centroids to mean of assigned points
- Repeat until convergence
Objective: WCSS = Σₖ Σ_{x∈Cₖ} ||x - μₖ||²
Choosing k:
- Silhouette Score: -1 to 1 (higher is better)
- Gap Statistic: Compare WCSS to random data
- Davies-Bouldin Index: Lower is better
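The assign/update loop and the WCSS objective can be sketched in a few lines. For determinism this sketch seeds centroids from the first k points; a real implementation would use random or k-means++ initialization:

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged: assignments stopped changing
        assign = new_assign
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign, centroids

def wcss(points, assign, centroids):
    """Within-cluster sum of squares (the k-means objective)."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, cents = kmeans(pts, k=2)
print(labels)  # [0, 0, 1, 1]: the two obvious groups
```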
Hierarchical Clustering
Type: Unsupervised Clustering
- Agglomerative (bottom-up): Start with individual points, merge clusters
- Divisive (top-down): Start with one cluster, split recursively
Linkage criteria:
- Single: Minimum distance between clusters
- Complete: Maximum distance between clusters
- Average: Average distance between all pairs
- Ward: Minimize variance (most common)
DBSCAN (Density-Based Spatial Clustering)
Type: Unsupervised Clustering
Parameters:
- ε (epsilon): Maximum distance for neighborhood
- MinPts: Minimum points to form dense region
Point types:
- Core: Has ≥ MinPts within ε
- Border: Within ε of core point but has < MinPts
- Noise: Neither core nor border (outlier)
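The core/border/noise definitions above translate directly into code. This sketch does the point classification only; full DBSCAN would additionally expand clusters outward from core points:

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    n = len(points)
    # ε-neighborhoods include the point itself, per the standard formulation
    neigh = [[j for j in range(n) if sq_dist(points[i], points[j]) <= eps ** 2]
             for i in range(n)]
    labels = []
    for i in range(n):
        if len(neigh[i]) >= min_pts:
            labels.append("core")            # dense enough on its own
        elif any(len(neigh[j]) >= min_pts for j in neigh[i]):
            labels.append("border")          # within ε of some core point
        else:
            labels.append("noise")           # outlier
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 2.0), (8, 8)]
print(classify_points(pts, eps=1.5, min_pts=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```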
Gaussian Mixture Model (GMM)
Type: Unsupervised Clustering
EM algorithm:
- E-step: Calculate probability of each point belonging to each cluster
- M-step: Update Gaussian parameters (mean, covariance)
Dimensionality Reduction
Principal Component Analysis (PCA)
Type: Unsupervised Dimensionality Reduction
Steps:
- Standardize data (mean=0, std=1)
- Compute covariance matrix
- Calculate eigenvectors and eigenvalues
- Select top K eigenvectors (principal components)
- Transform data to new feature space
Choosing the number of components:
- Retain components explaining ~95% of variance
- Use scree plot (elbow method)
- Cross-validation with downstream task
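The PCA steps above, run end to end with numpy (assumes numpy is available; the synthetic data and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: the second feature is the first plus small noise
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200)])

# 1. Standardize (mean=0, std=1)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
# 2. Covariance matrix
cov = np.cov(Xs, rowvar=False)
# 3. Eigendecomposition (eigh: for symmetric matrices, ascending eigenvalues)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
# 4. Explained variance ratio per component
explained = eigvals / eigvals.sum()
# 5. Project onto the top component
Z = Xs @ eigvecs[:, :1]
print(explained[0])  # close to 1.0: one direction carries almost all variance
```

Because the two features are nearly collinear, a single component retains well over 95% of the variance, so the 2-D data compresses to 1-D with little loss.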
t-SNE (t-Distributed Stochastic Neighbor Embedding)
Type: Unsupervised Visualization
Key hyperparameters:
- Perplexity: 5-50 (typical), balance between local/global structure
- Learning rate: 10-1000
- Iterations: 1000-5000
UMAP (Uniform Manifold Approximation and Projection)
Type: Unsupervised Dimensionality Reduction
Advantages over t-SNE:
- Faster (often 10-100x)
- Preserves global structure better
- Can transform new data
- Scales to larger datasets
Autoencoders
Type: Unsupervised Deep Learning
Architecture:
- Encoder: Compress input to latent representation
- Bottleneck: Low-dimensional latent space
- Decoder: Reconstruct original input
Variants:
- Variational (VAE): Generative model; learns a probability distribution over the latent space
- Denoising: Trained to reconstruct from corrupted input
- Sparse: Encourages sparse activations
Reinforcement Learning
Core RL Concepts
- Agent: Decision maker
- Environment: World agent interacts with
- State (s): Current situation
- Action (a): Possible moves
- Reward (r): Feedback signal
- Policy (π): Strategy for selecting actions
Q-Learning
Type: Reinforcement Learning
Update rule: Q(s, a) ← Q(s, a) + α[r + γ·maxₐ′ Q(s′, a′) − Q(s, a)]
- α: Learning rate
- γ: Discount factor (future reward importance)
- r: Immediate reward
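One tabular Q-learning update is a single line of arithmetic; a minimal sketch with illustrative default values for α and γ:

```python
def q_update(q, reward, max_next_q, alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    return q + alpha * (reward + gamma * max_next_q - q)

# Terminal transition (no future value): Q moves alpha of the way toward r
print(q_update(0.0, 1.0, 0.0, alpha=0.5))  # 0.5

# Repeated updates converge toward the target r + gamma * max_next_q
q = 0.0
for _ in range(100):
    q = q_update(q, 1.0, 0.0, alpha=0.5)
print(round(q, 6))  # 1.0
```

γ controls how strongly future rewards count: γ=0 is purely myopic, γ near 1 values long-term return.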
Deep Q-Network (DQN)
Type: Reinforcement Deep Learning
Key techniques:
- Experience Replay: Store and sample past experiences
- Target Network: Separate network for stable targets
- Frame Stacking: Use multiple frames to capture motion
Policy Gradient Methods (REINFORCE, A3C, PPO)
Type: Reinforcement Learning
- PPO (Proximal Policy Optimization): Most popular, stable training
- A3C (Asynchronous Advantage Actor-Critic): Parallel training
- SAC (Soft Actor-Critic): Maximum entropy RL
Classification Metrics
Accuracy
Use When: Balanced classes
Avoid When: Imbalanced datasets
Precision
Use When: False positives are costly (spam detection)
Interpretation: "Of predicted positives, how many are correct?"
Recall (Sensitivity)
Use When: False negatives are costly (cancer detection)
Interpretation: "Of actual positives, how many did we find?"
F1-Score
Use When: Need balance between precision and recall
Best For: Imbalanced datasets
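Precision, recall, F1, and accuracy all derive from the four confusion-matrix counts; a from-scratch sketch (counts in the example are made up for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)           # of predicted positives, how many correct
    recall = tp / (tp + fn)              # of actual positives, how many found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Imbalanced example: 90 actual positives among 1000 samples
p, r, f1, acc = classification_metrics(tp=80, fp=20, fn=10, tn=890)
print(p)    # 0.8
print(acc)  # 0.97
```

Note how accuracy (0.97) looks far better than F1 here, which is exactly why F1 is preferred on imbalanced data.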
ROC-AUC (Area Under the ROC Curve: TPR vs. FPR)
Use When: Evaluating ranking quality
Range: 0.5 (random) to 1.0 (perfect)
Log Loss
Use When: Probability calibration matters
Best For: Multi-class problems
Regression Metrics
Mean Absolute Error (MAE)
Use When: Outliers present, want interpretable error
Units: Same as target variable
Mean Squared Error (MSE)
Use When: Want to penalize large errors heavily
Note: Sensitive to outliers
Root Mean Squared Error (RMSE)
Use When: Need interpretable units like MAE
Advantage: Same units as target
R² Score (Coefficient of Determination)
Range: -∞ to 1 (1 = perfect fit)
Use When: Comparing models, understanding variance explained
Mean Absolute Percentage Error (MAPE)
Use When: Need scale-independent metric
Warning: Undefined when yᵢ = 0
Adjusted R²
Use When: Comparing models with different feature counts
Advantage: Penalizes unnecessary features
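The regression metrics above can be computed in a few lines (a minimal sketch; function and key names are illustrative):

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R2 from paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)  # back in the target's units
    mean_y = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # fraction of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

def adjusted_r2(r2, n, k):
    """Penalize R2 for the number of features k (n = sample count)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(m["MAE"])  # 0.25
print(m["MSE"])  # 0.125
```

Because MSE squares the errors, the two 0.5-unit misses dominate it relative to MAE, which is the "penalize large errors" behavior noted above.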
Clustering Metrics
Silhouette Score
Range: -1 to 1 (higher is better)
Use When: Evaluating cluster quality
Davies-Bouldin Index
Range: 0 to ∞ (lower is better)
Use When: Comparing different K values
Calinski-Harabasz Index
Range: 0 to ∞ (higher is better)
Best For: Dense, well-separated clusters
Common Problems & Solutions
| Problem | Symptoms | Solutions |
|---|---|---|
| Overfitting | High training accuracy, low test accuracy; large gap between train/val loss | • Increase training data • Reduce model complexity • Apply regularization (L1/L2, dropout) • Early stopping • Cross-validation |
| Underfitting | Low training and test accuracy; high bias | • Increase model complexity • Add more features • Reduce regularization • Train longer • Try non-linear models |
| Class Imbalance | High accuracy but poor minority class performance | • SMOTE or oversampling • Class weights • Stratified sampling • Use F1-score instead of accuracy • Ensemble methods |
| Data Leakage | Unrealistically high performance; test accuracy > train accuracy | • Separate test set before any processing • Fit preprocessing on train only • Check for future information in features • Time-based splits for time series |
| Vanishing Gradients | Deep networks stop learning; weights don't update | • Use ReLU instead of sigmoid/tanh • Batch normalization • Residual connections (ResNet) • Gradient clipping • Better initialization |
| Exploding Gradients | NaN or Inf losses; unstable training | • Gradient clipping • Lower learning rate • Batch normalization • Weight regularization |
| Feature Scaling Issues | Slow convergence; poor performance with distance-based algorithms | • StandardScaler (mean=0, std=1) • MinMaxScaler (0-1 range) • RobustScaler (median-based, resistant to outliers) • Required for: SVM, KNN, Neural Networks, PCA |
| Curse of Dimensionality | Too many features; poor generalization; distance metrics break down | • Feature selection (Lasso, feature importance) • Dimensionality reduction (PCA, UMAP) • Regularization • More training data |
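The two most common scaling fixes from the table, implemented from scratch (mirroring the behavior of scikit-learn's StandardScaler and MinMaxScaler; these sketches fit and transform one list, whereas the real scalers fit on train data only):

```python
def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Rescale linearly so min maps to 0 and max maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

data = [10.0, 20.0, 30.0, 40.0]
print(min_max_scale(data))  # endpoints land exactly on 0.0 and 1.0
```

To avoid the data-leakage problem from the same table, compute the mean/std (or min/max) on the training split and reuse those statistics on the test split.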
Algorithm Comparison
| Algorithm | Training Speed | Prediction Speed | Interpretability | Handles Non-linearity | Scales to Big Data |
|---|---|---|---|---|---|
| Linear Regression | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ❌ | ⭐⭐⭐⭐⭐ |
| Logistic Regression | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ❌ | ⭐⭐⭐⭐⭐ |
| Decision Trees | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ | ⭐⭐⭐ |
| Random Forest | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ✅ | ⭐⭐⭐⭐ |
| XGBoost | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ✅ | ⭐⭐⭐⭐⭐ |
| SVM | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ✅ (with kernels) | ⭐⭐ |
| KNN | ⭐⭐⭐⭐⭐ (no training) | ⭐⭐ | ⭐⭐⭐⭐ | ✅ | ⭐ |
| Neural Networks | ⭐ | ⭐⭐⭐⭐ | ⭐ | ✅ | ⭐⭐⭐⭐⭐ |
| K-Means | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ❌ | ⭐⭐⭐⭐ |
| DBSCAN | ⭐⭐⭐ | N/A | ⭐⭐⭐ | ✅ | ⭐⭐ |