# Supervised Learning

## 📚 Overview

Supervised learning is a machine learning paradigm in which a model learns from data that has already been labeled (ground truth). The model learns a mapping from input features to an output target based on the examples it is given.

## 🎯 Types of Supervised Learning

### 1. **Classification**

Predicts a category or class from the input data.

**Binary Classification:**

* Spam vs Non-spam email
* Fraud vs Non-fraud transaction
* Sick vs Healthy patient

**Multi-class Classification:**

* Image classification (cat, dog, bird, etc.)
* Text categorization (news, sports, politics, etc.)
* Disease diagnosis (diabetes, heart disease, cancer, etc.)
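
As a minimal sketch, a multi-class classifier can be trained on scikit-learn's built-in Iris dataset (three flower species):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small three-class dataset (setosa, versicolor, virginica)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Train a multi-class classifier and check held-out accuracy
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```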

### 2. **Regression**

Predicts a continuous numeric value from the input data.

**Examples:**

* House price prediction
* Stock price forecasting
* Temperature prediction
* Sales forecasting
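
A minimal regression sketch using scikit-learn's built-in diabetes dataset (disease progression as a continuous target):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load a small regression dataset with a continuous target
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit a linear model and report R^2 on the held-out set
reg = LinearRegression()
reg.fit(X_train, y_train)
print(f"Test R^2: {reg.score(X_test, y_test):.3f}")
```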

## 🚀 Popular Algorithms

### Classification Algorithms

#### 1. **Logistic Regression**

A linear model for binary classification that passes a linear combination of the features through the sigmoid function.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Prepare data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train_scaled, y_train)

# Predict
y_pred = model.predict(X_test_scaled)
y_pred_proba = model.predict_proba(X_test_scaled)
```

**Pros:**

* Interpretable coefficients
* Fast training and prediction
* Good baseline model
* Handles multicollinearity when regularized (scikit-learn applies an L2 penalty by default)

**Cons:**

* Assumes linear relationship
* Sensitive to outliers
* Limited to linear decision boundaries

**Use Cases:**

* Medical diagnosis
* Credit scoring
* Marketing response prediction

#### 2. **Decision Trees**

A tree-based model that makes decisions based on feature thresholds.

```python
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Train model
model = DecisionTreeClassifier(
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42
)
model.fit(X_train, y_train)

# Visualize tree (feature_names and class_names are assumed defined for your data)
plt.figure(figsize=(20,10))
plot_tree(model, feature_names=feature_names, class_names=class_names, filled=True)
plt.show()
```

**Pros:**

* Easy to interpret
* Handles non-linear relationships
* No feature scaling needed
* Can handle mixed data types

**Cons:**

* Prone to overfitting
* Unstable (small changes can create very different trees)
* Can create biased trees with imbalanced classes

**Use Cases:**

* Customer segmentation
* Medical diagnosis
* Credit risk assessment

#### 3. **Random Forest**

An ensemble method that combines the predictions of multiple decision trees.

```python
import pandas as pd

from sklearn.ensemble import RandomForestClassifier

# Train model
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42
)
model.fit(X_train, y_train)

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_names,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
```

**Pros:**

* Reduces overfitting
* Handles missing values (native support in recent scikit-learn versions)
* Provides feature importance
* Robust to outliers

**Cons:**

* Less interpretable than single trees
* Can be computationally expensive
* Cannot extrapolate beyond the range of the training data

**Use Cases:**

* Predictive modeling
* Feature selection
* Anomaly detection

#### 4. **Support Vector Machines (SVM)**

Finds optimal hyperplane to separate classes with maximum margin.

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

# Scale features (important for SVM)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model
model = SVC(
    kernel='rbf',  # 'linear', 'poly', 'rbf', 'sigmoid'
    C=1.0,        # Regularization parameter
    gamma='scale', # Kernel coefficient
    random_state=42
)
model.fit(X_train_scaled, y_train)
```

**Pros:**

* Effective in high-dimensional spaces
* Memory efficient
* Versatile with different kernels
* Good generalization

**Cons:**

* Sensitive to feature scaling
* Computationally expensive for large datasets
* Difficult to interpret with non-linear kernels

**Use Cases:**

* Text classification
* Image classification
* Bioinformatics

### Regression Algorithms

#### 1. **Linear Regression**

Models linear relationship between features and target.

```python
import pandas as pd

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Coefficients
coefficients = pd.DataFrame({
    'feature': feature_names,
    'coefficient': model.coef_
}).sort_values('coefficient', key=abs, ascending=False)
```

#### 2. **Ridge Regression**

Linear regression with L2 regularization.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hyperparameter tuning
param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10, 100]}
ridge = Ridge(random_state=42)
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train_scaled, y_train)

# Best model
best_ridge = grid_search.best_estimator_
```

#### 3. **Lasso Regression**

Linear regression with L1 regularization (feature selection).

```python
from sklearn.linear_model import Lasso

# Train model
model = Lasso(alpha=0.1, random_state=42)
model.fit(X_train_scaled, y_train)

# Feature selection
selected_features = feature_names[model.coef_ != 0]
print(f"Selected features: {selected_features}")
```

## 📊 Model Evaluation

### Classification Metrics

```python
import numpy as np

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score
)

# Basic metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

# ROC AUC (for binary classification)
if len(np.unique(y_test)) == 2:
    roc_auc = roc_auc_score(y_test, y_pred_proba[:, 1])

# Detailed report
print(classification_report(y_test, y_pred))
```

### Regression Metrics

```python
import numpy as np

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Calculate metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²: {r2:.4f}")
```

### Cross-Validation

```python
import numpy as np

from sklearn.model_selection import cross_val_score, StratifiedKFold

# For classification
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')

print(f"CV Accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std() * 2:.3f})")

# For regression
cv_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
rmse_scores = np.sqrt(-cv_scores)
print(f"CV RMSE: {rmse_scores.mean():.3f} (+/- {rmse_scores.std() * 2:.3f})")
```

## 🔧 Hyperparameter Tuning

### Grid Search

```python
from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Grid search
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

# Best parameters
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
```

### Random Search

```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

# Define parameter distributions
param_dist = {
    'n_estimators': randint(50, 300),
    'max_depth': randint(3, 20),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10)
}

# Random search
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=100,
    cv=5,
    scoring='accuracy',
    random_state=42,
    n_jobs=-1
)

random_search.fit(X_train, y_train)
```

## 🚀 Best Practices

### 1. **Data Preprocessing**

* Handle missing values appropriately
* Scale numerical features
* Encode categorical variables
* Remove or handle outliers
* Feature engineering
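
The first few steps can be combined into a single scikit-learn pipeline; a minimal sketch, where the column names and values are illustrative only:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with a missing value and a categorical column (illustrative only)
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51],
    "income": [40_000, 55_000, 62_000, 48_000],
    "city": ["NY", "SF", "NY", "LA"],
})

# Impute + scale numeric columns; one-hot encode the categorical column
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X = preprocess.fit_transform(df)
print(X.shape)  # 4 rows; 2 numeric + 3 one-hot columns
```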

### 2. **Model Selection**

* Start with simple models (linear regression, logistic regression)
* Use cross-validation for model comparison
* Consider ensemble methods for better performance
* Balance between bias and variance
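
A minimal sketch of comparing a simple baseline against an ensemble under the same cross-validation scheme, using the built-in Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Evaluate both models with the same 5-fold CV and metric
for name, model in [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```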

### 3. **Feature Engineering**

* Create interaction features
* Polynomial features for non-linear relationships
* Domain-specific features
* Feature selection to reduce dimensionality

### 4. **Overfitting Prevention**

* Use regularization techniques
* Cross-validation
* Early stopping
* Data augmentation
* Ensemble methods
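
As a small illustration of the regularization point, ridge's L2 penalty shrinks coefficients toward zero as `alpha` grows (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Noisy linear data: y = 3*x1 - 2*x2 + noise
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Stronger regularization pulls the coefficients toward zero
for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: |coef| = {np.abs(coef).sum():.3f}")
```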

### 5. **Evaluation Strategy**

* Hold-out test set
* Cross-validation
* Multiple metrics
* Business context consideration
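
A minimal sketch of the hold-out idea with a stratified split, so class proportions are preserved in both subsets:

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Stratified hold-out split keeps class proportions in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(Counter(y_train))  # 40 samples per class
print(Counter(y_test))   # 10 samples per class
```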

## 📚 References & Resources

### 📖 Books

* [**"Introduction to Statistical Learning"**](https://www.statlearning.com/) by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
* [**"Pattern Recognition and Machine Learning"**](https://www.microsoft.com/en-us/research/people/cmbishop/) by Christopher Bishop
* [**"The Elements of Statistical Learning"**](https://web.stanford.edu/~hastie/ElemStatLearn/) by Trevor Hastie, Robert Tibshirani, Jerome Friedman

### 🎓 Courses

* [**Coursera Machine Learning**](https://www.coursera.org/learn/machine-learning) by Andrew Ng
* [**Stanford CS229**](https://cs229.stanford.edu/) - Machine Learning Course
* [**MIT 6.036**](https://introml.mit.edu/) - Introduction to Machine Learning

### 📰 Research Papers

* [**"Random Forests"**](https://link.springer.com/article/10.1023/A:1010933404324) by Leo Breiman
* [**"Support Vector Networks"**](https://link.springer.com/article/10.1023/A:1022627411411) by Corinna Cortes and Vladimir Vapnik
* [**"Classification and Regression Trees"**](https://www.taylorfrancis.com/books/mono/10.1201/9781315139470/classification-regression-trees-leo-breiman-jerome-friedman-richard-olshen-charles-stone) by Leo Breiman

### 🐙 GitHub Repositories

* [**Scikit-learn Examples**](https://github.com/scikit-learn/scikit-learn/tree/main/examples)
* [**ML-From-Scratch**](https://github.com/eriklindernoren/ML-From-Scratch) - Implementation of ML algorithms
* [**Awesome Machine Learning**](https://github.com/josephmisiti/awesome-machine-learning) - Curated ML resources

### 📊 Datasets

* [**UCI Machine Learning Repository**](https://archive.ics.uci.edu/ml/)
* [**Kaggle Datasets**](https://www.kaggle.com/datasets)
* [**Scikit-learn Built-in Datasets**](https://scikit-learn.org/stable/datasets.html)

## 🔗 Related Topics

* [🧠 ML Fundamentals](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals)
* [🎯 Unsupervised Learning](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals/unsupervised-learning)
* [🔄 Reinforcement Learning](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals/reinforcement-learning)
* [🐍 Python ML Tools](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/python-ml)

***

*Last updated: December 2024*
