# Deep Learning

## 📚 Overview

Deep Learning is a subset of Machine Learning that uses neural networks with multiple layers (deep neural networks) to learn hierarchical representations of data. Deep Learning has revolutionized domains such as computer vision, natural language processing, speech recognition, and many more.

## 🧠 Neural Network Fundamentals

### 1. **Basic Structure**

A neural network consists of:

* **Input Layer**: receives the input data
* **Hidden Layers**: process and transform the data
* **Output Layer**: produces the prediction

```
Input Layer → Hidden Layer 1 → Hidden Layer 2 → ... → Output Layer
```

### 2. **Neurons (Nodes)**

Each neuron receives inputs, applies an activation function, and produces an output.

```python
import numpy as np

class Neuron:
    def __init__(self, input_size):
        self.weights = np.random.randn(input_size) * 0.01
        self.bias = 0.0
    
    def forward(self, inputs):
        # Linear combination
        z = np.dot(inputs, self.weights) + self.bias
        # Activation function (ReLU)
        return np.maximum(0, z)
    
    def update_weights(self, new_weights, new_bias):
        self.weights = new_weights
        self.bias = new_bias

# Example usage
neuron = Neuron(input_size=3)
inputs = np.array([1.0, 2.0, 3.0])
output = neuron.forward(inputs)
print(f"Neuron output: {output}")
```

### 3. **Activation Functions**

The function that determines a neuron's output from its weighted input, introducing the non-linearity that lets networks learn complex mappings.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_activation_functions():
    x = np.linspace(-5, 5, 100)
    
    # ReLU
    relu = np.maximum(0, x)
    
    # Sigmoid
    sigmoid = 1 / (1 + np.exp(-x))
    
    # Tanh
    tanh = np.tanh(x)
    
    # Leaky ReLU
    leaky_relu = np.where(x > 0, x, 0.01 * x)
    
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    
    axes[0, 0].plot(x, relu)
    axes[0, 0].set_title('ReLU')
    axes[0, 0].grid(True)
    
    axes[0, 1].plot(x, sigmoid)
    axes[0, 1].set_title('Sigmoid')
    axes[0, 1].grid(True)
    
    axes[1, 0].plot(x, tanh)
    axes[1, 0].set_title('Tanh')
    axes[1, 0].grid(True)
    
    axes[1, 1].plot(x, leaky_relu)
    axes[1, 1].set_title('Leaky ReLU')
    axes[1, 1].grid(True)
    
    plt.tight_layout()
    plt.show()

plot_activation_functions()
```

## 🏗️ Neural Network Architectures

### 1. **Feedforward Neural Networks (FNN)**

The basic neural network, in which information flows only forward from input to output.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class FeedforwardNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(FeedforwardNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
    
    def forward(self, x):
        x = self.dropout(self.relu(self.fc1(x)))
        x = self.dropout(self.relu(self.fc2(x)))
        x = self.fc3(x)
        return x

# Example usage
model = FeedforwardNN(input_size=784, hidden_size=128, output_size=10)
print(model)

# Generate sample data
X = torch.randn(100, 784)
y = torch.randint(0, 10, (100,))

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    
    if epoch % 2 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```

### 2. **Convolutional Neural Networks (CNNs)**

Specialized for processing grid-like data such as images.

```python
class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super(CNN, self).__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        
        # Pooling layers
        self.pool = nn.MaxPool2d(2, 2)
        
        # Fully connected layers
        self.fc1 = nn.Linear(128 * 3 * 3, 512)
        self.fc2 = nn.Linear(512, num_classes)
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        # Convolutional layers
        x = self.pool(torch.relu(self.conv1(x)))  # 28x28 -> 14x14
        x = self.pool(torch.relu(self.conv2(x)))  # 14x14 -> 7x7
        x = self.pool(torch.relu(self.conv3(x)))  # 7x7 -> 3x3
        
        # Flatten
        x = x.view(-1, 128 * 3 * 3)
        
        # Fully connected layers
        x = self.dropout(torch.relu(self.fc1(x)))
        x = self.fc2(x)
        
        return x

# Example usage
model = CNN(num_classes=10)
print(model)

# Generate sample image data (batch_size, channels, height, width)
X = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    
    print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```

### 3. **Recurrent Neural Networks (RNNs)**

Designed for processing sequential data.

```python
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # x shape: (batch_size, sequence_length, input_size)
        rnn_out, hidden = self.rnn(x)
        
        # Use last output for classification
        out = self.fc(rnn_out[:, -1, :])
        return out

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=2):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, 
                           batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        # Initialize hidden state
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        
        # Forward propagate LSTM
        lstm_out, _ = self.lstm(x, (h0, c0))
        
        # Use last output
        out = self.dropout(lstm_out[:, -1, :])
        out = self.fc(out)
        return out

# Example usage
input_size = 10
hidden_size = 64
output_size = 5
sequence_length = 20

model = LSTM(input_size, hidden_size, output_size)
print(model)

# Generate sample sequential data
X = torch.randn(32, sequence_length, input_size)
y = torch.randint(0, output_size, (32,))

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    
    print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```

### 4. **Transformers**

The modern architecture for sequence processing, built on attention mechanisms.

```python
import math

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model
        
        assert d_model % num_heads == 0
        
        self.depth = d_model // num_heads
        
        self.wq = nn.Linear(d_model, d_model)
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)
        
        self.dense = nn.Linear(d_model, d_model)
    
    def split_heads(self, x, batch_size):
        x = x.view(batch_size, -1, self.num_heads, self.depth)
        return x.permute(0, 2, 1, 3)
    
    def forward(self, value, key, query, mask):
        batch_size = query.size(0)
        
        # Linear transformations
        Q = self.wq(query)
        K = self.wk(key)
        V = self.wv(value)
        
        # Split into multiple heads
        Q = self.split_heads(Q, batch_size)
        K = self.split_heads(K, batch_size)
        V = self.split_heads(V, batch_size)
        
        # Scaled dot-product attention
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.depth)
        
        if mask is not None:
            scores += mask
        
        attention_weights = torch.softmax(scores, dim=-1)
        
        # Apply attention to values
        context = torch.matmul(attention_weights, V)
        
        # Concatenate heads
        context = context.permute(0, 2, 1, 3).contiguous()
        context = context.view(batch_size, -1, self.d_model)
        
        # Final linear transformation
        output = self.dense(context)
        
        return output, attention_weights

class TransformerBlock(nn.Module):
    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super(TransformerBlock, self).__init__()
        
        self.attention = MultiHeadAttention(d_model, num_heads)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, dff),
            nn.ReLU(),
            nn.Linear(dff, d_model)
        )
        
        self.layernorm1 = nn.LayerNorm(d_model, eps=1e-6)
        self.layernorm2 = nn.LayerNorm(d_model, eps=1e-6)
        
        self.dropout1 = nn.Dropout(rate)
        self.dropout2 = nn.Dropout(rate)
    
    def forward(self, x, training=True, mask=None):
        # Note: `training` is kept for API symmetry but unused; PyTorch dropout
        # layers follow model.train()/model.eval() instead.
        # Multi-head attention
        attn_output, _ = self.attention(x, x, x, mask)
        attn_output = self.dropout1(attn_output)
        out1 = self.layernorm1(x + attn_output)
        
        # Feed forward network
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output)
        out2 = self.layernorm2(out1 + ffn_output)
        
        return out2

# Example usage
d_model = 128
num_heads = 8
dff = 512
seq_length = 50

model = TransformerBlock(d_model, num_heads, dff)
print(model)

# Generate sample data
X = torch.randn(32, seq_length, d_model)
mask = torch.zeros(32, 1, seq_length, seq_length)  # singleton dim broadcasts across the 8 heads

# Forward pass
output = model(X, training=True, mask=mask)
print(f"Input shape: {X.shape}")
print(f"Output shape: {output.shape}")
```

## 🔧 Training Deep Neural Networks

### 1. **Loss Functions**

Loss functions measure the error between predictions and the actual values.

```python
# Classification losses
criterion_ce = nn.CrossEntropyLoss()
criterion_bce = nn.BCELoss()
criterion_bce_logits = nn.BCEWithLogitsLoss()

# Regression losses
criterion_mse = nn.MSELoss()
criterion_mae = nn.L1Loss()
criterion_huber = nn.HuberLoss()

# Example usage
predictions = torch.randn(32, 10)
targets = torch.randint(0, 10, (32,))

loss_ce = criterion_ce(predictions, targets)
print(f"Cross Entropy Loss: {loss_ce.item():.4f}")

# For regression
predictions_reg = torch.randn(32, 1)
targets_reg = torch.randn(32, 1)

loss_mse = criterion_mse(predictions_reg, targets_reg)
print(f"MSE Loss: {loss_mse.item():.4f}")
```

### 2. **Optimizers**

Optimizers are the algorithms that update the network weights from gradients.

```python
# Different optimizers
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer_adam = optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
optimizer_adamw = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
optimizer_rmsprop = optim.RMSprop(model.parameters(), lr=0.001, alpha=0.99)

# Learning rate schedulers
scheduler_step = optim.lr_scheduler.StepLR(optimizer_adam, step_size=30, gamma=0.1)
scheduler_cosine = optim.lr_scheduler.CosineAnnealingLR(optimizer_adam, T_max=100)
scheduler_reduce = optim.lr_scheduler.ReduceLROnPlateau(optimizer_adam, patience=5)
```
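As a sketch of how a scheduler fits into the training loop (here `StepLR` on a throwaway linear layer, stepped once per epoch after `optimizer.step()`):

```python
import torch.nn as nn
import torch.optim as optim

# Minimal model just to illustrate scheduler stepping
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(30):
    # ... forward / backward / loss would go here ...
    optimizer.step()    # optimizer steps first
    scheduler.step()    # then advance the schedule once per epoch
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs[0], lrs[9], lrs[29])  # lr drops by 10x every 10 epochs
```

`ReduceLROnPlateau` differs from the others: its `step()` takes the validation loss as an argument (`scheduler_reduce.step(val_loss)`).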

### 3. **Training Loop with Validation**

```python
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    
    train_losses = []
    val_losses = []
    
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0.0
        
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
        
        train_loss /= len(train_loader)
        train_losses.append(train_loss)
        
        # Validation phase
        model.eval()
        val_loss = 0.0
        correct = 0
        total = 0
        
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                loss = criterion(output, target)
                val_loss += loss.item()
                
                _, predicted = torch.max(output.data, 1)
                total += target.size(0)
                correct += (predicted == target).sum().item()
        
        val_loss /= len(val_loader)
        val_losses.append(val_loss)
        accuracy = 100 * correct / total
        
        print(f'Epoch {epoch+1}/{num_epochs}:')
        print(f'Training Loss: {train_loss:.4f}')
        print(f'Validation Loss: {val_loss:.4f}')
        print(f'Validation Accuracy: {accuracy:.2f}%')
        print('-' * 50)
    
    return train_losses, val_losses

# Example usage with DataLoader
from torch.utils.data import DataLoader, TensorDataset

# Create sample data
X_train = torch.randn(1000, 784)
y_train = torch.randint(0, 10, (1000,))
X_val = torch.randn(200, 784)
y_val = torch.randint(0, 10, (200,))

# Create DataLoaders
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Train model
model = FeedforwardNN(784, 128, 10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

train_losses, val_losses = train_model(
    model, train_loader, val_loader, criterion, optimizer, num_epochs=5
)

# Plot training progress
plt.figure(figsize=(8, 5))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
```

## 🚀 Advanced Techniques

### 1. **Batch Normalization**

Normalizes layer activations to stabilize and speed up training.

```python
class BatchNormNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(BatchNormNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.bn1 = nn.BatchNorm1d(hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.bn2 = nn.BatchNorm1d(hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
    
    def forward(self, x):
        x = self.dropout(self.relu(self.bn1(self.fc1(x))))
        x = self.dropout(self.relu(self.bn2(self.fc2(x))))
        x = self.fc3(x)
        return x
```

### 2. **Residual Connections (ResNet)**

Skip connections that make very deep networks trainable.

```python
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, 
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, 
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, 
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
    
    def forward(self, x):
        residual = x
        
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        
        out += self.shortcut(residual)
        out = torch.relu(out)
        
        return out
```

### 3. **Attention Mechanisms**

Attention lets the model focus on the relevant parts of the input.

```python
class SelfAttention(nn.Module):
    def __init__(self, embed_size, heads):
        super(SelfAttention, self).__init__()
        self.embed_size = embed_size
        self.heads = heads
        self.head_dim = embed_size // heads
        
        assert (self.head_dim * heads == embed_size), "Embed size needs to be divisible by heads"
        
        self.values = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.keys = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.queries = nn.Linear(self.head_dim, self.head_dim, bias=False)
        self.fc_out = nn.Linear(heads * self.head_dim, embed_size)
    
    def forward(self, values, keys, query, mask):
        N = query.shape[0]
        value_len, key_len, query_len = values.shape[1], keys.shape[1], query.shape[1]
        
        # Split embedding into self.heads pieces
        values = values.reshape(N, value_len, self.heads, self.head_dim)
        keys = keys.reshape(N, key_len, self.heads, self.head_dim)
        queries = query.reshape(N, query_len, self.heads, self.head_dim)
        
        # Per-head linear projections
        values = self.values(values)
        keys = self.keys(keys)
        queries = self.queries(queries)
        
        # Attention scores: (N, query_len, heads, key_len)
        energy = torch.einsum("nqhd,nkhd->nqhk", [queries, keys])
        
        if mask is not None:
            energy = energy.masked_fill(mask == 0, float("-1e20"))
        
        # Scale by sqrt of the per-head dimension
        attention = torch.softmax(energy / (self.head_dim ** 0.5), dim=3)
        
        # Weighted sum over key positions (contract the key index)
        out = torch.einsum("nqhk,nkhd->nqhd", [attention, values])
        out = out.reshape(N, query_len, self.heads * self.head_dim)
        
        return self.fc_out(out)
```

## 📊 Model Evaluation

### 1. **Classification Metrics**

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
import seaborn as sns

def evaluate_classification(model, test_loader, device):
    model.eval()
    all_predictions = []
    all_targets = []
    
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            _, predicted = torch.max(output.data, 1)
            
            all_predictions.extend(predicted.cpu().numpy())
            all_targets.extend(target.cpu().numpy())
    
    # Calculate metrics
    accuracy = accuracy_score(all_targets, all_predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        all_targets, all_predictions, average='weighted'
    )
    
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1-Score: {f1:.4f}")
    
    # Confusion matrix
    cm = confusion_matrix(all_targets, all_predictions)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion Matrix')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.show()
    
    return accuracy, precision, recall, f1
```

### 2. **Regression Metrics**

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate_regression(model, test_loader, device):
    model.eval()
    all_predictions = []
    all_targets = []
    
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            
            all_predictions.extend(output.cpu().numpy())
            all_targets.extend(target.cpu().numpy())
    
    # Calculate metrics
    mse = mean_squared_error(all_targets, all_predictions)
    mae = mean_absolute_error(all_targets, all_predictions)
    r2 = r2_score(all_targets, all_predictions)
    rmse = np.sqrt(mse)
    
    print(f"MSE: {mse:.4f}")
    print(f"RMSE: {rmse:.4f}")
    print(f"MAE: {mae:.4f}")
    print(f"R²: {r2:.4f}")
    
    # Plot predictions vs actual
    plt.figure(figsize=(8, 6))
    plt.scatter(all_targets, all_predictions, alpha=0.5)
    plt.plot([min(all_targets), max(all_targets)], [min(all_targets), max(all_targets)], 'r--', lw=2)
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.title('Predictions vs Actual Values')
    plt.grid(True)
    plt.show()
    
    return mse, rmse, mae, r2
```

## 🚀 Best Practices

### 1. **Data Preprocessing**

* Normalize/standardize input data
* Handle missing values
* Data augmentation for images
* Proper train/validation/test split
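A minimal NumPy sketch of standardization plus a train/validation/test split (the 70/15/15 proportions are illustrative). Note the statistics come from the training split only, to avoid leaking information from validation/test data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 20))

# Shuffle indices, then split 70/15/15
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:700], idx[700:850], idx[850:]

# Standardize using *training* statistics only
mean = X[train_idx].mean(axis=0)
std = X[train_idx].std(axis=0)

X_train = (X[train_idx] - mean) / std
X_val = (X[val_idx] - mean) / std
X_test = (X[test_idx] - mean) / std

print(X_train.mean().round(3), X_train.std().round(3))  # ~0 and ~1
```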

### 2. **Model Architecture**

* Start simple, increase complexity gradually
* Use appropriate activation functions
* Add regularization (dropout, batch norm)
* Consider residual connections for deep networks
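One way to "start simple, then grow" is to parameterize depth, building the stack from a list of hidden sizes (a sketch; the `make_mlp` helper name is made up):

```python
import torch
import torch.nn as nn

def make_mlp(input_size, hidden_sizes, output_size, p_drop=0.2):
    """Build an MLP whose depth is controlled by len(hidden_sizes)."""
    layers = []
    prev = input_size
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), nn.ReLU(), nn.Dropout(p_drop)]
        prev = h
    layers.append(nn.Linear(prev, output_size))
    return nn.Sequential(*layers)

shallow = make_mlp(784, [128], 10)            # start simple
deeper = make_mlp(784, [256, 128, 64], 10)    # grow only if underfitting

x = torch.randn(8, 784)
print(shallow(x).shape, deeper(x).shape)
```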

### 3. **Training Strategy**

* Use appropriate learning rate
* Implement learning rate scheduling
* Use early stopping
* Monitor training/validation curves
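Early stopping can be sketched as tracking the best validation loss with a patience counter (driven here by a synthetic loss sequence rather than a real training run):

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return the epoch at which training would stop, or None if it never triggers."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Synthetic validation curve: improves, then plateaus
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]
print(early_stopping(losses, patience=3))  # → 5 (three epochs with no improvement)
```

In a real loop you would also checkpoint the model weights whenever `best` improves, and restore them after stopping.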

### 4. **Regularization**

* Dropout
* Batch normalization
* Weight decay (L2 regularization)
* Data augmentation
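Weight decay (L2 regularization) is visible directly in the update rule: each step shrinks the weights toward zero in proportion to their magnitude. A NumPy sketch of one SGD step with and without decay (all values illustrative):

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])       # current weights
grad = np.array([0.1, 0.1, 0.1])     # gradient of the data loss
lr, wd = 0.01, 0.1                   # learning rate, decay coefficient

w_plain = w - lr * grad              # vanilla SGD
w_decay = w - lr * (grad + wd * w)   # SGD with L2 weight decay

print(w_plain)   # → [ 0.999 -2.001  0.499]
print(w_decay)   # → [ 0.998 -1.999  0.4985]
```

This is what the `weight_decay` argument of `optim.SGD` and `optim.AdamW` controls.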

### 5. **Hyperparameter Tuning**

* Learning rate
* Batch size
* Network architecture
* Regularization parameters
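A hyperparameter grid can be enumerated with `itertools.product`; the `evaluate` function here is a stand-in for an actual train-and-validate run:

```python
import itertools

grid = {
    'lr': [1e-2, 1e-3, 1e-4],
    'batch_size': [32, 64],
    'hidden_size': [64, 128],
}

def evaluate(config):
    # Stand-in for training the model and returning validation accuracy
    return -abs(config['lr'] - 1e-3)  # pretend lr=1e-3 is best

keys = list(grid)
configs = [dict(zip(keys, vals)) for vals in itertools.product(*grid.values())]
best = max(configs, key=evaluate)
print(len(configs), best['lr'])  # 12 combinations; best lr is 1e-3
```

For larger grids, random search or Bayesian optimization usually finds good configurations with far fewer trials.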

## 📚 References & Resources

### 📖 Books

* [**"Deep Learning"**](https://www.deeplearningbook.org/) by Ian Goodfellow, Yoshua Bengio, Aaron Courville
* [**"Neural Networks and Deep Learning"**](http://neuralnetworksanddeeplearning.com/) by Michael Nielsen
* [**"Hands-On Machine Learning"**](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/) by Aurélien Géron

### 🎓 Courses

* [**Andrew Ng's Deep Learning Specialization**](https://www.coursera.org/specializations/deep-learning)
* [**Fast.ai Practical Deep Learning**](https://course.fast.ai/)
* [**MIT 6.S191 Introduction to Deep Learning**](https://introtodeeplearning.com/)

### 📰 Research Papers

* [**"ImageNet Classification with Deep Convolutional Neural Networks"**](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) by Krizhevsky et al.
* [**"Attention Is All You Need"**](https://arxiv.org/abs/1706.03762) by Vaswani et al.
* [**"Deep Residual Learning for Image Recognition"**](https://arxiv.org/abs/1512.03385) by He et al.

### 🐙 GitHub Repositories

* [**PyTorch Examples**](https://github.com/pytorch/examples)
* [**TensorFlow Examples**](https://github.com/aymericdamien/TensorFlow-Examples)
* [**Awesome Deep Learning**](https://github.com/ChristosChristofidis/awesome-deep-learning)

### 📊 Datasets

* [**ImageNet**](http://www.image-net.org/) - Large-scale image dataset
* [**MNIST**](http://yann.lecun.com/exdb/mnist/) - Handwritten digits
* [**CIFAR**](https://www.cs.toronto.edu/~kriz/cifar.html) - Color images
* [**Hugging Face Datasets**](https://huggingface.co/datasets) - NLP datasets

## 🔗 Related Topics

* [🧠 ML Fundamentals](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals)
* [🔢 Supervised Learning](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals/supervised-learning)
* [🎯 Unsupervised Learning](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals/unsupervised-learning)
* [🐍 Python ML Tools](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/python-ml)

***

*Last updated: December 2024*
