# PyTorch

## 📚 Overview

PyTorch is an open-source machine learning library developed by Facebook (Meta). It is known for its dynamic computational graphs, intuitive Python interface, and excellent support for deep learning research. PyTorch provides a flexible framework for building and training neural networks.
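
The "dynamic computational graphs" mentioned above can be shown in a few lines: the graph is rebuilt on every forward pass, so ordinary Python control flow decides its shape. A minimal sketch:

```python
import torch

# The autograd graph is built on the fly, so plain Python control flow
# (loops, conditionals) can determine its shape at runtime
x = torch.tensor(2.0, requires_grad=True)

y = x
for _ in range(3):   # graph depth chosen at runtime
    y = y * x        # after the loop, y = x**4

y.backward()         # d(x^4)/dx = 4 * x**3
print(x.grad)        # tensor(32.) at x = 2
```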

## 🚀 Getting Started

### 1. **Installation & Basic Setup**

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

# Check PyTorch version and CUDA availability
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Set device (GPU if available, else CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
```

### 2. **Basic Tensor Operations**

```python
# Create tensors
scalar = torch.tensor(42)
vector = torch.tensor([1, 2, 3, 4, 5])
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor_3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print("Scalar:", scalar)
print("Vector:", vector)
print("Matrix:", matrix)
print("3D Tensor:", tensor_3d)

# Tensor properties
print(f"\nScalar shape: {scalar.shape}, dtype: {scalar.dtype}")
print(f"Vector shape: {vector.shape}, dtype: {vector.dtype}")
print(f"Matrix shape: {matrix.shape}, dtype: {matrix.dtype}")

# Move tensors to device
vector_gpu = vector.to(device)
print(f"Vector on {device}: {vector_gpu.device}")

# Basic operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

print(f"\nAddition: {a + b}")
print(f"Subtraction: {a - b}")
print(f"Multiplication: {a * b}")
print(f"Division: {a / b}")
print(f"Power: {a ** 2}")

# Broadcasting
matrix_2d = torch.tensor([[1, 2, 3], [4, 5, 6]])
vector_1d = torch.tensor([10, 20, 30])

print(f"\nMatrix + Vector (broadcasting):")
print(matrix_2d + vector_1d)

# Reshaping tensors
reshaped = matrix_2d.reshape(3, 2)
print(f"\nReshaped matrix:")
print(reshaped)

# Convert between PyTorch and NumPy
numpy_array = matrix_2d.numpy()
torch_tensor = torch.from_numpy(numpy_array)
print(f"\nNumPy array: {numpy_array}")
print(f"PyTorch tensor: {torch_tensor}")
```

## 🏗️ Building Neural Networks

### 1. **Sequential Model**

```python
# Create a simple sequential model
# Note: the model outputs raw logits; nn.CrossEntropyLoss (used for training
# below) applies log-softmax internally, so no final Softmax layer is needed
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 10)
)

# Move model to device
model = model.to(device)

# Model summary
print("Model architecture:")
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# Alternative: using ModuleList
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout_rate=0.2):
        super(SimpleNet, self).__init__()
        self.layers = nn.ModuleList([
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_size // 2, output_size),
            nn.Softmax(dim=1)
        ])
    
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Create model instance
simple_model = SimpleNet(784, 128, 10).to(device)
print("\nSimpleNet architecture:")
print(simple_model)
```

### 2. **Custom Model with Forward Method**

```python
class CustomNet(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate=0.2):
        super(CustomNet, self).__init__()
        
        # Build layers dynamically
        layers = []
        prev_size = input_size
        
        for hidden_size in hidden_sizes:
            layers.extend([
                nn.Linear(prev_size, hidden_size),
                nn.ReLU(),
                nn.Dropout(dropout_rate)
            ])
            prev_size = hidden_size
        
        # Output layer
        layers.append(nn.Linear(prev_size, output_size))
        layers.append(nn.Softmax(dim=1))
        
        self.network = nn.Sequential(*layers)
    
    def forward(self, x):
        return self.network(x)
    
    def get_feature_representation(self, x, layer_index):
        """Get intermediate feature representation"""
        features = x
        for i, layer in enumerate(self.network):
            features = layer(features)
            if i == layer_index:
                return features
        return features

# Create custom model
custom_model = CustomNet(784, [256, 128, 64], 10).to(device)
print("CustomNet architecture:")
print(custom_model)

# Test forward pass
dummy_input = torch.randn(1, 784).to(device)
output = custom_model(dummy_input)
print(f"\nOutput shape: {output.shape}")
print(f"Output sum (should be 1.0): {output.sum().item():.4f}")
```

### 3. **Advanced Model with Skip Connections**

```python
class ResidualBlock(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(ResidualBlock, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, input_size)
        self.relu = nn.ReLU()
        self.batch_norm1 = nn.BatchNorm1d(hidden_size)
        self.batch_norm2 = nn.BatchNorm1d(input_size)
    
    def forward(self, x):
        residual = x
        
        out = self.linear1(x)
        out = self.batch_norm1(out)
        out = self.relu(out)
        
        out = self.linear2(out)
        out = self.batch_norm2(out)
        
        # Skip connection
        out += residual
        out = self.relu(out)
        
        return out

class ResidualNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_blocks=3):
        super(ResidualNet, self).__init__()
        
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.batch_norm = nn.BatchNorm1d(hidden_size)
        self.relu = nn.ReLU()
        
        # Residual blocks
        self.residual_blocks = nn.ModuleList([
            ResidualBlock(hidden_size, hidden_size) for _ in range(num_blocks)
        ])
        
        self.output_layer = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=1)
    
    def forward(self, x):
        x = self.input_layer(x)
        x = self.batch_norm(x)
        x = self.relu(x)
        
        # Pass through residual blocks
        for block in self.residual_blocks:
            x = block(x)
        
        x = self.output_layer(x)
        x = self.softmax(x)
        
        return x

# Create residual network
residual_model = ResidualNet(784, 256, 10, num_blocks=3).to(device)
print("ResidualNet architecture:")
print(residual_model)
```

## 🔧 Model Training

### 1. **Data Preparation**

```python
# Load and prepare MNIST dataset
from torchvision import datasets, transforms

# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

# Load datasets
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

# Create data loaders
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Training batches: {len(train_loader)}")

# Visualize some samples
def visualize_samples(dataloader, num_samples=8):
    dataiter = iter(dataloader)
    images, labels = next(dataiter)
    
    plt.figure(figsize=(12, 6))
    for i in range(num_samples):
        plt.subplot(2, 4, i + 1)
        plt.imshow(images[i].squeeze(), cmap='gray')
        plt.title(f'Label: {labels[i]}')
        plt.axis('off')
    plt.tight_layout()
    plt.show()

# Visualize training samples
visualize_samples(train_loader)
```

### 2. **Training Loop**

```python
# Define loss function and optimizer
# CrossEntropyLoss combines log-softmax and negative log-likelihood, so it
# expects raw (unnormalized) logits from the model
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function
def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for batch_idx, (data, target) in enumerate(dataloader):
        data, target = data.to(device), target.to(device)
        
        # Flatten images for dense layers
        data = data.view(data.size(0), -1)
        
        # Forward pass
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        # Statistics
        running_loss += loss.item()
        _, predicted = output.max(1)
        total += target.size(0)
        correct += predicted.eq(target).sum().item()
        
        if batch_idx % 100 == 0:
            print(f'Batch {batch_idx}/{len(dataloader)}, '
                  f'Loss: {loss.item():.4f}, '
                  f'Acc: {100.*correct/total:.2f}%')
    
    epoch_loss = running_loss / len(dataloader)
    epoch_acc = 100. * correct / total
    
    return epoch_loss, epoch_acc

# Validation function
def validate(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for data, target in dataloader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            
            output = model(data)
            loss = criterion(output, target)
            
            running_loss += loss.item()
            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
    
    val_loss = running_loss / len(dataloader)
    val_acc = 100. * correct / total
    
    return val_loss, val_acc

# Training loop
num_epochs = 10
train_losses = []
train_accs = []
val_losses = []
val_accs = []

for epoch in range(num_epochs):
    print(f'\nEpoch {epoch+1}/{num_epochs}')
    print('-' * 50)
    
    # Training
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    
    # Validation
    val_loss, val_acc = validate(model, test_loader, criterion, device)
    val_losses.append(val_loss)
    val_accs.append(val_acc)
    
    print(f'\nEpoch {epoch+1} Summary:')
    print(f'Training Loss: {train_loss:.4f}, Training Acc: {train_acc:.2f}%')
    print(f'Validation Loss: {val_loss:.4f}, Validation Acc: {val_acc:.2f}%')

print('\nTraining completed!')
```

### 3. **Training Visualization**

```python
# Plot training history
def plot_training_history(train_losses, train_accs, val_losses, val_accs):
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    
    # Plot loss
    axes[0].plot(train_losses, label='Training Loss', color='blue')
    axes[0].plot(val_losses, label='Validation Loss', color='red')
    axes[0].set_title('Training and Validation Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Plot accuracy
    axes[1].plot(train_accs, label='Training Accuracy', color='blue')
    axes[1].plot(val_accs, label='Validation Accuracy', color='red')
    axes[1].set_title('Training and Validation Accuracy')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy (%)')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Plot training history
plot_training_history(train_losses, train_accs, val_losses, val_accs)
```

## 🎨 Advanced Architectures

### 1. **Convolutional Neural Networks (CNNs)**

```python
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        
        # Pooling and normalization
        self.pool = nn.MaxPool2d(2, 2)
        self.batch_norm1 = nn.BatchNorm2d(32)
        self.batch_norm2 = nn.BatchNorm2d(64)
        self.batch_norm3 = nn.BatchNorm2d(128)
        
        # Dropout
        self.dropout = nn.Dropout(0.25)
        
        # Fully connected layers (28x28 input -> 14 -> 7 -> 3 after three 2x2 poolings)
        self.fc1 = nn.Linear(128 * 3 * 3, 512)
        self.fc2 = nn.Linear(512, num_classes)
        
        # Activation
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)
    
    def forward(self, x):
        # First conv block
        x = self.conv1(x)
        x = self.batch_norm1(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.dropout(x)
        
        # Second conv block
        x = self.conv2(x)
        x = self.batch_norm2(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.dropout(x)
        
        # Third conv block
        x = self.conv3(x)
        x = self.batch_norm3(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.dropout(x)
        
        # Flatten and fully connected
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)
        
        return x

# Create CNN model
cnn_model = ConvNet().to(device)
print("CNN architecture:")
print(cnn_model)

# Test with sample input
sample_input = torch.randn(1, 1, 28, 28).to(device)
sample_output = cnn_model(sample_input)
print(f"\nSample input shape: {sample_input.shape}")
print(f"Sample output shape: {sample_output.shape}")
```

### 2. **Recurrent Neural Networks (RNNs)**

```python
class RNNNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes, dropout=0.2):
        super(RNNNet, self).__init__()
        
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # RNN layer
        self.rnn = nn.LSTM(
            input_size, 
            hidden_size, 
            num_layers, 
            batch_first=True, 
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Fully connected layers
        self.fc1 = nn.Linear(hidden_size, hidden_size // 2)
        self.fc2 = nn.Linear(hidden_size // 2, num_classes)
        
        # Dropout and activation
        self.dropout = nn.Dropout(dropout)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)
    
    def forward(self, x):
        # Initialize hidden state
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        
        # RNN forward pass
        out, _ = self.rnn(x, (h0, c0))
        
        # Take output from last time step
        out = out[:, -1, :]
        
        # Fully connected layers
        out = self.fc1(out)
        out = self.relu(out)
        out = self.dropout(out)
        
        out = self.fc2(out)
        out = self.softmax(out)
        
        return out

# Create RNN model
rnn_model = RNNNet(input_size=10, hidden_size=128, num_layers=2, num_classes=5).to(device)
print("RNN architecture:")
print(rnn_model)

# Test with sample input
sample_input = torch.randn(2, 20, 10).to(device)  # (batch, seq_len, features)
sample_output = rnn_model(sample_input)
print(f"\nSample input shape: {sample_input.shape}")
print(f"Sample output shape: {sample_output.shape}")
```

### 3. **Transformer Architecture**

```python
class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        
    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)
        
        # Linear transformations
        Q = self.w_q(query).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        K = self.w_k(key).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        V = self.w_v(value).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        
        # Scaled dot-product attention
        scores = torch.matmul(Q, K.transpose(-2, -1)) / np.sqrt(self.d_k)
        
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        
        attention_weights = F.softmax(scores, dim=-1)
        context = torch.matmul(attention_weights, V)
        
        # Reshape and apply output projection
        context = context.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
        output = self.w_o(context)
        
        return output, attention_weights

class TransformerBlock(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super(TransformerBlock, self).__init__()
        
        self.attention = MultiHeadAttention(d_model, num_heads)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model)
        )
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x, mask=None):
        # Self-attention with residual connection
        attn_output, _ = self.attention(x, x, x, mask)
        x = self.norm1(x + self.dropout(attn_output))
        
        # Feed-forward with residual connection
        ff_output = self.feed_forward(x)
        x = self.norm2(x + self.dropout(ff_output))
        
        return x

class TransformerNet(nn.Module):
    def __init__(self, input_size, d_model, num_heads, num_layers, num_classes, dropout=0.1):
        super(TransformerNet, self).__init__()
        
        self.input_projection = nn.Linear(input_size, d_model)
        # Note: a full Transformer would also add positional encodings here;
        # they are omitted in this minimal example
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(d_model, num_heads, d_model * 4, dropout)
            for _ in range(num_layers)
        ])
        
        self.output_projection = nn.Linear(d_model, num_classes)
        self.softmax = nn.Softmax(dim=1)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x):
        # Project input to d_model dimensions
        x = self.input_projection(x)
        x = self.dropout(x)
        
        # Pass through transformer blocks
        for block in self.transformer_blocks:
            x = block(x)
        
        # Global average pooling and output projection
        x = torch.mean(x, dim=1)  # Global average pooling
        x = self.output_projection(x)
        x = self.softmax(x)
        
        return x

# Create transformer model
transformer_model = TransformerNet(
    input_size=10, 
    d_model=128, 
    num_heads=8, 
    num_layers=6, 
    num_classes=5
).to(device)

print("Transformer architecture:")
print(transformer_model)

# Test with sample input
sample_input = torch.randn(2, 20, 10).to(device)  # (batch, seq_len, features)
sample_output = transformer_model(sample_input)
print(f"\nSample input shape: {sample_input.shape}")
print(f"Sample output shape: {sample_output.shape}")
```

## 🔍 Model Evaluation & Analysis

### 1. **Performance Metrics**

```python
# Evaluate model performance
def evaluate_model(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    all_predictions = []
    all_targets = []
    
    with torch.no_grad():
        for data, target in dataloader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            
            output = model(data)
            loss = criterion(output, target)
            
            running_loss += loss.item()
            
            # Store predictions and targets
            _, predicted = output.max(1)
            all_predictions.extend(predicted.cpu().numpy())
            all_targets.extend(target.cpu().numpy())
    
    avg_loss = running_loss / len(dataloader)
    accuracy = 100. * sum(p == t for p, t in zip(all_predictions, all_targets)) / len(all_targets)
    
    return avg_loss, accuracy, all_predictions, all_targets

# Evaluate the model
test_loss, test_accuracy, predictions, targets = evaluate_model(
    model, test_loader, criterion, device
)

print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.2f}%")

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns

cm = confusion_matrix(targets, predictions)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Classification report
print("Classification Report:")
print(classification_report(targets, predictions))
```

### 2. **Model Interpretability**

```python
# Feature importance analysis
def analyze_feature_importance(model, sample_input, target_class):
    """Analyze feature importance using gradients"""
    model.eval()
    sample_input.requires_grad_(True)
    
    # Forward pass
    output = model(sample_input)
    
    # Backward pass
    output[0, target_class].backward()
    
    # Get gradients
    gradients = sample_input.grad.abs().mean(dim=0)
    
    return gradients.cpu().numpy()

# Analyze feature importance
sample_input = torch.randn(1, 784).to(device)
target_class = 5

feature_importance = analyze_feature_importance(model, sample_input, target_class)

plt.figure(figsize=(12, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.title(f'Feature Importance for Class {target_class}')
plt.xlabel('Feature Index')
plt.ylabel('Gradient Magnitude')
plt.grid(True, alpha=0.3)
plt.show()

# Visualize learned features (for CNN)
def visualize_cnn_features(model, sample_input):
    """Visualize intermediate CNN features"""
    model.eval()
    
    # Get intermediate activations
    activations = []
    
    def hook_fn(module, input, output):
        activations.append(output)
    
    # Register hooks for convolutional layers
    hooks = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            hook = module.register_forward_hook(hook_fn)
            hooks.append(hook)
    
    # Forward pass
    with torch.no_grad():
        _ = model(sample_input)
    
    # Remove hooks
    for hook in hooks:
        hook.remove()
    
    # Visualize first few feature maps
    if activations:
        first_layer_features = activations[0][0]  # First batch, first layer
        
        plt.figure(figsize=(15, 5))
        for i in range(min(16, first_layer_features.size(0))):
            plt.subplot(2, 8, i + 1)
            plt.imshow(first_layer_features[i].cpu(), cmap='viridis')
            plt.title(f'Feature {i}')
            plt.axis('off')
        plt.tight_layout()
        plt.show()

# Visualize CNN features (if using CNN model)
if 'cnn_model' in locals():
    sample_image = torch.randn(1, 1, 28, 28).to(device)
    visualize_cnn_features(cnn_model, sample_image)
```

## 🚀 Model Deployment

### 1. **Model Saving & Loading**

```python
# Save model in different formats
torch.save(model.state_dict(), 'model_weights.pth')  # Save weights only
torch.save(model, 'model_full.pth')  # Save entire model

# Save model for production
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': num_epochs,
    'train_losses': train_losses,
    'val_losses': val_losses,
    'train_accs': train_accs,
    'val_accs': val_accs
}, 'checkpoint.pth')

# Load model weights: re-create the same architecture first, then load the state dict
new_model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(64, 10)
).to(device)
new_model.load_state_dict(torch.load('model_weights.pth', map_location=device))
new_model.eval()

# Load entire model (requires the original class/architecture definition to be
# importable; on PyTorch >= 2.6 torch.load defaults to weights_only=True, so
# pass weights_only=False explicitly for full-model loads)
loaded_model = torch.load('model_full.pth', map_location=device, weights_only=False)
loaded_model.eval()

# Load checkpoint
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
```

### 2. **Model Export for Production**

```python
# Export to TorchScript
scripted_model = torch.jit.script(model)
torch.jit.save(scripted_model, 'model_scripted.pt')

# Export to ONNX
dummy_input = torch.randn(1, 784).to(device)
torch.onnx.export(
    model, 
    dummy_input, 
    'model.onnx',
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)

# Test exported models
# TorchScript
scripted_model = torch.jit.load('model_scripted.pt')
scripted_output = scripted_model(dummy_input)

# ONNX (requires onnx and onnxruntime)
try:
    import onnx
    import onnxruntime as ort
    
    # Load ONNX model
    onnx_model = onnx.load('model.onnx')
    onnx.checker.check_model(onnx_model)
    
    # Create ONNX runtime session
    ort_session = ort.InferenceSession('model.onnx')
    
    # Test inference
    ort_inputs = {ort_session.get_inputs()[0].name: dummy_input.cpu().numpy()}
    ort_outputs = ort_session.run(None, ort_inputs)
    
    print("ONNX model loaded and tested successfully!")
    
except ImportError:
    print("ONNX libraries not installed. Install with: pip install onnx onnxruntime")
```

### 3. **Performance Optimization**

```python
# Model optimization techniques
import copy

def optimize_model(model):
    """Apply dynamic quantization and TorchScript compilation"""
    
    # 1. Dynamic quantization (INT8): weights are stored as int8 and
    #    activations are quantized on the fly. Currently supported for
    #    nn.Linear (and recurrent layers) and runs on CPU, so quantize
    #    a CPU copy. (Conv/BN/ReLU fusion via
    #    torch.ao.quantization.fuse_modules is another option for CNNs.)
    quantized_model = torch.quantization.quantize_dynamic(
        copy.deepcopy(model).cpu(), {nn.Linear}, dtype=torch.qint8
    )
    
    # 2. JIT compilation with TorchScript
    jit_model = torch.jit.script(model)
    
    return quantized_model, jit_model

# Benchmark model performance
import time

def benchmark_model(model, input_data, num_runs=100):
    """Benchmark model inference time"""
    model.eval()
    
    # Warmup
    with torch.no_grad():
        for _ in range(10):
            _ = model(input_data)
    
    # Benchmark
    start_time = time.time()
    with torch.no_grad():
        for _ in range(num_runs):
            _ = model(input_data)
    end_time = time.time()
    
    avg_time = (end_time - start_time) / num_runs
    return avg_time

# Benchmark original model
dummy_input = torch.randn(1, 784).to(device)
original_time = benchmark_model(model, dummy_input)

print(f"Original model inference time: {original_time*1000:.2f} ms")

# Apply optimizations
quantized_model, jit_model = optimize_model(model)

# Benchmark optimized models (the dynamically-quantized model runs on CPU)
quantized_time = benchmark_model(quantized_model, dummy_input.cpu())
jit_time = benchmark_model(jit_model, dummy_input)

print(f"Quantized model inference time: {quantized_time*1000:.2f} ms")
print(f"JIT model inference time: {jit_time*1000:.2f} ms")
print(f"Quantization speedup: {original_time/quantized_time:.2f}x")
print(f"JIT speedup: {original_time/jit_time:.2f}x")
```

## 🔧 Advanced Features

### 1. **Custom Training Loops with Gradient Accumulation**

```python
def train_with_gradient_accumulation(model, dataloader, criterion, optimizer,
                                     device, accumulation_steps=4):
    """Training with gradient accumulation for larger effective batch size"""
    model.train()
    running_loss = 0.0
    optimizer.zero_grad()
    
    for batch_idx, (data, target) in enumerate(dataloader):
        data, target = data.to(device), target.to(device)
        data = data.view(data.size(0), -1)
        
        # Forward pass
        output = model(data)
        loss = criterion(output, target) / accumulation_steps
        
        # Backward pass
        loss.backward()
        
        # Gradient accumulation
        if (batch_idx + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
        
        running_loss += loss.item() * accumulation_steps
        
        if batch_idx % 100 == 0:
            print(f'Batch {batch_idx}/{len(dataloader)}, Loss: {loss.item()*accumulation_steps:.4f}')
    
    return running_loss / len(dataloader)
```

### 2. **Mixed Precision Training**

```python
from torch.cuda.amp import GradScaler, autocast  # deprecated alias in newer PyTorch; prefer torch.amp.GradScaler("cuda") / autocast("cuda")

def train_with_mixed_precision(model, dataloader, criterion, optimizer, device):
    """Training with mixed precision for faster training and lower memory usage"""
    scaler = GradScaler()
    model.train()
    running_loss = 0.0
    
    for batch_idx, (data, target) in enumerate(dataloader):
        data, target = data.to(device), target.to(device)
        data = data.view(data.size(0), -1)
        
        optimizer.zero_grad()
        
        # Forward pass with autocast
        with autocast():
            output = model(data)
            loss = criterion(output, target)
        
        # Backward pass with scaler
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        
        running_loss += loss.item()
        
        if batch_idx % 100 == 0:
            print(f'Batch {batch_idx}/{len(dataloader)}, Loss: {loss.item():.4f}')
    
    return running_loss / len(dataloader)
```

## 🚀 Best Practices

### 1. **Model Architecture**

* **Start simple**: Begin with basic architectures
* **Use appropriate layers**: Choose layers based on data type
* **Regularization**: Apply dropout and batch normalization
* **Residual connections**: Use skip connections for deep networks

### 2. **Training Optimization**

* **Learning rate scheduling**: Use schedulers for LR reduction
* **Early stopping**: Prevent overfitting
* **Data augmentation**: Increase dataset diversity
* **Mixed precision**: Use for faster training on modern GPUs
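
The scheduling and early-stopping bullets above can be sketched together. The `patience` threshold and the hard-coded validation losses below are illustrative; in practice the losses would come from a validation function such as `validate()` earlier in this page:

```python
import torch.nn as nn
import torch.optim as optim

# Illustrative sketch: ReduceLROnPlateau lowers the learning rate when the
# monitored metric stops improving; the counter below adds early stopping
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                 factor=0.5, patience=2)

best_val_loss = float('inf')
patience, bad_epochs = 5, 0

# Stand-in validation losses for demonstration
for epoch, val_loss in enumerate([0.9, 0.7, 0.71, 0.72, 0.6,
                                  0.61, 0.62, 0.63, 0.64, 0.65]):
    scheduler.step(val_loss)        # scheduler watches the validation loss
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # no improvement for `patience` epochs
            print(f'Early stopping at epoch {epoch}')
            break

print(f'Best validation loss: {best_val_loss:.2f}')
```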

### 3. **Performance & Deployment**

* **Model optimization**: Use TorchScript and quantization
* **Batch processing**: Process data in batches to amortize per-call overhead
* **GPU utilization**: Keep the GPU fed with appropriately sized batches and asynchronous data loading (`num_workers`, `pin_memory`)
* **Model serving**: Use TorchServe for production
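
As a small illustration of the batch-processing point, a single batched forward pass under `torch.no_grad()` (the shapes here are illustrative):

```python
import torch
import torch.nn as nn

# One forward pass over a whole batch amortizes per-call overhead and keeps
# the device busy; no_grad() skips autograd bookkeeping during inference
model = nn.Linear(784, 10)
model.eval()

inputs = torch.randn(256, 784)  # batch of 256 flattened images
with torch.no_grad():
    outputs = model(inputs)     # shape: (256, 10)

print(outputs.shape)
```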

## 📚 References & Resources

### 📖 Documentation

* [**PyTorch Official Documentation**](https://pytorch.org/docs/)
* [**PyTorch Tutorials**](https://pytorch.org/tutorials/)
* [**PyTorch Examples**](https://github.com/pytorch/examples)
* [**PyTorch Guide**](https://pytorch.org/docs/stable/notes/index.html)

### 🎓 Tutorials & Courses

* [**PyTorch Tutorials**](https://pytorch.org/tutorials/)
* [**Deep Learning with PyTorch**](https://pytorch.org/deep-learning-with-pytorch)
* [**PyTorch Fundamentals**](https://pytorch.org/tutorials/beginner/basics/intro.html)
* [**PyTorch YouTube Channel**](https://www.youtube.com/c/PyTorch)

### 📰 Articles & Blogs

* [**PyTorch Blog**](https://pytorch.org/blog/)
* [**PyTorch Windows FAQ**](https://pytorch.org/docs/stable/notes/windows.html)
* [**Model Optimization Guide**](https://pytorch.org/tutorials/recipes/recipes/introduction_to_recipe.html)

### 🐙 GitHub Repositories

* [**PyTorch Source Code**](https://github.com/pytorch/pytorch)
* [**PyTorch Examples**](https://github.com/pytorch/examples)
* [**PyTorch Hub**](https://github.com/pytorch/hub)
* [**PyTorch Lightning**](https://github.com/PyTorchLightning/pytorch-lightning)

### 📊 Datasets & Models

* [**PyTorch Datasets**](https://pytorch.org/vision/stable/datasets.html)
* [**PyTorch Hub Models**](https://pytorch.org/hub/)
* [**TorchVision Models**](https://pytorch.org/vision/stable/models.html)

## 🔗 Related Topics

* [🐍 Python ML Tools](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/python-ml)
* [🧠 TensorFlow](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/python-ml/tensorflow)
* [📊 NumPy & Pandas](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/python-ml/numpy-pandas)
* [🧠 ML Fundamentals](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals)

***

*Last updated: December 2024* *Contributors: \[Your Name]*
