# Neural Networks

## 📚 Overview

Neural networks are computational models inspired by the biological neural networks of the human brain. They consist of interconnected nodes (neurons) that process information and learn patterns from data through a training process.

## 🧠 Mathematical Foundations

### 1. **Single Neuron (Perceptron)**

The mathematical representation of a single neuron:

```
Input: x₁, x₂, ..., xₙ
Weights: w₁, w₂, ..., wₙ
Bias: b
Output: y = f(Σ(wᵢxᵢ) + b)
```

Where:

* `xᵢ` are the input features
* `wᵢ` are the weights
* `b` is the bias term
* `f()` is the activation function

```python
import numpy as np
import matplotlib.pyplot as plt

class Perceptron:
    def __init__(self, input_size, learning_rate=0.01):
        self.weights = np.random.randn(input_size) * 0.01
        self.bias = 0.0
        self.learning_rate = learning_rate
    
    def forward(self, inputs):
        # Linear combination
        z = np.dot(inputs, self.weights) + self.bias
        # Activation function (step function)
        return 1 if z > 0 else 0
    
    def train(self, inputs, target):
        # Forward pass
        prediction = self.forward(inputs)
        
        # Calculate error
        error = target - prediction
        
        # Update weights and bias
        self.weights += self.learning_rate * error * inputs
        self.bias += self.learning_rate * error
        
        return error

# Example: AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

perceptron = Perceptron(input_size=2)

# Training
epochs = 100
for epoch in range(epochs):
    total_error = 0
    for inputs, target in zip(X, y):
        error = perceptron.train(inputs, target)
        total_error += abs(error)
    
    if total_error == 0:
        print(f"Converged at epoch {epoch}")
        break

# Test
print("AND Gate Results:")
for inputs in X:
    prediction = perceptron.forward(inputs)
    print(f"Input: {inputs}, Output: {prediction}")
```

### 2. **Activation Functions**

The activation function determines the output of a neuron; non-linear choices are what allow networks to model non-linear functions.

#### **Step Function (Binary)**

```
f(x) = 1 if x > 0 else 0
```

#### **Sigmoid Function**

```
f(x) = 1 / (1 + e^(-x))
```

#### **ReLU (Rectified Linear Unit)**

```
f(x) = max(0, x)
```

#### **Tanh (Hyperbolic Tangent)**

```
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
```

```python
def plot_activation_functions():
    x = np.linspace(-5, 5, 100)
    
    # Different activation functions
    step = np.where(x > 0, 1, 0)
    sigmoid = 1 / (1 + np.exp(-x))
    relu = np.maximum(0, x)
    tanh = np.tanh(x)
    leaky_relu = np.where(x > 0, x, 0.01 * x)
    
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    axes[0, 0].plot(x, step)
    axes[0, 0].set_title('Step Function')
    axes[0, 0].grid(True)
    
    axes[0, 1].plot(x, sigmoid)
    axes[0, 1].set_title('Sigmoid')
    axes[0, 1].grid(True)
    
    axes[0, 2].plot(x, relu)
    axes[0, 2].set_title('ReLU')
    axes[0, 2].grid(True)
    
    axes[1, 0].plot(x, tanh)
    axes[1, 0].set_title('Tanh')
    axes[1, 0].grid(True)
    
    axes[1, 1].plot(x, leaky_relu)
    axes[1, 1].set_title('Leaky ReLU')
    axes[1, 1].grid(True)
    
    # Derivative of ReLU
    relu_derivative = np.where(x > 0, 1, 0)
    axes[1, 2].plot(x, relu_derivative)
    axes[1, 2].set_title('ReLU Derivative')
    axes[1, 2].grid(True)
    
    plt.tight_layout()
    plt.show()

plot_activation_functions()
```

### 3. **Forward Propagation**

Forward propagation is the process of computing the network's output by passing the input through each layer in turn.

```python
class SimpleNeuralNetwork:
    def __init__(self, layer_sizes):
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes)
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(self.num_layers - 1):
            w = np.random.randn(layer_sizes[i+1], layer_sizes[i]) * 0.01
            b = np.zeros((layer_sizes[i+1], 1))
            
            self.weights.append(w)
            self.biases.append(b)
    
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))
    
    def sigmoid_derivative(self, z):
        s = self.sigmoid(z)
        return s * (1 - s)
    
    def forward(self, X):
        self.activations = [X]
        self.z_values = []
        
        for i in range(self.num_layers - 1):
            z = np.dot(self.weights[i], self.activations[i]) + self.biases[i]
            self.z_values.append(z)
            
            if i == self.num_layers - 2:
                # Output layer: identity (linear) activation, suitable for
                # regression; swap in sigmoid/softmax for classification
                a = z
            else:
                a = self.sigmoid(z)
            
            self.activations.append(a)
        
        return self.activations[-1]

# Example usage
layer_sizes = [2, 3, 1]  # 2 input, 3 hidden, 1 output
nn = SimpleNeuralNetwork(layer_sizes)

# Test forward pass
X = np.array([[1], [2]])  # 2 input features
output = nn.forward(X)
print(f"Input: {X.flatten()}")
print(f"Output: {output.flatten()}")
```

## 🏗️ Network Architectures

### 1. **Feedforward Neural Network**

The basic neural network architecture, in which information flows only forward from input to output.

```python
class FeedforwardNN:
    def __init__(self, input_size, hidden_sizes, output_size):
        self.layer_sizes = [input_size] + hidden_sizes + [output_size]
        self.num_layers = len(self.layer_sizes)
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(self.num_layers - 1):
            w = np.random.randn(self.layer_sizes[i+1], self.layer_sizes[i]) * np.sqrt(2.0 / self.layer_sizes[i])
            b = np.zeros((self.layer_sizes[i+1], 1))
            
            self.weights.append(w)
            self.biases.append(b)
    
    def relu(self, z):
        return np.maximum(0, z)
    
    def relu_derivative(self, z):
        return np.where(z > 0, 1, 0)
    
    def forward(self, X):
        self.activations = [X]
        self.z_values = []
        
        for i in range(self.num_layers - 1):
            z = np.dot(self.weights[i], self.activations[i]) + self.biases[i]
            self.z_values.append(z)
            
            if i == self.num_layers - 2:
                # Output layer - no activation for regression
                a = z
            else:
                a = self.relu(z)
            
            self.activations.append(a)
        
        return self.activations[-1]
    
    def backward(self, X, y, learning_rate=0.01):
        m = X.shape[1]
        
        # Compute gradients
        delta = self.activations[-1] - y
        
        for i in range(self.num_layers - 2, -1, -1):
            dW = np.dot(delta, self.activations[i].T) / m
            db = np.sum(delta, axis=1, keepdims=True) / m
            
            if i > 0:
                delta = np.dot(self.weights[i].T, delta) * self.relu_derivative(self.z_values[i-1])
            
            # Update weights and biases
            self.weights[i] -= learning_rate * dW
            self.biases[i] -= learning_rate * db
    
    def train(self, X, y, epochs=1000, learning_rate=0.01):
        costs = []
        
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)
            
            # Compute cost (MSE)
            cost = np.mean((output - y) ** 2)
            costs.append(cost)
            
            # Backward pass
            self.backward(X, y, learning_rate)
            
            if epoch % 100 == 0:
                print(f"Epoch {epoch}, Cost: {cost:.6f}")
        
        return costs

# Example: XOR problem
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])  # 2 features, 4 samples
y = np.array([[0, 1, 1, 0]])  # XOR output

# Create and train network
nn = FeedforwardNN(input_size=2, hidden_sizes=[4], output_size=1)
costs = nn.train(X, y, epochs=5000, learning_rate=0.1)

# Test
print("\nXOR Results:")
for i in range(X.shape[1]):
    input_data = X[:, i:i+1]
    prediction = nn.forward(input_data)
    print(f"Input: {input_data.flatten()}, Output: {prediction[0, 0]:.4f}")

# Plot training progress
plt.figure(figsize=(10, 6))
plt.plot(costs)
plt.xlabel('Epoch')
plt.ylabel('Cost (MSE)')
plt.title('Training Progress')
plt.grid(True)
plt.show()
```

### 2. **Multi-Layer Perceptron (MLP)**

A feedforward network with multiple hidden layers.

```python
class MLP:
    def __init__(self, layer_sizes, activation='relu'):
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes)
        self.activation = activation
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(self.num_layers - 1):
            # He initialization for ReLU
            if activation == 'relu':
                w = np.random.randn(self.layer_sizes[i+1], self.layer_sizes[i]) * np.sqrt(2.0 / self.layer_sizes[i])
            else:
                w = np.random.randn(self.layer_sizes[i+1], self.layer_sizes[i]) * 0.01
            
            b = np.zeros((self.layer_sizes[i+1], 1))
            
            self.weights.append(w)
            self.biases.append(b)
    
    def activate(self, z):
        if self.activation == 'relu':
            return np.maximum(0, z)
        elif self.activation == 'sigmoid':
            return 1 / (1 + np.exp(-z))
        elif self.activation == 'tanh':
            return np.tanh(z)
        else:
            return z
    
    def activate_derivative(self, z):
        if self.activation == 'relu':
            return np.where(z > 0, 1, 0)
        elif self.activation == 'sigmoid':
            s = 1 / (1 + np.exp(-z))
            return s * (1 - s)
        elif self.activation == 'tanh':
            return 1 - np.tanh(z) ** 2
        else:
            return 1
    
    def forward(self, X):
        self.activations = [X]
        self.z_values = []
        
        for i in range(self.num_layers - 1):
            z = np.dot(self.weights[i], self.activations[i]) + self.biases[i]
            self.z_values.append(z)
            
            if i == self.num_layers - 2:
                # Output layer - no activation for regression
                a = z
            else:
                a = self.activate(z)
            
            self.activations.append(a)
        
        return self.activations[-1]
    
    def backward(self, X, y, learning_rate=0.01):
        m = X.shape[1]
        
        # Compute gradients
        delta = self.activations[-1] - y
        
        for i in range(self.num_layers - 2, -1, -1):
            dW = np.dot(delta, self.activations[i].T) / m
            db = np.sum(delta, axis=1, keepdims=True) / m
            
            if i > 0:
                delta = np.dot(self.weights[i].T, delta) * self.activate_derivative(self.z_values[i-1])
            
            # Update weights and biases
            self.weights[i] -= learning_rate * dW
            self.biases[i] -= learning_rate * db
    
    def train(self, X, y, epochs=1000, learning_rate=0.01, batch_size=None):
        costs = []
        
        if batch_size is None:
            batch_size = X.shape[1]
        
        for epoch in range(epochs):
            # Mini-batch training
            indices = np.random.permutation(X.shape[1])
            X_shuffled = X[:, indices]
            y_shuffled = y[:, indices]
            
            for i in range(0, X.shape[1], batch_size):
                X_batch = X_shuffled[:, i:i+batch_size]
                y_batch = y_shuffled[:, i:i+batch_size]
                
                # Forward pass
                output = self.forward(X_batch)
                
                # Compute cost (MSE)
                cost = np.mean((output - y_batch) ** 2)
                
                # Backward pass
                self.backward(X_batch, y_batch, learning_rate)
            
            # Compute cost on full dataset
            output_full = self.forward(X)
            cost_full = np.mean((output_full - y) ** 2)
            costs.append(cost_full)
            
            if epoch % 100 == 0:
                print(f"Epoch {epoch}, Cost: {cost_full:.6f}")
        
        return costs

# Example: Non-linear regression
np.random.seed(42)
X = np.random.randn(1, 100) * 2
y = np.sin(X) + 0.1 * np.random.randn(1, 100)

# Create and train MLP
mlp = MLP(layer_sizes=[1, 10, 10, 1], activation='relu')
costs = mlp.train(X, y, epochs=2000, learning_rate=0.01)

# Test
X_test = np.linspace(-4, 4, 100).reshape(1, -1)
y_pred = mlp.forward(X_test)

# Plot results
plt.figure(figsize=(15, 5))

plt.subplot(1, 2, 1)
plt.plot(costs)
plt.xlabel('Epoch')
plt.ylabel('Cost (MSE)')
plt.title('Training Progress')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.scatter(X.flatten(), y.flatten(), alpha=0.6, label='Training Data')
plt.plot(X_test.flatten(), y_pred.flatten(), 'r-', linewidth=2, label='MLP Prediction')
plt.plot(X_test.flatten(), np.sin(X_test.flatten()), 'g--', linewidth=2, label='True Function')
plt.xlabel('x')
plt.ylabel('y')
plt.title('MLP Regression')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
```

## 🔧 Training Algorithms

### 1. **Gradient Descent**

An optimization algorithm that minimizes the loss function by repeatedly stepping in the direction of the negative gradient.

```python
def gradient_descent_example():
    # Simple function: f(x) = x² + 2x + 1
    def f(x):
        return x**2 + 2*x + 1
    
    def df(x):
        return 2*x + 2
    
    # Gradient descent
    x = 5.0  # Starting point
    learning_rate = 0.1
    iterations = 100
    
    x_history = [x]
    f_history = [f(x)]
    
    for i in range(iterations):
        x = x - learning_rate * df(x)
        x_history.append(x)
        f_history.append(f(x))
    
    # Plot optimization
    x_range = np.linspace(-1, 6, 100)
    y_range = f(x_range)
    
    plt.figure(figsize=(12, 5))
    
    plt.subplot(1, 2, 1)
    plt.plot(x_range, y_range, 'b-', label='f(x) = x² + 2x + 1')
    plt.plot(x_history, f_history, 'ro-', label='Optimization Path')
    plt.xlabel('x')
    plt.ylabel('f(x)')
    plt.title('Gradient Descent Optimization')
    plt.legend()
    plt.grid(True)
    
    plt.subplot(1, 2, 2)
    plt.plot(f_history)
    plt.xlabel('Iteration')
    plt.ylabel('f(x)')
    plt.title('Cost Function Value')
    plt.grid(True)
    
    plt.tight_layout()
    plt.show()
    
    print(f"Minimum found at x = {x:.6f}")
    print(f"Minimum value = {f(x):.6f}")

gradient_descent_example()
```

### 2. **Backpropagation**

The algorithm for computing gradients in neural networks by applying the chain rule backward from the output layer to the input layer.

```python
def backpropagation_example():
    # Simple network: 2 inputs -> 2 hidden -> 1 output
    np.random.seed(42)
    
    # Initialize weights
    W1 = np.random.randn(2, 2) * 0.01
    b1 = np.zeros((2, 1))
    W2 = np.random.randn(1, 2) * 0.01
    b2 = np.zeros((1, 1))
    
    # Training data
    X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])  # XOR input
    y = np.array([[0, 1, 1, 0]])  # XOR output
    
    learning_rate = 0.1
    epochs = 10000
    
    costs = []
    
    for epoch in range(epochs):
        # Forward pass
        Z1 = np.dot(W1, X) + b1
        A1 = 1 / (1 + np.exp(-Z1))  # Sigmoid
        
        Z2 = np.dot(W2, A1) + b2
        A2 = 1 / (1 + np.exp(-Z2))  # Sigmoid
        
        # Compute cost
        cost = -np.mean(y * np.log(A2 + 1e-8) + (1 - y) * np.log(1 - A2 + 1e-8))
        costs.append(cost)
        
        # Backward pass
        dA2 = (A2 - y) / (A2 * (1 - A2) + 1e-8)
        dZ2 = dA2 * A2 * (1 - A2)
        
        dW2 = np.dot(dZ2, A1.T)
        db2 = np.sum(dZ2, axis=1, keepdims=True)
        
        dA1 = np.dot(W2.T, dZ2)
        dZ1 = dA1 * A1 * (1 - A1)
        
        dW1 = np.dot(dZ1, X.T)
        db1 = np.sum(dZ1, axis=1, keepdims=True)
        
        # Update weights
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        
        if epoch % 1000 == 0:
            print(f"Epoch {epoch}, Cost: {cost:.6f}")
    
    # Test
    print("\nXOR Results:")
    for i in range(X.shape[1]):
        input_data = X[:, i:i+1]
        
        # Forward pass
        Z1 = np.dot(W1, input_data) + b1
        A1 = 1 / (1 + np.exp(-Z1))
        Z2 = np.dot(W2, A1) + b2
        A2 = 1 / (1 + np.exp(-Z2))
        
        print(f"Input: {input_data.flatten()}, Output: {A2[0, 0]:.4f}")
    
    # Plot training progress
    plt.figure(figsize=(10, 6))
    plt.plot(costs)
    plt.xlabel('Epoch')
    plt.ylabel('Cost')
    plt.title('Training Progress')
    plt.grid(True)
    plt.show()

backpropagation_example()
```

## 📊 Model Evaluation

### 1. **Classification Metrics**

```python
def evaluate_classification(y_true, y_pred):
    """Evaluate classification performance"""
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
    
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='weighted')
    recall = recall_score(y_true, y_pred, average='weighted')
    f1 = f1_score(y_true, y_pred, average='weighted')
    
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1-Score: {f1:.4f}")
    
    # Confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title('Confusion Matrix')
    plt.colorbar()
    
    # Add text annotations
    thresh = cm.max() / 2
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            plt.text(j, i, format(cm[i, j], 'd'),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.show()
    
    return accuracy, precision, recall, f1
```

### 2. **Regression Metrics**

```python
def evaluate_regression(y_true, y_pred):
    """Evaluate regression performance"""
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    
    print(f"MSE: {mse:.4f}")
    print(f"RMSE: {rmse:.4f}")
    print(f"MAE: {mae:.4f}")
    print(f"R²: {r2:.4f}")
    
    # Plot predictions vs actual
    plt.figure(figsize=(8, 6))
    plt.scatter(y_true, y_pred, alpha=0.6)
    plt.plot([y_true.min(), y_true.max()], [y_true.min(), y_true.max()], 'r--', lw=2)
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.title('Predictions vs Actual Values')
    plt.grid(True)
    plt.show()
    
    return mse, rmse, mae, r2
```

## 🚀 Best Practices

### 1. **Weight Initialization**

* **Xavier/Glorot**: for sigmoid/tanh activations
* **He**: for ReLU activations
* **Small random values**: for simple cases
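These schemes can be sketched in a few lines of NumPy. This is a minimal illustration (the `init_weights` helper and its interface are made up for this example, not taken from any library):

```python
import numpy as np

def init_weights(fan_in, fan_out, scheme="he", rng=None):
    """Return a (fan_out, fan_in) weight matrix for the given scheme."""
    rng = rng if rng is not None else np.random.default_rng(0)
    if scheme == "he":            # for ReLU: variance 2 / fan_in
        std = np.sqrt(2.0 / fan_in)
    elif scheme == "xavier":      # for sigmoid/tanh: variance 2 / (fan_in + fan_out)
        std = np.sqrt(2.0 / (fan_in + fan_out))
    elif scheme == "random":      # small random values
        std = 0.01
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return rng.normal(0.0, std, size=(fan_out, fan_in))

W = init_weights(256, 128, scheme="he")
print(W.shape, round(float(W.std()), 3))  # std should be close to sqrt(2/256)
```

The point of scaling by the fan-in is to keep activation variance roughly constant from layer to layer, which avoids vanishing or exploding signals early in training.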

### 2. **Activation Functions**

* **ReLU**: default choice for hidden layers
* **Sigmoid** (binary) / **Softmax** (multiclass): for classification output layers
* **Linear**: for regression output layers
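As a sketch of these output-layer choices: sigmoid squashes a single logit into a probability for binary classification, while softmax normalizes a vector of logits into class probabilities. The helper names below are illustrative, not from a library:

```python
import numpy as np

def sigmoid(z):
    """Binary classification output: maps a logit to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multiclass output: normalizes logits into class probabilities.
    Subtracting the max before exponentiating improves numerical stability."""
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

z = np.array([[2.0], [1.0], [0.1]])  # logits for one sample, 3 classes
p = softmax(z)
print(p.ravel(), float(p.sum()))     # probabilities summing to 1
```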

### 3. **Regularization**

* **Dropout**: randomly drop units during training to prevent overfitting
* **L1/L2**: penalize large weights
* **Early stopping**: stop training when validation performance degrades
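Two of these techniques can be sketched directly in NumPy: adding an L2 penalty term to a weight gradient, and inverted dropout, which zeroes random activations and rescales the survivors so the expected activation is unchanged at test time. The helper names and the `lam=0.1`, `keep_prob=0.8` values are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(42)

def l2_gradient(dW, W, lam, m):
    """Add the L2 penalty term (lam / m) * W to a raw weight gradient."""
    return dW + (lam / m) * W

def dropout_forward(a, keep_prob=0.8):
    """Inverted dropout: zero activations at random, rescale the rest
    by 1 / keep_prob so no rescaling is needed at inference time."""
    mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
    return a * mask / keep_prob, mask

a = np.ones((4, 5))
a_drop, mask = dropout_forward(a, keep_prob=0.8)
print(a_drop)  # entries are either 0 or 1 / 0.8 = 1.25

dW = l2_gradient(np.zeros((2, 2)), np.ones((2, 2)), lam=0.1, m=10)
print(dW)  # penalty contribution of (0.1 / 10) per unit weight
```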

### 4. **Training Strategy**

* **Learning rate**: Start small, adjust based on convergence
* **Batch size**: Balance between memory and convergence
* **Optimization**: Adam, SGD with momentum, RMSprop
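As an illustration of the optimizers mentioned, here is a minimal single-parameter Adam update; the hyperparameter defaults are the commonly quoted values, and the `adam_step` helper is made up for this sketch:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum plus RMS scaling, with bias correction."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # correct startup bias toward 0
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w² (gradient 2w) starting from w = 3
w, m, v = 3.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(round(w, 4))  # w approaches the minimum at 0
```

Because the step size is normalized by the running gradient magnitude, Adam is less sensitive to the raw gradient scale than plain SGD, which is one reason it is a common default.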

### 5. **Architecture Design**

* **Start simple**: Begin with few layers
* **Gradual complexity**: Add layers as needed
* **Skip connections**: For very deep networks
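A skip (residual) connection can be sketched as `y = x + F(x)`: because the input is added back, gradients can flow through the identity path even when `F` is deep. A minimal NumPy sketch, assuming the block preserves dimensionality:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(x, W1, W2):
    """y = x + F(x): the identity shortcut lets gradients bypass F."""
    h = relu(W1 @ x)
    return x + W2 @ h  # shortcut requires matching input/output shapes

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))
W1 = rng.normal(scale=0.01, size=(8, 4))
W2 = rng.normal(scale=0.01, size=(4, 8))
y = residual_block(x, W1, W2)
print(np.allclose(y, x, atol=0.01))  # with tiny weights the block is near-identity
```

Starting near the identity is exactly why residual blocks stabilize very deep networks: each block only has to learn a small correction to its input.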

## 📚 References & Resources

### 📖 Books

* [**"Neural Networks and Deep Learning"**](http://neuralnetworksanddeeplearning.com/) by Michael Nielsen
* [**"Deep Learning"**](https://www.deeplearningbook.org/) by Ian Goodfellow, Yoshua Bengio, Aaron Courville
* [**"Pattern Recognition and Machine Learning"**](https://www.microsoft.com/en-us/research/people/cmbishop/) by Christopher Bishop

### 🎓 Courses

* [**Andrew Ng's Machine Learning Course**](https://www.coursera.org/learn/machine-learning)
* [**CS231n: Convolutional Neural Networks**](http://cs231n.stanford.edu/) by Stanford
* [**MIT 6.S191 Introduction to Deep Learning**](https://introtodeeplearning.com/)

### 📰 Research Papers

* [**"Learning representations by back-propagating errors"**](https://www.nature.com/articles/323533a0) by Rumelhart et al.
* [**"ImageNet Classification with Deep Convolutional Neural Networks"**](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) by Krizhevsky et al.
* [**"Delving Deep into Rectifiers"**](https://arxiv.org/abs/1502.01852) by He et al.

### 🐙 GitHub Repositories

* [**ML-From-Scratch**](https://github.com/eriklindernoren/ML-From-Scratch) - Implementation of ML algorithms
* [**Neural Networks From Scratch**](https://github.com/Sentdex/NNfSiX) - Neural network implementation
* [**Awesome Neural Networks**](https://github.com/rockerBOO/awesome-neural-networks) - Curated neural network resources

### 📊 Datasets

* [**UCI Machine Learning Repository**](https://archive.ics.uci.edu/ml/)
* [**MNIST**](http://yann.lecun.com/exdb/mnist/) - Handwritten digits
* [**CIFAR**](https://www.cs.toronto.edu/~kriz/cifar.html) - Image datasets

## 🔗 Related Topics

* [🧠 ML Fundamentals](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals)
* [🔢 Supervised Learning](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals/supervised-learning)
* [📊 Deep Learning](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals/deep-learning)
* [🐍 Python ML Tools](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/python-ml)

***

*Last updated: December 2024* *Contributors: \[Your Name]*
