# Scaling Advanced

## Horizontal Pod Autoscaling (HPA)

### HPA Core Concepts

Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas based on observed metrics.

#### How HPA Works

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Metrics API   │───▶│   HPA Controller │───▶│ Scale Deployment│
│                 │    │                  │    │                 │
│ • CPU/Memory    │    │ • Algorithm      │    │ • + Replicas    │
│ • Custom Metrics│    │ • Target Value   │    │ • - Replicas    │
│ • External      │    │ • Cooldown       │    │ • Update Config │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
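The controller loop above boils down to one documented formula: `desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]`. A minimal Python sketch of that calculation:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Core HPA formula: desiredReplicas = ceil[currentReplicas * (current / target)]."""
    return math.ceil(current_replicas * (current_value / target_value))

# 4 replicas averaging 90% CPU against a 70% target -> scale up to 6
print(desired_replicas(4, 90, 70))  # 6
# Load drops to 35% -> scale down toward 2
print(desired_replicas(4, 35, 70))  # 2
```

The ceiling means HPA always rounds up, so it errs on the side of slightly more capacity.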

### HPA with CPU/Memory Metrics

#### Basic HPA Configuration

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp-deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
```

#### HPA with Multiple Metrics

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: complex-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 50
  metrics:
  # CPU Utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  # Memory Utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  # Custom Metrics
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  # External Metrics
  - type: External
    external:
      metric:
        name: queue_length
        selector:
          matchLabels:
            queue: "payment-processing"
      target:
        type: AverageValue
        averageValue: "30"
```
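When several metrics are configured, the controller computes a proposed replica count for each metric independently and applies the largest proposal. A small illustrative sketch (the metric values are made up):

```python
import math

def desired_for_metric(current_replicas: int, current: float, target: float) -> int:
    return math.ceil(current_replicas * current / target)

def combine(current_replicas: int, metrics: list) -> int:
    # metrics: list of (current_value, target_value) pairs;
    # HPA scales to the max across all per-metric proposals
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)

# CPU proposes 5, memory proposes 3, requests/sec proposes 8 -> HPA picks 8
print(combine(4, [(90, 72), (52, 70), (200, 100)]))  # 8
```

Taking the maximum guarantees every configured metric stays at or below its target.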

### Custom Metrics HPA

#### Setup Metrics Server

```yaml
# Metrics Server is configured through args on its Deployment, not a
# ConfigMap or a custom API object. Relevant portion of the Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
```

#### Prometheus Adapter for Custom Metrics

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-adapter
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-adapter
rules:
- apiGroups: [""]
  resources: ["nodes", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]
```

#### Custom Metric Rules

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    # Queue length metric
    - seriesQuery: 'rabbitmq_queue_messages{queue!="",namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^rabbitmq_queue_messages"
        as: "rabbitmq_queue_messages"
      metricsQuery: 'sum(rabbitmq_queue_messages{<<.LabelMatchers>>}) by (<<.GroupBy>>)'

    # Request rate metric
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^http_requests_total"
        as: "http_requests_per_second"
      metricsQuery: 'sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

### HPA Best Practices

#### Resource Requests Required

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalable-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: nginx:1.21
        resources:
          requests:
            cpu: 100m      # Required for CPU HPA
            memory: 128Mi  # Required for Memory HPA
          limits:
            cpu: 500m
            memory: 512Mi
```

#### HPA with Scaling Behavior Policies

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      policies:
      - type: Percent
        value: 50
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
      selectPolicy: Max
    scaleDown:
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      selectPolicy: Min
```
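Under `selectPolicy: Max`, the allowed change per period is the larger of what the `Percent` and `Pods` policies permit. A hypothetical sketch of the scale-up ceiling for one period:

```python
import math

def max_scale_up(current_replicas: int, percent: int, pods: int) -> int:
    """With selectPolicy: Max, the allowed increase per period is the larger
    of the percent-based and fixed-pod-count policies."""
    by_percent = math.ceil(current_replicas * percent / 100)
    return current_replicas + max(by_percent, pods)

# 10 replicas, Percent=50 and Pods=2 -> at most 15 replicas after one period
print(max_scale_up(10, 50, 2))  # 15
# At low replica counts the Pods policy dominates: 2 -> at most 4
print(max_scale_up(2, 50, 2))   # 4
```

`selectPolicy: Min` on scale-down works the same way in reverse, picking the policy that removes the fewest pods.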

## Vertical Pod Autoscaling (VPA)

### VPA Concepts

Vertical Pod Autoscaler (VPA) automatically adjusts container resource requests and limits based on historical usage.

#### VPA Components

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   VPA Updater   │    │   VPA Recommender│    │   VPA Admission │
│                 │    │                  │    │     Controller  │
│ • Apply Updates │    │ • Analyze Usage  │    │ • Mutate Pods   │
│ • Recreate Pods │    │ • Recommend Size │    │ • Set Resources │
│ • Handle Events │    │ • Calculate Need │    │ • Validate      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
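The real Recommender builds decaying histograms of usage, but the core idea can be approximated as "a high percentile of observed usage plus a safety margin". An illustrative sketch only — the `percentile` and `safety_margin` values here are assumptions, not the recommender's actual constants:

```python
import math

def recommend(samples: list, percentile: float = 0.9, safety_margin: float = 0.15) -> float:
    """Illustrative only: pick a high percentile of observed usage and
    add a margin. The real VPA Recommender uses decaying histograms."""
    ordered = sorted(samples)
    idx = math.ceil(len(ordered) * percentile) - 1
    return ordered[idx] * (1 + safety_margin)

# Hypothetical CPU usage samples in millicores
cpu_samples = [100, 120, 150, 90, 300, 110, 130, 140, 95, 105]
print(round(recommend(cpu_samples)))  # 172 (90th percentile 150m + 15%)
```

Using a percentile rather than the mean keeps a single spike (the 300m sample) from inflating every pod's request.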

### VPA Implementation

#### Basic VPA Configuration

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp-deployment
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, or Auto
  resourcePolicy:
    containerPolicies:
    - containerName: webapp
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
```
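Whatever the Recommender proposes is then clamped to the `minAllowed`/`maxAllowed` range from the container policy. A one-line sketch of that clamping:

```python
def clamp(recommended: float, min_allowed: float, max_allowed: float) -> float:
    """VPA caps its recommendation to the [minAllowed, maxAllowed] range
    from the container policy."""
    return max(min_allowed, min(recommended, max_allowed))

# minAllowed 50m, maxAllowed 2000m (2 CPU), as in the policy above
print(clamp(3500, 50, 2000))  # 2000 -> capped at maxAllowed
print(clamp(20, 50, 2000))    # 50   -> raised to minAllowed
```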

#### VPA with Multiple Containers

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: multi-container-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: multi-container-app
  updatePolicy:
    updateMode: "Recreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
    - containerName: sidecar
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 500m
        memory: 512Mi
      controlledResources: ["cpu"]
      controlledValues: "RequestsOnly"
```

#### VPA Update Modes

```yaml
# UpdateMode: Auto
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: auto-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: critical-app
  updatePolicy:
    updateMode: "Auto"  # Automatically updates pods
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["cpu", "memory"]
---
# UpdateMode: Recreate
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: recreate-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: batch-app
  updatePolicy:
    updateMode: "Recreate"  # Deletes and recreates pods
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["cpu", "memory"]
---
# UpdateMode: Initial
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: initial-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stable-app
  updatePolicy:
    updateMode: "Initial"  # Only sets resources on pod creation
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["cpu", "memory"]
```

### VPA Best Practices

#### VPA with HPA Compatibility

```yaml
# Configure VPA for memory only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: memory-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["memory"]  # Let HPA handle CPU
      controlledValues: "RequestsOnly"
---
# Configure HPA for CPU only
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

## Cluster Autoscaling

### Cluster Autoscaler Concepts

Cluster Autoscaler (CA) adjusts the number of nodes in the cluster based on resource requirements that the existing nodes cannot satisfy.

#### Cluster Autoscaler Flow

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Unschedulable │───▶│   Cluster Auto-  │───▶│   Cloud Provider│
│     Pods        │    │    Scaler        │    │   API           │
│                 │    │                  │    │                 │
│ • Insufficient  │    │ • Evaluate Need  │    │ • Create Nodes  │
│   Resources     │    │ • Check Limits   │    │ • Delete Nodes  │
│ • Node Taints   │    │ • Scale Up/Down  │    │ • Update Config │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
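A rough sketch of the scale-up decision in the flow above, ignoring the simulated scheduling and bin-packing the real autoscaler performs (all numbers are illustrative):

```python
import math

def nodes_needed(pending_pods_cpu_m: list, node_allocatable_cpu_m: int,
                 max_nodes: int, current_nodes: int) -> int:
    """Rough sketch: add enough nodes to fit the pending pods' CPU requests,
    capped by the node group's configured maximum. The real CA simulates
    the scheduler instead of dividing totals."""
    needed = math.ceil(sum(pending_pods_cpu_m) / node_allocatable_cpu_m)
    return min(needed, max_nodes - current_nodes)

# Three pending pods requesting 1500m each, nodes with 2000m allocatable
print(nodes_needed([1500, 1500, 1500], 2000, max_nodes=10, current_nodes=3))  # 3
```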

### Cluster Autoscaler Configuration

#### AWS EKS Cluster Autoscaler

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
```

#### GKE Cluster Autoscaler

```bash
# On GKE, the Cluster Autoscaler is managed by the control plane, so you do
# not deploy it yourself. Enable and tune it per node pool with gcloud:
gcloud container clusters update production-cluster \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=100 \
  --node-pool=default-pool
```

### Node Groups Configuration

#### AWS Auto Scaling Groups

```yaml
# Scaling limits and delays are set through Cluster Autoscaler flags, e.g.:
#   --max-nodes-total=100
#   --scale-down-delay-after-add=10m
#   --scale-down-unneeded-time=10m
---
# Node-group preference uses the priority expander (--expander=priority),
# configured via this ConfigMap; a higher number means a higher priority.
# Here memory-optimized groups are preferred, then balanced, then CPU optimized.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - node-group-cpu-optimized.*
    20:
      - node-group-balanced.*
    30:
      - node-group-memory-optimized.*
```
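With the priority expander, the node group whose name matches the highest configured priority wins (the real expander breaks ties randomly among groups at the same priority). A simplified sketch using hypothetical group names:

```python
import re

def pick_node_group(candidates: list, priorities: dict) -> str:
    """Priority-expander sketch: return the candidate node group matching
    the highest-priority regex (higher number = higher priority)."""
    best, best_prio = None, -1
    for group in candidates:
        for prio, patterns in priorities.items():
            if any(re.search(p, group) for p in patterns) and prio > best_prio:
                best, best_prio = group, prio
    return best

priorities = {
    10: [".*cpu-optimized.*"],
    50: [".*memory-optimized.*"],
}
print(pick_node_group(["node-group-cpu-optimized", "node-group-memory-optimized"],
                      priorities))  # node-group-memory-optimized
```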

#### GKE Node Pools

```bash
# Create node pools with different machine types
gcloud container node-pools create cpu-optimized-pool \
  --cluster=production-cluster \
  --machine-type=c2-standard-4 \
  --num-nodes=1 \
  --max-nodes=10 \
  --min-nodes=0 \
  --enable-autoscaling \
  --node-labels=type=cpu-optimized,pool=cpu-optimized-pool \
  --node-taints=type=cpu-optimized:NoSchedule

gcloud container node-pools create memory-optimized-pool \
  --cluster=production-cluster \
  --machine-type=n2-highmem-8 \
  --num-nodes=1 \
  --max-nodes=5 \
  --min-nodes=0 \
  --enable-autoscaling \
  --node-labels=type=memory-optimized,pool=memory-optimized-pool \
  --node-taints=type=memory-optimized:NoSchedule
```

### Advanced Scaling Strategies

#### Predictive Autoscaling

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: predictive-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: predicted_cpu_utilization
      target:
        type: Value
        value: "70"
  behavior:
    scaleUp:
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      policies:
      - type: Percent
        value: 20
        periodSeconds: 300
      selectPolicy: Min
```

#### Multi-Metric Scaling

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 100
  metrics:
  # CPU metrics
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  # Memory metrics
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  # Custom application metrics
  - type: Pods
    pods:
      metric:
        name: active_connections
      target:
        type: AverageValue
        averageValue: "1000"
  # Queue depth
  - type: External
    external:
      metric:
        name: redis_queue_length
      target:
        type: AverageValue
        averageValue: "100"
  # Response time
  - type: External
    external:
      metric:
        name: http_response_time_p95
      target:
        type: Value
        value: "500"
```

## Scaling Monitoring and Troubleshooting

### HPA Monitoring

#### HPA Status Check

```bash
# Check HPA status
kubectl get hpa
kubectl describe hpa webapp-hpa

# Check HPA events
kubectl get events --field-selector involvedObject.name=webapp-hpa

# Check current metrics
kubectl top pods -l app=webapp
kubectl top nodes
```

#### HPA Metrics Analysis

```bash
# Get detailed HPA metrics
kubectl get --raw "/apis/autoscaling/v2/namespaces/production/horizontalpodautoscalers/webapp-hpa" | jq '.status'

# Check target utilization
kubectl get hpa webapp-hpa -o jsonpath='{.status.currentMetrics[0].resource.current.averageUtilization}'

# View scaling events
kubectl get events --sort-by='.lastTimestamp' | grep "HorizontalPodAutoscaler"
```

### VPA Monitoring

#### VPA Status and Recommendations

```bash
# Check VPA status
kubectl get vpa
kubectl describe vpa webapp-vpa

# Get VPA recommendations
kubectl get vpa webapp-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0]}'

# Check VPA events
kubectl get events --field-selector involvedObject.name=webapp-vpa
```

#### VPA Update Analysis

```bash
# View VPA update history
kubectl get vpa webapp-vpa -o yaml | grep -A 10 "recommendation"

# Check if pods were updated
kubectl get pods -l app=webapp -o wide
kubectl describe pod <pod-name> | grep -A 10 "Limits"
```

### Cluster Autoscaler Monitoring

#### CA Status and Logs

```bash
# Check cluster autoscaler pod
kubectl get pods -n kube-system -l app=cluster-autoscaler
kubectl logs -n kube-system -l app=cluster-autoscaler

# Check CA events
kubectl get events --sort-by='.lastTimestamp' | grep "ClusterAutoscaler"

# Check node groups
kubectl get nodes --label-columns=node.kubernetes.io/instance-type
```

#### CA Scaling Events

```bash
# Monitor scaling events
kubectl get events --field-selector reason=TriggeredScaleUp
kubectl get events --field-selector reason=TriggeredScaleDown

# Check node utilization
kubectl top nodes
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
```

### Troubleshooting Scaling Issues

#### Common HPA Problems

```bash
# HPA not scaling - check metrics
kubectl describe hpa <hpa-name>

# Missing metrics server
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl logs -n kube-system -l k8s-app=metrics-server

# Resource requests missing
kubectl describe deployment <deployment-name> | grep -A 10 "Requests"
```

#### VPA Issues

```bash
# VPA not updating pods
kubectl describe vpa <vpa-name> | grep -A 10 "Conditions"

# Check if pods are evicted
kubectl get events | grep "Evicted"

# VPA conflicts with HPA
kubectl get hpa,vpa
```

#### Cluster Autoscaler Issues

```bash
# Cluster autoscaler not working
kubectl logs -n kube-system -l app=cluster-autoscaler | grep -i error

# Node group limits reached
kubectl get nodes
kubectl describe nodes

# Insufficient quotas
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <asg-name>
```

## Scaling Best Practices

### Resource Management

#### Proper Resource Requests

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:v1.0.0
        resources:
          requests:
            cpu: 200m      # Based on benchmarking
            memory: 256Mi  # Based on memory profiling
          limits:
            cpu: 1000m     # Reasonable limit
            memory: 512Mi  # 2x requests is good practice
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
```

#### Quality of Service Classes

```yaml
# Burstable QoS (most common)
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
---
# Guaranteed QoS (requests = limits)
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 200m
    memory: 256Mi
---
# Best-Effort QoS (no requests/limits - avoid for production)
resources: {}  # Not recommended for critical services
```
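The QoS class follows mechanically from requests and limits. A simplified single-container classifier (the real rules evaluate every container in the pod and treat unset requests as defaulting to limits):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified single-container QoS classification, mirroring the
    three examples above."""
    if not requests and not limits:
        return "BestEffort"
    if requests and requests == limits:
        return "Guaranteed"
    return "Burstable"

print(qos_class({"cpu": "200m", "memory": "256Mi"},
                {"cpu": "200m", "memory": "256Mi"}))  # Guaranteed
print(qos_class({"cpu": "100m", "memory": "128Mi"},
                {"cpu": "500m", "memory": "512Mi"}))  # Burstable
print(qos_class({}, {}))                              # BestEffort
```

Guaranteed pods are evicted last under node memory pressure, which is why requests = limits is common for critical services.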

### Scaling Strategy

#### Progressive Scaling

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: progressive-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50    # Scale up by max 50%
        periodSeconds: 60
      - type: Pods
        value: 5     # Or add max 5 pods
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25    # Scale down slower
        periodSeconds: 300
      selectPolicy: Min
```

#### Scaling Priorities

```yaml
# High priority services (critical path)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  minReplicas: 5
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Lower target for headroom
---
# Background services
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80  # Higher target for efficiency
```

### Performance Optimization

#### Scaling Efficiency

```yaml
# Pod disruption budget for graceful scaling
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
---
# Readiness gates for traffic routing
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: myapp:v1.0.0
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
  readinessGates:
  - conditionType: target-traffic-ready
```

#### Cost Optimization

```yaml
# Node selectors for cost-optimized instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processing
spec:
  template:
    spec:
      nodeSelector:
        node-type: spot-instance
      tolerations:
      - key: "spot-instance"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: batch-worker
        image: batch-worker:v1.0.0
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
```

***

## 🚀 **Production Deployment**

### Production Scaling Setup

#### Complete Production HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: production-api-hpa
  namespace: production
  labels:
    app: production-api
    env: production
    tier: backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-api
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "500"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 10
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      selectPolicy: Min
```

#### Production VPA

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: production-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["memory"]  # Let HPA handle CPU
      controlledValues: "RequestsOnly"
```

***

## 📚 **Resources and References**

### Official Documentation

* [Kubernetes Horizontal Pod Autoscaling](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
* [Kubernetes Vertical Pod Autoscaling](https://kubernetes.io/docs/tasks/run-application/vertical-pod-autoscaling/)
* [Cluster Autoscaler Documentation](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)

### Advanced Reading

* [Kubernetes Autoscaling Patterns](https://kubernetes.io/docs/concepts/cluster-administration/cluster-management/)
* [Custom Metrics Adapter](https://github.com/kubernetes-sigs/custom-metrics-apiserver)
* [Prometheus Adapter](https://github.com/kubernetes-sigs/prometheus-adapter)

### Cheatsheet Summary

```bash
# HPA Commands
kubectl get hpa
kubectl describe hpa <name>
kubectl autoscale deployment <name> --cpu-percent=70 --min=3 --max=10

# VPA Commands
kubectl get vpa
kubectl describe vpa <name>
kubectl get vpa <name> -o jsonpath='{.status.recommendation}'

# Monitoring
kubectl top pods
kubectl top nodes
kubectl get events --sort-by='.lastTimestamp'
```

The scaling documentation is ready to use! 🎉
