# Catatan Seekor: CAG (Cache Augmented Generation)

## 📚 Overview

Cache Augmented Generation (CAG) is a technique that combines a caching mechanism with generative AI to improve performance, reduce latency, and optimize resource usage. CAG lets an AI system store previous generation results and reuse them for identical or similar requests.

## 🎯 How CAG Works

### 1. **Cache Storage Phase**

* Generation results are stored in the cache under a unique key
* Metadata is stored alongside for tracking and management
* Cache policies determine retention and eviction strategies

### 2. **Cache Lookup Phase**

* New input is processed to generate a cache key
* The system looks for an existing result in the cache
* Similarity matching digunakan untuk fuzzy cache hits

### 3. **Generation Phase**

* On a cache hit: return the cached result
* On a cache miss: generate a new result and store it in the cache
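The three phases can be sketched end to end with a plain in-memory dictionary (illustrative only; `MiniCAG` and its fake generation step are stand-ins, not a real model call):

```python
import hashlib

class MiniCAG:
    """Toy end-to-end sketch of the three CAG phases."""
    def __init__(self):
        self.cache = {}  # phase 1: storage

    def respond(self, query: str) -> dict:
        # Phase 2: lookup via a deterministic key
        key = hashlib.sha256(query.encode()).hexdigest()
        if key in self.cache:
            return {"result": self.cache[key], "source": "cache"}
        # Phase 3: cache miss -> "generate", then store
        result = f"generated answer for: {query}"
        self.cache[key] = result
        return {"result": result, "source": "generation"}
```

The first call for a query pays the generation cost; repeat calls are served from the dictionary.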

## 🏗️ CAG Architecture

```
User Input → Cache Key Generation → Cache Lookup → Cache Hit? → Yes → Return Cached Result
                ↓                           ↓           ↓
            Input Processing         Cache Search      No
                ↓                           ↓           ↓
            Feature Extraction      Similarity Match   Generate New Result
                ↓                           ↓           ↓
            Hash Generation         Fuzzy Matching     Store in Cache
                ↓                           ↓           ↓
            Cache Key               Cache Hit/Miss     Return New Result
```

## 🛠️ Implementation Examples

### Basic CAG Implementation

```python
import hashlib
import json
import redis
from datetime import datetime
from typing import Any, Dict, Optional

class BasicCAG:
    def __init__(self, cache_client=None):
        self.cache_client = cache_client or redis.Redis(host='localhost', port=6379, db=0)
        self.cache_ttl = 3600  # 1 hour
        
    def generate_cache_key(self, input_data: Dict[str, Any]) -> str:
        """Generate a deterministic cache key for input data."""
        sorted_data = json.dumps(input_data, sort_keys=True)
        # MD5 is used only as a fast, non-cryptographic fingerprint
        return "cag:" + hashlib.md5(sorted_data.encode()).hexdigest()
    
    def get_cached_result(self, cache_key: str) -> Optional[Dict[str, Any]]:
        """Retrieve cached result if exists."""
        try:
            cached_data = self.cache_client.get(cache_key)
            if cached_data:
                return json.loads(cached_data)
        except Exception as e:
            print(f"Cache retrieval error: {e}")
        return None
    
    def store_in_cache(self, cache_key: str, result: Dict[str, Any]) -> None:
        """Store result in cache with TTL."""
        try:
            self.cache_client.setex(
                cache_key,
                self.cache_ttl,
                json.dumps(result)
            )
        except Exception as e:
            print(f"Cache storage error: {e}")
    
    def generate_response(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Generate response with caching."""
        cache_key = self.generate_cache_key(input_data)
        
        # Try to get cached result
        cached_result = self.get_cached_result(cache_key)
        if cached_result:
            return {
                "result": cached_result,
                "source": "cache",
                "cache_key": cache_key
            }
        
        # Generate new result
        new_result = self._generate_new_result(input_data)
        
        # Store in cache
        self.store_in_cache(cache_key, new_result)
        
        return {
            "result": new_result,
            "source": "generation",
            "cache_key": cache_key
        }
    
    def _generate_new_result(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Simulate AI generation process."""
        return {
            "generated_text": f"Generated response for: {input_data.get('query', 'unknown')}",
            "timestamp": str(datetime.now()),
            "model_version": "1.0"
        }
```
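For local experiments without a Redis server, a dict-backed stand-in exposing the two methods `BasicCAG` actually calls (`get` and `setex`) can be injected as `cache_client`. This `FakeRedis` is a test double written for this note, not part of redis-py:

```python
import json
import time

class FakeRedis:
    """Minimal in-memory stand-in for the redis-py calls used above."""
    def __init__(self):
        self.store = {}  # key -> (expiry_timestamp, value)

    def setex(self, key, ttl, value):
        self.store[key] = (time.time() + ttl, value)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        expiry, value = item
        if time.time() > expiry:
            del self.store[key]  # honor the TTL like Redis would
            return None
        return value
```

Usage would be `cag = BasicCAG(cache_client=FakeRedis())`, leaving the rest of the class unchanged.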

### Advanced CAG with Similarity Matching

```python
import json
from typing import Any, Dict, Optional

import numpy as np
import redis
from sentence_transformers import SentenceTransformer

class AdvancedCAG:
    def __init__(self, cache_client=None, similarity_threshold=0.85):
        self.cache_client = cache_client or redis.Redis(host='localhost', port=6379, db=0)
        self.similarity_threshold = similarity_threshold
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.cache_ttl = 3600
        
    def get_similar_cached_result(self, input_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Find similar cached results using embedding similarity."""
        try:
            # Generate embedding for input
            input_text = self._extract_text(input_data)
            input_embedding = self.embedding_model.encode([input_text])[0]
            
            # Get all cache keys (KEYS is O(N); prefer SCAN in production)
            all_keys = self.cache_client.keys("cag:*")
            
            best_match = None
            best_similarity = 0
            
            for key in all_keys[:100]:  # Limit search for performance
                try:
                    cached_data = self.cache_client.get(key)
                    if cached_data:
                        cached_item = json.loads(cached_data)
                        cached_text = self._extract_text(cached_item.get('input_data', {}))
                        
                        if cached_text:
                            cached_embedding = self.embedding_model.encode([cached_text])[0]
                            similarity = self._cosine_similarity(input_embedding, cached_embedding)
                            
                            if similarity > best_similarity and similarity >= self.similarity_threshold:
                                best_similarity = similarity
                                best_match = {
                                    "result": cached_item.get('result'),
                                    "similarity": similarity,
                                    "cache_key": key.decode()
                                }
                except Exception:
                    # Skip malformed cache entries
                    continue
            
            return best_match
            
        except Exception as e:
            print(f"Similarity search error: {e}")
            return None
    
    def _extract_text(self, data: Dict[str, Any]) -> str:
        """Extract text content from input data."""
        if isinstance(data, dict):
            text_parts = []
            for key, value in data.items():
                if isinstance(value, str):
                    text_parts.append(value)
                elif isinstance(value, dict):
                    text_parts.append(self._extract_text(value))
            return " ".join(text_parts)
        elif isinstance(data, str):
            return data
        return ""
    
    def _cosine_similarity(self, vec1: np.ndarray, vec2: np.ndarray) -> float:
        """Calculate cosine similarity between two vectors."""
        return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
```
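Re-encoding every cached entry on each lookup is expensive. A common optimization, sketched here with plain NumPy under the assumption that embeddings are precomputed and stored alongside each result, is to keep them in one matrix and do a single vectorized similarity pass (`best_match` is a hypothetical helper, not from any library):

```python
import numpy as np

def best_match(query_vec: np.ndarray, cached_vecs: np.ndarray,
               threshold: float = 0.85):
    """Return (index, similarity) of the closest cached embedding, or None."""
    if cached_vecs.size == 0:
        return None
    # Normalize rows so a dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    m = cached_vecs / np.linalg.norm(cached_vecs, axis=1, keepdims=True)
    sims = m @ q
    idx = int(np.argmax(sims))
    return (idx, float(sims[idx])) if sims[idx] >= threshold else None
```

This replaces N model encodes per lookup with one encode plus one matrix multiply.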

## 📊 Cache Management Strategies

### TTL Management

```python
import json
from typing import Any, Optional

class CacheManager:
    def __init__(self, cache_client):
        self.cache_client = cache_client
        
    def set_with_ttl(self, key: str, value: Any, ttl: int) -> None:
        """Set cache with TTL."""
        self.cache_client.setex(key, ttl, json.dumps(value))
    
    def get_with_ttl(self, key: str) -> Optional[Any]:
        """Get cache value and remaining TTL."""
        try:
            value = self.cache_client.get(key)
            ttl = self.cache_client.ttl(key)
            
            if value:
                return {
                    "value": json.loads(value),
                    "ttl": ttl
                }
        except Exception as e:
            print(f"Cache operation error: {e}")
        return None
```

### Cache Eviction Policies

```python
class CacheEvictionManager:
    def __init__(self, cache_client, max_size: int = 10000):
        self.cache_client = cache_client
        self.max_size = max_size
        
    def lru_eviction(self) -> None:
        """Implement LRU eviction policy."""
        try:
            keys = self.cache_client.keys("cag:*")
            if len(keys) > self.max_size:
                # Sort by idle time (seconds since last access);
                # the most idle keys are the least recently used
                key_times = []
                for key in keys:
                    try:
                        idle_time = self.cache_client.object('idletime', key)
                        key_times.append((key, idle_time))
                    except Exception:
                        continue

                key_times.sort(key=lambda x: x[1], reverse=True)
                keys_to_remove = key_times[:len(keys) - self.max_size]
                
                for key, _ in keys_to_remove:
                    self.cache_client.delete(key)
                    
        except Exception as e:
            print(f"LRU eviction error: {e}")
```
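Hand-rolled eviction like the above is rarely necessary: Redis can evict on its own under memory pressure. A typical `redis.conf` fragment (values illustrative) delegates LRU to the server:

```
maxmemory 256mb
maxmemory-policy allkeys-lru
```

With `allkeys-lru`, Redis samples keys and evicts the least recently used ones once `maxmemory` is reached, which is both simpler and safer than client-side deletion loops.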

## 🚀 Performance Optimization

### Async CAG Implementation

```python
import asyncio
import hashlib
import json
from typing import Any, Dict, List

# aioredis has been merged into redis-py; redis.asyncio provides the same API
from redis import asyncio as aioredis

class AsyncCAG:
    def __init__(self, redis_url: str = "redis://localhost"):
        self.redis_url = redis_url
        
    async def initialize(self):
        """Initialize async Redis connection."""
        self.redis = await aioredis.from_url(self.redis_url)
        
    async def generate_responses_async(self, prompts: List[str]) -> List[Dict[str, Any]]:
        """Generate responses for multiple prompts asynchronously."""
        tasks = []
        for prompt in prompts:
            task = self.generate_single_response_async(prompt)
            tasks.append(task)
        
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results
    
    async def generate_single_response_async(self, prompt: str) -> Dict[str, Any]:
        """Generate response for single prompt asynchronously."""
        try:
            cache_key = f"async_cag:{hashlib.md5(prompt.encode()).hexdigest()}"
            cached_result = await self.redis.get(cache_key)
            
            if cached_result:
                return {
                    "response": json.loads(cached_result),
                    "source": "cache",
                    "prompt": prompt
                }
            
            # Generate new response
            response = await self._generate_async(prompt)
            
            # Store in cache
            await self.redis.setex(cache_key, 3600, json.dumps(response))
            
            return {
                "response": response,
                "source": "generation",
                "prompt": prompt
            }
            
        except Exception as e:
            return {
                "error": str(e),
                "prompt": prompt
            }

    async def _generate_async(self, prompt: str) -> Dict[str, Any]:
        """Simulate the model call (stand-in for a real async model client)."""
        await asyncio.sleep(0)
        return {"generated_text": f"Generated response for: {prompt}"}
```
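The fan-out pattern `AsyncCAG` relies on can be demonstrated without a Redis server; this toy version (hypothetical helpers, shared dict in place of Redis) shows how `asyncio.gather` lets cached and uncached prompts resolve concurrently:

```python
import asyncio

async def cached_or_generate(prompt: str, cache: dict) -> dict:
    """Toy async worker: serve from the shared dict cache or 'generate'."""
    if prompt in cache:
        return {"prompt": prompt, "source": "cache"}
    await asyncio.sleep(0)  # stand-in for model latency
    cache[prompt] = f"answer:{prompt}"
    return {"prompt": prompt, "source": "generation"}

async def batch(prompts, cache):
    # Fan out all prompts concurrently, as AsyncCAG does with gather()
    return await asyncio.gather(*(cached_or_generate(p, cache) for p in prompts))
```

A first batch pays generation latency for every prompt; a second identical batch returns entirely from cache.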

## 📊 Monitoring & Analytics

### Cache Performance Metrics

```python
from typing import Any, Dict

class CacheAnalytics:
    def __init__(self, cache_client):
        self.cache_client = cache_client
        
    def get_cache_stats(self) -> Dict[str, Any]:
        """Get comprehensive cache statistics."""
        try:
            info = self.cache_client.info()
            
            cag_keys = self.cache_client.keys("cag:*")
            openai_keys = self.cache_client.keys("openai_cag:*")
            
            stats = {
                "total_keys": len(cag_keys) + len(openai_keys),
                "cag_keys": len(cag_keys),
                "openai_keys": len(openai_keys),
                "memory_usage": info.get("used_memory_human", "N/A"),
                "cache_size_mb": info.get("used_memory", 0) / (1024 * 1024)
            }
            
            return stats
            
        except Exception as e:
            print(f"Cache stats error: {e}")
            return {}
```
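The stats above cover key counts and memory, but hit rate is the metric that justifies caching in the first place. A minimal in-process counter (a sketch, not tied to Redis) looks like:

```python
class HitRateTracker:
    """Track cache hits/misses and report the hit rate."""
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Calling `record(True)` on every cache hit and `record(False)` on every miss makes it easy to confirm the cache is actually paying for itself.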

## 🔒 Security & Best Practices

### Input Validation

```python
import json
import re

class SecurityManager:
    @staticmethod
    def validate_input(input_data: Dict[str, Any]) -> bool:
        """Validate input data for security."""
        try:
            input_str = json.dumps(input_data)
            
            # Check for dangerous patterns
            dangerous_patterns = [
                r"(\b(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER)\b)",
                r"<script[^>]*>",
                r"javascript:"
            ]
            
            for pattern in dangerous_patterns:
                if re.search(pattern, input_str, re.IGNORECASE):
                    return False
            
            return True
            
        except Exception as e:
            print(f"Input validation error: {e}")
            return False
```

### Rate Limiting

```python
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests: int = 100, time_window: int = 3600):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = defaultdict(list)
    
    def is_allowed(self, user_id: str) -> bool:
        """Check if user is allowed to make request."""
        current_time = time.time()
        
        # Clean old requests
        self.requests[user_id] = [
            req_time for req_time in self.requests[user_id]
            if current_time - req_time < self.time_window
        ]
        
        # Check if under limit
        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(current_time)
            return True
        
        return False
```

## 📚 References & Resources

### 📖 Research Papers

* [**"Cache-Augmented Language Models"**](https://arxiv.org/abs/2012.11926) - Research on caching in language models
* [**"Efficient Memory Management for Large Language Models"**](https://arxiv.org/abs/2201.05596) - Memory optimization techniques

### 🛠️ Tools & Libraries

* [**Redis**](https://redis.io/) - In-memory data structure store
* [**aioredis**](https://github.com/aio-libs/aioredis) - Async Redis client
* [**Sentence Transformers**](https://github.com/UKPLab/sentence-transformers) - Text embeddings

### 📰 Articles & Blogs

* [**"Redis Caching Best Practices"**](https://redis.io/docs/manual/patterns/distributed-locks/) - Redis documentation
* [**"Caching Strategies for AI Applications"**](https://aws.amazon.com/caching/) - AWS caching guide

## 🔗 Related Topics

* [🧠 ML Fundamentals](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/fundamentals)
* [🔍 RAG Systems](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/catatan-seekor-rag)
* [🤖 OpenAI Integration](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/catatan-seekor-open-ai)
* [🎯 Fine-tuning](https://mahbubzulkarnain.gitbook.io/catatan-seekor-the-series/machine-learning/catatan-seekor-fine-tunning)

***

*Last updated: December 2024*\
*Contributors: \[Your Name]*
