🧠 Catatan Seekor: CAG (Cache Augmented Generation)

📚 Overview

Cache Augmented Generation (CAG) is a technique that combines a caching mechanism with generative AI to improve performance, reduce latency, and optimize resource usage. CAG lets an AI system store the results of earlier generations and reuse them for later requests.

🎯 How CAG Works

1. Cache Storage Phase

Generation results are stored in the cache under a unique key
Metadata is stored alongside each result for tracking and management
Cache policies determine the retention and eviction strategies
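
The storage phase above can be sketched without any external service. This is a minimal illustration, not the implementation used later on this page: the `CacheEntry` fields, the `make_key` helper, and the plain dict used as the store are all assumptions made for the example.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CacheEntry:
    """A cached generation plus the metadata used to manage it (hypothetical layout)."""
    result: Any
    created_at: float = field(default_factory=time.time)
    ttl_seconds: int = 3600   # retention policy: drop after one hour
    hit_count: int = 0        # tracking: how often the entry was reused

    def is_expired(self, now: float) -> bool:
        return now - self.created_at >= self.ttl_seconds


def make_key(input_data: Dict[str, Any]) -> str:
    """Derive a deterministic key: same input, same key, regardless of dict order."""
    canonical = json.dumps(input_data, sort_keys=True)
    return "cag:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# A plain dict stands in for the real cache backend
store: Dict[str, CacheEntry] = {}
key = make_key({"query": "What is CAG?", "model": "demo"})
store[key] = CacheEntry(result={"generated_text": "CAG caches generations."})
```

Keeping the creation time and TTL on the entry itself is one way to let an eviction job decide what to drop without consulting a second data structure.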

2. Cache Lookup Phase

New input is processed to generate a cache key
The system looks for an existing result in the cache
Similarity matching is used to allow fuzzy cache hits
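
As a rough sketch of the lookup phase, the example below tries an exact match on a normalized key first and falls back to a fuzzy match. It uses `difflib.SequenceMatcher` from the standard library purely as a stand-in for the embedding-based similarity shown later; the `normalize` and `lookup` helpers and the 0.85 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations share a key."""
    return " ".join(text.lower().split())


def lookup(query: str, cache: Dict[str, str], threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (cached_value, similarity) for an exact or close-enough match, else None."""
    key = normalize(query)
    if key in cache:                      # exact hit
        return cache[key], 1.0
    best_key, best_score = None, 0.0
    for candidate in cache:               # fuzzy hit: linear scan, fine for a demo
        score = SequenceMatcher(None, key, candidate).ratio()
        if score > best_score:
            best_key, best_score = candidate, score
    if best_key is not None and best_score >= threshold:
        return cache[best_key], best_score
    return None


cache = {normalize("What is cache augmented generation?"): "CAG reuses earlier generations."}
exact = lookup("  what is CACHE augmented generation? ", cache)
fuzzy = lookup("What is cache-augmented generation?", cache)
miss = lookup("How do I bake bread?", cache)
```

Character-level similarity is cheap but blind to meaning, which is why the threshold deserves tuning: set too low, a fuzzy hit can return an answer to a different question.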

3. Generation Phase

On a cache hit: return the cached result
On a cache miss: generate a new result and store it in the cache
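
The hit/miss branch can be reduced to a few lines. In this sketch, `get_or_generate` and `fake_model` are hypothetical names, and a dict replaces the real cache, so the only point illustrated is that the second identical request never reaches the model.

```python
from typing import Callable, Dict, List, Tuple


def get_or_generate(prompt: str, cache: Dict[str, str], generate: Callable[[str], str]) -> Tuple[str, str]:
    """Return (result, source) where source is "cache" or "generation"."""
    if prompt in cache:                # cache hit: skip the model call entirely
        return cache[prompt], "cache"
    result = generate(prompt)          # cache miss: pay for one generation...
    cache[prompt] = result             # ...and store it for next time
    return result, "generation"


calls: List[str] = []


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive model call; records each invocation."""
    calls.append(prompt)
    return f"answer to: {prompt}"


cache: Dict[str, str] = {}
first = get_or_generate("What is CAG?", cache, fake_model)
second = get_or_generate("What is CAG?", cache, fake_model)
```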

🏗️ CAG Architecture

User Input → Cache Key Generation → Cache Lookup → Cache Hit? → Yes → Return Cached Result
                ↓                           ↓           ↓
            Input Processing         Cache Search      No
                ↓                           ↓           ↓
            Feature Extraction      Similarity Match   Generate New Result
                ↓                           ↓           ↓
            Hash Generation         Fuzzy Matching     Store in Cache
                ↓                           ↓           ↓
            Cache Key               Cache Hit/Miss     Return New Result

🛠️ Implementation Examples

Basic CAG Implementation

import hashlib
import json
from datetime import datetime
from typing import Any, Dict, Optional

import redis

class BasicCAG:
    def __init__(self, cache_client=None):
        self.cache_client = cache_client or redis.Redis(host='localhost', port=6379, db=0)
        self.cache_ttl = 3600  # 1 hour
        
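    def generate_cache_key(self, input_data: Dict[str, Any]) -> str:
        """Generate a deterministic cache key for input data."""
        # sort_keys makes the key independent of dict ordering
        sorted_data = json.dumps(input_data, sort_keys=True)
        # The "cag:" prefix namespaces these keys so pattern scans such as "cag:*" can find them
        return "cag:" + hashlib.sha256(sorted_data.encode()).hexdigest()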
    
    def get_cached_result(self, cache_key: str) -> Optional[Dict[str, Any]]:
        """Retrieve cached result if exists."""
        try:
            cached_data = self.cache_client.get(cache_key)
            if cached_data:
                return json.loads(cached_data)
        except Exception as e:
            print(f"Cache retrieval error: {e}")
        return None
    
    def store_in_cache(self, cache_key: str, result: Dict[str, Any]) -> None:
        """Store result in cache with TTL."""
        try:
            self.cache_client.setex(
                cache_key,
                self.cache_ttl,
                json.dumps(result)
            )
        except Exception as e:
            print(f"Cache storage error: {e}")
    
    def generate_response(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Generate response with caching."""
        cache_key = self.generate_cache_key(input_data)
        
        # Try to get cached result
        cached_result = self.get_cached_result(cache_key)
        if cached_result is not None:
            return {
                "result": cached_result,
                "source": "cache",
                "cache_key": cache_key
            }
        
        # Generate new result
        new_result = self._generate_new_result(input_data)
        
        # Store in cache
        self.store_in_cache(cache_key, new_result)
        
        return {
            "result": new_result,
            "source": "generation",
            "cache_key": cache_key
        }
    
    def _generate_new_result(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Simulate AI generation process."""
        return {
            "generated_text": f"Generated response for: {input_data.get('query', 'unknown')}",
            "timestamp": str(datetime.now()),
            "model_version": "1.0"
        }
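
To try the class above without a running Redis server, one option is a small in-memory stand-in that exposes only the two calls `BasicCAG` uses, `get` and `setex`. The `InMemoryCache` class and its injectable `clock` parameter are hypothetical and exist only for this sketch; it is not a substitute for Redis in production.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in exposing only get/setex, mirroring how BasicCAG calls them."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[float, bytes]] = {}

    def setex(self, name: str, ttl: int, value) -> bool:
        """Store value under name, expiring after ttl seconds."""
        data = value.encode("utf-8") if isinstance(value, str) else value
        self._data[name] = (self._clock() + ttl, data)
        return True

    def get(self, name: str) -> Optional[bytes]:
        """Return the stored bytes, or None if the key is missing or expired."""
        item = self._data.get(name)
        if item is None:
            return None
        expires_at, data = item
        if self._clock() >= expires_at:
            del self._data[name]       # lazy expiry on read
            return None
        return data


# A fake clock makes expiry observable without waiting
now = [0.0]
cache = InMemoryCache(clock=lambda: now[0])
cache.setex("cag:demo", 60, '{"generated_text": "hello"}')
fresh = cache.get("cag:demo")
now[0] = 61.0
stale = cache.get("cag:demo")
```

Passing an instance as `BasicCAG(cache_client=InMemoryCache())` should then exercise the hit and miss paths locally, since `json.loads` accepts the bytes that `get` returns.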

Advanced CAG with Similarity Matching

import json
from typing import Any, Dict, Optional

import numpy as np
import redis
from sentence_transformers import SentenceTransformer

class AdvancedCAG:
    def __init__(self, cache_client=None, similarity_threshold=0.85):
        self.cache_client = cache_client or redis.Redis(host='localhost', port=6379, db=0)
        self.similarity_threshold = similarity_threshold
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.cache_ttl = 3600
        
    def get_similar_cached_result(self, input_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Find similar cached results using embedding similarity."""
        try:
            # Generate embedding for input
            input_text = self._extract_text(input_data)
            input_embedding = self.embedding_model.encode([input_text])[0]
            
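            # Collect up to 100 candidate keys. SCAN iterates incrementally,
            # whereas KEYS blocks Redis while it walks the whole keyspace.
            # Entries are expected to be JSON objects with "input_data" and "result" fields.
            all_keys = []
            for key in self.cache_client.scan_iter(match="cag:*"):
                all_keys.append(key)
                if len(all_keys) >= 100:  # Limit search for performance
                    break

            best_match = None
            best_similarity = 0

            for key in all_keys: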
                try:
                    cached_data = self.cache_client.get(key)
                    if cached_data:
                        cached_item = json.loads(cached_data)
                        cached_text = self._extract_text(cached_item.get('input_data', {}))
                        
                        if cached_text:
                            cached_embedding = self.embedding_model.encode([cached_text])[0]
                            similarity = self._cosine_similarity(input_embedding, cached_embedding)
                            
                            if similarity > best_similarity and similarity >= self.similarity_threshold:
                                best_similarity = similarity
                                best_match = {
                                    "result": cached_item.get('result'),
                                    "similarity": similarity,
                                    "cache_key": key.decode() if isinstance(key, bytes) else key
                                }
                except Exception:
                    # Skip entries that are malformed or fail to embed
                    continue
            
            return best_match
            
        except Exception as e:
            print(f"Similarity search error: {e}")
            return None
    
    def _extract_text(self, data: Dict[str, Any]) -> str:
        """Extract text content from input data."""
        if isinstance(data, dict):
            text_parts = []
            for key, value in data.items():
                if isinstance(value, str):
                    text_parts.append(value)
                elif isinstance(value, dict):
                    text_parts.append(self._extract_text(value))
            return " ".join(text_parts)
        elif isinstance(data, str):
            return data
        return ""
    
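    def _cosine_similarity(self, vec1: np.ndarray, vec2: np.ndarray) -> float:
        """Calculate cosine similarity between two vectors."""
        denominator = np.linalg.norm(vec1) * np.linalg.norm(vec2)
        if denominator == 0:
            return 0.0  # Avoid division by zero for all-zero vectors
        return float(np.dot(vec1, vec2) / denominator)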
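
The class above re-encodes every cached text on each lookup, which tends to dominate lookup time. One possible alternative, sketched here under assumptions, is to keep each entry's embedding next to its result so that a lookup only encodes the new query. The `EmbeddingIndex` class is hypothetical, and the three-dimensional vectors are toy values standing in for real model embeddings; plain Python lists are used so the sketch has no dependencies.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EmbeddingIndex:
    """Hypothetical index that stores each embedding once, at insert time."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self._items: List[Tuple[List[float], str]] = []

    def add(self, embedding: Sequence[float], result: str) -> None:
        self._items.append((list(embedding), result))

    def search(self, query_embedding: Sequence[float]) -> Optional[Tuple[str, float]]:
        """Return the best (result, similarity) at or above the threshold, else None."""
        best: Optional[Tuple[str, float]] = None
        for embedding, result in self._items:
            score = cosine_similarity(query_embedding, embedding)
            if score >= self.threshold and (best is None or score > best[1]):
                best = (result, score)
        return best


index = EmbeddingIndex(threshold=0.85)
index.add([1.0, 0.0, 0.0], "answer A")
index.add([0.0, 1.0, 0.0], "answer B")
close = index.search([0.9, 0.1, 0.0])      # nearly parallel to the first vector
ambiguous = index.search([0.5, 0.5, 0.0])  # equally far from both, below threshold
```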

📊 Cache Management Strategies

TTL Management

import json
from typing import Any, Optional

class CacheManager:
    def __init__(self, cache_client):
        self.cache_client = cache_client
        
    def set_with_ttl(self, key: str, value: Any, ttl: int) -> None:
        """Set cache with TTL."""
        self.cache_client.setex(key, ttl, json.dumps(value))
    
    def get_with_ttl(self, key: str) -> Optional[Any]:
        """Get cache value and remaining TTL."""
        try:
            value = self.cache_client.get(key)
            ttl = self.cache_client.ttl(key)
            
            if value:
                return {
                    "value": json.loads(value),
                    "ttl": ttl
                }
        except Exception as e:
            print(f"Cache operation error: {e}")
        return None
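
The `ttl` value returned by `get_with_ttl` needs some interpretation, because the Redis `TTL` command uses two sentinel values: -2 when the key does not exist and -1 when the key exists but has no expiry. A small helper makes that explicit; `describe_ttl` is a hypothetical name used only in this sketch.

```python
def describe_ttl(ttl: int) -> str:
    """Translate a Redis TTL reply into a human-readable state."""
    if ttl == -2:
        return "missing"        # key does not exist (or has already expired)
    if ttl == -1:
        return "no expiry"      # key exists but was stored without a TTL
    return f"expires in {ttl}s"


states = [describe_ttl(t) for t in (-2, -1, 120)]
```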

Cache Eviction Policies

class CacheEvictionManager:
    def __init__(self, cache_client, max_size: int = 10000):
        self.cache_client = cache_client
        self.max_size = max_size
        
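    def lru_eviction(self) -> None:
        """Evict the least recently used keys once the cache exceeds max_size.

        Note: Redis can also do this natively via maxmemory-policy (e.g. allkeys-lru).
        """
        try:
            # SCAN iterates incrementally instead of blocking the server like KEYS
            keys = list(self.cache_client.scan_iter(match="cag:*"))
            if len(keys) > self.max_size:
                # Collect idle time (seconds since last access) for each key
                key_times = []
                for key in keys:
                    try:
                        idle_time = self.cache_client.object('idletime', key)
                        if idle_time is not None:
                            key_times.append((key, idle_time))
                    except Exception:
                        continue  # Key may have expired between SCAN and OBJECT

                # Longest idle first: those are the least recently used
                key_times.sort(key=lambda x: x[1], reverse=True)
                keys_to_remove = key_times[:len(keys) - self.max_size]

                for key, _ in keys_to_remove:
                    self.cache_client.delete(key)

        except Exception as e:
            print(f"LRU eviction error: {e}")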
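
The LRU policy itself is easier to see in a process-local form. The sketch below uses `collections.OrderedDict` to keep keys in access order; the `LRUCache` class is a hypothetical illustration of the policy, not a replacement for the Redis-backed manager above.

```python
from collections import OrderedDict
from typing import Any, List, Optional


class LRUCache:
    """Process-local LRU cache: the least recently used key is evicted first."""

    def __init__(self, max_size: int = 2):
        self.max_size = max_size
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)   # drop the least recently used entry

    def keys(self) -> List[str]:
        return list(self._data)


lru = LRUCache(max_size=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" becomes the most recently used key
lru.put("c", 3)     # exceeds max_size, so "b" is evicted
```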

🚀 Performance Optimization

Async CAG Implementation

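import asyncio
import hashlib
import json
from typing import Any, Dict, List

# aioredis was merged into redis-py (4.2+); the alias keeps the calls below unchanged
import redis.asyncio as aioredis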

class AsyncCAG:
    def __init__(self, redis_url: str = "redis://localhost"):
        self.redis_url = redis_url
        
    async def initialize(self):
        """Initialize async Redis connection."""
        self.redis = await aioredis.from_url(self.redis_url)
        
    async def generate_responses_async(self, prompts: List[str]) -> List[Dict[str, Any]]:
        """Generate responses for multiple prompts asynchronously."""
        tasks = []
        for prompt in prompts:
            task = self.generate_single_response_async(prompt)
            tasks.append(task)
        
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results
    
    async def generate_single_response_async(self, prompt: str) -> Dict[str, Any]:
        """Generate response for single prompt asynchronously."""
        try:
            cache_key = f"async_cag:{hashlib.md5(prompt.encode()).hexdigest()}"
            cached_result = await self.redis.get(cache_key)
            
            if cached_result:
                return {
                    "response": json.loads(cached_result),
                    "source": "cache",
                    "prompt": prompt
                }
            
            # Generate new response
            response = await self._generate_async(prompt)
            
            # Store in cache
            await self.redis.setex(cache_key, 3600, json.dumps(response))
            
            return {
                "response": response,
                "source": "generation",
                "prompt": prompt
            }
            
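        except Exception as e:
            return {
                "error": str(e),
                "prompt": prompt
            }

    async def _generate_async(self, prompt: str) -> Dict[str, Any]:
        """Placeholder for the real model call; replace with an actual async client."""
        await asyncio.sleep(0)  # Yield control, as a real network call would
        return {"generated_text": f"Generated response for: {prompt}"}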
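
When a batch contains repeated prompts, firing one task per prompt can send the same request to the model several times before the first result reaches the cache. A possible refinement, sketched here with hypothetical names (`generate_all`, `fake_generate`, `max_concurrency`), is to de-duplicate the batch up front and bound concurrency with a semaphore.

```python
import asyncio
from typing import Awaitable, Callable, List


async def generate_all(
    prompts: List[str],
    generate: Callable[[str], Awaitable[str]],
    max_concurrency: int = 2,
) -> List[str]:
    """Generate each distinct prompt once, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await generate(prompt)

    unique = list(dict.fromkeys(prompts))    # de-duplicate, keeping first-seen order
    answers = await asyncio.gather(*(bounded(p) for p in unique))
    by_prompt = dict(zip(unique, answers))
    return [by_prompt[p] for p in prompts]   # map results back to the original order


calls: List[str] = []


async def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for an async model call; records each invocation."""
    calls.append(prompt)
    await asyncio.sleep(0.01)
    return prompt.upper()


results = asyncio.run(generate_all(["a", "b", "a", "c"], fake_generate))
```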

📊 Monitoring & Analytics

Cache Performance Metrics

from typing import Any, Dict

class CacheAnalytics:
    def __init__(self, cache_client):
        self.cache_client = cache_client
        
    def get_cache_stats(self) -> Dict[str, Any]:
        """Get comprehensive cache statistics."""
        try:
            info = self.cache_client.info()
            
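            # SCAN-based counting avoids blocking Redis the way KEYS can on large keyspaces
            cag_keys = list(self.cache_client.scan_iter(match="cag:*"))
            openai_keys = list(self.cache_client.scan_iter(match="openai_cag:*"))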
            
            stats = {
                "total_keys": len(cag_keys) + len(openai_keys),
                "cag_keys": len(cag_keys),
                "openai_keys": len(openai_keys),
                "memory_usage": info.get("used_memory_human", "N/A"),
                "cache_size_mb": info.get("used_memory", 0) / (1024 * 1024)
            }
            
            return stats
            
        except Exception as e:
            print(f"Cache stats error: {e}")
            return {}
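
Key counts and memory usage say little about whether the cache is actually saving work. The `INFO` reply also carries `keyspace_hits` and `keyspace_misses` counters, from which a hit rate can be derived. The `hit_rate` helper and the sample numbers below are illustrative assumptions; note that these counters cover the whole Redis server, not just CAG keys.

```python
from typing import Any, Dict


def hit_rate(info: Dict[str, Any]) -> float:
    """Fraction of lookups served from cache, based on Redis INFO counters."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


# Made-up sample values in the shape returned by redis-py's info()
sample_info = {"keyspace_hits": 750, "keyspace_misses": 250}
rate = hit_rate(sample_info)
```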

🔒 Security & Best Practices

Input Validation

import json
import re
from typing import Any, Dict

class SecurityManager:
    @staticmethod
    def validate_input(input_data: Dict[str, Any]) -> bool:
        """Validate input data for security."""
        try:
            input_str = json.dumps(input_data)
            
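            # Check for dangerous patterns. This is a coarse deny-list: it also
            # rejects benign prompts that happen to contain words like "select".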
            dangerous_patterns = [
                r"(\b(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER)\b)",
                r"<script[^>]*>",
                r"javascript:"
            ]
            
            for pattern in dangerous_patterns:
                if re.search(pattern, input_str, re.IGNORECASE):
                    return False
            
            return True
            
        except Exception as e:
            print(f"Input validation error: {e}")
            return False

Rate Limiting

import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests: int = 100, time_window: int = 3600):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = defaultdict(list)
    
    def is_allowed(self, user_id: str) -> bool:
        """Check if user is allowed to make request."""
        current_time = time.time()
        
        # Clean old requests
        self.requests[user_id] = [
            req_time for req_time in self.requests[user_id]
            if current_time - req_time < self.time_window
        ]
        
        # Check if under limit
        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(current_time)
            return True
        
        return False
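
To observe the window behaviour without waiting for real time to pass, the same idea can be written with an injectable clock. This is a sketch under assumptions: `SlidingWindowLimiter` and its `clock` parameter are hypothetical, and a `deque` replaces the list so expired timestamps are dropped from the front instead of rebuilding the list on every call.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests per user within any time_window seconds."""

    def __init__(self, max_requests: int, time_window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.time_window = time_window
        self.clock = clock
        self.requests: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def is_allowed(self, user_id: str) -> bool:
        now = self.clock()
        window = self.requests[user_id]
        # Timestamps are appended in order, so expired ones sit at the front
        while window and now - window[0] >= self.time_window:
            window.popleft()
        if len(window) < self.max_requests:
            window.append(now)
            return True
        return False


now = [0.0]
limiter = SlidingWindowLimiter(max_requests=2, time_window=10, clock=lambda: now[0])
first = limiter.is_allowed("user-1")
second = limiter.is_allowed("user-1")
third = limiter.is_allowed("user-1")     # over the limit within the window
now[0] = 10.0                            # earlier requests fall out of the window
fourth = limiter.is_allowed("user-1")
```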

📚 References & Resources

📖 Research Papers

"Cache-Augmented Language Models" - Research on caching in language models
"Efficient Memory Management for Large Language Models" - Memory optimization techniques

🛠️ Tools & Libraries

Redis - In-memory data structure store
aioredis - Async Redis client (now merged into redis-py as redis.asyncio)
Sentence Transformers - Text embeddings

📰 Articles & Blogs

"Redis Caching Best Practices" - Redis documentation
"Caching Strategies for AI Applications" - AWS caching guide

Last updated: December 2024
Contributors: [Your Name]
