# Semgrep

> **Static Analysis for Finding Security Vulnerabilities**

## 📋 Overview

Semgrep is an open-source static analysis tool for finding bugs, security vulnerabilities, and code quality issues. It analyzes source code with pattern matching, so no compilation is required.

## 🎯 Key Features

### 🐍 **Multi-Language Support**

* **Go**, **Java**, **JavaScript**, **TypeScript**, **Python**
* **Ruby**, **PHP**, **C**, **C++**, **C#**, **Swift**
* **Kotlin**, **Scala**, **Lisp**, **Lua**, **Dockerfile**
* **Terraform**, **YAML**, **JSON**, **Shell scripts**

### 🔍 **Detection Capabilities**

* **Security Vulnerabilities** - OWASP Top 10 categories, code patterns behind known CVEs
* **Bug Detection** - Null pointer dereference, race conditions
* **Code Quality** - Maintainability, complexity, style issues
* **Secrets & Credentials** - API keys, passwords, tokens
* **Business Logic** - Hardcoded values, improper validation

### ⚡ **Performance**

* **Fast Scanning** - Hundreds of thousands of lines per minute
* **Incremental** - Scans only the most recent changes
* **CI/CD Integration** - GitHub Actions, GitLab CI, Jenkins
* **Parallel Processing** - Multi-core utilization

## 🚀 Installation

### Using Package Managers

```bash
# Homebrew (macOS)
brew install semgrep

# pip (needs a Python 3 version supported by the current Semgrep release)
pip install semgrep

# Docker
docker run --rm -v "${PWD}:/src" returntocorp/semgrep semgrep --config=auto /src
```

### From Source

```bash
# Clone repository
git clone https://github.com/returntocorp/semgrep
cd semgrep

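# Install the Python CLI in editable mode (it lives in cli/).
# The OCaml engine, semgrep-core, is built separately; see the repository's contributor docs.
pip install -e cli/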
```

## 🔧 Basic Usage

### Scan Entire Project

```bash
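# Scan the current directory; rules are picked automatically for the detected languages
semgrep --config=auto .

# Scan a single directory
semgrep --config=auto src/

# Scan specific files (the ** glob needs globstar support in your shell)
semgrep --config=auto app/**/*.py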
```

### Using Rule Sets

```bash
# OWASP Top 10
semgrep --config=p/owasp-top-ten .

# Security rules
semgrep --config=p/security .

# Performance rules
semgrep --config=p/performance .

# Custom rules directory
semgrep --config=./rules/ .
```

### Output Formats

```bash
# Default (human-readable)
semgrep --config=auto .

# JSON output
semgrep --config=auto --json --output=results.json .

# SARIF format (for GitHub Security tab)
semgrep --config=auto --sarif --output=results.sarif .

# JUnit XML
semgrep --config=auto --junit-xml --output=junit.xml .
```
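
The JSON report is the easiest format to post-process. The following is a minimal sketch, assuming the report keeps findings under `results` with `check_id`, `path`, `start.line`, and `extra.severity` fields; the sample data and the helper names `summarize` and `format_finding` are made up for illustration.

```python
import json
from collections import Counter

# Trimmed, invented sample shaped like a Semgrep JSON report (assumed layout).
SAMPLE_REPORT = """
{
  "results": [
    {"check_id": "rules.insecure-hardcoded-secret", "path": "app/settings.py",
     "start": {"line": 12, "col": 1}, "end": {"line": 12, "col": 28},
     "extra": {"message": "Potential hardcoded secret detected.", "severity": "ERROR"}},
    {"check_id": "rules.insecure-function", "path": "app/utils.py",
     "start": {"line": 40, "col": 5}, "end": {"line": 40, "col": 22},
     "extra": {"message": "Using deprecated/insecure function", "severity": "WARNING"}}
  ],
  "errors": []
}
"""


def summarize(report: dict) -> Counter:
    """Count findings per severity level."""
    return Counter(r["extra"]["severity"] for r in report.get("results", []))


def format_finding(result: dict) -> str:
    """Render one finding as path:line [SEVERITY] rule-id."""
    severity = result["extra"]["severity"]
    return f'{result["path"]}:{result["start"]["line"]} [{severity}] {result["check_id"]}'


report = json.loads(SAMPLE_REPORT)
for finding in report["results"]:
    print(format_finding(finding))
print(dict(summarize(report)))
```

In a real pipeline, replace `SAMPLE_REPORT` with the contents of `results.json` written by the `--json --output=results.json` command.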

## 📝 Writing Custom Rules

### Rule Structure

```yaml
# rules/insecure-hardcoded-secret.yaml
rules:
  - id: insecure-hardcoded-secret
    message: |
      Potential hardcoded secret detected. Use environment variables or secret management.
    languages: [python, javascript, go]
    severity: ERROR
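    patterns:
      - pattern: $VAR = "..."
      # metavariable-regex is left-anchored, so allow any prefix with .*
      - metavariable-regex:
          metavariable: $VAR
          regex: "(?i).*(password|secret|token|key|api_key).*"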
```
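
Before relying on a name-based filter like this, it helps to check which variable names it actually catches. The sketch below reuses the rule's alternatives in plain Python `re`; it assumes Semgrep applies the regex left-anchored (like `re.match`), which is why the pattern is wrapped in `.*`. The helper `looks_like_secret_name` is hypothetical.

```python
import re

# Same alternatives as the rule's regex. The .* wrappers make a left-anchored
# match behave like a substring search (left-anchoring is an assumption here).
SECRET_NAME = re.compile(r"(?i).*(password|secret|token|key|api_key).*")


def looks_like_secret_name(name: str) -> bool:
    """Return True if the variable name would pass the rule's name filter."""
    return SECRET_NAME.match(name) is not None


for name in ["DB_PASSWORD", "apiToken", "username", "monkey"]:
    print(name, looks_like_secret_name(name))
```

Note that `monkey` passes the filter because it contains `key`. Short alternatives like this are a common source of false positives and usually need tightening, for example with word boundaries or underscores.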

### Advanced Patterns

```yaml
# rules/sql-injection.yaml
rules:
  - id: sql-injection-string-format
    message: "SQL injection vulnerability: using string formatting in SQL queries"
    languages: [python]
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: flask.request.$FUNC.get(...)
      - pattern: flask.request.$FUNC[...]
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
    pattern-sanitizers:
      # Casting to int neutralizes the input; wrapping a tainted string in sqlalchemy.text(...) does not
      - pattern: int(...)
```
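
To make the source-to-sink idea concrete, here is a small runnable sketch of the code shape such a rule targets and its parameterized fix. It uses the standard-library `sqlite3` module instead of Flask so it runs anywhere; the `name` argument stands in for a value read from `flask.request`, and the table and function names are invented.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """Vulnerable: untrusted input is formatted straight into the SQL text."""
    cursor = conn.cursor()
    query = "SELECT name FROM users WHERE name = '%s'" % name
    cursor.execute(query)  # single-argument execute: the sink shape in the rule
    return sorted(row[0] for row in cursor.fetchall())


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    """Fixed: the value is passed as a bound parameter, never as SQL text."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM users WHERE name = ?", (name,))
    return sorted(row[0] for row in cursor.fetchall())


payload = "x' OR '1'='1"  # classic injection string
conn = make_db()
print(find_user_unsafe(conn, payload))  # leaks every user
print(find_user_safe(conn, payload))    # matches nothing
```

The safe variant calls `execute` with two arguments, so a sink pattern written as `cursor.execute($QUERY)` does not match it.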

### Template Rules

```yaml
# rules/insecure-function.yaml
rules:
  - id: insecure-function
    message: Using deprecated/insecure function
    languages: [python]
    severity: WARNING
    patterns:
      - pattern: $FUNC(...)
      # The regex is anchored, so wrappers such as safe_eval(...) are not flagged
      - metavariable-regex:
          metavariable: $FUNC
          regex: "^(eval|exec|os\\.system|subprocess\\.\\w+)$"
```
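
A finding from a rule like this is most useful when the fix is obvious. As an illustration, the sketch below contrasts `eval` with the standard library's `ast.literal_eval` for parsing a configuration value; the function names are hypothetical.

```python
import ast


def parse_setting_unsafe(text: str):
    """Would be flagged: eval() executes any expression in the input."""
    return eval(text)


def parse_setting(text: str):
    """Safer: ast.literal_eval() accepts only Python literals."""
    return ast.literal_eval(text)


print(parse_setting("[1, 2, 3]"))

try:
    parse_setting("__import__('os').getcwd()")
except ValueError as exc:
    # Function calls are not literals, so the input is rejected.
    print("rejected:", type(exc).__name__)
```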

## 🔗 CI/CD Integration

### GitHub Actions

```yaml
# .github/workflows/semgrep.yml
name: Semgrep
on: [push, pull_request]

jobs:
  semgrep:
    runs-on: ubuntu-latest
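    # Run the CLI in the official image; the returntocorp/semgrep-action wrapper is deprecated
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: >-
          semgrep scan --error
          --config=p/owasp-top-ten
          --config=p/security
          --config=p/secrets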
```

### GitLab CI

```yaml
# .gitlab-ci.yml
semgrep:
  stage: test
  image: returntocorp/semgrep
  script:
    - semgrep --config=auto --gitlab-sast --output=gl-sast-report.json .
  artifacts:
    reports:
      sast: gl-sast-report.json
```

### Jenkins Pipeline

```groovy
pipeline {
    agent any
    stages {
        stage('Semgrep') {
            steps {
                script {
                    sh 'semgrep --config=auto --sarif --output=semgrep.sarif .'
                    // SARIF is JSON, not HTML, so archive it as a build artifact
                    archiveArtifacts artifacts: 'semgrep.sarif', allowEmptyArchive: false
                }
            }
        }
    }
}
```
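
Whichever CI system runs the scan, a small gate script can decide whether the build should fail. This is a minimal sketch that reads a SARIF 2.1.0 document and fails on `error`-level results; the sample document is invented, and the helpers `count_by_level` and `exit_code` are hypothetical names.

```python
import json
from collections import Counter

# Invented sample following the SARIF 2.1.0 layout (runs -> results -> level).
SAMPLE_SARIF = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "Semgrep"}},
    "results": [
      {"ruleId": "sql-injection-string-format", "level": "error",
       "message": {"text": "SQL injection vulnerability"}},
      {"ruleId": "insecure-function", "level": "warning",
       "message": {"text": "Using deprecated/insecure function"}},
      {"ruleId": "insecure-function",
       "message": {"text": "Using deprecated/insecure function"}}
    ]
  }]
}
"""


def count_by_level(sarif: dict) -> Counter:
    """Count results per SARIF level; a missing level defaults to 'warning'."""
    levels = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            levels[result.get("level", "warning")] += 1
    return levels


def exit_code(sarif: dict, fail_on=("error",)) -> int:
    """Return 1 if any result has a level listed in fail_on, else 0."""
    levels = count_by_level(sarif)
    return 1 if any(levels[level] for level in fail_on) else 0


sarif = json.loads(SAMPLE_SARIF)
print(dict(count_by_level(sarif)))
print("exit code:", exit_code(sarif))
```

In a pipeline, load `semgrep.sarif` from disk and pass the result of `exit_code` to `sys.exit`.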

## 📊 Rule Categories

### Security Rules (`p/security`)

* **Authentication** - Weak authentication, session management
* **Authorization** - Access control bypass, privilege escalation
* **Cryptographic Issues** - Weak algorithms, improper key handling
* **Injection** - SQL, NoSQL, command injection vulnerabilities
* **XSS** - Cross-site scripting vulnerabilities

### OWASP Top 10 (`p/owasp-top-ten`)

* **A01 Broken Access Control**
* **A02 Cryptographic Failures**
* **A03 Injection**
* **A04 Insecure Design**
* **A05 Security Misconfiguration**
* **A06 Vulnerable and Outdated Components**
* **A07 Identification and Authentication Failures**
* **A08 Software and Data Integrity Failures**
* **A09 Security Logging and Monitoring Failures**
* **A10 Server-Side Request Forgery**

### Code Quality (`p/correctness`)

* **Null Pointer** - Potential null dereferences
* **Memory Leaks** - Resource management issues
* **Logic Errors** - Incorrect conditionals, unreachable code
* **Type Safety** - Type conversion issues

## ⚡ Advanced Features

### Taint Analysis

```bash
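# Taint tracking is switched on per rule with `mode: taint`; there is no CLI flag for it
semgrep --config=rules/sql-injection.yaml .

# Show the source-to-sink path for each taint finding
semgrep --config=rules/sql-injection.yaml --dataflow-traces .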
```

### Interprocedural Analysis

```bash
# Cross-function and cross-file analysis needs the Semgrep Pro engine (requires `semgrep login`)
semgrep --config=auto --pro .

# Skip files larger than about 1 MB to keep scans fast
semgrep --config=auto --max-target-bytes=1000000 .
```

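### Usage Metrics and Timing

```bash
# --metrics controls anonymous usage telemetry, not code metrics.
# --config=auto requires it to stay enabled, so use local rules when turning it off.
semgrep --config=./rules/ --metrics=off .

# Print timing statistics for rules and files
semgrep --config=auto --time .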
```

## 🔧 Configuration

### `.semgrepignore` file

```
# Exclude directories
node_modules/
vendor/
.git/

# Exclude files
*.min.js
*.test.js
*_generated.py

# Exclude patterns
**/migrations/**
**/seeds/**
```

### Semgrep Configuration (`.semgrep.yml`)

```yaml
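# .semgrep.yml
# Path filters are set per rule under `paths`; the rule below is illustrative.
rules:
  - id: no-eval-in-src
    message: Avoid eval() in application code
    languages: [python]
    severity: WARNING
    pattern: eval(...)
    paths:
      include:
        - src/
      exclude:
        - tests/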
```

## 📈 Best Practices

### Rule Development

1. **Start Small** - Simple rules first
2. **Test Thoroughly** - Use test cases
3. **Minimize False Positives** - Refine patterns
4. **Document Clearly** - Explain findings
5. **Version Control** - Track rule changes
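
For step 2, Semgrep can check rules against annotated target files via `semgrep --test`. The sketch below shows what such a target might look like for a rule with the id `insecure-hardcoded-secret`, assuming the rule flags string literals assigned to secret-looking names and that the file sits next to the rule with the same base name; the variable names are invented.

```python
# rules/insecure-hardcoded-secret.py (test target stored next to the rule file)
import os

# ruleid: insecure-hardcoded-secret
db_password = "hunter2"

# ok: insecure-hardcoded-secret
db_password_from_env = os.environ.get("DB_PASSWORD", "")

# ok: insecure-hardcoded-secret
greeting = "hello"
```

A `# ruleid:` comment marks the next line as an expected finding, and `# ok:` marks a line that must stay clean. Running `semgrep --test rules/` reports any mismatch, which makes regressions visible when a pattern is refined.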

### Performance Optimization

```bash
# Use include/exclude patterns
semgrep --config=auto --include='**/*.py' .

# Limit file size
semgrep --config=auto --max-target-bytes=500000 .

# Parallel processing
semgrep --config=auto --jobs=4 .
```

### Team Collaboration

```bash
# Share rules via Git
git clone https://github.com/myorg/security-rules

# Use Semgrep Registry
semgrep --config=r/myorg.security .
```

## 🎓 Learning Resources

### Official Documentation

* [Semgrep Documentation](https://semgrep.dev/docs/)
* [Rule Writing Guide](https://semgrep.dev/docs/writing-rules/)
* [Registry](https://semgrep.dev/rule-registry/)
* [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=Semgrep.semgrep)

### Community Resources

* [Semgrep Community Slack](https://r2c.dev/slack)
* [GitHub Discussions](https://github.com/returntocorp/semgrep/discussions)
* [Blog](https://semgrep.dev/blog)
* [YouTube Channel](https://www.youtube.com/c/returntocorp)

## 📊 Comparison with Other Tools

| Feature            | Semgrep | SonarQube     | CodeQL        | Checkmarx     |
| ------------------ | ------- | ------------- | ------------- | ------------- |
| **Open Source**    | ✅       | ✅ (Community) | ✅             | ❌             |
| **Custom Rules**   | ✅       | ✅             | ✅             | ✅             |
| **Performance**    | ⚡ Fast  | 🐢 Slow       | 🚀 Fast       | 🐢 Slow       |
| **CI/CD**          | ✅       | ✅             | ✅             | ✅             |
| **Learning Curve** | 📚 Easy | 📚 Medium     | 📚 Hard       | 📚 Hard       |
| **Cost**           | 🆓 Free | 💰 Enterprise | 💰 Enterprise | 💰 Enterprise |

## 🔧 Troubleshooting

### Common Issues

```bash
# Rule not matching
semgrep --debug --config=auto .

# Performance issues
semgrep --config=auto --jobs=1 --max-target-bytes=100000 .

# Too much noise: report only ERROR-severity findings
semgrep --config=auto --severity=ERROR .
```
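
When a specific finding has been reviewed and accepted, an inline `nosemgrep` comment suppresses it on that line without loosening the rule for everyone. A minimal sketch follows; the rule id `example-weak-hash-md5` is a hypothetical placeholder, so substitute the id shown in your scan output.

```python
import hashlib


def cache_key(payload: bytes) -> str:
    """Build a cache key; MD5 is used for bucketing only, not for security."""
    # The rule id after "nosemgrep:" is a placeholder for illustration.
    return hashlib.md5(payload).hexdigest()  # nosemgrep: example-weak-hash-md5


print(cache_key(b"report-2024"))
```

Naming the rule id keeps the suppression narrow; a bare `# nosemgrep` silences every rule on that line.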

### Performance Tuning

```bash
# Exclude large files
semgrep --config=auto --exclude='**/*.min.js' .

# Use specific rule sets
semgrep --config=p/security .

# Cap the time spent running a rule on a single file (seconds)
semgrep --config=auto --timeout=10 .
```

## 🛡️ Enterprise Features

### Semgrep AppSec Platform

* **Dashboard** - Centralized vulnerability management
* **Team Management** - Role-based access control
* **Triage** - Issue tracking and resolution
* **Reporting** - Compliance and audit reports
* **Integration** - SSO, Slack, Jira integration

### Pricing Tiers

* **Free** - Open source, unlimited scans
* **Team** - $99/seat/month - Collaboration features
* **Business** - $299/seat/month - Advanced features
* **Enterprise** - Custom - Full platform

***

**🔒 Remember**: Static analysis is just one layer of security. Combine with dynamic testing, code review, and security training for comprehensive protection.

**⚡ Pro Tip**: Start with auto rules, then gradually add custom rules based on your application's specific security requirements.

*📅 Last Updated: 2024*
