semgrep-rules

Configures and runs Semgrep - pattern-based SAST across 30+ languages with the Semgrep Registry rulesets (`p/owasp-top-ten`, `p/default`, `auto`) plus custom YAML rules; integrates `semgrep ci` for PR-blocking gates with `--baseline-commit` diff-aware scanning, per-finding inline `nosemgrep` suppressions, `--exclude` / `--include` path filters, output formats (`--json` / `--sarif` / `--gitlab-sast` / `--junit-xml`), and severity filter (INFO/WARNING/ERROR). Use when the user runs Semgrep, asks about pattern rules, or needs a low-friction SAST gate without semantic-DB setup.

Overview

Per semgrep.dev/docs/getting-started/quickstart (cited below as sg-quick):

Semgrep is a fast pattern-based static analyzer covering 30+ languages with a registry of community + paid rulesets and straightforward YAML rule authoring. The semgrep ci subcommand adds CI-aware features (baseline-diff, organization policies, metrics).

Per semgrep.dev/docs/cli-reference (cited below as sg-cli):

"semgrep scan: Local scans without account requirement; doesn't return failing codes by default.

semgrep ci: Pipeline execution with organization policies, diff-aware scanning, returns failing codes on findings."

When to use

  • The repo has a .semgrep.yml / .semgrep/ directory or wants zero-config registry rulesets.
  • The user needs PR-time SAST without standing up SonarQube or CodeQL infrastructure.
  • A team prefers pattern-DSL rule authoring over semantic-database query languages.

Step 1 - Install

Per sg-quick:

# macOS
brew install semgrep

# Linux/macOS
pipx install semgrep
# or
uv tool install semgrep

# Windows (PowerShell)
pipx install semgrep

# Docker (CI-friendly)
docker pull semgrep/semgrep

Step 2 - First scan

semgrep scan --config auto

Per sg-cli, --config auto "Auto-fetch rules from registry based on project." Specific rulesets:

semgrep scan --config p/owasp-top-ten     # OWASP Top 10
semgrep scan --config p/default            # broad community ruleset
semgrep scan --config p/python --config p/javascript   # multiple: repeat --config per ruleset
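
When the scan is launched from a script, each ruleset needs its own --config flag; a bare second value would be read as a scan target. The sketch below only builds the argument list and does not run Semgrep. The helper name `build_scan_command` is hypothetical, not part of Semgrep.

```python
from typing import List, Sequence


def build_scan_command(configs: Sequence[str], extra_args: Sequence[str] = ()) -> List[str]:
    """Return an argv list for `semgrep scan` with one --config flag per ruleset."""
    if not configs:
        raise ValueError("at least one ruleset or rule file is required")
    argv = ["semgrep", "scan"]
    for config in configs:
        # Repeat the flag for every ruleset instead of listing values after one flag.
        argv += ["--config", config]
    argv += list(extra_args)
    return argv


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run() to execute it.
    print(" ".join(build_scan_command(["p/python", "p/javascript"])))
```

Passing the list form to `subprocess.run` avoids shell quoting issues with ruleset names or paths.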

Step 3 - Custom rule authoring

A minimal Semgrep rule in .semgrep.yml:

rules:
  - id: hardcoded-jwt-secret
    pattern: jwt.sign($PAYLOAD, "...")
    message: Hardcoded JWT secret detected
    languages: [javascript, typescript]
    severity: ERROR
    metadata:
      cwe: "CWE-798: Use of Hard-coded Credentials"

Common pattern operators include pattern, pattern-either, pattern-not, metavariable-pattern, and pattern-inside. Validate rule syntax before committing:

semgrep validate --config .semgrep.yml

(Per sg-cli subcommand list.)
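
As a rough pre-check before running the validator, a script can confirm that a rule carries the fields used in the minimal example above. This is a sketch for search-mode rules only (taint-mode rules use other keys), and it is no substitute for Semgrep's own validation. The rule is written as a Python dict to avoid a YAML dependency, and `missing_rule_fields` is a hypothetical helper name.

```python
# Fields present in the minimal rule above; Semgrep's validator is the authority.
REQUIRED_FIELDS = ("id", "message", "languages", "severity")
# A search-mode rule needs at least one of these pattern keys.
PATTERN_KEYS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def missing_rule_fields(rule: dict) -> list:
    """Return the names of fields a search-mode rule is missing."""
    missing = [field for field in REQUIRED_FIELDS if field not in rule]
    if not any(key in rule for key in PATTERN_KEYS):
        missing.append("pattern")
    return missing


if __name__ == "__main__":
    rule = {
        "id": "hardcoded-jwt-secret",
        "pattern": 'jwt.sign($PAYLOAD, "...")',
        "message": "Hardcoded JWT secret detected",
        "languages": ["javascript", "typescript"],
        "severity": "ERROR",
    }
    print(missing_rule_fields(rule) or "rule has all expected fields")
```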

Step 4 - CI integration with baseline diff

Per sg-cli:

"--baseline-commit=VAL - Show only findings not in specified commit"

- run: semgrep ci --baseline-commit=main --json --output=semgrep.json

Diff-aware mode is critical for legacy adoption - only NEW findings on the PR fail; pre-existing findings are tracked but don't block.
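
The idea behind diff-aware scanning can be illustrated with a small set-difference sketch: findings already present at the baseline are matched off, and only the remainder counts as new. This is a conceptual simplification, not Semgrep's actual matching logic. It assumes each finding is a dict with `check_id` and `path` keys (as in the JSON output), and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Finding = Dict[str, object]


def finding_key(finding: Finding) -> Tuple[str, str]:
    """Identify a finding by rule and file; line numbers are ignored because they shift."""
    return (str(finding["check_id"]), str(finding["path"]))


def new_findings(baseline: List[Finding], current: List[Finding]) -> List[Finding]:
    """Return findings in `current` that have no counterpart in `baseline`."""
    remaining = Counter(finding_key(f) for f in baseline)
    fresh = []
    for finding in current:
        key = finding_key(finding)
        if remaining[key] > 0:
            remaining[key] -= 1  # matched a pre-existing finding; does not block
        else:
            fresh.append(finding)  # introduced after the baseline; should block
    return fresh
```

Counting per (rule, file) pair means a second occurrence of an old rule violation in the same file is still reported as new.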

Step 5 - False-positive triage (MANDATORY)

Suppression mechanisms in priority order:

| Mechanism | Example | When to use |
|---|---|---|
| Per-line nosemgrep comment | `# nosemgrep: hardcoded-password` | Justified single-line exception |
| nosemgrep block | `# nosemgrep: rule-id` above a code block | Multi-line exception |
| paths.exclude in config | `exclude: ["**/*_pb.go"]` | Generated files / vendored code |
| Baseline ref | `--baseline-commit=main` (Step 4) | Legacy debt; ratchet |
| Organization-level rule disable | Semgrep AppSec Platform UI | Team-wide policy |

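Justification template (mandatory in code). Keep the nosemgrep comment on the line directly above the flagged code; Semgrep only honors it on the finding's first line or the line before it:

# Reason: Test fixture; password never reaches production runtime
# Reviewer: alice@example.com (2026-05-15)
# Expires: 2026-12-15
# nosemgrep: hardcoded-password
TEST_PASSWORD = "test-only-password-do-not-deploy"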

Per sg-cli severity filter for triage workflow:

semgrep scan --severity ERROR --json   # only ERROR-severity findings

Cadence: audit nosemgrep suppressions for staleness every quarter. Remove expired ones, and review those that persist across audits for escalation.
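
The quarterly audit can be partly automated. The sketch below finds comment blocks that contain both a nosemgrep marker and an `Expires:` date from the justification template, and reports those past their date. It assumes `#`-style comments and ISO dates (YYYY-MM-DD); other comment syntaxes would need extra handling, and `expired_suppressions` is a hypothetical helper name.

```python
import re
from datetime import date
from typing import List, Tuple

NOSEMGREP_RE = re.compile(r"nosemgrep(?::\s*(?P<rule>[\w.\-]+))?")
EXPIRES_RE = re.compile(r"Expires:\s*(?P<date>\d{4}-\d{2}-\d{2})")


def expired_suppressions(source: str, today: date) -> List[Tuple[int, str, date]]:
    """Return (line_number, rule_id, expiry) for suppressions whose Expires date has passed."""
    expired: List[Tuple[int, str, date]] = []
    block: List[Tuple[int, str]] = []  # current run of consecutive comment lines

    def flush() -> None:
        rule_line, rule_id, expiry = None, "", None
        for number, text in block:
            marker = NOSEMGREP_RE.search(text)
            if marker and rule_line is None:
                rule_line, rule_id = number, marker.group("rule") or "(all rules)"
            stamp = EXPIRES_RE.search(text)
            if stamp:
                expiry = date.fromisoformat(stamp.group("date"))
        if rule_line is not None and expiry is not None and expiry < today:
            expired.append((rule_line, rule_id, expiry))
        block.clear()

    for number, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            block.append((number, line))
        else:
            flush()  # a code or blank line ends the comment block
    flush()
    return expired
```

Running this over each tracked source file with `date.today()` yields the list of suppressions to remove or renew. Suppressions with no `Expires:` line are not reported here and would need a separate check.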

Step 6 - Output formats per sg-cli

| Flag | Purpose |
|---|---|
| `--json` | Semgrep JSON format (for sast-finding-triager) |
| `--sarif` | SARIF format (GitHub Code Scanning upload) |
| `--gitlab-sast` | GitLab SAST format (GitLab Security Dashboard) |
| `--junit-xml` | JUnit XML (test reporters) |
| `--text` | Default human-readable |
| `--output VAL` | Write to file or URL |
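
The JSON output is the easiest format to post-process. As a minimal sketch, the function below counts findings per severity. It assumes the report has a top-level `results` list whose entries carry `extra.severity`; verify this against the output of your Semgrep version. `severity_counts` is a hypothetical helper name.

```python
import json
from collections import Counter


def severity_counts(report_text: str) -> Counter:
    """Count findings per severity in a Semgrep JSON report (assumed layout: results[].extra.severity)."""
    report = json.loads(report_text)
    return Counter(
        result.get("extra", {}).get("severity", "UNKNOWN")
        for result in report.get("results", [])
    )


if __name__ == "__main__":
    # semgrep.json is the file written by --json --output=semgrep.json
    with open("semgrep.json", encoding="utf-8") as handle:
        for severity, count in severity_counts(handle.read()).most_common():
            print(f"{severity}: {count}")
```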

Step 7 - Performance flags

semgrep scan -j 8 --timeout 10 --max-target-bytes 5000000

Per sg-cli:

  • -j VALUE - Parallelism degree (default: 3)
  • --timeout=DOUBLE - Per-rule per-file timeout in seconds (default: 5.0)
  • --max-target-bytes=VALUE - Skip files exceeding size (default: 1000000)

Step 8 - Exit codes (per sg-cli)

| Code | Meaning |
|---|---|
| 0 | Success, no issues |
| 1 | Issues detected (with --error flag) |
| 2 | Fatal error |
| 3 | Invalid syntax in scanned language |
| 4 | Invalid pattern in rule |
| 5 | Invalid YAML configuration |
| 7 | Invalid rule in configuration |
| 8 | Unsupported language specified |
| 13 | Invalid API key |
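
A CI wrapper usually needs to tell "findings were reported" apart from "the tool itself broke". The sketch below encodes the table above as a lookup. The helper names are hypothetical, and the mapping is only as current as the table it was copied from.

```python
# Exit codes copied from the table above.
EXIT_CODES = {
    0: "Success, no issues",
    1: "Issues detected (with --error flag)",
    2: "Fatal error",
    3: "Invalid syntax in scanned language",
    4: "Invalid pattern in rule",
    5: "Invalid YAML configuration",
    7: "Invalid rule in configuration",
    8: "Unsupported language specified",
    13: "Invalid API key",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of a Semgrep exit code, or a fallback for unlisted codes."""
    return EXIT_CODES.get(code, f"Unlisted exit code {code}")


def is_tool_failure(code: int) -> bool:
    """True when the scan itself failed, as opposed to completing with or without findings."""
    return code not in (0, 1)
```

Treating tool failures separately keeps a misconfigured rule or expired API key from being mistaken for a clean scan or for real findings.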

Step 9 - CI integration (GitHub Actions)

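on:
  pull_request:
jobs:
  semgrep:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # needed by upload-sarif
    container:
      image: semgrep/semgrep
    env:
      SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}   # assumes this secret exists
    steps:
      - uses: actions/checkout@v5
        with: { fetch-depth: 0 }   # full history so the baseline can be resolved
      # If the local branch is absent in CI, try origin/main as the baseline.
      - run: semgrep ci --baseline-commit=main --sarif --output=semgrep.sarif
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with: { sarif_file: semgrep.sarif }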

Anti-patterns

| Anti-pattern | Why it fails | Fix |
|---|---|---|
| `--config=auto` everywhere | Rulesets drift; no ownership | Pin specific rulesets (Step 2) |
| nosemgrep without justification | Becomes invisible debt | Required justification template (Step 5) |
| No baseline ref | Every legacy finding blocks; team disables the gate | `--baseline-commit=main` (Step 4) |
| `semgrep scan` in CI | Doesn't return a failing exit code by default | Use `semgrep ci` (see the Overview quote) |
| Combining `--severity ERROR` with `--baseline-commit` carelessly | Can mask real new findings | Apply the severity filter at the output stage, not the scan stage |
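
Filtering by severity at the output stage can look like the sketch below: scan without `--severity`, then narrow the parsed JSON results afterwards so nothing is hidden from the baseline comparison. It assumes each result carries `extra.severity` with the INFO/WARNING/ERROR levels named above; `at_least` is a hypothetical helper name.

```python
from typing import Dict, List

SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}


def at_least(results: List[Dict], minimum: str = "ERROR") -> List[Dict]:
    """Keep results at or above `minimum` severity.

    Results with a missing or unrecognized severity are kept rather than
    silently dropped, so an unexpected label cannot mask a finding.
    """
    threshold = SEVERITY_RANK[minimum]
    unknown_rank = len(SEVERITY_RANK)  # ranks above every known level
    return [
        result
        for result in results
        if SEVERITY_RANK.get(result.get("extra", {}).get("severity"), unknown_rank) >= threshold
    ]
```

The full result set stays available for the audit trail, while the gate or report consumes only the filtered view.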

Limitations

  • Pattern matching can miss cross-file taint flows; for those, pair with codeql-queries.
  • Registry rulesets evolve; pin specific versions for production.
  • Semgrep AppSec Platform features (org policies, supply-chain scanning) are paid; the OSS engine covers the patterns above.

References