qa-role-ai

AI/ML & data-pipeline QA role bundle: one-command install of LLM evaluation, ML model testing, AI-assisted test generation, search relevance, data notebooks, and data quality.

Install this role bundle

/plugin install qa-role-ai@testland-qa

One command installs all 6 member plugins. Requires Claude Code v2.1.110+ (v2.1.143+ to enable the whole set together).

AI/ML & data-pipeline QA

Installing this plugin installs all 6 member plugins listed below.

Install

/plugin marketplace add testland/qa
/plugin install qa-role-ai@testland-qa

Claude Code resolves and installs the member plugins automatically and lists what it added. Requires Claude Code v2.1.110+ (v2.1.143+ to enable the whole set together).

What this installs

qa-llm-evaluation - LLM and prompt evaluation
qa-ml-models - ML model testing
qa-ai-assisted - AI-assisted test generation + curation
qa-search-relevance - Search relevance testing
qa-data-notebooks - Jupyter notebook testing
qa-data-quality - Data quality testing for analytical pipelines

About role bundles

This is a role bundle: a plugin that ships no skills or agents of its own. It exists only to install a curated set of testing plugins together, so you adopt a whole role in one command instead of installing each plugin by hand. Prefer a narrower set? Install only the member plugins you need, one at a time.

Installs these 6 plugins

qa-llm-evaluation

LLM and prompt evaluation: 7 skills (deepeval-evaluation, giskard-llm, langfuse-tracing, llm-regression-suite-author, openai-evals, promptfoo-evaluation, ragas-evaluation) and 2 agents (llm-red-team-planner, prompt-eval-reviewer). Covers the mainstream open-source LLM evaluation ecosystem: Promptfoo, OpenAI Evals, DeepEval, and Ragas for functional evaluation; Giskard for adversarial scanning; Langfuse for production observability.

qa-ml-models

ML model testing: 6 skills (alibi-explainability, deepchecks-tests, evidently-monitoring, fairlearn-fairness, giskard-tests, model-performance-regression-gate) and 2 agents (data-drift-incident-responder, model-fairness-reviewer). Covers vulnerability scanning, drift monitoring, group fairness, and per-prediction explainability.
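
To make "group fairness" concrete, here is a minimal sketch of one common group-fairness measure, the demographic parity difference: the gap between the highest and lowest positive-prediction rates across groups. It is plain Python written for illustration only. It is not taken from this plugin or from Fairlearn's API, and the function names and sample data are assumptions.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Return the share of positive (== 1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(y_pred, groups):
    """Gap between the highest and lowest per-group selection rate.

    0.0 means every group receives positive predictions at the same rate.
    """
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical predictions for two groups, "a" and "b".
y_pred = [1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
gap = demographic_parity_difference(y_pred, groups)
```

A fairness gate would typically compare `gap` with a threshold chosen by the team; the right threshold depends on the use case and is not specified here.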

qa-ai-assisted

AI-assisted test generation + curation: 3 skills (ai-spec-coverage-mapper, ai-test-generator, model-based-test-graph-author) and 3 agents (ai-test-curator, ai-test-shallow-coverage-critic, mbt-suite-builder).

qa-search-relevance

Search relevance testing: 6 skills (elasticsearch-relevance-tests, hybrid-search-eval-author, judgment-list-author, opensearch-relevance-tests, solr-relevance-tests, vector-search-precision-tests) and 1 agent (relevance-regression-reviewer). Detects relevance regressions using IR metrics: NDCG, MRR, and Recall@k.
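
For readers unfamiliar with the metrics named above, the following is a minimal sketch of Recall@k, MRR, and NDCG@k, plus a simple regression check against a baseline score. It is illustrative plain Python, not code from this plugin. The function names, the linear-gain form of DCG, and the `tolerance` default are assumptions.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def ndcg_at_k(ranked_ids, grades, k):
    """NDCG@k with linear gain; grades maps doc id -> graded relevance."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([grades.get(doc_id, 0) for doc_id in ranked_ids[:k]])
    ideal = dcg(sorted(grades.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def regressed(baseline, candidate, tolerance=0.02):
    """True if the candidate score fell more than `tolerance` below baseline."""
    return candidate < baseline - tolerance
```

In a regression check, each metric would be computed per query from a judgment list, averaged over the query set, and compared with the score recorded for the previous index or ranking configuration.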

qa-data-notebooks

Jupyter notebook testing: 4 skills (nbval-tests, notebook-ci-pipeline-author, papermill-tests, testbook-tests) and 1 agent (notebook-quality-reviewer). Covers full-notebook regression (nbval), function-level unit tests (testbook), and parameterized execution (papermill).

qa-data-quality

Data quality testing for analytical pipelines: 5 skills (dbt-testing, great-expectations, soda-checks, data-quality-gate, data-quality-conventions) and 2 agents (schema-diff-reviewer, data-anomaly-triager).