trace-spec-author
Build a trace specification document per feature - defines the trace shape (root span + child spans + key attributes per OpenTelemetry semantic conventions) that production code MUST emit. The spec drives both implementation reviews AND trace-assertion tests, so a single declarative document is the source of truth for what observability "looks like" for a feature.
Builds a per-feature trace specification document. The spec is canonical input for: (a) implementation review during code review, (b) trace-shape regression tests via opentelemetry-trace-assertions and jaeger-trace-tests, (c) onboarding for new team members.
When to use
Step 1 - Spec template
Save under docs/observability/<feature>.md:
```markdown
# Trace Spec: <Feature>

**Owner:** team-checkout
**Status:** active | draft | deprecated
**Last reviewed:** YYYY-MM-DD
**Implementations:** orders-svc 1.4+, payments-svc 2.1+

## Root span

| Field | Value |
|---|---|
| name | `order.create` |
| kind | `INTERNAL` |
| status semantics | OK on 2xx; UNSET on 4xx (validation); ERROR on 5xx |

### Required attributes

| Attribute | Type | Source | Notes |
|---|---|---|---|
| `order.id` | string | UUID v4 | |
| `order.item_count` | int | int.between(1, 1000) | |
| `customer.id` | string | low-cardinality hash, never PII | |

## Child spans

### `db.query` (when persisting order)

| Field | Value |
|---|---|
| name | `db.query` |
| kind | `CLIENT` |
| parent | `order.create` |

Attributes per [OpenTelemetry DB semantic conventions]:

| Attribute | Type | Notes |
|---|---|---|
| `db.system.name` | string | "postgresql" |
| `db.operation.name` | string | "INSERT" |

### `payments.charge` (cross-service)

| Field | Value |
|---|---|
| name | `payments.charge` |
| kind | `CLIENT` (in orders-svc); peer is `SERVER` in payments-svc |
| parent | `order.create` |

Attributes:
- `payment.amount_cents` (int)
- `payment.currency` (string, ISO 4217)
- per [OpenTelemetry HTTP semantic conventions]: `http.request.method`, `url.full`, `server.address`, `server.port`, `http.response.status_code`

## Status mapping

| Outcome | Span status | Notes |
|---|---|---|
| Success (2xx) | OK | |
| Validation error (4xx) | UNSET on server side; ERROR on client side | Per [HTTP semantic conventions] |
| Server error (5xx) | ERROR | Always |

## Required test assertions

- [ ] root span name == `order.create`
- [ ] root span has all required attrs
- [ ] `db.query` is child of root
- [ ] `payments.charge` is child of root
- [ ] On 5xx from payments, `payments.charge.status == ERROR` AND `order.create.status == ERROR`
- [ ] `payment.amount_cents` always set when payment succeeds

## Change log

| Date | Change | Reviewer |
|---|---|---|
| 2026-04-01 | Initial spec | @reviewer |
| 2026-04-15 | Added `customer.id` low-cardinality requirement | @reviewer |
```

Step 2 - Map to OpenTelemetry semantic conventions
For each span, decide:
Per the OpenTelemetry HTTP semantic conventions, HTTP client spans require http.request.method, url.full, server.address, and server.port. Two more are conditionally required: http.response.status_code (when a response was received) and error.type (when the request ended in an error).
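To make this mapping checkable in a review or test, here is a minimal stdlib sketch that reports which SemConv attributes an HTTP client span is missing, treating the status code and error.type as conditionally required the way the conventions do. It assumes the span's attributes are available as a mapping (exported SDK spans expose `attributes` that way); `missing_http_client_attrs`, its two boolean flags, and the example URL are hypothetical.

```python
# Attributes the HTTP semantic conventions require on every client span.
HTTP_CLIENT_REQUIRED = ("http.request.method", "url.full", "server.address", "server.port")


def missing_http_client_attrs(attrs, response_received: bool, failed: bool) -> list[str]:
    """Return the SemConv attribute names a client span is missing.

    attrs: the span's attribute mapping.
    response_received: whether an HTTP response (and so a status code) came back.
    failed: whether the request ended in an error.
    """
    missing = [name for name in HTTP_CLIENT_REQUIRED if name not in attrs]
    # Conditionally required: only when a response status was received.
    if response_received and "http.response.status_code" not in attrs:
        missing.append("http.response.status_code")
    # Conditionally required: only when the request ended in an error.
    if failed and "error.type" not in attrs:
        missing.append("error.type")
    return missing


# Hypothetical payments.charge client span that got a 502 back.
span_attrs = {
    "http.request.method": "POST",
    "url.full": "https://payments.example.com/charges",
    "server.address": "payments.example.com",
    "server.port": 443,
    "http.response.status_code": 502,
}
print(missing_http_client_attrs(span_attrs, response_received=True, failed=True))
# ['error.type']
```

The 502 counts as a failure, so the span also needs error.type (for example, the status code as a string) before it satisfies the conventions.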
Step 3 - Status semantics
Decide per outcome (referenced in your test spec):
| HTTP outcome | Server span | Client span |
|---|---|---|
| 1xx, 2xx, 3xx | UNSET | UNSET |
| 4xx | UNSET | ERROR |
| 5xx | ERROR | ERROR |
In short, per the OpenTelemetry HTTP semantic conventions: status stays unset for 1xx, 2xx, and 3xx responses unless some other error occurred; for 4xx responses it stays unset on server spans but should be set to Error on client spans; and all 5xx responses should be marked as Error.
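The table above reduces to a small pure function, which gives implementers and tests one shared definition instead of each deciding on their own. This is a sketch under stated assumptions: it returns the strings "UNSET" and "ERROR" rather than the SDK enum to stay dependency-free, and the `other_error` flag (for failures unrelated to the status code) is a hypothetical parameter. It also treats codes outside 100-599 as errors, since the conventions mark codes a client cannot interpret as Error.

```python
def http_span_status(status_code: int, kind: str, other_error: bool = False) -> str:
    """Map an HTTP outcome to a span status following the Step 3 table.

    kind is "SERVER" or "CLIENT"; other_error flags failures that happened
    regardless of the status code (e.g. a broken response body).
    """
    if kind not in ("SERVER", "CLIENT"):
        raise ValueError(f"unsupported span kind: {kind}")
    if other_error or not 100 <= status_code <= 499:
        # 5xx, or a code outside 100-599 that cannot be interpreted
        return "ERROR"
    if 400 <= status_code <= 499:
        # A 4xx is the caller's fault: an error for the client, not the server.
        return "ERROR" if kind == "CLIENT" else "UNSET"
    # 1xx, 2xx, 3xx with no other error
    return "UNSET"


for code, kind in [(200, "SERVER"), (404, "SERVER"), (404, "CLIENT"), (503, "SERVER")]:
    print(code, kind, http_span_status(code, kind))
```

In instrumented code, an "ERROR" result would become `span.set_status(Status(StatusCode.ERROR))` using `Status` and `StatusCode` from `opentelemetry.trace`; an "UNSET" result means leaving the status alone.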
Step 4 - Anti-cardinality rules (mandatory)
The spec MUST flag attributes that risk cardinality explosion in your observability backend:
| Risk | Example | Mitigation |
|---|---|---|
| User ID as attribute → unbounded series | user.id: 123e4567-... | Hash to bucket OR put in span events instead |
| URL with query params | url.full includes raw ?token=...&id=999 | Use http.route for low-cardinality template |
| Free-text error message | error.message: "User 123 …" | Use error.type enum + log line for detail |
Add to spec: **High-cardinality attributes:** none / [list].
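The first mitigation in the table, hashing an unbounded ID into a fixed number of buckets, and the spec's high-cardinality list can both be enforced in code. The stdlib sketch below assumes a deny-list copied from the spec; `HIGH_CARDINALITY_ATTRS`, `bucket`, `cardinality_violations`, and the `app.user_bucket` attribute name are all hypothetical.

```python
import hashlib

# Hypothetical deny-list; in practice, copy it from the spec's
# "High-cardinality attributes" line.
HIGH_CARDINALITY_ATTRS = frozenset({"user.id", "error.message"})


def bucket(value: str, buckets: int = 64) -> str:
    """Map an unbounded ID onto one of `buckets` stable labels.

    sha256 is used only for its even, deterministic spread; Python's built-in
    hash() is salted per process, so it would not be stable across services.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return f"bucket-{int.from_bytes(digest[:8], 'big') % buckets}"


def cardinality_violations(attrs: dict) -> list[str]:
    """Return the attribute keys the spec forbids for cardinality reasons."""
    return sorted(key for key in attrs if key in HIGH_CARDINALITY_ATTRS)


attrs = {"user.id": "123e4567", "order.id": "o-1"}
print(cardinality_violations(attrs))  # ['user.id']

# Replace the raw ID with its bucket before the attribute is set.
attrs["app.user_bucket"] = bucket(attrs.pop("user.id"))
print(cardinality_violations(attrs))  # []
```

Bucketing trades per-user drill-down for a bounded number of distinct values; when the raw ID is needed for debugging, the table's other option, a span event, is the place for it.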
Step 5 - Drive tests from spec
Each "Required test assertion" in Step 1 maps to one test in opentelemetry-trace-assertions (in-process) or jaeger-trace-tests (cross-service):
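```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Collect finished spans in memory so the test can inspect them.
memory_exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(memory_exporter))
trace.set_tracer_provider(provider)

def test_order_create_required_attrs():
    memory_exporter.clear()  # isolate from spans emitted by earlier tests
    create_order(items=[item])  # code under test; `item` comes from a fixture
    spans = memory_exporter.get_finished_spans()
    root = next(s for s in spans if s.name == "order.create")
    # Per docs/observability/order-create.md, "Required attributes"
    assert "order.id" in root.attributes
    assert isinstance(root.attributes["order.item_count"], int)
    assert "customer.id" in root.attributes
    # PII protection: must not contain a raw email
    assert "@" not in root.attributes["customer.id"]
```

The spec ↔ test ↔ implementation triangle is the value: drift between any two of the three shows up as a failing test or a review finding.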
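Because the required-attribute table is machine-readable, a test can read it from the spec instead of hard-coding attribute names. The stdlib sketch below assumes the spec uses exactly the `### Required attributes` heading and backticked attribute names of the Step 1 template; `required_attrs` and `check_span` are hypothetical helpers, not part of opentelemetry-trace-assertions.

```python
import re

# A trimmed copy of the Step 1 template; a real suite would read
# docs/observability/order-create.md instead.
SPEC = """\
## Root span
| Field | Value |
|---|---|
| name | `order.create` |
### Required attributes
| Attribute | Type | Source | Notes |
|---|---|---|---|
| `order.id` | string | UUID v4 | |
| `order.item_count` | int | int.between(1, 1000) | |
| `customer.id` | string | low-cardinality hash, never PII | |
## Child spans
"""

# Spec type names mapped to the Python types a span attribute should have.
_PY_TYPES = {"string": str, "int": int}


def required_attrs(spec_md: str) -> dict[str, str]:
    """Parse {attribute: type} rows from the '### Required attributes' table."""
    attrs: dict[str, str] = {}
    in_section = False
    for line in spec_md.splitlines():
        if line.startswith("#"):
            in_section = line.strip() == "### Required attributes"
            continue
        if in_section and line.startswith("|"):
            cells = [c.strip() for c in line.strip().strip("|").split("|")]
            name = re.fullmatch(r"`([^`]+)`", cells[0])
            if name:  # skips the header row and the |---| separator
                attrs[name.group(1)] = cells[1]
    return attrs


def check_span(attributes, spec_md: str) -> list[str]:
    """List every way a span's attributes violate the spec's required table."""
    problems = []
    for name, type_name in required_attrs(spec_md).items():
        if name not in attributes:
            problems.append(f"missing {name}")
        elif type_name in _PY_TYPES and not isinstance(attributes[name], _PY_TYPES[type_name]):
            problems.append(f"{name}: expected {type_name}")
    return problems


print(check_span({"order.id": "a1", "order.item_count": "3"}, SPEC))
# ['order.item_count: expected int', 'missing customer.id']
```

In the test above, `assert check_span(root.attributes, spec_text) == []` would replace the hand-written attribute checks, so adding a row to the spec table automatically tightens the test.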
Step 6 - Change-management process
Lock in a process:
Step 7 - Spec catalog
Maintain docs/observability/INDEX.md:
| Feature | Spec | Owner | Status |
|---|---|---|---|
| Order create | [order-create.md](./order-create.md) | @team-checkout | active |
| User signup | [user-signup.md](./user-signup.md) | @team-identity | active |
| Refund flow | [refund.md](./refund.md) | @team-payments | draft |

Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Spec lives in Confluence/Notion only | Drifts from code; not version-controlled | Repo-resident markdown (Step 1) |
| Spec uses prose ("emits a span when order is created") | Ambiguous; can't drive tests | Tabular required attrs + assertions (Step 1, Step 5) |
| Spec invents attribute names parallel to SemConv | Two ways to ask "what HTTP method"; analytics break | Use SemConv (Step 2) |
| Skip cardinality review | Bills surge from per-user spans | Mandatory cardinality section (Step 4) |
| No status semantics | Each implementer decides; alerts fire inconsistently | Status table (Step 3) |
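A cheap guard against the parallel-names anti-pattern is a lint over the attribute names a spec declares. The sketch below only knows a few older HTTP names that the stable HTTP semantic conventions replaced; `LEGACY_TO_SEMCONV` and `semconv_findings` are hypothetical, and a real lint would need a much fuller alias list (including team-invented synonyms).

```python
# Older HTTP attribute names and the stable SemConv names that replaced them.
LEGACY_TO_SEMCONV = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
}


def semconv_findings(attribute_names: list[str]) -> list[str]:
    """Flag declared attribute names that duplicate an existing SemConv attribute."""
    return [
        f"{name}: use {LEGACY_TO_SEMCONV[name]}"
        for name in attribute_names
        if name in LEGACY_TO_SEMCONV
    ]


print(semconv_findings(["http.method", "order.id", "url.full"]))
# ['http.method: use http.request.method']
```

Feeding it the attribute column of every spec listed in INDEX.md turns the SemConv rule from a review checklist item into a CI check.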