mobile-device-matrix-toolkit
Dispatches mobile UI test runs across a 3-tier device matrix (smoke per-PR, regression per-merge, full farm at release) to control CI cost: generates per-target Appium capability configs from a central YAML, parallelises via GitHub Actions matrix strategy, and aggregates JUnit XML into a cross-device pass/fail table. Use when the question is about which devices to run and when, not about how to configure a specific test framework (for that, use xcuitest-suite, espresso-suite, etc.).
mobile-device-matrix-toolkit
Overview
Mobile testing has a combinatorial explosion problem:
Running every test on every config = CI cost / time disaster.
This skill is a dispatcher that picks the right subset per cadence. It wraps the per-platform test runners (xcuitest-suite, espresso-suite, appium-testing, detox-testing, maestro-flows) and orchestrates matrix execution.
When to use
Step 1 - Define the three-tier matrix
# .matrix/devices.yaml
tier_smoke:
description: "Per-PR — every commit. Cheap, fast feedback."
ios:
- { device: "iPhone 15", os: "17.4" }
android:
- { device: "Pixel 7", api: 34 }
tier_regression:
description: "Per-merge to main. Wider coverage."
ios:
- { device: "iPhone 15", os: "17.4" }
- { device: "iPhone SE 3rd", os: "16.0" }
- { device: "iPad Pro 12.9", os: "17.4" }
android:
- { device: "Pixel 7", api: 34 }
- { device: "Pixel 5", api: 31 }
- { device: "Galaxy Tab S8", api: 34 }
tier_release:
description: "Pre-release. Full matrix; runs on device farm."
ios:
- device: "iPhone 15", os: "17.4"
- device: "iPhone 15 Pro Max", os: "17.4"
- device: "iPhone 14", os: "17.0"
- device: "iPhone 13", os: "16.0"
- device: "iPhone SE 3rd", os: "15.0"
- device: "iPad Pro 12.9", os: "17.4"
- device: "iPad Mini", os: "16.0"
android:
- { device: "Pixel 8 Pro", api: 34 }
- { device: "Pixel 7", api: 34 }
- { device: "Pixel 5", api: 31 }
- { device: "Galaxy S23", api: 33 }
- { device: "Galaxy A54", api: 33 }
- { device: "Pixel 4a", api: 30 }
- { device: "Galaxy Tab S8", api: 34 }Per-tier guidance:
| Tier | When | iOS / Android count | Wall time | Cost (est) |
|---|---|---|---|---|
| Smoke | Every PR push | 1 / 1 | ~5 min | local sim |
| Regression | Merge to main | 3 / 3 | ~20 min | local sim |
| Release | Pre-release tag | 7 / 7 | ~60 min | farm |
Step 2 - Generate per-target capabilities
# scripts/gen-matrix.py
import yaml, json, sys
cfg = yaml.safe_load(open(sys.argv[1]))
tier = sys.argv[2] # smoke | regression | release
targets = []
for ios in cfg[f'tier_{tier}']['ios']:
targets.append({
'name': f"ios-{ios['device'].replace(' ', '-')}-{ios['os']}",
'platform': 'iOS',
'capabilities': {
'platformName': 'iOS',
'appium:deviceName': ios['device'],
'appium:platformVersion': ios['os'],
'appium:automationName': 'XCUITest',
},
})
for and_ in cfg[f'tier_{tier}']['android']:
targets.append({
'name': f"android-{and_['device'].replace(' ', '-')}-api{and_['api']}",
'platform': 'Android',
'capabilities': {
'platformName': 'Android',
'appium:deviceName': and_['device'],
'appium:platformVersion': str(and_['api']),
'appium:automationName': 'UiAutomator2',
},
})
print(json.dumps(targets))CI uses this output as a matrix:
jobs:
generate-matrix:
outputs:
targets: ${{ steps.gen.outputs.targets }}
steps:
- id: gen
run: |
targets=$(python scripts/gen-matrix.py .matrix/devices.yaml ${{ inputs.tier }})
echo "targets=$targets" >> "$GITHUB_OUTPUT"
test:
needs: generate-matrix
strategy:
fail-fast: false
matrix:
target: ${{ fromJSON(needs.generate-matrix.outputs.targets) }}
runs-on: ${{ matrix.target.platform == 'iOS' && 'macos-15' || 'ubuntu-latest' }}
name: ${{ matrix.target.name }}
steps:
- run: ./scripts/run-tests.sh ${{ matrix.target.name }} '${{ toJSON(matrix.target.capabilities) }}'Step 3 - Per-tier dispatch
# .github/workflows/mobile-tests.yml
on:
pull_request:
paths: ['mobile/**']
push:
branches: [main]
release:
types: [created]
jobs:
smoke:
if: github.event_name == 'pull_request'
uses: ./.github/workflows/run-matrix.yml
with:
tier: smoke
regression:
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: ./.github/workflows/run-matrix.yml
with:
tier: regression
release:
if: github.event_name == 'release'
uses: ./.github/workflows/run-matrix.yml
with:
tier: releaseStep 4 - Aggregate per-target results
Each matrix shard uploads its JUnit XML; an aggregator job combines:
# scripts/aggregate-matrix.py
import xml.etree.ElementTree as ET
from collections import defaultdict
import sys
per_target = defaultdict(lambda: {'tests': 0, 'failures': 0, 'errors': 0, 'time': 0.0})
for f in sys.argv[1:]:
target = f.split('/')[-2] # extract from path
tree = ET.parse(f)
for ts in tree.iter('testsuite'):
per_target[target]['tests'] += int(ts.get('tests', 0))
per_target[target]['failures'] += int(ts.get('failures', 0))
per_target[target]['errors'] += int(ts.get('errors', 0))
per_target[target]['time'] += float(ts.get('time', 0))
# Render matrix
print('| Target | Tests | Pass | Fail | Time |')
print('|--------|------:|-----:|-----:|-----:|')
for target, m in sorted(per_target.items()):
pass_ = m['tests'] - m['failures'] - m['errors']
print(f"| {target} | {m['tests']} | {pass_} | {m['failures']+m['errors']} | {m['time']:.1f}s |")Output:
| Target | Tests | Pass | Fail | Time |
|---------------------------------|------:|-----:|-----:|-------:|
| ios-iPhone-15-17.4 | 42 | 42 | 0 | 320.4s |
| ios-iPhone-SE-3rd-16.0 | 42 | 41 | 1 | 295.1s | ← Pixel 5 only
| android-Pixel-7-api34 | 42 | 42 | 0 | 280.5s |
| android-Galaxy-Tab-S8-api34 | 42 | 40 | 2 | 290.0s | ← tablet layout issuesStep 5 - Device-farm vs local emulator decision
| Use device farm when | Use local emulator/simulator when |
|---|---|
| Real-device behavior matters (camera, sensors) | UI logic only |
| Specific OEM device under test (Samsung, foldable) | Stock OS suffices |
| iOS testing on Linux CI | macOS CI runner available |
| Release-tier matrix (cost amortizes over fewer runs) | Per-PR (cost per run too high) |
| Regulatory / certification testing | Rapid iteration |
Per-farm wiring:
# Firebase Test Lab (Android)
- run: |
gcloud firebase test android run \
--type instrumentation \
--app app/build/outputs/apk/debug/app-debug.apk \
--test app/build/outputs/apk/androidTest/debug/app-debug-androidTest.apk \
--device model=Pixel7,version=34,locale=en \
--device model=GalaxyS23,version=33,locale=en
# BrowserStack (iOS + Android)
- run: |
npx browserstack-runner --config browserstack.jsonAnti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Same matrix for every commit | CI cost explosion; team disables. | Three-tier dispatch (Step 1, 3). |
| Single device per platform | Misses tablet / older-OS regressions until release. | Regression tier covers 3 / 3 (Step 1). |
| Device-farm runs on every PR | Per-minute cost; budget exhausts mid-month. | Farm only for release tier. |
fail-fast: true on the matrix | One failing target cancels others; lose coverage signal. | fail-fast: false (Step 2). |
| Hard-coded device list in CI yaml | Devices added / OS versions deprecated → manual yaml updates everywhere. | Centralized .matrix/devices.yaml (Step 1). |
| No aggregated report | Per-target results buried in CI logs; reviewer can't see the matrix. | Aggregator job (Step 4). |