Agentic Engineering
Evidence drawn from two production codebases: md-util (Go CLI, v0.8.0) and the METRO PMO portfolio intelligence + agentic practice library.
md-util (mdu) — SHACL-first data governance CLI
Go CLI that ingests heterogeneous data assets (CSV, JSON/JSONL, XML, SQL DDL, Parquet, DuckDB, Postgres, IPUMS codebooks) and emits SHACL 1.2 Turtle as the canonical governance artifact. From the same .ttl, it derives dialect-specific DDL (Postgres, BigQuery, Databricks) and lake schemas (Parquet, DuckLake, Iceberg). By design it never moves rows and never executes DDL against a live database — governance artifacts only.
Scope
- ~10,879 lines of Go across 58 files, 13 internal packages
- 8 public subcommands (init, ingest, catalog, validate, transform, emit, ice, ducklake, nimtable)
- 1,946 lines of HDD / SDD / BDD design docs with 26 Gherkin scenarios
- v0.8.0 shipped 2026-06-24; Linear backlog SOU-17 → SOU-92
Data engineering capabilities
- 7 ingestors with XSD type inference (narrowest safe type)
- Live Postgres schema extraction via PGGOLD_* env vars with 10k-row sampling
- DuckDB schema introspection; Parquet sampling via read_parquet()
- SHACL constraint validation: sh:datatype, sh:minCount, sh:maxCount, sh:pattern, sh:or, and custom sh:sparql constraints with $data substitution into DuckDB SQL
- ISTD-style transform layers (_STG → _CLN → _LKP) generated from SHACL annotations
- Iceberg emit: Parquet to S3 + AWS Glue catalog registration via aws-sdk-go-v2
- DuckLake catalog with multi-backend lake registry (S3, MinIO, Hetzner, RustFS, local)
- Lineage tracking to .md-util.db (DuckDB): ingest_log, emit_log, validate_log (idempotent, non-fatal failure)
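"Narrowest safe type" inference can be sketched as follows: a column keeps a candidate type only while every non-empty sample parses as that type, and falls back to xsd:string otherwise. The candidate set, its ordering, and the function name are assumptions for illustration; mdu's real rules may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// inferXSDType returns the narrowest XSD datatype that every non-empty
// sample parses as, falling back to xsd:string.
// The candidate order (boolean, integer, double, date) is an assumption.
func inferXSDType(samples []string) string {
	isBool, isInt, isDouble, isDate := true, true, true, true
	seen := false
	for _, s := range samples {
		if s == "" {
			continue // empty cells are treated as nulls, not as evidence
		}
		seen = true
		if s != "true" && s != "false" {
			isBool = false
		}
		if _, err := strconv.ParseInt(s, 10, 64); err != nil {
			isInt = false
		}
		// ParseFloat is a loose stand-in for the xsd:double lexical rules.
		if _, err := strconv.ParseFloat(s, 64); err != nil {
			isDouble = false
		}
		if _, err := time.Parse("2006-01-02", s); err != nil {
			isDate = false
		}
	}
	switch {
	case !seen:
		return "xsd:string" // nothing observed, so stay with the widest type
	case isBool:
		return "xsd:boolean"
	case isInt:
		return "xsd:integer"
	case isDouble:
		return "xsd:double"
	case isDate:
		return "xsd:date"
	default:
		return "xsd:string"
	}
}

func main() {
	fmt.Println(inferXSDType([]string{"1", "2", "3"}))     // xsd:integer
	fmt.Println(inferXSDType([]string{"1", "2.5"}))        // xsd:double
	fmt.Println(inferXSDType([]string{"2024-01-31", ""}))  // xsd:date
	fmt.Println(inferXSDType([]string{"1", "abc"}))        // xsd:string
}
```

A single non-conforming sample widens the column, which is what makes the inferred type safe for every row that was sampled.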
Agentic programming evidence
- Linear MCP integration; 92 tickets filed end-to-end, 15 closed in a single agentic session
- Custom Claude Code permission allowlist in .claude/settings.local.json scoping go test, go build, mdu *, and curated git operations
- Persistent memory layer (memory/MEMORY.md) keeping the Linear ticket map hot across sessions
- Custom SHACL vocabulary (mdu:delimiter, mdu:skipRows, sh:sparql with $data substitution) designed so an agent can round-trip the Turtle without losing dialect detail
- Dataset-batch integration testing rhythm: 10 datasets cycled through ingest → validate → emit; each failure becomes a Linear ticket; blockers patched same-session
METRO — PMO portfolio intelligence + agentic practice library
reposcan — repository activity dashboard
Go CLI + embedded web dashboard (Cobra, ~1,045 LOC, no JS framework) that scans the local repo universe and surfaces it for portfolio review.
- 5 commands: scan (parallel crawler, 10 workers), list, stats, dirty, serve (HTTP on :8077)
- DuckDB-backed schema with 5 tables: settings, scan_runs, repos, repo_commits
- First full scan: 715 repos, 431 dirty, 465,539 commits across 153 categories in 28 seconds
- Linear sync pipeline: daily 7:00 AM pull of projects + issues; portfolio snapshot is git-versioned for diff review
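The scan command's parallel crawler follows a familiar Go shape, a fixed pool of workers draining a job channel. The sketch below shows that pattern only; `scanAll` and the stub scan function are hypothetical, and the real crawler's per-repo work (git status, commit counts) is not reproduced.

```go
package main

import (
	"fmt"
	"sync"
)

// scanAll fans repository paths out to a fixed pool of workers and returns
// one result per path, in input order.
func scanAll(paths []string, workers int, scan func(string) string) []string {
	if workers < 1 {
		workers = 1 // guard against a pool that would never drain the channel
	}
	jobs := make(chan int)
	results := make([]string, len(paths))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one goroutine, so no lock is needed.
				results[i] = scan(paths[i])
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	paths := []string{"repo-a", "repo-b", "repo-c"}
	// The scan function here is a stub standing in for real repository inspection.
	out := scanAll(paths, 10, func(p string) string { return "scanned:" + p })
	fmt.Println(out)
}
```

A bounded pool keeps the number of concurrent filesystem and git operations fixed regardless of how many repositories are discovered.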
practice-docs/ — agentic methodology
HDD–SDD–BDD framework for AI-assisted development:
- Left of Do book outline (20,350 lines) defining the closed loop
- cmd/hdd interactive CLI: 5-phase questionnaire (Problem Reality → Agent Hypothesis → Benefit Hypothesis → Assumption Inventory → Business Test)
- cmd/bdd interactive CLI: behavioral identity, context/boundary mapping, Gherkin scenario builder
- Combined ~1,007 LOC of Cobra Go
Agentic Program Inventory: catalog of ~85 projects across ~30 domain groups developed with Claude Code (detected via .context/ presence).
Data engineering practice docs (data-methods/): 19 documented practice areas — Ingestion, Cleaning, Transformation, Visualization, Production, ML, NLP, Columnar Storage, Pipelines, Tech Selection, Warehouse Design, Lakehouse, Security, Quality & Governance, Metadata, Ontologies, RAG, MCP.
Recurring strengths
Agentic programming. Production Claude Code with scoped permissions, MCP servers (Linear, msgvault, MotherDuck), persistent memory, custom skills, status-report discipline. Multi-session workflow design: file findings as tickets, fix blockers same-session, queue the rest. Prompt and ontology design treated as code.
Data engineering. DuckDB-first architecture across multiple tools. SHACL 1.2 Turtle as canonical schema, derived emit to Postgres, BigQuery, Databricks, Parquet, Iceberg/Glue, DuckLake, Nimtable. Schema-first, governance-only stance — a deliberate architectural choice that limits blast radius.
Go / CLI engineering. Cobra-based multi-tool design across md-util, reposcan, hdd, bdd. ~13k LOC of Go in this window. Standard dependencies: pgx/v5, aws-sdk-go-v2, go-duckdb. Vault-backed secrets as the default pattern.