Agentic Engineering

Evidence drawn from two production codebases: md-util (Go CLI, v0.8.0) and the METRO PMO portfolio intelligence + agentic practice library.

md-util (mdu): SHACL-first data governance CLI

Go CLI that ingests heterogeneous data assets (CSV, JSON/JSONL, XML, SQL DDL, Parquet, DuckDB, Postgres, IPUMS codebooks) and emits SHACL 1.2 Turtle as the canonical governance artifact. From the same .ttl, it derives dialect-specific DDL (Postgres, BigQuery, Databricks) and lake schemas (Parquet, DuckLake, Iceberg). By design it never moves rows and never executes DDL against a live database — governance artifacts only.
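One way to picture the "one .ttl, many dialects" derivation is a lookup from the SHACL datatype to a per-dialect column type. The following is a minimal sketch under stated assumptions, not md-util's actual emitter: the Property struct, the typeMap table, and the ColumnDDL function are hypothetical names introduced for illustration, and the mapping covers only a handful of XSD types.

```go
package main

import "fmt"

// Dialect names a target SQL dialect. The identifiers are hypothetical;
// only the three dialect names come from the description above.
type Dialect string

const (
	Postgres   Dialect = "postgres"
	BigQuery   Dialect = "bigquery"
	Databricks Dialect = "databricks"
)

// typeMap is an illustrative XSD-to-SQL mapping, not md-util's real table.
var typeMap = map[string]map[Dialect]string{
	"xsd:integer": {Postgres: "BIGINT", BigQuery: "INT64", Databricks: "BIGINT"},
	"xsd:decimal": {Postgres: "NUMERIC", BigQuery: "NUMERIC", Databricks: "DECIMAL(38,9)"},
	"xsd:boolean": {Postgres: "BOOLEAN", BigQuery: "BOOL", Databricks: "BOOLEAN"},
	"xsd:date":    {Postgres: "DATE", BigQuery: "DATE", Databricks: "DATE"},
	"xsd:string":  {Postgres: "TEXT", BigQuery: "STRING", Databricks: "STRING"},
}

// Property is a simplified stand-in for one sh:property shape.
type Property struct {
	Name     string // column name, taken from the sh:path local name
	Datatype string // sh:datatype written as a prefixed name
	MinCount int    // sh:minCount; 1 or more is rendered as NOT NULL
}

// ColumnDDL renders one column definition for the given dialect.
func ColumnDDL(p Property, d Dialect) (string, error) {
	byDialect, ok := typeMap[p.Datatype]
	if !ok {
		return "", fmt.Errorf("unmapped datatype %q", p.Datatype)
	}
	sqlType, ok := byDialect[d]
	if !ok {
		return "", fmt.Errorf("unsupported dialect %q", d)
	}
	col := p.Name + " " + sqlType
	if p.MinCount >= 1 {
		col += " NOT NULL"
	}
	return col, nil
}

func main() {
	p := Property{Name: "household_id", Datatype: "xsd:integer", MinCount: 1}
	for _, d := range []Dialect{Postgres, BigQuery, Databricks} {
		col, err := ColumnDDL(p, d)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-10s %s\n", d, col)
	}
}
```

Keeping the mapping as data rather than branching code means a new dialect is one more column in the table, which fits the governance-only stance: the output is text, and nothing here connects to a database.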

Scope

  • ~10,879 lines of Go across 58 files, 13 internal packages
  • 9 public subcommands (init, ingest, catalog, validate, transform, emit, ice, ducklake, nimtable)
  • 1,946 lines of HDD / SDD / BDD design docs with 26 Gherkin scenarios
  • v0.8.0 shipped 2026-06-24; Linear backlog SOU-17 → SOU-92

Data engineering capabilities

  • 7 ingestors with XSD type inference (narrowest safe type)
  • Live Postgres schema extraction via PGGOLD_* env vars with 10k-row sampling
  • DuckDB schema introspection; Parquet sampling via read_parquet()
  • SHACL constraint validation: sh:datatype, sh:minCount, sh:maxCount, sh:pattern, sh:or, and custom sh:sparql constraints with $data substitution into DuckDB SQL
  • ISTD-style transform layers (_STG_CLN_LKP) generated from SHACL annotations
  • Iceberg emit: Parquet to S3 + AWS Glue catalog registration via aws-sdk-go-v2
  • DuckLake catalog with multi-backend lake registry (S3, MinIO, Hetzner, RustFS, local)
  • Lineage tracking to .md-util.db (DuckDB): ingest_log, emit_log, validate_log — idempotent, non-fatal failure
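The "narrowest safe type" inference in the first bullet above can be sketched as follows: try candidate XSD types in a fixed order and pick the first one that every non-empty sample satisfies, falling back to xsd:string. This is an illustration under assumptions, not md-util's implementation. The candidate list, the InferType name, and the choice to accept only the literals true and false as booleans (so that 0/1 columns stay integers) are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

var (
	integerRe = regexp.MustCompile(`^[+-]?\d+$`)
	decimalRe = regexp.MustCompile(`^[+-]?(\d+(\.\d*)?|\.\d+)$`)
)

// candidate pairs an XSD type name with a lexical check for one value.
type candidate struct {
	xsd  string
	fits func(string) bool
}

// candidates are tried in order; integer comes before decimal because
// every integer literal is also a valid decimal literal.
var candidates = []candidate{
	{"xsd:boolean", func(v string) bool { return v == "true" || v == "false" }},
	{"xsd:integer", integerRe.MatchString},
	{"xsd:decimal", decimalRe.MatchString},
	{"xsd:date", func(v string) bool {
		_, err := time.Parse("2006-01-02", v)
		return err == nil
	}},
}

// InferType returns the first candidate type that all non-empty samples
// satisfy. Empty strings are treated as missing values and skipped.
// A column with no usable samples falls back to xsd:string.
func InferType(samples []string) string {
	for _, c := range candidates {
		seen, ok := false, true
		for _, v := range samples {
			if v == "" {
				continue
			}
			seen = true
			if !c.fits(v) {
				ok = false
				break
			}
		}
		if seen && ok {
			return c.xsd
		}
	}
	return "xsd:string"
}

func main() {
	fmt.Println(InferType([]string{"1", "2", "3"}))        // xsd:integer
	fmt.Println(InferType([]string{"1", "2.5", ""}))       // xsd:decimal
	fmt.Println(InferType([]string{"2024-01-31"}))         // xsd:date
	fmt.Println(InferType([]string{"1", "not a number"})) // xsd:string
}
```

The fallback to xsd:string is what makes the inference "safe": a single non-conforming sample widens the column rather than producing a shape that later rejects real data.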

Agentic programming evidence

  • Linear MCP integration; 92 tickets filed end-to-end, 15 closed in a single agentic session
  • Custom Claude Code permission allowlist in .claude/settings.local.json scoping go test, go build, mdu *, and curated git operations
  • Persistent memory layer (memory/MEMORY.md) keeping the Linear ticket map hot across sessions
  • Custom SHACL vocabulary (mdu:delimiter, mdu:skipRows, sh:sparql with $data substitution) designed so an agent can round-trip the Turtle without losing dialect detail
  • Dataset-batch integration testing rhythm: 10 datasets cycled through ingest → validate → emit; each failure becomes a Linear ticket; blockers patched same-session
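The $data substitution mentioned in the vocabulary bullet above could work roughly like this: the constraint's query text carries a $data placeholder, and at validation time the placeholder is replaced with a DuckDB source expression. This is a minimal sketch, assuming a plain token replacement. SubstituteData and ParquetSource are hypothetical helper names, and the convention that returned rows count as violations is an assumption borrowed from how SHACL SPARQL constraints report solutions.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// dataToken matches the $data placeholder as a whole token, so that a
// longer name such as $database is left untouched.
var dataToken = regexp.MustCompile(`\$data\b`)

// ParquetSource renders a DuckDB read_parquet() call for a file path,
// doubling single quotes so the path stays inside the string literal.
func ParquetSource(path string) string {
	return "read_parquet('" + strings.ReplaceAll(path, "'", "''") + "')"
}

// SubstituteData replaces every $data token in the query with the given
// source expression. It returns an error when the query has no placeholder,
// since such a constraint could not be bound to a dataset.
func SubstituteData(query, source string) (string, error) {
	if !dataToken.MatchString(query) {
		return "", errors.New("query has no $data placeholder")
	}
	// ReplaceAllLiteralString avoids interpreting "$" in the replacement.
	return dataToken.ReplaceAllLiteralString(query, source), nil
}

func main() {
	// Hypothetical constraint: rows returned are treated as violations.
	constraint := "SELECT id FROM $data WHERE age < 0"
	sql, err := SubstituteData(constraint, ParquetSource("people.parquet"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sql)
}
```

Because the placeholder lives in the Turtle and the binding happens at run time, the same constraint text can be checked against a Parquet file, a DuckDB table, or any other source expression without editing the shape.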

METRO: PMO portfolio intelligence + agentic practice library

reposcan: repository activity dashboard

Go CLI + embedded web dashboard (Cobra, ~1,045 LOC, no JS framework) that scans the local repo universe and surfaces it for portfolio review.

  • 5 commands: scan (parallel crawler, 10 workers), list, stats, dirty, serve (HTTP on :8077)
  • DuckDB-backed schema with 5 tables, including settings, scan_runs, repos, and repo_commits
  • First full scan: 715 repos, 431 dirty, 465,539 commits across 153 categories in 28 seconds
  • Linear sync pipeline: daily 7:00 AM pull of projects + issues; portfolio snapshot is git-versioned for diff review
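The parallel crawler in the scan command can be pictured as a fixed-size worker pool. This is a hedged sketch of that pattern rather than reposcan's code: RepoStatus, ScanAll, and the inspect callback are hypothetical names, and the stand-in inspect function in main fakes the dirty check instead of shelling out to git.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// RepoStatus is a simplified per-repository scan result.
type RepoStatus struct {
	Path  string
	Dirty bool
}

// ScanAll runs inspect over every path using a fixed number of workers.
// Results are written by index, so the output order matches the input
// order and no two goroutines write the same slice element.
func ScanAll(paths []string, workers int, inspect func(string) RepoStatus) []RepoStatus {
	if workers < 1 {
		workers = 1 // guard against a pool with no consumers
	}
	results := make([]RepoStatus, len(paths))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = inspect(paths[i])
			}
		}()
	}

	for i := range paths {
		jobs <- i
	}
	close(jobs) // lets each worker's range loop finish
	wg.Wait()
	return results
}

func main() {
	paths := []string{"tools/reposcan", "data/md-util-wip", "docs/practice-docs"}

	// Stand-in for a real check such as running `git status --porcelain`.
	fakeInspect := func(p string) RepoStatus {
		return RepoStatus{Path: p, Dirty: strings.HasSuffix(p, "-wip")}
	}

	for _, r := range ScanAll(paths, 10, fakeInspect) {
		fmt.Printf("%-22s dirty=%v\n", r.Path, r.Dirty)
	}
}
```

A bounded pool keeps the number of concurrent git processes and open file handles predictable, which matters when the scan covers hundreds of repositories.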

practice-docs/: agentic methodology

HDD–SDD–BDD framework for AI-assisted development:

  • Left of Do book outline (20,350 lines) defining the closed loop
  • cmd/hdd interactive CLI: 5-phase questionnaire (Problem Reality → Agent Hypothesis → Benefit Hypothesis → Assumption Inventory → Business Test)
  • cmd/bdd interactive CLI: behavioral identity, context/boundary mapping, Gherkin scenario builder
  • Combined ~1,007 LOC of Cobra Go

Agentic Program Inventory: catalog of ~85 projects across ~30 domain groups developed with Claude Code (detected via .context/ presence).

Data engineering practice docs (data-methods/): 19 documented practice areas — Ingestion, Cleaning, Transformation, Visualization, Production, ML, NLP, Columnar Storage, Pipelines, Tech Selection, Warehouse Design, Lakehouse, Security, Quality & Governance, Metadata, Ontologies, RAG, MCP.

Recurring strengths

Agentic programming. Production Claude Code with scoped permissions, MCP servers (Linear, msgvault, MotherDuck), persistent memory, custom skills, status-report discipline. Multi-session workflow design: file findings as tickets, fix blockers same-session, queue the rest. Prompt and ontology design treated as code.

Data engineering. DuckDB-first architecture across multiple tools. SHACL 1.2 Turtle as canonical schema, derived emit to Postgres, BigQuery, Databricks, Parquet, Iceberg/Glue, DuckLake, Nimtable. Schema-first, governance-only stance — a deliberate architectural choice that limits blast radius.

Go / CLI engineering. Cobra-based multi-tool design across md-util, reposcan, hdd, bdd. ~13k LOC of Go in this window. Standard dependencies: pgx/v5, aws-sdk-go-v2, go-duckdb. Vault-backed secrets as the default pattern.