
Claude Code Skills for Data Engineering

Data engineering is plumbing — unglamorous, invisible when it works, and catastrophic when it doesn't. ETL pipelines, data cleaning, schema migrations, quality rules, privacy compliance, and the connectors that move data between systems. These skills cover the infrastructure layer that makes analytics possible. Because the fanciest dashboard in the world is worthless if the data feeding it is wrong.

Published by ClaudeVault · 7 skills

Key takeaway

ClaudeVault's data engineering skills give Claude Code structured workflows for the pipeline infrastructure that makes analytics possible — ETL and ELT pipeline design with Apache Airflow and Dagster orchestration, data cleaning that handles messy source systems, schema migration planning for live databases, data quality rule design with Great Expectations and Soda, privacy compliance across GDPR, CCPA, and the 20 US state privacy laws now in effect, and spreadsheet-to-SQL conversion for teams migrating from Excel to databases.

At a glance

  • 7 skills covering ETL/ELT pipeline design, data cleaning, schema migration, data quality rules, pipeline debugging, data privacy compliance, and spreadsheet-to-SQL conversion
  • 64 percent of data professionals cite data quality as their top data integrity challenge, up from 50 percent in 2023
  • ELT has become the default approach for most analytics use cases because modern cloud warehouses like Snowflake and BigQuery handle transformations at scale
  • 20 US states now have comprehensive consumer privacy laws as of January 2026, with the EU AI Act reaching full enforcement on August 2, 2026
  • The largest topic in the Data & Analytics bundle, covering the infrastructure layer from data extraction through quality assurance to privacy compliance

When you reach for these skills

  • When data pipelines break silently and the team discovers bad data in dashboards days after the failure instead of at ingestion time

  • When schema changes in source systems cascade through pipelines with no migration plan and break downstream tables

  • When data quality is checked manually by an analyst eyeballing a dashboard instead of by automated rules that halt the pipeline on failure

  • When privacy regulations change and nobody knows which pipelines handle PII, where it is stored, or how long it is retained

How these skills work together

A Claude Code data engineering workflow builds from extraction through quality assurance to compliance, ensuring each layer catches problems before they propagate downstream to dashboards and reports.

  1. Design the ETL or ELT pipeline architecture

    Start with the ETL designer. Claude designs the pipeline architecture — extraction from source systems, loading into the warehouse, and transformation either in-flight or in-warehouse. The choice between ETL and ELT depends on your warehouse's compute capacity. Claude builds idempotent pipelines with retry logic and dead-letter queues for failed records; a minimal orchestration sketch follows this list.

  2. Clean and standardize source data

    The data cleaner handles the messy reality of source systems — inconsistent formats, missing values, duplicate records, encoding issues. Claude generates cleaning transformations that standardize data at ingestion so downstream consumers never see the raw mess.

  3. Design data quality rules that catch problems at ingestion

    Use the data quality rule designer to build automated checks with Great Expectations or Soda. Claude generates rules for completeness, uniqueness, referential integrity, and freshness — then wires them into the pipeline so a quality failure halts the load before bad data reaches the warehouse.

  4. Plan schema migrations for live databases

    The schema migration advisor plans changes to production schemas using expand-contract patterns. Claude generates migration scripts with rollback steps, validates foreign key and constraint impacts, and sequences the migration so no downstream pipeline reads from a schema in mid-transition.

  5. Audit data flows for privacy compliance

    Finally, the data privacy advisor maps PII flows through the pipeline and checks compliance against GDPR, CCPA, and the 20 US state privacy laws now in effect. Claude identifies where personal data is stored, how long it is retained, and whether consent and deletion requirements are met in every pipeline stage.
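
A minimal sketch of the orchestration referenced in step 1, assuming a recent Apache Airflow 2.x release; the DAG name, tasks, and dead-letter idea are illustrative stand-ins, not output from the ETL designer skill.

```python
# Sketch of a daily extract -> load DAG with retries; failed records would be
# routed to a hypothetical orders_dead_letter table inside load_orders().
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    """Pull yesterday's orders from the source system (stub)."""


def load_orders(**context):
    """Upsert good rows into the warehouse; write rejects to a dead-letter table."""


with DAG(
    dag_id="orders_daily",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 3,                         # retry transient failures
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_orders", python_callable=load_orders)
    extract >> load
```

Dagster would express the same dependency graph with assets or ops rather than operator tasks.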

Outcome

An idempotent pipeline architecture, clean data at ingestion, quality rules that catch problems before they reach dashboards, schema migrations that do not break downstream consumers, and privacy compliance audited across every data flow.

Compare the skills

Skill | Best for | Complexity | Primary use case
ETL Designer | Pipeline architecture and orchestration | Advanced | Extraction, loading, and transformation with Airflow or Dagster
Data Cleaner | Source data standardization | Beginner | Format normalization, deduplication, and missing value handling
Data Quality Rule Designer | Automated quality checks | Intermediate | Great Expectations and Soda rules for completeness, uniqueness, and freshness
Schema Migration Advisor | Live database schema changes | Advanced | Expand-contract migrations with rollback steps and constraint validation
Data Pipeline Debugger | Pipeline failure diagnosis | Intermediate | Root cause analysis for failed loads, data drift, and orchestration errors
Data Privacy Advisor | Regulatory compliance auditing | Advanced | GDPR, CCPA, and US state privacy law compliance across data flows
Spreadsheet-to-SQL Converter | Excel to database migration | Beginner | Converting spreadsheet formulas and structures into SQL tables and queries

Skills in this topic

Spreadsheet to SQL Converter

Converts spreadsheet logic to SQL and normalized schemas. Use when migrating Excel/Google Sheets formulas, VLOOKUPs, and pivot tables into database queries. Formula translation, denormalization repair, pivot conversion.

Translate Excel/Google Sheets formulas, pivot tables, VLOOKUP patterns, and manual data workflows into normalized database schemas and maintainable SQL queries — preserving exact business logic while moving it out of fragile, manually maintained spreadsheets.
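
As a rough sketch of that translation, a VLOOKUP that pulls a customer's region into an orders sheet becomes a JOIN between two normalized tables; the table and column names below are hypothetical, with SQLite standing in for the target database.

```python
# Hypothetical example: the spreadsheet formula
#   =VLOOKUP(A2, Customers!A:C, 3, FALSE)
# becomes a JOIN between normalized orders and customers tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT, amount REAL);
    INSERT INTO customers VALUES ('C1', 'Acme', 'EMEA');
    INSERT INTO orders VALUES ('O1', 'C1', 120.0);
""")

rows = conn.execute("""
    SELECT o.order_id, o.amount, c.region   -- the 'looked-up' column
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
""").fetchall()
print(rows)  # [('O1', 120.0, 'EMEA')]
```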

Data Cleaner

Profiles datasets and fixes duplicates, missing values, type mismatches, and outliers. Use when data quality issues block analysis or distort results. Data profiling, deduplication, format standardization.

Identify data quality issues — duplicates, missing values, type mismatches, inconsistent formats, outliers, referential integrity violations — and recommend or implement fixes, prioritized by downstream impact.
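
A small pandas sketch of the kind of cleaning pass this describes, assuming a hypothetical customers extract; the specific rules are illustrative.

```python
# Illustrative cleaning pass: trim and lower-case emails, coerce dates,
# fill a known-default column, and drop duplicate customer records.
import pandas as pd

df = pd.read_csv("customers_raw.csv")            # hypothetical source extract

df["email"] = df["email"].str.strip().str.lower()
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
df["country"] = df["country"].fillna("unknown")
df = df.drop_duplicates(subset=["customer_id"], keep="last")

df.to_parquet("customers_clean.parquet", index=False)
```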

Data Pipeline Debugger

Systematically diagnoses data pipeline failures including data loss, transformation errors, and freshness issues. Use when a pipeline produces wrong results, loses records, or runs late. SLICE triage, root cause analysis, backfill.

Systematically diagnose why a data pipeline produces incorrect results, loses records, runs late, or fails silently — using a structured triage process that isolates the problem to a specific stage, then identifies the root cause and the backfill needed to repair affected data.
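
One concrete triage tactic, shown here as a sketch rather than the skill's actual SLICE process, is reconciling record counts stage by stage to isolate where rows disappear; the stage names and counts are invented.

```python
# Illustrative stage-by-stage reconciliation: compare row counts at each
# pipeline stage to find where records are lost. Stage names are made up.
counts = {
    "source_extract": 10_000,
    "staging_load": 10_000,
    "transform_output": 9_412,   # <- drop happens here
    "warehouse_table": 9_412,
}

stages = list(counts)
for prev, curr in zip(stages, stages[1:]):
    lost = counts[prev] - counts[curr]
    if lost:
        print(f"{lost} records lost between {prev} and {curr}")
```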

ETL Designer

Designs ETL/ELT pipelines with extraction, transformation, loading, and orchestration strategies. Use when building data pipelines to move and transform data between systems. Batch, streaming, CDC, idempotency, monitoring.

Design ETL/ELT pipelines — choosing extraction methods, transformation strategies, loading patterns, and orchestration tools for reliable data integration that runs at 3 AM without paging anyone.

Schema Migration Advisor

Plans database schema migrations with backward compatibility, zero-downtime, and reversibility. Use when changing production database schemas that serve live traffic. Multi-phase deployment, backfill strategies, lock analysis.

Plan schema migrations that are backward-compatible, zero-downtime, and reversible — turning what most teams treat as a terrifying deploy into a boring, predictable operation.

Data Privacy Advisor

Designs technical privacy controls for databases and pipelines including PII detection, anonymization, and retention policies. Use when implementing data privacy for compliance or data sharing. Anonymization, pseudonymization, access control.

Identify PII in datasets, recommend anonymization or pseudonymization strategies, design retention policies, and implement access controls — balancing privacy obligations with analytical utility.

Data Quality Rule Designer

Designs data validation rules across six quality dimensions with prioritized severity levels. Use when building quality checks for pipelines, warehouses, or applications. Completeness, accuracy, consistency, timeliness checks.

Design comprehensive, prioritized data quality checks — the kind that detect a broken pipeline at 6 AM before the CFO notices at 9 AM.

Frequently asked questions

What is the difference between ETL and ELT?

ETL transforms data before loading it into the warehouse. ELT loads raw data first and transforms it inside the warehouse using its compute power. ELT has become the default for most analytics use cases because modern warehouses like Snowflake and BigQuery have enough processing capacity to handle transformations at scale — and keeping raw data available supports ad hoc analysis.
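
A compressed sketch of the ELT shape, with SQLite standing in for a cloud warehouse and hypothetical table names: raw rows land untouched, then SQL builds the analytics table in place.

```python
# ELT in miniature: load raw rows as-is, then transform in-warehouse with SQL.
import sqlite3

wh = sqlite3.connect("warehouse.db")

# 1. Load: land the raw extract without transforming it.
wh.execute("CREATE TABLE IF NOT EXISTS raw_orders (order_id TEXT, amount TEXT, ts TEXT)")
wh.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
               [("O1", "120.0", "2026-01-05"), ("O2", "80.5", "2026-01-05")])

# 2. Transform: build the analytics table inside the warehouse.
wh.executescript("""
    DROP TABLE IF EXISTS orders_clean;
    CREATE TABLE orders_clean AS
    SELECT order_id, CAST(amount AS REAL) AS amount, DATE(ts) AS order_date
    FROM raw_orders;
""")
wh.commit()
```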

How do I build idempotent data pipelines?

The ETL designer generates pipelines where every step produces the same result regardless of how many times it runs. Claude uses upsert logic instead of append, deduplication keys, and checkpoint markers so a failed pipeline can restart from the last successful stage without creating duplicates or missing records.
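
A minimal sketch of the upsert-plus-checkpoint pattern described above, again with SQLite standing in for the warehouse and hypothetical table names; most warehouses would use MERGE rather than ON CONFLICT.

```python
# Idempotent load sketch: upsert on a natural key and record a checkpoint,
# so re-running the same batch cannot create duplicates.
import sqlite3

wh = sqlite3.connect("warehouse.db")
wh.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, amount REAL)")
wh.execute("CREATE TABLE IF NOT EXISTS load_checkpoints (batch_id TEXT PRIMARY KEY)")

batch_id = "2026-01-05"
batch = [("O1", 120.0), ("O2", 80.5)]

already_loaded = wh.execute(
    "SELECT 1 FROM load_checkpoints WHERE batch_id = ?", (batch_id,)
).fetchone()

if not already_loaded:
    wh.executemany(
        # Upsert instead of append: a retry overwrites, never duplicates.
        "INSERT INTO orders VALUES (?, ?) "
        "ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount",
        batch,
    )
    wh.execute("INSERT INTO load_checkpoints VALUES (?)", (batch_id,))
    wh.commit()
```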

What data quality framework should I use?

Great Expectations and Soda are the two leading open-source frameworks. Great Expectations excels at expectation suites integrated into Python pipelines. Soda offers YAML-based checks that non-engineers can read and modify. The data quality rule designer generates rules for either framework, wired into the pipeline to halt on failure.
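
A framework-agnostic sketch of the checks themselves; Great Expectations and Soda each express the same ideas declaratively, and the staging file, column names, and 24-hour freshness threshold below are assumptions.

```python
# Illustrative quality gate: completeness, uniqueness, and freshness checks
# that raise (halting the pipeline) before bad data reaches the warehouse.
import pandas as pd

df = pd.read_parquet("staging/orders.parquet")   # hypothetical staging file

failures = []
if df["order_id"].isna().any():
    failures.append("completeness: null order_id values")
if df["order_id"].duplicated().any():
    failures.append("uniqueness: duplicate order_id values")
newest = pd.to_datetime(df["loaded_at"], utc=True).max()
if newest < pd.Timestamp.now(tz="UTC") - pd.Timedelta(hours=24):
    failures.append("freshness: newest record is older than 24 hours")

if failures:
    raise RuntimeError("quality checks failed: " + "; ".join(failures))
```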

How do I comply with 2026 privacy laws in my data pipelines?

Twenty US states now have comprehensive consumer privacy laws, and the EU AI Act reaches full enforcement in August 2026. The data privacy advisor maps PII flows through every pipeline stage, identifies retention periods, checks consent mechanisms, and validates deletion capabilities — producing a compliance audit that covers GDPR, CCPA, and the growing patchwork of state-level requirements.
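
A deliberately simple sketch of the PII-mapping step: scan column names against common PII patterns and pair each hit with the table's declared retention period. The catalog dictionary is a hypothetical stand-in for real warehouse metadata.

```python
# Illustrative PII inventory: flag columns whose names match common PII
# patterns and report each table's retention period alongside them.
import re

PII_PATTERN = re.compile(r"email|phone|ssn|dob|address|name", re.IGNORECASE)

catalog = {
    "customers": {"columns": ["customer_id", "email", "phone", "region"],
                  "retention_days": 730},
    "orders":    {"columns": ["order_id", "customer_id", "amount"],
                  "retention_days": 3650},
}

for table, meta in catalog.items():
    pii_cols = [c for c in meta["columns"] if PII_PATTERN.search(c)]
    if pii_cols:
        print(f"{table}: PII in {pii_cols}, retained {meta['retention_days']} days")
```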

How does schema migration work in a modern data stack?

The schema migration advisor uses expand-contract patterns — add the new column or table first, migrate consumers one by one, then remove the old schema. Claude generates the migration scripts with explicit rollback steps and validates that foreign keys and constraints are satisfied at every stage, so no downstream pipeline reads from a schema in mid-transition.
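
A sketch of the expand-contract sequence for a hypothetical column rename, with explicit rollback statements; a real migration would run these through a tool such as Alembic or Flyway rather than as bare strings.

```python
# Expand-contract sketch for renaming orders.amount -> orders.total_amount.
# Each phase ships separately; rollback is the reverse statement.
EXPAND = [
    "ALTER TABLE orders ADD COLUMN total_amount NUMERIC",                   # 1. expand
    "UPDATE orders SET total_amount = amount WHERE total_amount IS NULL",   # 2. backfill
]
# 3. Migrate consumers: deploy readers and writers that use total_amount only.
CONTRACT = [
    "ALTER TABLE orders DROP COLUMN amount",                                # 4. contract
]
ROLLBACK = {
    "expand": ["ALTER TABLE orders DROP COLUMN total_amount"],
    "contract": ["ALTER TABLE orders ADD COLUMN amount NUMERIC",
                 "UPDATE orders SET amount = total_amount"],
}

for phase, statements in (("expand", EXPAND), ("contract", CONTRACT)):
    for sql in statements:
        print(f"[{phase}] {sql}")
```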