Schema changes are an accountability problem, not a Data Contract problem

Schema changes are one of the most common causes of data pipeline failures, but introducing a data contract tool is rarely the best decision to tackle this problem.

When a column gets renamed or an integer becomes a string, ingestion breaks, sometimes silently, until a report fails or a dashboard shows missing data. Data engineers know the operational pattern: they are called in to investigate the incident, patch the pipeline, replay data, re-trigger downstream transformations, and explain delays to stakeholders.
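To make the two failure modes concrete, here is a minimal sketch using plain Python dictionaries as stand-ins for ingested rows. The `load_row` function and the column names (`customer_id`, `amount_cents`) are hypothetical and not taken from any real pipeline.

```python
from typing import Any


def load_row(row: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical ingestion step: expects customer_id and an integer amount_cents."""
    return {
        # .get() hides a renamed column: the value silently becomes None.
        "customer_id": row.get("customer_id"),
        # Division fails loudly (TypeError) if the amount arrives as a string.
        "amount": row["amount_cents"] / 100,
    }


# Original producer schema: everything works.
ok = load_row({"customer_id": 7, "amount_cents": 1250})

# Column renamed upstream: no error, just missing data downstream.
silent = load_row({"client_id": 7, "amount_cents": 1250})

# Integer became a string: the pipeline crashes at ingestion time.
try:
    load_row({"customer_id": 7, "amount_cents": "1250"})
    loud_failure = False
except TypeError:
    loud_failure = True
```

The rename is the more dangerous of the two in this sketch, because nothing fails until someone notices the empty `customer_id` values in a report.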

The fix for these recurring issues is organisational, not technical.

The recurring “solution”: data contracts

After a schema incident, data teams often arrive at the same conclusion in post-mortems: "We need data contracts".

The idea sounds reasonable. Define schemas formally, validate compatibility, introduce governance, and prevent breaking changes before deployment. But in practice, many organizations never get there.

Application teams worry that stricter contracts will slow delivery. They may also feel blamed by the data engineering teams and see data contracts as a tool to control them. On top of that, the effort to introduce and implement tooling across engineering teams company-wide grows quickly and requires close collaboration, sometimes without clear incentives for adoption. Even picking the right tool, deciding how to integrate it, and agreeing on who owns it is challenging.

In the meantime, the incidents remain intermittent enough that the initiative never becomes urgent. So the organization falls back to the default operating model: "We’ll communicate better next time." Until the next production incident.

The real problem is accountability

A few years ago, a team I was working with had a CDC pipeline that streamed changes from PostgreSQL into a warehouse through Kafka Connect.

The ingestion layer accepted backward-compatible schema evolution, but one application release introduced a data type change that broke the connector. The pipeline monitoring was weak, so the issue went unnoticed until business reports failed to generate within SLA. By the time the team discovered the problem, downstream reporting deadlines had already been missed and database storage consumption had started growing rapidly because replication lag accumulated in the database.

The application team had no idea their schema modification could break ingestion. They knew other teams consumed data from the database, but they did not understand the operational dependencies, connector limitations, or downstream processing assumptions. And why would they?

There was no ownership model around downstream consumption, no integration testing for schema compatibility, no shared operational visibility and no process governing schema evolution. The absence of a data contract tool was not what caused the incident. The real issue was that the team producing the data was not accountable for safely exposing it to consumers.

The effective model: producer-owned integrations

The best results come from treating data ingestion pipelines as part of the producing application’s responsibility. If a team owns a service and generates data needed for analytics, they should also own the supported interfaces through which other teams consume its data. Whether that interface is an API, a warehouse table, a CDC stream, an event schema, or a file doesn’t matter, as long as the application team is accountable for it.

Once the ingestion pipeline becomes part of the application boundary, schema evolution becomes visible during development and compatibility testing becomes easier to automate. Any breaking changes become operationally expensive to the team introducing them and version migration strategies become intentional.
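As an illustration of what automated compatibility testing could look like, here is a minimal sketch of a check a producer team might run in CI against the schema it currently publishes. The `Column` type, the schema representation (a mapping from column name to type name and nullability), and the rules for what counts as breaking are all assumptions made for this example; real connectors and warehouses each define their own compatibility rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column description: a type name plus nullability."""
    type_name: str
    nullable: bool = True


Schema = dict[str, Column]


def breaking_changes(published: Schema, proposed: Schema) -> list[str]:
    """List reasons why `proposed` would break consumers of `published`."""
    problems: list[str] = []
    for name, old in published.items():
        new = proposed.get(name)
        if new is None:
            # A rename looks like a removal from the consumer's side.
            problems.append(f"column '{name}' was removed or renamed")
        elif new.type_name != old.type_name:
            problems.append(
                f"column '{name}' changed type {old.type_name} -> {new.type_name}"
            )
    for name, new in proposed.items():
        # Assumption: a new required column is treated as breaking here.
        if name not in published and not new.nullable:
            problems.append(f"new column '{name}' must be nullable")
    return problems


# Example: the release changes `status` from integer to text
# and adds an optional `note` column.
published = {
    "id": Column("integer", nullable=False),
    "status": Column("integer"),
}
proposed = {
    "id": Column("integer", nullable=False),
    "status": Column("text"),
    "note": Column("text"),
}
problems = breaking_changes(published, proposed)
```

In this sketch the added nullable column passes, while the type change on `status` is reported. A CI job could fail the build whenever the returned list is non-empty, which puts the cost of a breaking change on the team introducing it before deployment.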

This mirrors a principle most software organizations already accept: no one should directly depend on another team’s application database. The same principle applies to analytical and operational data integrations.

Application teams are not data engineers

This is the common pushback, and it’s partially true.

Data engineering and application engineering involve different skills, but team boundaries should not become excuses for fragmented accountability. Software engineers don’t need to become data specialists, and there are practical ways to bridge the gap: you can embed data engineers temporarily within product teams, or the data engineering team can provide ingestion frameworks and reusable templates to standardize patterns.

The goal is to ensure the teams producing data understand they are also responsible for exposing it reliably.

Reliability improves when ownership shifts left

Schema incidents will never disappear entirely. Teams will still introduce breaking changes, and migrations will still fail. But when producer teams own the interface, breaking changes are considered before deployment instead of after the incident. That matters far more than introducing another governance tool that only a subset of the organization adopts consistently.

Data contracts can be valuable, but without accountability, they don’t improve reliability.

