Data architecture describes the structure of an organization's logical and physical data assets and data management resources, according to The Open Group Architecture Framework (TOGAF).
It is an offshoot of enterprise architecture that comprises the models, policies, rules, and standards that govern the collection, storage, arrangement, integration, and use of data in organisations. An organisation's data architecture is the purview of data architects.
Data architecture goals
The goal of data architecture is to translate business needs into data and system requirements and to manage data and its flow through the enterprise.
Data architecture principles
According to Joshua Klahr, vice president of product management, core products, at Splunk, and formerly vice president of product management at AtScale, six principles form the foundation of modern data architecture:
- Data is a shared asset: A modern data architecture needs to eliminate departmental data silos and give all stakeholders a complete view of the company
- Users require adequate access to data: Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs
- Security is essential: Modern data architectures must be designed for security and they must support data policies and access controls directly on the raw data
- Common vocabularies ensure common understanding: Shared data assets, such as product catalogs, fiscal calendar dimensions, and KPI definitions, require a common vocabulary to help avoid disputes during analysis
- Data should be curated: Invest in core functions that perform data curation (modelling important relationships, cleansing raw data, and curating key dimensions and measures)
- Data flows should be optimised for agility: Reduce the number of times data must be moved to reduce cost, increase data freshness, and optimise enterprise agility
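As an illustration of the common-vocabulary principle, here is a minimal Python sketch of a shared KPI registry. It assumes a single place where each KPI is defined once, so that different teams compute the same figure the same way. The names `KpiDefinition`, `SHARED_KPIS`, and `compute_kpi`, and the `gross_margin` formula, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping

# A row of numeric facts keyed by column name (an assumption for this sketch).
Row = Mapping[str, float]


@dataclass(frozen=True)
class KpiDefinition:
    """One agreed definition of a KPI: its name, meaning, and formula."""
    name: str
    description: str
    formula: Callable[[Row], float]


# Hypothetical shared vocabulary: every team looks up KPIs here
# instead of re-implementing them in its own reports.
SHARED_KPIS: Dict[str, KpiDefinition] = {
    "gross_margin": KpiDefinition(
        name="gross_margin",
        description="(revenue - cost of goods sold) / revenue",
        formula=lambda row: (row["revenue"] - row["cogs"]) / row["revenue"],
    ),
}


def compute_kpi(name: str, row: Row) -> float:
    """Compute a KPI using its shared definition; unknown names raise KeyError."""
    return SHARED_KPIS[name].formula(row)
```

With this in place, a call such as `compute_kpi("gross_margin", {"revenue": 200.0, "cogs": 150.0})` goes through the one shared formula, so two departments analysing the same row cannot disagree about what "gross margin" means.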
Data architecture components
Dataversity says data architecture can be synthesized into three overall components:
- Data architecture outcomes: These are the models, definitions, and data flows often referred to as data architecture artifacts
- Data architecture activities: These are the activities that form, deploy, and fulfil data architecture intentions
- Data architecture behaviours: These are the collaborations, mindsets, and skills of the various roles that affect an enterprise's data architecture
Data architecture vs. data modelling
According to the Data Management Body of Knowledge (DMBOK 2), data architecture defines the blueprint for managing data assets by aligning with organisational strategy to establish strategic data requirements and designs to meet those requirements.
On the other hand, DMBOK 2 defines data modelling as, "the process of discovering, analysing, representing, and communicating data requirements in a precise form called the data model."
While both data architecture and data modelling seek to bridge the gap between business goals and technology, data architecture is about the macro view that seeks to understand and support the relationships between an organisation's functions, technology, and data types. Data modelling takes a more focused view of specific systems or business cases.
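To make the narrower scope of data modelling concrete, here is a small Python sketch of a data model for one hypothetical business case, customer orders. The entities `Customer` and `Order`, their fields, and the helper `orders_for` are assumptions made for illustration. They are not taken from DMBOK 2.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Customer:
    customer_id: int  # unique identifier for the customer
    name: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    placed_on: date
    total_cents: int  # money held as integer cents to avoid rounding issues

    def __post_init__(self) -> None:
        # A data requirement expressed precisely: totals cannot be negative.
        if self.total_cents < 0:
            raise ValueError("total_cents must be non-negative")


def orders_for(customer: Customer, orders: Iterable[Order]) -> List[Order]:
    """Follow the Customer-to-Order relationship defined by customer_id."""
    return [o for o in orders if o.customer_id == customer.customer_id]
```

A model like this captures the entities, relationships, and constraints of a single system. The data architecture would sit above it, deciding for example which systems hold customer data and how it flows between them.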
Data architecture frameworks
Several enterprise architecture frameworks commonly serve as the foundation for building an organisation's data architecture framework:
- DAMA-DMBOK 2: DAMA International's Data Management Body of Knowledge is a framework specifically for data management. It provides standard definitions for data management functions, deliverables, roles, and other terminology, and presents guiding principles for data management
- Zachman Framework for Enterprise Architecture: The Zachman Framework is an enterprise ontology created by John Zachman at IBM in the 1980s. The "data" column of the Zachman Framework comprises multiple layers, including architectural standards important to the business, a semantic model or conceptual/enterprise data model, an enterprise/logical data model, a physical data model, and actual databases
- The Open Group Architecture Framework (TOGAF): TOGAF is an enterprise architecture methodology that offers a high-level framework for enterprise software development. Phase C of TOGAF covers developing a data architecture and building a data architecture roadmap
Characteristics of modern data architecture
Modern data architectures must be designed to take advantage of emerging technologies such as artificial intelligence (AI), automation, internet of things (IoT), and blockchain. Dan Sutherland, distinguished engineer and CTO, data platforms, at IBM, says modern data architectures should share the following characteristics:
- Cloud-native: Modern data architectures are designed to support elastic scaling, high availability, end-to-end security for data in motion and data at rest, and cost and performance scalability
- Scalable data pipelines: To take advantage of emerging technologies, data architectures support real-time data streaming and micro-batch data bursts
- Seamless data integration: Data architectures integrate with legacy applications using standard APIs. They are optimised for sharing data across systems, geographies, and organisations
- Real-time data enablement: Modern data architectures support the ability to deploy automated and active data validation, classification, management, and governance
- Decoupled and extensible: Modern data architectures are designed to be loosely coupled, enabling services to perform minimal tasks independent of other services
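Two of the characteristics above, micro-batch pipelines and loosely coupled services that each perform a minimal task, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: batches are cut by record count only (production systems often also use time windows), records are plain dictionaries, and the function names `micro_batches` and `validate` and the required fields `id` and `amount` are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple

Record = Dict[str, Any]


def micro_batches(stream: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group an incoming stream into small batches of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # stream exhausted
        yield batch


def validate(
    batch: Sequence[Record],
    required: Sequence[str] = ("id", "amount"),
) -> Tuple[List[Record], List[Record]]:
    """Split a batch into valid and rejected records.

    A record is valid when every required field is present and not None.
    This step knows nothing about how batches are produced, so it can be
    changed or replaced without touching the batching step.
    """
    valid: List[Record] = []
    rejected: List[Record] = []
    for record in batch:
        is_valid = all(record.get(key) is not None for key in required)
        (valid if is_valid else rejected).append(record)
    return valid, rejected
```

The two functions share only the record format, which is the point of the decoupling: a caller can loop over `micro_batches(stream, size)` and pass each batch to `validate`, or to any other step that accepts a list of records.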