Data Modeling Cheatsheet

Snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. “Snowflaking” is a method of normalizing the dimension tables in a star schema.

A fact table consists of the measurements, metrics or facts of a business process. It is located at the center of start/snowflake schema surrounded by dimension tables.

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions e.g. people, place, time, and son on.

Types of fact tables

  • Transactional – each grain associated with indicated as a row per line in a transaction
  • Periodic snapshots –
  • Accumulative snapshots

Designing fact and dimension tables are – choosing a business process to model, then defining grain meaning describing exactly what a fact table record represents. Then choose dimensions to describe and determine facts, lastly identify/define facts.

There may be faceless fact table – they describe either events/activities or condition, eligibility and coverage.

Dimension tables may have Surrogate keys that are key generated and managed inside the data warehouse rather than keys extracted from data source systems.

Different types of Dimension

  • Type 0 (Retain original): Attributes never change. No history.
  • Type 1 (Overwrite): Old values are overwritten with new values for attribute. No history.
  • Type 2 (Add new row): For a new value, a new row is created with either a start date / end date or version. This creates a history.
  • Type 3 (Add new attribute): For a new value, a new columm is created. History is limited to the number of columns designated for storing historical data.
  • Type 4 (Add history table): One table keep the current value, where as the history is saved in a second table. This creates a history.
  • Type 5 (Combined Approach 1 + 4): Combination of type 1 and type 4. History is created through a second history table.
  • Type 6 (Combined Approach 1 + 2 + 3): Combination of type 1, type 2 and type 3. History is created through separate row and attributes.
  • Type 7 (Hybrid Approach): Both surrogate and natural key are used.

Dimensions:

  • Conformed Dimension: A conformed dimension is a set of data attributes that have been physically referenced in multiple database tables using the same key value to refer to the same structure, attributes, domain values, definitions and concepts. A conformed dimension cuts across many facts.
  • Junk Dimensions:
  • Degenerate Dimension: a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table
  • Role-playing Dimension
  • Outrigger Dimension
  • Shrunken Dimension: A conformed dimensions that includes a subset of the rows and/or columns of the original dimension
  • Calendar date dimension

Data Governance

Data Governance is a principled approach to managing data during its life cycle from acquition to archival or disposal.

Definition:

Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It includes

– the actions people must take,

– the processes they must follow, and

– the technology that supports them throughout the data life cycle.

Data governance shall answer

  • Data Stewardship: accountability and responsibility for both the data itself and the processes that ensure its proper use 
  • Data Quality: Proper, authoritative, dependable and accurate
  • Data Management: encompassing all aspects of managing data as an enterprise asset, from collection and storage to usage and oversight, making sure it’s being leveraged securely, efficiently, and cost-effectively

References: