Data Modeling Cheatsheet | Shirish Ranjit

Snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. “Snowflaking” is a method of normalizing the dimension tables in a star schema.

A fact table consists of the measurements, metrics or facts of a business process. It is located at the center of start/snowflake schema surrounded by dimension tables.

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions e.g. people, place, time, and son on.

Types of fact tables

Transactional – each grain associated with indicated as a row per line in a transaction
Periodic snapshots –
Accumulative snapshots

Designing fact and dimension tables are – choosing a business process to model, then defining grain meaning describing exactly what a fact table record represents. Then choose dimensions to describe and determine facts, lastly identify/define facts.

There may be faceless fact table – they describe either events/activities or condition, eligibility and coverage.

Dimension tables may have Surrogate keys that are key generated and managed inside the data warehouse rather than keys extracted from data source systems.

Different types of Dimension

Type 0 (Retain original): Attributes never change. No history.
Type 1 (Overwrite): Old values are overwritten with new values for attribute. No history.
Type 2 (Add new row): For a new value, a new row is created with either a start date / end date or version. This creates a history.
Type 3 (Add new attribute): For a new value, a new columm is created. History is limited to the number of columns designated for storing historical data.
Type 4 (Add history table): One table keep the current value, where as the history is saved in a second table. This creates a history.
Type 5 (Combined Approach 1 + 4): Combination of type 1 and type 4. History is created through a second history table.
Type 6 (Combined Approach 1 + 2 + 3): Combination of type 1, type 2 and type 3. History is created through separate row and attributes.
Type 7 (Hybrid Approach): Both surrogate and natural key are used.

Dimensions:

Conformed Dimension: A conformed dimension is a set of data attributes that have been physically referenced in multiple database tables using the same key value to refer to the same structure, attributes, domain values, definitions and concepts. A conformed dimension cuts across many facts.
Junk Dimensions:
Degenerate Dimension: a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table
Role-playing Dimension
Outrigger Dimension
Shrunken Dimension: A conformed dimensions that includes a subset of the rows and/or columns of the original dimension
Calendar date dimension

Data Governance

Data Governance is a principled approach to managing data during its life cycle from acquition to archival or disposal.

Definition:

Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It includes
– the actions people must take,
– the processes they must follow, and
– the technology that supports them throughout the data life cycle.

Data governance shall answer

Data Stewardship: accountability and responsibility for both the data itself and the processes that ensure its proper use
Data Quality: Proper, authoritative, dependable and accurate
Data Management: encompassing all aspects of managing data as an enterprise asset, from collection and storage to usage and oversight, making sure it’s being leveraged securely, efficiently, and cost-effectively

References:

Data mart: https://en.wikipedia.org/wiki/Data_mart
Star Schema: https://en.wikipedia.org/wiki/Star_schema
Snowflake schema: https://en.wikipedia.org/wiki/Snowflake_schema
Fact table: https://en.wikipedia.org/wiki/Fact_table
Dimension: https://en.wikipedia.org/wiki/Dimension_(data_warehouse)
OLAP Cube: https://en.wikipedia.org/wiki/OLAP_cube
Data Cube: https://en.wikipedia.org/wiki/Data_cube
Multidimensional https://docs.oracle.com/cd/B13789_01/olap.101/b10333/multimodel.htm
Online analytical processing (OLAP): https://en.wikipedia.org/wiki/Online_analytical_processing#Multidimensional_databases
Google Data Governance: https://cloud.google.com/learn/what-is-data-governance
Information Architecture: https://xd.adobe.com/ideas/process/information-architecture/

Follow me