{"id":2328,"date":"2022-10-31T09:07:48","date_gmt":"2022-10-31T13:07:48","guid":{"rendered":"https:\/\/shirishranjit.com\/blog1\/?page_id=2328"},"modified":"2022-10-31T15:11:08","modified_gmt":"2022-10-31T19:11:08","slug":"data-modeling-cheatsheet","status":"publish","type":"page","link":"https:\/\/shirishranjit.com\/blog1\/big-data\/data-modeling-cheatsheet","title":{"rendered":"Data Modeling Cheatsheet"},"content":{"rendered":"\n<p>Snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. &#8220;Snowflaking&#8221; is a method of normalizing the dimension tables in a star schema.<\/p>\n\n\n\n<p>A fact table consists of the measurements, metrics or facts of a business process. It is located at the center of start\/snowflake schema surrounded by dimension tables.<\/p>\n\n\n\n<p>A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions e.g. people, place, time, and son on. <\/p>\n\n\n\n<p>Types of fact tables<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Transactional &#8211; each grain associated with indicated as a row per line in a transaction<\/li><li>Periodic snapshots &#8211; <\/li><li>Accumulative snapshots<\/li><\/ul>\n\n\n\n<p>Designing fact and dimension tables are &#8211; choosing a business process to model, then defining grain meaning describing exactly what a fact table record represents. Then choose dimensions to describe and determine facts, lastly identify\/define facts.<\/p>\n\n\n\n<p>There may be faceless fact table &#8211; they describe either events\/activities or condition, eligibility and coverage.<\/p>\n\n\n\n<p>Dimension tables may have Surrogate keys that are key generated and managed inside the data warehouse rather than keys extracted from data source systems. <\/p>\n\n\n\n<p>Different types of Dimension <\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Type 0 (Retain original)<\/strong>: Attributes never change. No history.<\/li><li><strong>Type 1 (Overwrite)<\/strong>: Old values are overwritten with new values for attribute. No history.<\/li><li><strong>Type 2 (Add new row)<\/strong>: For a new value, a new row is created with either a start date \/ end date or version. This creates a history.<\/li><li><strong>Type 3 (Add new attribute)<\/strong>: For a new value, a new columm is created. History is limited to the number of columns designated for storing historical data.<\/li><li><strong>Type 4 (Add history table)<\/strong>: One table keep the current value, where as the history is saved in a second table. This creates a history.<\/li><li><strong>Type 5 (Combined Approach 1 + 4)<\/strong>: Combination of type 1 and type 4. History is created through a second history table.<\/li><li><strong>Type 6 (Combined Approach 1 + 2 + 3)<\/strong>: Combination of type 1, type 2 and type 3. History is created through separate row and attributes.<\/li><li><strong>Type 7 (Hybrid Approach)<\/strong>: Both surrogate and natural key are used.<\/li><\/ul>\n\n\n\n<p>Dimensions:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Conformed Dimension: A conformed dimension is a set of data attributes that have been physically referenced in multiple database tables using the same key value to refer to the same structure, attributes, domain values, definitions and concepts. A conformed dimension cuts across many facts.<\/li><li>Junk Dimensions:<\/li><li>Degenerate Dimension: a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table<\/li><li>Role-playing Dimension<\/li><li>Outrigger Dimension<\/li><li>Shrunken Dimension: A conformed dimensions that includes a subset of the rows and\/or columns of the original dimension<\/li><li>Calendar date dimension<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Data Governance<\/h2>\n\n\n\n<p>Data Governance is a principled approach to managing data during its life cycle from acquition to archival or disposal.<\/p>\n\n\n\n<p>Definition: <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It includes <\/p><p>&#8211; the actions people must take, <\/p><p>&#8211; the processes they must follow, and <\/p><p>&#8211; the technology that supports them throughout the data life cycle.<\/p><\/blockquote>\n\n\n\n<p>Data governance shall answer<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Data Stewardship: accountability and responsibility for both the data itself and the processes that ensure its proper use\u00a0<\/li><li>Data Quality: Proper, authoritative, dependable and accurate<\/li><li>Data Management: encompassing all aspects of managing data as an enterprise asset, from collection and storage to usage and oversight, making sure it\u2019s being leveraged securely, efficiently, and cost-effectively<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">References:<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>Data mart: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_mart\">https:\/\/en.wikipedia.org\/wiki\/Data_mart<\/a><\/li><li>Star Schema: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Star_schema\">https:\/\/en.wikipedia.org\/wiki\/Star_schema<\/a><\/li><li>Snowflake schema: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Snowflake_schema\">https:\/\/en.wikipedia.org\/wiki\/Snowflake_schema<\/a><\/li><li>Fact table: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fact_table\">https:\/\/en.wikipedia.org\/wiki\/Fact_table<\/a><\/li><li>Dimension: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dimension_(data_warehouse)\">https:\/\/en.wikipedia.org\/wiki\/Dimension_(data_warehouse)<\/a><\/li><li>OLAP Cube: <a href=\"https:\/\/en.wikipedia.org\/wiki\/OLAP_cube\">https:\/\/en.wikipedia.org\/wiki\/OLAP_cube<\/a><\/li><li>Data Cube: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_cube\">https:\/\/en.wikipedia.org\/wiki\/Data_cube<\/a><\/li><li>Multidimensional <a href=\"https:\/\/docs.oracle.com\/cd\/B13789_01\/olap.101\/b10333\/multimodel.htm\">https:\/\/docs.oracle.com\/cd\/B13789_01\/olap.101\/b10333\/multimodel.htm<\/a><\/li><li>Online analytical processing (OLAP): <a href=\"https:\/\/en.wikipedia.org\/wiki\/Online_analytical_processing#Multidimensional_databases\">https:\/\/en.wikipedia.org\/wiki\/Online_analytical_processing#Multidimensional_databases<\/a><\/li><li>Google Data Governance: <a href=\"https:\/\/cloud.google.com\/learn\/what-is-data-governance\">https:\/\/cloud.google.com\/learn\/what-is-data-governance<\/a><\/li><li>Information Architecture: <a href=\"https:\/\/xd.adobe.com\/ideas\/process\/information-architecture\/\">https:\/\/xd.adobe.com\/ideas\/process\/information-architecture\/<\/a><\/li><\/ul>\n<div class=\"twttr_buttons\"><div class=\"twttr_twitter\">\n\t\t\t\t\t<a href=\"http:\/\/twitter.com\/share?text=Data+Modeling+Cheatsheet\" class=\"twitter-share-button\" data-via=\"\" data-hashtags=\"\"  data-size=\"default\" data-url=\"https:\/\/shirishranjit.com\/blog1\/big-data\/data-modeling-cheatsheet\"  data-related=\"\" target=\"_blank\">Tweet<\/a>\n\t\t\t\t<\/div><div class=\"twttr_followme\">\n\t\t\t\t\t\t<a href=\"https:\/\/twitter.com\/shiranjit\" class=\"twitter-follow-button\" data-size=\"default\"  data-show-screen-name=\"false\"  target=\"_blank\">Follow me<\/a>\n\t\t\t\t\t<\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>Snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. &#8220;Snowflaking&#8221; is a &hellip; <a href=\"https:\/\/shirishranjit.com\/blog1\/big-data\/data-modeling-cheatsheet\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":396,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2328","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/2328"}],"collection":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/comments?post=2328"}],"version-history":[{"count":12,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/2328\/revisions"}],"predecessor-version":[{"id":2341,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/2328\/revisions\/2341"}],"up":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/396"}],"wp:attachment":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/media?parent=2328"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}