Saturday, April 1, 2017

how to model dimensions with cognos DMR in real life

Based on the design principle from Kimball, there are many clear defined guideline for different facts (transactional, periodic snapshot and accumulative snapshot) and dimensions (regular, fact, junk, etc.). The key concept is to use surrogate key to make a clear star and slow flake dimension, regardless of whether it is slow change dimension type 1 or type 2. However, this approach may not be used everywhere with data warehouse project. It have been many variations implemented in real life to reduce cost and complexity. In other word, we can establish dimension model (Cognos DMR) without ETL and STAR SCHEMA in database. A few approaches are listed in this document.

1.    Standard approach
Description:
-       Create physical table for each dimension
-       Use surrogate key in dimension (SCD1 and SCD2)
-       Star schema
-       DMR will be established based on star schema

Advantages:
-       Conformed dimensions can be easily established.
-       Superior performance  

Disadvantages:
-       Big effort, need to develop serious ETL.
-       Need to handle late arrived dimensions

 2.    Standard approach but without surrogate key
Description:
-       Create physical table for each dimension, whether it is static or dynamic.
-       The relationship between fact and dimension is established by business key, in many cases, you need to create a business key adaptor to establish relationship.
-       Star schema
-       DMR will be established based on star schema

Advantages:
-       Conformed dimensions can be easily established.
-       Much less effort to create data mart

Disadvantages:
-       You may still need to handle late arrived dimensions
-       Cannot  or very difficult to support slow change dimension type 2


3.    Create dimension based on multiple fact tables
Description:
-       NO physical table for dimension is created, Instead, we can use the query  to define dimension, such as
select distinct Column1, Column2, column3 ... from fact1
UNION
select distinct Column1, Column2, column3 ... from fact2
-       The relationship between fact and dimension is established by business key, such as connect Fact 1 to this dimension, and Fact 2 to this dimension
-       Star schema
-       DMR will be established based on star schema

Advantages:
-       Conformed dimensions can be established, as both Fact 1 and Fact 2 shares the same dimension.
-       no physical dimension and ETL are needed for this dimension.
-       Do not need to handle late arrived dimensions, as the data integrity is automatically enforced.

Disadvantages:
-       Cannot  support slow change dimension type 2
-       Could result in a bad performance

 4.    Create dimension based on single fact table with hierarchy
Description:
-       NO physical table for dimension is created
-       NO query for dimension is created, and therefore no separated query subject is created. Instead , create additional columns for hierarchy on business layer.
-       NO relationship between fact and dimension is needed, as dimension is directly come from fact table
-       NO Star schema
-       DM layer will be established based on fact table

Advantages:
-       no physical dimension and ETL are needed for this dimension.
-       Do not need to handle late arrived dimensions, as the data integrity is automatically enforced.

Disadvantages:
-       Cannot  support slow change dimension type 2
-       Could be result in a bad performance, as dimension will be extracted from fact on the fly
-       Conformed dimensions can NOT established

5.    Create dimension based on single fact table with NO hierarchy
Description:
-       NO physical table for dimension is created.
-       NO query for dimension is created.
-       NO query subject in both data tier and business tier is created.
-       NO relationship between fact and dimension is needed, as dimension is directly come from fact table
-       NO Star schema
-       dimension will be established by simply dragging one attribute to DM layer

Advantages:
-       no physical dimension and ETL are needed for this dimension.
-       Do not need to handle late arrived dimensions, as the data integrity is automatically enforced.

Disadvantages:
-       Cannot  support slow change dimension type 2
-       Could result in a bad performance, as dimension will be extracted from fact on the fly
-       Conformed dimensions can NOT established
-       No hierarchy can be used.

How to enable custom report by business users


  1. Context

Power user can be enabled to create and publish report as needed
  1. Basic setup

2.1 Created a folder in Cognos to host all custom developed report in Cognos server
2.2 Added a folder in OpenPages to report menu in OPX back end


2.3 Configure user access to report Studios
2.4 Enable permission to add Report to Reporting Menu


  1. Create new report under My folder and copy it to GOR Admin reports

3.1


3.2


3.3


3.4


3.5


  1. Publish/Share report

4.1


4.2


4.3
4.4



Wednesday, March 1, 2017

How to model a effective date dimension in DMR with alias shortcut


This document is a practical extension of concept described by my ex-post. (How to use “Alias Shortcut” for role playing dimensions in Framework Manager)  The goal is try to figure out a most effective way to design date dimension in DMR. The date dimension must satisfy following criteria:
1)    There must be only one physical date dimension. Once changed, the date dimensions at business layer will reflect the change accordingly.

2)    There must be only one date dimension at business layer. Once additional columns of date dimension changed, all date dimensions should be changed accordingly.
3)    Date dimension at date dimensional layer should be user friendly.

 
1.    Approach in detail

Database Layer:
-       Date dimension  (from physical table)

Business Layer:
-       Date1 dimension  (model from database layer and added all additional columns)
-       Date2 dimension  - alias shortcut to Date1, Please note any change of Date1 will be reflected in Date2.
-       Date3 dimension  - alias shortcut to Date1 , Please note any change of Date1 will be reflected in Date2.

Relationship
      FactTable ß----------à Date1
      FactTable ß----------à Date2
      FactTable ß----------à Date3

DM Layer:
-       Date1 dimension   --  mode dimension from Date1 from business layer
-       Date2 dimension  --  mode dimension from Date2 from business layer
-       Date3 dimension  --  mode dimension from Date3 from business layer
-       FactTable Measures

(It is impossible to define a single date dimension at DM layer and make other date dimension refer to the first date dimension. If there is any change for date 1 dimension at DM layer, you need to change Date 2 and Date 3 accordingly by manual.)

Presentation  Layer:
-       Date A dimension   --  Shortcut to Date1 dimension from DM layer
-       Date B dimension  --  Shortcut to Date2 dimension from DM layer
-       Date C dimension  --  Shortcut to Date3 dimension from DM layer
-       FactTable Measures
Date A, Date B and Date C is user friendly name.

 
2.    Sample implementation
The some screenshots below demonstrates the idea described above.



 
3.    Note
Cognos DOES NOT support reference shortcut to point to alias shortcut. 

Wednesday, February 1, 2017

how to write a effective SQL query to retrieve most recent records

It is often required to retrieve most recent record when building ETL to create data mart. The source table is normally very big, which make us pay attention to performance .The performance could be an issue even with data warehousing appliance like Netezza database. Three methods are introduced in this document.  Though many people use the method 1 (Use a Complete Subquery), it is not a good solution, as this method could results in duplicated records, and poor performance. Based on my experience, the method 3 (Use Window Functions)  is best option, as it provides you no duplicates and best performance.

 Use a Complete Subquery

SELECT F.*

FROM   SourceFact F

INNER JOIN (SELECT SourceID,Max(TIMESTAMP_SUBMITTED) AS LAST_TIMESTAMP_SUBMITTED

            FROM   SourceFact

            GROUP  BY SourceID) F1

ON F.SourceID = F1.SourceID

AND F.TIMESTAMP_SUBMITTED = F1.last_TIMESTAMP_SUBMITTED;

 

if more than one sourceID with the same TIMESTAMP_SUBMITTED, then more than  records for this sourceID will be selected, which breaks the uniqueness of this query. Also, the performance is very poor when the table SourceFact is very big.

 
Use Correlated Subqueries
Correlated subqueries are subqueries that depend on the outer query. It’s like a for loop in SQL. The subquery will run once for each row in the outer query:

SELECT *

FROM   SourceFact

WHERE  ( SourceID, TIMESTAMP_SUBMITTED )

IN (SELECT SourceID,Max(TIMESTAMP_SUBMITTED) AS LAST_TIMESTAMP_SUBMITTED

    FROM   SourceFact

    GROUP  BY SourceID);

 This solution is even worse for big table, as it result in very poor performance.
 

Use Window Functions
Window function is little complicated, but it yields much better performance, 10-100 times for big table. In addition, it will grantee the uniqueness, which is extremely important in data warehouse design.

SELECT F.*

FROM   (SELECT ROW_NUMBER() OVER (

        PARTITION BY SourceID

ORDER BY TIMESTAMP_SUBMITTED desc)      LAST_TIMESTAMP_SUBMITTED,

          *

        FROM   SourceFact) F

WHERE  F.last_TIMESTAMP_SUBMITTED = 1;