Step 7.1 - Best Practices for Data Management and Modeling

The process proposal assumes the architecture above will be in place. The process includes some thought to future uses for the data especially considering data governance and reducing the need to do re-work for different used and projects using the source data. At minimum the data management process needs to have included the steps below (sub steps are not exhaustive but to give some idea of what the step will involve):

Source Data Profiling

This step is meant to be performed only once for a source. It is not meant for a specific use, but so that we can begin to profile data, standardized and collect metadata for future uses of the source.

  1. Identify source tables to stage
  2. Assess columns in use to keep
  3. Field name standardization and business friendly naming
  4. Identify measures (counts, dollars, )
  5. Identify data anomalies and data completeness. For example, number of null values in a column

Data Staging

This step may be completed in two phases. The initial copy/staging the source tables and the profiled, cleanses, reduced column version that will be used for all downstream uses.

  1. Obtain source underlying relational data model
  2. Set up ETL jobs to transfer source data to stage area
  3. Set up data cleansing and column reduction from profiling
  4. Apply standard naming

Data Modeling

This will be needed for sources not currently in a STAR schema. Although if source STAR schemas are not efficient, complete or will be integrated with other data sources this step may still be necessary.

  1. Identify fact table candidates and granularity needed (e.g. transaction, patient, encounter)
  2. Identify fact table measures
  3. Identify dimension tables
  4. Design Star Model (Suggest to de-normalize as much as possible, say 2n normal form)
  5. Standardize all Key field names

ETL to Base Data Marts

The Base Data Marts are the marts that will be in Tableau efficient STAR schemas ready for use or ready for further filtering, column reduction or aggregation.

  1. Data Mart Build
  2. Base Data Mart Validation
  3. Base Data Mart Metadata Documentation

Optional: Data Modeling Integrated Data Mart

Building integrated STAR schema data marts may require going back to the detail staged data to identify where data from different sources can be combined.

  1. Identify Base Data Marts / Stage Data
  2. Identify Fact table candidates
  3. Identify fact table measures
  4. Identify dimension tables
  5. Design Star Model for integrated model

Optional: ETL to Extract Build Data Marts or Create Views

Some efficiencies may be gained at this step-in pre-aggregating data. Assessment of the volume of data and time to create the extract as well as refresh frequency would be considered whether a materialized table or view is even needed rather than building off the base data mart.

  1. Column reduction - include only necessary columns for an extract or live connection
  2. Row Filtering and/or Aggregation - include only necessary rows
  3. Build Data Mart Validation or Views

Extract Build

  1. Extract Prep - Disk space and sizing analysis on both the DBMS and Tableau Server for extract builds
  2. Extract build
  3. Extract publishing
  4. Extract Validation

Production Setup and Scheduling - set for both ETL to STAR schema and (.hyper) extract creation

  1. Timing consider source database updates
  2. Hand-off from source to stage
  3. Refresh frequency and job setup

Next


For any questions or suggestions, please e-mail to [email protected]