Re: Relational Views as Transformation Method in Data Warehouse
On Sep 26, 4:14 pm, Marco Russo wrote:
> On Sep 20, 9:04 pm, DanUp <dupton.@.DecisionLab.Dot.Net.> wrote:
>
>
>
>
>
> > Open Question:
> > A recent post-reply, one expert (Marco Russo) indicated that he doesn't like
> > to use relational views as a quasi-ETL method, but rather prefers real
> > data-moving ETL to de-couple OLTP data from DW Data. I do agree with the
> > importance of the classic de-coupling approach, but would like to hear more,
> > from Marco or others, about whether relational views, per se, should always
> > be avoided as a component to ETL with OLAP downstream.
>
> > Specific Scenario:
> > For a SQL 2005 Data Mart / SSAS BI Application, suppose that, rather than
> > heavily transforming our large (~500g) operational data (which needs some,
> > but not tons of transformation) during it's transfer to a Data Mart box,
> > suppose we bulk-copy it (for speed of job completion), and then use
> > relational views to transform it (cleanse, filter, perform calcs), in
> > preparation for relational views' usage in SSAS '05?
>
> > Questions:
> > (1) Does anybody see value in that we now have more flexibility in terms of
> > change-management during project lifecycle, to manipulate actual table data
> > without disturbing (if we're careful) the views used for SSAS?
> > (2) How else might this hurt us, either now or later?
> > (3) By the way, will Proactive Caching work fully against relational views?
>
> > Thanks.
>
> > Daniel Upton
>
> I read other replies to this thread and I agree that this is more an
> implementation issue than a functional issue.
> That said, having consistent guidelines helps a lot if you want to
> build a maintainable solution. I don't remember what was my post-reply
> (if you can give me a link...) - I read your scenario and your
> solution works. You leverages on SQL engine to transformate data. If
> you would use SSIS to do data move and/or transformation, then it
> would be better having all transformations inside SSIS package.
> However, still today I find myself writing JOINs into source query
> (that acts on a copy of OLTP data, as in your case) just because
> otherwise SSIS would consume too much memory. As we said, it's an
> implementation issue.
> This topic is very large. In a perfect world, all ETL transformations
> are described through consistent meta-data that allows an easy
> analysis of field dependencies between source and destination data. In
> the real world, I still haven't seen a perfect solution like that :-)
>
> I'm very interesting in this kind of discussions - if there is someone
> interested, we could open a separate discussion on that (eventually
> moving out from the general newsgroup).
>
> Marco Russohttp://www.sqlbi.euhttp://sqlblog.com/blogs/marco_russo- Hide quoted text -
>
> - Show quoted text -
Hi Daniel/All,
this is a broad topic.....but let me put my 2cents in....
With faster machines and cheaper database licenses (relative to what
you can do) we moved to using the RDBMS as the staging area over files
in the 1997 time frame.
Of course, over the last 10 years machines have become faster
still.....so nowadays we see a number of benefits to heavily using
views in the ETL processing.
Specifically the track that we have now gone down is to create an
Excel workbook into which virtually ALL ETL transformations that are
needed in a typical mid-tier company can be coded. We typically need
to have very, very little ETL outside of the definitions in the
workbook....we use a utility to generate views out of the workbook
that are placed over both the source and the target and our ETL
software pumps data from source to target with some specific
processing defined for each program.
The advantage we see in views is the complete disconnection of the
physical database from the ETL subsystem. This means the DBA can make
changes to the physical database and be certain that nothing on top of
that will be damaged. This is a very handy thing to be able to do.
We also load all the definitions in the spreadsheet into a small
reporting database and expose them via Report Services reports......
using these tools we can write ETL subsystems about 10x faster
(development time not run times) than writing them in other tools
including SSIS. So we think this approach has signicant merit and we
now sell the tools we developed.
We see the future of ETL Subsystem devlopment as defining all ETL
directives into Excel workbooks and then generating whatever needs to
be generated to create a runnable ETL subsystem. By doing this the
development of the ETL subsystm becomes encoding the mappings into the
workbook and testing.......there will be no faster way....ever...of
building ETL subsystems.....
Best Regards
Peter Nolan
www.peternolan.com
date: Sat, 03 Nov 2007 10:15:06 -0700
author: Peter Nolan