By Larissa Moss - TDWI
Most organizations treat project management as an administrative function. A project manager often “manages” multiple projects. However, a more accurate way to define a project manager would be to say that he or she “administers” multiple projects because he/she is rarely involved in any daily project activities. The project teams merely report to him/her.
Project managers, assuming that data warehouse projects are like any other project, are often surprised when their data warehouse project spins out of control. The requirements appear to be a “moving target;” the schedule keeps slipping; the source data is much dirtier than expected and is impacting the ETL team; the staff does not have the necessary skills and is not properly trained; communication between staff members takes too long; traditional roles and responsibilities, and how they are assigned, seem to result in too much rework; the traditional methodology does not seem to work; and so on.
Techniques that work on other projects do not work well on data warehouse projects. This booklet describes how to avoid 10 common mistakes made by data warehouse project managers.
Larissa Moss, president of Method Focus Inc., specializes in data warehousing, business intelligence, information quality, data integration, project management, and spiral data warehouse methodologies. She presents and lectures at various conferences worldwide. She co-authored the books Data Warehouse Project Management, Impossible Data Warehouse Situations, Business Intelligence Roadmap, and Data Strategy. Her works have been published in trade journals, including DM Review, Teradata Magazine, and TDWI’s Business Intelligence Journal. Additionally, her white papers are available through the Cutter Consortium and NCR/Teradata. She can be reached at methodfocus@earthlink.net.
1. Failing to Use a Methodology
Software development has become relatively lax over the past two decades, and the use of system development methodologies has become more of an exception rather than a rule. Project teams—as well as business users—seem to think that with all the new development tools available, system development is (or should be) trivial. They are often surprised to learn that project managers and project teams must consider approximately 920 tasks when developing a data warehouse. Who can remember 920 tasks? No one. But every one can look up 920 tasks in a methodology.
Having the right kind of methodology is important. It cannot be a traditional “waterfall” methodology because that type of methodology assumes you are building a stand-alone “final” product, which does not have to integrate with any other product and will not dramatically evolve or expand over time. Thus, a traditional methodology does not include cross-organizational business integration tasks. Since a data warehouse is an evolving environment with many databases and applications, it is important to design databases and processes for reuse whenever possible. This requires specific integration tasks that a data warehouse methodology must provide.
In addition, a data warehouse methodology must take into account that a data warehouse environment cannot be built all at once. In other words, the deliverable will not be a stand-alone “final” product, but will have to be expanded and enhanced over time. If a data warehouse is successful, then each release will most likely generate new requirements. Sometimes these requirements will be for a brand new data warehouse application, but many times they are simply an enhancement of an existing application. Periodically, these new requirements may even demand that new technology be evaluated and purchased. A data warehouse methodology provides appropriate tasks for all of these activities.
Another differentiating aspect of data warehouse projects is that you have to manage multiple sub-projects in parallel. One such sub-project is the development of the data warehouse application (e.g., reports, canned queries, or customized cubes for slicing and dicing). Then there is the ETL process, including data profiling, data transformations, and data cleansing in addition to source data extracting and target data loading. A third sub-project may be building and loading the metadata repository. And there may even be a data mining deliverable, requiring its own development track.
A data warehouse methodology includes tasks for all of these development activities, and recognizes that many of these activities can run simultaneously.
Since metadata is an important deliverable, it deserves special mention when discussing methodologies. Not only does a data warehouse methodology have to include tasks for gathering, storing, and delivering metadata to the business community, it must also provide tasks for either evaluating and installing a purchased metadata repository product, or designing and building one.
In an evolving and expanding data warehouse environment, where maximum reusability must be built into all deliverables, it is important to continuously review and improve the environment. That means reviewing old and new requirements against existing data warehouse databases and applications, and finding ways to reuse what has already been built. Such reviews may result in requirements for minor database design changes, or program changes to the ETL process, reports, queries, or other applications. The methodology must provide tasks for conducting such reviews and folding the resulting changes into the next data warehouse project.
Taking the many development steps into account (from business case assessment to post-implementation review) and considering that most data warehouse projects are composed of several sub-projects, it is easy to understand that there are hundreds of tasks to be considered. Naturally, not all tasks have to be executed on each project, but all tasks must be known to the project manager so that he/she can pick the right ones for each data warehouse release.
The role of a methodology is to provide a list of all possible tasks, their dependencies, the roles and responsibilities assigned to execute them, and the deliverables resulting from them. Not using a methodology almost guarantees that vital tasks will be dropped, requiring rework that could have been avoided.
2. Ineffective Project Team Structure
Traditional project teams are not structured to cope effectively with the dynamic nature of data warehouse projects and cannot react fast enough to the constant changes and challenges. What’s a traditional project team structure? Typically, the project manager alone defines and plans the project, and assigns a discrete set of tasks to each project team member. When a team member completes a task, the deliverable gets “handed over the cubicle wall” to the next team member who performs his/her assigned tasks and hands over the deliverable to the next person, and so on. Then on Friday afternoon, all team members submit a status report of their individually assigned tasks to the project manager who uses these reports to monitor and control the project activities. Occasionally, or regularly, team meetings are called to exchange information, and when a problem arises, special meetings are arranged with the business people or other stakeholders who can help resolve the issue.
A data warehouse team must be much more flexible and dynamic than that. There should be a core team of four to five people who together define, plan, and co-lead the project. The core team should be thought of as a high-powered, self-organizing SWAT team. Core team members must be 100 percent available from the beginning to the end of the data warehouse project. They brainstorm together, assign work to each other, review each other’s deliverables (peer reviews), resolve issues, and make project-related decisions together. This team should be staffed with senior-level team members who are experts in:
- Project management (a lead person, not an administrator)
- Subject matter expertise (a business representative, not an
IT person) (1) - Business analysis practices (data modeling and process modeling)
- System analysis techniques (light programming)
- Programming (ETL, OLAP, report writers, metadata repository, etc.)
Each person on the core team can be, and probably will be, assigned multiple roles. The core team roles and their main responsibilities are listed in the following table.
| Core Team Role | Major Responsibilities |
| Application Lead Developer | Design and oversee the development of the data warehouse access and analysis applications. |
| Business Representative (2) | Make business decisions, resolve disputes between business units, and improve the source data quality. |
| Data Administrator | Perform cross-organizational data analysis, establish naming standards, create the project-specific logical data models and merge those models into an enterprise logical data model. |
| Database Expert (Architect and Administrator) | Design, load, monitor, and tune the data warehouse databases. |
| Data Quality Analyst | Assess source data quality and prepare data cleansing specifications for the ETL process. |
| ETL Lead Developer | Design and oversee the development of the ETL process. |
| Metadata Administrator | Build or buy, enhance, load, and maintain the metadata repository. |
| Project Manager | Define, plan, control, and review all project activities. |
| Subject Matter Expert | Provide business knowledge about data, processes, business rules, metadata, and requirements. |
| Technical Architect | Establish and maintain the technical infrastructure (hardware, network, middleware, system software). |
(1) One-hundred percent availability from a business representative is a critical success factor for a data warehouse project. If management resists releasing one businessperson full time, it’s an indication that they don’t support the data warehouse as a critical strategic business initiative.
(2) The business representative role on the core team is usually assigned to the primary business user who represents the business units for which the data warehouse application is being developed. This person must be authorized to make business decisions on behalf of the business community he/she represents.
3. Failing to Involve the Business People
Data warehouse projects are notoriously dynamic. That can be good or bad. Usually, it is perceived as bad because requirements change constantly; the scope is hard to control; the timeframes for delivering applications are unreasonably short; the data is usually dirtier than expected; business people are hard to pin down to provide business rules for data cleansing; project team members are often unclear about their specific roles and responsibilities, etc.
On the other hand, dynamic projects can be good because business people have an opportunity to learn about any new technology or tools early on. They also have an opportunity to “play” with their requirements and adjust them as they learn more about the capabilities and limitations of the data warehouse. IT folks can experiment with different database and application designs, and they can negotiate the project scope and delivery time to be more realistic depending on the difficulties they encounter. They can also profile the source data early and show all the data defects to the business people for resolution or deferment.
It should be obvious that dynamic projects have to be set up differently from traditional projects if the dynamics are to have a positive impact on the project team or the project schedule. The differences include adoption of a rapid development approach similar to prototyping, acceptance of the software release concept, a self-organizing SWAT team, and full-time involvement of business people in project activities. This is a paradigm shift for how applications are developed, and only a few business and IT people are comfortable with it. In contrast, organizations that successfully practice “extreme programming” techniques understand the benefits of this new approach because the prerequisites for extreme programming are the same (rapid development, software releases, SWAT teams, and participation from business people).
Why is it so important for business people to participate and what would they be doing on the projects? The most important reason business people should participate is to speed up the development work. It is a common complaint among IT people that situations come up several times a week for which they need input from the business people. But business people don’t often make themselves available. And when they do, it can be too late, especially when weeks have passed and IT has already assumed how best to resolve the situation. Sometimes IT guesses incorrectly, leading to rework, which can impact the project schedule. When combining these situations with changing business requirements that are not subjected to rigorous impact analysis, the result is a frustrated IT team and unhappy business users who don’t understand why IT takes so long to deliver the application. Business people must “live” the projects alongside IT in order for the entire project team to be more productive. Hence, business people must make time to become participating members of data warehouse project teams. They must participate in project planning, perform impact analysis on their own requirement (scope) changes, remove business-related road blocks (like data disputes between business areas), and perform other business-related project activities such as:
- Determining project deliverables for each software release
- Participating in tool evaluation and selection
- Negotiating data and functional requirements
- Participating in data and process modeling sessions
- Providing data definitions and business rules
- Participating in testing activities, including writing the
test cases - Profiling the source data and validating the quality of data
- Identifying the cleansing rules for dirty source data
- Validating/testing the accuracy of the ETL programs
- Validating/testing the accuracy of reports and queries · Resolving disputes among business units
- Monitoring/auditing the data warehouse data on an ongoing basis
- Participating in post-implementation review discussions
Business people must be told that they are an invaluable and indispensable part of an effective data warehouse project team because they possess certain knowledge and authority their IT counterparts don’t have. In addition, business people understand the severity and monetary impact of their organization’s business problems, and they are the only ones with the position and authority to negotiate the priorities of data warehouse projects.
4. Failing to Have Application Releases
The deliverable of a data warehouse project is usually a fully operable data warehouse application with a lot of functionality and a lot of data. The source data typically resides on multiple (and often heterogeneous) operational files or databases, which adds to the complexity of integrating the source data. The amount of data redundancy, data inconsistency, and data defects is habitually underestimated. Project teams who did not plan to spend the majority of their time on data cleansing are caught off guard—especially because they are expected to implement the fully functioning data warehouse application in an extremely short timeframe. In short, many project teams bite off much more than they can chew (i.e., their scope is much too large for their deadline).
It has been said for years that “you cannot build a whole data warehouse [environment] in one big bang.” Nobody challenges that anymore. But that does not go far enough in the attempt to reduce scope and complexity to a manageable chunk of work. Therefore, the new mantra should be: “you should not build a data warehouse application in one big bang.”
Following the principles of extreme programming, extreme project management, extreme methodologies, we should also adopt extreme scoping. Extreme scoping means reducing the complexity, and thus, the scope of each project in order to deliver something to the business users in a very short period of time. The first something would equate to only a fraction of the requested fully functioning application, but more functionality and more data would be added rapidly with each subsequent application release. Many business users still balk at this approach and insist on the “minimum required” deliverable, saying that anything less is of no use to them. They don’t realize that not only are they not losing anything, but they are gaining a lot with application releases.
Building an application in small iterations (application releases) will not take any longer than building the whole enchilada at one time. In addition, the business users can see their application grow and catch mistakes or adjust their requirements if needed (under strict change control procedures). This will greatly enhance the quality of the final fully functioning application. Another benefit of this approach is the opportunity for business people to slowly become familiar with their data warehouse application and any new technology or tools before the application is even completed.
Maybe the best way to illustrate the effectiveness of extreme scoping with application releases is to recall the concepts of prototyping. In prototyping we focus on a small (partial and incomplete) scope that is not too complex, and we produce a not yet fully functioning application. The next prototype release includes another small portion of the overall scope with a little more complexity and a little more functionality. This process can be repeated until the application is fully functioning. Although application releases are based on the same concepts as prototyping, they are not equal to prototyping. The difference is that traditional prototyping is pure ad hoc development, whereas application releases demand all necessary project activities to be performed with the rigor of a methodology.
Scoping data warehouse projects will remain a struggle as long as we are married to the idea that a project must produce a fully functioning application. But if we use application releases to build data warehouse applications, controlling the scope becomes much easier because the complexity of each release is reduced, the number of activities performed is decreased, and the project team can be smaller, rendering the methods for controlling the project more effective.
5. Failing to Have an Active Project Charter
Most project managers create a short document that describes their project in terms of high-level requirements, users, schedule, resources, and budget. This document is often known as a document of understanding, project agreement, scope agreement, or project charter. Frequently, this document is created by copying an old document from a previous project and changing a few details here and there. Once the project is kicked off, this document disappears into a project manual—never to be seen again.
A well-thought-out project charter is a very useful instrument and should be used actively to monitor and control project activities during the entire development cycle of a data warehouse project. Therefore, the project manager and the business user, or business sponsor, should spend some time documenting the details of their agreement in this charter. While a detailed project charter can contain as many as 20 sections, the following four sections are the most useful to serve as a baseline for change control:
- Scope
- Risks
- Assumptions
- Constraints
Traditionally, scope has been measured by the number of functions the system will perform (function point analysis)—a sure way to underestimate effort, budget, and resources. Data warehouse applications are data-intensive, not function-intensive. Therefore, scope must be measured by the number of data elements that have to be extracted from the source systems, transformed and cleansed, and loaded into the data warehouse target databases.
Every project is subject to some risks—risks are unavoidable. Such risks could severely affect the project schedule as well as the project deliverables, depending on the likelihood the risks will materialize and the impact they would have on the data warehouse project. The project manager must identify triggers for each risk and incorporate a risk mitigation plan as well as a contingency plan into the project program.
An assumption is anything taken for granted; a supposition or a presumption. It’s important to document assumptions because a wrong assumption could very quickly turn into a risk. Important assumptions should always have counterpart risks, in case the assumptions either turn out to be false or do not materialize.
All projects are subject to the four constraints of scope, effort (time), budget, and resources (capable and available people). In reality, there is a fifth constraint: quality. Although quality is a measure of how well the deliverables meet the requirements, it can also be considered a constraint that must be balanced with the other constraints because higher quality requires more effort and therefore more time to deliver.
The project charter should record the agreed-upon scope negotiated under the stated risks, assumptions, and constraints. If any of these components change, the entire project has to be reevaluated and renegotiated, and the changes should be reflected in the revised project charter so it can be used as a baseline for the next change request.
Unfortunately, business managers and IT managers frequently put their project teams under unwarranted pressure to incorporate scope changes without ever referring back to the original agreement in the project charter, and also without performing the necessary impact analysis.
6. Lack of a Readiness Assessment
Many project managers attend data warehouse conferences and learn best practices for planning, designing, and implementing data warehouses. But, when they return to their organizations and try to apply the best practices, they often encounter resistance from business users and “uninitiated” IT members. Regardless of how hard the project managers try to educate, convince, or force those who oppose them to follow best practices, they are often unsuccessful.
The mistake these project managers make is not realizing their organizations aren’t ready to suddenly stop building traditional stand-alone systems and begin building an integrated data warehouse environment. Some organizations are bent on trying every shortcut and silver-bullet solution before they admit those solutions only add to their data chaos.
Therefore, at the beginning of the initiative, an organization’s readiness should be assessed (understanding, ability, and willingness). A readiness assessment would include the following questions:
1. Have the goals and objectives been defined?
2. Do the goals and objectives for the data warehouse map to those
of the organization?
3. Has the source data been inventoried and modeled?
4. What is the quality of the source data?
5. Are the skills in place to build and support the data warehouse?
6. Is an adequate budget in place?
7. Has supporting software (ETL, cleansing, DBMS, etc.) been
selected and installed?
8. Is there a strong, well-placed, and reasonable business sponsor?
9. Does the business sponsor understand that a data warehouse is
not a stand-alone system?
10. Are the primary business users computer literate?
11. Are the business users’ expectations realistic?
12. Do the business users understand they have to participate in
project activities?
13. Does the business sponsor accept the approach of building
applications iteratively (using the software release concept)?
Based on the assessment, the project manager can determine which best practices to implement. Periodically, the questionnaire should be re-distributed to gauge the organization’s understanding of data warehousing. At that time, more best practices can be incorporated.
Data warehouse testing is often done poorly. It’s unacceptable—and so is the excuse that “it can be fixed in the next release.” If it takes too long to test the data warehouse application properly now, it will take even longer to test it later because the next release will be larger and more complicated.
The same types of testing activities that apply to operational systems also apply to data warehouse applications, including unit testing, integration (systems) testing, performance (stress) testing, quality assurance testing, and user acceptance testing.
Unit testing refers to the testing of discrete program modules and scripts. Every developer must test his or her program modules and scripts individually.
Integration testing tests the complete process. The interactions and flow of all programs must be observed and validated. Every time actual test results do not equal the expected test results, the program producing the error must be corrected, and all programs must be rerun.
The most complicated and time consuming type of testing is regression testing. The main goal of regression testing is to ensure any modifications to existing programs did not inadvertently produce errors.
Performance testing, also known as stress testing, is performed to predict system behavior and performance. Performance testing can be limited to only the most critical program modules with the highest volumes of data and the longest runtimes.
Most large organizations have strict procedures for moving applications into production. These procedures usually include QA testing, at which time the operations staff goes through a simulated production run before allowing the application to transfer to the production environment.
Acceptance testing is done by business users. They validate the functionality of the data warehouse application.
With the possible exceptions of unit and performance testing, all other testing activities are controlled by a test plan. The bulk of the plan will be a list of test cases. Each test case specifies the input criteria and the expected output results for each run. It also describes the program logic performed and the appearance of the resulting data.
8. Underestimating Data Cleansing Efforts
Most organizations admit they are not paying sufficient attention to their data quality. As disheartening as the situation is with operational source systems, it is very discouraging to see that many IT and business managers are continuing to put pressure on project teams to build data warehouses quicker using the motto: “There’s never enough time to do it right, but always enough time to do it over.”
With project schedules shrinking and project scopes expanding, project managers are under the gun to deliver more in less time. Therefore, they habitually do not allocate enough time for source data analysis, business rule discovery, data cleansing, data reconciliation, and ETL testing. As a result, two things happen: (1) many data defects propagate into the data warehouse unnoticed, and (2) some dirty data is discovered too late when data exceptions are caught during ETL testing or while loading the data warehouse databases.
To avoid project delays, the project manager should build sufficient time into the project plan to profile each data element. Common data violations to look for include:
- Missing data values
- Default values that actually have a meaning, e.g., using
“888-88-8888” as a social security number to indicate
a non-resident alien - Logic embedded in a data value, such as an implied roll-up
structure, e.g., a 10-digit account number where the first four
digits are the branch number - Cryptic and overused data elements, e.g., using the values
“A, B, C, D” to mean type of customer, while the values
“E, F, G, H” mean type of location - Multipurpose data elements, e.g., data elements redefined using
the old COBOL “redefines” clause - Contradicting data values among dependent data elements, e.g., “Boston, CA” · Reused primary key, e.g., two different employees (one retired,
one active) with the same employee number - No unique primary key, e.g., one customer with multiple
customer numbers - Objects without their dependent parent object, e.g., job
assignments for employee 3321, but employee 3321 does not exist in the employee database
Metadata is nothing new—it has always been a part of automated systems. It was found in system documentation, record layouts, database catalogs, and data declaration sections in programs. In fact, it used to be called the dirty “D” word—documentation. Since technicians detested the thought of doing documentation, they often simply didn’t do it.
However, in a data warehouse environment, metadata takes on a new level of importance. Since one data warehouse objective is to eliminate inconsistencies, data must be standardized. Standardization may result in renaming the data, splitting one source data element into multiple target columns, or populating one target column from multiple source data elements. It can also mean translating codes into mnemonics, standardizing (changing) data values, and filtering out inappropriate or invalid records. At the end of the day, business people will not be able to reconcile their operational source data to the data warehouse data unless they have the ability to trace these changes.
Therefore, metadata is now the nice “N” word—navigation. It helps the business people locate, manage, understand, and use the data in the data warehouse databases. It describes what data is available in which database, what it means, where it came from, how it was processed, how clean it is, and how it is used in reports and queries. Not delivering any metadata that could help business people navigate through their data warehouse environment is a mistake.
Granted, implementing a metadata repository has its challenges. Although many data warehouse experts consider metadata to be the “glue” holding the warehouse environment together, most organizations allocate little or no money for creating and maintaining a metadata repository. Also, metadata should be “living” documentation that is constantly updated, which means—at a minimum—one metadata administrator must be assigned full time to manage the metadata repository. But regardless of the challenges, metadata must be an integral part of a data warehouse environment.
And, since the importance of metadata is still not understood by most business executives, project managers need to do a much better job of promoting metadata and communicating its benefits to the business people.
10. Being a Slave to Project Management Tools
Planning and controlling a project is not a trivial task. It takes a long time to create a work breakdown structure, estimate effort and duration time for all tasks, apply task dependencies and resource dependencies, and determine the critical path. Critical path refers to a string of dependent tasks that cannot be late without affecting the project schedule, as compared to other tasks executed at the same time with slack time built into them. For example, if it takes four days to evaluate a product but only three days to create a project charter (at the same time the product is being evaluated), then the task of evaluating a product is the critical path because it has no leeway in timing. If that task is a day late, the project schedule is impacted.
Knowing where the critical path tasks are at any point during a project is crucial to staying on track. Since estimates are only best guesses based on prior experience with similar tasks, the actual time it takes to complete a task usually differs from its estimate. These differences can easily change the critical path. Tracking the differences between estimated and actual time, and continuously adjusting the critical path has enslaved more than one project manager to his/her project management tool. Other project managers find this activity too laborious and too tedious, and they stop tracking the critical path altogether.
The best approach (and compromise) for tracking the critical path is to do it at the milestone level rather than at the task level because there are fewer milestones than there are tasks. Thus, it takes less time and effort to continuously adjust the project plan. The critical path among the tasks between the milestones can be tracked more informally using a whiteboard or a flipchart instead of the project management tool. This approach can be safely and effectively used when the scope of the data warehouse project is very small and the project team is managed by a self-organizing core team (SWAT team) rather than by
a project manager who is not involved in daily project activities.




