Why do it and what should it include?
In a previous article on why organizations struggle with data lineage, my second point was the need for a dedicated data lineage strategy rather than treating lineage as simply a part of other strategies (Data Management, Governance, etc.). This is necessary not only because of the complexity involved in building significant lineage capabilities, but also because of the scope at which you see the most benefit. While you can, and for initial implementation and rollout should, scope lineage to certain areas, the greatest and most lasting benefits come from having it for as much of your data as possible.
Understanding where your data comes from and where it is consumed across your organization brings benefits well beyond traceability for governance or quality. For instance, impact analysis provides significant value and savings if it is applied to all data, not just critical data elements (CDEs). Another is test planning, particularly system integration test planning. How much would your testing improve if your test planners knew all the downstream systems that might be impacted by a change? A third benefit is identifying orphaned or duplicate data structures. How beneficial would it be to quickly identify and remove or consolidate these? How about identifying all reports that are sourced from a data structure that has been flagged with quality issues? CDEs are a good place to start, but to truly take advantage of your investment in lineage, scale is your friend. Enterprise lineage is not easy, but with a well-thought-out strategy you can get there.
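To make the use cases above concrete, here is a minimal sketch of how impact analysis, orphan detection, and "which reports use this flagged structure" all reduce to simple queries once lineage is held as a directed graph. Every name in it (the structures, the `report.` naming prefix, and the helper functions) is hypothetical and chosen only for illustration; a real lineage tool would supply its own model and API.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the structures listed in its value.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.churn_dashboard", "report.revenue_by_region"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["report.revenue_by_region"],
}

# Every structure known to the catalog, including one with no lineage at all.
ALL_STRUCTURES = (
    set(LINEAGE)
    | {target for targets in LINEAGE.values() for target in targets}
    | {"staging.customers_old"}
)


def downstream(start, edges):
    """Impact analysis: every structure reachable from `start` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def orphans(structures, edges):
    """Structures that neither feed nor are fed by anything else."""
    fed = {target for targets in edges.values() for target in targets}
    feeding = {source for source, targets in edges.items() if targets}
    return {s for s in structures if s not in fed and s not in feeding}


def impacted_reports(flagged, edges):
    """Reports sourced, directly or indirectly, from a flagged structure.

    Assumes reports are identifiable by a 'report.' prefix (an illustrative
    convention, not a standard).
    """
    return sorted(n for n in downstream(flagged, edges) if n.startswith("report."))


if __name__ == "__main__":
    print(sorted(downstream("crm.customers", LINEAGE)))
    print(orphans(ALL_STRUCTURES, LINEAGE))
    print(impacted_reports("erp.orders", LINEAGE))
```

The point of the sketch is scale: each of these queries is only as complete as the graph behind it, which is why lineage limited to CDEs answers them for only a fraction of your data.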
What are some key considerations in building a Data Lineage Strategy?
1) Goals/Objectives – short term, intermediate, and long term for the program. These should include priority use cases for lineage to help drive scoping, a proposed timeline for delivering initial value, and a rollout plan for at least the first year.
2) Who is responsible and who will supply the resources for the program – who will own the strategy and the execution? A data lineage program should have executive sponsorship and be part of a department or function that can work across the organization, influencing and, if necessary, enforcing compliance with the strategy. The strategy should be resourced to succeed, in both personnel and tooling.
3) Who will implement the plan? Implementing a lineage strategy will require dedicated technical resources as well as part-time resources from a variety of other areas – networking, server admins, DBAs, and others.
4) Initial scope and priority for expansion, including time schedule / phases. Who will provide the platforms in scope for the initial phases? The idea is to work through manageable chunks rather than trying to boil the ocean.
5) Who will use the system, what are their requirements, and how will they be enabled? If you want business users to use lineage, it will need business context, so your tooling will need to support that. All users will need training, so have a plan not only to roll out the lineage but also to roll out the training as lineage becomes available to different areas.
6) What is the plan for building the automated capture and maintenance of lineage into your normal processes, and how will you enforce compliance? Factors to consider include:
- Rate of change to systems – how often you must run maintenance rescans.
- Time frame available for rescans – batch window, no restriction, etc.
- Ability to be notified of changes or harvest changed items for rescanning.
- Will you / can you do periodic full rescans?
7) What integrations are required / desired? Data Catalog, Data Governance Processes, etc.
8) Lineage tooling requirements – a few to consider, based on the items above:
- Automated extraction and maintenance.
- Ability to capture custom objects and lineage.
- Incremental scanning to handle the volume that will ultimately be scanned or maintained.
- Ease of integration with existing tools.
- Ease of use.
9) How will you measure and communicate the benefits and cost savings associated with your lineage strategy? Resources are finite; without a way to quantify and expose the benefit of your lineage program to leadership, you leave yourself vulnerable to being seen as a nice-to-have or a good place to cut during budget discussions. This is why scope is important – scope the implementation to give you an initial “sprint to value”, then scope follow-on phases so they each build on that initial value. Don’t be afraid to toot your own horn – share the savings with leadership and the organization at large, and make sure they know the difference you are making!
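The maintenance questions under item 6 (rate of change, available time frame, change notification, full rescans) can be captured as a simple per-system rescan plan. The sketch below is one hypothetical way to encode them; the `SourceSystem` fields, the threshold of 10 changes per week, and the daily/weekly cadences are all assumptions made for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class SourceSystem:
    name: str
    changes_per_week: int        # observed rate of change (illustrative figure)
    supports_change_feed: bool   # can changed items be harvested for rescanning?
    batch_window_only: bool      # must scans run inside a batch window?


def plan_rescan(system, high_change_threshold=10):
    """Return a rescan plan for one system.

    The threshold and cadences are placeholder assumptions; tune them to
    your own environment.
    """
    # Incremental scans are only possible when changes can be identified.
    mode = "incremental" if system.supports_change_feed else "full"
    # Fast-changing systems are rescanned more often.
    frequency = "daily" if system.changes_per_week >= high_change_threshold else "weekly"
    window = "batch window" if system.batch_window_only else "any time"
    return {
        "system": system.name,
        "mode": mode,
        "frequency": frequency,
        "window": window,
    }


if __name__ == "__main__":
    systems = [
        SourceSystem("warehouse", 25, True, True),
        SourceSystem("legacy_erp", 2, False, False),
    ]
    for system in systems:
        print(plan_rescan(system))
```

Writing the plan down per system, even this crudely, makes it easier to see where tooling requirements such as incremental scanning actually matter.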
A good, well-executed strategy will keep you on track and help you show value over the long term, giving you the enterprise lineage and metadata that help make your organization not only data driven but data savvy.