
Self-Service Data Quality for DataOps

At the recent A-Team Data Management Summit Virtual, Datactics CEO Stuart Harvey delivered a keynote on "Self-Service Data Quality for DataOps: Why it's the next big thing in financial services." The keynote can be read below, with slides from the keynote included for reference. Should you wish to discuss the subject with us, please don't hesitate to contact Stuart or Kieran Seaward, Head of Sales.

I started work in banking in the 90s as a programmer, developing real-time software systems written in C++. In those good old days, I'd be given a specification, I'd write some code, then test and document it. After a few weeks it would be deployed on the trading floor. If my software broke or the requirements changed, it would come back to me and I'd start the process all over again. This 'waterfall' approach was slow and, if I'm honest, apart from the professional pride of not wanting to create buggy code, I didn't feel a lot of ownership for what I'd created.

In the last five years a new methodology in software engineering has changed all that: it's called DevOps, and it brings a very strategic and agile approach to building new software.

More recently, DevOps has had a baby sister called DataOps, and it's this subject that I'd like to talk about today.

Many Chief Data Officers (CDOs) and analysts have been impressed by the increased productivity and agility their Chief Technology Officer (CTO) colleagues are seeing through the use of DevOps. Now they'd like to get in on the act. In the last few months at Datactics we've been talking a lot with CDO clients about their desire for a more agile approach to data governance and how DataOps fits into this picture.

In these conversations we've talked a great deal about the ownership of data. A key question is how to associate the measurement and fixing of a piece of broken data with the person most closely responsible for it. In our experience, the owner of a piece of data usually makes the best data steward. These are the people who can positively affect business outcomes through accurate measurement and monitoring of data, work that typically falls under the CDO's remit.

We have seen a strong desire to push data science processes, including data governance and the measurement of actual data quality (at a record level), into the processes and automation that exist in a bank.

I'd like to share some simple examples of what we are doing with our investment banking and wealth management clients. I hope they show that a self-service approach to data quality (with appropriate tooling) can enable highly agile data quality measurement for any company wishing to implement the standard DataOps processes of validation, sorting, aggregation, reporting and reconciliation.

Roles in DataOps and Data Quality

We work closely with the people who use the Datactics platform: the people who are responsible for the governance of data and for reporting on its quality. They have titles like Chief Data Officer, Data Quality Manager, Chief Digital Officer and Head of Regulation. These data consumers are responsible for large volumes of often messy data relating to entities, counterparties, financial reference data and transactions. This data does not reside in just one place; it transitions through multiple bank processes. It is sometimes "at rest" in a data store and sometimes "in motion" as it passes via Extract, Transform, Load (ETL) processes to other systems that live downstream of the point at which it was sourced.

For example, a bank might download counterparty information from Companies House to populate its Legal Entity Master. This data is then published to multiple consuming applications for Know Your Customer (KYC), Anti-Money Laundering (AML) and Life Cycle Management. In these systems the counterparty records are augmented with information such as a Legal Entity Identifier (LEI), a Bank Identifier Code (BIC) or a ticker symbol.
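The hand-off from the master to its consuming systems is a natural place for the reconciliation step mentioned earlier. Below is a minimal sketch. It assumes each system can be exported as a dictionary keyed by a shared entity identifier, such as a Companies House company number. The function name and record shape are hypothetical and are not a product API.

```python
from typing import Any, Dict, List, Tuple

Mismatch = Tuple[str, str, Any, Any]  # (entity key, field, master value, downstream value)

def reconcile(
    master: Dict[str, Dict[str, Any]],
    downstream: Dict[str, Dict[str, Any]],
    fields: List[str],
) -> Tuple[List[str], List[str], List[Mismatch]]:
    """Reconcile a downstream copy (e.g. a KYC system) against the Legal Entity Master.

    Only the listed fields are compared, so attributes the consuming system
    adds itself (LEI, BIC, ticker) are not reported as breaks.
    """
    missing = sorted(master.keys() - downstream.keys())      # never arrived downstream
    unexpected = sorted(downstream.keys() - master.keys())   # no matching master record
    mismatches: List[Mismatch] = []
    for key in sorted(master.keys() & downstream.keys()):
        for f in fields:
            m_val, d_val = master[key].get(f), downstream[key].get(f)
            if m_val != d_val:
                mismatches.append((key, f, m_val, d_val))
    return missing, unexpected, mismatches
```

The comparison here is exact, so "Acme Ltd" and "ACME LTD" count as a break. Whether to normalise names before comparing is a design choice that the data owner would make.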

This ability to empower subject matter experts and business users who are not programmers to measure data at rest and in motion has led to the following trends:

  • Ownership: Data quality management moves from being the sole responsibility of a potentially remote data steward to all of those who produce and change data, encouraging a data-driven culture.
  • Federation: Data quality becomes everyone's job. Consider end-of-day pricing at a bank. The team that owns the securities master will want to test the accuracy and completeness of data arriving from a vendor. The analyst working downstream, who takes an end-of-day price from the securities master to calculate a volume-weighted average price (VWAP), will have different checks relating to the timeliness of information. Finally, the data scientist further downstream, who uses the VWAP to create predictive analytics, will want to build their own rules to validate data quality.
  • Governance: A final trend we are seeing is tighter integration with standard governance tools. To be effective, self-service data quality and DataOps require tight integration with the existing systems that hold data dictionaries, metadata and lineage information.
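To make the federation point concrete, the sketch below shows how two of the parties might each own a different rule over the same end-of-day prices. The data scientist's rules would be added in the same way. It assumes a multi-day VWAP computed from daily closing prices and volumes, using VWAP = Σ(price × volume) / Σ volume. The record layout and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence

@dataclass
class EodPrice:
    # Hypothetical end-of-day row as published by the securities master.
    isin: str
    as_of: date
    price: Optional[float]
    volume: Optional[float]

# Check owned by the securities master team: is the vendor feed complete?
def completeness_failures(rows: Sequence[EodPrice]) -> List[EodPrice]:
    return [r for r in rows if r.price is None or r.volume is None]

# Check owned by the downstream analyst: is the latest row for the business date?
def is_timely(rows: Sequence[EodPrice], business_date: date) -> bool:
    return bool(rows) and max(r.as_of for r in rows) == business_date

# The analyst's calculation: sum(price_i * volume_i) / sum(volume_i)
def vwap(rows: Sequence[EodPrice]) -> float:
    if completeness_failures(rows):
        raise ValueError("cannot compute VWAP over incomplete rows")
    total_volume = sum(r.volume for r in rows)
    if total_volume == 0:
        raise ValueError("VWAP is undefined when total volume is zero")
    return sum(r.price * r.volume for r in rows) / total_volume
```

Keeping each rule as a separate function means each team can own, change and test its own check without touching the others.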

Here's an illustration of how we see the Datactics Self-Service Data Quality (SSDQ) platform integrating with DataOps in a high-impact way that you might want to consider in your own data strategy.

1. Data Governance Team

First off, we offer a set of pre-built dashboards for Power BI, Tableau and Qlik that give your data stewards rapid access to the data quality measurements that relate just to them. A user in the London office might be enabled to see data for Europe or, perhaps, just data in their department. Within a few clicks, a data steward for the Legal Entity Master system could identify all records that breach an accuracy check, where an LEI is incorrect, or a timeliness check, where the LEI has not been revalidated in the Global Legal Entity Identifier Foundation (GLEIF) database within 12 months.
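Both checks are simple to express at record level. The sketch below makes three assumptions: each Legal Entity Master record carries hypothetical `id`, `lei` and `last_validated` fields; "12 months" is approximated as 365 days; and the accuracy test is limited to format and check digits. The check-digit test follows ISO 17442, which uses the ISO 7064 MOD 97-10 scheme. A live lookup against GLEIF is deliberately not shown.

```python
import string
from datetime import date
from typing import Dict, List, Tuple

# ISO 7064 MOD 97-10 maps 0-9 to themselves and A-Z to 10-35.
_CHAR_VALUES = {c: str(i) for i, c in enumerate(string.digits + string.ascii_uppercase)}

def lei_is_accurate(lei: str) -> bool:
    """Accuracy check: 20 characters, numeric check digits, value mod 97 == 1."""
    if len(lei) != 20 or any(c not in _CHAR_VALUES for c in lei):
        return False
    if any(c not in string.digits for c in lei[-2:]):
        return False
    numeric = "".join(_CHAR_VALUES[c] for c in lei)
    return int(numeric) % 97 == 1

def lei_is_timely(last_validated: date, as_of: date, max_age_days: int = 365) -> bool:
    """Timeliness check: revalidated within roughly 12 months (assumed 365 days)."""
    return (as_of - last_validated).days <= max_age_days

def lei_breaches(records: List[Dict], as_of: date) -> List[Tuple[str, str]]:
    """List (record id, failed check) pairs, as a steward's dashboard view might."""
    breaches = []
    for rec in records:
        if not lei_is_accurate(rec["lei"]):
            breaches.append((rec["id"], "accuracy"))
        if not lei_is_timely(rec["last_validated"], as_of):
            breaches.append((rec["id"], "timeliness"))
    return breaches
```

Passing the check-digit test only shows that an LEI is well formed. Confirming that it is actually registered and still current requires the GLEIF data itself.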


2. Data Quality Clinic: Data Remediation

Data Quality Clinic extends the management dashboard by allowing a bank to return broken data to its owner for fixing. It effectively quarantines broken records and passes them to the data engineer in a queue, improving data pipelines and overall data governance and data quality. Clinic runs in a web browser and is tightly integrated with information relating to data dictionaries, lineage and third-party sources for validation. To extend the LEI example, I might be the owner of a set of entities that have failed an LEI check. Clinic would show me the records in question and highlight the fields in error. It would connect to GLEIF as the source of truth for LEIs and give me hints on what to correct. As you'd expect, this entity resolution process can be enhanced by machine learning, automating it under human supervision.
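The quarantine-and-return pattern can be sketched as a triage step. Records that pass continue down the pipeline, while failures are parked in a queue for whoever owns them. This is an illustrative assumption about how such routing could work, not a description of how Clinic is implemented. The `owner_of` callback and the record shape are hypothetical, and the GLEIF hint lookup is omitted.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Iterable, List, Tuple

Record = dict
Rule = Callable[[Record], List[str]]  # returns the names of fields in error

@dataclass
class QuarantinedRecord:
    record: Record
    failed_fields: List[str]  # the fields a remediation screen would highlight

def triage(
    records: Iterable[Record],
    rule: Rule,
    owner_of: Callable[[Record], str],
) -> Tuple[List[Record], Dict[str, Deque[QuarantinedRecord]]]:
    """Let clean records flow on; hold broken ones in a queue per data owner."""
    clean: List[Record] = []
    queues: Dict[str, Deque[QuarantinedRecord]] = defaultdict(deque)
    for rec in records:
        failed = rule(rec)
        if failed:
            queues[owner_of(rec)].append(QuarantinedRecord(rec, failed))
        else:
            clean.append(rec)
    return clean, queues
```

Returning the failed field names, rather than a bare pass or fail, is what allows the owner to be shown exactly which fields need fixing.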


3. FlowDesigner Studio: Rule creation, documentation, sharing

FlowDesigner is the rules studio in which the data governance team of super users build, manage, document and source-control rules for the profiling, cleansing and matching of enterprise data. We like to share these rules across our clients, so FlowDesigner comes pre-loaded with rules for everything from name and address checking to CUSIP or ISIN validation.
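To show what pre-built identifier rules of this kind typically check, here is a sketch of the standard check-digit tests. An ISIN (ISO 6166) converts letters to numbers (A=10 to Z=35) and applies the Luhn algorithm. A CUSIP uses a weighted digit sum over its first eight characters. This is a generic illustration in Python, not FlowDesigner's rule syntax.

```python
import string

# Character values shared by both schemes: 0-9 -> 0-9, A-Z -> 10-35.
_VALUES = {c: i for i, c in enumerate(string.digits + string.ascii_uppercase)}

def _luhn_ok(digits: str) -> bool:
    """Luhn test over a digit string whose last digit is the check digit."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def isin_is_valid(isin: str) -> bool:
    """ISO 6166: 2-letter country code, 9 alphanumerics, 1 numeric check digit."""
    if len(isin) != 12 or any(c not in _VALUES for c in isin):
        return False
    if not isin[:2].isalpha() or isin[-1] not in string.digits:
        return False
    expanded = "".join(str(_VALUES[c]) for c in isin)  # letters become two digits
    return _luhn_ok(expanded)

def cusip_check_digit(first8: str) -> int:
    """CUSIP check digit: double even positions, then sum the digits of each value."""
    extra = {"*": 36, "@": 37, "#": 38}
    total = 0
    for pos, ch in enumerate(first8, start=1):
        v = _VALUES[ch] if ch in _VALUES else extra[ch]
        if pos % 2 == 0:
            v *= 2
        total += v // 10 + v % 10
    return (10 - total % 10) % 10

def cusip_is_valid(cusip: str) -> bool:
    if len(cusip) != 9 or cusip[-1] not in string.digits:
        return False
    if any(c not in _VALUES and c not in "*@#" for c in cusip[:8]):
        return False
    return cusip_check_digit(cusip[:8]) == int(cusip[-1])
```

For example, `isin_is_valid("US0378331005")` and `cusip_is_valid("037833100")` both return `True`. The ISIN embeds the CUSIP, which is why the two checks often travel together.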


4. Data Quality Manager: Connecting to data sources, scheduling and automating solutions

This part of the Datactics platform allows your technology team to connect to data flowing from multiple sources and to schedule how rules are applied to data at rest and in motion. It allows for the sharing and re-use of rules across all parts of your business. We have many clients solving big data problems involving hundreds of millions of records using Data Quality Manager across multiple environments and data sources, on-premise or in public (or, more typically, private) cloud.
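The idea of writing a rule once and applying it both to stored data and to data flowing through a pipeline can be sketched in a few lines. The registry, rule names and record shape are hypothetical. Scheduling, for example by cron or a workflow orchestrator, is left out.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = dict
Rule = Callable[[Record], bool]  # True means the record passes

def name_present(r: Record) -> bool:
    return bool((r.get("name") or "").strip())

def country_is_iso2(r: Record) -> bool:
    c = r.get("country") or ""
    return len(c) == 2 and c.isascii() and c.isalpha() and c.isupper()

# Hypothetical shared registry: each rule is written once and reused everywhere.
RULES: Dict[str, Rule] = {"name_present": name_present, "country_is_iso2": country_is_iso2}

def failed_rules(record: Record, rule_names: List[str]) -> List[str]:
    return [n for n in rule_names if not RULES[n](record)]

def profile_at_rest(batch: Iterable[Record], rule_names: List[str]) -> Dict[str, int]:
    """Scheduled batch job over a data store: failure counts per rule."""
    counts = {n: 0 for n in rule_names}
    for rec in batch:
        for n in failed_rules(rec, rule_names):
            counts[n] += 1
    return counts

def check_in_motion(
    stream: Iterable[Record], rule_names: List[str]
) -> Iterator[Tuple[Record, List[str]]]:
    """Pipeline stage: annotate each record as it passes, without buffering the stream."""
    for rec in stream:
        yield rec, failed_rules(rec, rule_names)
```

Making the in-motion stage a generator means it never holds the whole stream in memory, which matters at the volumes described above.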


Summary: Self-Service Data Quality for DataOps

Thanks for joining me today as I've outlined how self-service data quality is a key part of successful DataOps. CDOs need real-time data quality insights to keep up with business needs, while technical architects require a platform that doesn't need a huge programming team to support it. If you have any questions about this topic, or how we've approached it, we'd be glad to talk with you. Please get in touch.
