Machine Learning Archives - ��Ĵ�ý

The Importance of Data Quality in Machine Learning

Fiona Browne — Mon, 18 Dec 2023 12:40:03 +0000

We are currently in an exciting area and time, where Machine Learning (ML) is applied across sectors from self-driving cars to personalised medicine. Although ML models have been around for a while – for example, algorithmic trading models since the 1980s and Bayesian methods since the 1700s – we are still in the nascent stages of productionising ML.

From a technical viewpoint, this is ‘Machine Learning Ops’ or MLOps. MLOps involves figuring out how to build models, deploy them via continuous integration and deployment, and track and monitor models and data in production.

From a human, risk, and regulatory viewpoint we are grappling with big questions about ethical AI (Artificial Intelligence) systems and where and how they should be used. Areas including risk, privacy and security of data, accountability, fairness, adversarial AI, and what this means, all come into play in this topic. Additionally, the debate over supervised machine learning, semi-supervised learning, and unsupervised machine learning, brings further complexity to the mix.

Much of the focus is on the models themselves. Everyone can get their hands on pre-trained models or licensed APIs; what differentiates a good deployment is the data quality.

However, the one common theme that underpins all this work is the rigour required in developing production-level systems, and especially the data necessary to ensure they are reliable, accurate, and trustworthy. This is especially important for ML systems, given the role that data and processes play and the impact of poor-quality data on ML algorithms and learning models in the real world.

Data as a common theme

If we shift our gaze from the model side to the data side, including:

Data management – what processes do I have to manage data end to end, especially generating accurate training data?
Data integrity – how am I ensuring I have high-quality data throughout?
Data cleansing and improvement – what am I doing to prevent bad data from reaching data scientists?
Dataset labeling – how am I avoiding the risk of unlabeled data?
Data preparation – what steps am I taking to ensure my data is data science-ready?

A far greater understanding of performance and model impact (consequences) could be achieved. However, this is often viewed as less glamorous or exciting work and, as such, is often undervalued. For example, what is the impetus for companies or individuals to invest at this level (such as regulatory – e.g. BCBS, financial, reputational, law)?

Yet, as well defined in

“Data largely determines performance, fairness, robustness, safety, and scalability of AI systems…[yet] In practice, most organizations fail to create or meet any data quality standards, from under-valuing data work vis-a-vis model development.”

This has a direct impact on people’s lives and society, where “…data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact, impacting predictions like cancer detection, wildlife poaching, and loan allocations”.

What this looks like in practice

We have seen this in the past, with the exam grading algorithm used in the UK during Covid. In this case, teachers predicted the grades of their students, then the Office of Qualifications and Examinations Regulation applied an algorithm to these predictions to correct for any potential grade inflation. This algorithm was quite complex and non-transparent in the first instance. When the results were released, 39% of grades were downgraded. The algorithm captured the distribution of grades from previous years, the predicted distribution of grades for past students, and then the current year.

In practice, this meant that if you were a candidate who had performed well at GCSE, but attended a historically poor-performing school, then it was challenging to achieve a top grade. Teachers had to rank their students in the class, resulting in a relative ranking system that could not equate to absolute performance. It meant that even if you were predicted a B, if you were ranked fifteenth out of 30 in your class and the pupils ranked fifteenth over the last three years had received a C, you would likely get a C.
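The ranking effect described above can be illustrated with a deliberately simplified sketch. This is not the regulator's actual model. The function name `grade_from_rank` and the made-up historical cohort are assumptions for illustration only. It shows how a grade derived purely from class rank and a school's past results can override a teacher's prediction.

```python
def grade_from_rank(rank, class_size, historical_grades):
    """Map a pupil's class rank onto a past cohort's grades.

    historical_grades is a hypothetical list of grades from earlier years,
    ordered from the best-ranked pupil to the lowest-ranked pupil.
    """
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    index = (rank - 1) * len(historical_grades) // class_size
    index = min(index, len(historical_grades) - 1)
    return historical_grades[index]


# Hypothetical past results for a class of 30: 5 As, 8 Bs, 10 Cs, 7 Ds.
historical = ["A"] * 5 + ["B"] * 8 + ["C"] * 10 + ["D"] * 7

teacher_prediction = "B"
awarded = grade_from_rank(rank=15, class_size=30, historical_grades=historical)
print(f"Predicted {teacher_prediction}, awarded {awarded}")
```

In this toy setup the pupil's own predicted grade never enters the calculation, which is the core of the fairness problem the post describes.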

The application of this algorithm caused an uproar. Not least because schools with small class sizes – usually private, or fee-paying schools – were exempt from the algorithm, resulting in the use of the teacher-predicted grades. Additionally, it baked in past socioeconomic biases, benefitting underperforming students in affluent (and previously high-scoring) areas while suppressing the capabilities of high-performing students in lower-income regions.

A major lesson to learn from this, therefore, was transparency in the process and the data that was used.

An example from healthcare

Within the world of healthcare, data quality had an impact on ML cancer prediction with IBM’s ‘Watson for Oncology’, which partnered with The University of Texas MD Anderson Cancer Center in 2013 to “uncover valuable insights from the cancer center’s rich patient and research databases”. The system was trained on a small number of hypothetical cancer patients, rather than real patient data. This resulted in erroneous and dangerous cancer treatment advice.

Significant questions that must be asked include:

Where did it go wrong here – certainly the data but in general a wider AI system?
Where was the risk assessment?
What testing was performed?
Where did responsibility and accountability reside?

Machine Learning practitioners know well the statistic that 80% of ML work is data preparation. Why then don’t we focus on this 80% effort and deploy a more systematic approach to ensure data quality is embedded in our systems, and considered important work to be performed by an ML team?

This is a view recently articulated by Andrew Ng, who urges the ML community to be more data-centric and less model-centric. In fact, Andrew was able to demonstrate this using a steel sheet defect detection use case in which a deep learning computer vision model achieved a baseline performance of 76.2% accuracy. By addressing inconsistencies in the training dataset and correcting noisy or conflicting dataset labels, the classification performance reached 93.1%. Interestingly and compellingly from the perspective of this blog post, minimal performance gains were achieved by addressing the model side alone.

Our view is that if data quality is a key limiting factor in ML performance, then let’s focus our efforts on improving data quality, and ask whether ML can be deployed to address this. This is the central theme of the work the ML team at ��Ĵ�ý undertakes. Our focus is automating the manual, repetitive (often referred to as boring!) business processes of DQ and matching tasks, while embedding subject matter expertise into the process. To do this, most of our solutions employ a human-in-the-loop approach where we capture human decisions and expertise and use this to inform and re-train our models. Having this human expertise is essential in guiding the process and providing context that improves the data and the data quality process. We are keen to free up clients from manual, mundane tasks and instead use their expertise on tricky cases with simpler agree/disagree options.

To learn more about an AI-driven approach to Data Quality, read our press release about our Augmented Data Quality platform here.

The post The Importance of Data Quality in Machine Learning appeared first on ��Ĵ�ý.

How to test your data against Benford’s Law

Matt Neill — Tue, 09 May 2023 16:04:04 +0000

One of the most important aspects of data quality is being able to identify anomalies within your data. There are many ways to approach this, one of which is to test the data against Benford’s Law. This blog will take a look at what Benford’s Law is, how it can be used to detect fraud, and how the ��Ĵ�ý platform can be used to achieve this.

What is Benford’s Law?

Benford’s law is named after a physicist called Frank Benford and was first discovered in the 1880s by an astronomer named Simon Newcomb. Newcomb was looking through logarithm tables (used before pocket calculators were invented to find the value of the logarithms of numbers), when he spotted that the pages which started with earlier digits, like 1, were significantly more worn than other pages.

Given a large set of numerical data, Benford’s Law asserts that the first digit of these numbers is more likely to be small. If the data follows Benford’s Law, then approximately 30% of the time the first digit would be a 1, whilst 9 would only be the first digit around 5% of the time. If the distribution of the first digit was uniform, then they would all occur equally often (around 11% of the time). It also proposes a distribution of the second digit, third digit, combinations of digits, and so on. According to Benford’s Law, the probability that the first digit in a dataset is d is given by P(d) = log10(1 + 1/d).

Why is it useful?

There are plenty of data sets that have proven to have followed Benford’s Law, including stock prices, population numbers, and electricity bills. Due to the large availability of data known to follow Benford’s Law, checking a data set to see if it follows Benford’s Law can be a good indicator as to whether the data has been manipulated. While this is not definitive proof that the data is erroneous or fraudulent, it can provide a good indication of problematic trends in your data.
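As a quick check of the figures above, the formula P(d) = log10(1 + 1/d) can be evaluated for each leading digit. The sketch below uses only the Python standard library; the function name is an illustrative choice, not part of any product.

```python
import math


def benford_first_digit_probabilities():
    """Return {d: P(d)} for d = 1..9, where P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_first_digit_probabilities()
for digit, p in probs.items():
    print(f"{digit}: {p:.1%}")
```

The nine probabilities sum to 1, with a leading 1 expected roughly 30% of the time and a leading 9 a little under 5% of the time.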

In the context of fraud, Benford’s law can be used to detect anomalies and irregularities in financial data, for example within large datasets such as invoices, sales records, expense reports, and other financial statements. If the data has been fabricated, then the person tampering with it would probably have done so “randomly”. This means the first digits would be closer to uniformly distributed and thus would not follow Benford’s Law.

Below are some real-world examples where Benford’s Law has been applied:

Detecting fraud in financial accounts – Benford’s Law can be useful in its application to many different types of fraud, including money laundering and large financial accounts. Many years after Greece joined the eurozone, the economic data they provided to the E.U.

Detecting election fraud – Benford’s Law was used as evidence of fraud in the 2009 Iranian elections and was also used for auditing data from the 2009 German federal elections. Benford’s Law has also been used in multiple US presidential elections.

Analysis of price digits – When the euro was introduced, all the different exchange rates meant that, while the “real” price of goods stayed the same, the “nominal” price (the monetary value) of goods was distorted. Research carried out across Europe showed that the first digits of nominal prices followed Benford’s Law. However, deviation from this occurred for the second and third digits. Here, trends more commonly associated with psychological pricing could be observed. Larger digits (especially 9) are more commonly found due to the fact that prices such as £1.99 have been shown to be more associated with spending £1 rather than £2.

How can ��Ĵ�ý’ tools be used to test for Benford’s Law?

Using the ��Ĵ�ý platform, we can very easily test any dataset against Benford’s Law. Take this dataset of financial transactions (shown below). We’re going to be testing the “pmt_amt” column to see if it follows Benford’s Law for first digits. It spans several orders of magnitude, ranging from a few dollars to 15 million, which means that Benford’s Law is more likely to apply to it.

The first step of the test is to extract the first digit of the column for analysis. This can very easily be done using a small FlowDesigner project (shown below).

Here we import the dataset and then filter out any values that are less than 1, as these aren’t relevant to our analysis. Then, we extract the first digit. Once that’s been completed, we can profile these digits to find out how many times each occurs and then save the results.
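The same three steps (filter out values below 1, extract the first digit, profile the counts) can be sketched outside the platform in plain Python. This is not the FlowDesigner project itself. The sample `pmt_amt` values and the function names are assumptions made up for illustration.

```python
from collections import Counter


def first_digit(value):
    """Return the leading digit of a number that is at least 1."""
    # Work on the integer part so that 1999.99 -> 1 and 42.5 -> 4.
    return int(str(int(value))[0])


def profile_first_digits(values):
    """Count how often each digit 1..9 leads the values that are >= 1."""
    kept = [v for v in values if v >= 1]
    counts = Counter(first_digit(v) for v in kept)
    return {d: counts.get(d, 0) for d in range(1, 10)}


# Hypothetical payment amounts standing in for the "pmt_amt" column.
pmt_amt = [3.50, 0.75, 120.00, 1999.99, 15_000_000, 18.20, 0.10, 27.00]
digit_counts = profile_first_digits(pmt_amt)
print(digit_counts)
```

Returning a count for every digit from 1 to 9, including zeros, keeps the profile in a fixed shape that a later statistical test can rely on.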

The next step would be to perform a statistical test to see how confident we can be that Benford’s Law applies here. We can use our Data Quality Manager tool to architect the whole process.

Step one runs our FlowDesigner project, the second executes a simple Python script to perform the test, and the last two steps set up an automated email alert to let the user know if the data failed the test at a specified threshold. While I’m using an email alert here, any issue-tracking platform, such as Jira, can be used. We can also show the results in a dashboard, like the one below.

The graph on the left shows two lines: the green line represents the distribution we would expect the digits to follow if they obeyed Benford’s Law, and the red line shows the actual distribution of the digits. The bottom right table shows the two distributions and the top right table shows the result of the test. In this case, the test reports 100% confidence that the data follows Benford’s Law; in other words, it found no evidence of deviation at the chosen threshold.
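The post does not say which statistical test the Python script runs. One common choice for comparing observed and expected digit counts is Pearson's chi-square goodness-of-fit test, sketched here with the standard library only. The function names, the 5% significance level and the sample counts are all assumptions for illustration.

```python
import math

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CHI2_CRITICAL_DF8_ALPHA_05 = 15.507


def benford_chi_square(digit_counts):
    """Pearson chi-square statistic of first-digit counts against Benford's Law."""
    total = sum(digit_counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no observations to test")
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        observed = digit_counts.get(d, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


def fails_benford(digit_counts, critical_value=CHI2_CRITICAL_DF8_ALPHA_05):
    """True when the deviation from Benford's Law exceeds the threshold."""
    return benford_chi_square(digit_counts) > critical_value


# Hypothetical counts: one set close to Benford's Law, one uniform.
benford_like = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
uniform = {d: 111 for d in range(1, 10)}

print(fails_benford(benford_like), fails_benford(uniform))
```

A result from `fails_benford` could then drive the alerting step. Note that passing the test only means no significant deviation was detected, and the chi-square test is sensitive to sample size, so the threshold is a judgment call.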

In conclusion…

The law that bears physicist Frank Benford’s name is as beneficial today as ever. Benford’s Law is a powerful tool for detecting fraud and other irregularities in large datasets. By combining statistical analysis with expert knowledge and AI-enabled technologies, organizations can improve their ability to detect and prevent fraudulent activities, thus safeguarding their financial health and reputation.

Matt Neill is a Machine Learning Engineer at ��Ĵ�ý.


AI Ethics: The Next Generation of Data Scientists

Matt Flenley — Mon, 04 Apr 2022 12:54:50 +0000

In March 2022, ��Ĵ�ý took advantage of the offer to visit a local secondary school and the next generation of Data Scientists to discuss AI Ethics and Machine Learning in production. Matt Flenley shares more from the first of these two visits in his latest blog below…

Students from Wallace High School meet Dr Fiona Browne (centre) and Matt Flenley (right)

AI Ethics is often the poster child of the modern discourse on when the inevitable machine-led apocalypse will occur. Yet, as we look around at wars in Ukraine and Yemen, record water shortages in the developing world, and the ongoing struggle for the education of girls in Afghanistan, it becomes readily apparent that, as in all things, ethics starts with humans.

This was the main thrust of the discussion with the students at Wallace High School in Lisburn, NI. As Dr Fiona Browne, Head of AI and Software Development, talked the class of second-year A-Level students through data classification for training machine learning models, the question of ‘bad actors’ came up. What if, theorised Dr Browne, people can’t be trusted to label a dataset correctly, and the machine learning model learns things that aren’t true?

At this stage, a tentative hand slowly raised in the classroom; one student confessed that, in fact, they had done exactly this in a recent dataset labelling exercise in class. It was the perfect opportunity to detail in a practical way how much human involvement matters in Artificial Intelligence, Machine Learning, and especially in the quality of the data underpinning both.

Humans behind the machines, and baked-in bias

As is common, the exciting part of technology is often the technology itself. What can it do? How fast can it go? Where can it take me? This applies just as much to the everyday, from home electronics through to transportation, as it does to the cutting edge of space exploration or genome mapping. However, the thought processes behind the technology, imagined up by humans, specified and scoped by humans, create the very circumstances for how those technologies will behave and interact with the world around us.

In her promotion for the book Invisible Women, the author Caroline Criado-Perez writes,

“Imagine a world where your phone is too big for your hand, where your doctor prescribes a drug that is wrong for your body, where in a car accident you are 47% more likely to be seriously injured, where every week the countless hours of work you do are not recognised or valued. If any of this sounds familiar, chances are that you’re a woman.”
Caroline Criado-Perez, Invisible Women

One example is of the comparatively high rate of anterior cruciate ligament injuries among female soccer players. While some of this can be attributed to different anatomies, it is in part caused by the lack of female-specific footwear in the sport (with most brands choosing to offer smaller sizes rather than tailored designs). Yet the anatomical design of the female knee in particular is substantially different to that of males. Has this human-led decision, to simply offer small sizes, taken into account the needs of the buyer, or the market? Has it been made from the point of view of creating a fairer society?

The ��Ĵ�ý team (L to R: Matt Flenley, Shauna Leonard, Edele Copeland) meet GCSE students from the Wallace High School as part of a talk on Women in Technology Careers

If an algorithm was therefore applied to specify a female-specific football boot from the patterns and measurements of existing footwear on the market today, would it result in a different outcome? No, of course not. It takes humans to look at the world around us, detect the risk of bias, and then act to address it.

It is the same in computing. The product, in this case the machine learning model or AI algorithm, is going to be no better than the work that has gone into defining and explaining it. A core part of this is understanding what data to use, and of what quality the data should be.

Data Quality for Machine Learning – just a matter of good data?

Data quality in a business application sense is relatively simple to define. Typically a business unit has requirements, usually around how complete the data is and to what extent the data in it is unique (there are a wide range of additional data quality dimensions, which you can read about here). For AI and Machine Learning, however, data quality is a completely different animal. On top of the usual dimensions, the data scientist or ML engineer needs to consider if they have all the data they need to create unbiased, explainable outcomes. Put simply, if a decision has been made, then the data scientists need to be able to explain why and how this outcome was reached. This is particularly important as ML becomes part and parcel of everyday life. Turned down for credit? Chances are an algorithm has assessed a range of data sources and generated a ‘no’ decision – and if you’re the firm whose system has made that decision, you’re going to need to explain why (it’s the law!).
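The two business-side dimensions mentioned above, completeness and uniqueness, are straightforward to measure for a single column. The sketch below is a minimal illustration using the standard library. The function names and sample values are assumptions, and real data quality tooling would cover many more dimensions.

```python
def _present(values):
    """Keep entries that are neither None nor blank strings."""
    return [v for v in values if v is not None and str(v).strip() != ""]


def completeness(values):
    """Share of entries in the column that are populated."""
    if not values:
        return 0.0
    return len(_present(values)) / len(values)


def uniqueness(values):
    """Share of populated entries that are distinct."""
    present = _present(values)
    if not present:
        return 0.0
    return len(set(present)) / len(present)


# Hypothetical column with one blank, one missing and one duplicate entry.
emails = ["a@example.com", "b@example.com", "", None, "a@example.com"]
print(f"completeness={completeness(emails):.0%}, uniqueness={uniqueness(emails):.0%}")
```

Scores like these say nothing about bias or explainability, which is why the ML-specific questions in the paragraph above need to be asked on top of them.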

This is the point at which we return to the class in Wallace High School. The student tentatively raising their arm would have got away with it, with the model predicting patterns incorrectly, if the student had stayed silent. There was no monitoring in place to detect which user had been the ‘bad actor’ and so the flaw would have gone undetected without the student’s confession. It was, however, the perfect way to explain to this next generation of data scientists the need to free algorithms from bias. In the five years between now and when these students are working in industry, they will need to be fully aware that every possible aspect of the society people wish to inhabit must be represented in the room when data is being classified and models are being created.

For an industry still so populated , it is clear that the decision to do something about what comes next lies where it always has: in the hearts, minds and hands of technology’s builders.


AI Con | 3 December 2021

Jamie Gordon — Wed, 24 Nov 2021 15:17:52 +0000

We are delighted to be involved with AI Con again this year. This year, our Presales Engineer and Machine Learning Engineer will be discussing Machine Learning Augmentation.

The north’s premier conference on artificial intelligence, AI Con returns to face-to-face business this year with a hybrid event on Friday 3 December.

As the adoption of AI expands into all areas of our lives, and the business and societal opportunities and challenges become ever more apparent, this ground-breaking conference addresses core issues of the technology for a range of audiences: general, business and specialist.

The event, which is now in its third year, brings together world-leading technology professionals and business leaders to examine how AI is changing our world and the opportunities and challenges that presents.

In-person attendance will take place at Titanic Belfast and will feature some of the top figures in the field, with other leading professionals streaming in from across the globe.

The themes from this year’s event, which hosted 450 attendees in its first year and 800 at the virtual event last year, include:

– Applied AI: Targeted primarily at a general audience, Applied AI looks at existing, mature technology that can be deployed today and examines case studies on where these are adding value and inspiration for people and their organisations to start their own AI investigations.

Chaired by Kathryn Harkin of Allstate NI, Rachael Bland of Kainos, and Sam Beni of Tech Nation

– Business of AI: Designed for a business audience, Business of AI looks at how AI can challenge existing business models, create entirely new ones and debates what “AI Startups” need to know in this burgeoning space.

Chaired by Alexandra Mousavizadeh of Tortoise Media and Tom Gray of Kainos.

Attendees are asked to note that strict Covid precautions will be in operation at the in-person event which will be limited to 200 people. Attendees must be double vaccinated and proof of vaccination will be required for entry.

The full programme is available online


Rules Suggestion – What is it and how can it help in the pursuit of improving data quality?

Jamie Gordon — Wed, 15 Sep 2021 09:06:21 +0000

Written by Daniel Browne, Machine Learning Engineer

Defining data quality rules and collections of rules for data quality projects is often a manual, time-consuming process. It often involves a subject matter expert reviewing data sources and designing quality rules to ensure the data complies with integrity, accuracy and/or regulatory standards. As data sources increase in volume and variety, with potential functional dependencies, the task of defining data quality rules becomes more difficult. The application of machine learning can aid with this task, from identifying dependencies between datasets, through uncovering patterns related to data quality, to suggesting previously applied rules for similar data.

At ��Ĵ�ý, we recently undertook a Rule Suggestion Project to automate the process of defining data quality rules for datasets through rule suggestions. We use natural language processing techniques to analyse the contents of a dataset and suggest rules in our rule library that best fit each column.

Problem Area and ML Solution

Generally, there are several data quality and data cleansing rules that you would typically want to apply to certain fields in a dataset. An example is a consistency check on a phone number column in a dataset such as checking that the number provided is valid and formatted correctly. Unfortunately, it is not usually as simple as searching for the phrase “phone number” in a column header and going from there. A phone number column could be labelled “mobile”, or “contact”, or “tel”, for example. Doing a string match in these cases may not uncover accurate rule suggestions. We need context embedded into this process and this is where machine learning comes in. We’ve been experimenting with building and training machine learning models to be able to categorise data, then return suggestions for useful data quality and data cleansing rules to consider applying to datasets.

Human in the Loop

The goal here is not to take away control from the user. The machine learning model isn’t going to run off with your dataset and do what it determines to be right on its own; the aim is to assist the user and to streamline the selection of rules to apply. A user will have full control to accept or reject some or all suggestions that come from the Rule Suggestion model. Users can add new rules not suggested by the model and this information is captured to improve the suggestions by the model. We hope that this will be a useful tool for users to make the process of setting up data quality and data cleansing rules quicker and easier.

Developer’s View

I’ve been involved in the development of this project from the early stages, and it’s been exciting to see it come together and take shape over the course of the project’s development. A lot of my involvement has been around building out the systems and infrastructure to help users interact with the model and to format the model’s outputs into easily understandable and useful pieces of information. This work surrounds allowing the software to take a dataset and process it such that the model can make its predictions on it, and then mapping from the model’s output to the individual rules that will then be presented to the user.

One of the major focuses we’ve had throughout the development of the project is control. We’ve been sure to build out the project with this in mind, with features such as giving users control over how cautious the model should be in making suggestions by being able to set confidence thresholds for suggestions, meaning the model will only return suggestions that meet or surpass the chosen threshold. We’ve also included the ability to add specific word-to-rule mappings that can help maintain a higher level of consistency and accuracy in results for very specific or rare categories that the model may have little or no prior knowledge of. For example, if there are proprietary fields that may have their own unique label, formatting, patterns or structures, and their own unique rules related to that, it’s possible to define a direct mapping from that to rules so that the Rule Suggestion system can produce accurate suggestions for any instances of that information in a dataset in the future.
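The two controls described above, a confidence threshold and direct word-to-rule mappings, can be sketched as a small filtering function. This is an illustrative sketch rather than the Rule Suggestion system's real interface. The function `suggest_rules`, the rule names and the scores are all hypothetical.

```python
def suggest_rules(column_name, model_scores, threshold=0.8, direct_mappings=None):
    """Return rule suggestions for one column.

    model_scores: hypothetical {rule_name: confidence} output of a model.
    direct_mappings: optional {column_label: [rule_name, ...]} overrides.
    """
    direct_mappings = direct_mappings or {}
    key = column_name.strip().lower()

    # A user-defined mapping takes precedence over the model's predictions.
    if key in direct_mappings:
        return list(direct_mappings[key])

    # Otherwise keep only suggestions that meet or surpass the threshold.
    kept = [(rule, score) for rule, score in model_scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [rule for rule, _ in kept]


scores = {"valid_phone_number": 0.93, "not_null": 0.85, "valid_email": 0.12}
mappings = {"acct_ref": ["matches_account_ref_pattern"]}

print(suggest_rules("tel", scores, threshold=0.8))
print(suggest_rules("Acct_Ref", {}, direct_mappings=mappings))
```

Raising the threshold makes the suggestions more cautious, while the mapping gives a deterministic answer for proprietary fields the model has never seen.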

Another focus of the project we hope to develop further upon is the idea of consistently improving results as the project matures. In the future we’re looking to develop a system where the model can continue to adapt based on how the suggested rules are used. Ideally, this will mean that if the model tends to incorrectly predict that a specific rule or rules will be useful for a given dataset column, it will begin to learn to avoid suggesting that rule for that column based on the fact that users tend to disagree with that suggestion. Similarly, if there are rules that the model tends to avoid suggesting for a certain column that users then manually select, the model will learn to suggest these rules in similar cases in the future.
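One simple way to picture this kind of feedback loop is to track how often each suggestion is accepted for a given type of column and to use that rate to demote or promote rules. The post describes this as future work and does not specify a mechanism, so the class below is purely an assumption: a counting sketch with Laplace smoothing, not a description of how the model is retrained.

```python
from collections import defaultdict


class SuggestionFeedback:
    """Track user accept/reject decisions per (column category, rule)."""

    def __init__(self):
        self._shown = defaultdict(int)
        self._accepted = defaultdict(int)

    def record(self, column_category, rule, accepted):
        key = (column_category, rule)
        self._shown[key] += 1
        if accepted:
            self._accepted[key] += 1

    def acceptance_rate(self, column_category, rule):
        # Laplace smoothing: a pair with no feedback yet scores 0.5.
        key = (column_category, rule)
        return (self._accepted[key] + 1) / (self._shown[key] + 2)


feedback = SuggestionFeedback()
for _ in range(3):
    feedback.record("phone", "valid_email", accepted=False)
for _ in range(2):
    feedback.record("phone", "valid_phone_number", accepted=True)

print(feedback.acceptance_rate("phone", "valid_email"))
print(feedback.acceptance_rate("phone", "valid_phone_number"))
```

A rate like this could be combined with the model's own confidence so that rules users keep rejecting for a column gradually stop being suggested.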

In the same vein as this, one of the recent developments that I’ve found really interesting and exciting is a system that allows us to analyse the performance of various different machine learning models on a suite of sample data, which allows us to gain detailed insights into what makes an efficient and powerful rule prediction model, and how we can expect models to perform in real-world scenarios. It provides us with a sandbox to experiment with new ways of creating and updating machine learning models and being able to estimate baseline standards for performance, so we can be confident of the level of performance for our system. It’s been really rewarding to be able to analyse the results from this process so far and to be able to compare the different methods of processing the data and building machine learning models and see which areas one model may outperform another and so on.

Thanks to Daniel for talking to us about rules suggestion. If you would like to discuss further or find out more about rules suggestion at ��Ĵ�ý, reach out to Daniel directly or to our Head of AI.


��Ĵ�ý is involved with the KTN: AI for Services UK Tour!

Jamie Gordon — Tue, 23 Feb 2021 11:30:00 +0000

The first stop on the AI for Services UK Tour will be Northern Ireland!

We are delighted that ��Ĵ�ý will be one of the companies involved. The aim of the event is to discover the innovation taking place across the UK in the professional and financial, insurance, accountancy and law sectors.

Kainos, Adoreboard and Analytics Engines are among the few other companies also representing Northern Ireland in the AI for Services Tour. ��Ĵ�ý Head of AI, Dr Fiona Browne, will be pitching at the event. We thought it would be a good idea to catch up with Dr Browne ahead of the event to find out what it’s all about!

Hi Fiona! Could you tell me more about the event and why ��Ĵ�ý is involved?

The AI for Services event is a UK-wide event hosted by KTN Innovate UK and we are part of the NI cohort. The event is a roadshow, which will provide the opportunity for companies from all the different regions to highlight what they are doing in terms of innovation and AI and how these can address areas within the various sectors. The roadshow will also allow each of the companies to pitch to organisations in different sectors including Accountancy, Insurance and Financial Services.

Fiona, you will be giving one of these pitches at the event. What can you tell us about it?

All the regions have a chance to provide a 7-minute pitch. We will be describing who ��Ĵ�ý are and what we specialise in (Data Quality and Matching). We will be focusing on a particular use case, which is related to Onboarding and the role of entity matching within this process, highlighting the recent work we have done in this area. We will be highlighting the data quality required before the matching process occurs, but also how we have augmented our matching process with machine learning.

If you could pick one key takeaway��that you would want people to��get��from the pitch, what would it be?��

I think the key message to takeaway is that Machine Learning (ML) has a role to play in��addressing manual��time-consuming��task and when applied to the correct applications, it can make efficiencies savings. However, good��ML is built on quality data and effort is needed to ensure that you have a��reproducible��data quality pipeline in place.��At��Ĵ�ý��we pride ourselves on our��data quality and matching technology and have innovated in these areas.��We are��really excited��about the developments we are making, and we can’t wait to tell you more!��

��Ĵ�ý��will be representing NI. Do you think that the talent here locally and the technological developments are matching up to the rest of the UK?��

Yes! There’s a real focus on Artificial Intelligence and FinTech within NI.��The country may be��small��in size��but in terms of capabilities it��offers great solutions.��

What do you hope to be the biggest takeaway for attendees��on the whole event?��

The idea of this event is��for companies��within sectors such as finance, insurance, law and accountancy who are embarking or on their way��to their��digital transformation��journey��to connect with companies that offer��innovative solutions.��At ��Ĵ�ý we want to better understand��the bottlenecks and��pain points��that these companies in these sectors are facing and offer a solution��that addresses these. We hope to deepen our specialist knowledge in understanding the current challenges in the industry so that we can tailor our technology to solve real business problems. We��will��showcase our��self-service��data quality and matching��solutions��highlighting the��continual developments we have made with machine learning to augment the matching process.��

It is also a great opportunity to build on our presence in these sectors, as we are primarily linked to the financial and governmental sectors. Accountancy, Law and Insurance are sectors that we haven’t traditionally marketed to but have similar areas to address, such as compliance with regulation and common data management challenges.

What would you like the audience to share?

We will highlight what our solution is and what we do, but we want to understand better the pain points. Where do the difficulties lie? Is it extracting knowledge from textual sources of information? Or is it issues with integrating different data sources? Or is it issues with adhering to regulations? It will be good to hear first-hand from these organisations.

Are you looking forward to hearing any particular pitch on the day?

I am looking forward to hearing them all. Particularly because all the companies are very different, it’ll be interesting to hear more about their solutions and the innovations that they are offering.

How can attendees get in touch with you?

You can register as a delegate to hear the presentations. Then, Innovate UK is using a platform called Meeting where 1:1 meetings can be booked with companies between 12:30 and 2 pm.

The event is sure to be a good one, and we are excited to be involved. We are most excited to learn more about the different sectors! Keep an eye on the KTN social media pages for updates on the event. KTN also has an events archive where you can listen to past events if you have missed them.


The post ��Ĵ�ý is involved with the KTN: AI for Services UK Tour! appeared first on ��Ĵ�ý.

KTN: AI for Services on Tour | 23/02

Tania — Tue, 16 Feb 2021 09:25:27 +0000

We are delighted to be one of a few Northern Irish businesses taking part in the KTN: AI for Services on Tour.

This is a brilliant opportunity for those in attendance to hear what ��Ĵ�ý is doing in the space.

Kainos, Adoreboard, and Analytics Engines are amongst the few other companies also representing Northern Ireland. Dr Fiona Browne will be speaking at the roadshow, which will be happening on 23rd February.

For more details and registration, .

The post KTN: AI for Services on Tour | 23/02 appeared first on ��Ĵ�ý.

The Open University talk: Business Ethics | 17/02

Tania — Tue, 16 Feb 2021 09:06:10 +0000

Matt Flenley, Marketing and Partnerships Manager at ��Ĵ�ý, will be speaking this week at The Open University, delivering a talk on Business Ethics.

The talk is going to cover four things:

The impact of unintended and cultural bias in machine learning
What to do if your business loses or has no soul
Corporate Social Responsibility – Looking after people when the world is upside down
The benefits and pitfalls of big corporate machines and rapid growth start-ups when it comes to doing charitable work and being a force for good.

You can also read Matt’s blogs here, such as a piece he has written on AI Ethics, or find out about our people here and explore our open vacancies. If you’re curious about working at ��Ĵ�ý, please get in touch for a chat.

The post The Open University talk: Business Ethics | 17/02 appeared first on ��Ĵ�ý.

The Open University Business Ethics talk & ��Ĵ�ý

Jamie Gordon — Mon, 15 Feb 2021 13:00:00 +0000

Matt Flenley, Marketing and Partnerships Manager at ��Ĵ�ý, will be speaking this week at The Open University, delivering a talk on Business Ethics.

Prior to the talk at The Open University, we thought it would be a good idea to have a chat and find out why he chose this topic, what other views he hopes to talk about, and the importance of business ethics, especially from a data perspective.

Hi Matt, what can you tell us about the talk you are giving at The Open University?

I am really excited to give this talk as this is an area I am passionate about. The talk is going to cover four things:

The impact of unintended and cultural bias in machine learning
What to do if your business loses or has no soul
Corporate Social Responsibility – Looking after people when the world is upside down
The benefits and pitfalls of big corporate machines and rapid growth start-ups when it comes to doing charitable work and being a force for good.

How important do you think ethics is within the data industry?

I think ethics are important. People very often think about algorithms and automated rules as being the critical part to measure, but before all of that, there’s data. You must involve data in the process to be able to understand whether the sample you are measuring is right. The quality of the information you use depends on whether the information is complete and whether you sought out the correct data to begin with.

Do you think that an understanding of ethics and data has increased in importance in recent years?

I do, due to the increased understanding of the importance of AI. For example, some algorithms can learn from images on the internet to generate pictures of people who don’t actually exist. The resulting faces look like recognisable people to you or me, but these people don’t exist – it’s a clever piece of AI. A problem that has been increasingly recognised with the source material is that it doesn’t contain enough images of older women. This has meant that, as the algorithm aged the people it generated, everyone became an old man! Because images of older women are missing, an inaccurate representation of society becomes prevalent. If you don’t have the right data going into an algorithm, you won’t have accurate data coming out of it. People are increasingly understanding the importance of data, and examples like this shine a light on bias and how damaging it can be to society.

How important is it to share this knowledge with the leaders of tomorrow at The Open University?

It is absolutely critical! I believe it’s vital for business people as well as technologists to be ethicists. The more ethicists there are in the discussion, the less bias you end up with in the room, which will fundamentally lead to fairer outcomes.

How important is The Open University and ��Ĵ�ý partnership? When did it begin?

The relationship has been longstanding. We have a number of staff members who are studying at The Open University alongside working, and indeed one working as a lecturer at the institution. One of the best parts of working with The Open University is the access to talent in unexpected places. There are a number of students pursuing careers in technology who have not gone about it in a conventional way, like immediately heading to a red-brick university for a computer science degree. Some of them are further down the line in different careers and have decided to make a career change, and some have decided to retrain while working. It’s a real mix and a really encouraging, affirming environment for people to pursue their education and career.

Thank you Matt! We will be sharing soundbites from this talk, so make sure to keep an eye out for those.

You can also read Matt’s blogs here, such as a piece he has written on AI Ethics. Or find out about our people here, explore our open vacancies, or if you’re curious about working at ��Ĵ�ý, please get in touch for a chat.

The post The Open University Business Ethics talk & ��Ĵ�ý appeared first on ��Ĵ�ý.

AI Con 2020 Interview with Dr. Fiona Browne and Matt Flenley

Jamie Gordon — Wed, 02 Dec 2020 12:00:36 +0000

Dr. Fiona Browne, Head of AI, and Matt Flenley, Marketing and Partnerships Manager at ��Ĵ�ý are contributing to AI Con 2020 this year.

After a successful first year, AI Con is back!

This year it’s said to be bigger and better than ever with a range of talks across AI, including AI/ML in Fintech; AI in the public sector; the impact of AI on the arts; the impact of AI on research and innovation; and how AI has caused a change in the screen industries. All these topics will be tackled by world-leading technology professionals and business leaders to unpack how AI is changing our world.

Ahead of AI Con 2020 taking place virtually on the 3rd and 4th December, we thought it would be a good idea to sit down with two of those industry experts, Fiona and Matt, and ask them a few things. I wanted to understand what their involvement with AI Con is this year, any previous involvement they’ve had with it, what they envisage to be the key takeaways, and of course, what talks they are most looking forward to engaging with themselves.

Hi, Fiona and Matt. Perhaps to kick off, you could talk a bit about why you both wanted to be involved with AI Con?

Fiona: Hello! Well, we were involved with it last year and it was a great experience. We were involved in the session that focused on business and the applications of AI. We were asked then to pull a session together for this year, and we’ve been able to focus on the area that ��Ĵ�ý specialises in, which is Financial Services.

This has given us the chance to unpack how machine learning can be used in Financial Services; we’ve tried to cover three broad areas within this session: firstly, understanding those people who work in the financial institutions. Secondly, we will then delve into our bread-and-butter data quality & matching, and lastly the importance of data governance.

Matt: Hi! Last year I worked with Fiona to arrange our involvement. This year, we had more time to prepare, which meant that Fiona and I could collaborate even more closely.

I particularly enjoyed approaching speakers such as Peggy and Sarah (to name but a few!). What interests me most is the application of AI and we are delighted to have contributed towards pulling together such a strong line-up.

The variety of talks too will bring a wide range of attendees!

This is the second year. Perhaps you both could talk to me about your previous involvement with AI Con, if any, and how it has evolved?

Fiona: Last year we discovered there was a significant appetite for this content. We have been able to expand this year’s conference over more streams by being more strategic with the messaging. We have also been able to create a session of our own, on a subject that we know well and are passionate about. This year, the conference is not just local; it’s much more international. Even if you look at the line-up of speakers for our session, they come from New York and Switzerland.

The international flavour offers greater perspective, knowledge, and insight.

Matt: I agree. I’ve been blown away by how engaged people have been. We have Andrew Jenkins, the Fintech Envoy for Northern Ireland and Gary Davidson of Tech Nation, who are keen to contribute to where they think the market is going.

The panel I am chairing is focusing on FinTechs that are scaling and exporting with a focus on why people should invest in NI technology. The event is well-prepared and timely, and I am looking forward to chairing on Thursday.

So, Matt, what will the panel you are chairing be discussing, and who is on it?

Matt: We are joined by Pauline Timoney, COO of Automated Intelligence; Chris Gregg, CEO and Founder of Light Year; and as I mentioned before, Andrew Jenkins, and Gary Davidson. We are going to look at the opportunities to collaborate with incubators like TechNation, the impact of COVID-19, Brexit, and FinTech investments for last year.

FinTech is a hugely growing sector, and we are excited to delve into why and explore where the sector is going next!

Fiona, you have been one of the curators of AI Con, how has that process been?

Fiona: It has been great! We were given the remit of FinTech and we could pick and choose what topics and who we wanted to add to the line-up. We have a very clear message. The talks are practical application-centred with a focus on trends and experience.

One of the largest Wealth Management Companies in the world is coming to speak to discuss their usage of technology, future projections, and more!

What do you both envisage the biggest takeaways of AI Con being?

Matt: One of the biggest takeaways is going to be the incredible, thriving NI FinTech sector.

When you look around the ecosystem, for example of the you can see the sheer explosion of firms and the problems being solved.

Fiona: There will be maturity across the board, with more companies implementing these technologies.

People are increasingly thinking about Machine Learning and AI… how can we use it?

I believe there will be a skillset gap which will be a challenge; it will be a challenge for many firms to attract the talent that can implement these processes and technologies.

To wrap up: on a personal note, what talk(s) are you both most looking forward to?

Matt: I am excited to hear from Sarah Gadd, Credit Suisse. Her wealth of experience will offer great insight into how they apply AI in practice. Not only are they on the cutting edge of technology, but they have got it off the ground. I am also looking forward to Peggy Tsai’s contribution.

Fiona: From our side, Sarah and Peggy will be interesting. It’s an honour to have a speaker like Sarah Gadd. It’s brilliant to hear how they are applying this technology now in a regulated area. What are their challenges and solutions? Also, Peggy is giving time to the complexity of data, which is more important than ever before. Austin too will be unpacking AI in the arts and music sector. I am looking forward to the overall variety, calibre, and diversity of points of view that will be offered.

Thank you both for taking the time out of your schedules! If you haven’t got your place for AI Con 2020 reserved, there is no time like the present! You can secure your place for free. It will be a brilliant conference. Who’s ready to learn more about AI?

The post AI Con 2020 Interview with Dr. Fiona Browne and Matt Flenley appeared first on ��Ĵ�ý.

AI Con 2020 | 3rd – 4th December

Jamie Gordon — Mon, 09 Nov 2020 17:05:55 +0000

We are delighted that our very own has helped to co-curate AI CON 2020.

The second annual AI CON is taking place virtually on Thursday 3rd and Friday 4th December and will be hosted by and as per last year. This year’s event will be recorded live from the AI CON studio in Belfast over two days. The event will bring together world-leading technology professionals and business leaders to discuss and examine how AI is continuing to change our world. This year’s gathering will discuss:

• AI/ML in Fintech

• AI in the Public Sector

• Impact of AI on Society, Arts, and Culture

• Applied AI/Supporting AI Startups

• AI Research and Innovation

• AI in the Screen Industries

will once again be a brilliant opportunity to listen and engage with professionals that range from developers to business leaders, who have led on adopting AI as a tool to build better services, products, or business operations.

The conference is free to attend but delegates must ahead of time.


The post AI Con 2020 | 3rd – 4th December appeared first on ��Ĵ�ý.

How can banks arm themselves against increasing regulatory and technological complexity? – FinTech Finance

Jamie Gordon — Tue, 03 Nov 2020 10:00:22 +0000

��Ĵ�ý Head of Artificial Intelligence, Dr. Fiona Browne, recently contributed to an episode of FinTech Finance: Virtual Arena. Steered by Douglas MacKenzie, the interview covered the extent of the Anti-Money Laundering (AML) fines faced by banks over recent years and started to unpack what we do at ��Ĵ�ý in relation to this topic: helping banks address their data quality, with essential solutions designed to combat fraudsters and money launderers.

How can banks arm themselves against increasing regulatory and technological complexity?

Fiona began by highlighting how financial institutions face significant challenges when managing their data. With the increase in financial regulations since the financial crisis of 2008/2009, ensuring data quality has gained in importance, obliging institutions to have a handle on their data and make sure it is up to date. Modern data quality platforms mean that the timeliness of data can now be checked via a ‘pulse check’ to ensure that it can be used in further downstream processes and that it meets regulations.
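
As a rough illustration of what a timeliness ‘pulse check’ might look like, the sketch below flags client records that have not been refreshed within a given period. It is not a description of any particular platform: the record layout, the `last_updated` field, and the 365-day limit are all assumptions made for the example.

```python
from datetime import date, timedelta

def stale_records(records, as_of, max_age_days=365):
    """Return the records whose last update is older than max_age_days."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [r for r in records if r["last_updated"] < cutoff]

# Hypothetical client records; the field names are illustrative only.
records = [
    {"client_id": "C001", "last_updated": date(2020, 9, 1)},
    {"client_id": "C002", "last_updated": date(2018, 3, 15)},
    {"client_id": "C003", "last_updated": date(2019, 12, 20)},
]

stale = stale_records(records, as_of=date(2020, 11, 3))
stale_ids = [r["client_id"] for r in stale]
print(stale_ids)
```

Records returned by a check like this would be candidates for a refresh before they are passed to downstream processes.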

Where does ��Ĵ�ý fit in to the AML arena?

A financial institution needs to be able to verify the client that they are working with when going through the AML checks. The AML process itself is vast but at ��Ĵ�ý, we focus on the area of profiling data quality and matching – it is our bread and butter. Fiona stressed the importance of internal checks as well as public entity data, such as sanction and watch lists.

In a nutshell, there is a significant amount of data to check and compare, and without quality data this becomes a difficult and costly task to perform, so we at ��Ĵ�ý focus on data quality cleansing and matching at scale.
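
To make the link between cleansing and matching concrete, here is a minimal sketch of screening a client name against a watch list. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the names, the `normalise` and `screen` helpers, and the 0.85 threshold are assumptions for illustration and do not represent any vendor's matching technology.

```python
from difflib import SequenceMatcher

def normalise(name):
    """Lower-case, drop punctuation, and collapse whitespace before comparing."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def screen(client_name, watch_list, threshold=0.85):
    """Return (entry, score) pairs whose similarity to client_name meets the threshold."""
    target = normalise(client_name)
    hits = []
    for entry in watch_list:
        score = SequenceMatcher(None, target, normalise(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Made-up watch list entries for the example.
watch_list = ["Acme Trading Ltd.", "Northwind Holdings", "Globex Corporation"]
hits = screen("ACME  Trading Ltd", watch_list)
print(hits)
```

The normalisation step is the data quality part: without it, differences in case, spacing, and punctuation would lower the similarity score and could push a genuine match into manual review.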

Why should banks look to partner, rather than building it in house?

One of the key issues of doing this in house is not having the necessary resources to perform the required checks and adhere to the different processes in the AML pipeline. According to the Financial Conduct Authority (FCA), in-house checks and a lack of data are causing leading financial institutions to receive hefty fines. Fiona reiterated that when banks bring it back to the fundamentals and get their processes right and their data in order, they can then use the partner’s technology to automate and streamline these processes, which in turn speeds up the onboarding process and ensures the legislation is being met.

Why did the period of 2018/2019 have such a high number of AML breaches?

Fiona explained that many transactions go back over a decade, and it takes time to identify such transactions. AML compliance is difficult to achieve and regulators know that it is challenging. The regulators are doing a better job at providing guidelines to financial institutions, enabling them to address these regulations. Fiona reaffirmed that perhaps 2018/2019 was a much-needed wake-up call to address this issue.

And with AML fines already at $5.6 billion this year, more than the whole of 2019, what can banks do?

Looking at the US, where although the fines for non-compliant AML processes are not as high as 2019, there is still a substantial number of fines being issued, Fiona said that it is paramount to ensure financial institutions have the right data and the right processes in place. Although it can be considered as an administrative burden, there is real criminal activity behind the scenes, which is why AML is so important. It is vital that financial institutions get a handle on this, enabling them to also improve the experience for their clients.

The fines will continue to be issued. Why should firms look to clean data when they just want to get to the bottom line?

It is essential to have the building blocks in place. Data quality is key for the onboarding process, but it is also essential downstream, particularly if you are wanting to do more trend analysis. Getting the fundamentals right at the start will pay dividends.

Are there any other influences that Artificial Intelligence (AI) and Machine Learning (ML) can have on the banks’ onboarding process?

According to Fiona, there is no silver bullet. One AI/ML technique will not solve all the AML issues. It is about deploying these techniques when approaching the issues in different ways. A large part of the onboarding process is gathering data and extracting relevant information from the data set. Fiona has seen a lot of Natural Language Processing (NLP) techniques employed to extract the data from documents. At ��Ĵ�ý, we use Machine Learning in the data matching process to reduce the manual review time. ML techniques are employed in supervised and unsupervised approaches geared to pinpoint fraudulent transactions. We think that the graph database and network analysis side of machine learning is an interesting area, and we are currently exploring how it can be deployed in AML and fraud detection.

Bonus content: In the US and Canada, one way to potentially identify fraud was to look at transactions that were over $10,000. The criminals, however, have become increasingly savvy and utilise Machine Learning to muddy their tracks. By doing this, they can divide transactions into randomised amounts to make them appear less conspicuous. As Fiona put it, it is ‘the cat and mouse game’.
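
One simple counter-measure to this kind of splitting is to look at totals per account over a short period rather than at single transactions. The sketch below is only an illustration of that idea: the $10,000 threshold comes from the text, while the three-day window, the account identifiers, and the amounts are invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

THRESHOLD = 10_000  # single-transaction threshold mentioned in the text

def flag_structuring(transactions, window_days=3):
    """Flag accounts whose sub-threshold payments sum past THRESHOLD within a window."""
    by_account = defaultdict(list)
    for account, day, amount in transactions:
        if amount < THRESHOLD:  # payments at or above the threshold are caught elsewhere
            by_account[account].append((day, amount))

    flagged = set()
    for account, items in by_account.items():
        items.sort()  # chronological order
        for i, (start, _) in enumerate(items):
            end = start + timedelta(days=window_days)
            total = sum(amount for day, amount in items[i:] if day < end)
            if total >= THRESHOLD:
                flagged.add(account)
                break
    return sorted(flagged)

# Hypothetical transactions: (account, date, amount in dollars).
transactions = [
    ("ACC-1", date(2020, 10, 1), 4_800),
    ("ACC-1", date(2020, 10, 2), 3_950),
    ("ACC-1", date(2020, 10, 3), 2_700),
    ("ACC-2", date(2020, 10, 1), 1_200),
    ("ACC-2", date(2020, 10, 20), 900),
    ("ACC-3", date(2020, 10, 5), 15_000),
]

flagged = flag_structuring(transactions)
print(flagged)
```

A fixed rule like this is easy to evade by stretching payments over a longer period, which is part of why the text describes the problem as a cat and mouse game.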

If you are employed in the banking sector or if you must deal with large and messy datasets, you will probably face challenges derived from poor data quality, standardization, and siloed information.

��Ĵ�ý provides the tools to tackle these issues with minimum IT overhead, in a powerful and agile way. Get in touch with the self-service data quality experts today to find out how we can help.

The post How can banks arm themselves against increasing regulatory and technological complexity? – FinTech Finance appeared first on ��Ĵ�ý.

��Ĵ�ý contributes to Bank of England and FCA’s AI Public-Private Forum

Tania — Mon, 12 Oct 2020 07:27:00 +0000

Belfast, London, New York, 12th October 2020

��Ĵ�ý is pleased to announce that its Head of AI, Fiona Browne, has been invited to participate in the AI Public-Private Forum, joining 20 other experts from across the financial technology sectors as well as academia, along with observers from the Information Commissioner’s Office and the Centre for Data Ethics and Innovation.

The purpose of the Forum, launched by the Bank of England and the Financial Conduct Authority, is to facilitate dialogue between the public and private sectors to better understand the use and impact of AI in financial services, which will help further the Bank’s objective of promoting the safe adoption of this technology.

The AI Public-Private Forum, with an intended duration of one year, will consist of a series of quarterly meetings and workshops structured around three topics: data, model risk management, and governance.

Commenting on the initiative’s launch, the deputy governor for markets and banking at the BofE, David Ramsden said:

The existing regulatory landscape is somewhat fragmented when it comes to AI, with different pieces of regulation applying to different aspects of the AI pipeline, from data through model risk to governance. The policy must strike a balance between high-level principles and a more rules-based approach. We also need to future-proof our policy initiatives in a fast-changing field.

The specific aims of the Forum are: firstly, to share information and understand the practical challenges of using AI in financial services, identify existing or potential barriers to deployment, and consider any potential risks or trade-offs; secondly, to gather views on areas where principles, guidance, or regulation could support safe adoption of these technologies; and finally, to consider whether once the forum has completed its work ongoing industry input could be useful and if so, what form this could take.

The knowledge, experience, and expertise of the Forum’s members and observers will be invaluable in helping us to contextualise and frame the Bank’s thinking on AI, its benefits, its risk and challenges, and any possible future policy initiatives.

Fiona Browne, Head of AI at ��Ĵ�ý, said:

I’m really excited and honoured to be part of such a timely forum. AI/ML services touch our everyday lives from recommending what we watch to groceries that we buy.

Within financial services, ML can offer benefits ranging from efficiency gains that reduce manual, time-consuming tasks, to saving customers money by suggesting the best financial products, to bespoke customer service solutions and fraud detection. These solutions need to sit within a legal and regulatory environment in the financial sector and are not without their risks and challenges.

I hope to offer the forum insights and experience of the practical implementation of ML, from data quality and fairness, through transparency and explainability in the process and model predictions, to the monitoring of models in production. I am excited to focus on and tease out potential guidance and best practice on how to safely adopt and deploy such solutions.

What is the AI Public-Private Forum?

The BoE, working with the FCA, has established the AIPPF (AI Public-Private Forum). The forum launched in October 2020 and consists of members reflecting a variety of views, who applied to be on the forum and bring with them their expertise in the area of AI/ML. The AIPPF will:

Share information and understand the practical challenges of using AI/ML within financial services, as well as the barriers to deployment and potential risks.
Gather views on potential areas where principles, guidance or good practice examples could be useful in supporting safe adoption of these technologies.
Consider whether ongoing industry input could be useful and what form this could take (e.g. considering an FMSB-type structure or industry codes of conduct).

More information about the Forum can be found .

The post ��Ĵ�ý contributes to Bank of England and FCA’s AI Public-Private Forum appeared first on ��Ĵ�ý.

IRMAC Reflections with Dr. Fiona Browne

Jamie Gordon — Mon, 07 Sep 2020 09:00:00 +0000

There is a lot of anticipation surrounding Artificial Intelligence (AI) and Machine Learning (ML) in the media. Alongside the anticipation is speculation – including many articles placing fear into people by implying that AI and ML will replace our jobs and automate our entire lives!

Dr Fiona Browne, Head of AI at ��Ĵ�ý recently spoke at an IRMAC (Information Resource Management Association of Canada) webinar, alongside Roger Vandomme, of Neos, to unpack what AI/ML is, some of the preconceptions, and the reasons why different approaches to ML are taken…

What is AI/ ML?

Dr. Browne clarified that whilst there is no official agreed-upon definition of AI, it can be depicted as the ability of a computer to perform cognitive tasks, such as voice/speech recognition, decision making, or visual perception. ML is a subset of AI, entailing different algorithms that learn from input data.

A point that Roger brought up at IRMAC was that the algorithms learn to identify patterns within the data, and these patterns make it possible to distinguish between different outcomes, for example, a fraudulent or non-fraudulent transaction.

ML takes processes that are repetitive and automates them. At ��Ĵ�ý, we are exploring the usage of AI and ML in our platform capabilities – Dr Fiona Browne

What are the different approaches to ML?

Dr. Browne explained that, at a broad level, there are three approaches: supervised, unsupervised, and reinforcement machine learning.

In supervised ML, the model learns from a labelled training data set. For example, financial transactions would be labelled as either fraudulent or genuine and fed into the ML model. The model then learns from this input and can distinguish between the two.
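
A toy version of the supervised case can be sketched with a nearest-centroid classifier, one of the simplest models that learns from labelled examples. The two features (amount in thousands of dollars, and number of transactions in the last hour) and all the sample values are invented for illustration; a real fraud model would use far richer data.

```python
def train_centroids(samples):
    """Average the feature vectors for each label (a nearest-centroid classifier)."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Labelled training data: (amount in $k, transactions in the last hour), label.
training = [
    ((0.05, 1), "genuine"),
    ((0.12, 2), "genuine"),
    ((0.30, 1), "genuine"),
    ((4.50, 9), "fraudulent"),
    ((6.00, 12), "fraudulent"),
    ((5.20, 8), "fraudulent"),
]

centroids = train_centroids(training)
print(predict(centroids, (5.0, 10)))  # an unseen transaction resembling the fraud examples
print(predict(centroids, (0.2, 1)))   # an unseen transaction resembling the genuine examples
```

The labels are what make this supervised: the model is only as good as the labelled examples it is given.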

Where data is unlabelled, Dr. Browne explained that unsupervised ML would be more appropriate, where the model learns from unlabelled data. There is a key difference here with supervised ML in that the model would seek to uncover clusters or patterns inherent in the data to enable it to separate them out.
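
For contrast, the unsupervised case can be sketched with a tiny two-cluster version of k-means on a single feature. No labels are supplied; the algorithm simply separates the values into two groups. The amounts are made up, and the sketch assumes the data contains at least two distinct values.

```python
def two_means(values, iterations=10):
    """Split one-dimensional values into two clusters (k-means with k=2)."""
    low, high = min(values), max(values)  # start the two centres at the extremes
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - low) <= abs(v - high) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of the values assigned to it.
        low = sum(groups[0]) / len(groups[0])
        high = sum(groups[1]) / len(groups[1])
    return groups

# Unlabelled transaction amounts in dollars (illustrative values).
amounts = [20, 35, 50, 42, 4800, 5100, 27, 5300]
small, large = two_means(amounts)
print(sorted(small), sorted(large))
```

The model finds the two groups, but deciding what they mean (for example, whether the larger amounts are suspicious) still requires human interpretation.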

Finally, reinforcement machine learning involves models that continually learn and update from performing a task. For example, a computer algorithm learning how to play the game ‘Go’. This is achieved by the outputs of the model being validated and that validation being provided back to the model.

The difference between supervised learning and reinforcement learning is that in supervised learning the training data has the answer key with it, meaning the model is trained with the correct answer.

In contrast to this, in reinforcement learning there is no answer key; instead, the reinforcement agent decides what to do to perform the specific task.

It is important to remember that, in the absence of a training dataset, the agent is bound to learn from its own experience. Often the biggest trial comes when a model is being transferred out of the training environment and into the real world.
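
The idea of learning from experience rather than from an answer key can be illustrated with a very small epsilon-greedy agent. This is a simplified sketch, nowhere near the methods used to play Go: the three actions, their fixed rewards, and the parameter values are all assumptions chosen to keep the example short.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Estimate the value of each action purely from the feedback received."""
    rng = random.Random(seed)
    actions = list(rewards)
    estimates = {action: 0.0 for action in actions}
    counts = {action: 0 for action in actions}

    for step in range(steps):
        if step < len(actions):
            action = actions[step]                    # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)              # explore occasionally
        else:
            action = max(actions, key=estimates.get)  # otherwise exploit the best estimate
        reward = rewards[action]                      # feedback from the "environment"
        counts[action] += 1
        # Incremental average of the rewards seen so far for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Hypothetical environment: each action always pays the same reward.
rewards = {"a": 0.2, "b": 1.0, "c": 0.5}
estimates = run_bandit(rewards)
best = max(estimates, key=estimates.get)
print(best, estimates)
```

The agent is never told which action is correct; it only sees the reward for what it tried, which is the validation being fed back to the model that the text describes.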

Now that AI/ML and the different approaches have been unpacked… the next question is how does explainability fit into this? The next mini IRMAC reflection will unravel what explainability is and what the different approaches are. Stay tuned!

Fiona has written an extensive piece on AI-enabled data quality; feel free to check it out here.


The post IRMAC Reflections with Dr. Fiona Browne appeared first on ��Ĵ�ý.

IRMAC Detective Data Work: AML and Emergent AI practices | 12/07/20

Tania — Wed, 01 Jul 2020 09:00:00 +0000

Earlier this month, our Head of AI, Dr. Fiona Browne, took part in the IRMAC webinar ‘Detective Data Work’ and explored AML and emergent AI practices.

Missed it? Watch the recording below:

In this webinar, the expert panellists questioned what anti-money laundering (AML) efforts look like, and the complexities in sifting through vast data volumes, data quality and identification in an effort to make their findings ‘explainable’.

Reducing the money flow in criminal activities had a major boost after the events of 9/11/2001.

Now Artificial Intelligence (AI) and Machine Learning (ML) techniques are beginning to revolutionize practices in this field. – IRMAC

��Ĵ�ý Fiona:

Fiona Browne is Head of Artificial Intelligence at ��Ĵ�ý with over 15 years’ research and industrial experience. Prior to joining ��Ĵ�ý, Fiona lectured in Computing Science at Ulster University teaching Data Analytics and undertaking research on applied artificial intelligence and data integration. She was a Research Fellow at Queen’s University Belfast and a Senior Software Developer at PathXL. Fiona received a BSc (Hons.) degree in Computing Science and a PhD on Artificial Intelligence in Bioinformatics from Ulster University.

��Ĵ�ý IRMAC:

The Information Resource Management Association of Canada (IRMAC) is a non-profit, vendor-independent association of information management and business professionals.

Our primary objective is to provide a forum for members to exchange information, experiences and promote the understanding, development and practice of managing information and data as a key enterprise asset.

The post IRMAC Detective Data Work: AML and Emergent AI practices | 12/07/20 appeared first on ��Ĵ�ý.

Read how AI is transforming Data Quality in this exclusive white paper

Fiona Browne — Wed, 10 Jun 2020 20:00:43 +0000


In this AI whitepaper, authored by our Head of AI, we provide an overview of Artificial Intelligence (AI) and Machine Learning (ML) and their application to Data Quality.

We highlight how tools in the ��Ĵ�ý platform can be used for key data preparation tasks including cleansing, feature engineering and dataset labelling for input into ML models.

A real-world application of how ML can be used as an aid to improve consistency around manual processes is presented through an Entity Resolution Use Case.

In this case study we show how using ML reduced manual intervention tasks by 45% and improved data consistency within the process.

Having good quality, reliable and complete data provides businesses with a strong foundation to undertake tasks such as decision making and knowledge to strengthen their competitive position. It is estimated that poor data quality can cost an institution on average $15 million annually.

As we continue to move into the era of real-time analytics and Artificial Intelligence (AI) and Machine Learning (ML) the role of quality data will continue to grow. For companies to remain competitive, they must have in place flexible data management practices underpinned by quality data.

AI/ML are being used for predictive tasks from fraud detection through to medical analytics. These techniques can also be used to improve data quality when applied to tasks such as data accuracy, consistency, and completeness of data along with the data management process itself.

In this whitepaper we will provide an overview of the AI/ML process and how ��Ĵ�ý tools can be applied in cleansing, deduplication, feature engineering and dataset labelling for input into ML models. We highlight a practical application of ML through an Entity Resolution Use Case which addresses inconstancies around manual tasks in this process.
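To make the cleansing and deduplication steps concrete, here is a minimal sketch in plain Python. It is not how the platform does it: the suffix list, the field names (`name`, `country`) and the sample records are all assumptions invented for illustration.

```python
import re
import unicodedata

# Hypothetical legal-form suffixes to strip; a real project would maintain a fuller list.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "inc", "llc", "gmbh"}


def cleanse_name(raw: str) -> str:
    """Normalise an entity name: drop accents, punctuation, case and legal suffixes."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    tokens = [tok for tok in text.split() if tok not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each (cleansed name, country) key."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for record in records:
        key = (cleanse_name(record["name"]), record["country"].strip().upper())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented sample records, for illustration only.
records = [
    {"name": "Acme Holdings Ltd.", "country": "gb"},
    {"name": "ACME HOLDINGS LIMITED", "country": "GB "},
    {"name": "Société Exemple", "country": "FR"},
]
unique_records = deduplicate(records)
```

The first two records collapse to the same key once case, punctuation and the legal suffix are removed, so only one of them survives. Exact-key deduplication like this only catches differences in formatting; misspellings need fuzzy matching.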


Explainable AI with Dr. Fiona Browne

Fiona Browne — Tue, 26 May 2020 18:19:57 +0000

Dr Fiona Browne, ��Ĵ�ý, discusses Explainable AI

The AI team at ��Ĵ�ý is building explainability from the ground up and demonstrating the “why and how” behind predictive models for client projects.

Matt Flenley prepared to open his brains to a rapid education session from Dr Fiona Browne and Kaixi Yang.

One of the most hotly debated tech topics of 2020 concerns model interpretability, that is to say, the rationale of how an ML algorithm has made a decision or prediction. Nobody doubts that AI can deliver astonishing advances in capability and corresponding efficiencies in effort, but as HSBC’s Chief Data Officer Lorraine Waters asked at a recent A-Team event, “is it creepy to do this?” Conference agendas are filled with differing rationales for interpretability and explainability of models, whether business-driven, consumer-driven, or based on regulatory frameworks to enforce good behaviour, but these are typically ethical conversations first rather than technological ones. It’s clear we need to ensure technology is “in the room” on all of these drivers.

We need to be informed and guided by technology to see what tools are already available to help with understanding AI decision-making, how tech can help shed light on ‘black boxes’ just as much as we’re dreaming up possibilities for the use of those black boxes.

As Head of ��Ĵ�ý’ AI team, Dr Fiona Browne has a strong desire for what she calls ‘baked-in explainability’. Her colleague Kaixi Yang explains more about explainable models:

“Some algorithms, such as neural networks (deep learning), are complex. Functions are calculated through approximation, and from the network’s structure it is unclear how this approximation is determined. We need to understand the rationale behind the model’s prediction so that we can decide when, or even whether, to trust it, turning black boxes into glass boxes within data science.”

The team applied its ‘explain first’ approach to a specific client project, building explainable Artificial Intelligence (XAI) from the ground up with explainability metrics including LIME (Local Interpretable Model-agnostic Explanations), a model-agnostic way of explaining individual predictions.

“Model-agnostic explanations are important because they can be applied to a wide range of ML classifiers, such as neural networks, random forests, or support vector machines,” continued Ms Yang, who recently joined ��Ĵ�ý after completing an MSc in Data Analytics at Queen’s University Belfast. “They help to explain the predictions of any machine learning classifier and evaluate its usefulness in various tasks related to trust.”

For the work the team has been conducting, this range of explainability measures lets them choose the most appropriate Machine Learning model and AI system, not just the one that makes the most accurate predictions based on evaluation scores. This has had a significant impact on their work on Entity Resolution for Know Your Customer (KYC) processes, a classic problem of large, messy datasets that are hard to match, with painful penalties for human users if it goes wrong. The project, which is detailed in a recent webinar hosted with the Enterprise Data Management Council, matched entities from the Refinitiv PermID and Global LEI Foundation datasets and relied on human validation of rule-based matches to train a machine learning algorithm.

Dr Browne again: “We applied different explainability metrics to three different classifiers that could predict whether a legal entity would match or not. We trained, validated and tested the models using an entity resolution dataset. For this analysis we selected two ‘black-box’ classifiers and one interpretable classifier to illustrate how the explainability metrics were entirely agnostic and applicable regardless of the classifier that was chosen.”

The results are shown here:

“In a regular ML conversation, these results indicate two reliably accurate models that could be deployed in production,” continued Dr Browne, “but in an XAI world we want to shed light on how appropriate those models are.”

By applying, for example, LIME to a random instance in the dataset, the team can uncover the rationale behind the predictions made. ��Ĵ�ý’ FlowDesigner rules studio automatically labelled this record as “not a match” through its configurable fuzzy matching engines.

Dr Browne continued, “Explainability methods build an interpretable classifier based on instances similar to the selected instance, and summarise the features which are driving the prediction. The method selects those instances that are quite close to the predicted instance, depending on the model that’s been built, and uses the black-box model’s predictions for them to build a glass-box model, where you can then describe what’s happening.

“In this case, for the Random Forest model (fig.), the label has been correctly predicted as 0 (not a match) and LIME exposes the features driving this decision. The prediction is supported by two key features, but not by a feature based on entity name, which we know is important.”
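The idea Dr Browne describes (sample instances near the one being explained, ask the black box for its predictions, then fit a simple weighted model to them) can be sketched in a few lines of plain Python. This is a simplified illustration in the spirit of LIME, not the `lime` library and not the team’s implementation: the `black_box` scoring function, the three feature names and every parameter value are assumptions made up for the example.

```python
import math
import random


def black_box(features):
    """Stand-in for an opaque classifier: returns P(match) for one record pair."""
    name_sim, country_match, address_sim = features
    score = 4.0 * name_sim + 1.0 * country_match + 0.5 * address_sim - 3.0
    return 1.0 / (1.0 + math.exp(-score))


def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - rest) / aug[row][row]
    return solution


def explain_locally(instance, predict, n_samples=500, noise=0.1,
                    kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns one coefficient per feature; a larger magnitude means the
    feature has more influence on the prediction near this instance.
    """
    rng = random.Random(seed)
    dim = len(instance) + 1  # +1 for the intercept column
    xtwx = [[0.0] * dim for _ in range(dim)]
    xtwy = [0.0] * dim
    for _ in range(n_samples):
        # Perturb the instance, then weight the sample by its closeness.
        sample = [value + rng.gauss(0.0, noise) for value in instance]
        distance_sq = sum((s - v) ** 2 for s, v in zip(sample, instance))
        weight = math.exp(-distance_sq / kernel_width ** 2)
        row = [1.0] + sample
        target = predict(sample)
        # Accumulate the weighted normal equations X'WX and X'Wy.
        for i in range(dim):
            xtwy[i] += weight * row[i] * target
            for j in range(dim):
                xtwx[i][j] += weight * row[i] * row[j]
    coefficients = solve(xtwx, xtwy)
    return coefficients[1:]  # drop the intercept


feature_names = ["name_sim", "country_match", "address_sim"]
instance = [0.2, 1.0, 0.3]  # hypothetical pair: low name similarity, same country
weights = explain_locally(instance, black_box)
ranked = sorted(zip(feature_names, weights), key=lambda item: abs(item[1]), reverse=True)
```

Here `ranked` lists the features by local influence, which is the kind of summary an explainer exposes for a single prediction. The real LIME method adds steps this sketch leaves out, such as working in an interpretable feature representation and selecting a sparse subset of features.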

Applying LIME to the multilayer perceptron model (fig.), which had the same accuracy as the Random Forest, shows that it also correctly predicted the “0” label of “not a match”, but with a lower support score. Its prediction is supported by slightly different features from those of the Random Forest model.

The Naïve Bayesian model was different altogether. “It fully predicted the correct label of zero with a prediction confidence of one, the highest confidence possible,” said Dr Browne, “however it’s made this prediction supported by only one feature, a match on the entity country, disregarding all other features. This would lead you to doubt whether it’s reliable as a prediction model.”
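A check of this kind, where a confident prediction rests on a single feature, is easy to automate once an explainer has produced per-feature weights. The sketch below is one possible way to flag it. The 0.8 threshold, the feature names and the attribution values are hypothetical and are not taken from the project’s results.

```python
def dominant_feature(attributions, threshold=0.8):
    """Return (feature, share) when one feature carries more than `threshold`
    of the total absolute attribution; otherwise return None."""
    total = sum(abs(value) for value in attributions.values())
    if total == 0:
        return None
    feature, value = max(attributions.items(), key=lambda item: abs(item[1]))
    share = abs(value) / total
    return (feature, share) if share > threshold else None


# Hypothetical explainer outputs, invented for illustration only.
balanced = {"name_similarity": -0.35, "address_similarity": -0.30, "country_match": 0.10}
lopsided = {"name_similarity": 0.0, "address_similarity": 0.0, "country_match": -0.90}

balanced_flag = dominant_feature(balanced)  # no single feature dominates
lopsided_flag = dominant_feature(lopsided)  # country_match carries everything
```

A flagged prediction is not necessarily wrong, but it is a prompt for a human to ask whether the model has learned a shortcut.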

This has significant implications in something as riddled with differences in data fields as KYC data. People and businesses move, directors and beneficial owners resign, and new ones are appointed, and that’s without considering ‘bad actors’ who are trying to hoodwink Anti-Money Laundering (AML) systems.

The process of ‘phoenixing’, where a new entity rises from the ashes of a failed one, intentionally dodging the liabilities of the previous incarnation, frequently relies on truncations or misspellings of directors’ names to avoid linking the new entity with the previous one.
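To show why truncated or misspelled names defeat exact matching but not fuzzy matching, here is a small sketch using `difflib` from Python’s standard library. It is an illustration only, not the matching engine described in this post: the 0.85 threshold, the minimum prefix length and the names are all invented.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name, drop full stops and commas, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def name_similarity(first, second):
    """Similarity ratio between 0 and 1 on the normalised names."""
    return SequenceMatcher(None, normalise(first), normalise(second)).ratio()


def is_probable_same_person(first, second, threshold=0.85, min_prefix=6):
    """Treat two names as a probable match if one is a truncation of the
    other or if their similarity ratio reaches the threshold."""
    left, right = normalise(first), normalise(second)
    shorter, longer = sorted((left, right), key=len)
    if len(shorter) >= min_prefix and longer.startswith(shorter):
        return True  # truncation, e.g. a surname cut short
    return name_similarity(first, second) >= threshold


# Fictional director names, for illustration only.
misspelled = is_probable_same_person("Jonathan Smithson", "Jonathon Smithson")
truncated = is_probable_same_person("Jonathan Smithson", "Jonathan Smiths")
unrelated = is_probable_same_person("Jonathan Smithson", "Maria Oliveira")
```

An exact comparison would treat all three pairs as different people. The fuzzy version links the first two and leaves the third apart, at the cost of a threshold that has to be tuned and validated by a human.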

Any ML model being used on such a dataset would need to have this explainability baked-in to understand the reliability of predictions that the data is informing.

Using one explainability metric only is not good practice. Dr Browne explains ��Ĵ�ý’ approach: “Just as in classifiers, there’s no real best evaluation approach or explainer to pick; the best way is to choose a number of different models and metrics to try to describe what’s happening. There are always pros and cons, ranging from the scope of the explainer to stability of the code to complexity of the model and how and when it’s configured.”
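One simple way to act on this advice is to run two explainers on the same prediction and measure how much their most important features overlap. The sketch below assumes each explainer returns a dictionary of feature weights. The explainer outputs shown are hypothetical, and top-k overlap is only one of several possible agreement measures.

```python
def top_k(attributions, k):
    """Names of the k features with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda name: abs(attributions[name]), reverse=True)
    return set(ranked[:k])


def top_k_agreement(first, second, k=2):
    """Jaccard overlap (0 to 1) between the top-k features of two explanations."""
    a, b = top_k(first, k), top_k(second, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical outputs from two different explainers for the same prediction.
explainer_a = {"name_similarity": 0.50, "address_similarity": 0.30, "country_match": 0.10}
explainer_b = {"name_similarity": 0.40, "country_match": 0.35, "address_similarity": 0.05}

agreement = top_k_agreement(explainer_a, explainer_b, k=2)
```

Low agreement does not say which explainer is right, only that the explanation is unstable and deserves a closer look.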

These technological disciplines of testing, evaluating and trying to understand a problem are a crucial part of the entire conversation that businesses are having at an ethical or “risk appetite” level.



AI Bias – The Future is Accidentally Biased?

Matt Flenley — Fri, 15 May 2020 10:21:51 +0000


Every now and then a run-of-the-mill activity makes you sit up and take notice of something bigger than the task you’re working on, a sort of out-of-body experience where you see the macro instead of the micro.

Yesterday was one such day. I’d had a pretty normal one of keeping across all the usual priorities and Teams calls, figuring out our editorial calendar and the upcoming��, all the while refreshing some buyer and user personas for our Self-Service Data Quality platform.

Buyer personas themselves are hardly a new thing, and they’re typically represented by an icon or avatar of the buyer or user themselves. This time, rather than pile all our hopes, dreams and expectations into a bunch of cartoons, I figured I’d experiment a little. Back in January I’d been to an event where I’d heard about Generative Adversarial Networks (GANs) and the ability to use AI to create images of pretty much anything.

Being someone who likes to use tech first and ask questions later, I headed over to an always entertaining website where GANs do a pretty stellar job of creating highly plausible-looking people who don’t exist (with some amusing if mildly perturbing issues at the limitations of its capability!). I clicked away, refreshing the page and copying people into my persona template, assigning our typical roles of Chief Data Officer, Data Steward, Chief Risk Officer and so on; it wasn’t until I found myself pasting them in that I realised how hard it was to generate images of people who were not white. Or indeed how it was impossible to generate anyone with a disability or a degenerative condition.

Buyer personas are supposed to reflect all aspects of likely users of the technology, yet this example of AI would unintentionally bias our product and market research activities to overlook people who did not conform to the AI’s model. My colleague Raghad Al-Shabandar wrote about this recently, and I think probably the most impactful part of this, for me, anyway, was the following quote:

The question, then, is developing models for the society we wish to inhabit, not merely replicating the society we have.

In the website’s case, it’s even worse: it obliterates the society we currently have, by creating images that don’t reflect the diversity of reality, instead layering on an expected or predicted society that is over 50% white and 0% otherwise-abled.

I should make it clear that I’m a big fan of this tech, not least for the bafflement my kids have at the non-existence of a person who looks very much like a person! But at the same time, I think it perhaps exposes the risk all AI projects have – did we really think of every angle about what society looks like today, and did we consider how society ought to look?

These are subjective points that vary wildly from culture to culture and country to country, but we must ensure that every minority and element of diversity is in the room when we’re making such decisions or we risk baking-in bias before we’ve even begun.
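For anyone who wants to check a sample before it ends up in a persona deck or a training set, a rough audit can be as simple as counting. The sketch below flags any expected group whose share of a labelled sample falls under a chosen floor. The group names, the 5% floor and the sample are placeholders, and deciding which groups to expect is exactly the human judgement described above.

```python
from collections import Counter


def representation_gaps(labels, expected_groups, min_share=0.05):
    """Return {group: share} for every expected group whose share of the
    sample is below `min_share`, including groups that never appear."""
    counts = Counter(labels)
    total = len(labels)
    gaps = {}
    for group in expected_groups:
        share = counts.get(group, 0) / total if total else 0.0
        if share < min_share:
            gaps[group] = share
    return gaps


# Placeholder labels for 20 generated images, invented for illustration only.
sample = ["group_a"] * 14 + ["group_b"] * 6
gaps = representation_gaps(sample, ["group_a", "group_b", "group_c"])
```

In this made-up sample `group_c` never appears, so it is reported with a share of zero.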


