This article was first published by Thomson Reuters in October 2018.
Every single day, 2.5 quintillion bytes of data are created on the Internet, and that number will surely continue to grow. With all that information available, knowing and understanding your customers should be easier than ever. In reality, such vast amounts of data often result in a confusing, muddled picture that can be incredibly resource intensive to sift through.
Furthermore, the emergence of partially anonymising forms of social media and digital publishing has created a massive problem: false information, politically or ideologically driven propaganda and other forms of misleading data abound on the Internet. This proliferation of false and misleading information has had an undeniable impact on compliance.
How much can you rely on open-source data?
With almost half of all crime in the UK relating to fraud and cybercrime, it should hardly come as a surprise that there are large amounts of fraudulent data on the Internet that appear credible, including fake websites, phone numbers and sophisticated ‘front companies’.
In this environment, corporate customers may fall victim to “short firm” or “long firm” fraud, tactics in which criminals incorporate apparently legitimate businesses with the intention of defrauding customers and suppliers. The front company not only acts as a screen of deception but also exists to aid in the laundering of the proceeds of the fraud. Firms shouldn’t just be asking themselves, “how much do we know about our business customers?” They should also be asking, “how much of this information is credible?”
Recognising the validity and credibility of adverse media relating to a potential client is also critical to preventing fraud and complying with anti-money laundering (AML) regulations. Firms should be utilising open-source media to understand how incoming clients or customers are positioned in the media, as adverse coverage could give rise to a suspicion of money laundering risk, triggering an obligation to report. Additionally, forging a connection with a customer under severe adverse media scrutiny could carry real reputational damage for your firm.
Language translation and misinterpretation can be another significant challenge to obtaining accurate data. Typos introduced during manual data entry can also undermine the verification of identities, and even simple alternative spellings of names can cause data disruption. For example, the name ‘Muhammad’, amongst the most common first names in the world, has at least a dozen different spelling variations.
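To illustrate the spelling-variation problem, here is a minimal sketch of tolerant name matching using Python's standard-library difflib. The 0.7 similarity threshold and the sample variants are illustrative assumptions, not a recommended production configuration; real screening systems typically combine several matching techniques.

```python
# Sketch: tolerant name matching with Python's stdlib difflib.
# The 0.7 threshold is an illustrative choice, not a tuned production value.
from difflib import SequenceMatcher

def same_name(a: str, b: str, threshold: float = 0.7) -> bool:
    """Treat two names as a likely match if their similarity ratio clears the threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Three common spelling variants of the same name all clear the threshold
variants = ["Mohammed", "Mohamad", "Muhammed"]
hits = [v for v in variants if same_name("Muhammad", v)]
```

An exact-match comparison would treat each variant as a different person; a similarity threshold lets all three resolve to the same identity while still rejecting unrelated names.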
Unstructured data often creates unnecessary problems for the employees and automated systems tasked with analysing it. One example of unstructured data is a postal address entered through a ‘free text’ box, as opposed to the same data entered through a string of separate fields. This makes it more difficult for data owners to manage, mine and identify anomalies in the information. Many e-commerce websites structure data from the point of input by implementing technologies such as ‘postcode searching’, which automatically populates data fields for street address, suburb, county, and so on. Anyone who has shopped online is likely to have experienced this data collection method.
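The postcode-searching pattern described above can be sketched as follows. The lookup table, postcode and address values are entirely hypothetical stand-ins for a real postcode lookup service:

```python
# Sketch: structuring address input at the point of capture via a postcode lookup.
# POSTCODE_DB is a hypothetical stand-in for a real postcode lookup service.
POSTCODE_DB = {
    "AB1 2CD": {"street": "1 Example Street", "town": "Exampleton", "county": "Exampleshire"},
}

def populate_address(postcode: str) -> dict:
    """Return pre-filled address fields for a known postcode.

    Unknown postcodes return only the normalised postcode, leaving the
    remaining fields for manual entry.
    """
    key = postcode.strip().upper()
    return {"postcode": key, **POSTCODE_DB.get(key, {})}
```

Because every field is captured separately and consistently, downstream systems can manage, mine and compare the data without first parsing free text.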
Information held internally within a firm across different departments can suffer from a combination of these problems. Many firms still fail to remediate them because they do not share data between departments, cleanse it, or enrich it to identify inconsistencies within customer files. Enhanced data management within the financial crime arena could significantly reduce costs within any given firm.
The ‘Data Lake’
Even with all that potentially false or misleading information out there, firms should not be deterred from gathering large amounts of information. Channelling data from various open sources and closed sources (e.g. fraud databases) can prove an essential element in building a clearer picture of a client.
The vast amount of collected unstructured data – which could be spread over various systems within the same organisation – is known as the data lake. The challenge lies in understanding how far the data lake is spread and how to channel the information through a system that structures that wide-ranging data into a cohesive form.
Data cleansing and enrichment
Data cleansing is the process of identifying, remedying and removing inaccurate ‘dirty data’. Data can be enriched further by adding related information from various other systems and sources; for example, appending a related phone number to a home or business address. However, to remove errors and validate the accuracy of the information, a reliable data management system requires an intelligent technology solution capable of analysing vast amounts of disparate data.
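A minimal cleanse-and-enrich pass over a customer record might look like the sketch below. The phone directory, field names and sample record are hypothetical; real pipelines would draw enrichment data from licensed or internal sources.

```python
# Sketch: cleanse a customer record, then enrich it from a second source.
# PHONE_DIRECTORY is a hypothetical secondary source keyed by address.
PHONE_DIRECTORY = {"1 example street, exampleton": "+44 20 0000 0000"}

def cleanse(record: dict) -> dict:
    """Trim whitespace, normalise case and drop empty fields so the record
    joins cleanly with other sources."""
    return {k: v.strip().lower() for k, v in record.items() if v and v.strip()}

def enrich(record: dict) -> dict:
    """Append a related phone number from a second source, if one exists."""
    phone = PHONE_DIRECTORY.get(record.get("address", ""))
    return {**record, "phone": phone} if phone else record

raw = {"name": "  Jane DOE ", "address": "1 Example Street, Exampleton", "fax": "  "}
clean = enrich(cleanse(raw))
```

Cleansing first is what makes the enrichment join succeed: the raw address and the directory key only match once whitespace and case have been normalised.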
Data cleansing and enrichment are essential for financial crime compliance. Firms should have access to the most up-to-date lists of information on politically exposed persons (PEPs) and global sanctions possible, the latter of which is provided free of charge by sanctions issuers.
However, without clean data, financial crime processes can be severely protracted by the emergence of large numbers of false positives.
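The false-positive effect can be seen in a small sketch: the same customer, captured three different ways across systems, generates three separate screening alerts until the records are normalised and de-duplicated. The sanctions entry and customer names below are illustrative, not real data.

```python
# Sketch: how uncleansed duplicates inflate screening alert volume.
# The sanctions entry and customer names are illustrative, not real data.
SANCTIONS_LIST = {"ivan petrov"}

def normalise(name: str) -> str:
    # Lower-case and collapse whitespace so variants compare equal
    return " ".join(name.lower().split())

# The same customer captured three different ways across systems:
dirty_records = ["Ivan Petrov", "ivan  petrov", " IVAN PETROV "]

# Screening the raw records raises one alert per duplicate ...
raw_alerts = [c for c in dirty_records if normalise(c) in SANCTIONS_LIST]

# ... while normalising and de-duplicating first leaves a single alert to review
deduped_alerts = {normalise(c) for c in dirty_records if normalise(c) in SANCTIONS_LIST}
```

Each redundant alert consumes analyst time; multiplied across a full customer base, unclean data can dominate a screening team's workload.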
Moving towards a single customer view
The volume of data produced every day cannot be controlled. We can, however, manage the flow of data to ensure that we mitigate regulatory and fraud risks.
Data that is structured, validated and packaged into a format suitable for machine reading, screening and assessment improves the accuracy of customer risk assessments. Firms that can automate these processes, and develop a ‘single customer view’ for each incoming and existing customer, can significantly reduce KYC costs. Regulators expect firms to carry out effective customer risk assessments and implement controls in line with the risk-based approach. One of the best ways to demonstrate a commitment to compliance is by effectively organising and documenting your customer data.
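Consolidating per-system records into a single customer view can be sketched as a merge keyed on a customer identifier. The system records, field names and merge rule below are illustrative assumptions; real implementations need survivorship rules for conflicting values, not just gap-filling.

```python
# Sketch: collapsing per-system records into one 'single customer view'.
# Records, fields and the gap-filling merge rule are illustrative.
from collections import defaultdict

def single_customer_view(records):
    """Merge records from multiple systems, keyed by customer ID.

    Later sources fill gaps rather than overwrite earlier values.
    """
    view = defaultdict(dict)
    for rec in records:
        cid = rec["customer_id"]
        for field, value in rec.items():
            if field != "customer_id" and value and field not in view[cid]:
                view[cid][field] = value
    return dict(view)

# Two systems hold partial, overlapping data on the same customer
records = [
    {"customer_id": "C001", "name": "Jane Doe", "email": None},
    {"customer_id": "C001", "name": "Jane Doe", "email": "jane@example.com", "risk": "low"},
]
views = single_customer_view(records)
```

The consolidated record for ‘C001’ carries the name, the email address from the second system, and the risk rating, giving screening and assessment processes one complete record to work from.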
If the foundation of preventing financial crime is truly ‘knowing your customer’, then the foundation of ‘knowing your customer’ is accurate, organised and reliable data. Operating a data management process will produce clean and structured data ready to be utilised for screening and ensure that the firm meets its regulatory requirements.