AWS, GitHub, Amazon SageMaker
Founded in 2020, TMNL (Transaction Monitoring Netherlands) is an organization that fights financial crime by reviewing the transactions of five of the largest Dutch banks (ABN Amro, ING, Rabobank, Triodos, and Volksbank). By collecting pseudonymized data from all of them, TMNL builds a broad picture of the financial transactions of small and medium enterprises. This helps the banks fulfill their KYC/CDD (Know Your Customer / Customer Due Diligence) duties.
When criminals launder money, they often use accounts at different banks, which makes their activity harder to detect. TMNL addresses this problem by working with data from all five banks. When it notices an anomaly, it reports the finding to the banks involved.
Monitoring transactions helps the banks and Dutch society in general. After all, an estimated 16 billion euros is laundered annually in the Netherlands, and only 2% of that money is currently intercepted.
Knowing the Anonymous
TMNL only processes company transactions (not those of individual clients). It does so with the utmost attention to privacy and security. All the data it receives is pseudonymized, with no IBANs, addresses, or company names visible.
By combining a picture of the overall financial context with a detailed breakdown of the transactions of different businesses, TMNL detects which transactions are normal and which ones are unusual.
When an unusual transaction is detected, TMNL contacts the banks involved, which process these signals and, where required, report them to the Financial Intelligence Unit (FIU).
The Initial Challenge
As TMNL’s staff and infrastructure grew, different teams worked on different ML models, often duplicating work that other teams had already done and not sharing their ideas and code quickly enough. These custom solutions carried the risk that teams would not learn from each other and, eventually, that new products would not get built.
Not only was there an overlap between different teams’ work, but there were also differences in the definitions used for some features (for example, some teams might use the median while others might use the average for the same quantity).
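To make the median-versus-average point concrete, here is a minimal sketch of how two teams could arrive at very different values for what they both call the "typical transaction amount". The amounts and variable names are invented for illustration and do not come from TMNL's data or code.

```python
from statistics import mean, median

# Hypothetical transaction amounts (in euros) for one pseudonymized company.
# One large outlier is included on purpose.
amounts = [120.0, 95.0, 110.0, 105.0, 9500.0]

# Team A summarizes the "typical transaction amount" with the mean.
typical_amount_team_a = mean(amounts)

# Team B uses the median for the same quantity.
typical_amount_team_b = median(amounts)

print(f"Team A (mean):   {typical_amount_team_a:.2f}")
print(f"Team B (median): {typical_amount_team_b:.2f}")
```

With a single outlier in the data, the two definitions differ by more than an order of magnitude, so models built on them are not directly comparable.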
Lucian Baghiuc and his team at TMNL were very aware of this challenge, and the idea of developing a feature store was on their backlog. When Roel came from Xebia to support the team, he took on this project and led the charge to make it happen. Instead of a feature store, he started with a feature catalog, which gives most of the benefits of a store but is much easier to build. If you want to know more about feature catalogs, take a look at this blog post.
The Details of the Solution
Typical features for analyzing financial transactions for anti-money-laundering purposes include the time and place of a transaction and the number of transactions on a given day.
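As a rough illustration of what such features could look like in code, the sketch below derives a time feature, a place feature, and a daily transaction count from a handful of made-up records. The record layout (account token, ISO timestamp, country code) and all function names are assumptions for this example, not TMNL's actual schema or definitions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pseudonymized records: (account token, ISO timestamp, country code).
transactions = [
    ("acc_001", "2023-03-01T09:15:00", "NL"),
    ("acc_001", "2023-03-01T23:40:00", "NL"),
    ("acc_001", "2023-03-02T11:05:00", "DE"),
    ("acc_002", "2023-03-01T14:30:00", "NL"),
]


def hour_of_day(timestamp):
    """Time feature: the hour (0-23) in which a transaction happened."""
    return datetime.fromisoformat(timestamp).hour


def countries_per_account(rows):
    """Place feature: the set of countries seen for each account."""
    result = {}
    for account, _timestamp, country in rows:
        result.setdefault(account, set()).add(country)
    return result


def transactions_per_day(rows):
    """Count feature: number of transactions per (account, calendar day)."""
    return Counter(
        (account, datetime.fromisoformat(timestamp).date().isoformat())
        for account, timestamp, _country in rows
    )


daily_counts = transactions_per_day(transactions)
print(daily_counts[("acc_001", "2023-03-01")])
print(countries_per_account(transactions)["acc_001"])
print(hour_of_day("2023-03-01T23:40:00"))
```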
Let’s say that as a user, e.g., a data scientist developing a new model, you need features A, B, and C. Since many people have used these features before, all you have to do is use the feature catalog, a Python package that computes the features in a standardized way, consistent with what the other teams use. Besides saving you development time, this brings two extra advantages:
- You can now use the documentation associated with these features for reference.
- You don’t need to worry about security and compliance, both of which are key in banking, because someone has already validated these definitions.
The figure below illustrates how the process works with and without a feature catalog for two different teams. If you want to learn more about setting up feature stores, have a look at this GitHub repository.
“With the rise of data lakes, as teams rely more on custom pipelines for data preprocessing and feature engineering, it becomes harder to reuse features or even compare them across different teams.”
The Final Results
After implementing the feature catalog, all TMNL teams can now use a central repository for feature definitions and contribute to it, too. These high-quality features can be used both for inspiration and to build operational models.
For a specific model, TMNL data scientists were able to reduce the amount of code by up to 70% by using the feature catalog. All they had to do was import the feature logic from the central repository. As a result, teams now have more time to refine their models and can ship subsequent products much faster.
For example, some models might previously have taken six months to develop from scratch. Now, if the features are already available in the catalog, such models can be deployed in weeks.
Naturally, a company’s work is not all about technology. People are vital to implementing the proposed changes. This is where Xebia’s Joost Bosman came in, connecting the MLOps, platform, and data teams (all three on the technical side) with the model development teams. He made sure everyone was aware of the feature catalog’s possibilities and how it could enhance TMNL’s work. Joost also brought in best practices, helped with automation, and made processes smoother.
“I don’t believe in outsourcing thinking, but in co-creation and partnership. I want consultants, partners actually, who come in sharing knowledge and challenging us, but also thriving and getting better as they work with us.”
Lucian Baghiuc, Chief AML Analytics Officer at TMNL