Fraud prevention and anti-money laundering are tough use cases to tackle. They become even more difficult when data originates from multiple sources, is incomplete or incorrect, or isn't kept up to date.
To make this work, machine learning models need data in the right format, but data quality processes at Hadoop cluster scale are no picnic – and the moment a model goes into production, its data starts going stale. So you'll need a way to keep the cluster in sync with transactional source systems in real time – which shouldn't be too hard, right?
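To make that sync concrete, here is a minimal sketch of applying change data capture (CDC) events from Kafka to an analytical store. It assumes the kafka-python client, a Debezium-style JSON envelope with "op", "before", and "after" fields, and an illustrative topic name and broker address – none of which come from the webinar itself.

```python
# Minimal CDC consumer sketch (assumptions: kafka-python client,
# Debezium-style event envelope, placeholder topic and broker).
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "transactions.cdc",                      # hypothetical CDC topic
    bootstrap_servers="localhost:9092",      # placeholder broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    op = event.get("op")          # "c" = create, "u" = update, "d" = delete
    if op in ("c", "u"):
        row = event["after"]      # latest state of the source row
        # upsert into the analytical store (Hadoop/cloud table) here
        print(f"upsert: {row}")
    elif op == "d":
        row = event["before"]     # last known state before deletion
        # propagate the delete so the cluster stays in sync
        print(f"delete: {row}")
```

The point of the pattern is that the analytical side never re-extracts full tables; it replays the source system's changes as they happen.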
Some of the key points we’ll discuss in this webinar are:
- Getting datasets ready for analysis in the first place
- Tracing data lineage back to the source, so you can trust the algorithm's conclusions
- Meeting strict security requirements
- Resolving fuzzy duplicates (see the sketch after this list)
- Delivering real-time change data capture from multiple sources to Hadoop, cloud platforms, and Kafka
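On fuzzy duplicates: the core idea is scoring how similar two records are and flagging pairs above a threshold as likely matches. Below is a minimal sketch using Python's standard-library difflib; the sample records and the 0.8 cutoff are illustrative assumptions, and a production pipeline would add blocking and multi-field comparison on top.

```python
# Fuzzy duplicate detection sketch (assumptions: stdlib difflib,
# illustrative sample records, an assumed 0.8 similarity cutoff).
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

records = [
    {"id": 1, "name": "Jonathan Q. Smith"},
    {"id": 2, "name": "Jonathon Smith"},
    {"id": 3, "name": "Maria Garcia"},
]

THRESHOLD = 0.8  # assumed cutoff; tune against labeled match/non-match pairs

# Pairwise comparison is fine for a sketch, but quadratic at cluster scale;
# real pipelines first block on a cheap key (e.g., name initial + postcode).
for i, left in enumerate(records):
    for right in records[i + 1:]:
        score = similarity(left["name"], right["name"])
        if score >= THRESHOLD:
            print(f"possible duplicate: {left['id']} ~ {right['id']} ({score:.2f})")
```

Run against the sample data, this flags records 1 and 2 as a likely match while leaving record 3 alone – exactly the kind of judgment call that gets hard when the duplicates span multiple source systems.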