Streaming Data Processing Engine to detect UPI Transaction Fraud using Kafka-Flink-Druid
This is an in-house research project that Irisidea innovation lab team executed to check the feasibility of Apache Flink induction as a streaming data processing engine in fraud detection during UPI transactions.
This use-case is for collecting and analyzing the real-time UPI transaction events through third party UPI applications installed on mobile devices like PhonePe, GooglePay etc, and find out that during a specific time frame, how many transaction failed, abnormal transaction occurred and perform a subsequent action accordingly.
challenge
Segregate the stream of data that continuously flows through Flink based on failed transaction & abnormal transaction and publish the same to different Kafka topics.
Note: Irisidea developed a simulator to produce real time UPI transactional stream every second (configurable) for the data required for this project. This was needed, as no financial entity would provide their real-time transactional data to test the real-time streaming data processing within Flink engine due to financial security and Government’s data privacy/protect laws.
solution
- Integrated multi-broker Apache Kafka with the UPI transactional data simulator to publish real time events every second .
- Developed code that reads JSON data from broker and writes it back with simple window grouping for processing and publishing back to different topics in multi-cluster brokers continuously.
- Developed the Kafka supervisor specifications, which was mandataory for streaming data consumption by Apache Druid
- Configured Apache Druid with Kafka’s broker to fetch the published events/data continously for analysing using Druid’s SQL query engine.
outcome
- Flink, Kafka and Druid Installation & Configuration
- Data/Event consumption from the simulator and ingestion to Kafka brokers
- Integration of multi-node Kafka broker with Apache Flink
- Developed java code to seperate on failed and abnormal transactions inside and writing back to different Kafka Topics
- Developed Kafka supervisor specifications for Apache Druid
- Querying and analysing all real-time data/events for the staticstic of successful, failed and abnormal UPI transactions.