Process Streaming Data with Apache Flink

Apache Flink

A framework and distributed processing engine for stateful computations over both unbounded and bounded data streams.



Stateful Computations on Data Streams

Guaranteed correctness

  • Exactly-once state consistency
  • Event-time processing
  • Sophisticated late data handling

Scalability

  • Scale-out architecture
  • Support for very large state
  • Incremental checkpoints

Performance

  • Low latency
  • High throughput
  • In-memory computing

Layered APIs

  • SQL on Stream & Batch Data
  • DataStream API & DataSet API
  • ProcessFunction (Time & State); a sketch of these layers follows below
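
To make the layers concrete, here is a minimal sketch, assuming a Flink 1.x Java project with both the DataStream API and the Table/SQL bridge (flink-table-api-java-bridge) on the classpath; the element values and class name are illustrative, not from the original text.

```java
// Hypothetical sketch: the same stream handled at two API layers.
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class LayeredApisSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // DataStream API layer: explicit, programmatic transformations on a stream.
        DataStream<String> words = env.fromElements("flink", "kafka", "flink");

        // SQL layer: register the stream as a table and query it declaratively.
        // An atomic String stream is exposed as a single column named f0.
        tableEnv.createTemporaryView("Words", words);
        Table counts = tableEnv.sqlQuery(
            "SELECT f0 AS word, COUNT(*) AS cnt FROM Words GROUP BY f0");

        // The grouped result is a changelog; print its inserts and updates.
        tableEnv.toChangelogStream(counts).print();
        env.execute("layered-apis-sketch");
    }
}
```

The same counting logic could also be pushed down to a ProcessFunction when explicit access to timers and state is needed.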

Operational focus

  • Flexible deployment
  • High-availability setup
  • Savepoints

In short, Apache Flink lets you:

  • Process Unbounded and Bounded Data
  • Deploy Applications Anywhere
  • Run Applications at any Scale
  • Leverage In-Memory Performance

Building Blocks of Apache Flink

And how Flink handles each of them

Streams

  • Bounded and unbounded streams: Flink has sophisticated features for processing unbounded streams, as well as dedicated operators for efficiently processing bounded streams.
  • Real-time and recorded streams: Flink applications can process streams recorded in a file system or object store as well as streams generated in real time (see the sketch below).
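
As a sketch of the two kinds of streams, the hypothetical job below reads a bounded, recorded stream from a file and an unbounded, real-time stream from a socket. The path, host, and port are assumptions, and the FileSource requires the flink-connector-files dependency (Flink 1.15+).

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedAndUnboundedSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded, recorded stream: a file is read once from start to end.
        FileSource<String> recorded = FileSource
            .forRecordFormat(new TextLineInputFormat(), new Path("/tmp/events.log"))
            .build();
        env.fromSource(recorded, WatermarkStrategy.noWatermarks(), "recorded-events")
           .print();

        // Unbounded, real-time stream: the socket keeps producing lines until the job stops.
        env.socketTextStream("localhost", 9999)
           .print();

        env.execute("bounded-and-unbounded-sketch");
    }
}
```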

State

  • Multiple State Primitives: Flink provides state primitives for different data structures, e.g. atomic values, lists, or maps (a small example follows this list).
  • Pluggable State Backends: Flink stores state in memory or in an embedded on-disk data store. Custom state backends can be plugged in too.
  • Exactly-once state consistency: Flink guarantees the consistency of application state in case of a failure.
  • Very Large State: Flink can maintain application state of several terabytes.
  • Scalable Applications: Flink supports scaling of stateful applications.
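
A minimal sketch of keyed state, assuming a Flink 1.x Java project: a ValueState flag de-duplicates events per key. The class and state names are made up for illustration; the state itself lives in whichever backend is configured (JVM heap or embedded RocksDB) and is covered by Flink's exactly-once checkpoints.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class Deduplicate extends KeyedProcessFunction<String, String, String> {

    // Keyed state: one Boolean per key, stored in the configured state backend.
    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(
            new ValueStateDescriptor<>("seen", Types.BOOLEAN));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            out.collect(value);   // forward only the first occurrence per key
        }
    }
}
```

It would be applied as `stream.keyBy(v -> v).process(new Deduplicate())`.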

Time

  • Event-time Mode: Flink's event-time processing derives results from the timestamps embedded in the events, which allows for accurate and consistent results even when events arrive out of order.
  • Watermark Support: Flink employs watermarks to reason about time in event-time applications.
  • Late Data Handling: Flink offers multiple options for handling late events, such as rerouting them via side outputs or updating previously emitted results (see the sketch after this list).
  • Processing-time Mode: Flink also supports processing-time semantics which performs computations as triggered by the wall-clock time of the processing machine.
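
The sketch below ties these pieces together, assuming a Flink 1.x DataStream API and a hypothetical Event type carrying its own key and timestamp: watermarks bound the expected out-of-orderness, slightly late events update already-emitted windows, and anything later is rerouted to a side output.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class EventTimeSketch {

    /** Hypothetical event carrying its own key and timestamp. */
    public static class Event {
        public String key;
        public long timestampMillis;
    }

    // Events arriving after the allowed lateness are rerouted here instead of being dropped.
    public static final OutputTag<Event> LATE_EVENTS = new OutputTag<Event>("late-events") {};

    public static SingleOutputStreamOperator<Long> countsPerMinute(DataStream<Event> events) {
        return events
            // Event-time mode: watermarks tolerate up to 5 seconds of out-of-order data.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, recordTs) -> event.timestampMillis))
            .keyBy(event -> event.key)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .allowedLateness(Time.seconds(30))   // slightly late events update emitted results
            .sideOutputLateData(LATE_EVENTS)     // anything later goes to the side output
            .process(new ProcessWindowFunction<Event, Long, String, TimeWindow>() {
                @Override
                public void process(String key, Context ctx, Iterable<Event> input, Collector<Long> out) {
                    long count = 0;
                    for (Event ignored : input) {
                        count++;
                    }
                    out.collect(count);   // one count per key and one-minute window
                }
            });
    }
}
```

Rerouted late records could then be retrieved with `countsPerMinute(events).getSideOutput(EventTimeSketch.LATE_EVENTS)`.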

When do you need Flink?


Event-Driven Applications

An event-driven application is a stateful application that ingests events from one or more event streams and reacts to incoming events by triggering computations, state updates, or external actions. Typical use cases include the following; a rule-based alerting sketch follows the list.

  • Fraud detection
  • Anomaly detection
  • Rule-based alerting
  • Business process monitoring
  • Web application (social network)
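
As a sketch of the pattern, the hypothetical function below implements a simple rule-based alert: it keeps a per-key count in state, uses a processing-time timer to reset it every minute, and emits an alert string once a threshold is exceeded. The threshold, timings, and names are assumptions for illustration.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ThresholdAlert extends KeyedProcessFunction<String, String, String> {

    private static final int THRESHOLD = 3;   // illustrative rule: max events per key per minute

    private transient ValueState<Integer> countInWindow;

    @Override
    public void open(Configuration parameters) {
        countInWindow = getRuntimeContext().getState(
            new ValueStateDescriptor<>("count-in-window", Types.INT));
    }

    @Override
    public void processElement(String event, Context ctx, Collector<String> out) throws Exception {
        Integer count = countInWindow.value();
        if (count == null) {
            // First event for this key in the current window: schedule a reset in one minute.
            ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 60_000);
            count = 0;
        }
        count += 1;
        countInWindow.update(count);

        // External action: emit an alert downstream once the rule is violated.
        if (count > THRESHOLD) {
            out.collect("ALERT: key " + ctx.getCurrentKey()
                + " exceeded " + THRESHOLD + " events per minute");
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        // Window elapsed: clear the count so the rule starts fresh.
        countInWindow.clear();
    }
}
```

It would be applied to a keyed stream, e.g. `events.keyBy(e -> e).process(new ThresholdAlert())`.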


Stream & Batch Analytics

Analytical jobs extract information and insights from raw data. Apache Flink supports both traditional batch queries on bounded data sets and real-time, continuous queries on unbounded, live data streams; a sketch showing the same pipeline in batch and streaming mode follows the examples below.

  • Quality monitoring of Telco networks
  • Analysis of product updates & experiment evaluation in mobile applications
  • Ad-hoc analysis of live data in consumer technology
  • Large-scale graph analysis
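
A minimal sketch of this duality, assuming Flink 1.12 or later where the DataStream API has an explicit runtime mode: the same pipeline can run as a batch job over a bounded, recorded data set or as a continuous streaming job, simply by switching the mode. The input values are illustrative.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BatchOrStreamingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // BATCH emits one final result per key once the bounded input is exhausted;
        // STREAMING emits incremental updates as records arrive.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements(1L, 2L, 3L, 4L, 5L, 6L)
           .keyBy(n -> n % 2)        // group into even and odd values
           .reduce(Long::sum)        // sum per group (running in STREAMING, final in BATCH)
           .print();

        env.execute("batch-or-streaming-sketch");
    }
}
```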


Data Pipelines & ETL

Extract-transform-load (ETL) is a common approach to converting and moving data between storage systems. In Flink, such pipelines can run as continuous streaming jobs instead of being triggered periodically; the Kafka example in the next section shows a continuous ETL pipeline.

  • Real-time search index building in e-commerce
  • Continuous ETL in e-commerce

Setting up Flink to Work with Kafka
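
A minimal setup sketch, assuming the flink-connector-kafka dependency is on the classpath; the broker address, topic names, and group id below are placeholders. It reads records from one topic, applies a small continuous ETL step, and writes the result to another topic, with checkpointing enabled to back the delivery guarantees.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000);   // checkpoints back the delivery guarantees

        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")       // placeholder broker
            .setTopics("input-events")                    // placeholder topic
            .setGroupId("flink-etl")                      // placeholder group id
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        KafkaSink<String> sink = KafkaSink.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("cleaned-events")               // placeholder output topic
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
            .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
            .build();

        // Continuous ETL: read, transform, and write back to Kafka.
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-input")
           .map(line -> line.trim())
           .filter(line -> !line.isEmpty())
           .sinkTo(sink);

        env.execute("kafka-etl-sketch");
    }
}
```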

Case studies
