Apache Flink
A framework and distributed processing engineÂ
Flink enables stateful computations across both unbounded and constrained data streams.


Stateful Computations on Data Streams
Guaranteed correctness
- Exactly-once state consistency
- Event-time processing
- Sophisticated late data handling
Scalability
- Scale-out architecture
- Support for very large state
- Incremental Checkpoints
Performance
- Low latency
- High throughput
- In-Memory computing
Layered APIs
- SQL on Stream & Batch Data
- DataStream API & DataSet API
- ProcessFunction (Time & State)
Operational focus
- Flexible deployment
- High-availability setup
- Savepoints
-
Process Unbounded and Bounded Data
-
Deploy Applications Anywhere
-
Run Applications at any Scale
-
Leverage In-Memory Performance
Building Blocks of Apache Flink
And how it approaches to handle each of them
Streams
- Bounded and unbounded streams: Flink has sophisticated features to process unbounded streams, and also dedicated operators to efficiently process bounded streams.
- Real-time and recorded streams: Flink applications can process recorded like a file system & object store or real-time generated streams.
State
- Multiple State Primitives: Flink has state primitives for data structures, e.g. atomic values, lists, or maps.Â
- Pluggable State Backends: Flink stores state in memory or in an embedded on-disk data store. Custom state backends can be plugged in too.
- Exactly-once state consistency: Flink guarantees the consistency of application state in case of a failure.
- Very Large State: Flink maintains application state of several terabytes.
- Scalable Applications: Flink supports scaling of stateful applications.
Time
- Event-time Mode: Event-time processing of Flink allows for accurate and consistent results.
- Watermark Support: Flink employs watermarks to reason about time in event-time applications.
- Late Data Handling: Flink features multiple options to handle late events, such as rerouting, and updating.
- Processing-time Mode: Flink also supports processing-time semantics which performs computations as triggered by the wall-clock time of the processing machine.
When do you need Flink?
Event Driven Applications
An event-driven application is a stateful application that ingest events from one or more event streams and reacts to incoming events by triggering computations, state updates, or external actions.
- Fraud detection
- Anomaly detection
- Rule-based alerting
- Business process monitoring
- Web application (social network)
Stream & Batch Analytics
Analytical jobs extract information and insight from raw data. Apache Flink supports traditional batch queries on bounded data sets and real-time, continuous queries from unbounded, live data streams.
- Quality monitoring of Telco networks
- Analysis of product updates & experiment evaluation in mobile applications
- Ad-hoc analysis of live data in consumer technology
- Large-scale graph analysis
Data Pipelines & ETL
Extract-transform-load (ETL) is a common approach to convert and move data between storage systems.
- Real-time search index building in e-commerce
- Continuous ETLÂ in e-commerce
Setting up Flink to Work with Kafka