03 flink stream processing
-01- Introduction to Apache Flink https://www.youtube.com/watch?v=QRvLAmdwWYQ
-02- Batch vs Real-time Processing https://www.youtube.com/watch?v=9OUD2CVcoOg
-03- Flink Stream processing
Flink applications can handle business logic that requires contextual state while processing data streams at any scale, using Flink's DataStream API.
https://towardsdatascience.com/an-introduction-to-stream-processing-with-apache-flink-b4acfa58f14d
In many application domains, massive streaming data is generated from different sources, for example user activity on the web, measurements from Internet of Things (IoT) devices, transactions from financial services, and location-tracking feeds. Traditionally, these unbounded data streams were stored as bounded datasets and processed later by batch jobs. That approach is inefficient in scenarios where the value of data decays with time: businesses want real-time processing so they can get insights from data in motion and respond proactively to changes as close as possible to the moment the data is produced.
To get there, applications have to become more stream-based, using real-time stream processors. That is where Apache Flink comes in: Flink is an open-source framework for stateful, large-scale, distributed, and fault-tolerant stream processing.
This blog post presents an overview of Apache Flink and its key features for streaming applications. It focuses on Flink’s DataStream API and explores some of the underlying architectural design concepts.
Most of the details in this post are based on my hands-on experience with Flink during my involvement in the datAcron EU research project, as summarised in this paper.
Distributed Online Learning System Architecture using Apache Flink. Photo by the author | Photo from A Distributed Online Learning Approach for Pattern Prediction over Movement Event Streams.
Apache Flink is gaining popularity and is used in production to build large-scale data analytics and processing components over massive streaming data. It powers some of the world’s most demanding stream processing applications; for example, it is a crucial component of Alibaba’s search engine.
Apache Flink Overview
Apache Flink is an open-source platform that provides scalable, distributed, fault-tolerant, and stateful stream processing capabilities. Flink is one of the most recent and pioneering Big Data processing frameworks.
Apache Flink can ingest massive streaming data (up to several terabytes) from different sources and process it in a distributed fashion across multiple nodes, before pushing the derived streams to other services or applications such as Apache Kafka, databases, and Elasticsearch. Put simply, the basic building blocks of a Flink pipeline are input, processing, and output. Its runtime supports low-latency processing at extremely high throughput in a fault-tolerant manner. These capabilities enable real-time analytics on streaming data, and Flink fits well for continuous extract-transform-load (ETL) pipelines and for event-driven applications.
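The input, processing, and output building blocks can be illustrated with a minimal sketch in plain Python. This is not Flink code: the function names (`source`, `process`, `sink`), the sensor readings, and the threshold are all made up for illustration, and a Python list stands in for an external system such as Kafka.

```python
from typing import Iterable, Iterator, List


def source(readings: Iterable[float]) -> Iterator[float]:
    """Input stage: emit raw records one at a time."""
    for reading in readings:
        yield reading


def process(stream: Iterable[float], threshold: float) -> Iterator[str]:
    """Processing stage: keep readings above the threshold, then format them."""
    for value in stream:
        if value > threshold:            # filtering step
            yield f"ALERT {value:.1f}"   # mapping step


def sink(stream: Iterable[str], out: List[str]) -> None:
    """Output stage: push derived records downstream (a list, in this sketch)."""
    for record in stream:
        out.append(record)


# Hypothetical sensor readings; records flow through the stages one by one.
alerts: List[str] = []
sink(process(source([21.5, 35.2, 19.0, 40.1]), threshold=30.0), alerts)
print(alerts)  # ['ALERT 35.2', 'ALERT 40.1']
```

Because each stage is a generator, a record moves through the whole pipeline before the next one is read, which loosely mirrors record-at-a-time stream processing.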
Flink provides processing models for both streaming and batch data, where batch processing is treated as a special case of streaming (i.e., a finite stream). Flink’s software stack includes the DataStream and DataSet APIs for processing infinite and finite data, respectively. Flink offers multiple operations on data streams or sets, such as mapping, filtering, grouping, updating state, joining, defining windows, and aggregating.
The two main data abstractions of Flink are DataStream and DataSet; both represent read-only collections of data elements. The collection is bounded (i.e., finite) in a DataSet, while it is unbounded (i.e., infinite) in a DataStream.
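The idea that a bounded dataset is just a stream that happens to end can be sketched in plain Python, again without any Flink API. The `running_count` operator and the event names below are hypothetical; the point is only that the same stateful operator runs unchanged on a finite list and on an endless generator.

```python
import itertools
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, count so far) for each event, keeping per-key state."""
    counts: Dict[str, int] = defaultdict(int)
    for key in events:
        counts[key] += 1
        yield key, counts[key]


# Bounded input: the stream ends, so the full result can be collected.
bounded = ["click", "view", "click"]
bounded_result = list(running_count(bounded))
print(bounded_result)  # [('click', 1), ('view', 1), ('click', 2)]

# Unbounded input: an endless generator, so results are consumed incrementally.
unbounded = itertools.cycle(["click", "view"])
first_five = list(itertools.islice(running_count(unbounded), 5))
print(first_five)
# [('click', 1), ('view', 1), ('click', 2), ('view', 2), ('click', 3)]
```

Calling `list(...)` directly on the unbounded case would never return, which is why only a slice of it is taken here.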
Flink programs are represented by a data-flow graph (i.e., a directed acyclic graph, or DAG) that gets executed on Flink’s core, which is a distributed streaming dataflow engine. The data-flow graphs are composed of stateful operators and intermediate data stream partitions. Each operator is executed by multiple parallel instances, and the number of instances is determined by the parallelism level. Each parallel operator instance runs in an independent task slot on a machine within a cluster of computers. The figure below shows an example of the data-flow graph for a Flink application.
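To make parallel operator instances more concrete, here is a small plain-Python simulation, not Flink code. It assumes events are routed to instances by key, and it uses a CRC32 hash as a stand-in routing function; Flink's actual partitioning scheme differs. The class and function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


class CountOperatorInstance:
    """One parallel instance of a stateful operator, with its own local state."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.state: Dict[str, int] = defaultdict(int)

    def process(self, key: str) -> None:
        self.state[key] += 1


def route(key: str, parallelism: int) -> int:
    """Pick an instance for a key with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def run(events: Iterable[str], parallelism: int) -> List[CountOperatorInstance]:
    """Create `parallelism` instances and send every event to the one owning its key."""
    instances = [CountOperatorInstance(i) for i in range(parallelism)]
    for key in events:
        instances[route(key, parallelism)].process(key)
    return instances


instances = run(["a", "b", "a", "c", "b", "a"], parallelism=3)
for inst in instances:
    print(inst.index, dict(inst.state))
```

Since a given key always maps to the same instance, each instance can keep its counts locally without coordinating with the others. In this sketch the instances run sequentially in one process; in a real cluster each would occupy its own task slot.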