
Spark Architecture

Get insights on the architecture of Spark.

We'll cover the following

Spark design
Driver
Executor
Cluster manager
Execution modes
Cluster Mode:
Client Mode:
Spark design
Spark is a distributed, parallel data-processing framework and bears many similarities to the traditional MapReduce framework. Like MapReduce, Spark uses a leader-worker architecture: a leader process coordinates the job and distributes the work among worker processes. In Spark, these two kinds of processes are formally called the driver and the executors.
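The leader-worker split can be illustrated with a toy sketch in plain Python. It is not Spark code: the hypothetical `driver` function plays the leader, splitting the input into partitions and combining the partial results, while threads from a `ThreadPoolExecutor` stand in for executor processes (an assumption made purely for illustration).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Divide the input into roughly equal, contiguous chunks (partitions)."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions get one additional element each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def executor_task(partition):
    """Work done by a worker on a single partition: a sum of squares."""
    return sum(x * x for x in partition)


def driver(data, num_executors=4):
    """Leader: partition the data, hand out one task per partition, combine results."""
    partitions = split_into_partitions(data, num_executors)
    # Threads stand in for executor processes in this toy model.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(executor_task, partitions))
    return sum(partial_results)


if __name__ == "__main__":
    print(driver(list(range(10))))  # 285
```

Real executors are separate JVM processes on cluster nodes, but the shape of the interaction is the same: the leader decides how to split the work, and the workers only see their own piece.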

Driver
The driver is the leader process that manages the execution of a Spark job. It maintains the overall state of the Spark application, responds to the user's program or input, and analyzes, distributes, and schedules work among the executor processes. In essence, the driver is the heart of the Spark application and holds all application-related information for the application's entire lifetime.

The Spark driver converts Spark operations into DAG (directed acyclic graph) computations, then schedules and distributes them as tasks across the Spark executors. The driver accesses the distributed components in the cluster, including the executors and the cluster manager, through the SparkSession. You can think of the SparkSession as the single point of entry to all Spark operations and data: through it we can read from data sources, write DataFrames or Datasets, create JVM runtime parameters, and so on. In essence, the SparkSession is the unified conduit to all of Spark's functionality. If we use the interactive spark-shell, the Spark driver instantiates the SparkSession for us; in a standalone Spark application, we create the SparkSession ourselves. We'll look at examples of both in the lessons ahead.
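To make "DAG computations" a little more concrete, here is a simplified, hypothetical model of one thing the driver does with them: cutting a linear chain of operations into stages. Narrow operations (such as `map` or `filter`) can be pipelined within one stage, while wide operations that need a shuffle start a new stage. The operation names mirror Spark's, but `split_into_stages` and `WIDE_OPS` are illustrative assumptions; Spark's actual scheduler works on arbitrary graphs, not just linear chains.

```python
# Operations that usually need data from other partitions, i.e. a shuffle.
# (Illustrative subset only.)
WIDE_OPS = {"groupByKey", "reduceByKey", "join", "distinct", "repartition"}


def split_into_stages(operations):
    """Cut a linear chain of operation names into stages at shuffle boundaries."""
    stages = [[]]
    for op in operations:
        if op in WIDE_OPS and stages[-1]:
            # A shuffle is needed, so the current stage ends here; in this toy
            # model the wide operation opens the stage that reads its output.
            stages.append([])
        stages[-1].append(op)
    return stages


if __name__ == "__main__":
    word_count = ["textFile", "flatMap", "map", "reduceByKey", "filter"]
    for i, stage in enumerate(split_into_stages(word_count)):
        print(f"stage {i}: {' -> '.join(stage)}")
```

Pipelining narrow operations means each stage can run as one task per partition without moving data between nodes; only the stage boundaries require a shuffle.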

Executor
Executors are the worker processes that execute the code assigned to them by the driver process and report the state of the computation on that executor back to the driver.

Once resources have been allocated, the driver communicates with the executors directly. In most deployment modes, a single executor runs per node. Each executor is assigned tasks that work on the subset of data located closest to it in the cluster. Processing data close to where it is stored is referred to as data locality, and it reduces consumption of network bandwidth.
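Data locality can be sketched as a toy placement rule: each task prefers an executor on the host that already stores its partition and otherwise falls back to any executor, which implies a network transfer. The locality labels borrow Spark's level names `NODE_LOCAL` and `ANY`, but the `assign_tasks` helper and the host and executor names are hypothetical; Spark's real scheduler also has process-local and rack-local levels and waits for a while before relaxing locality.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(partition_hosts, executors):
    """Assign one task per partition, preferring executors on the data's host.

    partition_hosts: dict mapping partition id -> host that stores it
    executors: dict mapping executor id -> host the executor runs on
    Returns a dict mapping partition id -> (executor id, locality level).
    """
    executors_by_host = defaultdict(list)
    for executor_id, host in executors.items():
        executors_by_host[host].append(executor_id)

    # Rotate through candidates so tasks are spread rather than piled on one executor.
    local_choices = {host: cycle(ids) for host, ids in executors_by_host.items()}
    any_choice = cycle(executors)

    assignment = {}
    for partition_id, host in partition_hosts.items():
        if host in local_choices:
            # An executor runs where the data lives: no network transfer needed.
            assignment[partition_id] = (next(local_choices[host]), "NODE_LOCAL")
        else:
            # No executor on the data's host: the partition must cross the network.
            assignment[partition_id] = (next(any_choice), "ANY")
    return assignment


if __name__ == "__main__":
    executors = {"exec-1": "host-a", "exec-2": "host-b"}
    partitions = {0: "host-a", 1: "host-b", 2: "host-c"}
    for pid, placement in assign_tasks(partitions, executors).items():
        print(pid, placement)
```

Partitions 0 and 1 stay on the hosts that store them, while partition 2 has no executor on `host-c` and must be read over the network.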
Figure: Drivers and Executors

Cluster manager
A MapReduce or Spark job runs on a cluster of machines. Neither MapReduce's Application Master nor Spark's driver process has the authority or the ability to allocate cluster resources for job execution. Instead, a separate piece of software, the cluster manager, manages the cluster's physical resources and arbitrates them among jobs, usually according to some user-defined policy. The Spark driver has to negotiate resources with the cluster manager to launch executor processes. YARN is one example of a cluster manager. Spark is compatible with the following cluster managers:

Local mode

Built-in standalone cluster manager

Hadoop YARN

Kubernetes

Apache Mesos
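In practice, the cluster manager is selected by the master URL given to Spark (for example through `spark-submit --master`). The sketch below is a hypothetical helper, `cluster_manager_for`, that maps the common master URL forms to the managers listed above; it is an illustration only and deliberately rejects less common forms such as `local-cluster`.

```python
def cluster_manager_for(master_url):
    """Map a Spark master URL to the cluster manager it selects."""
    # local, local[4], local[*]: everything runs in a single JVM.
    if master_url == "local" or master_url.startswith("local["):
        return "local"
    # spark://host:port points at the built-in standalone master.
    if master_url.startswith("spark://"):
        return "standalone"
    # yarn: the ResourceManager is located via the Hadoop configuration.
    if master_url == "yarn":
        return "yarn"
    # k8s://https://host:port points at the Kubernetes API server.
    if master_url.startswith("k8s://"):
        return "kubernetes"
    if master_url.startswith("mesos://"):
        return "mesos"
    raise ValueError(f"unrecognized master URL: {master_url!r}")


if __name__ == "__main__":
    for url in ["local[*]", "spark://master:7077", "yarn", "k8s://https://example.com:443"]:
        print(url, "->", cluster_manager_for(url))
```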

| Mode | Spark driver | Spark executor | Cluster manager |
|---|---|---|---|
| Local | Runs on a single JVM, such as a laptop or a single node | Runs on the same JVM as the driver | Runs on the same host |
| Standalone | Can run on any node in the cluster | Each node in the cluster launches its own executor JVM | Can be allocated arbitrarily to any host in the cluster |
| YARN (client) | Runs on a client that is not part of the cluster | Runs in a YARN NodeManager's container | YARN's ResourceManager works with YARN's Application Master to allocate containers on NodeManagers for executors |
| YARN (cluster) | Runs with the YARN Application Master | Same as YARN client mode | Same as YARN client mode |
| Kubernetes | Runs in a Kubernetes pod | Each worker runs within its own pod | Kubernetes Master |

Execution modes
Spark can execute in two deployment modes. The deployment mode determines where the Spark driver program runs: either inside the cluster or outside it. The two modes are discussed below:

Cluster Mode:
In cluster mode, the user submits a Spark application (a JAR file, or a Python or R script) to the cluster manager. The manager in turn spawns the driver and executor processes on worker nodes to execute the job. In this setting, both the driver and the executors live inside the cluster.

Client Mode:
Client mode is similar to cluster mode, except that the driver process runs on the client machine used to submit the Spark job, outside the cluster. The machine hosting the driver is therefore not colocated with the cluster running the executor processes: the client machine is responsible for maintaining the driver process, and the cluster is responsible for maintaining the executor processes.
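The deployment mode is chosen at submission time with the `--deploy-mode` flag of `spark-submit` (`client` is the default). The hypothetical helper below only builds the command line for either mode and runs nothing; the application file `app.py` and the YARN master are placeholder assumptions.

```python
import shlex


def spark_submit_command(app, master="yarn", deploy_mode="client", app_args=()):
    """Build (but do not run) a spark-submit argument list for one deploy mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    # Cluster mode needs a real cluster manager to launch the driver;
    # spark-submit rejects it for a local master.
    if deploy_mode == "cluster" and master.startswith("local"):
        raise ValueError("cluster deploy mode needs a real cluster manager")
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app]
    cmd.extend(app_args)  # arguments after the application go to the application
    return cmd


if __name__ == "__main__":
    # Client mode: the driver runs on the submitting machine.
    print(shlex.join(spark_submit_command("app.py", deploy_mode="client")))
    # Cluster mode: the cluster manager launches the driver inside the cluster.
    print(shlex.join(spark_submit_command("app.py", deploy_mode="cluster")))
```

The only difference between the two commands is the `--deploy-mode` value, which reflects the point above: the modes differ in where the driver lives, not in how the executors are launched.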