Introduction to Apache Spark

Author: neptune | 09th-May-2022 | views: 205
#Python #Apache Spark

What is Spark?

Apache Spark is an open-source, distributed processing system that utilizes in-memory caching and optimized query execution for faster queries.

All about Spark

Apache Spark currently supports multiple programming languages, including Java, Scala, R, and Python. The choice usually depends on how efficiently a language expresses functional solutions to the task at hand, and many developers prefer Scala, the language Spark itself is written in.

Python is generally slower than Scala, while Java is more verbose and Spark does not ship an interactive Read-Evaluate-Print-Loop (REPL) shell for it.




Applications of Spark

  • Interactive analysis – MapReduce was built for batch processing, whereas Apache Spark processes data fast enough to run exploratory queries interactively, without sampling.

  • Event detection – Spark's streaming functionality lets organizations monitor for unusual behavior in order to protect their systems. Health and security organizations and financial institutions use triggers to detect potential risks.

  • Machine Learning – Apache Spark ships with a scalable machine learning library named MLlib, which runs advanced analytics on iterative problems. Critical analytics jobs such as sentiment analysis, customer segmentation, and predictive analysis are typical use cases.

What are RDDs?

The key idea of Spark is the Resilient Distributed Dataset (RDD), which supports in-memory computation.

This means that Spark keeps intermediate state in memory as an object that can be shared across jobs. Sharing data in memory is 10 to 100 times faster than sharing it over the network or through disk.

  • Resilient Distributed Datasets (RDDs) are the main abstraction in Spark.

  • An RDD is a partitioned collection of objects that is spread across a cluster and can be persisted in memory or on disk.

  • Once created, RDDs are immutable.


Features of RDDs

  • Resilient - Fault-tolerant: using the RDD lineage graph, Spark can recompute partitions that are damaged or missing because of node failures.

  • Dataset - A set of partitioned data holding primitive values or compound values, for example records or tuples.

  • Distributed - The data resides on multiple nodes in a cluster.
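The lineage-based recovery described above can be illustrated with a small toy model in plain Python. This is not Spark's API: `ToyRDD`, `get_partition`, and `lose_partition` are hypothetical names invented for this sketch, which only assumes that a lost partition can be rebuilt from its source data plus the recorded list of transformations.

```python
class ToyRDD:
    """Toy model of lineage-based recovery (hypothetical, not Spark's API)."""

    def __init__(self, source_partitions, lineage=()):
        self.source_partitions = source_partitions  # input data, one list per partition
        self.lineage = tuple(lineage)                # recorded per-element functions, in order
        self.cache = {}                              # partition index -> computed result

    def map(self, func):
        # Returns a new ToyRDD; the original is never modified (immutability).
        return ToyRDD(self.source_partitions, self.lineage + (func,))

    def _compute(self, index):
        # Replays the whole lineage for a single partition.
        data = list(self.source_partitions[index])
        for func in self.lineage:
            data = [func(x) for x in data]
        return data

    def get_partition(self, index):
        # Recomputes from lineage only if the cached copy is missing.
        if index not in self.cache:
            self.cache[index] = self._compute(index)
        return self.cache[index]

    def lose_partition(self, index):
        # Simulates a node failure that drops one cached partition.
        self.cache.pop(index, None)


source = [[1, 2], [3, 4], [5, 6]]
rdd = ToyRDD(source).map(lambda x: x * 2).map(lambda x: x + 1)

before = [rdd.get_partition(i) for i in range(3)]
rdd.lose_partition(1)               # "node failure" on partition 1
recovered = rdd.get_partition(1)    # rebuilt from source data + lineage
```

Only the lost partition is recomputed here; the other two stay in the cache untouched.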

RDD Operations

  • flatMap, map, and reduceByKey are examples of transformations on RDDs.

  • count, collect, reduce, take, and first are a few of the actions in Spark.

  • foreach(func) and saveAsTextFile(path) are also examples of actions.
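As a rough illustration of how these operations compose, the sketch below mimics a word count in plain Python. It is not PySpark code: the helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins named after the RDD operations, and everything runs eagerly on a single machine.

```python
def flat_map(func, items):
    # Like flatMap: apply func to each item and flatten the results.
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    # Like reduceByKey: combine all values that share a key using func.
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


lines = ["spark is fast", "spark is distributed"]

words = flat_map(str.split, lines)                 # flatMap: line -> words
pairs = [(word, 1) for word in words]              # map: word -> (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)  # reduceByKey: sum per word

total_words = len(pairs)   # plays the role of the count action
first_pair = pairs[0]      # plays the role of the first action
```

The first three steps reshape the data, as transformations do, while the last two return a plain value to the caller, as actions do.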

What is Lazy Evaluation?

When we call a transformation on an RDD, the operation is not executed immediately. Instead, Spark internally records metadata to show that the operation has been requested, and runs it only when an action needs the result. This is called lazy evaluation.
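A minimal sketch of this behavior in plain Python is shown below. It is a toy model, not Spark's implementation: the class `LazyDataset` and the `calls` list are hypothetical names used only to make visible when the work actually happens.

```python
class LazyDataset:
    """Toy model of lazy evaluation (hypothetical, not Spark's API)."""

    def __init__(self, data, pending=()):
        self.data = data
        self.pending = tuple(pending)  # recorded operations, not yet executed

    def map(self, func):
        # Transformation: only records the request and returns a new object.
        return LazyDataset(self.data, self.pending + (("map", func),))

    def filter(self, predicate):
        # Transformation: also recorded, not executed.
        return LazyDataset(self.data, self.pending + (("filter", predicate),))

    def collect(self):
        # Action: runs every recorded operation, in order.
        result = list(self.data)
        for kind, func in self.pending:
            if kind == "map":
                result = [func(x) for x in result]
            else:
                result = [x for x in result if func(x)]
        return result


calls = []  # records each time the mapping function actually runs


def square(x):
    calls.append(x)
    return x * x


dataset = LazyDataset([1, 2, 3, 4]).map(square).filter(lambda x: x > 4)
calls_before_action = len(calls)   # still 0: nothing has executed yet
result = dataset.collect()         # the action triggers execution
```

Building `dataset` never calls `square`; the function runs only once `collect()` is invoked.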

DataFrame in Spark

A DataFrame is a distributed collection of data organized into named columns. DataFrames can be created from a wide array of sources, such as existing RDDs, external databases, tables in Hive, or structured data files.


Thanks for Reading !!!




