Big Data Analytics Using Spark

Yoav Freund, UCSanDiegoX

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform.

In data science, data is called "big" if it cannot fit into the memory of a single standard laptop or workstation.

Analyzing big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.
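
As a rough illustration of the MapReduce computational model mentioned above, here is a minimal word-count sketch in plain Python. It is not the Hadoop or Spark API and is not taken from the course materials; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample data are hypothetical. The lists in `partitions` stand in for chunks of a file that a real cluster would store on different machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as a framework would do
    # between the map and reduce steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the values for each key into a single count.
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(partitions):
    # Each partition is processed independently in the map step;
    # on a cluster this is the part that runs in parallel.
    pairs = []
    for partition in partitions:
        for line in partition:
            pairs.extend(map_phase(line))
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical sample data: two partitions of one line each.
partitions = [["spark makes big data simple"],
              ["big data needs big clusters"]]
counts = word_count(partitions)
print(counts["big"])
```

In a real cluster the shuffle step moves data between machines over the network, which is one reason it is often an expensive part of a distributed job.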

In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize these bottlenecks.

You will learn how to perform supervised and unsupervised machine learning on massive datasets using Spark's Machine Learning Library (MLlib).

In this course, as in the others in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.

What you will learn

  • Programming Spark using PySpark
  • Identifying the computational tradeoffs in a Spark application
  • Performing data loading and cleaning using Spark and Parquet
  • Modeling data through statistical and machine learning methods

Dates:
  • 25 August 2020
Course properties:
  • Language: English

