Big Data Analytics Using Spark

Yoav Freund, UCSanDiegoX

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform.

In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation.

Analyzing big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.
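
To make the MapReduce computational model concrete, here is a minimal single-machine sketch of its map, shuffle and reduce phases in plain Python. It is an illustration only and is not the Hadoop or Spark API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for this example. On a real cluster, each phase would run in parallel across many machines, reading its input from a distributed file system such as HDFS.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's list of values into a single count."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def word_count(lines):
    """Chain the three phases; hypothetical helper for this sketch."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Illustrative input, not course data.
    sample = ["big data needs big clusters", "spark processes big data"]
    print(word_count(sample))
```

The shuffle step is where data moves between machines in a distributed setting, which is one reason communication is often a bottleneck in this style of computation.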

In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize these bottlenecks.

You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).

In this course, as in the other courses in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.

What you will learn

  • Programming Spark using PySpark
  • Identifying the computational tradeoffs in a Spark application
  • Performing data loading and cleaning using Spark and Parquet
  • Modeling data through statistical and machine learning methods

  • 3 September 2019
Course properties:
  • Language: English



More on this topic:
Cloud Computing Applications
Learn how to use the cloud and write programs for data analytics. Learn about...
Data Manipulation at Scale: Systems and Algorithms
Data analysis has replaced data acquisition as the bottleneck to evidence-based...
Introduction to Big Data Analytics
A new, improved version of the Big Data Specialization will become...
Implementing Real-Time Analysis with Hadoop in Azure HDInsight
Learn how to use Hadoop technologies like HBase, Storm, and Spark in Microsoft...
Big Data Analytics with Apache Spark and Python
Learn to use Apache Spark to store and analyze data in real time.
More from 'Mathematics, Statistics and Data Analysis':
Observation Theory: Estimating the Unknown
Learn how to estimate parameters from observational data for real-world engineering...
Introduction to Analytics Modeling
Learn essential analytics models and methods and how to appropriately apply...
Computing for Data Analysis
A hands-on introduction to basic programming principles and practice relevant...
Data Analytics for Business
This course prepares students to understand business analytics and become leaders...
Data Analytics and Visualization in Health Care
Learn best practices in data analytics, informatics, and visualization to gain...
More from 'edX':
Data Structures and Software Design
Learn how to select, apply, and analyze the most appropriate data representations...
知识产权法律及实务|Big Data and Intellectual Property Law and Practice
Understand how to use and protect intellectual property in China, and prepare for global competition in the era of the knowledge economy.
Human-Computer Interaction I: Fundamentals & Design Principles
Learn the principles of Human-Computer Interaction to create intuitive, usable...
Human-Computer Interaction II: Cognition, Context & Culture
Get into the user’s mind and understand the role of mental models and...
Human-Computer Interaction III: Ethics, Needfinding & Prototyping
Build on your knowledge of HCI’s core principles by learning to design...

© 2013-2019