CANCELLED: SparkSQL and DataFrames with PySpark

April 17 @ 2:00 pm - 4:30 pm

Modern Languages Building (MLB), Room 2001A

THIS WORKSHOP HAS BEEN CANCELLED

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. Industry has quickly adopted Spark and deployed it at scale for processing big data. Its main advantages include in-memory processing and a rich set of operations for wrangling data using DataFrames. In this workshop, we’ll introduce attendees to SparkSQL and DataFrames for basic data manipulation, file I/O, and SQL querying. Spark has language bindings for R, Python, Scala, and Java. We’ll be using PySpark (the Python API) in our workshop. The workshop is intended for users with intermediate knowledge of R, Python, or a comparable language. Attendees should be familiar with DataFrames in Python (pandas) or R (dplyr).

Attendees will need a Flux account beforehand to participate. You can apply for one at https://goo.gl/thTZHx

Details

Date:
April 17
Time:
2:00 pm - 4:30 pm

Organizer

CSCAR
Email:
cscar@umich.edu
Website:
cscar.research.umich.edu

Other

Class size
27
Presenter(s)
Alex Cao, CSCAR