MSBA 6330 Big Data Analytics (Fall 2022)

This repository serves as the public homepage for MSBA 6330 - Big Data Analytics at the Carlson School, University of Minnesota. It also hosts the syllabus and FAQs.

Syllabus

Syllabus

Homework Related Instructions & Questions

Computing environment

How to update the Vagrant folder

Install Cloudera VM on your own computer
Install Spark Updates on Cloudera VM: if you run into issues, here are suggestions for diagnosis

Hadoop

Hive

How to Fix Hive metastore db error

Databricks & Spark

Install Apache Spark on your own computer: Install an Apache Spark virtual machine with many features on your own computer (but without Hadoop or Hive). It may take a while (e.g., 1 hour) to have everything ready.
Use Databricks Community Edition for Spark: Databricks provides a single-node Spark cluster for free. It is quite easy to get started with its Jupyter-like notebook environment.
Mount S3 folder in Databricks

Common Issues with Running PySpark: Addresses a few common issues with running PySpark on Cloudera VM.

