This repository contains Python and SQL-based assignments and projects focused on data analysis and manipulation. It includes Jupyter Notebooks demonstrating various techniques such as data cleaning, visualization, and integration of Python with SQL queries. It is designed as a practical resource for learning and practicing data analysis skills.
HumaArslan/piit
# Jupyter Notebook/Python
- The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
- Jupyter has support for over 40 different programming languages, and Python is one of them.
- Python is a requirement for installing the Jupyter Notebook itself. Older releases supported Python 3.3 or greater, or Python 2.7; current releases require Python 3.
Data Analysis
Load data from different sources
Save data into different formats
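
As a rough illustration of loading and saving, the sketch below reads CSV text and writes it back out as JSON and CSV using only the standard library. The sample data and the variable names (`csv_text`, `rows`, `json_text`, `csv_out`) are made up for this example, and an in-memory buffer stands in for a real file.

```python
import csv
import io
import json

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,age\nAlice,30\nBob,25\n"

# Load: csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values arrive as strings, so numeric columns are converted explicitly.
for row in rows:
    row["age"] = int(row["age"])

# Save as JSON text.
json_text = json.dumps(rows, indent=2)

# Save back to CSV text through an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)
csv_out = buffer.getvalue()
```

With a real file, `open(path, newline="")` would take the place of `io.StringIO`.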
- Understand the data structures:
> Lists:
- Ordered, mutable collections of items.
- Can store heterogeneous data types.
- Created using square brackets [].
Example: my_list = [1, "hello", 3.14]
> Tuples:
- Ordered, immutable collections of items.
- Can store heterogeneous data types.
- Created using parentheses ().
Example: my_tuple = (1, "world", 2.71)
> Sets:
- Unordered, mutable collections of unique items.
- Used for membership testing and eliminating duplicate entries.
- Created using curly braces {} with items inside, or the set() constructor. An empty set needs set(), because {} on its own creates an empty dictionary.
Example: my_set = {1, 2, 3, 2} (will result in {1, 2, 3})
> Dictionaries:
- Mutable collections of key-value pairs. Since Python 3.7 they preserve insertion order.
- Keys must be unique and immutable; values can be of any type.
- Created using curly braces {} with key-value pairs separated by colons.
Example: my_dict = {"name": "Alice", "age": 30}
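
The four structures above can be compared side by side. This is a small sketch that reuses the example values from this section; the helper names (`tuple_is_immutable`, `has_two`, `empty_braces`) exist only for illustration.

```python
# List: ordered and mutable, so items can be appended in place.
my_list = [1, "hello", 3.14]
my_list.append(True)

# Tuple: ordered but immutable, so item assignment raises TypeError.
my_tuple = (1, "world", 2.71)
try:
    my_tuple[0] = 99
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# Set: duplicates are dropped, and membership tests are the typical use.
my_set = {1, 2, 3, 2}
has_two = 2 in my_set

# Dictionary: values can be updated through their keys.
my_dict = {"name": "Alice", "age": 30}
my_dict["age"] = 31

# Empty braces create a dictionary, not a set.
empty_braces = {}
empty_set = set()
```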
Data Cleaning in Python:
> Remove Unwanted Observations: Eliminate duplicates, irrelevant entries or redundant data that add noise.
> Fix Structural Errors: Standardize data formats and variable types for consistency.
> Manage Outliers: Detect and handle extreme values that can skew results, either by removal or transformation.
> Handle Missing Data: Address gaps using imputation, deletion or advanced techniques to maintain accuracy and integrity.
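
The four cleaning steps can be sketched on a small list of records. This is a minimal sketch with assumptions: the data is invented, the steps run in a slightly different order (formats are standardized first so that duplicates become detectable), missing values are imputed with the median, and outliers are flagged with the common 1.5 × IQR rule. Other choices are equally valid.

```python
import statistics

# Hypothetical raw records: inconsistent text, a duplicate, a gap, an extreme value.
raw = [
    {"region": " North ", "sales": "100"},
    {"region": "north", "sales": "100"},
    {"region": "South", "sales": None},
    {"region": "East", "sales": "110"},
    {"region": "West", "sales": "90"},
    {"region": "Central", "sales": "105"},
    {"region": "Coastal", "sales": "5000"},
]

# Fix structural errors: trim and title-case text, convert sales to float.
standardized = [
    {
        "region": r["region"].strip().title(),
        "sales": float(r["sales"]) if r["sales"] is not None else None,
    }
    for r in raw
]

# Remove unwanted observations: keep the first copy of each (region, sales) pair.
seen = set()
deduped = []
for r in standardized:
    key = (r["region"], r["sales"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Handle missing data: impute gaps with the median of the known values.
known = [r["sales"] for r in deduped if r["sales"] is not None]
median_sales = statistics.median(known)
for r in deduped:
    if r["sales"] is None:
        r["sales"] = median_sales

# Manage outliers: drop values outside 1.5 * IQR of the quartiles.
q1, _, q3 = statistics.quantiles([r["sales"] for r in deduped], n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [r for r in deduped if lower <= r["sales"] <= upper]
```

Dropping an outlier is only one option. Capping the value or applying a transformation keeps the row in the data.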
- sort
- filter - by conditions; search for a string
- slicing - select rows and columns by condition
- merge data - by row, by column
- join
- visualise - bar, pie, presentation, storytelling
- outliers
- groupby summaries, pivot
- rotate data - long, wide
- statistics - mean, median, mode, std, skewness, kurtosis, correlation, covariance
- data distribution - normal
- sampling
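
Several of the operations listed above (sort, filter, slicing, groupby, pivot, statistics, sampling) can be sketched with the standard library alone. The records and all names below are hypothetical and only illustrate the idea. The notebooks in this repository may use other tools for the same tasks.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical records in "long" format, one row per region and product.
records = [
    {"region": "North", "product": "pen", "units": 12},
    {"region": "South", "product": "pen", "units": 7},
    {"region": "North", "product": "ink", "units": 5},
    {"region": "South", "product": "ink", "units": 9},
    {"region": "North", "product": "pad", "units": 7},
]

# Sort: descending by units.
by_units = sorted(records, key=lambda r: r["units"], reverse=True)

# Filter: a numeric condition and a string search.
large = [r for r in records if r["units"] >= 9]
pens = [r for r in records if "pen" in r["product"]]

# Slicing: the first two rows, keeping only selected columns.
head = [{"region": r["region"], "units": r["units"]} for r in records[:2]]

# Groupby summary: total units per region.
totals = defaultdict(int)
for r in records:
    totals[r["region"]] += r["units"]

# Pivot (long to wide): one entry per region, one key per product.
wide = defaultdict(dict)
for r in records:
    wide[r["region"]][r["product"]] = r["units"]

# Statistics on a single column.
units = [r["units"] for r in records]
summary = {
    "mean": statistics.mean(units),
    "median": statistics.median(units),
    "mode": statistics.mode(units),
    "std": statistics.stdev(units),  # sample standard deviation
}

# Sampling: two rows without replacement; the seed makes the draw repeatable.
rng = random.Random(0)
sample = rng.sample(records, k=2)
```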