Anoushka210/README.md

Hey, I'm Anoushka 👋

BE Information Technology · 2nd Year · Passionate about Data

LinkedIn GitHub Resume


About Me

I'm a second-year IT student who loves turning raw data into meaningful stories. Currently diving deep into data analysis, visualisation & ML: making numbers make sense.

  • 🎓 BE in Information Technology (Expected 2028)
  • 📊 Focus: Data Analysis • Data Engineering • AI/ML
  • 🌱 Learning: PySpark, NLP, Data Pipelines
  • 🔍 Strong interest in real-world datasets & scalable systems

🏆 Key Highlights

  • 🚚 Built ETL pipelines processing 100K+ records using PySpark
  • 📊 Analyzed 20K+ global data points to uncover environmental trends
  • 🤖 Developed chatbot with 100% query accuracy (10/10 test cases)
  • ⚙️ Strong foundation in data structures, OOP, and system design

📌 Data Focus

  • Data Cleaning & Preprocessing
  • Feature Engineering & Transformation
  • Exploratory Data Analysis (EDA)
  • Scalable Data Pipelines
  • Visualization & Insight Generation

🛠 Tech Stack

Languages

Python Java C++ JavaScript

Data & ML

Pandas Matplotlib Seaborn PySpark scikit-learn Jupyter

Tools

Git VS Code


🚀 Featured Projects

🛒 Scalable E-Commerce ETL Pipeline

End-to-end PySpark-based ETL pipeline built on the Olist dataset.

  • Processed 100K+ orders across multiple tables
  • Designed modular pipeline for scalability
  • Implemented feature engineering for delivery performance
  • Stored optimized output in Parquet format

Tech: PySpark • ETL • Parquet • Big Data
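
As a rough illustration of the delivery-performance feature engineering mentioned above, here is a minimal sketch in plain Python (not PySpark, so it runs without a Spark installation). The sample rows are invented, the column names are assumed to match the public Olist orders table, and the feature names `delivery_days`, `delay_days`, and `is_late` are hypothetical. The actual pipeline may compute different features.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical sample rows, invented for illustration. Column names are
# assumed to follow the public Olist orders table.
SAMPLE_ORDERS = [
    {"order_id": "a1",
     "order_purchase_timestamp": "2018-01-02 10:00:00",
     "order_delivered_customer_date": "2018-01-10 15:30:00",
     "order_estimated_delivery_date": "2018-01-15 00:00:00"},
    {"order_id": "b2",
     "order_purchase_timestamp": "2018-02-01 08:00:00",
     "order_delivered_customer_date": "2018-02-20 09:00:00",
     "order_estimated_delivery_date": "2018-02-14 00:00:00"},
    {"order_id": "c3",  # not delivered yet: delivery date is empty
     "order_purchase_timestamp": "2018-03-05 12:00:00",
     "order_delivered_customer_date": "",
     "order_estimated_delivery_date": "2018-03-20 00:00:00"},
]


def parse_timestamp(value):
    """Return a datetime, or None when the field is empty."""
    return datetime.strptime(value, TIMESTAMP_FORMAT) if value else None


def add_delivery_features(order):
    """Return a copy of the order with delivery-performance features added."""
    purchased = parse_timestamp(order["order_purchase_timestamp"])
    delivered = parse_timestamp(order["order_delivered_customer_date"])
    estimated = parse_timestamp(order["order_estimated_delivery_date"])
    if purchased is None or delivered is None or estimated is None:
        # Leave features empty rather than guessing for incomplete orders.
        return {**order, "delivery_days": None, "delay_days": None, "is_late": None}
    # Whole days between purchase and delivery.
    delivery_days = (delivered - purchased).days
    # Calendar days past the estimate; negative means early.
    delay_days = (delivered.date() - estimated.date()).days
    return {**order, "delivery_days": delivery_days,
            "delay_days": delay_days, "is_late": delay_days > 0}


enriched = [add_delivery_features(order) for order in SAMPLE_ORDERS]
for row in enriched:
    print(row["order_id"], row["delivery_days"], row["delay_days"], row["is_late"])
```

In PySpark the same idea would typically be expressed as column expressions (for example with `datediff`) rather than a per-row function, so the work stays distributed.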


🤖 Smart FAQ Chatbot AI Agent

A retrieval-based chatbot using NLP techniques.

  • Implemented TF-IDF + Cosine Similarity
  • Built hybrid system for FAQs + small talk
  • Applied confidence threshold filtering
  • Visualized performance metrics

Tech: Python • NLP • scikit-learn • Matplotlib
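
To show how TF-IDF matching with a confidence threshold fits together, here is a small dependency-free sketch. It is a standard-library stand-in for the scikit-learn version used in the project, not the project's code: the FAQ entries, the `answer` function, the fallback message, and the `0.3` threshold are all hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ entries, invented for illustration.
FAQS = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "What are your support hours?": "Support is available 9am to 5pm on weekdays.",
    "How can I track my order?": "Open 'My Orders' and select the order to see tracking.",
}
FALLBACK = "Sorry, I am not sure about that one."


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(query, threshold=0.3):
    """Return (reply, score); fall back when the best match is below threshold."""
    questions = list(FAQS)
    docs = [tokenize(q) for q in questions]
    idf = build_idf(docs)
    query_vec = vectorize(tokenize(query), idf)
    scored = [(cosine(query_vec, vectorize(doc, idf)), q)
              for doc, q in zip(docs, questions)]
    best_score, best_question = max(scored)
    if best_score < threshold:
        return FALLBACK, best_score
    return FAQS[best_question], best_score


print(answer("how to reset password"))
print(answer("tell me a joke"))
```

The threshold is what keeps the bot from confidently answering unrelated questions: a query that shares no vocabulary with any FAQ scores 0.0 and gets the fallback reply.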


📊 Air Quality Data Analysis

EDA project analyzing pollution trends across global cities.

  • Worked on 20K+ records across 50+ cities
  • Identified seasonal PM2.5 trends
  • Created insightful visualizations

Tech: Pandas • Matplotlib
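
One simple way to look for seasonal PM2.5 patterns is to average readings per city and season. The sketch below uses only the standard library instead of Pandas, and everything in it is an assumption for illustration: the readings are made up (they are not results from the project), the city names are placeholders, and the month-to-season mapping follows Northern Hemisphere meteorological seasons.

```python
from collections import defaultdict
from statistics import mean

# Made-up readings for illustration only: (city, ISO date, PM2.5 in µg/m³).
READINGS = [
    ("City A", "2023-01-15", 180.0),
    ("City A", "2023-07-15", 60.0),
    ("City A", "2023-12-10", 200.0),
    ("City B", "2023-01-20", 22.0),
    ("City B", "2023-07-20", 10.0),
]

# Assumption: Northern Hemisphere meteorological seasons (Dec-Feb = winter).
SEASONS = ["winter", "spring", "summer", "autumn"]


def season_of(month):
    """Map a month number (1-12) to a season name."""
    return SEASONS[(month % 12) // 3]


def seasonal_means(readings):
    """Average PM2.5 for each (city, season) pair."""
    buckets = defaultdict(list)
    for city, date, pm25 in readings:
        month = int(date[5:7])  # ISO dates look like YYYY-MM-DD
        buckets[(city, season_of(month))].append(pm25)
    return {key: mean(values) for key, values in buckets.items()}


result = seasonal_means(READINGS)
for (city, season), value in sorted(result.items()):
    print(city, season, value)
```

With Pandas, the equivalent step would usually be a `groupby` on city and a derived season column followed by `mean()`.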


📦 Smart Inventory Management System

Java-based system using OOP principles.

  • Applied inheritance & polymorphism
  • Implemented file-based persistence
  • Automated stock tracking & reporting

Tech: Java • OOP • Serialization


⚑ Currently Working On

  • βš™οΈ Advanced data pipelines using PySpark
  • 🧠 NLP-based intelligent systems
  • πŸ—οΈ Data engineering concepts (warehousing, ETL optimization)
  • πŸ“ˆ Improving data visualization & storytelling

πŸ“Š GitHub Insights

GitHub Streak


📈 Contribution Activity

Activity Graph


⭐ If you like my work, consider giving a star to my repositories!

Pinned

  1. inventory-management-java Public

    A simple Java-based smart inventory management system demonstrating core object-oriented programming concepts and basic stock tracking logic.

    Java

  2. air-quality-analysis Public

    An Exploratory Data Analysis and Visualization (EDAV) project examining global air pollution patterns. Features rigorous data cleaning, advanced 3D visualizations, K-Means clustering for city profi…

    Jupyter Notebook

  3. Smart-FAQ-Chatbot-AI-Agent Public

    An intelligent FAQ chatbot agent built with Python using TF-IDF Vectorization and Cosine Similarity for natural language query matching. Features include small-talk handling, confidence scoring, an…

    Python

  4. pyspark-ecommerce-etl-pipeline Public

    Production-style PySpark ETL pipeline processing 100K+ e-commerce records with optimized joins, feature engineering, and scalable Parquet outputs.

    Jupyter Notebook

  5. Srividhyambika/Question-paper-analyzer Public

    JavaScript