Anoushka Karra Anoushka210

Hey, I'm Anoushka 👋

BE Information Technology · 2nd Year · Passionate about Data

About Me

I'm a second-year IT student who loves turning raw data into meaningful stories. I'm currently diving deep into data analysis, visualisation, and ML to make numbers make sense.

🎓 BE in Information Technology (Expected 2028)
📊 Focus: Data Analysis • Data Engineering • AI/ML
🌱 Learning: PySpark, NLP, Data Pipelines
🔍 Strong interest in real-world datasets & scalable systems

🏆 Key Highlights

🚚 Built ETL pipelines processing 100K+ records using PySpark
📊 Analyzed 20K+ global data points to uncover environmental trends
🤖 Developed a chatbot that answered 10/10 test queries correctly
⚙️ Strong foundation in data structures, OOP, and system design

📌 Data Focus

Data Cleaning & Preprocessing
Feature Engineering & Transformation
Exploratory Data Analysis (EDA)
Scalable Data Pipelines
Visualization & Insight Generation

🛠 Tech Stack

Languages

Data & ML

Tools

🚀 Featured Projects

🛒 Scalable E-Commerce ETL Pipeline

End-to-end PySpark-based ETL pipeline built on the Olist dataset.

Processed 100K+ orders across multiple tables
Designed modular pipeline for scalability
Implemented feature engineering for delivery performance
Stored optimized output in Parquet format

Tech: PySpark • ETL • Parquet • Big Data
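
To give a feel for the delivery-performance feature engineering step, here is a minimal sketch in plain Python rather than PySpark, so it runs without a Spark session. The column names follow the public Olist orders table, but the sample rows, the `add_delivery_features` helper, and the `delay_days` / `is_late` feature names are assumptions for illustration, not the pipeline's actual code.

```python
from datetime import datetime

# Hypothetical sample rows. Column names mirror the public Olist orders
# table; the values are made up for illustration.
orders = [
    {
        "order_id": "a1",
        "order_delivered_customer_date": "2018-01-10 14:00:00",
        "order_estimated_delivery_date": "2018-01-12 00:00:00",
    },
    {
        "order_id": "b2",
        "order_delivered_customer_date": "2018-02-05 09:30:00",
        "order_estimated_delivery_date": "2018-02-01 00:00:00",
    },
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_delivery_features(order):
    """Return a copy of the order with two derived delivery features."""
    delivered = datetime.strptime(
        order["order_delivered_customer_date"], TIMESTAMP_FORMAT
    )
    estimated = datetime.strptime(
        order["order_estimated_delivery_date"], TIMESTAMP_FORMAT
    )
    # Positive values mean the order arrived after the estimated date.
    delay_days = (delivered.date() - estimated.date()).days
    return {**order, "delay_days": delay_days, "is_late": delay_days > 0}


enriched = [add_delivery_features(order) for order in orders]
for row in enriched:
    print(row["order_id"], row["delay_days"], row["is_late"])
```

In PySpark the same idea would usually be expressed as column expressions (for example with `datediff`) so the work is distributed across the cluster instead of looping in Python.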

🤖 Smart FAQ Chatbot AI Agent

A retrieval-based chatbot using NLP techniques.

Implemented TF-IDF + Cosine Similarity
Built hybrid system for FAQs + small talk
Applied confidence threshold filtering
Visualized performance metrics

Tech: Python • NLP • scikit-learn • Matplotlib
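
The retrieval idea (TF-IDF vectors, cosine similarity, and a confidence threshold) can be sketched as follows. This is a standard-library stand-in, not the project's scikit-learn code: the FAQ entries, the `answer` function, the smoothed IDF formula, and the `0.3` threshold are all assumptions chosen for illustration, and the small-talk branch is left out.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ entries, made up for this sketch.
faqs = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "what are your opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "how can i track my order": "Open 'My orders' and select the order to see tracking.",
}
FALLBACK = "Sorry, I don't have an answer for that."


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


questions = list(faqs)
docs = [tokenize(q) for q in questions]
n_docs = len(docs)
# Number of FAQ questions each term appears in.
doc_freq = Counter(term for doc in docs for term in set(doc))


def tfidf(tokens):
    """Sparse TF-IDF vector; terms outside the FAQ vocabulary are ignored."""
    counts = Counter(t for t in tokens if t in doc_freq)
    return {
        t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
        for t, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_vectors = [tfidf(doc) for doc in docs]


def answer(query, threshold=0.3):
    """Return the best-matching FAQ answer, or a fallback below the threshold."""
    query_vector = tfidf(tokenize(query))
    scores = [cosine(query_vector, dv) for dv in doc_vectors]
    best = max(range(n_docs), key=scores.__getitem__)
    if scores[best] < threshold:
        return FALLBACK
    return faqs[questions[best]]


print(answer("How do I reset my password?"))
print(answer("Tell me a joke"))
```

The threshold is what keeps the bot from confidently answering unrelated questions: a query that shares no vocabulary with any FAQ scores zero and falls through to the fallback reply.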

📊 Air Quality Data Analysis

EDA project analyzing pollution trends across global cities.

Worked on 20K+ records across 50+ cities
Identified seasonal PM2.5 trends
Created insightful visualizations

Tech: Pandas • Matplotlib
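
As a rough illustration of how seasonal PM2.5 trends can be surfaced, the sketch below averages readings per city and season using only the standard library. The readings are made-up placeholder values, and the month-to-season mapping (northern-hemisphere meteorological seasons) and the `seasonal_means` helper are assumptions, not details taken from the project.

```python
from collections import defaultdict
from statistics import mean

# Made-up placeholder readings: (city, ISO date, PM2.5 in µg/m³).
readings = [
    ("City A", "2023-01-15", 80.0),
    ("City A", "2023-07-15", 40.0),
    ("City A", "2023-12-15", 100.0),
    ("City B", "2023-01-15", 14.0),
    ("City B", "2023-07-15", 8.0),
]

# Assumed mapping: northern-hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(rows):
    """Average PM2.5 for every (city, season) pair present in the rows."""
    buckets = defaultdict(list)
    for city, date, pm25 in rows:
        month = int(date[5:7])  # ISO dates look like YYYY-MM-DD
        buckets[(city, SEASON_BY_MONTH[month])].append(pm25)
    return {key: mean(values) for key, values in buckets.items()}


result = seasonal_means(readings)
for (city, season), value in sorted(result.items()):
    print(f"{city:8s} {season:7s} {value:6.1f}")
```

With pandas, the same aggregation would typically be a `groupby` on city and season followed by `mean`, which scales more comfortably to 20K+ records.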

📦 Smart Inventory Management System

Java-based system using OOP principles.

Applied inheritance & polymorphism
Implemented file-based persistence
Automated stock tracking & reporting

Tech: Java • OOP • Serialization

⚡ Currently Working On

⚙️ Advanced data pipelines using PySpark
🧠 NLP-based intelligent systems
🏗️ Data engineering concepts (warehousing, ETL optimization)
📈 Improving data visualization & storytelling

📊 GitHub Insights

📈 Contribution Activity

⭐ If you like my work, consider giving a star to my repositories!
