📌 Course Information | 🏛 UC3M
README Version in Spanish/Castilian (versión en español/castellano)
This repository contains my work for the Web Analytics course at Universidad Carlos III de Madrid (UC3M), which I completed while studying abroad from September 2024 to December 2024 as part of my Computer Science degree. The course focused on data retrieval, processing, and analysis using APIs, web scraping, and data visualization techniques. At UC3M, the course is part of the Data Science and Engineering degree program.
Throughout the course, I primarily worked in a group of three students, collaborating on projects and labs that explore real-world datasets, automation, and analytical techniques.
This repository includes 7 major labs and projects, each applying key data science and web analytics concepts:
✔ Lab Goal: Learn fundamental web scraping concepts and ethical data extraction.
✔ Key Topics:
- Understanding HTML structure and CSS selectors.
- Using Requests and BeautifulSoup for extracting text and structured data.
- Parsing tables and lists into structured formats like CSV and JSON.
- Respecting `robots.txt` and ethical web scraping guidelines.
📌 View Lab Notebook ➝ Introduction_to_Web_Scraping_Lab.ipynb
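As a rough illustration of the `robots.txt` point, the sketch below checks whether a URL may be fetched before scraping it. It uses only the standard library's `urllib.robotparser`; the rules, the `my-course-bot` user agent, and the `example.com` URLs are hypothetical placeholders, not taken from the lab notebook.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed bot name and URLs, for illustration only.
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-course-bot", "https://example.com/articles/1"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-course-bot", "https://example.com/private/data"))
```

Against a live site, the same parser can load the rules itself via `set_url()` and `read()` instead of an inlined string.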
✔ Lab Goal: Extract and parse structured data from websites.
✔ Key Topics:
- Using `requests` and `BeautifulSoup` for web scraping.
- Navigating HTML DOM structures to extract information.
- Implementing data cleaning techniques for web data.
- Ethical considerations of web scraping.
📌 View Lab Notebook ➝ Beautiful_Soup_Lab.ipynb
✔ Lab Goal: Automate browser interactions and scrape dynamic content from JavaScript-heavy websites.
✔ Key Topics:
- Selenium WebDriver for browser automation.
- Interacting with JavaScript-rendered elements and AJAX-loaded data.
- Handling cookies, login authentication, and form submissions.
- Extracting live data from job postings, e-commerce sites, and dynamic tables.
📌 View Lab Notebook ➝ Selenium_Lab.ipynb
✔ Lab Goal: Retrieve and analyze real-world economic indicators from the World Bank API.
✔ Key Topics:
- API-based data extraction using `requests`.
- Fetching and processing JSON responses.
- Analyzing global economic trends (e.g., GDP, CO₂ emissions, income distribution).
- Ranking countries by CO₂ emissions, GDP growth, and population statistics.
📌 View Lab Notebook ➝ Worldbank_API_Lab.ipynb
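The JSON processing and ranking steps can be sketched roughly as follows. World Bank API responses arrive as a two-element list, paging metadata followed by the records. The payload here is a hard-coded stand-in with made-up country names and values so the example runs without network access, and `rank_countries` is a hypothetical helper name, not a function from the notebook.

```python
import json

# Illustrative payload only: placeholder countries and made-up values,
# arranged in the [metadata, records] layout of a World Bank API response.
SAMPLE_RESPONSE = json.dumps([
    {"page": 1, "pages": 1, "per_page": 50, "total": 3},
    [
        {"country": {"id": "AA", "value": "Country A"}, "date": "2020", "value": 4.2},
        {"country": {"id": "BB", "value": "Country B"}, "date": "2020", "value": None},
        {"country": {"id": "CC", "value": "Country C"}, "date": "2020", "value": 9.1},
    ],
])


def rank_countries(raw_json: str, top_n: int = 10) -> list:
    """Return (country, value) pairs sorted from highest to lowest value."""
    _metadata, records = json.loads(raw_json)
    # Records with no reported value are skipped rather than treated as zero.
    rows = [
        (record["country"]["value"], record["value"])
        for record in records
        if record["value"] is not None
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top_n]


print(rank_countries(SAMPLE_RESPONSE))
```

With a live request, the text of the `requests` response would take the place of `SAMPLE_RESPONSE`.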
✔ Lab Goal: Apply graph theory concepts in network analysis.
✔ Key Topics:
- Graph structures: Nodes, edges, adjacency matrices.
- Shortest path algorithms: Dijkstra's and A* search.
- Network centrality: Degree, betweenness, closeness.
- Graph-based analytics for social networks and web applications.
📌 View Lab Notebook ➝ Graph_Theory_Lab.ipynb
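For the shortest-path topic, here is a minimal sketch of Dijkstra's algorithm on a small weighted graph stored as nested dictionaries. It is written in plain Python with `heapq` to stay dependency-free and is not the lab's own implementation; the graph and its weights are invented for illustration.

```python
import heapq


def dijkstra(graph: dict, source: str) -> dict:
    """Return the shortest distance from source to every reachable node.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    distances = {source: 0}
    queue = [(0, source)]  # min-heap of (distance so far, node)
    while queue:
        dist, node = heapq.heappop(queue)
        # Skip stale queue entries that were superseded by a shorter path.
        if dist > distances.get(node, float("inf")):
            continue
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Hypothetical example graph (directed, weighted).
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}

print(dijkstra(GRAPH, "A"))
```

In this graph the route A → B → C → D (total weight 4) beats the direct-looking A → C → D (total weight 5), which is the kind of case a greedy nearest-neighbor walk would miss.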
✔ Lab Goal: Explore techniques for data visualization to effectively communicate insights.
✔ Key Topics:
- Creating interactive and static visualizations.
- Using Matplotlib, Seaborn, and Plotly for advanced plotting.
- Geospatial visualization techniques.
- Applying best practices for data presentation.
📌 View Lab Notebook ➝ Data_Visualization_Lab.ipynb
✔ Project Goal: Develop a job recommendation system using TF-IDF and Cosine Similarity to match users with job listings.
✔ Key Topics:
- Adzuna API: Extracting job listings dynamically.
- TF-IDF and Cosine Similarity: Ranking job relevance.
- Data preprocessing: Handling missing values, text tokenization.
- Historical salary trends analysis for job categories.
- Data visualization of job trends and market demand.
- Collaboration: This project was completed as a group assignment and required extensive teamwork.
Interactive and Dynamic Job Map Visualizations:
📌 View Project Notebook ➝ Web_Analytics_Final_Project.ipynb
📌 View Project PowerPoint Presentation ➝ Web Analytics Final Project PowerPoint Presentation.pptx
📌 View Project PDF Presentation ➝ Web Analytics Final Project Presentation.pdf
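The TF-IDF and cosine similarity ranking idea can be sketched roughly as below. This is a standard-library illustration under stated assumptions, not the project's code: the whitespace tokenizer, the smoothed IDF formula, the helper names, and the sample profile and job texts are all chosen here for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents: list) -> list:
    """Return one {term: weight} dict per document using TF x smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(profile: str, jobs: list) -> list:
    """Return (score, job) pairs sorted from most to least similar."""
    vectors = tfidf_vectors([profile] + jobs)
    profile_vec, job_vecs = vectors[0], vectors[1:]
    scored = [(cosine_similarity(profile_vec, vec), job)
              for vec, job in zip(job_vecs, jobs)]
    return sorted(scored, reverse=True)


# Hypothetical user profile and job listings.
PROFILE = "python data analysis"
JOBS = [
    "python data engineer",
    "graphic designer",
    "data analysis with python and sql",
]

for score, job in rank_jobs(PROFILE, JOBS):
    print(f"{score:.3f}  {job}")
```

A listing that shares no terms with the profile scores exactly zero and falls to the bottom of the ranking, while listings with overlapping vocabulary are ordered by how much weighted overlap they have.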
- Programming Language: Python
- Libraries & Tools:
  - `requests`, `BeautifulSoup` - Web scraping
  - `Selenium WebDriver` - Browser automation
  - `matplotlib`, `seaborn`, `plotly` - Data visualization
  - `scikit-learn`, `nltk`, `pandas`, `numpy` - Data analysis & ML
  - `networkx` - Graph Theory
  - `TF-IDF` & `Cosine Similarity` - Job recommendation system
- Open Jupyter Notebook or Google Colab to explore the .ipynb files!
- Some input data files are not included in this repository. If you need access to these files or would like a working demonstration of the code, please contact me through my personal website at Marcos-Sanson.github.io

