Fixed issues and Implementations #2

Open

Bamdad-Mehrvarzan wants to merge 1 commit into PRAISELab-PicusLab:main from Bamdad-Mehrvarzan:main


@Bamdad-Mehrvarzan

OpenAlex ETL Pipeline Implementation & Dashboard Compatibility Patches

1. Architectural Overview (Object-Oriented Design)

We implemented an ETL pipeline for the OpenAlex API that maps OpenAlex records onto the legacy Web of Science (WoS) target schema. The design is object-oriented and built around three components:

  • Dispatcher Pattern: convert2df_api orchestrates the pipeline lifecycle and keeps the extraction, transformation, and validation layers decoupled.
  • Mapping Dictionaries: Embedded in the OpenAlexTransformer, they convert nested JSON responses (e.g., authorships, institutions, and the abstract inverted index) into flat, normalized WoS columns.
  • Type Contracts: A dedicated safe_get helper performs defensive type checking, shielding the analytical engine from missing fields and from changes in the API payload structure.
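The PR does not include the transformer code, so the following is only a minimal sketch of how a safe_get helper and a mapping dictionary could flatten one OpenAlex work into WoS-style columns. FIELD_MAP, rebuild_abstract, and to_wos_row are hypothetical names. The OpenAlex field names (title, publication_year, doi, cited_by_count, primary_location, authorships, abstract_inverted_index) come from the public Work schema, while the WoS tag choices and the semicolon separator are assumptions; C1 is simplified to bare institution names.

```python
from typing import Any

# Assumed mapping from WoS field tags to key paths in an OpenAlex work record.
# The real dictionary inside OpenAlexTransformer is not shown in the PR.
FIELD_MAP: dict[str, tuple[str, ...]] = {
    "TI": ("title",),
    "PY": ("publication_year",),
    "DI": ("doi",),
    "TC": ("cited_by_count",),
    "SO": ("primary_location", "source", "display_name"),
}


def safe_get(record: Any, path: tuple[str, ...], default: Any = None) -> Any:
    """Follow `path` through nested dicts; return `default` at the first
    step that is missing, None, or not a dict, instead of raising."""
    current = record
    for key in path:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current


def rebuild_abstract(inverted: Any) -> str:
    """Rebuild plain text from abstract_inverted_index (word -> positions)."""
    if not isinstance(inverted, dict):
        return ""
    by_position: dict[int, str] = {}
    for word, positions in inverted.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


def to_wos_row(work: dict) -> dict:
    """Flatten one OpenAlex work into a WoS-style row (hypothetical helper)."""
    row = {tag: safe_get(work, path) for tag, path in FIELD_MAP.items()}

    authorships = safe_get(work, ("authorships",), default=[])
    if not isinstance(authorships, list):
        authorships = []

    # AU: author display names, semicolon-joined (separator is an assumption).
    names = [safe_get(a, ("author", "display_name")) for a in authorships]
    row["AU"] = ";".join(n for n in names if n)

    # C1: simplified to institution display names across all authorships.
    insts = [
        safe_get(inst, ("display_name",))
        for a in authorships
        for inst in safe_get(a, ("institutions",), default=[])
    ]
    row["C1"] = ";".join(n for n in insts if n)

    row["AB"] = rebuild_abstract(safe_get(work, ("abstract_inverted_index",)))
    return row
```

Because safe_get treats a None or non-dict value at any step as missing, a record whose primary_location is null yields SO = None instead of raising a TypeError.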

2. Upstream Bug Fixes & Analytical Patches

To meet the Base Level requirements (Section 3), we reverse-engineered and fixed two critical bugs in the legacy analytical dashboard that crashed the interface:

  1. Global UI DataGrid Table Crash (app.py): Fixed a hardcoded inline CSS typo, style="width=100%;" instead of style="width:100%;". The malformed declaration caused a ValueError during an internal unpacking step in the itables library.
  2. Database Identifier Integration (app.py & cocmatrix.py): Relaxed the strict, case-sensitive database checks. The dashboard loader overwrites the DB flag with "ISI" in memory, so the strict if db == "Web_of_Science" condition rejected the dataset, which then failed with a 'NoneType' object has no attribute 'columns' error. The conditional mapping now accepts both the "WEB_OF_SCIENCE" and "ISI" tokens.
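The itables internals are not reproduced here. As a hedged illustration of the failure mode described in item 1, the hypothetical parse_inline_style below splits each declaration on ':' and unpacks exactly two parts, so a declaration written with '=' has nothing to unpack:

```python
def parse_inline_style(style: str) -> dict[str, str]:
    """Split an inline CSS string into {property: value}.

    Illustrative only (not itables' code): each declaration is unpacked into
    exactly two parts, so a declaration with no ':' raises ValueError.
    """
    declarations: dict[str, str] = {}
    for decl in style.split(";"):
        decl = decl.strip()
        if not decl:
            continue  # skip the empty piece after a trailing ';'
        prop, value = decl.split(":", 1)  # 'width=100%' yields only one part
        declarations[prop.strip()] = value.strip()
    return declarations


print(parse_inline_style("width:100%;"))  # {'width': '100%'}
try:
    parse_inline_style("width=100%;")
except ValueError as exc:
    print("ValueError:", exc)  # ValueError: not enough values to unpack (expected 2, got 1)
```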
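A minimal sketch of the identifier handling in item 2, assuming the fix amounts to normalizing the token before comparing it. DB_ALIASES, canonical_db, and is_web_of_science are hypothetical names, not the code in app.py or cocmatrix.py:

```python
from typing import Optional

# Assumed alias table: both tokens named in the PR resolve to one canonical name.
DB_ALIASES = {"ISI": "WEB_OF_SCIENCE", "WEB_OF_SCIENCE": "WEB_OF_SCIENCE"}


def canonical_db(db: Optional[str]) -> str:
    """Normalize case and surrounding whitespace, then resolve known aliases.

    Unknown tokens pass through upper-cased; a missing identifier raises
    immediately instead of letting a None flow further downstream.
    """
    if db is None:
        raise ValueError("database identifier is missing")
    token = db.strip().upper()
    return DB_ALIASES.get(token, token)


def is_web_of_science(db: Optional[str]) -> bool:
    """True for "ISI", "WEB_OF_SCIENCE", and case variants like "Web_of_Science"."""
    return canonical_db(db) == "WEB_OF_SCIENCE"
```

Any branch that relies on this check should also raise on an unrecognized database rather than fall through and implicitly return None, a common way for a 'NoneType' object has no attribute 'columns' error to reach the UI.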

3. Verification & Validation Evidence

  • Pipeline Execution: Collected and normalized 50 records from the OpenAlex API for the query term "machine learning".
  • UI Integrity: Verified that the primary metadata tables, productive-author charts, and concept clouds render without runtime exceptions.
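The PR does not show how the 50 records were requested. Assuming the public OpenAlex /works endpoint with its search and per-page parameters, a request URL could be built as below; build_works_url is a hypothetical helper, and the optional mailto parameter opts into OpenAlex's polite pool. It performs no network I/O, so the URL can be checked offline.

```python
from typing import Optional
from urllib.parse import quote, urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(query: str, per_page: int = 50, mailto: Optional[str] = None) -> str:
    """Build a /works search URL (hypothetical helper; no request is sent)."""
    # OpenAlex caps per-page at 200 results per request.
    if not 1 <= per_page <= 200:
        raise ValueError("per_page must be between 1 and 200")
    params = {"search": query, "per-page": per_page}
    if mailto:
        params["mailto"] = mailto  # identifies the caller for the polite pool
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{OPENALEX_WORKS}?{urlencode(params, quote_via=quote)}"


print(build_works_url("machine learning"))
# https://api.openalex.org/works?search=machine%20learning&per-page=50
```

Passing the URL to urllib.request.urlopen and decoding the JSON body would give a results list of matching works, which a transformer like the one sketched for Section 1 could then flatten.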
