marichalloran/IRS990

Recommended viewing format: *.ipynb. Additional formats are provided as *.py and *.PDF.

This exercise has been divided into 3 files:

IRS990_Tech_Exercise_DATA_ACCESS (.ipynb/.py) contains parts 1. Data Access and 2. Feature Extraction. In this file, I access the AWS-hosted dataset of Form 990 filings, extract selected features as outlined in the notebook annotations, and save the output as a CSV file.
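As a rough illustration of the feature-extraction step, the sketch below parses one filing's XML and writes the chosen fields as CSV. It is not the notebook's code: the sample XML, the `FIELDS` mapping, the element names (`CYTotalRevenueAmt`, `CYTotalExpensesAmt`), and the helper names `extract_features` and `to_csv` are assumptions for illustration. Real filings use element names that vary by schema version, and the sketch uses an inline sample rather than downloading from AWS.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by IRS e-file XML documents.
NS = {"irs": "http://www.irs.gov/efile"}

# Hypothetical, heavily trimmed filing used only for this example.
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <Filer><EIN>000000000</EIN></Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>150000</CYTotalRevenueAmt>
      <CYTotalExpensesAmt>140000</CYTotalExpensesAmt>
    </IRS990>
  </ReturnData>
</Return>"""

# Output column -> XPath (assumed element names; adjust to the schema in use).
FIELDS = {
    "ein": ".//irs:Filer/irs:EIN",
    "total_revenue": ".//irs:IRS990/irs:CYTotalRevenueAmt",
    "total_expenses": ".//irs:IRS990/irs:CYTotalExpensesAmt",
}


def extract_features(xml_text):
    """Return one dict of selected fields; missing elements become None."""
    root = ET.fromstring(xml_text)
    row = {}
    for name, path in FIELDS.items():
        node = root.find(path, NS)
        row[name] = node.text if node is not None else None
    return row


def to_csv(rows):
    """Serialize extracted rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(FIELDS))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


row = extract_features(SAMPLE_XML)
csv_text = to_csv([row])
print(csv_text)
```

Keeping the field-to-XPath mapping in one dictionary makes it easy to add or rename features without touching the parsing loop.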

IRS990_Tech_Exercise_CLUSTERING (.ipynb/.py) contains parts 3. Modeling and 4. Visualization. It filters the dataset to 501(c)(3) entities only and clusters them with a K-means algorithm based on revenue, expenses, and compensation data. Visualization and analysis of the results are also included.
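The modeling step can be sketched as follows: filter to 501(c)(3) records, standardize the three features, and run K-means. This is a minimal standard-library version of Lloyd's algorithm, not the notebook's implementation (which may well use a library). The records, the `is_501c3` flag, the choice of `k=2`, and the helper names are all hypothetical.

```python
import math
import random

# Hypothetical records: revenue, expenses, compensation (USD).
records = [
    {"is_501c3": True, "features": [120_000, 110_000, 40_000]},
    {"is_501c3": True, "features": [150_000, 140_000, 55_000]},
    {"is_501c3": True, "features": [90_000, 95_000, 30_000]},
    {"is_501c3": True, "features": [9_500_000, 9_100_000, 2_200_000]},
    {"is_501c3": True, "features": [12_000_000, 11_500_000, 3_000_000]},
    {"is_501c3": True, "features": [10_200_000, 9_900_000, 2_600_000]},
    {"is_501c3": False, "features": [500_000, 450_000, 100_000]},
]


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Keep 501(c)(3) entities only, then cluster on standardized features.
rows = [r["features"] for r in records if r["is_501c3"]]
scaled = standardize(rows)
labels, centroids = kmeans(scaled, k=2)
print(labels)
```

Standardizing first matters because K-means relies on Euclidean distance, so an unscaled feature with a larger range (here, revenue) would otherwise dominate the cluster assignments.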

The Input folder contains CSV files obtained from AWS (the index of filings, and data frames built from the XML data of the filings). The Output folder contains PDF files generated from the code above.

About

Clustering exercise using IRS 990 form