This project analyzes single-family properties in Philadelphia to explore how proximity to transit lines, crime incidents, and job locations influences sale prices. Using spatial data, the R script filters properties, calculates distances, computes metrics, runs regression models, and generates visualizations and reports.
- Data Processing: Loads and standardizes spatial shapefiles for properties, transit, crime, and jobs.
- Spatial Analysis: Filters properties within 800m of transit lines and calculates distances to key features.
- Metrics: Computes job accessibility and crime density, with transformations for analysis.
- Modeling: Runs regression models for properties built in 1990–2000 and 2010–2024.
- Outputs: Produces a Word document with regression tables, t-test results, and boxplot analysis, plus PNG visualizations.
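The spatial filtering and distance steps listed above can be sketched with `sf` roughly as follows. This is a minimal illustration on made-up geometries, not the code from `analysis.R`: the coordinates, the column names (`id`, `dist_transit_m`), and the choice of EPSG:32618 as a metre-based CRS are all assumptions.

```r
library(sf)

# Toy data in a projected CRS with metre units (EPSG:32618, UTM zone 18N).
# Coordinates are invented for illustration; they are not project data.
transit <- st_sfc(st_linestring(rbind(c(0, 0), c(2000, 0))), crs = 32618)

properties <- st_as_sf(
  data.frame(id = 1:3, x = c(500, 1000, 1500), y = c(100, 900, 799)),
  coords = c("x", "y"), crs = 32618
)

buffer_m <- 800  # transit buffer distance named in the project description

# Keep only properties within buffer_m of any transit line.
near <- st_is_within_distance(properties, transit, dist = buffer_m)
filtered <- properties[lengths(near) > 0, ]

# Distance from each remaining property to its nearest transit line, in metres.
dist_matrix <- units::drop_units(st_distance(filtered, transit))
filtered$dist_transit_m <- apply(dist_matrix, 1, min)
```

Working in a projected CRS keeps the 800 m threshold and the resulting distances in metres; with real shapefiles, `st_transform()` would typically be applied to every layer first so they share one CRS.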
- R: Version 4.0 or higher.
- R Packages: `sf`, `dplyr`, `ggplot2`, `sp`, `car`, `flextable`, `officer`, `broom`, `tidyr`.
- Shapefiles: Spatial data files (e.g., `Opa_Properties_LivingAreas.shp`, `HighSpeed_Lines.shp`, etc.) in a specified directory.
- Clone the repository:
https://github.com/ArnoldMuchene/Class_Project
This project provides an R script for spatial data analysis, focusing on property data processing, distance calculations, regression modeling, and visualization generation using shapefiles.
- Place Shapefiles: Store shapefiles in the `~/ArcGIS/Inputs` directory or update the file path in the script (`analysis.R`).
- Dependencies: Ensure R and required packages (e.g., `sf`, `dplyr`, `ggplot2`, `stargazer`) are installed. Install missing packages using `install.packages()`.
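One way to check the dependencies before running the script is sketched below. The package vector is simply the union of the packages named in this README; the variable names (`required`, `missing_pkgs`, `r_ok`) are illustrative and do not come from `analysis.R`.

```r
# Packages named in this README; extend the vector if the script needs more.
required <- c("sf", "dplyr", "ggplot2", "sp", "car",
              "flextable", "officer", "broom", "tidyr", "stargazer")

# Names of required packages that are not yet installed.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(missing_pkgs)
}

# The README asks for R 4.0 or higher.
r_ok <- getRversion() >= "4.0.0"
```

The install call is left commented out so that the check itself has no side effects.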
- Open `analysis.R` in RStudio or your preferred R environment.
- Update the working directory in the `setwd()` function to point to your shapefile location.
- Run the script to:
  - Load and preprocess spatial data.
  - Filter properties and calculate distances (e.g., to transit).
  - Generate metrics, regression models, and visualizations.
  - Save outputs to the `outputs/` directory.
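As a rough picture of the modeling step, the sketch below fits one linear model per build-year cohort using base R. The data are synthetic, and the column names and model formula are assumptions made for illustration; the actual predictors and transformations are defined in `analysis.R`.

```r
set.seed(1)  # reproducible synthetic data; not the project's data

n <- 200
props <- data.frame(
  year_built     = sample(c(1990:2000, 2010:2024), n, replace = TRUE),
  dist_transit_m = runif(n, 0, 800),
  living_area    = runif(n, 80, 300)
)
# Hypothetical price process, used only to give the models something to fit.
props$sale_price <- 150000 + 900 * props$living_area -
  40 * props$dist_transit_m + rnorm(n, sd = 20000)

# Label the two build-year cohorts named in this README.
props$cohort <- ifelse(props$year_built <= 2000, "1990-2000", "2010-2024")

# Fit one linear model per cohort and collect the coefficient tables.
models <- lapply(split(props, props$cohort), function(d) {
  lm(sale_price ~ dist_transit_m + living_area, data = d)
})
coef_tables <- lapply(models, function(m) coef(summary(m)))
```

Each element of `coef_tables` is a matrix of estimates, standard errors, t values, and p values, which is the kind of content a formatted regression table is usually built from.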
- Word Document: `regression_tables_scaled_polynomial.docx`, containing formatted regression tables, t-test results, and boxplot analysis.
- Visualizations:
  - `filtered_properties_map.png`: Map of filtered properties colored by sale price.
  - `boxplot_all_variables_faceted.png`: Faceted boxplot comparing property characteristics by year built.
- Console Outputs: Summary tables for t-tests and distance-to-transit effects.
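For readers unfamiliar with the t-test summaries mentioned above, here is a small sketch of a two-sample comparison in base R. The values are simulated stand-ins for the two build-year groups, and the layout of `tt_summary` is an assumption rather than the script's actual console output.

```r
set.seed(42)  # synthetic values, for illustration only

older <- rnorm(60, mean = 250000, sd = 40000)  # stand-in for 1990-2000 sale prices
newer <- rnorm(60, mean = 300000, sd = 40000)  # stand-in for 2010-2024 sale prices

# t.test() defaults to Welch's test, which does not assume equal variances.
tt <- t.test(newer, older)

# Collect the headline numbers in a one-row summary table.
tt_summary <- data.frame(
  mean_newer = unname(tt$estimate[1]),
  mean_older = unname(tt$estimate[2]),
  t_stat     = unname(tt$statistic),
  df         = unname(tt$parameter),
  p_value    = tt$p.value
)
print(tt_summary)
```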
├── analysis.R # Main R script for the analysis
├── README.md # Project documentation
├── ArcGIS/Inputs/ # Directory for shapefiles (not included)
└── outputs/ # Directory for generated Word doc and PNGs
- Shapefile Paths: Update `setwd()` and `st_read()` paths in `analysis.R` to match your data location.
- Filters: Modify the 800m transit buffer or other distance conditions in the script.
- Models: Add or adjust predictors in regression models for alternative analyses.
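Adjusting predictors does not require rewriting a model from scratch. The sketch below uses `update()` to extend a baseline model and `anova()` to compare the two fits. The data are synthetic, and the column names (`crime_density`, `job_access`, `dist_transit_m`) are hypothetical placeholders for whatever variables the script defines.

```r
set.seed(7)  # synthetic data; column names are hypothetical

d <- data.frame(
  dist_transit_m = runif(100, 0, 800),
  crime_density  = runif(100, 0, 50),
  job_access     = runif(100, 0, 1)
)
d$sale_price <- 200000 - 30 * d$dist_transit_m - 500 * d$crime_density +
  40000 * d$job_access + rnorm(100, sd = 15000)

# Baseline specification with a single predictor.
base_fit <- lm(sale_price ~ dist_transit_m, data = d)

# update() refits the model with extra predictors added to the formula.
full_fit <- update(base_fit, . ~ . + crime_density + job_access)

# Compare the nested models with an F-test.
comparison <- anova(base_fit, full_fit)
```

The same pattern works for removing a term, for example `update(full_fit, . ~ . - job_access)`.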
Contributions are welcome! To contribute:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/your-feature`).
- Commit changes (`git commit -m 'Add your feature'`).
- Push to the branch (`git push origin feature/your-feature`).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or feedback, open an issue or contact arnoldnjengabiz@gmail.com
If you find this project useful, please give it a star on GitHub!
