Version of CIDER for GiveDirectly <> Safaricom <> IDi engagement


cider

poverty prediction and targeting with mobile phone metadata

Local installation Deployment

Pre-installation requirements:

  1. This project uses uv to install and manage dependencies and virtual environments. Follow the instructions to install uv.
  2. Install openjdk-17. Follow the instructions here for your operating system. This is required to allow PySpark to run. You may also need to update the JAVA_HOME environment variable to point to your Java installation. For example, on macOS with Homebrew, you can add the following lines to your shell profile (e.g., .bash_profile, .zshrc):
    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
    export PATH=$JAVA_HOME/bin:$PATH
  3. Geographic packages (geopandas, geovoronoi, etc.) rely on OS-specific dependencies that may not be installed by default. In these cases, you may need to install GDAL.
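
To check the Java requirement above before running PySpark, a short script along the following lines may help. It is only a sketch and not part of cider: the function names parse_java_major and installed_java_major are hypothetical, and the parser assumes the usual `java -version` output format, for example `openjdk version "17.0.9"`.

```python
import os
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_392").
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and return the major version, or None if Java is missing."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its report to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    print("JAVA_HOME =", os.environ.get("JAVA_HOME", "<not set>"))
    print("Java major version:", installed_java_major())
```

If the reported major version is not 17, or JAVA_HOME is not set, revisit step 2 before continuing.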

Installation steps:

  1. Clone the repository: git clone https://github.com/IDinsight/cider.git
  2. From the root directory, run make fresh-env. This creates a venv with all the needed dependencies.
  3. Once your venv is made you can use uv run [command] to run a single CLI command inside the venv.
  4. Run make test from the root of the repository to verify that all required modules run.
  5. Run the notebooks/demo_pipeline.ipynb notebook to see a demo of the full pipeline from generating synthetic data to assessing model fit.

Contributing

  1. Before working on any code, please create an issue with a description of the feature or bug fix you plan to implement and appropriate tags, and get approval from a project maintainer.
  2. Please create a new branch for your work, and raise a PR that references the issue when you are ready for review.
  3. Please ensure that your code adheres to the existing coding style and includes appropriate tests for any functionality you implement.
    • Please run make test to ensure that all tests pass before submitting your PR.
    • Make sure you have installed the pre-commit hooks by running pre-commit install so that your code is automatically checked for style issues before committing.
    • Make sure to update the README and any relevant documentation to reflect your changes.
    • Make sure to commit the pyproject.toml and uv.lock files if you have made any changes to the dependencies (please note, dependency changes should be minimal, well-justified and ONLY through uv add <package> or uv lock --upgrade-package <package>).
    • Bonus: add a notebook in the notebooks/ folder that demonstrates the functionality you have added or modified. If you do, and/or have run other notebooks, please remember to run make clear-nb before committing to clear output cells.
  4. Please make sure to fill in the PR template correctly, including linking to the relevant issue, describing the changes made, and any additional context that may be helpful for reviewers. Please also tag one of the project maintainers for review.

Folder structure

Cleaned code

  • src/cider: cleaned / updated cider source code
  • tests/: unit tests for cleaned code in src/cider
  • notebooks/: Jupyter notebooks for analysis and exploration with cleaned code

Legacy code (TO BE DEPRECATED SOON)

  • deprecated/: old code that is no longer in use but kept for reference
  • old_notebooks/: old notebooks that are no longer in use but kept for reference
  • synthetic_data/: synthetic data generation scripts and generated data for testing and development purposes
  • configs/: configuration files for various environments and settings

OLD README BELOW - TO BE DELETED SOON

Documentation

Visit cider's documentation.

Deployment

This project uses uv to install and manage dependencies and virtual environments. Follow the instructions to install uv.

From the root directory, run uv lock --upgrade followed by uv sync. This creates a venv with all the needed dependencies.

Once your venv is made you can use uv run [command] to run a single CLI command inside the venv.

Helper Functions

We use make to provide helper commands that are portable across operating systems. Implementations of make are available for all major operating systems. Once you have installed one that suits you and set up uv, you can run:

  • make test [paths] to run all pytests
  • make clear-nb to clear the results out of notebooks before committing them back to the repo. This helps avoid bloat from binary blobs, and keeps the changes to notebooks readable in diff tools.
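
For readers curious what clearing a notebook amounts to, the sketch below removes outputs and execution counts from the code cells of a notebook in the nbformat 4 JSON layout. It is an illustration only: it is not the implementation behind make clear-nb, and the function names clear_outputs and clear_notebook_file are hypothetical.

```python
import json


def clear_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs removed."""
    # Round-tripping through JSON is a cheap deep copy for JSON-only data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def clear_notebook_file(path):
    """Rewrite the notebook at `path` in place without its outputs."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(clear_outputs(notebook), handle, indent=1)
        handle.write("\n")
```

Only code cells are touched; markdown cells and notebook metadata pass through unchanged, which keeps diffs limited to the source that was actually edited.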

Contributing

Before contributing code please:

  • Run make clear-nb if you have made any changes to Jupyter notebooks you would like to commit.
  • Run pre-commit install to install pre-commit hooks that will run on every git commit to check code quality.
  • Run uv lock if you made any changes to the dependencies. This will regenerate the uv.lock file.
  • Run make test and verify that the tests still pass. If they fail, confirm if they fail on master before assuming your code broke them.

Testing

For testing we use pytest. Some guidelines:

  • In any directory with source code there should be a tests folder that contains files whose names begin with test_, e.g. test_foo_bar_file.py.
  • Within each test file, each function that is a test should start with test_. In source code, no function should start with test_.
  • We should write unit tests wherever possible. These are minimal tests that confirm the functionality of one layer of abstraction. If one layer of unit tests requires interaction with objects from a different layer of abstraction, we should use the standard-library unittest.mock module to create mock objects. This keeps the tests fast and decouples the pieces of code, which makes test failures more meaningful: a failure is likely to be contained in a single unit test, whereas in integration tests one failure would propagate and cause cascading failures.
  • We can write integration or smoke tests which attempt to run the code end-to-end. These should not be exhaustive, and all such tests together should take less than a few minutes to run.
  • Developers should be familiar with the pytest concepts of fixtures to make the setup for tests repeatable, parametrize to run many variations of the same test, and pytest.raises to check that the correct type of error is raised when it should be.
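
The guidelines above can be illustrated with a small test module. This is a sketch under stated assumptions: poverty_rate and summarize are hypothetical functions invented for the example and do not exist in cider. In a real project they would live in the source tree, with the tests in a tests/test_*.py file.

```python
from unittest.mock import Mock

import pytest


def poverty_rate(consumption, poverty_line):
    """Hypothetical helper: share of households below the poverty line."""
    if not consumption:
        raise ValueError("consumption must not be empty")
    return sum(c < poverty_line for c in consumption) / len(consumption)


def summarize(loader, poverty_line):
    """Hypothetical higher layer that depends on a data loader object."""
    return poverty_rate(loader.load(), poverty_line)


@pytest.fixture
def consumption():
    # Repeatable setup shared by every test that requests this fixture.
    return [1.0, 2.0, 3.0, 4.0]


@pytest.mark.parametrize(
    "poverty_line, expected",
    [(0.5, 0.0), (2.5, 0.5), (10.0, 1.0)],
)
def test_poverty_rate(consumption, poverty_line, expected):
    # One test body, run once per (poverty_line, expected) pair.
    assert poverty_rate(consumption, poverty_line) == expected


def test_poverty_rate_rejects_empty_input():
    # pytest.raises fails the test unless a ValueError is raised inside the block.
    with pytest.raises(ValueError):
        poverty_rate([], 1.0)


def test_summarize_uses_loader():
    # The loader belongs to another layer, so it is replaced with a mock.
    loader = Mock()
    loader.load.return_value = [1.0, 3.0]
    assert summarize(loader, 2.0) == 0.5
    loader.load.assert_called_once_with()
```

Because the loader is mocked, test_summarize_uses_loader exercises only the summarize layer and stays fast. A bug in a real loader would surface in that loader's own unit tests instead.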

License

Copyright ©2022-2023. The Regents of the University of California (Regents). All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
