Pre-installation requirements:
- To install and manage dependencies and virtual environments, this project uses `uv`. Follow the instructions to install uv.
- Install `openjdk-17`. Follow the instructions here for your operating system. This is required to allow PySpark to run. You may also need to update the `JAVA_HOME` environment variable to point to your Java installation. For example, on macOS with Homebrew, you can add the following lines to your shell profile (e.g., `.bash_profile`, `.zshrc`):
  - `export JAVA_HOME=$(/usr/libexec/java_home -v 17)`
  - `export PATH=$JAVA_HOME/bin:$PATH`
- Some OS-specific dependencies for geographic packages (`geopandas`, `geovoronoi`, etc.) may not be installed by default. In these cases, you may need to install GDAL.
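The requirements above can be sanity-checked before installing. The following is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper that only looks for `uv` and `java` on `PATH` and reports whether `JAVA_HOME` is set. It does not verify the Java version or the GDAL installation.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable notes; an empty list means every check passed.

    Hypothetical helper for illustration only. `env` and `which` are injectable
    so the function can be exercised without touching the real system.
    """
    env = os.environ if env is None else env
    notes = []
    # Both tools must be discoverable on PATH for the later steps to work.
    for tool in ("uv", "java"):
        if which(tool) is None:
            notes.append(f"'{tool}' was not found on PATH")
    # JAVA_HOME is only sometimes required, so treat this as a hint, not an error.
    if not env.get("JAVA_HOME"):
        notes.append("JAVA_HOME is not set")
    return notes


if __name__ == "__main__":
    for note in check_prerequisites():
        print(note)
```

An empty result does not guarantee that PySpark will run; it only rules out the most common setup gaps.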
Installation steps:
- Clone the repository: `git clone https://github.com/IDinsight/cider.git`
- From the root directory, run `make fresh-env`: this will establish a venv with all the needed dependencies.
- Once your venv is made, you can use `uv run [command]` to run a single CLI command inside the venv.
- Run `make test` from the root of the repository to verify that all required modules run.
- Run the `notebooks/demo_pipeline.ipynb` notebook to see a demo of the full pipeline, from generating synthetic data to assessing model fit.
- Before working on any code, please create an issue with a description of the feature or bug fix you plan to implement and appropriate tags, and get approval from a project maintainer.
- Please create a new branch for your work, and raise a PR that references the issue when you are ready for review.
- Please ensure that your code adheres to the existing coding style and includes appropriate tests for any functionality you implement.
- Please run `make test` to ensure that all tests pass before submitting your PR.
- Make sure you have installed the pre-commit hooks by running `pre-commit install` so that your code is automatically checked for style issues before committing.
- Make sure to update the README and any relevant documentation to reflect your changes.
- Make sure to commit the `pyproject.toml` and `uv.lock` files if you have made any changes to the dependencies (please note, dependency changes should be minimal, well-justified and ONLY through `uv update` or `uv add *`).
- Bonus: add a notebook in the `notebooks/` folder that demonstrates the functionality you have added or modified. If you do, and/or have run other notebooks, please remember to run `make clear-nb` before committing to clear output cells.
- Please make sure to fill in the PR template correctly, including linking to the relevant issue, describing the changes made, and any additional context that may be helpful for reviewers. Please also tag one of the project maintainers for review.
Cleaned code
- `src/cider`: cleaned / updated cider source code
- `tests/`: unit tests for cleaned code in `src/cider`
- `notebooks/`: Jupyter notebooks for analysis and exploration with cleaned code
Legacy code (TO BE DEPRECATED SOON)
- `deprecated/`: old code that is no longer in use but kept for reference
- `old_notebooks/`: old notebooks that are no longer in use but kept for reference
- `synthetic_data/`: synthetic data generation scripts and generated data for testing and development purposes
- `configs/`: configuration files for various environments and settings
Visit cider's documentation.
To install and manage dependencies and virtual environments, this project uses `uv`. Follow the instructions to install uv.
From the root directory, run `uv update` followed by `uv install`; this will establish a venv with all the needed dependencies.
Once your venv is made, you can use `uv run [command]` to run a single CLI command inside the venv.
To support some helper functions that are portable across operating systems, we use `make`. There are many implementations of `make` for all operating systems. Once you have downloaded one that suits you and set up uv, you can run:
- `make test [paths]` to run all pytest tests
- `make clear-nb` to clear the results out of notebooks before committing them back to the repo. This helps avoid bloat from binary blobs, and keeps the changes to notebooks readable in diff tools.
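To illustrate what clearing notebook results amounts to, here is a minimal sketch using only the standard library. It is an assumption about the effect, not the recipe behind `make clear-nb` (the Makefile is not shown here); `clear_outputs` and `clear_notebook_file` are hypothetical names, and the sketch assumes the nbformat 4 JSON layout, where each code cell carries `outputs` and `execution_count` fields.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Empty the outputs and reset the execution count of every code cell.

    Mutates and returns the given notebook dict (nbformat 4 layout assumed).
    Markdown and raw cells are left untouched.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_notebook_file(path: Path) -> None:
    """Rewrite one .ipynb file in place with its outputs cleared."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cleared = clear_outputs(notebook)
    path.write_text(json.dumps(cleared, indent=1) + "\n", encoding="utf-8")
```

Because embedded images and long tracebacks live in `outputs`, emptying that field is what keeps notebook diffs small and readable.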
Before contributing code please:
- Run `make clear-nb` if you have made any changes to Jupyter notebooks you would like to commit.
- Run `pre-commit install` to install pre-commit hooks that will run on every git commit to check code quality.
- Run `uv update` if you made any changes to the dependencies. This will regenerate the `uv.lock` file.
- Run `make test` and verify that the tests still pass. If they fail, confirm whether they also fail on master before assuming your code broke them.
For testing we use pytest. Some guidelines:
- In any directory with source code there should be a `tests` folder that contains files that begin with `test_`, e.g. `test_foo_bar_file.py`.
- Within each test file, each function that is a test should start with the word `test`; in source code, no function should start with the word `test_`.
- We should attempt to write unit tests wherever possible. These are minimal tests that confirm the functionality of one layer of abstraction. We should use the `unittest.mock` standard Python library to make mock objects if one layer of unit tests requires interaction with objects from a different layer of abstraction. This keeps the tests fast and decouples the pieces of code, which makes test failures more meaningful: a failure will likely be contained in a single unit test, whereas in integration tests one failure would propagate and cause cascading failures.
- We can write integration or smoke tests which attempt to run the code end-to-end. These should not be exhaustive, and all such tests together should take less than a few minutes to run.
- Developers should be familiar with the `pytest` concepts of `fixtures` to make the setup for tests repeatable, `parametrize` to generate a large number of variations on the same test, and `pytest.raises` to check that the correct type of error is raised when it should be.
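The guidelines above can be sketched in a single test file. This is an illustrative example only: `mean_call_duration` and its `loader` argument are hypothetical and do not come from `src/cider`. It shows a fixture that supplies a `unittest.mock.Mock` in place of another layer, `parametrize` for several input variations, and `pytest.raises` for the error path.

```python
from unittest.mock import Mock

import pytest


def mean_call_duration(loader, subscriber_id):
    """Hypothetical unit under test: average call duration for one subscriber.

    `loader` stands for a separate data-access layer exposing `load_durations`.
    """
    durations = loader.load_durations(subscriber_id)
    if not durations:
        raise ValueError(f"no calls found for {subscriber_id!r}")
    return sum(durations) / len(durations)


@pytest.fixture
def loader():
    """Mock stand-in for the data-access layer, so no real data is read."""
    return Mock()


@pytest.mark.parametrize(
    ("durations", "expected"),
    [([60], 60.0), ([30, 90], 60.0), ([10, 20, 30], 20.0)],
)
def test_mean_call_duration(loader, durations, expected):
    # Configure the mock to return the parametrized input.
    loader.load_durations.return_value = durations
    assert mean_call_duration(loader, "subscriber-1") == expected
    # The unit should query the other layer exactly once, with the given id.
    loader.load_durations.assert_called_once_with("subscriber-1")


def test_mean_call_duration_raises_without_calls(loader):
    loader.load_durations.return_value = []
    with pytest.raises(ValueError):
        mean_call_duration(loader, "subscriber-1")
```

Because the data-access layer is mocked, a failure here points at `mean_call_duration` itself rather than at whatever produces the data.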
Copyright ©2022-2023. The Regents of the University of California (Regents). All Rights Reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.