This is the artifact for the paper "Compass: Encrypted Semantic Search with High Accuracy" (to appear in OSDI 2025).
WARNING: This is an academic proof-of-concept prototype and has not received careful code review. This implementation is NOT ready for production use.
This repository is structured as follows:
- `data/`: Source files, indices, and initial states for both clients and servers, corresponding to each dataset used in the evaluation.
- `script/`: Scripts for uploading/downloading files to/from Google Cloud Storage, and scripts for artifact evaluation.
- `src/`: The implementation of the Compass protocol.
- `tests/`: Implementations of the Compass client and server, and code for initializing ORAM from a standard HNSW index.
  - `compass_init` builds the initial client and server states from an HNSW index and a configuration.
  - `test_compass_ring` is a reference Compass client/server built on Ring ORAM.
  - `test_compass_accuracy` is a variant that leaks the access pattern, for faster accuracy experiments.
  - `test_compass_tp` is a variant that keeps sending queries, for throughput experiments.
- `config/`: `config_ring.json` contains the parameter configuration for each evaluated dataset.
- `third_party/`: Third-party libraries used by Compass.
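As a quick sanity check after cloning, the top-level layout above can be verified with a few lines of Python. This is a hypothetical helper, not part of the repository:

```python
# Hypothetical sanity check (not part of the repository): verify that the
# top-level directories described above exist under a given root.
from pathlib import Path

EXPECTED = ["data", "script", "src", "tests", "third_party"]

def missing_dirs(root="."):
    """Return the expected top-level directories absent under root."""
    return [d for d in EXPECTED if not (Path(root) / d).is_dir()]

# For a nonexistent root, every expected directory is reported missing.
print(missing_dirs("/nonexistent"))
```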
```
./script/config.sh
```

We've uploaded the datasets, indices, and pre-built initial client and server states to a public Google Cloud Storage bucket, `compass_osdi`. To download all the files, run
```
python3 ./script/gcs_download.py
```

To build, run

```
mkdir build && cd build
cmake ..
make
```

Usage:

```
./test_compass_ring [ name=value ]...
  r           Role of party: Server = 1; Client = 2   # required
  p           Port number                             [ default=8000 ]
  d           Dataset: [sift, trip, msmarco, laion]   # required
  n           Number of queries                       [ default=config ]
  ip          IP address of server                    [ default=127.0.0.1 ]
  efspec      Size of speculation set                 [ default=config ]
  efn         Size of directional filter              [ default=config ]
  batch       Disable batching                        [ default=1 ]
  lazy        Disable lazy eviction                   [ default=1 ]
  f_latency   Save latency to file                    [ default="" ]
  f_accuracy  Save accuracy to file                   [ default="" ]
  f_comm      Save communication to file              [ default="" ]
```

Currently we support four datasets: laion, sift, trip, msmarco. To quickly verify the local dependencies, run the following commands in two separate terminals:
```
./test_compass_ring r=1 d=laion n=10
./test_compass_ring r=2 d=laion n=10
```

For separate machines, run
```
# server
./test_compass_ring r=1 d=laion ip=$server_ip
# client
./test_compass_ring r=2 d=laion ip=$server_ip
```

We use tcconfig to limit each instance's bandwidth and latency.
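When scripting many runs, the `name=value` arguments above can be assembled programmatically. The helper below is a hypothetical convenience for illustration, not something shipped in the repository:

```python
# Hypothetical helper (not in the repository) that assembles a
# ./test_compass_ring command line from a role, a dataset, and extra options.

def compass_cmd(role, dataset, **opts):
    """Build the argument list: r and d are required, the rest optional.

    role: 1 = server, 2 = client; dataset: sift, trip, msmarco, or laion.
    """
    args = ["./test_compass_ring", f"r={role}", f"d={dataset}"]
    args += [f"{k}={v}" for k, v in opts.items()]
    return args

# A client run that saves latency results to a file.
print(" ".join(compass_cmd(2, "laion", n=10, f_latency="lat.csv")))
# -> ./test_compass_ring r=2 d=laion n=10 f_latency=lat.csv
```

The resulting list can be passed directly to `subprocess.run` on the client or server machine.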
```
# fast network
tcset $device_name --delay 0.5ms --rate 3Gbps
# slow network
tcset $device_name --delay 40ms --rate 400Mbps
# reset
tcdel $device_name --all
```

To reproduce the paper's results, we provide a driver script (`driver.py`) and a provisioned GCP instance for artifact evaluators. The provisioned instance contains sufficient credentials for the script to launch testing instances during the experiments.
Note: As our experiments take a long time to complete, we recommend running the script within a tmux session. Please avoid running multiple experiments simultaneously.
[Approx. 2 hrs] The performance experiments include the accuracy and latency experiments. This command creates two instances (a server and a client) to run the latency experiments; the accuracy experiments run locally on the server instance for faster results. After completion, the results are fetched to `./script/artifact/results/` and the instances are terminated.
Use --verbose for detailed test progress.
```
python3 driver.py --task performance
```

[Approx. 2 hrs] We perform the ablation study on the msmarco dataset under the slow network configuration. As in the performance experiments, two instances are launched, and after the experiment the results are fetched to `./script/artifact/results/`.
```
python3 driver.py --task ablation
```

Once the performance experiment is done, run the following commands to render figures or print tables. Figures are saved in PDF format under `eval_fig/`. The communication results may differ slightly from those in the paper because multiple paths are batched during eviction.
```
python3 driver.py --plot figure6
python3 driver.py --plot figure7
python3 driver.py --plot table3
python3 driver.py --plot table4
python3 driver.py --plot figure8
```

Our throughput experiment launches 25 client instances that keep sending requests and one server instance that stores the index for all clients. The script runs a (non-stop) monitor thread that collects throughput (QPS) from each client. To stop the throughput experiment, use a keyboard interrupt (Ctrl+C). After the throughput experiment, run the cleanup command to manually terminate all instances.
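The monitor thread's behavior can be sketched as follows. This is a minimal illustration with stand-in clients, not the actual collection code in `driver.py`:

```python
# Minimal sketch of a QPS monitor thread, assuming each client exposes a
# get_qps() callable; the stand-in clients below return fixed values.
import threading
import time

def monitor(clients, totals, stop, interval=1.0):
    """Periodically record the aggregate QPS across all clients."""
    while not stop.is_set():
        totals.append(sum(get_qps() for get_qps in clients))
        stop.wait(interval)

clients = [lambda: 40, lambda: 35, lambda: 45]   # stand-in per-client QPS
totals, stop = [], threading.Event()
t = threading.Thread(target=monitor, args=(clients, totals, stop, 0.01))
t.start()
while not totals:     # wait for the first sample
    time.sleep(0.01)
stop.set()            # the real experiment stops on Ctrl+C instead
t.join()
print(totals[0])      # aggregate QPS of the first sample: 120
```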
```
python3 driver.py --task throughput
```

The following command terminates all instances related to artifact evaluation.
```
python3 driver.py --task stop_instances
```

Both the performance and ablation experiments automatically terminate their instances upon completion. If an error occurs (e.g., due to an instance failure or GCP staging delays), run the cleanup command above before restarting the experiment.
If you run into any issues or have any questions, please contact us on HotCRP or via email at jinhao.zhu@berkeley.edu, and we will reply promptly!