refactor: add abstract base classes for all risk module attack and defense entry points #72
Open
asim29 wants to merge 1 commit into pr/model-base-shared-utilities-tooling
…fense entry points

All existing base classes now subclass ABC with a single @abstractmethod that enforces the entry-point contract. Concrete classes that don't implement it will raise TypeError at instantiation, not at call time.

Per-risk changes:

Evasion:
- EvasionAttack(ABC): @abstractmethod attack() -> DataLoader
- EvasionDefense(ABC): @abstractmethod train_robust() -> nn.Module

Poisoning:
- PoisoningAttack(ABC) [new file]: @abstractmethod poison_train() and poison_test(); the base was previously an empty shell
- PoisoningDefense(ABC): @abstractmethod train_robust() -> nn.Module

Membership inference:
- MembershipInferenceAttack(ABC): shadow-model lifecycle centralised on the base class; InferenceModel removed (LSP violation: it wrapped nn.Module but was used as a data container). Replaced by private _load_shadow_model(shadow_id) -> (model, in_data), which owns checkpoint loading, device placement, and grad freezing. torch.cuda.amp migrated to torch.amp (PyTorch 2.3). @abstractmethod attack() -> dict.
- MembershipInferenceDefense(ABC): @abstractmethod train_private() -> nn.Module
- lira.py: updated to use _load_shadow_model() in place of InferenceModel.

Attribute inference:
- AttributeInferenceAttack(ABC): @abstractmethod attack() -> dict

Distribution inference:
- DistributionInferenceAttack(ABC) [new file]: owns the full model-population lifecycle, with train_model_population() and prepare_model_populations() as concrete shared helpers and @abstractmethod attack(). Mirrors the MI lifecycle pattern for consistency.
- dataset_utils.py [new file]: DistributionSplits dataclass and prepare_distribution_splits(), the ratio-preserving subsampling helpers required by the DI base class.

Data reconstruction:
- DataReconstructionAttack(ABC): @abstractmethod attack() -> list[Tensor]

Discriminatory behavior:
- Renamed DicriminatoryBehaviorDefense (typo) -> DiscriminatoryBehaviorDefense.
- model_criterion/model_optimizer attributes renamed to criterion/optimizer for consistency with all other defense base classes.
- @abstractmethod train_fair() -> nn.Module
- adversarial_debiasing.py: import and attribute references updated to match.

Unauthorized model ownership:
- UnauthModelOwnershipAttack(ABC) [new file]: @abstractmethod attack() -> nn.Module
- WatermarkDefense(ABC) and FingerprintDefense(ABC) [new file]: two separate ABCs with no shared parent. WatermarkNN and DatasetInference serve distinct verification strategies; a shared base would obscure that.

Closes #26
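The instantiation-time failure described in the commit message can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins and not the amulet classes: ExampleAttack, describe(), IncompleteAttack, CompleteAttack and try_instantiate() are all names assumed for illustration, and the return type is simplified to a plain dict. The sketch only shows the general pattern of a base class that carries concrete shared helpers plus one abstract entry point.

```python
from abc import ABC, abstractmethod


class ExampleAttack(ABC):
    """Hypothetical stand-in for a risk-module base class (not amulet code)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        # Concrete shared helper: inherited unchanged by every subclass.
        return f"attack:{self.name}"

    @abstractmethod
    def attack(self) -> dict:
        """Entry point that every concrete attack must implement."""


class IncompleteAttack(ExampleAttack):
    # Deliberately omits attack(), so the class stays abstract.
    pass


class CompleteAttack(ExampleAttack):
    def attack(self) -> dict:
        return {"name": self.describe(), "score": 0.0}


def try_instantiate(cls: type) -> str:
    """Return 'ok' if cls can be instantiated, else the exception type name."""
    try:
        cls("demo")
    except TypeError as exc:
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    print(try_instantiate(IncompleteAttack))  # TypeError
    print(try_instantiate(CompleteAttack))    # ok
```

The error surfaces when the object is constructed, before any method is called, so a subclass that forgets the entry point fails as soon as it is built rather than partway through a run.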
Summary

This PR introduces ABC base classes for every risk module's attacks and defenses, establishing a uniform lifecycle contract across security, privacy, and fairness modules. It also ships the DistributionInferenceAttack and DistributionSplits data utilities, fixes the discriminatory behavior defense class naming typo, and migrates MI shadow-model training away from the removed InferenceModel wrapper.

Stacked on: pr/model-base-shared-utilities-tooling

What changed and why
Abstract base classes: all risk modules

Each risk module now has an explicit ABC with a single @abstractmethod entry point, replacing the previous pattern of duck-typed or convention-only classes.

- amulet/evasion/attacks/evasion_attack.py: EvasionAttack(ABC) with @abstractmethod attack() -> DataLoader
- amulet/evasion/defenses/evasion_defense.py: EvasionDefense(ABC) with @abstractmethod train_robust() -> nn.Module
- amulet/poisoning/attacks/poisoning_attack.py: PoisoningAttack(ABC) with @abstractmethod poison_train() and @abstractmethod poison_test()
- amulet/poisoning/defenses/poisoning_defense.py: PoisoningDefense(ABC) with @abstractmethod train_robust() -> nn.Module
- amulet/membership_inference/attacks/membership_inference_attack.py: MembershipInferenceAttack(ABC); adds _load_shadow_model() helper; removes InferenceModel inner class; migrates torch.cuda.amp → torch.amp (PyTorch 2.3 deprecation)
- amulet/membership_inference/defenses/membership_inference_defense.py: MembershipInferenceDefense(ABC) with @abstractmethod train_private() -> nn.Module
- amulet/discriminatory_behavior/defenses/discr_behavior_defense.py: renamed DicriminatoryBehaviorDefense → DiscriminatoryBehaviorDefense; renamed model_criterion → criterion and model_optimizer → optimizer to match standard naming
- amulet/attribute_inference/attacks/attribute_inference_attack.py: AttributeInferenceAttack(ABC) with @abstractmethod attack() -> dict[str, np.ndarray]
- amulet/data_reconstruction/attacks/data_reconstruction_attack.py: DataReconstructionAttack(ABC) with @abstractmethod attack() -> dict[str, np.ndarray]
- amulet/unauth_model_ownership/attacks/unauth_model_ownership_attack.py: UnauthModelOwnershipAttack(ABC) with @abstractmethod attack()
- amulet/unauth_model_ownership/defenses/unauth_model_ownership_defense.py: WatermarkDefense with @abstractmethod watermark() and FingerprintDefense with @abstractmethod fingerprint(). Intentionally no shared parent: watermarking and fingerprinting have incompatible return types and lifecycles.

Distribution Inference lifecycle
DistributionInferenceAttack mirrors the shadow-model lifecycle already present in MembershipInferenceAttack.

- amulet/distribution_inference/attacks/distribution_inference_attack.py: DistributionInferenceAttack(ABC) with train_model_population(), prepare_model_populations(), and @abstractmethod attack() -> dict[str, np.ndarray]
- amulet/distribution_inference/dataset_utils.py: DistributionSplits dataclass and prepare_distribution_splits(), both required by the DI ABC
- amulet/distribution_inference/attacks/__init__.py: exports DistributionInferenceAttack alongside existing DistributionInference

Concrete class migrations (minimal: only what PR 2 breaks)
- amulet/membership_inference/attacks/lira.py: removed InferenceModel import; replaced with _load_shadow_model(shadow_id) returning a (model, in_data) tuple
- amulet/discriminatory_behavior/defenses/adversarial_debiasing.py: import updated to DiscriminatoryBehaviorDefense; updated attribute references (model_criterion → criterion, model_optimizer → optimizer)
- amulet/unauth_model_ownership/attacks/__init__.py: exports UnauthModelOwnershipAttack
- amulet/unauth_model_ownership/defenses/__init__.py: exports WatermarkDefense, FingerprintDefense

Test plan
- uv run pre-commit run --all-files passes (ruff, ruff-format, basedpyright, prettier, markdownlint all green)
- MembershipInferenceAttack (from amulet.membership_inference.attacks) raises TypeError when instantiated without implementing attack()
- DistributionInferenceAttack (from amulet.distribution_inference.attacks) raises TypeError when instantiated without implementing attack()
- from amulet.unauth_model_ownership.defenses import WatermarkDefense, FingerprintDefense resolves correctly
- from amulet.distribution_inference import dataset_utils resolves correctly
- LiRA concrete class instantiates without error with valid args

🤖 Generated with Claude Code
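The import-resolution items in the test plan could be automated with a small smoke check along the following lines. This is only a sketch: resolves() and CHECKS are hypothetical names introduced here, the module paths are copied from the test plan, and the helper approximates the behaviour of a from-import by also trying each name as a submodule.

```python
import importlib


def resolves(module_name: str, *names: str) -> bool:
    """Return True if `from module_name import name` would succeed for every name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    for name in names:
        if hasattr(module, name):
            continue
        # A from-import also accepts submodules that are not yet attributes.
        try:
            importlib.import_module(f"{module_name}.{name}")
        except ImportError:
            return False
    return True


# Module paths and names taken from the test plan; adjust if the layout differs.
CHECKS = [
    ("amulet.unauth_model_ownership.defenses", ("WatermarkDefense", "FingerprintDefense")),
    ("amulet.distribution_inference", ("dataset_utils",)),
]

if __name__ == "__main__":
    for module_name, names in CHECKS:
        status = "ok" if resolves(module_name, *names) else "MISSING"
        print(f"{status}: {module_name} -> {', '.join(names)}")
```

Run in an environment where the package is installed, each line should report ok; anywhere else the script reports MISSING rather than raising, which keeps it usable as a quick pre-merge check.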