[BRAINSTORM] refactor to use MOF objects as inputs  #421

@kjappelbaum

Description

I'm thinking of switching to a different design pattern to make the implementation of new featurizers, especially based on fragments, graphs, and building blocks, easier.

I started introducing more and more conditionals on the input types for featurizers, be it graphs, molecules, or structures, which makes the code really complicated -- especially if we also want to cache as much as possible.

All of this would be much easier if the inputs to the featurizers were always MOF objects. I already started implementing MOF objects for mofchecker and moffragmentor, and they typically have (memoized) methods for computing graph/fragments/hashes/...

If we were to use them, the object would simply "cache" the graph and the featurizer would no longer need to worry about it. Also, each featurizer could simply get the object attributes it needs (and the caller would not need to worry about what to provide, i.e., this logic is only implemented in one place).

In the long run, this might also make the implementation of the datasets easier as, again, the logic is only on the structure object and not partially re-implemented on the datasets.
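The caching behavior described above can be sketched with `functools.cached_property`. This is a minimal illustration; the `MOF` class, the string "structure", and the counter here are placeholders for the illustration, not the actual mofdscribe implementation:

```python
from functools import cached_property


class MOF:
    """Placeholder input object: derived representations are computed once, then cached."""

    def __init__(self, structure):
        self.structure = structure
        self.n_graph_computations = 0  # instrumentation for this illustration only

    @cached_property
    def structure_graph(self):
        # The expensive derivation runs on first access only; featurizers
        # just read the attribute and never manage caching themselves.
        self.n_graph_computations += 1
        return ("graph-of", self.structure)


mof = MOF("my-structure")
g1 = mof.structure_graph  # computed here
g2 = mof.structure_graph  # returned from the instance cache
assert g1 is g2 and mof.n_graph_computations == 1
```

Because `cached_property` stores the result on the instance, a featurizer that reads `mof.structure_graph` gets memoization for free, with no `lru_cache` bookkeeping of its own.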

Advantages

  • Fewer conditionals for making the control flow in featurizers dependent on the input type
  • Less logic for caching --- the input objects can cache it
  • Common design pattern with mofchecker and moffragmentor. Could, perhaps, be factored out into its own very thin library that just provides the API (to be seen)
  • MultipleFeaturizer would also no longer need to handle so much logic about primitive structures
  • We could relatively easily serialize fragments and features in the same workflow without computational overhead
  • We can get rid of things such as MOFBBs objects
  • #420 (make lru cache sizes customizable) would close itself because we would not need lru_cache anymore
  • Featurizers such as RACs would no longer need their own nearest-neighbor search logic
  • There would no longer be a need for a MOF base featurizer

Disadvantages

  • We would need a very thin wrapper for interoperability with matminer, i.e., we would basically always extract the structure attribute and then pass this to pymatgen
  • It would be a breaking change for the current API: we would need to update all docs, tutorials, and the examples in the paper (however, now might be a good time to do it, as there is not yet that much use)
  • The BU featurizer would need to follow a different pattern and would still, in some way, need an annotation of which representation the wrapped featurizer operates on
  • The BU featurizer still seems difficult to couple with a MultipleFeaturizer over mixed types. The recommended usage pattern would probably be to wrap BUFeaturizers in a MultipleFeaturizer if needed, so that each BUFeaturizer always operates on only one type, which we can easily extract and pass to the wrapped featurizer. Caching of the fragments is no problem, as it is handled by the MOF object
  • Many featurizers do not work only on MOFs; in this sense the naming (also of the library) is misleading
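The thin matminer wrapper mentioned above could be as small as extracting the structure attribute before delegating. A hedged sketch (the class names `MatminerAdapter`, `FakeMOF`, and `FakeMatminerFeaturizer` are assumptions for illustration, not the final API):

```python
class MatminerAdapter:
    """Hypothetical thin wrapper: unwrap the pymatgen Structure from the
    MOF object and delegate to a matminer-style featurizer."""

    def __init__(self, matminer_featurizer):
        self.matminer_featurizer = matminer_featurizer

    def featurize(self, mof):
        # matminer featurizers expect a pymatgen Structure, so we always
        # extract the .structure attribute before delegating.
        return self.matminer_featurizer.featurize(mof.structure)


# Toy stand-ins to demonstrate the delegation (not real matminer/pymatgen objects).
class FakeMOF:
    def __init__(self, structure):
        self.structure = structure


class FakeMatminerFeaturizer:
    def featurize(self, structure):
        return [len(structure)]


feats = MatminerAdapter(FakeMatminerFeaturizer()).featurize(FakeMOF("abc"))
```

The adapter adds no logic of its own, which is why the interoperability cost stays low.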

Usage example

from mofdscribe import MOF
mof = MOF.from_file("some_file.cif")
bu = BUFeaturizer(LSOP())
feats = bu.featurize(mof)

Implementation idea

from functools import cached_property

class MOF:
    def __init__(self, structure, structure_graph):
        self.structure = structure
        self.structure_graph = structure_graph

    @classmethod
    def from_structure(cls, structure):
        structure_graph = get_structure_graph(structure)
        return cls(structure, structure_graph)

    @classmethod
    def from_file(cls, filename):
        ...

    @cached_property
    def fragments(self):
        ...

    @cached_property
    def scaffold_hash(self):
        ...

The featurizers would have the following signatures:

class GraphFeaturizer: 
    def featurize(self, mof): 
        structure_graph = mof.structure_graph
        return self._featurize(structure_graph)
        
class StructureFeaturizer: 
    def featurize(self, mof): 
        structure = mof.structure
        return self._featurize(structure)

MatminerFeaturizer = StructureFeaturizer

class SiteFeaturizer: 
    def featurize(self, mof, i): 
        structure = mof.structure
        structure_graph = mof.structure_graph
        return self._featurize(structure, structure_graph, i)

class BUFeaturizer: 
    def __init__(self, featurizer):
        if isinstance(featurizer, MultipleFeaturizer): 
            raise ValueError("featurizer must be a single featurizer")
        self.featurizer = featurizer
    
    def featurize(self, mof): 
        fragments = mof.fragments
        ...

In the last case, the BUFeaturizer can inspect whether it needs to get the molecules, structures, or graphs from the fragments, and then call the _featurize method of the featurizer passed in the constructor.
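This dispatch could be sketched as follows. A hedged illustration, simplified to take the fragment list directly rather than a full MOF object; the `operates_on` attribute, the `Fragment` class, and the toy `MoleculeFeaturizer` are assumptions, not the actual mofdscribe API:

```python
class MoleculeFeaturizer:
    # The wrapped featurizer declares which representation it needs
    # (assumed convention for this sketch).
    operates_on = "molecule"

    def _featurize(self, molecule):
        return [len(molecule)]  # toy feature


class Fragment:
    def __init__(self, molecule, graph):
        self.molecule = molecule
        self.graph = graph


class BUFeaturizer:
    def __init__(self, featurizer):
        self.featurizer = featurizer

    def featurize(self, fragments):
        feats = []
        for frag in fragments:
            # Extract exactly the representation the wrapped featurizer
            # declared, then delegate to its _featurize method.
            representation = getattr(frag, self.featurizer.operates_on)
            feats.extend(self.featurizer._featurize(representation))
        return feats


bu = BUFeaturizer(MoleculeFeaturizer())
feats = bu.featurize([Fragment("CCO", None), Fragment("CO", None)])
```

The type annotation lives on the wrapped featurizer, so the BUFeaturizer itself stays free of per-type conditionals.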

The MOFMultipleFeaturizer should probably always loop over structures for featurize_many.

Also, if I'd do something like featurizer = MOFMultipleFeaturizer([BUFeaturizer(LSOP()), BUFeaturizer(Dimensionality()), RACS()]) it should work without problems as all featurizers always accept MOF objects.
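The structure-first loop for featurize_many could look like this minimal sketch (names and the toy featurizers `A` and `B` are assumptions; real featurizers would accept MOF objects as described above):

```python
class MOFMultipleFeaturizer:
    def __init__(self, featurizers):
        self.featurizers = featurizers

    def featurize(self, mof):
        # Concatenate the feature vectors of all wrapped featurizers.
        feats = []
        for featurizer in self.featurizers:
            feats.extend(featurizer.featurize(mof))
        return feats

    def featurize_many(self, mofs):
        # Loop over structures (not featurizers) so that each MOF object's
        # cached graph/fragments are reused by every featurizer before
        # moving on to the next structure.
        return [self.featurize(mof) for mof in mofs]


class A:
    def featurize(self, mof):
        return [1.0]


class B:
    def featurize(self, mof):
        return [2.0, 3.0]


multi = MOFMultipleFeaturizer([A(), B()])
rows = multi.featurize_many(["mof1", "mof2"])
```

Looping structure-first is what makes the per-object caching pay off: the graph is computed once per MOF, no matter how many featurizers are wrapped.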

from mofdscribe import MOF 
from fastcore.xtras import save_pickle 

for file in files: 
    mof = MOF.from_file(file) 
    fragments = mof.fragments
    save_pickle(f"{file}.fragments.pkl", fragments)  # save_pickle(fn, o)
    features = featurizer.featurize(mof)

    Labels

    enhancement (New feature or request), question (Further information is requested)
