I'm looking to create embeddings for the STRING protein-protein interaction database. In this notebook, I convert this network to scipy.sparse.csr_matrix and numpy.array formats. The network is big, with 19,566 nodes and 11,759,454 edges (divide that by two, since each bidirectional edge is counted twice).
It's easy to convert the adjacency matrix to a network, e.g. with networkx.from_scipy_sparse_matrix, but this is time-consuming and the resulting networkx graph takes up a lot of memory. So my question is: are there any embedding methods that can take an adjacency matrix directly? If so, that would perhaps be computationally more efficient than going through networkx.
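To make concrete what I mean by "directly take an adjacency matrix", here is a minimal sketch of one such approach: a truncated SVD of the sparse matrix using scipy.sparse.linalg.svds. The helper name spectral_embed, the embedding dimension, and the random toy graph are assumptions for illustration only; this is not the STRING data and not necessarily the best method for it.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds


def spectral_embed(adj, dim=16):
    """Hypothetical helper: embed nodes via truncated SVD of a sparse adjacency matrix."""
    adj = sp.csr_matrix(adj, dtype=np.float64)  # svds needs a floating-point matrix
    u, s, _ = svds(adj, k=dim)                  # only the top `dim` singular triplets
    order = np.argsort(s)[::-1]                 # sort by decreasing singular value
    return u[:, order] * np.sqrt(s[order])      # scale each axis by sqrt(singular value)


# Toy symmetric graph standing in for the real network (assumption, not STRING).
n = 200
upper = sp.triu(sp.random(n, n, density=0.05, random_state=0, format="csr"), k=1)
adj = (upper + upper.T).tocsr()

emb = spectral_embed(adj, dim=16)
print(emb.shape)
```

The matrix stays sparse throughout, so no networkx graph is ever built; only the (nodes x dim) embedding is dense.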
Also, any advice on what sort of hardware would be needed to operate on a graph of this size? Which methods are most efficient in terms of memory (and/or compute)? Thanks!
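For a rough sense of scale, the memory needed just to hold the adjacency matrix can be estimated with back-of-the-envelope arithmetic. The sketch below assumes float64 values, int32 indices in the CSR format, and that 11,759,454 is the number of stored nonzero entries (both directions); any of these may differ in practice, and the working memory of an embedding method comes on top.

```python
n_nodes = 19_566
n_entries = 11_759_454  # assumed: stored nonzeros, counting both edge directions

# Dense numpy array: one float64 (8 bytes) per cell.
dense_bytes = n_nodes * n_nodes * 8

# CSR: float64 value + int32 column index per nonzero, plus an int32 row-pointer array.
csr_bytes = n_entries * (8 + 4) + (n_nodes + 1) * 4

print(f"dense float64: {dense_bytes / 1e9:.2f} GB")
print(f"CSR:           {csr_bytes / 1e6:.0f} MB")
```

Under these assumptions the dense array is on the order of 3 GB while the CSR matrix is on the order of 140 MB, so the sparse form should fit comfortably on an ordinary laptop.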