This repository was archived by the owner on Nov 1, 2024. It is now read-only.

How can I create embeddings? #6

@stribizhev

I see the "template":

import numpy as np
from compute_pretrained_embeddings import get_embeddings

model = ...
dataloader = ...

path_str_type = ...
emb_memory_loc = ...
paths_memory_loc = ...
dataset_size = ...
emb_size = ...
# Embeddings and paths are stored in two separate memory-mapped files
emb_array = np.memmap(emb_memory_loc, dtype='float32', mode='w+', shape=(dataset_size, emb_size))
path_array = np.memmap(paths_memory_loc, dtype=path_str_type, mode='w+', shape=(dataset_size,))

get_embeddings(model, dataloader, emb_array, path_array)

How do I initialize these variables? Are there any prerequisites you have not mentioned (such as installing the "transformers" library)? Where do I get the values for path_str_type, emb_memory_loc, ...? Is there a complete sample script?

To clarify: I have a text file and I want to run SemDeDup on the sentences in it. How do I get through the first step?
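
For reference, here is a rough sketch of how the template's variables might be filled in for a list of sentences. It is not taken from the repository: the sample sentences, the use of line indices as "paths", the temporary file locations, the value emb_size = 8, and the toy_embed function (a stand-in for a real embedding model) are all assumptions made for illustration. The call to get_embeddings is left out because the model and dataloader it expects are not described in this issue.

```python
import os
import tempfile

import numpy as np

# Assumption: one sentence per example. In practice these would be read from the text file.
sentences = [
    "The cat sat on the mat.",
    "A cat sat on a mat.",
    "Stocks fell on Monday.",
]

dataset_size = len(sentences)  # number of examples to embed
emb_size = 8                   # assumption: must equal the real model's embedding dimension

# Assumption: each "path" is just a unique identifier; here the line index as a string.
ids = [str(i) for i in range(dataset_size)]
path_str_type = f"S{max(len(s) for s in ids)}"  # fixed-width byte string wide enough for every id

# Assumption: write the memmap files to a temporary directory.
out_dir = tempfile.mkdtemp()
emb_memory_loc = os.path.join(out_dir, "embeddings.npy")
paths_memory_loc = os.path.join(out_dir, "paths.npy")

emb_array = np.memmap(emb_memory_loc, dtype="float32", mode="w+",
                      shape=(dataset_size, emb_size))
path_array = np.memmap(paths_memory_loc, dtype=path_str_type, mode="w+",
                       shape=(dataset_size,))


def toy_embed(text, dim):
    """Hypothetical placeholder for a real model: a deterministic unit-norm vector."""
    rng = np.random.default_rng(sum(text.encode("utf-8")))
    vec = rng.standard_normal(dim).astype("float32")
    return vec / np.linalg.norm(vec)


# Fill row i of both arrays for example i.
for i, (sid, text) in enumerate(zip(ids, sentences)):
    emb_array[i] = toy_embed(text, emb_size)
    path_array[i] = sid.encode("utf-8")

# Make sure the data reaches disk before another process opens the files.
emb_array.flush()
path_array.flush()
```

With a real sentence encoder, toy_embed would be replaced by the model's output and emb_size by its embedding dimension; the rest of the setup would stay the same.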
