This README provides comprehensive guidance on setting up, running, and using the Embedding Service. The service leverages advanced AI models for text labeling and embedding, making it ideal for machine learning applications that require high-quality text representation.
The Embedding Service is a gRPC-based service designed to:
- Label Text: Generate meaningful labels using GPT-3.5 Turbo.
- Generate Embeddings: Convert labeled text into efficient vector representations using Text-Embedding-Small-3.
- JSON Integration: Deliver labeled and embedded data in a structured JSON format.
- Real-Time API: Enable real-time data streaming using a gRPC server for seamless integration with other systems.
- Text Labeling:
  - Uses GPT-3.5 Turbo to generate descriptive labels for input text.
  - Enhances the interpretability of raw text data.
- Text Embedding:
  - Converts labeled data into embeddings with Text-Embedding-Small-3 (a sketch of the labeling-plus-embedding flow follows this list).
  - Provides high-dimensional vector representations for downstream tasks such as clustering, classification, and search.
- JSON Output:
  - Structured output includes both the text labels and embeddings in an easily consumable JSON format.
- gRPC Server:
  - Offers a real-time API to stream JSON responses.
  - Designed for scalability and integration with client applications.
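To make the two model calls concrete, here is a minimal sketch of how a labeling-plus-embedding step could be wired together, assuming the OpenAI Python SDK (v1 or later) and that Text-Embedding-Small-3 refers to OpenAI's text-embedding-3-small model; the function name label_and_embed and the prompt wording are illustrative, not the service's actual Labeler/EmbeddingGenerator implementation.

```python
# Illustrative sketch only -- not the service's actual Labeler/EmbeddingGenerator code.
# Assumes the OpenAI Python SDK (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def label_and_embed(text: str) -> dict:
    # Step 1: ask GPT-3.5 Turbo for a short descriptive label.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Return a short descriptive label for the given text."},
            {"role": "user", "content": text},
        ],
    )
    label = completion.choices[0].message.content.strip()

    # Step 2: embed the labeled text. "text-embedding-3-small" is assumed to be
    # the model this README calls Text-Embedding-Small-3.
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=f"{label}: {text}",
    ).data[0].embedding

    return {"label": label, "embedding": embedding}
```

In the service itself, this logic lives in the Labeler and EmbeddingGenerator components under modules/services/.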
- Python Environment:
  - Python 3.8 or higher is recommended.
  - Install virtualenv for managing dependencies.
- gRPC Tools:
  - Ensure grpcio and grpcio-tools are installed for generating and running gRPC services.
- System Requirements:
  - Memory: At least 4GB of RAM.
  - Disk Space: 10GB of available space for dependencies and logs.
- Clone the Repository:

  ```bash
  git clone https://github.com/your-repository/embedding-service.git
  cd embedding-service
  ```
- Set Up Virtual Environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate   # For Linux/MacOS
  venv\Scripts\activate      # For Windows
  ```
- Install Dependencies: Install all necessary libraries from requirements.txt:

  ```bash
  pip install -r requirements.txt
  ```
- Generate gRPC Code: If changes have been made to the .proto file, regenerate the gRPC stubs:

  ```bash
  python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. modules/proto/embedding_service.proto
  ```
- Start the gRPC Server: Launch the server to handle incoming gRPC requests (a hedged sketch of this wiring appears after this list):

  ```bash
  python server.py
  ```

  Output:

  ```
  Server started, listening on port 50051.
  ```
- Test the Service: Run the provided client or use a custom client to send requests. Example:

  ```bash
  python client.py
  ```
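For orientation, the following is a rough sketch of the kind of wiring server.py performs; the generated module name embedding_service_pb2_grpc follows from the proto filename, but the service name EmbeddingService and the servicer import are assumptions, so consult server.py and the generated stubs for the real names.

```python
# Rough sketch of a gRPC server on port 50051. The names embedding_service_pb2_grpc,
# EmbeddingService, and the servicer import are assumptions -- check the real server.py.
from concurrent import futures

import grpc

from modules.proto import embedding_service_pb2_grpc
from modules.services import servicer  # hypothetical module exposing the servicer class

def serve() -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    embedding_service_pb2_grpc.add_EmbeddingServiceServicer_to_server(
        servicer.EmbeddingServiceServicer(), server
    )
    server.add_insecure_port("[::]:50051")
    server.start()
    print("Server started, listening on port 50051.")
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
```

Running python server.py should then produce the "Server started, listening on port 50051." output shown above.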
The Embedding Service accepts a text file as input and returns labeled and embedded data in JSON format via gRPC.
- Prepare a text file (e.g., example.txt) containing the raw text data.
- Send the file path via the gRPC client (see the placeholder client sketch after this list).
- Receive a JSON stream response with:
  - Labels: Generated by GPT-3.5 Turbo.
  - Embeddings: Produced by Text-Embedding-Small-3.
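As a rough illustration of this request/response flow, here is a placeholder streaming client; the stub class EmbeddingServiceStub, the RPC ProcessFile, and the fields file_path and json_payload stand in for whatever modules/proto/embedding_service.proto actually defines.

```python
# Placeholder streaming client -- RPC, message, and field names are illustrative,
# not taken from the actual embedding_service.proto.
import json

import grpc

from modules.proto import embedding_service_pb2, embedding_service_pb2_grpc

def stream_results(file_path: str) -> None:
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = embedding_service_pb2_grpc.EmbeddingServiceStub(channel)
        request = embedding_service_pb2.FileRequest(file_path=file_path)  # placeholder message
        # Placeholder server-streaming RPC returning JSON payloads.
        for response in stub.ProcessFile(request):
            payload = json.loads(response.json_payload)  # placeholder field holding the JSON
            print(payload["definition"]["collection_name"], len(payload["embeddings"]))

if __name__ == "__main__":
    stream_results("example.txt")
```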
Example JSON Response:

```json
{
  "definition": {
    "collection_name": "example_collection",
    "partition_name": "example_partition",
    "description": "A sample text file",
    "dimension": 128,
    "metric_type": "L2"
  },
  "embeddings": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
}
```
Project Structure:
- server.py: Starts the gRPC server.
- client.py: Example client to test the service.
- modules/:
  - proto/: Contains the .proto file and generated gRPC stubs.
  - services/: Contains the Labeler and EmbeddingGenerator logic.
  - template.py: Handles JSON template creation and validation.
- gRPC Not Found:
  - Ensure grpcio and grpcio-tools are installed:

    ```bash
    pip install grpcio grpcio-tools
    ```
- Serialization Errors:
  - Ensure the server is returning the correct response type defined in the .proto file (see the error-handling sketch after this list).
- Port Already in Use:
  - Check if another service is running on port 50051:

    ```bash
    lsof -i:50051                  # For Linux/MacOS
    netstat -ano | findstr :50051  # For Windows
    ```
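When diagnosing the serialization and connectivity issues above, wrapping the client call in a grpc.RpcError handler usually surfaces the underlying status code and message; the stub, RPC, and field names below are the same placeholders used in the earlier client sketch, not the service's real API.

```python
# Surface gRPC status codes and details when a call fails (placeholder RPC names).
import grpc

from modules.proto import embedding_service_pb2, embedding_service_pb2_grpc

with grpc.insecure_channel("localhost:50051") as channel:
    stub = embedding_service_pb2_grpc.EmbeddingServiceStub(channel)
    try:
        request = embedding_service_pb2.FileRequest(file_path="example.txt")  # placeholder
        for _ in stub.ProcessFile(request):  # placeholder streaming RPC
            pass
    except grpc.RpcError as err:
        # UNAVAILABLE usually means the server is not running or the port is blocked;
        # INTERNAL often points at a response that does not match the .proto definition.
        print(err.code(), err.details())
```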
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a feature branch.
- Submit a pull request.