Feature Request: Speech based Controls #51

Description

@rajeshgayathri2003

Hi @kaushav07 !

Speech-based control is listed as one of the core features of this project, but no one has taken it up yet. I would like to propose the following workflow:

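  1. Run a speech-to-text model to convert the user's spoken request to text.
  2. Use a vision model such as CLIP or DINOv2 to help complete the user's request based on visual input. (Neither is a captioning model as such: CLIP scores how well an image matches a piece of text, and DINOv2 extracts visual features.)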
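
To make the two steps concrete, here is a rough sketch of how they could fit together. It is only an illustration, not an agreed design: `SpeechControlPipeline`, `pick_target`, and the three callables are hypothetical names, and the actual speech-to-text and embedding models are left as plug-in functions rather than tied to any particular library. It assumes the vision step works by comparing a text embedding of the request with an embedding of each candidate image, which is how a CLIP-style model would be used. DINOv2 has no text encoder, so it would need a different matching scheme.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SpeechControlPipeline:
    # Step 1: hypothetical speech-to-text function (audio bytes -> text).
    transcribe: Callable[[bytes], str]
    # Step 2: hypothetical encoders that map text and images into a
    # shared embedding space (assumption: a CLIP-style model).
    embed_text: Callable[[str], Vector]
    embed_image: Callable[[Any], Vector]

    def pick_target(
        self, audio: bytes, candidates: Sequence[Any]
    ) -> Tuple[str, int, List[float]]:
        """Return the transcribed request, the index of the candidate
        image that best matches it, and all similarity scores."""
        if not candidates:
            raise ValueError("at least one candidate image is required")
        request = self.transcribe(audio)
        query = self.embed_text(request)
        scores = [cosine(query, self.embed_image(img)) for img in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        return request, best, scores
```

Keeping the models behind plain callables means the speech-to-text and vision parts can be swapped or tested separately before settling on specific models.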

Would be happy to take this up!
