bert-score

Here are 15 public repositories matching this topic...

Adii2202 / Athina-AI

Building RAG from scratch, with LangChain, and with LlamaIndex and the Qdrant database.

transformers bleu-score rag qdrant langchain llama-index mistral-7b bert-score

Updated Jun 13, 2024
Jupyter Notebook

prakhar-189 / LLM-Regression-Guard

End-to-end MLOps pipeline that catches LLM quality regressions before production. Every PR is scored against a versioned golden dataset using BERTScore + ROUGE-L + an LLM-as-Judge rubric, compared to the MLflow production baseline, and shadowed against 5% of live traffic. FastAPI + Celery + TimescaleDB + Streamlit + DVC + GitHub Actions.

Updated Jun 15, 2026
Python

itsubaki / reval

Evaluation metrics for ranking and retrieval

evaluation recall rouge precision ndcg bert-score

Updated Apr 14, 2026
Go

SankethSingh / Text-Translation_BERT

This repository contains a machine-translation model for French and English.

tokenizer machine-translation word-embeddings text-translation english-language encoder-decoder-model french-translation bert-model transformer-model bert-score

Updated Jun 13, 2025
Jupyter Notebook

shreyas21563 / VQA-using-BLIP

Leveraging the BLIP Model for Visual Question Answering: A Comparative Analysis on VQA and DAQUAR Datasets

machine-learning natural-language-processing computer-vision inference accuracy image-captioning bleu-score blip visual-question-answering wups vqav2 bert-score daquar

Updated Jun 18, 2024
Jupyter Notebook

MRPRESIDENT66 / CuisineRAG

A RAG-based culinary QA system built with custom chunking, vectorisation, retrieval, and generation components.

nlp faiss rag langchain qwen bert-score south-asian-cuisine

Updated Apr 23, 2026
Jupyter Notebook

mallasiddharthreddy / multimodal-summarizer-ai

Multi-modal Summarization App using Streamlit, Whisper, and Transformers for audio/text input with model comparison and evaluation.

nlp ai transformers summarization whisper streamlit bert-score

Updated Jul 10, 2025
Python

JhaAyush01 / RAG-Evaluation

Different approaches to evaluating RAG.

rag wandb giskard langchain vectara rag-evaluation hallucination-detection ragas bert-score

Updated Aug 13, 2024
Jupyter Notebook

pngo1997 / Text-Summarization-Generation-using-LLMs

Explores text summarization using multiple approaches.

python nlp text-generation fine-tuning encoder-decoder-model perplexity state-of-the-art-models google-t5 large-language-models llm prompt-engineering mistral-7b bert-score

Updated Feb 6, 2025
Jupyter Notebook

ikhimwinemmanuel / scientific_text_summarization_llms

Scientific text summarisation using LED, QLoRA fine-tuning, and HSSM evaluation on arXiv papers.

nlp machine-learning transformers pytorch led text-summarization rouge huggingface scientific-summarization generative-ai qlora bert-score

Updated Jun 6, 2026
Python

sodrooome / hybrid-llm-test

Hybrid LLM-based automated tests for generative AI models, for use by QA.

natural-language-processing llms bert-score

Updated Sep 10, 2025
Python

sarank-21 / AI_News_Intelligence_System

📰 End-to-end NLP pipeline for news intelligence — fine-tuned RoBERTa multilabel classifier, spaCy NER, T5/BART entity-aware summarization & 5-signal misinformation risk scoring. Served via Streamlit. 🧠📊🚀

python numpy transformers pandas pytorch spacy joblib langdetect shap streamlit bert-score rouge-score scikit-learn-

Updated Jun 15, 2026
Python

SanMog / Uroboros

Automated red-teaming framework for LLMs. Tests GPT-4o, Claude, Llama against OWASP Top 10 for LLMs using Red/Blue/Judge multi-agent architecture. Found 6 CRITICAL vulnerabilities in GPT-4o-mini in <3 min.

python owasp llama ai-safety security-testing red-teaming gpt4 prompt-injection llm-security bert-score

Updated Mar 15, 2026
Python

farithadnan / KB-AnswerScorer

A tool for evaluating LLM responses against a knowledge base of expert solutions.

python bleu-score f1-score rag openwebui bert-score

Updated Aug 27, 2025
Python

apayne185 / mBART-GPT-marionMT-machine-translations

Comparative evaluation of neural machine translation architectures (MarianMT, mBART-50, NLLB-200, GPT-2) across German, Spanish, and Arabic. Includes multi-metric scoring (BLEU, chrF, METEOR, BERTScore, LaBSE), cross-lingual semantic similarity analysis, LLM-as-judge evaluation via LangChain, and WMT14/OPUS-100 benchmark runs.

multilingual python nlp machine-translation transformers semantic-analysis cross-lingual bleu huggingface mbart multilingual-nlp labse langchain sentance-transformers bert-score marionmt

Updated Jun 19, 2026
Python
