BPlexica - AI-based Search Service

BPlexica is an AI-based search service. It performs web searches through SearXNG, and uses LangChain and Large Language Models (LLMs) to synthesize and analyze search results to provide users with detailed and accurate answers.

🚀 Key Features

  • AI-based Search: Performs web searches through a SearXNG instance
  • LLM Integration: Connect with OpenAI GPT models using LangChain (supports custom API URLs)
  • Real-time Streaming: Real-time response streaming via Server-Sent Events (SSE)
  • RESTful API: Modern API design based on FastAPI
  • Various Search Modes: Support for various focus modes including web, image, news, maps, etc.
  • Korean Language Support: Korean interface and response support
  • πŸ›‘οΈ Privacy Protection: Automatic detection and masking of Korean personally identifiable information (PII)
  • πŸ“Š Comprehensive Logging: Date-based log file management and automatic cleanup
  • Comprehensive Error Handling: Custom exceptions and detailed error messages

πŸ—οΈ Project Structure

bplexica/
├── app/                          # Core application code
│   ├── __init__.py
│   ├── main.py                   # FastAPI application entry point
│   ├── api/                      # API endpoints
│   │   ├── __init__.py
│   │   └── routers/
│   │       ├── __init__.py
│   │       └── search.py         # Search API router
│   ├── core/                     # Core configuration and utilities
│   │   ├── __init__.py
│   │   ├── config.py             # Environment configuration
│   │   ├── exceptions.py         # Custom exceptions
│   │   ├── logger.py             # Logging system
│   │   ├── middleware.py         # Privacy protection middleware
│   │   ├── privacy_filter.py     # Personal information detection and masking
│   │   └── prompts.py            # LLM prompt templates
│   ├── schemas/                  # Pydantic schemas
│   │   ├── __init__.py
│   │   └── search.py             # Search request/response schemas
│   └── services/                 # Business logic
│       ├── __init__.py
│       ├── llm_service.py        # LLM service
│       ├── search_service.py     # Search service coordination
│       └── searxng_client.py     # SearXNG client
├── searchxng/                    # SearXNG configuration
│   ├── settings.yml              # SearXNG settings file
│   ├── uwsgi.ini                 # uWSGI configuration
│   ├── limiter.toml              # Rate limiting configuration
│   └── searxng-docker/           # Docker container configuration
│       ├── docker-compose.yaml   # Docker Compose configuration
│       ├── Caddyfile             # Caddy web server configuration
│       └── searxng/              # SearXNG container configuration
├── tests/                        # Test files
│   ├── test_privacy_filter.py    # Personal information masking tests
│   └── test_privacy_protection.sh # Privacy protection integration tests
├── log/                          # Log files directory
│   ├── bplexica_YYYY-MM-DD.log   # General logs (by date)
│   └── bplexica_error_YYYY-MM-DD.log # Error logs (by date)
├── requirements.txt              # Python dependencies
└── README.md                     # Project documentation

🔧 Technology Stack

  • Web Framework: FastAPI (high-performance asynchronous web framework)
  • AI/ML: LangChain, OpenAI GPT (or other LLM providers)
  • Search Engine: SearXNG (meta search engine)
  • HTTP Client: HTTPX (asynchronous HTTP requests)
  • Data Validation: Pydantic (type safety and data validation)
  • Configuration Management: python-dotenv, pydantic-settings
  • HTML Parsing: BeautifulSoup4, lxml
  • Web Server: Uvicorn

📦 Installation and Setup

1. Clone Repository

git clone https://github.com/sh2orc/bplexica.git
cd bplexica

2. Create and Activate Virtual Environment

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

3. Install Dependencies

# Basic installation
pip install -r requirements.txt

# Or use Makefile (recommended)
make install

# Include development/testing dependencies
make install-dev

4. Set Environment Variables

Create a .env file in the project root and add the following settings:

# SearXNG Configuration
SEARXNG_BASE_URL=http://localhost:8080

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_URL=https://api.openai.com/v1
LLM_MODEL=gpt-3.5-turbo

# Embedding Model Configuration (optional)
EMBEDDING_MODEL=text-embedding-ada-002

# Logging Configuration (optional)
LOG_LEVEL=INFO                    # DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_DIR=log                       # Directory where log files will be stored
LOG_MAX_BYTES=10485760           # Maximum log file size (10MB)
LOG_BACKUP_COUNT=30              # Number of backup files
LOG_CLEANUP_DAYS=30              # Automatic deletion period for old log files (days)
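
As a rough illustration of how these variables might be read at startup, here is a standard-library sketch. The `Settings` class and `load_settings` function are hypothetical names, and treating the example values above as fallbacks is an assumption; the project itself lists pydantic-settings for configuration, so its real `config.py` may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names mirror the .env keys."""
    searxng_base_url: str
    openai_api_key: str
    openai_api_url: str
    llm_model: str
    log_level: str
    log_dir: str
    log_max_bytes: int
    log_backup_count: int
    log_cleanup_days: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping.

    Fallback values are assumptions copied from the example .env above.
    """
    return Settings(
        searxng_base_url=env.get("SEARXNG_BASE_URL", "http://localhost:8080"),
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        openai_api_url=env.get("OPENAI_API_URL", "https://api.openai.com/v1"),
        llm_model=env.get("LLM_MODEL", "gpt-3.5-turbo"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        log_dir=env.get("LOG_DIR", "log"),
        # Numeric settings arrive as strings and are converted explicitly.
        log_max_bytes=int(env.get("LOG_MAX_BYTES", "10485760")),
        log_backup_count=int(env.get("LOG_BACKUP_COUNT", "30")),
        log_cleanup_days=int(env.get("LOG_CLEANUP_DAYS", "30")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.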

5. SearXNG Configuration

Run a SearXNG instance using Docker:

# Direct execution
cd searchxng/searxng-docker
docker-compose up -d

# Or use Makefile (recommended)
make run-searxng

Verify that SearXNG is running at http://localhost:8080.

🚀 Running the Application

Development Mode

# Direct execution
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Or use Makefile (recommended)
make run-server

Production Mode

uvicorn app.main:app --host 0.0.0.0 --port 8000

Quick Start (Using Makefile)

# Set up the entire development environment
make quickstart

# Run the API server in a separate terminal
make run-server

Using Docker

# Run development environment with Docker (recommended)
make quickstart-docker

# Or manually
make docker-build
make docker-dev

# Production environment
make docker-run

# Check logs
make docker-logs

# Stop containers
make docker-stop

The application will run at http://localhost:8000.

📚 API Usage

Interactive API Documentation

You can explore and test the API through the interactive documentation at the following URLs:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Search API Endpoint

POST /api/v1/search

Performs a regular or streaming search.

Request Example:

{
  "query": "What is FastAPI?",
  "focus_mode": "web",
  "stream": false
}

Response Example:

{
  "message": "FastAPI is a modern, fast web framework for Python...",
  "sources": [
    {
      "title": "FastAPI Official Documentation",
      "url": "https://fastapi.tiangolo.com/",
      "snippet": "FastAPI is a modern, fast web framework for Python 3.7+..."
    }
  ]
}
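
For readers calling the endpoint from Python, the request above could be sent with the standard library roughly as follows. This is a sketch under assumptions: `build_search_request` and `search` are hypothetical helper names, and the server is assumed to be reachable at the base URL you pass in.

```python
import json
import urllib.request


def build_search_request(base_url: str, query: str, focus_mode: str = "web",
                         stream: bool = False) -> urllib.request.Request:
    """Create the POST /api/v1/search request with a JSON body."""
    payload = {"query": query, "focus_mode": focus_mode, "stream": stream}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, query: str) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_search_request(base_url, query)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `search("http://localhost:8000", "What is FastAPI?")["message"]` would return the answer text, assuming the response has the shape shown in the example above.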

GET /api/v1/search

Simple search using query parameters.

Request Example:

GET /api/v1/search?query=Python&focus_mode=web
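
Queries that contain spaces or punctuation need URL encoding before they go into the query string. A minimal sketch, where `build_search_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, focus_mode: str = "web") -> str:
    """Return the GET /api/v1/search URL with safely encoded parameters."""
    params = urlencode({"query": query, "focus_mode": focus_mode})
    return f"{base_url.rstrip('/')}/api/v1/search?{params}"
```

For example, `build_search_url("http://localhost:8000", "What is FastAPI?")` encodes the space as `+` and the question mark as `%3F`.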

Streaming Search

Set stream: true to receive real-time responses via Server-Sent Events (SSE).

JavaScript Client Example:

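// EventSource only supports GET requests, so POST with fetch() and read the SSE stream manually.
async function streamSearch() {
  const response = await fetch('/api/v1/search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      query: 'The future of artificial intelligence',
      focus_mode: 'web',
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep a partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith('data:')) continue;
      const data = JSON.parse(line.slice(5).trim());
      console.log('Response chunk:', data.content);
    }
  }
}

streamSearch();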
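
A Python consumer can parse the same stream line by line. The sketch below assumes that every `data:` line carries one complete JSON object with a `content` field, as the JavaScript example suggests; `iter_sse_data` is a hypothetical helper name, and multi-line SSE events are not handled.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)
```

Any iterable of text lines works as input, for example the decoded lines of a streaming HTTP response, so the answer can be rebuilt with `"".join(chunk["content"] for chunk in iter_sse_data(lines))`.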

🎯 Key Components

Search Service (SearchService)

  • Coordinates SearXNG client and LLM service
  • Handles regular and streaming search requests
  • Context extraction and source management

SearXNG Client (SearxngClient)

  • Asynchronous communication with SearXNG instance
  • Support for various search categories
  • Browser user agent emulation
  • Comprehensive error handling
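
To make the category support concrete, here is a sketch of how a search URL for SearXNG's JSON API might be assembled. The `q`, `categories`, and `format` parameters belong to SearXNG's search API, and JSON output has to be enabled under `search.formats` in `settings.yml`. The focus-mode mapping and the `build_searxng_url` name are assumptions for illustration, not the project's actual code.

```python
from urllib.parse import urlencode

# Hypothetical mapping from BPlexica focus modes to SearXNG categories.
FOCUS_MODE_TO_CATEGORY = {
    "web": "general",
    "image": "images",
    "news": "news",
    "maps": "map",
}


def build_searxng_url(base_url: str, query: str, focus_mode: str = "web") -> str:
    """Return a SearXNG search URL that requests JSON results."""
    # Unknown focus modes fall back to the general category (an assumption).
    category = FOCUS_MODE_TO_CATEGORY.get(focus_mode, "general")
    params = urlencode({"q": query, "categories": category, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"
```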

LLM Service (LLMService)

  • OpenAI GPT model integration through LangChain
  • Regular and streaming response generation
  • Uses custom prompt templates
  • Ready to support multiple LLM providers

Exception Handling

Fine-grained error handling through custom exception classes:

  • SearxngConnectionError: SearXNG connection failure
  • SearxngRateLimitError: Rate limit exceeded
  • SearxngSearchError: Search service error
  • LLMProcessingError: LLM processing error
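
One way such a hierarchy could be laid out is sketched below. The four class names come from the list above; the shared `BPlexicaError` base class, the HTTP status codes, and the `status_code_for` helper are assumptions made for illustration.

```python
class BPlexicaError(Exception):
    """Hypothetical common base class; not named in this README."""


class SearxngConnectionError(BPlexicaError):
    """SearXNG connection failure."""


class SearxngRateLimitError(BPlexicaError):
    """Rate limit exceeded."""


class SearxngSearchError(BPlexicaError):
    """Search service error."""


class LLMProcessingError(BPlexicaError):
    """LLM processing error."""


# Assumed mapping from exception type to HTTP status code.
STATUS_BY_EXCEPTION = {
    SearxngConnectionError: 503,
    SearxngRateLimitError: 429,
    SearxngSearchError: 502,
    LLMProcessingError: 500,
}


def status_code_for(exc: Exception) -> int:
    """Pick a status code for an exception, defaulting to 500."""
    for exc_type, status in STATUS_BY_EXCEPTION.items():
        if isinstance(exc, exc_type):
            return status
    return 500
```

A FastAPI exception handler could use a lookup like this to turn each failure into a distinct response code instead of a generic 500.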

🔧 Developer Guide

Custom OpenAI API URL

You can use various OpenAI-compatible services:

# Azure OpenAI
OPENAI_API_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment

# OpenAI proxy server
OPENAI_API_URL=https://your-proxy-server.com/v1

# Local LLM (e.g., Ollama, LocalAI)
OPENAI_API_URL=http://localhost:11434/v1

# Default OpenAI API
OPENAI_API_URL=https://api.openai.com/v1

Adding New LLM Providers

You can add new providers in app/services/llm_service.py:

elif model_provider == "anthropic":
    from langchain_anthropic import ChatAnthropic
    self.llm = ChatAnthropic(
        model=self.model_name,
        api_key=self.api_key
    )

Modifying Prompt Templates

You can modify prompt templates in app/core/prompts.py to adjust the style and format of AI responses.

Adding New Search Categories

You can add new search engines or categories by modifying the SearXNG configuration (searchxng/settings.yml).

🧪 Testing

Install the test dependencies and run the test suite:

pip install pytest pytest-asyncio
pytest tests/

📊 Logging and Monitoring

BPlexica provides a comprehensive logging system to monitor and debug the application's behavior.

Logging Features

  • Date-based Log Files: Automatically created in log/bplexica_YYYY-MM-DD.log format
  • Error-only Logs: Error level and above logs recorded in log/bplexica_error_YYYY-MM-DD.log
  • Log Rotation: Automatic rotation when file size reaches the set threshold
  • Automatic Cleanup: Automatic deletion of old log files (default 30 days)
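
The naming and cleanup rules above can be sketched with the standard library as follows. The `log_path` and `cleanup_old_logs` helpers are hypothetical and only mirror the file name patterns described here; the project's `logger.py` may work differently.

```python
import re
from datetime import date, timedelta
from pathlib import Path
from typing import List

# Matches bplexica_YYYY-MM-DD.log and bplexica_error_YYYY-MM-DD.log,
# including rotated backups such as "...log.1".
LOG_NAME = re.compile(r"^bplexica_(?:error_)?(\d{4}-\d{2}-\d{2})\.log")


def log_path(log_dir: Path, day: date, error: bool = False) -> Path:
    """Return the log file path for a given day."""
    prefix = "bplexica_error_" if error else "bplexica_"
    return log_dir / f"{prefix}{day.isoformat()}.log"


def cleanup_old_logs(log_dir: Path, today: date, keep_days: int = 30) -> List[str]:
    """Delete log files dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for path in sorted(log_dir.glob("bplexica_*.log*")):
        match = LOG_NAME.match(path.name)
        # The date is read from the file name, not from the file's mtime.
        if match and date.fromisoformat(match.group(1)) < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Reading the date from the file name rather than from the modification time keeps the cleanup predictable when files are copied or restored from backup.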

Log Categories

  1. HTTP Request Logs

    • Client IP, request method, URL, response code, processing time
    • Search query and configuration information
  2. Search Service Logs

    • Search request start/completion time
    • SearXNG search result collection information
    • Context extraction and source management
  3. LLM Service Logs

    • Model name used, prompt length, response length
    • Processing time and token usage tracking
    • Number of streaming chunks and total response length
  4. System Logs

    • Application start/stop
    • Configuration changes
    • Error and exception information

Log Configuration

You can control logging behavior through environment variables:

# Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
LOG_LEVEL=INFO

# Log file storage directory
LOG_DIR=log

# Maximum log file size (bytes)
LOG_MAX_BYTES=10485760  # 10MB

# Number of backup files
LOG_BACKUP_COUNT=30

# Automatic deletion period for old log files (days)
LOG_CLEANUP_DAYS=30

Log Monitoring

# Real-time log monitoring
tail -f log/bplexica_$(date +%Y-%m-%d).log

# Monitor error logs only
tail -f log/bplexica_error_$(date +%Y-%m-%d).log

# Search for specific keywords
grep "search request" log/bplexica_*.log

# Find slow requests (rough filter: matches times such as 5.0s-9.9s, but misses 10.0s-14.9s)
grep "processing time.*[5-9]\.[0-9]" log/bplexica_*.log

Log Analysis Examples

# Daily request count statistics
grep "search API call" log/bplexica_2025-07-18.log | wc -l

# Calculate average response time
grep "processing time" log/bplexica_2025-07-18.log | \
  grep -o '[0-9]\+\.[0-9]\+s' | sed 's/s//' | \
  awk '{sum+=$1; count++} END {print "Average:", sum/count, "seconds"}'

# Error frequency analysis
grep "ERROR" log/bplexica_error_2025-07-18.log | \
  cut -d'|' -f4 | sort | uniq -c | sort -nr
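
The average response time calculation can also be done in Python, which is easier to extend than a shell pipeline. This sketch assumes, like the grep commands above, that relevant lines contain the text "processing time" followed by a duration such as `1.50s`; the exact log format and the `average_processing_time` name are assumptions.

```python
import re
from typing import Iterable, Optional

# First duration like "1.50s" that follows the words "processing time".
DURATION = re.compile(r"processing time.*?(\d+\.\d+)s")


def average_processing_time(lines: Iterable[str]) -> Optional[float]:
    """Return the mean duration in seconds, or None if no line matches."""
    durations = []
    for line in lines:
        match = DURATION.search(line)
        if match:
            durations.append(float(match.group(1)))
    if not durations:
        return None  # avoids dividing by zero on an empty log
    return sum(durations) / len(durations)
```

To analyze a real file, pass an open file handle, for example `average_processing_time(open("log/bplexica_2025-07-18.log", encoding="utf-8"))`.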

Performance Monitoring

You can monitor the following performance metrics through logs:

  • Response Time: Processing time for each API request
  • LLM Performance: Processing time and token usage by model
  • Search Performance: SearXNG search and result processing time
  • Error Rate: Error frequency by time period/function

πŸ›‘οΈ Privacy Protection

BPlexica provides automatic detection and masking of Korean personally identifiable information (PII). It detects and safely masks personal information before processing any API request.

Supported Personal Information Types

  • Resident Registration Number: 901212-1234567 → 901212-*******
  • Foreign Registration Number: 901212-5123456 → 901212-*******
  • Passport Number: M12345678 → M1****78
  • Driver's License Number: 11-12-123456-78 → 11-12-****-**78
  • Email: test@example.com → te**@e******.com
  • Mobile Phone Number: 010-1234-5678 → 010-****-5678
  • Phone Number: 02-1234-5678 → 02-****-5678
  • Card Number: 1234-5678-9012-3456 → 1234-****-****-3456
  • Account Number: 123-45-678901 → 123-**-**8901
  • Business Registration Number: 123-45-67890 → 123-**-***90
  • Corporate Registration Number: 123456-1234567 → 123456-****567
  • IP Address: 192.168.1.100 → 192.168.*******
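
To show how regex-based masking of this kind can work, here is a sketch covering three of the formats above. The patterns and the `mask_pii` name are illustrative assumptions, not the rules in `privacy_filter.py`. In particular, this sketch cannot tell a resident registration number from a corporate registration number, since both have the same 6-7 digit shape.

```python
import re

# (pattern, replacement) pairs; these regexes are illustrative assumptions.
MASK_RULES = [
    # Card number: keep the first and last groups of four digits.
    (re.compile(r"\b(\d{4})-\d{4}-\d{4}-(\d{4})\b"), r"\1-****-****-\2"),
    # Resident registration number: keep the six-digit birth-date part.
    (re.compile(r"\b(\d{6})-\d{7}\b"), r"\1-*******"),
    # Mobile phone number: keep the prefix and the last four digits.
    (re.compile(r"\b(01[016789])-\d{3,4}-(\d{4})\b"), r"\1-****-\2"),
]


def mask_pii(text: str) -> str:
    """Apply every masking rule in order and return the masked text."""
    for pattern, replacement in MASK_RULES:
        text = pattern.sub(replacement, text)
    return text
```

For instance, `mask_pii("call 010-1234-5678")` returns `"call 010-****-5678"`, matching the mobile phone example in the list.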

Privacy Protection Features

  1. Automatic Detection: Real-time personal information detection using regular expressions
  2. Safe Masking: Mask sensitive parts and keep only necessary parts
  3. Detailed Logging: Record detected personal information types and locations in logs
  4. Response Headers: Indicate protection status with X-Privacy-Protected, X-PII-Detected headers
  5. Multi-layer Processing: Support for URL parameters, JSON body, and nested structures
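
Handling nested structures usually comes down to a recursive walk over the parsed JSON body. The following sketch applies a masking function to every string it finds; `mask_nested` and `mask_phone` are hypothetical names, and the single phone pattern stands in for the full rule set.

```python
import re
from typing import Any, Callable

# Illustrative pattern for Korean mobile numbers (an assumption).
PHONE = re.compile(r"\b(01[016789])-\d{3,4}-(\d{4})\b")


def mask_phone(text: str) -> str:
    """Mask the middle digits of a mobile phone number."""
    return PHONE.sub(r"\1-****-\2", text)


def mask_nested(value: Any, mask: Callable[[str], str] = mask_phone) -> Any:
    """Return a copy of a JSON-like value with every string masked."""
    if isinstance(value, str):
        return mask(value)
    if isinstance(value, dict):
        return {key: mask_nested(item, mask) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_nested(item, mask) for item in value]
    return value  # numbers, booleans, and None pass through unchanged
```

Because the function builds new containers instead of editing in place, the original request body stays available if it is needed for comparison or auditing.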

Privacy Protection Testing

# Run privacy protection feature test
./tests/test_privacy_protection.sh

# Real-time monitoring of personal information detection logs
tail -f log/bplexica_$(date +%Y-%m-%d).log | grep "personal_information"

# Search for specific personal information types
grep "personal_information detection.*resident_registration_number" log/bplexica_*.log

Privacy Protection Settings

The privacy protection feature is enabled by default, and the following paths are excluded:

  • /docs - API documentation
  • /redoc - API documentation (ReDoc)
  • /openapi.json - OpenAPI schema
  • /favicon.ico - Favicon
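
A middleware could decide whether to run the filter with a small path check like the one below. Exact matching after stripping a trailing slash is an assumption, as is the `should_filter` name; the real middleware may use prefix matching instead.

```python
# Paths listed above as excluded from privacy filtering.
EXCLUDED_PATHS = frozenset({"/docs", "/redoc", "/openapi.json", "/favicon.ico"})


def should_filter(path: str) -> bool:
    """Return True when the privacy filter should process this request path."""
    normalized = path.rstrip("/") or "/"  # treat "/docs/" like "/docs"
    return normalized not in EXCLUDED_PATHS
```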

Privacy Middleware Features

  • Priority Processing: Executed as the very first step of all requests
  • Lossless Masking: Maintains the structure and form of the original data as much as possible
  • Performance Optimization: Fast regex matching and efficient string processing
  • Detailed Auditing: Records all personal information detection and masking processes in logs

📄 License

This project is distributed under the MIT License. See the LICENSE file for more information.

πŸ™ Acknowledgements

  • FastAPI - Modern Python web framework
  • LangChain - LLM application development framework
  • SearXNG - Privacy-focused meta search engine
  • OpenAI - GPT model provider

📞 Contact

If you have questions or suggestions about the project, please create an issue or contact us.


BPlexica - A smarter search experience with AI
