BPlexica - AI-based Search Service

BPlexica is an AI-based search service. It performs web searches through SearXNG, and uses LangChain and Large Language Models (LLMs) to synthesize and analyze search results to provide users with detailed and accurate answers.

🚀 Key Features

AI-based Search: Performs web searches through a SearXNG instance
LLM Integration: Connect with OpenAI GPT models using LangChain (supports custom API URLs)
Real-time Streaming: Real-time response streaming via Server-Sent Events (SSE)
RESTful API: Modern API design based on FastAPI
Various Search Modes: Support for various focus modes including web, image, news, maps, etc.
Korean Language Support: Korean interface and response support
🛡️ Privacy Protection: Automatic detection and masking of Korean personally identifiable information (PII)
📊 Comprehensive Logging: Date-based log file management and automatic cleanup
Comprehensive Error Handling: Custom exceptions and detailed error messages

🏗️ Project Structure

bplexica/
├── app/                          # Core application code
│   ├── __init__.py
│   ├── main.py                   # FastAPI application entry point
│   ├── api/                      # API endpoints
│   │   ├── __init__.py
│   │   └── routers/
│   │       ├── __init__.py
│   │       └── search.py         # Search API router
│   ├── core/                     # Core configuration and utilities
│   │   ├── __init__.py
│   │   ├── config.py             # Environment configuration
│   │   ├── exceptions.py         # Custom exceptions
│   │   ├── logger.py             # Logging system
│   │   ├── middleware.py         # Privacy protection middleware
│   │   ├── privacy_filter.py     # Personal information detection and masking
│   │   └── prompts.py            # LLM prompt templates
│   ├── schemas/                  # Pydantic schemas
│   │   ├── __init__.py
│   │   └── search.py             # Search request/response schemas
│   └── services/                 # Business logic
│       ├── __init__.py
│       ├── llm_service.py        # LLM service
│       ├── search_service.py     # Search service coordination
│       └── searxng_client.py     # SearXNG client
├── searchxng/                    # SearXNG configuration
│   ├── settings.yml              # SearXNG settings file
│   ├── uwsgi.ini                 # uWSGI configuration
│   ├── limiter.toml              # Rate limiting configuration
│   └── searxng-docker/           # Docker container configuration
│       ├── docker-compose.yaml   # Docker Compose configuration
│       ├── Caddyfile             # Caddy web server configuration
│       └── searxng/              # SearXNG container configuration
├── tests/                        # Test files
│   ├── test_privacy_filter.py    # Personal information masking tests
│   └── test_privacy_protection.sh # Privacy protection integration tests
├── log/                          # Log files directory
│   ├── bplexica_YYYY-MM-DD.log   # General logs (by date)
│   └── bplexica_error_YYYY-MM-DD.log # Error logs (by date)
├── requirements.txt              # Python dependencies
└── README.md                     # Project documentation

🔧 Technology Stack

Web Framework: FastAPI (high-performance asynchronous web framework)
AI/ML: LangChain, OpenAI GPT (or other LLM providers)
Search Engine: SearXNG (meta search engine)
HTTP Client: HTTPX (asynchronous HTTP requests)
Data Validation: Pydantic (type safety and data validation)
Configuration Management: python-dotenv, pydantic-settings
HTML Parsing: BeautifulSoup4, lxml
Web Server: Uvicorn

📦 Installation and Setup

1. Clone Repository

git clone https://github.com/sh2orc/bplexica.git
cd bplexica

2. Create and Activate Virtual Environment

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

3. Install Dependencies

# Basic installation
pip install -r requirements.txt

# Or use Makefile (recommended)
make install

# Include development/testing dependencies
make install-dev

4. Set Environment Variables

Create a .env file in the project root and add the following settings:

# SearXNG Configuration
SEARXNG_BASE_URL=http://localhost:8080

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_URL=https://api.openai.com/v1
LLM_MODEL=gpt-3.5-turbo

# Embedding Model Configuration (optional)
EMBEDDING_MODEL=text-embedding-ada-002

# Logging Configuration (optional)
LOG_LEVEL=INFO                    # DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_DIR=log                       # Directory where log files will be stored
LOG_MAX_BYTES=10485760           # Maximum log file size (10MB)
LOG_BACKUP_COUNT=30              # Number of backup files
LOG_CLEANUP_DAYS=30              # Automatic deletion period for old log files (days)
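
The variables above could be read into a typed settings object roughly as follows. The technology stack lists pydantic-settings, so app/core/config.py presumably relies on it; the sketch below is a standard-library stand-in, and the `Settings` class, its field names, and `load_settings` are assumptions that simply mirror the example .env.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the example .env file."""
    searxng_base_url: str
    openai_api_key: str
    openai_api_url: str
    llm_model: str
    embedding_model: str
    log_level: str
    log_dir: str
    log_max_bytes: int
    log_backup_count: int
    log_cleanup_days: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable, falling back to the values shown in the example .env."""
    return Settings(
        # Strip a trailing slash so paths can be appended safely
        searxng_base_url=env.get("SEARXNG_BASE_URL", "http://localhost:8080").rstrip("/"),
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        openai_api_url=env.get("OPENAI_API_URL", "https://api.openai.com/v1"),
        llm_model=env.get("LLM_MODEL", "gpt-3.5-turbo"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-ada-002"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        log_dir=env.get("LOG_DIR", "log"),
        # Numeric values arrive as strings and are converted explicitly
        log_max_bytes=int(env.get("LOG_MAX_BYTES", "10485760")),
        log_backup_count=int(env.get("LOG_BACKUP_COUNT", "30")),
        log_cleanup_days=int(env.get("LOG_CLEANUP_DAYS", "30")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.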

5. SearXNG Configuration

Run a SearXNG instance using Docker:

# Direct execution
cd searchxng/searxng-docker
docker-compose up -d

# Or use Makefile (recommended)
make run-searxng

Verify that SearXNG is running at http://localhost:8080.

🚀 Running the Application

Development Mode

# Direct execution
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Or use Makefile (recommended)
make run-server

Production Mode

uvicorn app.main:app --host 0.0.0.0 --port 8000

Quick Start (Using Makefile)

# Set up the entire development environment
make quickstart

# Run the API server in a separate terminal
make run-server

Using Docker

# Run development environment with Docker (recommended)
make quickstart-docker

# Or manually
make docker-build
make docker-dev

# Production environment
make docker-run

# Check logs
make docker-logs

# Stop containers
make docker-stop

The application will run at http://localhost:8000.

📚 API Usage

Interactive API Documentation

Interactive API documentation is available at the following URLs (Swagger UI also lets you send test requests):

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Search API Endpoint

POST /api/v1/search

Performs regular or streaming search.

Request Example:

{
  "query": "What is FastAPI?",
  "focus_mode": "web",
  "stream": false
}

Response Example:

{
  "message": "FastAPI is a modern, fast web framework for Python...",
  "sources": [
    {
      "title": "FastAPI Official Documentation",
      "url": "https://fastapi.tiangolo.com/",
      "snippet": "FastAPI is a modern, fast web framework for Python 3.7+..."
    }
  ]
}
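
A minimal Python client for the request above might look like this. It uses only the standard library; the base address assumes the default local setup, and the helper names `build_search_request` and `search` are illustrative, not part of the project.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/search"  # assumes the default local address


def build_search_request(query: str, focus_mode: str = "web",
                         stream: bool = False) -> urllib.request.Request:
    """Prepare a POST request carrying the JSON body shown in the request example."""
    body = json.dumps(
        {"query": query, "focus_mode": focus_mode, "stream": stream}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, focus_mode: str = "web", timeout: float = 60.0) -> dict:
    """Send a non-streaming search and return the decoded JSON response."""
    request = build_search_request(query, focus_mode, stream=False)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `search("What is FastAPI?")["message"]` would hold the generated answer and `["sources"]` the list of cited pages, following the response example.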

GET /api/v1/search

Simple search using query parameters.

Request Example:

GET /api/v1/search?query=Python&focus_mode=web
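
Queries containing spaces or non-ASCII text (Korean, for example) need to be percent-encoded before they go into the URL. A small sketch, where `build_get_url` and the default base address are assumptions:

```python
from urllib.parse import urlencode


def build_get_url(query: str, focus_mode: str = "web",
                  base_url: str = "http://localhost:8000") -> str:
    """Build the GET search URL with properly encoded query parameters."""
    params = urlencode({"query": query, "focus_mode": focus_mode})
    return f"{base_url.rstrip('/')}/api/v1/search?{params}"
```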

Streaming Search

Set stream: true to receive real-time responses via Server-Sent Events (SSE).

JavaScript Client Example:

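// EventSource only supports GET requests, so the POST stream is read with fetch().
async function streamSearch() {
  const response = await fetch('/api/v1/search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      query: 'The future of artificial intelligence',
      focus_mode: 'web',
      stream: true
    })
  });

  const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const events = buffer.split(/\r?\n\r?\n/);
    buffer = events.pop();  // keep a trailing partial event for the next read
    for (const event of events) {
      const line = event.split(/\r?\n/).find((l) => l.startsWith('data:'));
      if (!line) continue;
      const data = JSON.parse(line.slice(5));
      console.log('Response chunk:', data.content);
    }
  }
}

streamSearch();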
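
For Python clients, the stream can be consumed by parsing the SSE framing by hand. The sketch below assumes each event carries a JSON object with a `content` field, as the JavaScript example suggests; the function names are illustrative, and the exact event format should be checked against the running server.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # The SSE format allows one optional space after the colon
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here
    if buffer:
        # Lenient: flush an event that was not followed by a blank line
        yield "\n".join(buffer)


def collect_answer(lines: Iterable[str]) -> str:
    """Concatenate the 'content' field of every JSON chunk in the stream."""
    parts = []
    for payload in iter_sse_data(lines):
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip non-JSON payloads such as an end-of-stream marker
        if isinstance(chunk, dict) and "content" in chunk:
            parts.append(str(chunk["content"]))
    return "".join(parts)
```

Any iterable of decoded lines works as input, for instance the lines of a streaming HTTP response.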

🎯 Key Components

Search Service (SearchService)

Coordinates SearXNG client and LLM service
Handles regular and streaming search requests
Context extraction and source management

SearXNG Client (SearxngClient)

Asynchronous communication with SearXNG instance
Support for various search categories
Browser user agent emulation
Comprehensive error handling
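
As a rough illustration of what the client does, the sketch below builds a request URL for SearXNG's JSON search endpoint and reduces the reply to the title/url/snippet shape used in the API responses. The focus-mode-to-category mapping and both function names are assumptions, not the project's actual code, and the JSON output format must be enabled in the SearXNG settings for such a request to succeed.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Assumed mapping from BPlexica focus modes to SearXNG categories
FOCUS_TO_CATEGORY = {
    "web": "general",
    "image": "images",
    "news": "news",
    "maps": "map",
}


def build_searxng_url(base_url: str, query: str, focus_mode: str = "web") -> str:
    """Build a SearXNG /search URL that asks for JSON results."""
    params = {
        "q": query,
        "format": "json",
        "categories": FOCUS_TO_CATEGORY.get(focus_mode, "general"),
    }
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


def to_sources(payload: Dict, limit: int = 5) -> List[Dict[str, str]]:
    """Keep the first `limit` results as title/url/snippet dictionaries."""
    sources = []
    for item in payload.get("results", [])[:limit]:
        sources.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            # SearXNG places the result excerpt in the "content" field
            "snippet": item.get("content", ""),
        })
    return sources
```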

LLM Service (LLMService)

OpenAI GPT model integration through LangChain
Regular and streaming response generation
Uses custom prompt templates
Ready to support multiple LLM providers

Exception Handling

Fine-grained error handling through custom exception classes:

SearxngConnectionError: SearXNG connection failure
SearxngRateLimitError: Rate limit exceeded
SearxngSearchError: Search service error
LLMProcessingError: LLM processing error
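
One way such a hierarchy could be laid out is sketched below. The four exception names come from the list above; the shared base class `BPlexicaError`, the HTTP status codes, and `to_error_response` are assumptions made for illustration, and app/core/exceptions.py may be organized differently.

```python
from typing import Dict, Tuple


class BPlexicaError(Exception):
    """Hypothetical common base class for application errors."""


class SearxngConnectionError(BPlexicaError):
    """SearXNG connection failure."""


class SearxngRateLimitError(BPlexicaError):
    """Rate limit exceeded."""


class SearxngSearchError(BPlexicaError):
    """Search service error."""


class LLMProcessingError(BPlexicaError):
    """LLM processing error."""


# Assumed status codes; the real API may map these differently
STATUS_BY_ERROR = {
    SearxngConnectionError: 503,
    SearxngRateLimitError: 429,
    SearxngSearchError: 502,
    LLMProcessingError: 500,
}


def to_error_response(exc: Exception) -> Tuple[int, Dict[str, str]]:
    """Translate an exception into an HTTP status code and a JSON-ready body."""
    status = 500  # anything unrecognized is treated as an internal error
    for error_type, code in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            status = code
            break
    return status, {"error": type(exc).__name__, "detail": str(exc)}
```

Catching the base class in one place keeps the router code short while still letting callers react to a specific failure type.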

🔧 Developer Guide

Custom OpenAI API URL

You can use various OpenAI-compatible services:

# Azure OpenAI
OPENAI_API_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment

# OpenAI proxy server
OPENAI_API_URL=https://your-proxy-server.com/v1

# Local LLM (e.g., Ollama, LocalAI)
OPENAI_API_URL=http://localhost:11434/v1

# Default OpenAI API
OPENAI_API_URL=https://api.openai.com/v1

Adding New LLM Providers

You can add new providers in app/services/llm_service.py:

elif model_provider == "anthropic":
    from langchain_anthropic import ChatAnthropic
    self.llm = ChatAnthropic(
        model=self.model_name,
        api_key=self.api_key
    )

Modifying Prompt Templates

You can modify prompt templates in app/core/prompts.py to adjust the style and format of AI responses.

Adding New Search Categories

You can add new search engines or categories by modifying the SearXNG configuration (searchxng/settings.yml).

🧪 Testing

Test framework setup:

pip install pytest pytest-asyncio
pytest tests/

📊 Logging and Monitoring

BPlexica provides a comprehensive logging system to monitor and debug the application's behavior.

Logging Features

Date-based Log Files: Automatically created in log/bplexica_YYYY-MM-DD.log format
Error-only Logs: Error level and above logs recorded in log/bplexica_error_YYYY-MM-DD.log
Log Rotation: Automatic rotation when file size reaches the set threshold
Automatic Cleanup: Automatic deletion of old log files (default 30 days)
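
The four features above can be approximated with the standard logging module. This is a sketch under assumptions, not the contents of app/core/logger.py: the function names, the pipe-separated line format, and the logger name "bplexica" are all illustrative.

```python
import logging
import time
from datetime import date
from logging.handlers import RotatingFileHandler
from pathlib import Path
from typing import List, Optional


def build_logger(log_dir: str = "log", level: str = "INFO",
                 max_bytes: int = 10_485_760, backup_count: int = 30) -> logging.Logger:
    """Create a logger with a dated general file and a dated error-only file."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    today = date.today().isoformat()  # YYYY-MM-DD
    formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")

    logger = logging.getLogger("bplexica")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers if called more than once

    general = RotatingFileHandler(directory / f"bplexica_{today}.log",
                                  maxBytes=max_bytes, backupCount=backup_count,
                                  encoding="utf-8")
    errors = RotatingFileHandler(directory / f"bplexica_error_{today}.log",
                                 maxBytes=max_bytes, backupCount=backup_count,
                                 encoding="utf-8")
    errors.setLevel(logging.ERROR)  # only ERROR and above reach this file

    for handler in (general, errors):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def cleanup_old_logs(log_dir: str = "log", max_age_days: int = 30,
                     now: Optional[float] = None) -> List[str]:
    """Delete log files last modified more than max_age_days ago; return their names."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed = []
    for path in Path(log_dir).glob("bplexica_*.log*"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Note that the file name in this sketch is fixed when the logger is built, so a long-running process would need to rebuild its handlers at midnight to switch to the next day's file.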

Log Categories

HTTP Request Logs
- Client IP, request method, URL, response code, processing time
- Search query and configuration information
Search Service Logs
- Search request start/completion time
- SearXNG search result collection information
- Context extraction and source management
LLM Service Logs
- Model name used, prompt length, response length
- Processing time and token usage tracking
- Number of streaming chunks and total response length
System Logs
- Application start/stop
- Configuration changes
- Error and exception information

Log Configuration

You can control logging behavior through environment variables:

# Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
LOG_LEVEL=INFO

# Log file storage directory
LOG_DIR=log

# Maximum log file size (bytes)
LOG_MAX_BYTES=10485760  # 10MB

# Number of backup files
LOG_BACKUP_COUNT=30

# Automatic deletion period for old log files (days)
LOG_CLEANUP_DAYS=30

Log Monitoring

# Real-time log monitoring
tail -f log/bplexica_$(date +%Y-%m-%d).log

# Monitor error logs only
tail -f log/bplexica_error_$(date +%Y-%m-%d).log

# Search for specific keywords
grep "search request" log/bplexica_*.log

# Find requests that took 5 seconds or longer
grep -E "processing time.*([5-9]|[0-9]{2,})\.[0-9]" log/bplexica_*.log

Log Analysis Examples

# Daily request count statistics
grep "search API call" log/bplexica_2025-07-18.log | wc -l

# Calculate average response time
grep "processing time" log/bplexica_2025-07-18.log | \
  grep -o '[0-9]\+\.[0-9]\+s' | sed 's/s//' | \
  awk '{sum+=$1; count++} END {if (count) print "Average:", sum/count, "seconds"}'

# Error frequency analysis
grep "ERROR" log/bplexica_error_2025-07-18.log | \
  cut -d'|' -f4 | sort | uniq -c | sort -nr

Performance Monitoring

You can monitor the following performance metrics through logs:

Response Time: Processing time for each API request
LLM Performance: Processing time and token usage by model
Search Performance: SearXNG search and result processing time
Error Rate: Error frequency by time period/function

🛡️ Privacy Protection

BPlexica provides automatic detection and masking of Korean personally identifiable information (PII). It detects and safely masks personal information before processing any API request.

Supported Personal Information Types

Resident Registration Number: 901212-1234567 → 901212-*******
Foreign Registration Number: 901212-5123456 → 901212-*******
Passport Number: M12345678 → M1****78
Driver's License Number: 11-12-123456-78 → 11-12-****-**78
Email: test@example.com → te**@e******.com
Mobile Phone Number: 010-1234-5678 → 010-****-5678
Phone Number: 02-1234-5678 → 02-****-5678
Card Number: 1234-5678-9012-3456 → 1234-****-****-3456
Account Number: 123-45-678901 → 123-**-**8901
Business Registration Number: 123-45-67890 → 123-**-***90
Corporate Registration Number: 123456-1234567 → 123456-****567
IP Address: 192.168.1.100 → 192.168.*******
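
To make the masking rules concrete, here is a sketch covering three of the types above (card number, resident/foreign registration number, mobile phone number). The regular expressions and the `mask_pii` function are assumptions written to reproduce the examples in the list; the patterns in app/core/privacy_filter.py may differ.

```python
import re
from typing import List, Tuple

# Digit lookarounds are used instead of \b so that numbers directly
# adjacent to Korean text are still detected.
RULES = [
    # Card number: keep the first and last groups of four digits
    ("card_number",
     re.compile(r"(?<!\d)(\d{4})-\d{4}-\d{4}-(\d{4})(?!\d)"),
     r"\1-****-****-\2"),
    # Resident/foreign registration number: a plausible YYMMDD date,
    # then a seven-digit part starting with 1-8, which is fully masked
    ("resident_registration_number",
     re.compile(r"(?<!\d)(\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01]))-[1-8]\d{6}(?!\d)"),
     r"\1-*******"),
    # Mobile phone number: mask the middle group
    ("mobile_phone_number",
     re.compile(r"(?<!\d)(01[016789])-\d{3,4}-(\d{4})(?!\d)"),
     r"\1-****-\2"),
]


def mask_pii(text: str) -> Tuple[str, List[str]]:
    """Return the masked text and the names of the PII types that were found."""
    detected = []
    for name, pattern, replacement in RULES:
        text, count = pattern.subn(replacement, text)
        if count:
            detected.append(name)
    return text, detected
```

The date check in the registration-number rule is what keeps a value such as 123456-1234567 from being treated as a resident number in this sketch.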

Privacy Protection Features

Automatic Detection: Real-time personal information detection using regular expressions
Safe Masking: Mask sensitive parts and keep only necessary parts
Detailed Logging: Record detected personal information types and locations in logs
Response Headers: Indicate protection status with X-Privacy-Protected, X-PII-Detected headers
Multi-layer Processing: Support for URL parameters, JSON body, and nested structures
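
Handling nested structures usually comes down to a recursive walk over the parsed JSON. The sketch below shows the idea with a single mobile-number rule; `mask_mobile` and `mask_nested` are hypothetical names, and the real filter presumably applies all supported patterns.

```python
import re
from typing import Any, Callable

MOBILE = re.compile(r"(?<!\d)(01[016789])-\d{3,4}-(\d{4})(?!\d)")


def mask_mobile(text: str) -> str:
    """Mask the middle group of a Korean mobile phone number."""
    return MOBILE.sub(r"\1-****-\2", text)


def mask_nested(value: Any, mask: Callable[[str], str] = mask_mobile) -> Any:
    """Return a copy of a JSON-like value with every string passed through mask."""
    if isinstance(value, str):
        return mask(value)
    if isinstance(value, dict):
        return {key: mask_nested(item, mask) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_nested(item, mask) for item in value]
    # Numbers, booleans and None are returned unchanged
    return value
```

Because a new structure is built rather than mutated in place, the original payload remains available to the caller for comparison.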

Privacy Protection Testing

# Run privacy protection feature test
./tests/test_privacy_protection.sh

# Real-time monitoring of personal information detection logs
tail -f log/bplexica_$(date +%Y-%m-%d).log | grep "personal_information"

# Search for specific personal information types
grep "personal_information detection.*resident_registration_number" log/bplexica_*.log

Privacy Protection Settings

The privacy protection feature is enabled by default, and the following paths are excluded:

/docs - API documentation
/redoc - API documentation (ReDoc)
/openapi.json - OpenAPI schema
/favicon.ico - Favicon
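
The path exclusion and the response headers could be combined in a middleware along these lines. This is a dependency-free ASGI sketch, not the code in app/core/middleware.py: it only masks query-string values with one mobile-number rule, and the header values "true"/"false" are assumptions.

```python
import re
from urllib.parse import parse_qsl, urlencode

EXCLUDED_PATHS = {"/docs", "/redoc", "/openapi.json", "/favicon.ico"}
MOBILE = re.compile(r"(?<!\d)(01[016789])-\d{3,4}-(\d{4})(?!\d)")


class PrivacyMiddleware:
    """Pure-ASGI sketch: mask query-string values and flag the response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Non-HTTP traffic and excluded paths pass through untouched
        if scope["type"] != "http" or scope["path"] in EXCLUDED_PATHS:
            await self.app(scope, receive, send)
            return

        raw = scope.get("query_string", b"").decode("latin-1")
        pairs = parse_qsl(raw, keep_blank_values=True)
        masked = [(key, MOBILE.sub(r"\1-****-\2", value)) for key, value in pairs]
        detected = masked != pairs
        # Hand the downstream app a copy of the scope with the masked query
        scope = dict(scope, query_string=urlencode(masked).encode("ascii"))

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-privacy-protected", b"true"))
                headers.append((b"x-pii-detected", b"true" if detected else b"false"))
                message = dict(message, headers=headers)
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

Masking JSON request bodies would additionally require wrapping `receive`, which is omitted here to keep the example short.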

Privacy Middleware Features

Priority Processing: Executed as the very first step of all requests
Lossless Masking: Maintains the structure and form of the original data as much as possible
Performance Optimization: Fast regex matching and efficient string processing
Detailed Auditing: Records all personal information detection and masking processes in logs

📄 License

This project is distributed under the MIT License. See the LICENSE file for more information.

🙏 Acknowledgements

FastAPI - Modern Python web framework
LangChain - LLM application development framework
SearXNG - Privacy-focused meta search engine
OpenAI - GPT model provider

📞 Contact

If you have questions or suggestions about the project, please create an issue or contact us.

BPlexica - A smarter search experience with AI
