Intent classification framework based on Dempster-Shafer (DS) theory, designed for uncertainty-aware, hierarchical decision-making

ksauka/DS-Model2

Interactive Intent Classification via Dempster-Shafer Theory

We propose an extensible intent classification framework based on Dempster-Shafer (DS) theory, designed for uncertainty-aware, hierarchical decision-making. The framework is implemented entirely using Jupyter notebooks and organized around benchmark datasets such as CLINC150, BANKING77, SNIPS, ATIS, and TOPv2, with experiments structured by base classifier (Logistic Regression or SVM). It supports multi-turn clarification, belief propagation, and threshold optimization, enabling robust classification even when input is ambiguous or incomplete.


Project Structure

experiment/
├── ATIS/
│   ├── logistic/          # Jupyter notebook for Logistic Regression-based DS classification
│   └── svm/               # Jupyter notebook for Support Vector Machine (SVM)-based DS classification
├── SNIPS/
│   ├── logistic/
│   └── svm/
├── CLINC150/
│   ├── logistic/
│   └── svm/
├── BANKING77/
│   ├── logistic/
│   └── svm/
├── TOPv2/
│   ├── logistic/
│   └── svm/



┌──────────────────────────────────────────────────────────────────────────────┐
│   Dempster-Shafer Conformal Intent Classification System                     │
├───────────────────────────────┬──────────────────────────────────────────────┤
│        Core Architecture      │        Interactive Evidence Refinement       │
│                               │                                              │
│  ┌────────────────────────┐   │                                              │
│  │     User Query         │   │  ┌───────────────────────────────────────┐   │
│  └──────────┬─────────────┘   │  │ “I may have lost my card...”          │   │
│             ▼                 │  └───────────────────────────────────────┘   │
│  ┌────────────────────────┐   │                                              │
│  │ Sentence Embedding     │   │  ┌───────────────────────────────────────┐   │
│  │ (LLM: intfloat/e5-base)│   │  │ Conformal Check + DS Mass Calculation │   │
│  └──────────┬─────────────┘   │  │ • belief(m), pl(m), similarity calc.  │   │
│             ▼                 │  └──────────┬────────────────────────────┘   │
│  ┌────────────────────────┐   │             ▼                                │
│  │ Base Classifier        │   │  ┌───────────────────────────────────────┐   │
│  │ Logistic / SVM         │   │  │ Threshold Evaluation                  │   │
│  └──────────┬─────────────┘   │  │ • Bel ≥ 1-α?                          │   │
│             ▼                 │  └──────────┬────────────────────────────┘   │
│  ┌────────────────────────┐   │             ▼                                │
│  │ Mass Function Generator│   │  ┌───────────────────────────────────────┐   │
│  │ (DS Framework)         │   │  │ Clarification Generation              │   │
│  └──────────┬─────────────┘   │  │ • "Bel(lost)=0.6, Pl(comp)=0.3..."    │   │
│             ▼                 │  └──────────┬────────────────────────────┘   │
│  ┌────────────────────────┐   │             ▼                                │
│  │ Belief Computation     │   │  ┌───────────────────────────────────────┐   │
│  │ Hierarchy-Aware        │   │  │ User Response Processing              │   │
│  └──────────┬─────────────┘   │  │ • Confidence Update                   │   │
│             ▼                 │  │ • Mass Function Update                │   │
│  ┌────────────────────────┐   │  │   (entity_mass.py, negation_handler)  │   │
│  │ Final Intent Decision  │   │  └──────────┬────────────────────────────┘   │
│  │ Bel ≥ threshold?       │   │             ▼                                │
│  └──────────┬─────────────┘   │  ┌───────────────────────────────────────┐   │
│             │ Yes             │  │ Recheck Final Intent Decision         │   │
│             ▼                 │  │ • Bel ≥ 1-α OR Pl - Bel < ε           │   │
│  ┌────────────────────────┐   │  └───────────────────────────────────────┘   │
│  │ Confirmed Intent       │   │                                              │
│  └────────────────────────┘   │                                              │
└───────────────────────────────┴──────────────────────────────────────────────┘


Usage Examples

Each dataset folder (CLINC150, BANKING77, SNIPS, ATIS, TOPv2) contains two subfolders:

  • logistic/: Logistic Regression-based DS experiments
  • svm/: SVM-based DS experiments

To start:

cd experiment/CLINC150/logistic

1. Train Classifier

  • Uses intfloat/e5-base to generate sentence embeddings:
      train_embeddings = model.encode(train_texts)
      test_embeddings = model.encode(test_texts)
  • Trains a LogisticRegression or SVM classifier
  • Evaluates classification accuracy

2. Save Artifacts

  • classifier.pkl: Trained classifier

3. Customer Agent Simulation

  • Samples queries from the test set
  • Responds to clarification prompts

4. DS Mass Function Execution

  • Computes initial mass from classifier output
  • Adds entity-based mass (if available)
  • Integrates clarification-derived mass (if needed)
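
To make the mass-combination idea concrete, here is a small sketch using Dempster's rule of combination. The way classifier probabilities are turned into masses (a fixed `discount` moved onto the whole frame), the function names, and the intent labels are illustrative assumptions, not the notebooks' actual `DSMassFunction` logic.

```python
from itertools import product


def mass_from_probabilities(probs, discount=0.1):
    """Turn class probabilities into a mass function.

    Assumption: each singleton keeps (1 - discount) * p and the
    remaining `discount` is assigned to the whole frame (ignorance).
    """
    frame = frozenset(probs)
    mass = {frozenset([label]): (1.0 - discount) * p
            for label, p in probs.items() if p > 0}
    mass[frame] = mass.get(frame, 0.0) + discount
    return mass


def combine(m1, m2):
    """Dempster's rule: multiply masses, keep intersections, renormalise."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical example: classifier output plus entity-based evidence.
probs = {"lost_card": 0.6, "compromised_card": 0.3, "card_arrival": 0.1}
m_classifier = mass_from_probabilities(probs, discount=0.2)
m_entity = {frozenset({"lost_card", "compromised_card"}): 0.7,
            frozenset(probs): 0.3}
m = combine(m_classifier, m_entity)
```

Clarification-derived mass could be folded in the same way, by calling `combine` once more with the mass function built from the user's reply.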

5. Belief Propagation

  • Belief values are recursively computed over the hierarchy
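
A recursive belief computation over a tree-shaped hierarchy might look like the sketch below. It assumes the hierarchy is a dict mapping each parent to its children and that mass is assigned directly to nodes; the node names and numbers are made up for illustration.

```python
def belief(node, hierarchy, mass):
    """Bel(node): mass on the node plus the belief of every child, recursively."""
    return mass.get(node, 0.0) + sum(
        belief(child, hierarchy, mass) for child in hierarchy.get(node, [])
    )


def plausibility(node, hierarchy, mass):
    """Pl(node): Bel(node) plus the mass sitting on the node's ancestors."""
    parent_of = {c: p for p, children in hierarchy.items() for c in children}
    total = belief(node, hierarchy, mass)
    current = node
    while current in parent_of:
        current = parent_of[current]
        total += mass.get(current, 0.0)
    return total


# Hypothetical hierarchy and mass assignment.
hierarchy = {
    "root": ["card_issue", "card_arrival"],
    "card_issue": ["lost_card", "compromised_card"],
}
mass = {"lost_card": 0.5, "compromised_card": 0.25, "card_issue": 0.15, "root": 0.1}

bel_card_issue = belief("card_issue", hierarchy, mass)    # 0.15 + 0.5 + 0.25
pl_lost_card = plausibility("lost_card", hierarchy, mass)  # 0.5 + 0.15 + 0.1
```

In this reading, a non-leaf node collects the belief of everything beneath it, so a parent intent can be confirmed even when no single child is certain.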

6. Clarification Handling

  • If belief is inconclusive:
      • Generate a clarification query
      • Update mass based on the user response
      • Recalculate the belief distribution
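
The clarification loop can be outlined as below. The stopping rule follows the one shown in the architecture diagram (Bel ≥ 1-α OR Pl - Bel < ε); everything else, including the function names, the `ask_user` and `update_scores` callbacks, and the default values of `alpha`, `eps`, and `max_turns`, is a hypothetical sketch.

```python
def decide(scores, alpha=0.1, eps=0.05):
    """Return the confirmed intent, or None if clarification is still needed.

    `scores` maps each intent to a (Bel, Pl) pair.
    """
    best = max(scores, key=lambda intent: scores[intent][0])
    bel, pl = scores[best]
    if bel >= 1.0 - alpha or pl - bel < eps:
        return best
    return None


def run_dialogue(scores, ask_user, update_scores, max_turns=3, alpha=0.1, eps=0.05):
    """Ask clarification questions until an intent is confirmed or turns run out."""
    for turn in range(max_turns):
        intent = decide(scores, alpha, eps)
        if intent is not None:
            return intent, turn
        answer = ask_user(scores)               # generate a query, collect the reply
        scores = update_scores(scores, answer)  # update mass, recompute Bel and Pl
    return decide(scores, alpha, eps), max_turns


# Hypothetical one-turn dialogue with stubbed callbacks.
initial = {"lost_card": (0.6, 0.9), "compromised_card": (0.3, 0.5)}
after_reply = {"lost_card": (0.95, 1.0), "compromised_card": (0.0, 0.05)}
result = run_dialogue(
    initial,
    ask_user=lambda scores: "I lost it",
    update_scores=lambda scores, answer: after_reply,
)
```

Here the first check is inconclusive, one clarification is asked, and the updated belief for `lost_card` then passes the 1-α test.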

7. Hierarchy Structure

  • Defines the intent hierarchy
  • Creates hierarchy descriptions based on the training data text

8. Threshold Optimization


Deciding Thresholds For Different Intents

1. Initial thresholds (adjustable parameters):

  • For leaves: 0.05
  • For non-leaf nodes: 0.1

2. Find the best thresholds from training data as follows:

 For each INTENT in the set of intents, generate training data as follows:
       For each query:
           if INTENT is the true intent of the query or the parent of the true intent:
                Label the query as positive
           else:
                Label the query as negative
           Run the algorithm with 0 clarification queries and obtain the mass function MASS over the set of intents
       Find the threshold on the MASS values of INTENT that best separates positive from negative queries (minimizing classification loss)
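
The procedure above could be implemented along these lines. The sketch assumes a 0-1 loss (number of misclassified queries) and the rule "mass ≥ threshold means positive"; the source does not specify the loss, and the function names and sample values are hypothetical.

```python
def label_queries(intent, true_intents, parent_of):
    """Positive if `intent` is the query's true intent or that intent's parent."""
    return [t == intent or parent_of.get(t) == intent for t in true_intents]


def best_threshold(mass_values, is_positive):
    """Return (threshold, errors) minimising misclassifications of 'mass >= t'."""
    # Candidate thresholds: every observed mass, plus infinity (reject everything).
    candidates = sorted(set(mass_values)) + [float("inf")]
    best_t, best_errors = None, None
    for t in candidates:
        errors = sum((m >= t) != pos for m, pos in zip(mass_values, is_positive))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


# Hypothetical data: mass assigned to "card_issue" for four training queries.
parent_of = {"lost_card": "card_issue", "compromised_card": "card_issue"}
true_intents = ["card_arrival", "card_arrival", "lost_card", "compromised_card"]
labels = label_queries("card_issue", true_intents, parent_of)
threshold, errors = best_threshold([0.02, 0.04, 0.30, 0.55], labels)
```

Running this once per intent yields a per-intent threshold that can replace the initial 0.05 and 0.1 defaults.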

Configuration

Customize these cells for your use case:

1. hierarchical_intents and hierarchy

  • Intent relationship definitions
  • Domain-specific hierarchy

2. Dataset loading and optimal thresholds

  • Dataset preprocessing options
  • Custom split configurations
  • Domain-specific calculation

3. DSMassFunction/.chatbot_question

  • Clarification question phrasing

4. Customer Agent api_key configuration

  • Modify the .env file

Requirements

Core dependencies (see requirements.txt for full list):

Python 3.8+
pandas >= 1.3.0
numpy >= 1.21.0
transformers >= 4.12.0
spacy >= 3.0.0

Install with:

pip install -r requirements.txt
python -m spacy download en_core_web_sm
