This repository contains the code and resources needed to implement the techniques described in the paper A Method to Scale-Up Interpretative Qualitative Analysis, with an Application to Aspirations in Cox’s Bazaar, Bangladesh. The iQual package is designed for qualitative analysis of open-ended interviews. It uses natural language processing to extend a small set of interpretative human codes to a much larger set of documents, and it provides a method for assessing the robustness and reliability of this approach. iQual has been applied to 2,200 open-ended interviews on parents’ aspirations for their children, collected from Rohingya refugees and their Bangladeshi hosts in Cox’s Bazaar, Bangladesh. That analysis draws on work in anthropology and philosophy to expand the conception of aspirations used in economics, distinguishing between material goals, moral and religious values, and navigational capacity (the ability to achieve goals and aspirations), and shows that these have very different correlates.
With iQual, researchers can efficiently analyze large amounts of qualitative data while maintaining the nuance and accuracy of human interpretation.
- To install iQual using pip, use the following command:
pip install -U iQual
- Alternatively, you can install iQual from source. To do so, use the following commands:
git clone https://github.com/worldbank/iQual.git
cd iQual
pip install -e .
iQual requires Python 3.7+ and the following dependencies:
iQual is a package designed for qualitative analysis of open-ended interviews. It allows researchers to efficiently analyze large amounts of qualitative data while maintaining the nuance and accuracy of human interpretation.
- Customizable pipelines using scikit-learn pipelines.
- Text vectorization using:
  - Any of the scikit-learn text feature extraction methods.
  - Any sentence-transformers compatible model.
  - Any spaCy model with a doc.vector attribute.
- Classification using any scikit-learn classification method.
- Feature transformation:
  - Dimensionality reduction using any scikit-learn decomposition method, or UMAP using umap-learn.
  - Feature scaling using any scikit-learn preprocessing method.
- Model selection and performance evaluation using scikit-learn methods.
- Model performance evaluation using scikit-learn metrics.
- Tests for bias and interpretability, with statsmodels.
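To make the idea of a vectorize, transform, and classify pipeline concrete, here is a minimal sketch written directly against scikit-learn rather than the iQual API. The texts and labels are made up for illustration, and the choice of TruncatedSVD and Normalizer is an assumption, not a description of iQual's defaults.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer

# Made-up interview answers; label 1 marks a hypothetical "education aspiration" code.
texts = [
    "I want my son to study and become a teacher",
    "We hope she finishes school and goes to college",
    "Education is the most important thing for my children",
    "We need food and a safe place to live",
    "My husband has no work and we have no money",
    "The shelter leaks when it rains",
]
labels = [1, 1, 1, 0, 0, 0]

# Each step mirrors one item in the feature list above.
pipeline = Pipeline(
    steps=[
        ("vectorizer", TfidfVectorizer()),                           # text vectorization
        ("reducer", TruncatedSVD(n_components=2, random_state=0)),   # dimensionality reduction
        ("scaler", Normalizer()),                                    # feature scaling
        ("classifier", LogisticRegression()),                        # classification
    ]
)

pipeline.fit(texts, labels)
predictions = pipeline.predict(["I want my daughter to go to school"])
print(predictions)
```

Because every step is a standard scikit-learn estimator, any one of them can be swapped for another method of the same kind without touching the rest of the pipeline.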
The following code demonstrates the basic usage of the iQual package. It shows how to construct a pipeline, fit it to the data, and use it to classify new data.
Import the iqual package and instantiate the model class.
from iqual import iqualnlp # Import `iqualnlp` from the `iqual` package
iqual_model = iqualnlp.Model() # Instantiate the model class
Add text features to the model. The add_text_features method takes the following arguments:
- question_col: The name of the column containing the question text.
- answer_col: The name of the column containing the answer text.
- model: The name of a scikit-learn, spaCy, or sentence-transformers model, or a precomputed vector file (pickled dictionary). The default is TfidfVectorizer.
- env: The environment or package being used. The default is scikit-learn. Available options are scikit-learn, spacy, sentence-transformers, and saved-dict.
- **kwargs: Additional keyword arguments to pass to the model.
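For the saved-dict option, the precomputed vectors have to exist as a pickled dictionary before the model is built. The sketch below shows one way such a file could be produced with the standard library. It assumes the dictionary maps each question or answer string to its vector, which should be checked against the package's notebooks. The toy_embed function is a hypothetical stand-in for a real embedding model.

```python
import os
import pickle
import tempfile


def toy_embed(text, dim=8):
    """Hypothetical stand-in for a real embedding model: hashed bag-of-words counts."""
    vector = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket for each token (built-in hash() varies between runs).
        vector[sum(ord(char) for char in token) % dim] += 1.0
    return vector


sentences = [
    "What do you hope for your children?",
    "I want them to finish school.",
]

# Assumed layout: {text: vector}. Verify this against the iQual notebooks.
precomputed = {sentence: toy_embed(sentence) for sentence in sentences}

path = os.path.join(tempfile.mkdtemp(), "qa_precomputed.pkl")
with open(path, "wb") as handle:
    pickle.dump(precomputed, handle)

# Reload to confirm the file round-trips.
with open(path, "rb") as handle:
    restored = pickle.load(handle)
```

Precomputing vectors this way is mainly useful when the embedding model is slow or unavailable at training time, since each text is embedded only once.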
# Use a scikit-learn feature extraction method
iqual_model.add_text_features(question_col,answer_col,model='TfidfVectorizer',env='scikit-learn')
# OR - Use a sentence-transformers model
iqual_model.add_text_features(question_col,answer_col,model='all-mpnet-base-v2',env='sentence-transformers')
# OR - Use a spaCy model
iqual_model.add_text_features(question_col,answer_col,model='en_core_web_lg',env='spacy')
# OR - Use a precomputed vector (picklized dictionary)
iqual_model.add_text_features(question_col,answer_col,model='qa_precomputed.pkl',env='saved-dict')
(OPTIONAL) Add a feature transformation layer. The add_feature_transformer method takes the following arguments:
- name: The name of the feature transformation layer.
- transformation: The type of transformation. Available options are FeatureScaler and DimensionalityReduction.
To add a feature scaling layer, use the following code:
iqual_model.add_feature_transformer(name='Normalizer', transformation="FeatureScaler") # or any other scikit-learn scaler
To add a dimensionality reduction layer, use the following code:
iqual_model.add_feature_transformer(name='UMAP', transformation="DimensionalityReduction") # supports UMAP or any other scikit-learn decomposition method
Add a classifier layer. The add_classifier method takes the following arguments:
- name: The name of the classifier layer. The default is LogisticRegression.
- **kwargs: Additional keyword arguments to pass to the classifier.
iqual_model.add_classifier(name = "LogisticRegression") # Add a classifier layer from scikit-learn
(OPTIONAL) Add a threshold layer for the classifier using add_threshold.
iqual_model.add_threshold() # Add a threshold layer for the classifier, recommended for imbalanced data
Compile the model with compile.
iqual_model.compile() # Compile the model
Fit the model to the data using fit. The fit method takes the following arguments:
- X_train: The training data (pandas DataFrame).
- y_train: The training labels (pandas Series).
iqual_model.fit(X_train,y_train) # Fit the model to the data
Predict the labels for new data using predict. The predict method takes the following argument:
- X_test: The test data (pandas DataFrame).
y_pred = iqual_model.predict(X_test) # Predict the labels for new data
For examples of cross-validation fitting, model selection and performance evaluation, bias, interpretability, and measurement tests, refer to the notebooks folder.
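The threshold layer is recommended for imbalanced data because a default cutoff of 0.5 can miss a rare code entirely. How add_threshold chooses its cutoff is not described here, so the following is only a sketch of the general idea, assuming F1 as the selection criterion. The labels and probabilities are made up.

```python
def f1_at_threshold(y_true, probabilities, threshold):
    """F1 score when every probability >= threshold is predicted as positive."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probabilities):
    """Try each observed probability as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(probabilities))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, probabilities, t))


# Made-up imbalanced example: only 2 of 10 documents carry the code.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
probabilities = [0.05, 0.10, 0.12, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40, 0.45]

threshold = best_threshold(y_true, probabilities)
print(threshold, f1_at_threshold(y_true, probabilities, threshold))
print(f1_at_threshold(y_true, probabilities, 0.5))  # default cutoff finds no positives
```

In practice the cutoff should be tuned on held-out or cross-validated predictions rather than on the training data, otherwise the reported performance will be optimistic.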
The notebooks folder contains detailed examples on using iQual. The notebooks are organized into the following categories:
- Basic Modelling: These notebooks demonstrate the basic usage of the package, the pipeline construction, and the vectorization and classification options.
- Advanced Modelling: These notebooks demonstrate advanced pipeline construction, mixing and matching of feature extraction and classification methods, and model selection.
- Interpretability: These notebooks demonstrate the interpretability tests and related tests for measuring and comparing interpretability across human and enhanced (machine + human) codes.
- Bias and Efficiency: These notebooks demonstrate the bias and efficiency tests for determining the value and validity of enhanced codes.
If you use this package, please cite the following paper:
Ashwin, Julian; Rao, Vijayendra; Biradavolu, Monica Rao; Chhabra, Aditya; Haque, Arshia; Khan, Afsana Iffat; Krishnan, Nandini.
A Method to Scale-Up Interpretative Qualitative Analysis, with an Application to Aspirations in Cox’s Bazaar, Bangladesh (English). Policy Research Working Paper No. WPS 10046.
Paper is funded by the Knowledge for Change Program (KCP). Washington, D.C.: World Bank Group.
http://documents.worldbank.org/curated/en/099759305162210822/IDU0a357362e00b6004c580966006b1c2f2e3996