MRE-MI: A Multi-image Dataset for Multimodal Relation Extraction in Social Media Posts

The PyTorch implementation of "MRE-MI: A Multi-image Dataset for Multimodal Relation Extraction in Social Media Posts" and the release of the MRE-MI dataset (NAACL 2025).

MRE-MI stands for Multimodal Relation Extraction with Multiple Images, an MRE task that covers multi-image scenarios.

This repository contains the MRE-MI dataset proposed in the paper as well as an implementation of the GLRA framework.

Examples of MRE-MI are shown in the following figure:

[Figure: MRE-MI examples]

The framework of GLRA is shown in the following figure:

[Figure: GLRA framework]

Environment

We recommend the following commands to create the environment:

conda create -n MRE-MI python==3.9.19
conda activate MRE-MI
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.51.3
pip install tokenizers==0.21.1
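
After installation, you can sanity-check the versions with a short Python snippet (a small addition of ours, not part of the original instructions):

import torch, torchvision, transformers, tokenizers

# Versions should match the pins above; CUDA availability depends on your driver setup.
print(torch.__version__, torchvision.__version__)
print(transformers.__version__, tokenizers.__version__)
print("CUDA available:", torch.cuda.is_available())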

Dataset Download & Introduction

In this paper, we introduce a dataset called MRE-MI.

You can download the dataset via Baidu Netdisk at the following link: https://pan.baidu.com/s/1R8J7FDmMdp8jRNpdmmktLg?pwd=83xh.

After downloading, unzip the files and place them in the dataset folder; the file structure should look as follows:

dataset
|-- images  # images
|-- images_vg # images acquired through visual grounding tools
|-- images_vg.pth # dictionary mapping images to their vg_images
|-- train.txt # training set
|-- val.txt # validation set
|-- test.txt # test set
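
Since images_vg.pth is described as a dictionary relating original images to their visually grounded counterparts, you can inspect it with torch.load. A minimal sketch (the exact key/value layout is an assumption; verify it against the actual file):

import torch

# Load the image-to-vg-image mapping; the dict layout below is an assumption.
mapping = torch.load("dataset/images_vg.pth")
print(type(mapping), len(mapping))

# Peek at a few entries to confirm the structure before relying on it.
for i, (img, vg) in enumerate(mapping.items()):
    print(img, "->", vg)
    if i == 2:
        break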

The format of each instance in the text files is as follows:

{"id": "11573", "tokens": ["Jason", "Isaacs", "and", "Emma", "Hewitt", "Married", "for", "21", "years", ",", "happy", "and", "in", "love", ".", "(", "Getty", "Image", ")"], "h": {"name": "Jason Isaacs", "pos": [0, 2]}, "t": {"name": "Emma Hewitt", "pos": [3, 5]}, "images": ["11573_0.jpg"], "relation": "/per/per/couple"}

Here id is the data identifier, tokens contains the words of the text, h gives the head entity name and its token span, t gives the tail entity name and its token span, images lists the images associated with the instance, and relation is the label.
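
Each line of train.txt, val.txt, and test.txt can therefore be parsed as one JSON object. A minimal sketch, assuming one instance per line as in the example above (the span appears to be half-open, i.e. tokens[start:end]):

import json

with open("dataset/train.txt", encoding="utf-8") as f:
    for line in f:
        inst = json.loads(line)
        head, tail = inst["h"], inst["t"]  # entity name plus [start, end) token span
        h_tokens = inst["tokens"][head["pos"][0]:head["pos"][1]]
        print(inst["id"], " ".join(h_tokens), "->", tail["name"], "|", inst["relation"])
        break  # remove to iterate over the whole split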

Required pre-trained models

In our paper, we use BERT and ViT as the text encoder and image encoder, respectively.

For the code implementation, we use the models and weights provided by Hugging Face: https://huggingface.co/google-bert/bert-base-uncased and https://huggingface.co/google/vit-large-patch32-384.
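
A minimal sketch of loading these checkpoints with the transformers library (the encoding calls below are illustrative only, not the GLRA pipeline itself; the sample file name comes from the instance above):

import torch
from PIL import Image
from transformers import AutoTokenizer, BertModel, ViTImageProcessor, ViTModel

# Text encoder: bert-base-uncased.
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
text_encoder = BertModel.from_pretrained("google-bert/bert-base-uncased")

# Image encoder: vit-large-patch32-384.
processor = ViTImageProcessor.from_pretrained("google/vit-large-patch32-384")
image_encoder = ViTModel.from_pretrained("google/vit-large-patch32-384")

image = Image.open("dataset/images/11573_0.jpg").convert("RGB")
with torch.no_grad():
    text_out = text_encoder(**tokenizer("Jason Isaacs and Emma Hewitt", return_tensors="pt"))
    img_out = image_encoder(**processor(images=image, return_tensors="pt"))

print(text_out.last_hidden_state.shape)  # (1, seq_len, 768)
print(img_out.last_hidden_state.shape)   # (1, 145, 1024): 144 patches + [CLS] at 384x384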

Running

After you have prepared the required environment, data, and models, you can run python run.py to train a model. You may need to search the hyperparameters again to obtain the best results.
