# LaMP-Cap: Personalized Scientific Figure Captioning Dataset


This is the GitHub repository for the arXiv preprint, *LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles*.

LaMP-Cap builds on the SciCap Challenge dataset and is designed specifically for personalized, context-aware scientific figure caption generation. Unlike traditional captioning datasets, LaMP-Cap provides not only target figures and their metadata but also profile figures from the same scientific paper, enabling research into leveraging multimodal context for improved captioning.

LaMP-Cap is intended for non-commercial research only and is released under the CC BY-NC-SA 4.0 license. By using LaMP-Cap, you agree to the terms in the license.

*Overview figure from the paper.*

## How to Cite


```bibtex
@inproceedings{ng2025lamp,
  title = "LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles",
  author = "Ng, 'Sam' Ho Yin and Hsu, Ting-Yao and Anantha Ramakrishnan, Aashish and Kveton, Branislav and Lipka, Nedim and Dernoncourt, Franck and Lee, Dongwon and Yu, Tong and Kim, Sungchul and Rossi, Ryan A. and Huang, 'Kenneth' Ting-Hao",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
  month = nov,
  year = "2025"
}
```

Note: The paper has been accepted to EMNLP 2025 Findings. The final BibTeX entry will be available upon publication; in the meantime, please cite the preprint version.

## Dataset Description


LaMP-Cap is curated from the SciCap Challenge Dataset and focuses on personalized captioning, where profile figures (related images and captions from the same paper) provide rich context for the target figure. This design supports the study of context-aware and user-personalized caption generation in scientific domains.

*Profile figures distribution.*

## Download the SciCap Challenge Dataset


You can download the SciCap Challenge dataset from Hugging Face here: Download Link.

Our LaMP-Cap data is built from the metadata files of the SciCap Challenge dataset. Figures are grouped by their source arXiv paper. For each paper, we randomly selected one figure as the target; the remaining figures became its profile figures. Our metadata includes only papers with at least two figures, so every target has at least one profile figure. As a result, only part of the original SciCap dataset appears in our data. The metadata files filter and organize figures for personalized caption generation in our use case.
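The pairing procedure described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the repository's actual build script, and the shape of `papers` (a dict mapping arXiv ids to lists of figure records) is an assumption:

```python
import random

def build_pairs(papers, seed=42):
    """Sketch of the target/profile pairing described above.

    `papers` maps an arXiv id to a list of figure records; the field
    names of those records are illustrative, not the dataset's own.
    """
    rng = random.Random(seed)
    pairs = []
    for arxiv_id, figures in papers.items():
        if len(figures) < 2:          # papers with a single figure are dropped
            continue
        target = rng.choice(figures)  # one randomly selected target figure
        profile = [f for f in figures if f is not target]
        pairs.append({"arXiv_id": arxiv_id, "target": target, "profile": profile})
    return pairs
```

Fixing the random seed makes the target selection reproducible across runs.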

This repository contains the following:

### Folder Structure


```
.
├── README.md
├── img                          # Related tables/figures from our arXiv paper
└── metadata                     # Annotations for the dataset splits
    ├── train-metadata.json      # Target-profile pairing metadata for the training set
    ├── test-metadata.json       # Target-profile pairing metadata for the test set
    └── val-metadata.json        # Target-profile pairing metadata for the validation set
```

### Example Data Instance

An actual JSON object from LaMP-Cap:

```json
{
  "arXiv_id": 1707.05196,
  "categories": "physics.acc-ph",
  "target": {
    "image_id": 757913,
    "caption_id": 1092105,
    "caption_length": 35,
    "figure_type": "Graph Plot"
  },
  "profile": [
    {
      "image_id": 501525,
      "caption_id": 835717,
      "caption_length": 42,
      "figure_type": "Graph Plot"
    },
    {
      "image_id": 519953,
      "caption_id": 854145,
      "caption_length": 33,
      "figure_type": "Graph Plot"
    },
    {
      "image_id": 586922,
      "caption_id": 921114,
      "caption_length": 14,
      "figure_type": "Graph Plot"
    }
  ]
}
```

### JSON Schema

- `arXiv_id`: Unique identifier of the paper (from arXiv).
- `categories`: arXiv primary category of the source paper (e.g., `"physics.acc-ph"`).
- `target`: Metadata for the target figure to be captioned:
  - `image_id`: Figure image ID (matches SciCap).
  - `caption_id`: Caption ID (matches SciCap).
  - `caption_length`: Number of tokens in the caption.
  - `figure_type`: Type of figure (e.g., `"Graph Plot"`).
- `profile`: List of profile figures providing context for personalized captioning.
  - Each entry contains the same fields as `target`.
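
A small loader can sanity-check these fields when reading a split. This sketch assumes each split file holds a JSON array of objects shaped like the example above; check the actual files, as the layout may differ:

```python
import json

# Fields every figure record (target or profile entry) is expected to carry.
REQUIRED_FIELDS = {"image_id", "caption_id", "caption_length", "figure_type"}

def load_split(path):
    """Load one metadata split file and verify the schema listed above."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for rec in records:
        assert REQUIRED_FIELDS <= rec["target"].keys()
        assert rec["profile"], "every target should have >= 1 profile figure"
        for fig in rec["profile"]:
            assert REQUIRED_FIELDS <= fig.keys()
    return records
```

The non-empty `profile` check mirrors the dataset's guarantee that every included paper has at least two figures.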

## Baseline Performance

Caption quality is measured with BLEU and ROUGE, comparing each generated caption against the original author-written caption on the corresponding test set. We also report performance under different profile settings: no profile, one profile, and all profiles.
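To make the overlap-based evaluation concrete, here is a minimal pure-Python ROUGE-1 F1 between a generated and a reference caption. This is a simplified stand-in for illustration only; the reported scores presumably come from standard BLEU/ROUGE implementations, not this sketch:

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Unigram-overlap (ROUGE-1) F1 between two whitespace-tokenized captions."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0; disjoint captions score 0.0.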

*Profile figures distribution.*


## License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and inherits the licensing terms of the SciCap Challenge Dataset.


The original SciCap dataset is based on the arXiv dataset, whose metadata is released under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, which grants permission to remix, remake, annotate, and publish the metadata.
