sign-language-processing · AmitMY · Apr 28, 2026 · Apr 28, 2026
diff --git a/src/index.md b/src/index.md
@@ -438,6 +438,7 @@ retaining the original sign language content.
 Using a conditional variational autoencoder framework, they first extracted pose information from the source video to remove the original signer appearance,
 then generated a photo-realistic sign language video of a novel appearance from the pose sequence. 
 The authors proposed a novel style loss that ensures style consistency in the anonymized sign language videos. 
+Extending this line of work, @xia-etal-2024-diffusion proposed DiffSLVA, which combines pre-trained large-scale text-guided latent diffusion models with ControlNet conditioned on Holistically-Nested Edge Detection (HED) maps, circumventing the need for accurate pose estimation, and adds a dedicated facial expression enhancement module to preserve linguistically essential non-manual features.
 
 ##### Sign Language Avatars
 

diff --git a/src/references.bib b/src/references.bib
@@ -4354,6 +4354,15 @@ @inproceedings{uchida-etal-2024-hamnosys
     author = "Uchida, Tsubasa  and
       Miyazaki, Taro  and
       Kaneko, Hiroyuki",
+}
+
+@inproceedings{xia-etal-2024-diffusion,
+    title = "Diffusion Models for Sign Language Video Anonymization",
+    author = "Xia, Zhaoyang  and
+      Zhou, Yang  and
+      Han, Ligong  and
+      Neidle, Carol  and
+      Metaxas, Dimitris N.",
     editor = "Efthimiou, Eleni  and
       Fotinea, Stavroula-Evita  and
       Hanke, Thomas  and
@@ -4368,3 +4377,7 @@ @inproceedings{uchida-etal-2024-hamnosys
     url = "https://aclanthology.org/2024.signlang-1.42/",
     pages = "376--385"
 }
+
+    url = "https://aclanthology.org/2024.signlang-1.44/",
+    pages = "395--407"
+}