@lionel42 lionel42 commented Nov 6, 2025

Hello,

We are the Laboratory for Air Pollution at Empa, and we would like to contribute our spectra to MassBank.

I wanted to test the format locally but ran into issues with the check software; see MassBank/MassBank-web#414 and MassBank/MassBank-web#413.

This is just a draft for now. We have hundreds of spectra to upload, but first we wanted to ask about the format and the metadata.

I created names and identifiers for our lab: EAP for Empa Air Pollution

Happy to receive any feedback ;)

@lionel42
Author

lionel42 commented Nov 6, 2025

I have opened an issue to ask for help.

I will be away for one week (holidays), so I will continue working on this later on.

@lionel42
Author

lionel42 commented Nov 6, 2025

One point that I find weird is that the validator does not seem to like the Accession strings.

@schymane
Member

schymane commented Nov 6, 2025

> One point that I find weird is that the validator does not seem to like the Accession strings.

You can find the details about how to construct the Accession IDs here:
https://github.com/MassBank/MassBank-web/blob/main/Documentation/MassBankRecordFormat.md#2.1.1

It appears that you've put the name in the Accession, whereas we expect a number, e.g.: ACCESSION: MSBNK-AAFC-AC000101
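
A rough local pre-check for the shape of the example above could look like the sketch below. The pattern is an assumption read off the example (`MSBNK-`, a contributor ID, then a record ID made of capital letters, digits and underscores) and should be checked against section 2.1.1 of the record format. `looks_like_accession` is a hypothetical helper, not part of the MassBank tooling, and the two `EAP` strings are made-up inputs.

```python
import re

# Assumed shape: MSBNK-<contributor ID>-<record ID>.
# The exact character classes and length limits are NOT taken from this
# thread; verify them against the record format documentation.
ACCESSION_RE = re.compile(r"MSBNK-[A-Za-z0-9_]+-[A-Z0-9_]+")


def looks_like_accession(value: str) -> bool:
    """Return True if the whole string matches the assumed accession shape."""
    return ACCESSION_RE.fullmatch(value) is not None


print(looks_like_accession("MSBNK-AAFC-AC000101"))      # example from the comment above
print(looks_like_accession("MSBNK-EAP-EAP000001"))      # hypothetical ID
print(looks_like_accession("MSBNK-EAP-chloromethane"))  # hypothetical name-based ID
```

This only catches obvious shape problems; the MassBank validator remains the reference check.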

@schymane
Member

schymane commented Nov 6, 2025

> I have opened an issue to ask for help.
>
> I will be away for one week (holidays), so I will continue working on this later on.

Please note that we have detailed record specifications to help explain what is needed in the various record entries:
https://github.com/MassBank/MassBank-web/blob/main/Documentation/MassBankRecordFormat.md#table-1--massbank-record-format-summary
...and then lots of details and examples in subsequent subsections.

It seems from the validation output that at least one other compulsory field is missing: AC$INSTRUMENT

The IPB Halle team are at BioHackEU25 this week, so they are a bit distracted, but will look into this once they are back.

@lionel42
Author

@schymane Thanks for the answers, I managed to fix the format of our files.

Before I add the whole library, is it possible to confirm/register our laboratory and the prefix? Do you need any additional information from our side?

@meier-rene
Collaborator

Hi Lionel,
do you consider this contribution complete? At the moment there are just two small issues left: one extra space, and an empty peak-annotation table that needs to go. If yes, I can finish these minor things and merge your contribution. We also maintain a table of our contributors: https://github.com/MassBank/MassBank-data/blob/dev/List_of_Contributors_Prefixes_and_Projects.md. Please tell me what you would like to see there; otherwise I will guess something for you.
Best, Rene

@lionel42
Author

> Hi Lionel, do you consider this contribution complete? At the moment there are just two small issues left: one extra space, and an empty peak-annotation table that needs to go. If yes, I can finish these minor things and merge your contribution. We also maintain a table of our contributors: https://github.com/MassBank/MassBank-data/blob/dev/List_of_Contributors_Prefixes_and_Projects.md. Please tell me what you would like to see there; otherwise I will guess something for you. Best, Rene

Hi Rene,

Thanks for reaching out.

We still need more time: we want to go through all the files manually to do a quality check.
Also, we build them automatically, so I will try to fix the 2 issues in our code.

About the table of contributors, we discussed it and suggest the following:

  • Database: Empa_Air_Pollution
  • Research Group / Research Project: Empa - Laboratory for Air Pollution / Environmental Technology
  • Country: Switzerland
  • Prefix of ID: EAP
  • Project Tag: HALOHUNTER

I had initially also changed the file in this PR. Should I do it this way, or would you prefer to update it in a separate PR?

We will notify you when ready to merge ;)

AC$CHROMATOGRAPHY: KOVATS_RTI 818
PK$SPLASH: splash10-000t-9000000000-90ef1466a5c67cf33c97
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
49.98421 1 49.99178 151.43 H3CCl+ 0.76

Delete or review. Unlikely to be H3CCl+ given the structure, and if it is, the isotope signal is missing.

69.94142 1 69.93716 -60.95 Cl2+ 1.00
71.93848 1 71.93421 -59.40 Cl[37Cl]+ 0.77
81.94018 1 81.93716 -36.89 CCl2+ 1.00
83.94540 1 83.93421 -133.34 CCl[37Cl]+ 0.60

This is weird: 83.94540 m/z appears twice in a row with two different assignments? Also, the NIST spectrum has a strong signal at 83 m/z (HCCl2), but here it is absent?

Why is more than one formula generally assigned to a given mass? I see up to 3 formulas assigned per mass for this compound (some other compounds have up to 4 assignments). Do we want that? Seems weird to me.
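
To see how often this happens across a record, the annotation rows can be grouped by m/z. The sketch below is a minimal example that assumes the six whitespace-separated columns of the `PK$ANNOTATION` header quoted in this review; `formulas_per_mz` is a hypothetical helper name, and the rows are copied from the excerpt under discussion.

```python
from collections import defaultdict

# Rows copied from the PK$ANNOTATION excerpt quoted in this review.
ROWS = """\
81.94018 1 81.93716 -36.89 CCl2+ 1.00
83.94540 1 83.93421 -133.34 CCl[37Cl]+ 0.60
83.94540 2 83.95281 88.23 H2CCl2+ 0.86
85.94626 1 85.94986 41.85 H2CCl[37Cl]+ 1.00
"""


def formulas_per_mz(text):
    """Map each m/z string to the list of tentative formulas assigned to it."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip anything that is not a 6-column annotation row
        mz, _count, _exact, _ppm, formula, _fraction = fields
        grouped[mz].append(formula)
    return dict(grouped)


# Keep only the m/z values that carry more than one assignment.
multi = {mz: f for mz, f in formulas_per_mz(ROWS).items() if len(f) > 1}
print(multi)
```

Grouping on the m/z string rather than a float avoids rounding surprises, at the cost of treating differently formatted values as different peaks.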

81.94018 1 81.93716 -36.89 CCl2+ 1.00
83.94540 1 83.93421 -133.34 CCl[37Cl]+ 0.60
83.94540 2 83.95281 88.23 H2CCl2+ 0.86
85.94626 1 85.94986 41.85 H2CCl[37Cl]+ 1.00

NIST has a strong 85 m/z signal (maybe H3CCl2), but it is absent here? Oh no, I see that the unassigned peaks are listed separately below. Still, it seems silly to me that two of the most abundant peaks, 83 and 85 m/z, are not assigned and not listed here.

93.93877 1 93.93716 -17.17 C2Cl2+ 0.57
94.94653 1 94.94498 -16.31 HC2Cl2+ 1.00
95.95457 1 95.94834 -64.96 HC[13C]Cl2+ 0.01
95.95457 2 95.95281 -18.37 H2C2Cl2+ 1.00

How are assignments number 1 and number 2 decided? It looks like intensity_fraction shows how much of the mass is assignable to the formula, so wouldn't it make more sense to have the higher intensity_fraction assigned as 1?
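
If the ordering suggested here were adopted, the renumbering could be sketched as below. This is not how the records are currently generated (that code is not shown in this thread); `renumber_by_fraction` is a hypothetical helper, and the meaning of `intensity_fraction` is taken from the reviewer's reading above rather than from a specification.

```python
from itertools import groupby

# (m/z, formula_count, exact_mass, error_ppm, tentative_formula, intensity_fraction)
# Rows copied from the excerpt quoted in this review.
ROWS = [
    ("94.94653", 1, "94.94498", "-16.31", "HC2Cl2+", 1.00),
    ("95.95457", 1, "95.94834", "-64.96", "HC[13C]Cl2+", 0.01),
    ("95.95457", 2, "95.95281", "-18.37", "H2C2Cl2+", 1.00),
]


def renumber_by_fraction(rows):
    """Renumber formula_count within each m/z so the largest fraction gets 1."""
    out = []
    # groupby only merges adjacent rows, which is fine because the
    # annotation table is already ordered by m/z.
    for _mz, group in groupby(rows, key=lambda r: r[0]):
        ranked = sorted(group, key=lambda r: r[5], reverse=True)
        for rank, row in enumerate(ranked, start=1):
            out.append((row[0], rank) + row[2:])
    return out


for row in renumber_by_fraction(ROWS):
    print(*row)
```

Because `sorted` is stable, rows with equal fractions keep their original relative order.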

AC$CHROMATOGRAPHY: KOVATS_RTI 566
PK$SPLASH: splash10-002o-9000000000-17c33adb4eb05f58d77f
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
23.98798 1 0.00000 0.00 - 0.00

what's happening here? something seems wrong...
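
Rows like the one above could be caught automatically before upload. The sketch below flags annotation rows whose formula is `-` or whose exact mass is zero; it assumes the same six-column layout as the header quoted above, and `placeholder_rows` is a hypothetical helper. Why the generator emits such a row is not known from this thread.

```python
# First row: the suspicious row quoted above.
# Second row: a normal row from another record quoted in this review, for contrast.
ROWS = """\
23.98798 1 0.00000 0.00 - 0.00
42.99847 1 42.99785 -14.31 C2F+ 0.98
"""


def placeholder_rows(text):
    """Return annotation rows whose formula is '-' or whose exact mass is zero."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # not a 6-column annotation row
        exact_mass, formula = float(fields[2]), fields[4]
        if formula == "-" or exact_mass == 0.0:
            flagged.append(line)
    return flagged


print(placeholder_rows(ROWS))
```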

AC$CHROMATOGRAPHY: KOVATS_RTI 396
PK$SPLASH: splash10-0udi-3900000000-5d7701f39c27b4d50277
PK$ANNOTATION: m/z formula_count exact_mass error(ppm) tentative_formula intensity_fraction
42.99847 1 42.99785 -14.31 C2F+ 0.98

so many peaks and so few assignments, what's going on here?

[image]

ok

[image]

ok, but weird how the ppm error is so much worse for the higher masses. Maybe we need to redo the postprocessing calibration for this spectrum?
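
For checking such trends, the ppm error can be recomputed from the measured m/z and the exact mass. The sign convention below, (exact - measured) / exact, is inferred from the annotation rows quoted earlier in this review, not from the code that produced them. The recomputed values differ from the listed ones by a few hundredths of a ppm, presumably because the listed masses are rounded to five decimals.

```python
def ppm_error(measured_mz, exact_mass):
    """Relative mass error in ppm, signed as (exact - measured) / exact."""
    return (exact_mass - measured_mz) / exact_mass * 1e6


# (measured m/z, exact mass, listed error in ppm), copied from rows in this review.
ROWS = [
    (49.98421, 49.99178, 151.43),
    (69.94142, 69.93716, -60.95),
    (83.94540, 83.93421, -133.34),
    (95.95457, 95.95281, -18.37),
]

for measured, exact, listed in ROWS:
    recomputed = ppm_error(measured, exact)
    print(f"{measured:.5f}  listed {listed:8.2f}  recomputed {recomputed:8.2f}")
```

Plotting the recomputed error against m/z for one spectrum is a quick way to see whether a systematic drift, as suspected here, is present.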

[image]

@Alina-beal Alina-beal Jan 26, 2026

compare to E isomer

[image]

[image]

ok

[image]

ok, BUT 64.01073 H2C2F2+ and 51.00434 HCF2+ are missing the 13C isotope peak

@Alina-beal Alina-beal Jan 26, 2026

HFO-1234zeE

[image]

compare to HFO-1234yf:
[image]

ok, but why does 83.00812 H2C2F3+ not have a 13C isotope peak?
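
A quick way to check for such peaks is to compute where the single-13C peak should sit and how intense it should be. The sketch below assumes singly charged ions, the 13C/12C mass difference of about 1.003355 u and a natural 13C abundance of about 1.07%. The 50 ppm tolerance and both helper names are hypothetical choices, not part of the pipeline discussed here.

```python
C13_MINUS_C12 = 1.003355  # mass difference between 13C and 12C, in u
C13_ABUNDANCE = 0.0107    # approximate natural abundance of 13C


def expected_13c_peak(mz, n_carbon):
    """Expected m/z and intensity ratio (to the all-12C peak) of the single-13C peak."""
    ratio = n_carbon * C13_ABUNDANCE / (1.0 - C13_ABUNDANCE)
    return mz + C13_MINUS_C12, ratio


def has_peak_near(peaks, target_mz, tol_ppm=50.0):
    """True if any m/z in `peaks` lies within tol_ppm of target_mz."""
    return any(abs(p - target_mz) / target_mz * 1e6 <= tol_ppm for p in peaks)


# H2C2F3+ at 83.00812 has two carbon atoms.
mz_13c, ratio = expected_13c_peak(83.00812, n_carbon=2)
print(f"expected 13C peak near {mz_13c:.5f}, about {ratio:.1%} of the 12C peak")
```

At roughly 2% of the parent intensity, such a peak may simply fall below an intensity cutoff during extraction, which would be one possible explanation to rule out.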

[image]

compare to HFO-1336mzzZ Z-isomer
[image]

ok. The spectra of the E- and Z-isomers should probably be identical. If there are differences, that might skew the identification of one or the other in the pipeline, but we really shouldn't be able to tell them apart from the spectrum.

[image]

ok, see E-isomer

[image]

ok BUT what are those missing peaks at ca. 30 m/z?

[image]

ok BUT why missing 13C-isotope signals??? re-extract?

[image]

[image]

some missing 13C isotope signals and what are those very high masses at low intensity?
232.9605
251.95248
253.94708
269.90825

Probably missing ions because of notching?

[image]

only see double ions? maybe retract this spectrum

[image]

missing 33 m/z ion from F+, filtered out?

[image]

ok

[image]

ok but missing some 13C isotope signals

[image]

[image]

[image]

[image]

Can we amend the name to PFC-318 for the mygu spectrum, to be consistent with NIST? That's also how we have it stored on the instruments.

[image]

[image]

@Alina-beal

There are quite a few spectra missing from the MC-2021A and B batches:
bromoethane, c-1,1,2,2-tetrafluoropropane, 1,1-dichloro-2,2-difluoroethene (CFO-1112a), 1,2-dichloro-1,2-difluoroetheneE (CFO-1112E), bromofluoromethane (can't find in the mygu_db for some reason..), 1-chloro-2-fluoroetheneZ (HCFO-1131Z), etc.

Can you double-check:

  1. that all the spectra in mygu_db are also uploaded here, excluding of course the Mr_ and Ms_ ones
  2. whether any spectra are missing from mygu_db that are listed in the Gaspro_RT file as having a reference standard in the column "confirmed"
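
Both checks amount to set differences once the compound names are available as lists. The sketch below assumes the names have already been read from mygu_db, from the uploaded records and from the "confirmed" column of the Gaspro_RT file; how to read those sources is not covered in this thread, and the helper names and the sample inputs (including `Mr_example` and `Ms_example`) are hypothetical.

```python
def missing_uploads(db_names, uploaded_names, excluded_prefixes=("Mr_", "Ms_")):
    """Names present in the local database but absent from the upload."""
    wanted = {n for n in db_names if not n.startswith(excluded_prefixes)}
    return sorted(wanted - set(uploaded_names))


def missing_from_db(confirmed_names, db_names):
    """Names confirmed by a reference standard but absent from the database."""
    return sorted(set(confirmed_names) - set(db_names))


# Hypothetical inputs, for illustration only.
db = ["bromoethane", "Mr_example", "Ms_example", "HFO-1234yf"]
uploaded = ["HFO-1234yf"]
confirmed = ["bromoethane", "bromofluoromethane", "HFO-1234yf"]

print(missing_uploads(db, uploaded))
print(missing_from_db(confirmed, db))
```

In practice the names would need to be normalised first (spelling, isomer suffixes), otherwise the same compound written two ways shows up as missing.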

@Alina-beal Alina-beal Jan 26, 2026

surprised there is no NIST...

[image]

[image]

[image]

[image]

91.92426 CHBr and 63.00276 CH2F2 are off by a little more than 20 ppm; they should also be annotated, as they are large peaks

missing the double 37Cl isotope peak from 93.93638 C2Cl2+ at +4 and all the Cl-containing signals below m/z 70 are missing their 37Cl isotope peaks
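
The expected chlorine pattern can be estimated with a binomial model. The sketch below assumes singly charged ions, a 37Cl/35Cl mass difference of about 1.99705 u and a natural 37Cl abundance of about 24.24%; it ignores contributions from other isotopes such as 13C, and `chlorine_pattern` is a hypothetical helper.

```python
from math import comb

CL37_MINUS_CL35 = 1.99705  # mass difference between 37Cl and 35Cl, in u
CL37_ABUNDANCE = 0.2424    # approximate natural abundance of 37Cl


def chlorine_pattern(mono_mz, n_cl):
    """(m/z, intensity relative to the all-35Cl peak) for 0..n_cl atoms of 37Cl."""
    p = CL37_ABUNDANCE
    base = (1.0 - p) ** n_cl  # probability of the all-35Cl isotopologue
    pattern = []
    for k in range(n_cl + 1):
        prob = comb(n_cl, k) * p**k * (1.0 - p) ** (n_cl - k)
        pattern.append((mono_mz + k * CL37_MINUS_CL35, prob / base))
    return pattern


# C2Cl2+ at 93.93638 contains two chlorine atoms.
for mz, rel in chlorine_pattern(93.93638, n_cl=2):
    print(f"{mz:.5f}  {rel:.2f}")
```

For two chlorine atoms this gives roughly a 100:64:10 pattern, so the +4 peak is expected at about a tenth of the all-35Cl peak.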

missing 13C isotope signals

missing masses below 68 m/z??

missing most 13C isotope signals

missing 34S isotope signals for lower masses

missing some 13C isotope signals

4 participants