Added the ImmuScope class II prediction algorithm #1371

Open
ldhtnp wants to merge 19 commits into griffithlab:7.0.0 from ldhtnp:add-immunoscope
@ldhtnp ldhtnp commented Mar 7, 2026

I created a fork of the original ImmuScope repository to refine it for use in pVACtools. This involved creating a wrapper that lets ImmuScope output an immunogenicity score for a given peptide + HLA pair. I based the implementation on BigMHC_IM and updated the documentation.

This tool was suggested by Malachi in issue #1330.

@ldhtnp ldhtnp marked this pull request as draft March 7, 2026 16:46
@susannasiebert susannasiebert linked an issue Mar 11, 2026 that may be closed by this pull request
@ldhtnp ldhtnp marked this pull request as ready for review March 13, 2026 18:45
@susannasiebert susannasiebert changed the base branch from staging to 7.0.0 March 13, 2026 18:47
Contributor

@susannasiebert susannasiebert left a comment

I added a few small suggested changes to the code itself.

I would like to see the output from this tool as an input file to the all class ii output parser test. To create this input file you will need to run ImmuScope on the 1-200 fasta chunk of HCC1395 data using allele DRB1*04:05 and length 12. I can send you this file.

The all class ii output file created by this test should then be used as the updated input to the all class ii aggregate report creation test. Updating these two tests will ensure that ImmuScope gets parsed correctly and its detailed data included in the metrics file. Edit to add: I just realized that this test is not available in your branch. I added it as part of #1376 so you can disregard this comment.

Have you run HCC1395 on this PR? If not, I can make a test Docker container once these updates have been made and start a run. I like to load the results into pVACview to check that everything looks as expected.

Comment thread docs/install.rst Outdated
Comment thread predictor_tests/test_call_iedb.py
Comment thread pvactools/lib/prediction_class.py Outdated
tmp_input_file.write("allele\tpeptide\tseq_num\tstart\n")
tmp_input_file = tempfile.NamedTemporaryFile('w', dir=tmp_dir, delete=False, newline='')
writer = csv.writer(tmp_input_file, delimiter='\t', lineterminator='\n')
writer.writerow(["allele", "peptide", "seq_num", "start"])
Contributor

Does the predictor require seq_num and start in the input file? If not, I think these columns can be removed.

Contributor Author

Yes, they are currently required by the predictor/wrapper interface as implemented. If you would rather they be excluded, I can update the fork to make them optional.

Contributor

Gotcha. I think we could generate the file with these columns filled in by creating it in the same block of code above where we read in the fasta file (line 878+). The determine_neoepitopes method returns a hash with the start position as the key and the epitope as the value. The fasta sequence header can be used as the seq_num.

I assume that the output includes these two columns as well, which would save us from having to map each epitope back to its seq_num and start position (line 934+). The cost is that the file could contain duplicate epitopes, for example from repetitive regions, which could make ImmuScope slower (not sure if they accounted for this).
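To make the suggestion concrete, here is a rough sketch of writing the input file with seq_num and start filled in while the FASTA records are read. It is not the pVACtools code: the local `determine_neoepitopes` is a hypothetical stand-in that only mirrors the behavior described above (start position as key, epitope as value), and `build_input_rows`, `write_input`, and the toy sequences are assumptions made up for illustration.

```python
import csv
import io


def determine_neoepitopes(sequence, length):
    # Hypothetical stand-in for the pVACtools method described above:
    # returns a dict mapping the 1-based start position to the epitope.
    return {i + 1: sequence[i:i + length]
            for i in range(len(sequence) - length + 1)}


def build_input_rows(fasta_records, allele, length):
    # fasta_records: iterable of (header, sequence); the header is used as seq_num.
    rows = []
    for seq_num, sequence in fasta_records:
        epitopes = determine_neoepitopes(sequence, length)
        for start, epitope in sorted(epitopes.items()):
            rows.append([allele, epitope, seq_num, start])
    return rows


def write_input(rows, handle):
    # Same tab-delimited layout and header as the snippet under review.
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(["allele", "peptide", "seq_num", "start"])
    writer.writerows(rows)


# Toy data: both sequences contain "ABCD", so that peptide appears twice.
records = [("1", "ABCDE"), ("2", "ABCD")]
rows = build_input_rows(records, "DRB1*04:05", 4)
buffer = io.StringIO()
write_input(rows, buffer)
```

In this toy input the peptide ABCD is written once per occurrence, which is the duplicate-scoring cost mentioned above.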

Contributor Author

I pushed a commit that keeps the deduped peptide set for scoring, but captures seq_num and start during the initial FASTA parsing and then merges them back onto the ImmuScope output. This lets us drop the remapping loop while still preserving those fields cleanly.

The performance of ImmuScope would be impacted if we passed every epitope occurrence directly to the wrapper with seq_num/start filled in, since it would score duplicates instead of just unique peptides. This approach avoids that by keeping the input deduplicated and only expanding back afterward.
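For reference, the dedupe-then-expand idea looks roughly like the sketch below. This is a simplified illustration rather than the code in the commit: `index_occurrences`, `expand_scores`, and the placeholder scores are hypothetical names and values, and the real ImmuScope wrapper call is replaced by a plain dict of scores.

```python
from collections import defaultdict


def index_occurrences(fasta_records, length):
    # Record every (seq_num, start) at which each unique peptide occurs.
    # Starts are 1-based; the FASTA header is used as seq_num.
    occurrences = defaultdict(list)
    for seq_num, sequence in fasta_records:
        for i in range(len(sequence) - length + 1):
            occurrences[sequence[i:i + length]].append((seq_num, i + 1))
    return occurrences


def expand_scores(scores, occurrences):
    # Merge per-peptide scores back onto every occurrence of that peptide.
    rows = []
    for peptide, score in scores.items():
        for seq_num, start in occurrences.get(peptide, []):
            rows.append({"peptide": peptide, "seq_num": seq_num,
                         "start": start, "score": score})
    return rows


records = [("1", "ABCDE"), ("2", "ABCD")]
occurrences = index_occurrences(records, 4)

# Only the unique peptides would be sent to the predictor.
unique_peptides = sorted(occurrences)

# Placeholder scores standing in for the predictor output (not real values).
scores = {peptide: 0.5 for peptide in unique_peptides}
expanded = expand_scores(scores, occurrences)
```

Two unique peptides are scored here, and the result expands back to three rows, one per occurrence.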

Contributor

I wasn't sure whether ImmuScope was being smart and deduplicating epitopes on its end.

Development

Successfully merging this pull request may close these issues.

Investigate ImmuScope for class II predictions