Switch from Python 2 to Python 3 causing ints to be printed as floats #10

@cjops

Hi, I'm attempting to run the release branch of IsoAnnot on a recent version of Ensembl. I noticed that the output GFF3 contains protein lines like this:

ENST00000000233	tappAS	protein	1	180.0	.	+	.	ID=E

Where previous versions of IsoAnnot appear to have produced lines like this (judging by the Homo_sapiens_GRCh38_Ensembl_86.zip available for download):

ENST00000618881	tappAS	protein	1	409	.	-	.	ID=Q5JRM6; Desc=ENSP00000482525; PosType=P

The float representation of the 5th column causes a problem when running the IsoAnnot Lite script included with SQANTI3. I get the following error:

Running IsoAnnot Lite 2.7.3 (using all gene information) ...
Reading reference annotation file and creating data variables...
Traceback (most recent call last):
  File "/.../workflow/tools/SQANTI3_v5.5.4/src/utilities/IsoAnnotLite_SQ3.py", line 3223, in <module>
    main()
  File "/.../workflow/tools/SQANTI3_v5.5.4/src/utilities/IsoAnnotLite_SQ3.py", line 3059, in main
    run(args)
  File "/.../workflow/tools/SQANTI3_v5.5.4/src/utilities/IsoAnnotLite_SQ3.py", line 3115, in run
    dc_gene_description, dc_GFF3protein = readGFF(gff3) #dc_GFF3exons is sorted
                                          ^^^^^^^^^^^^^
  File "/.../workflow/tools/SQANTI3_v5.5.4/src/utilities/IsoAnnotLite_SQ3.py", line 352, in readGFF
    dc_GFF3protein.update({str(transcript) : [int(start),int(end)]})
                                                         ^^^^^^^^
ValueError: invalid literal for int() with base 10: '180.0'
Traceback (most recent call last):
  File "/.../workflow/tools/SQANTI3_v5.5.4/sqanti3_qc.py", line 55, in <module>
    main()
  File "/.../workflow/tools/SQANTI3_v5.5.4/sqanti3_qc.py", line 52, in main
    run_isoAnnotLite(corrGTF, outputClassPath, outputJuncPath, args.output, args.gff3)
  File "/.../workflow/tools/SQANTI3_v5.5.4/src/qc_pipeline.py", line 198, in run_isoAnnotLite
    if subprocess.check_call(ISOANNOT_CMD, shell=True)!=0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.snakemake/conda/1e068e0518f05ee84f87b3802147bdfd_/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'python3 /.../workflow/tools/SQANTI3_v5.5.4/src/utilities/IsoAnnotLite_SQ3.py /.../output/12_isoannot/cluster_adjusted.deduped.251833.sqanti3/ensembl.114/cluster_adjusted.deduped.251833_corrected.gtf /.../output/12_isoannot/cluster_adjusted.deduped.251833.sqanti3/ensembl.114/cluster_adjusted.deduped.251833_classification.txt /.../output/12_isoannot/cluster_adjusted.deduped.251833.sqanti3/ensembl.114/cluster_adjusted.deduped.251833_junctions.txt -gff3 refs/isoannot/human_tappas_ensembl_annotation_file.114.gff3 -o /.../output/12_isoannot/cluster_adjusted.deduped.251833.sqanti3/ensembl.114/cluster_adjusted.deduped.251833 -novel -stdout /.../output/12_isoannot/cluster_adjusted.deduped.251833.sqanti3/ensembl.114/cluster_adjusted.deduped.251833.isoAnnotLite_stats.txt' returned non-zero exit status 1.
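For anyone hitting the same `ValueError` before the generator is fixed, the failure is easy to reproduce in isolation: `int('180.0')` raises, while `int('180')` does not. Below is a minimal sketch of a tolerant coordinate parser that could serve as a consumer-side stopgap. The helper name `parse_gff_coord` is hypothetical and is not part of IsoAnnot Lite or SQANTI3; it only illustrates the idea.

```python
def parse_gff_coord(field: str) -> int:
    """Parse a GFF3 coordinate, tolerating integral floats such as '180.0'.

    Hypothetical helper for illustration only; not part of IsoAnnot Lite.
    """
    try:
        # Normal case: the column already holds an integer literal.
        return int(field)
    except ValueError:
        # Fallback: accept a float literal only if it has no fractional part.
        value = float(field)
        if not value.is_integer():
            raise ValueError(f"non-integer GFF3 coordinate: {field!r}")
        return int(value)


print(parse_gff_coord("409"))
print(parse_gff_coord("180.0"))
```

This only masks the symptom on the reading side; the GFF3 file itself would still carry float-formatted coordinates.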

I believe the change in behavior can be traced to the use of a single slash for division on line 67 of the referenceSQANTI.py script:

			orf_length = cds_len/3

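In Python 2, a single slash between two ints performed integer division, but in Python 3 it always returns a float, so the result is written out as 180.0 rather than 180. Since the environment has been updated from Python 2 to Python 3, a double slash (floor division) would be more appropriate here.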
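The difference between the two operators under Python 3 can be seen in a short sketch. The value of `cds_len` is an assumed example chosen to match the 180 in the GFF3 line; it is not taken from the IsoAnnot code.

```python
# Assumed example value: a CDS of 540 nt, i.e. 180 codons.
cds_len = 540

true_div = cds_len / 3    # Python 3 true division: always a float
floor_div = cds_len // 3  # floor division: stays an int for int operands

print(type(true_div).__name__, str(true_div))    # float 180.0
print(type(floor_div).__name__, str(floor_div))  # int 180
```

For positive integer operands, `//` gives the same result that `/` gave under Python 2, including rounding down when `cds_len` is not a multiple of 3.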
