Xiaotao JIANG (姜小濤) edited this page Apr 10, 2018 · 9 revisions

This page lists frequently asked questions:

1. Why does usearch stop working with an out-of-memory error?

A: The 32-bit usearch only works with small data sets whose memory requirement is below 4 GB. If the memory requirement exceeds 4 GB, the 64-bit usearch should be used, or users can split their input files into smaller ones. We suggest dividing large input fastq files into smaller ones and then merging all the output matrices of ARG abundance to obtain the final results. On Linux, split -n 10 divides an input file into ten parts; users can then rename the parts so that they suit ublastx_stage_one. Because a fastq record typically spans four lines, check that no record is cut at a part boundary. At the current stage, users need to buy the 64-bit usearch to remove the memory limitation. In the version 2 update, we added an option, -s, for big data to solve the out-of-memory problem.
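As an alternative to a byte-based split, the following is a minimal Python sketch that deals whole four-line records into a chosen number of part files, so no record is cut in half. The function name `split_fastq` and the `<stem>.part<k>.fastq` naming scheme are hypothetical and not part of Ublastx; rename the parts to whatever ublastx_stage_one expects. It assumes uncompressed fastq with exactly four lines per record.

```python
import itertools
from pathlib import Path


def split_fastq(src, n_parts, out_dir):
    """Split a FASTQ file into n_parts files without breaking 4-line records."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme; rename the parts as the pipeline requires.
    paths = [out_dir / f"{src.stem}.part{k}.fastq" for k in range(1, n_parts + 1)]
    handles = [p.open("w") for p in paths]
    try:
        with src.open() as fh:
            for i in itertools.count():
                # Read the next record (header, sequence, '+', qualities).
                record = list(itertools.islice(fh, 4))
                if not record:
                    break
                if len(record) != 4:
                    raise ValueError("truncated FASTQ record at end of file")
                # Deal records round-robin so the parts end up similar in size.
                handles[i % n_parts].writelines(record)
    finally:
        for h in handles:
            h.close()
    return paths
```

Calling `split_fastq("sample_1.fastq", 10, "parts")` would write ten part files under `parts/`. If both files of a pair are split with the same `n_parts` and list their reads in the same order, mates land in the part with the same index.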

2. Where can the CARD and ARDB databases be downloaded?

A: Users should download the original sequence files from the CARD and ARDB websites. The links are CARD and ARDB, respectively.

3. Can Ublastx process single-end fastq files?

A: Currently, Ublastx can only process paired-end metagenomic sequences, but there is a workaround for single-end sequences. Because Ublastx currently does not consider the paired-end relationship and processes reads separately, users can split the single-end sequences into two files and treat the two files as a pair; the pipeline will then run without any problem.
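The workaround can be sketched in Python as follows. This is only an illustration: the function name `fake_pair` is hypothetical, and the `<prefix>_1.fq` / `<prefix>_2.fq` output names are an assumption, so adjust them to the file names your Ublastx setup expects. It assumes uncompressed fastq with four lines per record.

```python
from itertools import zip_longest
from pathlib import Path


def fake_pair(single_fastq, prefix):
    """Deal single-end FASTQ records alternately into two pseudo-pair files."""
    # Assumed output names; change them to match your pipeline's convention.
    out1 = Path(f"{prefix}_1.fq")
    out2 = Path(f"{prefix}_2.fq")
    with open(single_fastq) as fh, out1.open("w") as f1, out2.open("w") as f2:
        # Zipping the same file iterator four times groups lines into records.
        for i, record in enumerate(zip_longest(*[fh] * 4)):
            if None in record:
                raise ValueError("truncated FASTQ record at end of file")
            # Even-numbered records go to file 1, odd-numbered to file 2.
            (f1 if i % 2 == 0 else f2).writelines(record)
    return out1, out2
```

Since the reads are processed separately, the two files do not need to hold true mates; alternating records simply keeps them about the same size.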

4. What are the output files of Ublastx stage one?

A: New users may be unsure what each file generated by the stage one pipeline contains:

extracted.fa                      Final extracted ARG-like reads, in FASTA format
meta_data_online.txt              Metadata output, used as input for stage two
STAS_1.16s                        16S read search output for 1.fastq, in BLAST m6 tabular format
STAS_2.16s                        16S read search output for 2.fastq, in BLAST m6 tabular format
STAS_1.us                         SARG database search output for 1.fastq
STAS_2.us                         SARG database search output for 2.fastq
STAS.16s_1v6.us                   Search output against the 16S hypervariable region (HVR) database (currently only V6 is supported) for 1.fq
STAS.16s_2v6.us                   Search output against the 16S hypervariable region (HVR) database (currently only V6 is supported) for 2.fq
STAS.extract_1.fa                 ARG-like sequences extracted from 1.fastq
STAS.extract_2.fa                 ARG-like sequences extracted from 2.fastq
STAS.16s_hyperout.txt             Extracted 16S hypervariable region reads, in FASTA format
STAS.16s_hvr_community.txt        Microbial community structure derived by assigning the extracted HVR sequences; values are absolute abundances for that sample, and fragment sequences are counted by the ratio of the fragment
STAS.16s_hvr_normal.copy.txt      Average copy number calculated from the community information and the amplicon CopyRighter database
ublastx_bash_Mon-Feb-1-16:20:59-2016.sh   Shell script containing all the commands run by the pipeline
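
To inspect one of the BLAST m6 tabular files listed above (for example STAS_1.16s), a minimal Python sketch such as the following may help. It assumes the file uses the standard 12-column tab-separated layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the helper names `read_m6` and `best_hits` are hypothetical and not part of Ublastx.

```python
from collections import namedtuple

# Standard 12-column BLAST tabular (m6) layout, assumed for these files.
M6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
Hit = namedtuple("Hit", M6_FIELDS)


def read_m6(path):
    """Yield one Hit per alignment line of a tab-separated m6 file."""
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            row = line.split("\t")
            if len(row) < 12:
                raise ValueError(f"expected 12 columns, got {len(row)}")
            yield Hit(
                row[0], row[1], float(row[2]),
                *map(int, row[3:10]),          # length .. send
                float(row[10]), float(row[11]),
            )


def best_hits(path):
    """Keep the highest-bitscore hit for each query read."""
    best = {}
    for hit in read_m6(path):
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best
```

For instance, `len(best_hits("STAS_1.16s"))` gives the number of distinct reads in that file with at least one hit.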
