Bioinformatics Analysis on CentOS

1 Alignment before GO annotation

[zcb1998@localhost ~]$ cd /home/zcb1998/Desktop
[zcb1998@localhost Desktop]$ tar xzf diamond-linux64.tar.gz
[zcb1998@localhost Desktop]$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/swissprot.gz
--2021-12-29 03:58:10-- ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/swissprot.gz
=> 'swissprot.gz'

Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.12, 130.14.250.11, 2607:f220:41e:250::13, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.12|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /blast/db/FASTA ... done.
==> SIZE swissprot.gz ... 141082489
==> PASV ... done. ==> RETR swissprot.gz ... done.
Length: 141082489 (135M) (unauthoritative)

100%[======================================>] 141,082,489 2.40MB/s in 53s

2021-12-29 03:59:07 (2.52 MB/s) - 'swissprot.gz' saved [141082489]
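
The `makedb` step later in this transcript reads `swissprot.fa`, while the download above produces `swissprot.gz`; the decompression step is not shown. A minimal sketch of one way to bridge the two, assuming the archive is a single gzip-compressed FASTA file. The helper name `decompress_fasta` is hypothetical and not part of the original session.

```shell
# Hypothetical helper (not in the original transcript): decompress a gzip
# FASTA file while keeping the downloaded archive in place.
decompress_fasta() {
    # $1 = gzip-compressed FASTA, $2 = path of the decompressed FASTA
    gunzip -c "$1" > "$2"
}

# Assumed file names: swissprot.gz from the wget step, swissprot.fa as used
# by the makedb command further down.
if [ -f swissprot.gz ]; then
    decompress_fasta swissprot.gz swissprot.fa
    # Count FASTA records as a quick sanity check.
    grep -c '^>' swissprot.fa
fi
```

The record count printed here can be compared with the "Processed ... sequences" line that `makedb` reports.
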

[zcb1998@localhost Desktop]$ /home/zcb1998/Desktop/diamond help
diamond v0.9.24.125 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

Syntax: diamond COMMAND [OPTIONS]

Commands:
makedb Build DIAMOND database from a FASTA file
blastp Align amino acid query sequences against a protein reference database
blastx Align DNA query sequences against a protein reference database
view View DIAMOND alignment archive (DAA) formatted file
help Produce help message
version Display version information
getseq Retrieve sequences from a DIAMOND database file
dbinfo Print information about a DIAMOND database file

General options:
--threads (-p) number of CPU threads
--db (-d) database file
--out (-o) output file
--outfmt (-f) output format
0 = BLAST pairwise
5 = BLAST XML
6 = BLAST tabular
100 = DIAMOND alignment archive (DAA)
101 = SAM

Value 6 may be followed by a space-separated list of these keywords:

qseqid means Query Seq - id
qlen means Query sequence length
sseqid means Subject Seq - id
sallseqid means All subject Seq - id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
full_qseq means Query sequence
sseq means Aligned part of subject sequence
full_sseq means Subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive - scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive - scoring matches
qframe means Query frame
btop means Blast traceback operations(BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
qtitle means Query title
qqual means Query quality values

Default: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

--verbose (-v) verbose console output
--log enable debug log
--quiet disable console output
--header Write header lines to blast tabular format.

Makedb options:
--in input reference file in FASTA format

Aligner options:
--query (-q) input query file
--strand query strands to search (both/minus/plus)
--un file for unaligned queries
--al file or aligned queries
--unfmt format of unaligned query file (fasta/fastq)
--alfmt format of aligned query file (fasta/fastq)
--unal report unaligned queries (0=no, 1=yes)
--max-target-seqs (-k) maximum number of target sequences to report alignments for
--top report alignments within this percentage range of top alignment score (overrides --max-target-seqs)
--range-culling restrict hit culling to overlapping query ranges
--compress compression for output files (0=none, 1=gzip)
--evalue (-e) maximum e-value to report alignments (default=0.001)
--min-score minimum bit score to report alignments (overrides e-value setting)
--id minimum identity% to report an alignment
--query-cover minimum query cover% to report an alignment
--subject-cover minimum subject cover% to report an alignment
--sensitive enable sensitive mode (default: fast)
--more-sensitive enable more sensitive mode (default: fast)
--block-size (-b) sequence block size in billions of letters (default=2.0)
--index-chunks (-c) number of chunks for index processing
--tmpdir (-t) directory for temporary files
--gapopen gap open penalty
--gapextend gap extension penalty
--frameshift (-F) frame shift penalty (default=disabled)
--long-reads short for --range-culling --top 10 -F 15
--matrix score matrix for protein alignment (default=BLOSUM62)
--custom-matrix file containing custom scoring matrix
--lambda lambda parameter for custom matrix
--K K parameter for custom matrix
--comp-based-stats enable composition based statistics (0/1=default)
--masking enable masking of low complexity regions (0/1=default)
--query-gencode genetic code to use to translate query (see user manual)
--salltitles include full subject titles in DAA file
--sallseqid include all subject ids in DAA file
--no-self-hits suppress reporting of identical self hits
--taxonmap protein accession to taxid mapping file
--taxonnodes taxonomy nodes.dmp from NCBI
--taxonlist restrict search to list of taxon ids (comma-separated)

Advanced options:
--algo Seed search algorithm (0=double-indexed/1=query-indexed)
--bin number of query bins for seed search
--min-orf (-l) ignore translated sequences without an open reading frame of at least this length
--freq-sd number of standard deviations for ignoring frequent seeds
--id2 minimum number of identities for stage 1 hit
--window (-w) window size for local hit search
--xdrop (-x) xdrop for ungapped alignment
--ungapped-score minimum alignment score to continue local extension
--hit-band band for hit verification
--hit-score minimum score to keep a tentative alignment
--gapped-xdrop (-X) xdrop for gapped alignment in bits
--band band for dynamic programming computation
--shapes (-s) number of seed shapes (0 = all available)
--shape-mask seed shapes
--index-mode index mode (0=4x12, 1=16x9)
--rank-ratio include subjects within this ratio of last hit (stage 1)
--rank-ratio2 include subjects within this ratio of last hit (stage 2)
--max-hsps maximum number of HSPs per subject sequence to save for each query
--range-cover percentage of query range to be covered for hit culling (default=50)
--dbsize effective database size (in letters)
--no-auto-append disable auto appending of DAA and DMND file extensions
--xml-blord-format Use gnl|BL_ORD_ID| style format in XML output

View options:
--daa (-a) DIAMOND alignment archive (DAA) file
--forwardonly only show alignments of forward strand

Getseq options:
--seq Sequence numbers to display.

[zcb1998@localhost Desktop]$ /home/zcb1998/Desktop/diamond makedb --in swissprot.fa -d swissprot ## build the database
diamond v0.9.24.125 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database file: swissprot.fa
Opening the database file… [0.000323s]
Loading sequences… [1.22648s]
Masking sequences… [19.5554s]
Writing sequences… [0.297977s]
Hashing sequences… [0.080532s]
Loading sequences… [2.1e-05s]
Writing trailer… [0.006372s]
Closing the input file… [3.3e-05s]
Closing the database file… [0.000281s]
Database hash = 686d3f918d042fac076e19e4f210ff7b
Processed 478006 sequences, 181230906 letters.
Total time = 21.1677s
[zcb1998@localhost Desktop]$ /home/zcb1998/Desktop/diamond blastp -d swissprot -q Caomuxi.pep.fasta -o matches.m8 ## protein alignment
diamond v0.9.24.125 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database… [2.8e-05s]
#Target sequences to report alignments for: 25
Opening the input file… [4.6e-05s]
Error: Error detecting input file format. First line seems to be blank.
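
The error indicates that `Caomuxi.pep.fasta` begins with an empty line, so DIAMOND cannot tell whether the file is FASTA or FASTQ. The transcript does not show a fix. One possible approach, sketched here, is to write a copy without blank lines and pass that copy to `-q`. The helper name `strip_blank_lines` and the output name `Caomuxi.pep.clean.fasta` are assumptions.

```shell
# Hypothetical helper (not in the original transcript).
strip_blank_lines() {
    # $1 = input FASTA, $2 = cleaned output
    # Delete lines that are empty or hold only whitespace
    # (this also covers a lone carriage return from Windows line endings).
    sed '/^[[:space:]]*$/d' "$1" > "$2"
}

if [ -f Caomuxi.pep.fasta ]; then
    strip_blank_lines Caomuxi.pep.fasta Caomuxi.pep.clean.fasta
    # The first line should now be a FASTA header starting with '>'.
    head -n 1 Caomuxi.pep.clean.fasta
fi
```

The cleaned file would then replace the original in the command, for example `diamond blastp -d swissprot -q Caomuxi.pep.clean.fasta -o matches.m8`.
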
[zcb1998@localhost Desktop]$ /home/zcb1998/Desktop/diamond blastx -d swissprot -q Caomuxi.cds.fasta -o matches.m8
diamond v0.9.24.125 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database… [2.7e-05s]
#Target sequences to report alignments for: 25
Opening the input file… [0.000312s]
Opening the output file… [5.8e-05s]
Loading query sequences… [0.037517s]
Masking queries… [0.894515s]
Building query seed set… [0.025912s]
Algorithm: Double-indexed
Building query histograms… [0.069256s]
Allocating buffers… [2.3e-05s]
Loading reference sequences… [0.333567s]
Building reference histograms… [1.93306s]
Allocating buffers… [2e-05s]
Initializing temporary storage… [0.01277s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 0.
Building reference index… [2.69179s]
Building query index… [0.09718s]
Building seed filter… [0.162043s]
Searching alignments… [0.23148s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 1.
Building reference index… [2.73463s]
Building query index… [0.094026s]
Building seed filter… [0.160724s]
Searching alignments… [0.21505s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 2.
Building reference index… [3.1013s]
Building query index… [0.098308s]
Building seed filter… [0.1586s]
Searching alignments… [0.211197s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 3.
Building reference index… [2.4869s]
Building query index… [0.066303s]
Building seed filter… [0.16881s]
Searching alignments… [0.213401s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 0.
Building reference index… [2.52274s]
Building query index… [0.077635s]
Building seed filter… [0.166746s]
Searching alignments… [0.193984s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 1.
Building reference index… [2.7968s]
Building query index… [0.079784s]
Building seed filter… [0.15885s]
Searching alignments… [0.207618s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 2.
Building reference index… [3.01136s]
Building query index… [0.085289s]
Building seed filter… [0.14848s]
Searching alignments… [0.195939s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 3.
Building reference index… [2.50633s]
Building query index… [0.092534s]
Building seed filter… [0.180737s]
Searching alignments… [0.232746s]
Deallocating buffers… [0.05002s]
Computing alignments… [4.62268s]
Deallocating reference… [0.012437s]
Loading reference sequences… [0.000173s]
Deallocating buffers… [0.000118s]
Deallocating queries… [0.000254s]
Loading query sequences… [1.9e-05s]
Closing the input file… [1e-05s]
Closing the output file… [6.7e-05s]
Closing the database file… [6e-06s]
Deallocating taxonomy… [1e-06s]
Total time = 33.544s
Reported 33730 pairwise alignments, 33824 HSPs.
2225 queries aligned.
[zcb1998@localhost Desktop]$
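
The blastx run above uses the defaults: up to 25 targets per query and the 12 default tabular columns. For annotation it can be convenient to keep fewer hits per query and to include the subject title. The following is a sketch that uses only options listed in the help output (`--threads`, `--evalue`, `--max-target-seqs`, `--outfmt 6` with keywords). The threshold `1e-5`, the output name `matches.best.m8`, and the column selection are assumptions, not values from the original run.

```shell
DIAMOND=/home/zcb1998/Desktop/diamond   # binary path used in the transcript

# Assumed settings: one target per query, e-value cutoff 1e-5, and a custom
# column list that ends with the subject title.
DIAMOND_ARGS="blastx -d swissprot -q Caomuxi.cds.fasta -o matches.best.m8 --threads 2 --evalue 1e-5 --max-target-seqs 1 --outfmt 6 qseqid sseqid pident length evalue bitscore stitle"

# Show the full command before running it.
echo "$DIAMOND $DIAMOND_ARGS"

# Run only when the binary, the database and the query file are present.
# Word splitting of $DIAMOND_ARGS is intentional: no argument contains spaces.
if [ -x "$DIAMOND" ] && [ -f swissprot.dmnd ] && [ -f Caomuxi.cds.fasta ]; then
    "$DIAMOND" $DIAMOND_ARGS
fi
```

Because the column list differs from the default, any downstream script has to read `evalue` and `bitscore` from columns 5 and 6 instead of 11 and 12.
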
[zcb1998@localhost Desktop]$ /home/zcb1998/Desktop/diamond blastp -d swissprot -q Caomuxi.pep.fasta -o matches2021.xls
diamond v0.9.24.125 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database… [0.002064s]
#Target sequences to report alignments for: 25
Opening the input file… [0.00017s]
Error: Error detecting input file format. First line seems to be blank.
[zcb1998@localhost Desktop]$ /home/zcb1998/Desktop/diamond blastx -d swissprot -q Caomuxi.cds.fasta -o matches.m8
diamond v0.9.24.125 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database… [2.9e-05s]
#Target sequences to report alignments for: 25
Opening the input file… [0.010838s]
Opening the output file… [0.001086s]
Loading query sequences… [0.137648s]
Masking queries… [1.19867s]
Building query seed set… [0.025168s]
Algorithm: Double-indexed
Building query histograms… [0.124588s]
Allocating buffers… [2.2e-05s]
Loading reference sequences… [1.11715s]
Building reference histograms… [3.64051s]
Allocating buffers… [4.9e-05s]
Initializing temporary storage… [0.014819s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 0.
Building reference index… [5.65281s]
Building query index… [0.142622s]
Building seed filter… [0.215267s]
Searching alignments… [0.440092s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 1.
Building reference index… [4.36867s]
Building query index… [0.131952s]
Building seed filter… [0.277002s]
Searching alignments… [0.36881s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 2.
Building reference index… [4.52613s]
Building query index… [0.115202s]
Building seed filter… [0.253503s]
Searching alignments… [0.290895s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 3.
Building reference index… [3.95865s]
Building query index… [0.128909s]
Building seed filter… [0.219909s]
Searching alignments… [0.479451s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 0.
Building reference index… [3.85583s]
Building query index… [0.116726s]
Building seed filter… [0.21339s]
Searching alignments… [0.402212s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 1.
Building reference index… [4.14518s]
Building query index… [0.122619s]
Building seed filter… [0.216045s]
Searching alignments… [0.274904s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 2.
Building reference index… [4.33934s]
Building query index… [0.143687s]
Building seed filter… [0.194082s]
Searching alignments… [0.250045s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 3.
Building reference index… [3.55633s]
Building query index… [0.101393s]
Building seed filter… [0.245083s]
Searching alignments… [0.26321s]
Deallocating buffers… [0.033141s]
Computing alignments… [6.5803s]
Deallocating reference… [0.010486s]
Loading reference sequences… [0.000244s]
Deallocating buffers… [0.000767s]
Deallocating queries… [0.000737s]
Loading query sequences… [4.2e-05s]
Closing the input file… [1.4e-05s]
Closing the output file… [4.3e-05s]
Closing the database file… [8e-06s]
Deallocating taxonomy… [2e-06s]
Total time = 52.9132s
Reported 33730 pairwise alignments, 33824 HSPs.
2225 queries aligned.
[zcb1998@localhost Desktop]$
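
With the default of 25 targets per query, `matches.m8` holds several rows per query (33730 alignments for 2225 aligned queries). A sketch of keeping only the row with the highest bit score for each query, assuming the default 12-column tabular layout listed in the help output (column 1 is `qseqid`, column 12 is `bitscore`). The helper name `best_hits` and the output file name are hypothetical.

```shell
# Hypothetical helper (not in the original transcript).
best_hits() {
    # $1 = tabular file with the default 12 columns
    # Sort by query ID, then by bit score in descending numeric order,
    # and keep the first row seen for each query ID.
    sort -t "$(printf '\t')" -k1,1 -k12,12nr "$1" | awk -F '\t' '!seen[$1]++'
}

if [ -f matches.m8 ]; then
    best_hits matches.m8 > matches.best_hit.m8
    # One row per aligned query is expected.
    wc -l < matches.best_hit.m8
fi
```
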

PS C:\Users\ZHANGCAIBIN\Desktop\uniprot_sprot.fasta> makeblastdb -in uniprot_sprot.fasta -dbtype prot -title uniprot_sprot-parse_seqids -out uniprot_sprot -logfile uniprot_sprot.log
PS C:\Users\ZHANGCAIBIN\Desktop\uniprot_sprot.fasta> cat uniprot_sprot.log

Building a new DB, current time: 12/29/2021 17:17:23
New DB name: C:\Users\ZHANGCAIBIN\Desktop\uniprot_sprot.fasta\uniprot_sprot
New DB title: uniprot_sprot-parse_seqids
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 565928 sequences in 15.5581 seconds.
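
In the log above the database title is recorded as `uniprot_sprot-parse_seqids`, which suggests that `-parse_seqids` was fused onto the `-title` value instead of being passed as its own option. If parsing sequence IDs was intended, the command could be written with the option separated, as sketched below. The file names are those from the session; whether ID parsing is needed here is an assumption. The sketch is in POSIX shell for consistency with the earlier steps, and the `makeblastdb` arguments themselves are the same under PowerShell.

```shell
# Same arguments as the original command, with -parse_seqids split from -title.
MAKEDB_ARGS="-in uniprot_sprot.fasta -dbtype prot -title uniprot_sprot -parse_seqids -out uniprot_sprot -logfile uniprot_sprot.log"

# Show the full command before running it.
echo "makeblastdb $MAKEDB_ARGS"

# Run only when BLAST+ is installed and the FASTA file is present.
# Word splitting of $MAKEDB_ARGS is intentional: no argument contains spaces.
if command -v makeblastdb >/dev/null 2>&1 && [ -f uniprot_sprot.fasta ]; then
    makeblastdb $MAKEDB_ARGS
fi
```
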

PS C:\Users\ZHANGCAIBIN\Desktop\uniprot_sprot.fasta> blastp -query caomuxi.pep.fasta -out swiss-prot.tab -db uniprot_sprot -evalue 1e-5 -outfmt 7
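
`-outfmt 7` writes tabular rows interleaved with `#` comment lines (program version, query name, database, field list, hit count). A sketch of reducing such a file to plain alignment rows and counting the queries that have at least one hit, assuming the output file `swiss-prot.tab` from the command above. The helper names `hit_rows` and `count_queries_with_hits` are hypothetical.

```shell
# Hypothetical helpers (not in the original transcript).
hit_rows() {
    # $1 = BLAST output written with -outfmt 7
    # Drop the '#' comment lines and keep only alignment rows.
    grep -v '^#' "$1"
}

count_queries_with_hits() {
    # Column 1 of each alignment row is the query ID.
    hit_rows "$1" | cut -f 1 | sort -u | wc -l | tr -d ' '
}

if [ -f swiss-prot.tab ]; then
    count_queries_with_hits swiss-prot.tab
fi
```
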