Creating Custom Database using Standalone NCBI BLAST+
Basic Local Alignment Search Tool (BLAST) is a collection of programs that use a heuristic algorithm to compare DNA, RNA, and protein sequences. The standalone command-line version of BLAST, implemented in C++, is named BLAST+. The latest version of NCBI BLAST+ can be downloaded from the NCBI FTP server (ftp://ftp.ncbi.nih.gov/blast/executables/blast+/LATEST). This is a simple tutorial on creating a custom database, accessing its records, and performing a sequence search using BLAST+.
1. Creating a Custom Database
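A nucleotide (nucl) or protein (prot) database can be created using the -dbtype parameter of the makeblastdb program. Two types of database can be created with the commands below. The indexed database differs only in the -parse_seqids option, which makes makeblastdb parse the sequence identifiers in the FASTA headers.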
Non-indexed Database: ./makeblastdb -in DBX.fasta -out DBX -dbtype prot
Building a new DB, current time: 12/04/2020 10:10:06
New DB name: C:\NCBI\blast-2.6.0+\bin\DBX
New DB title: DBX.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 20 sequences in 0.0041614 seconds.
Indexed Database: ./makeblastdb -in DB.fasta -out DB -dbtype prot -parse_seqids
Building a new DB, current time: 12/03/2020 17:29:38
New DB name: C:\NCBI\blast-2.11.0+\bin\DB
New DB title: DB.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 20 sequences in 0.0277056 seconds.
The sequence files DB.fasta and DBX.fasta are given at the end of this page.
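Judging from the two files listed at the end of this page, DB.fasta appears to differ from DBX.fasta only in that every header starts with a local identifier (Sequence1, Sequence2, ...). A minimal sketch of how such headers could be generated is shown below. The function names add_local_ids and count_records are hypothetical helpers, not part of BLAST+, and the "Sequence" prefix is simply copied from DB.fasta.

```shell
#!/usr/bin/env bash

# Prefix every FASTA header read from stdin with a sequential local ID
# (Sequence1, Sequence2, ...). Hypothetical helper, not a BLAST+ tool.
add_local_ids() {
    awk '
        # Header line: insert the running ID in front of the original header text.
        /^>/ { n++; print ">Sequence" n " " substr($0, 2); next }
        # Sequence line: copy unchanged.
        { print }
    '
}

# Count the records in a FASTA stream (one ">" header per record).
count_records() {
    grep -c '^>'
}
```

For example, add_local_ids < DBX.fasta > DB.fasta would write headers of the form used in DB.fasta, and count_records < DB.fasta should report 20, in line with the "added 20 sequences" message printed by makeblastdb.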
2. List of Records in the Database
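The entries in the database can be listed using the -entry all parameter of the blastdbcmd program. In the -outfmt string, %o, %g, %a, and %i stand for the ordinal ID (OID), GI, accession, and sequence identifier. The commands to list the sequence identifiers assigned in the non-indexed and indexed databases are below: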
Non-indexed Database: ./blastdbcmd -db DBX -entry all -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"
OID: 0 GI: N/A ACC: BL_ORD_ID:0 IDENTIFIER: gnl|BL_ORD_ID|0
OID: 1 GI: N/A ACC: BL_ORD_ID:1 IDENTIFIER: gnl|BL_ORD_ID|1
OID: 2 GI: N/A ACC: BL_ORD_ID:2 IDENTIFIER: gnl|BL_ORD_ID|2
OID: 3 GI: N/A ACC: BL_ORD_ID:3 IDENTIFIER: gnl|BL_ORD_ID|3
OID: 4 GI: N/A ACC: BL_ORD_ID:4 IDENTIFIER: gnl|BL_ORD_ID|4
OID: 5 GI: N/A ACC: BL_ORD_ID:5 IDENTIFIER: gnl|BL_ORD_ID|5
OID: 6 GI: N/A ACC: BL_ORD_ID:6 IDENTIFIER: gnl|BL_ORD_ID|6
OID: 7 GI: N/A ACC: BL_ORD_ID:7 IDENTIFIER: gnl|BL_ORD_ID|7
OID: 8 GI: N/A ACC: BL_ORD_ID:8 IDENTIFIER: gnl|BL_ORD_ID|8
OID: 9 GI: N/A ACC: BL_ORD_ID:9 IDENTIFIER: gnl|BL_ORD_ID|9
OID: 10 GI: N/A ACC: BL_ORD_ID:10 IDENTIFIER: gnl|BL_ORD_ID|10
OID: 11 GI: N/A ACC: BL_ORD_ID:11 IDENTIFIER: gnl|BL_ORD_ID|11
OID: 12 GI: N/A ACC: BL_ORD_ID:12 IDENTIFIER: gnl|BL_ORD_ID|12
OID: 13 GI: N/A ACC: BL_ORD_ID:13 IDENTIFIER: gnl|BL_ORD_ID|13
OID: 14 GI: N/A ACC: BL_ORD_ID:14 IDENTIFIER: gnl|BL_ORD_ID|14
OID: 15 GI: N/A ACC: BL_ORD_ID:15 IDENTIFIER: gnl|BL_ORD_ID|15
OID: 16 GI: N/A ACC: BL_ORD_ID:16 IDENTIFIER: gnl|BL_ORD_ID|16
OID: 17 GI: N/A ACC: BL_ORD_ID:17 IDENTIFIER: gnl|BL_ORD_ID|17
OID: 18 GI: N/A ACC: BL_ORD_ID:18 IDENTIFIER: gnl|BL_ORD_ID|18
OID: 19 GI: N/A ACC: BL_ORD_ID:19 IDENTIFIER: gnl|BL_ORD_ID|19
Indexed Database: ./blastdbcmd -db DB -entry all -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"
OID: 0 GI: N/A ACC: Sequence1 IDENTIFIER: lcl|Sequence1
OID: 1 GI: N/A ACC: Sequence2 IDENTIFIER: lcl|Sequence2
OID: 2 GI: N/A ACC: Sequence3 IDENTIFIER: lcl|Sequence3
OID: 3 GI: N/A ACC: Sequence4 IDENTIFIER: lcl|Sequence4
OID: 4 GI: N/A ACC: Sequence5 IDENTIFIER: lcl|Sequence5
OID: 5 GI: N/A ACC: Sequence6 IDENTIFIER: lcl|Sequence6
OID: 6 GI: N/A ACC: Sequence7 IDENTIFIER: lcl|Sequence7
OID: 7 GI: N/A ACC: Sequence8 IDENTIFIER: lcl|Sequence8
OID: 8 GI: N/A ACC: Sequence9 IDENTIFIER: lcl|Sequence9
OID: 9 GI: N/A ACC: Sequence10 IDENTIFIER: lcl|Sequence10
OID: 10 GI: N/A ACC: Sequence11 IDENTIFIER: lcl|Sequence11
OID: 11 GI: N/A ACC: Sequence12 IDENTIFIER: lcl|Sequence12
OID: 12 GI: N/A ACC: Sequence13 IDENTIFIER: lcl|Sequence13
OID: 13 GI: N/A ACC: Sequence14 IDENTIFIER: lcl|Sequence14
OID: 14 GI: N/A ACC: Sequence15 IDENTIFIER: lcl|Sequence15
OID: 15 GI: N/A ACC: Sequence16 IDENTIFIER: lcl|Sequence16
OID: 16 GI: N/A ACC: Sequence17 IDENTIFIER: lcl|Sequence17
OID: 17 GI: N/A ACC: Sequence18 IDENTIFIER: lcl|Sequence18
OID: 18 GI: N/A ACC: Sequence19 IDENTIFIER: lcl|Sequence19
OID: 19 GI: N/A ACC: Sequence20 IDENTIFIER: lcl|Sequence20
The gnl|BL_ORD_ID identifiers (General, BLAST Ordinal ID) denote entries of the non-indexed database, whereas the lcl (Local) identifiers denote entries of the indexed database.
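The prefix rule above can be turned into a small check. The following is only a sketch: id_kind and summarize_listing are hypothetical helper names, and the parsing assumes each listing entry sits on its own line in exactly the "OID: ... GI: ... ACC: ... IDENTIFIER: ..." layout produced by the -outfmt string used in this section.

```shell
#!/usr/bin/env bash

# Classify an identifier (the %i field) by its prefix.
id_kind() {
    case "$1" in
        'gnl|BL_ORD_ID|'*) echo "ordinal" ;;   # assigned by BLAST, non-indexed database
        'lcl|'*)           echo "local" ;;     # parsed from the FASTA header, indexed database
        *)                 echo "other" ;;
    esac
}

# Read listing lines from stdin and print "<OID> <identifier> <kind>".
# Fields: 1 "OID:", 2 OID, 3 "GI:", 4 GI, 5 "ACC:", 6 accession, 7 "IDENTIFIER:", 8 identifier.
summarize_listing() {
    while read -r _ oid _ _ _ _ _ id; do
        printf '%s %s %s\n' "$oid" "$id" "$(id_kind "$id")"
    done
}
```

The listing commands above could be piped straight into summarize_listing to see at a glance which kind of identifier a database uses.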
3. Searching Sequence from the Database
A sequence in the database can be accessed by its entry ID using the -entry parameter of the blastdbcmd program. The commands to access an entry from the non-indexed and indexed databases are below:
Non-indexed Database: ./blastdbcmd -db DBX -entry 'gnl|BL_ORD_ID|1'
>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLEAGGLQPSALE
MILQKRGIVDQCCNNICTFNQLQNYCNVP
The latest BLAST+ does not permit access to the first entry (index number ‘0’) of the non-indexed database, since its starting index number is ‘1’. Moreover, it does not recognize the entries of a non-indexed database, so I used BLAST+ version 2.6.0 to construct the non-indexed database.
./blastdbcmd -db DBX -entry 'gnl|BL_ORD_ID|0'
Error: [blastdbcmd] CObject_id::GetId(): Invalid choice selection: NCBI-General::Object-id.str
Indexed Database: ./blastdbcmd -db DB -entry Sequence1
>Sequence1 NP_001191615.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRFLAKYMVKRDT
ENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISRTGRSNSGHAQLEDNFS
4. Retrieving Sequence from the Database
A sequence can be retrieved from the database and saved to a file by its entry ID, using the -entry parameter together with -out in the blastdbcmd program. The commands to save the sequence to a file from the non-indexed and indexed databases are below:
Non-indexed Database: ./blastdbcmd -db DBX -entry 'gnl|BL_ORD_ID|1' -out Sequence2.fasta
Indexed Database: ./blastdbcmd -db DB -entry Sequence1 -out Sequence1.fasta
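A quick way to check a retrieved file is to compare its sequence length with the Length= value that BLAST reports for the same record. The sketch below uses a hypothetical helper, fasta_lengths, which is not part of BLAST+; it prints the first word of each header followed by the number of residues in that record.

```shell
#!/usr/bin/env bash

# Print "<first header word><TAB><sequence length>" for each FASTA record on stdin.
fasta_lengths() {
    awk '
        # Print the record collected so far, if any.
        function emit() { if (id != "") printf "%s\t%d\n", id, len }
        # Header line: finish the previous record and start a new one.
        /^>/ { emit(); id = substr($1, 2); len = 0; next }
        # Sequence line: ignore whitespace and add the residue count.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { emit() }
    '
}
```

For instance, fasta_lengths < Sequence2.fasta should report 109 for AAA40590.1, which agrees with the Length=109 shown for that record in the alignment output of section 5.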
5. Performing Pairwise Alignment
Pairwise sequence alignment can be performed by passing the query either as an input file (in FASTA format) through the -query parameter, or as a raw sequence piped in on the command line (not supported in older versions of BLAST+). The commands to perform pairwise alignment are below:
./blastp -db DB -query sequence.fasta
or
echo ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE | ./blastp -db DB
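When a raw sequence is piped in, the report has no query name to show (the "Query=" line in the output below is empty). One option is to wrap the raw sequence in a FASTA record first. The helper below, to_fasta, is a hypothetical sketch; the query name and the 70-character line width are illustrative choices (70 matches the line width of the sequence files on this page).

```shell
#!/usr/bin/env bash

# Wrap a raw sequence as a single FASTA record.
# $1 = query name (any label without spaces), $2 = raw sequence.
to_fasta() {
    printf '>%s\n' "$1"
    # Strip whitespace from the sequence, then break it into 70-character lines.
    printf '%s' "$2" | tr -d '[:space:]' | fold -w 70
    # fold leaves the last line unterminated because the input has no newline.
    printf '\n'
}
```

Usage would look like to_fasta query1 ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE > sequence.fasta, followed by ./blastp -db DB -query sequence.fasta.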
The sequence alignment output is below,
BLASTP 2.11.0+ Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for composition-based statistics: Alejandro A. Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. Database: DB.fasta 20 sequences; 2,178 total letters Query= Length=59 Score E Sequences producing significant alignments: (Bits) Value Sequence3 KAB1251309.1 Insulin [Camelus dromedarius] 120 3e-41 Sequence16 AAA59172.1 insulin [Homo sapiens] 91.3 6e-30 Sequence10 AAA19033.1 insulin [Oryctolagus cuniculus] 89.7 3e-29 Sequence4 NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens] 85.1 2e-26 Sequence8 AAB60625.1 insulin [Ovis aries] 82.0 2e-26 Sequence20 ELK28555.1 Insulin [Myotis davidii] 77.0 1e-23 Sequence7 pir||INHY insulin - hamster 65.5 3e-20 Sequence13 pir||INEL insulin - elephant 64.7 6e-20 Sequence15 pir||INTK insulin - turkey (tentative sequence) 63.5 1e-19 Sequence12 pir||INOS insulin - ostrich 63.5 1e-19 Sequence11 pir||INMKSQ insulin - common squirrel monkey 60.1 3e-18 Sequence2 AAA40590.1 insulin [Octodon degus] 60.8 6e-18 Sequence6 pir||INCD insulin - cod (Gadus sp.) 55.8 2e-16 Sequence5 NP_571131.1 insulin preproprotein [Danio rerio] 53.9 3e-15 Sequence9 XP_014388588.1 PREDICTED: insulin [Myotis brandtii] 48.9 1e-12 Sequence19 BAS32722.1 insulin, partial [Varanus exanthematicus] 38.5 2e-09 Sequence17 QBX89050.1 insulin, partial [Nephrops norvegicus] 19.6 0.042 >Sequence3 KAB1251309.1 Insulin [Camelus dromedarius] Length=110 Score = 120 bits (300), Expect = 3e-41, Method: Compositional matrix adjust. 
Identities = 59/59 (100%), Positives = 59/59 (100%), Gaps = 0/59 (0%) Query 1 ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59 ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE Sbjct 9 ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 67 >Sequence16 AAA59172.1 insulin [Homo sapiens] Length=110 Score = 91.3 bits (225), Expect = 6e-30, Method: Compositional matrix adjust. Identities = 42/50 (84%), Positives = 42/50 (84%), Gaps = 0/50 (0%) Query 10 APTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59 P PA AF NQHLCGSHLVEALYLVCGERGFFYTPK RRE ED QVG VE Sbjct 18 GPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVE 67 >Sequence10 AAA19033.1 insulin [Oryctolagus cuniculus] Length=110 Score = 89.7 bits (221), Expect = 3e-29, Method: Compositional matrix adjust. Identities = 40/47 (85%), Positives = 43/47 (91%), Gaps = 0/47 (0%) Query 13 PARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59 PA+AF NQHLCGSHLVEALYLVCGERGFFYTPK+RREVE+ QVG E Sbjct 21 PAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKSRREVEELQVGQAE 67 >Sequence4 NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens] Length=200 Score = 85.1 bits (209), Expect = 2e-26, Method: Compositional matrix adjust. Identities = 38/44 (86%), Positives = 38/44 (86%), Gaps = 0/44 (0%) Query 11 PTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQ 54 P PA AF NQHLCGSHLVEALYLVCGERGFFYTPK RRE ED Q Sbjct 19 PDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ 62 >Sequence8 AAB60625.1 insulin [Ovis aries] Length=105 Score = 82.0 bits (201), Expect = 2e-26, Method: Compositional matrix adjust. Identities = 38/43 (88%), Positives = 39/43 (91%), Gaps = 0/43 (0%) Query 16 AFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGV 58 AF NQHLCGSHLVEALYLVCGERGFFYTPKARREVE QVG + Sbjct 24 AFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGAL 66 >Sequence20 ELK28555.1 Insulin [Myotis davidii] Length=168 Score = 77.0 bits (188), Expect = 1e-23, Method: Compositional matrix adjust. 
Identities = 34/40 (85%), Positives = 36/40 (90%), Gaps = 0/40 (0%) Query 15 RAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQ 54 +AF NQHLCGSHLVEALYLVCGERGFFYTPK RRE+ D Q Sbjct 23 QAFVNQHLCGSHLVEALYLVCGERGFFYTPKDRRELPDPQ 62 Score = 44.3 bits (103), Expect = 5e-11, Method: Compositional matrix adjust. Identities = 24/52 (46%), Positives = 32/52 (62%), Gaps = 2/52 (4%) Query 8 LGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59 L + P+++ +QHLCG LV AL + CG+RG FY P A E +D Q VE Sbjct 80 LASVDPSQS-QDQHLCGDELVNALTITCGDRG-FYNPMAPLEQDDLQEEEVE 129 >Sequence7 pir||INHY insulin - hamster Length=51 Score = 65.5 bits (158), Expect = 3e-20, Method: Compositional matrix adjust. Identities = 28/30 (93%), Positives = 29/30 (97%), Gaps = 0/30 (0%) Query 17 FANQHLCGSHLVEALYLVCGERGFFYTPKA 46 F NQHLCGSHLVEALYLVCGERGFFYTPK+ Sbjct 1 FVNQHLCGSHLVEALYLVCGERGFFYTPKS 30 >Sequence13 pir||INEL insulin - elephant Length=51 Score = 64.7 bits (156), Expect = 6e-20, Method: Compositional matrix adjust. Identities = 28/30 (93%), Positives = 28/30 (93%), Gaps = 0/30 (0%) Query 17 FANQHLCGSHLVEALYLVCGERGFFYTPKA 46 F NQHLCGSHLVEALYLVCGERGFFYTPK Sbjct 1 FVNQHLCGSHLVEALYLVCGERGFFYTPKT 30 >Sequence15 pir||INTK insulin - turkey (tentative sequence) Length=51 Score = 63.5 bits (153), Expect = 1e-19, Method: Compositional matrix adjust. Identities = 28/29 (97%), Positives = 29/29 (100%), Gaps = 0/29 (0%) Query 18 ANQHLCGSHLVEALYLVCGERGFFYTPKA 46 ANQHLCGSHLVEALYLVCGERGFFY+PKA Sbjct 2 ANQHLCGSHLVEALYLVCGERGFFYSPKA 30 >Sequence12 pir||INOS insulin - ostrich Length=51 Score = 63.5 bits (153), Expect = 1e-19, Method: Compositional matrix adjust. Identities = 28/29 (97%), Positives = 29/29 (100%), Gaps = 0/29 (0%) Query 18 ANQHLCGSHLVEALYLVCGERGFFYTPKA 46 ANQHLCGSHLVEALYLVCGERGFFY+PKA Sbjct 2 ANQHLCGSHLVEALYLVCGERGFFYSPKA 30 >Sequence11 pir||INMKSQ insulin - common squirrel monkey Length=51 Score = 60.1 bits (144), Expect = 3e-18, Method: Compositional matrix adjust. 
Identities = 26/30 (87%), Positives = 26/30 (87%), Gaps = 0/30 (0%) Query 17 FANQHLCGSHLVEALYLVCGERGFFYTPKA 46 F NQHLCG HLVEALYLVCGERGFFY PK Sbjct 1 FVNQHLCGPHLVEALYLVCGERGFFYAPKT 30 >Sequence2 AAA40590.1 insulin [Octodon degus] Length=109 Score = 60.8 bits (146), Expect = 6e-18, Method: Compositional matrix adjust. Identities = 28/49 (57%), Positives = 35/49 (71%), Gaps = 1/49 (2%) Query 11 PTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59 P +A+++QHLCGS+LVEALY+ CG G FY P RRE+ED QV E Sbjct 19 PNSVQAYSSQHLCGSNLVEALYMTCGRSG-FYRPHDRRELEDLQVEQAE 66 >Sequence6 pir||INCD insulin - cod (Gadus sp.) Length=51 Score = 55.8 bits (133), Expect = 2e-16, Method: Compositional matrix adjust. Identities = 23/27 (85%), Positives = 25/27 (93%), Gaps = 0/27 (0%) Query 20 QHLCGSHLVEALYLVCGERGFFYTPKA 46 QHLCGSHLV+ALYLVCG+RGFFY PK Sbjct 5 QHLCGSHLVDALYLVCGDRGFFYNPKG 31 >Sequence5 NP_571131.1 insulin preproprotein [Danio rerio] Length=108 Score = 53.9 bits (128), Expect = 3e-15, Method: Compositional matrix adjust. Identities = 25/32 (78%), Positives = 27/32 (84%), Gaps = 2/32 (6%) Query 20 QHLCGSHLVEALYLVCGERGFFYTPKARREVE 51 QHLCGSHLV+ALYLVCG GFFY PK R+VE Sbjct 27 QHLCGSHLVDALYLVCGPTGFFYNPK--RDVE 56 >Sequence9 XP_014388588.1 PREDICTED: insulin [Myotis brandtii] Length=183 Score = 48.9 bits (115), Expect = 1e-12, Method: Compositional matrix adjust. Identities = 26/51 (51%), Positives = 34/51 (67%), Gaps = 1/51 (2%) Query 9 GAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59 APTPA+AF +HLC L E L ++CG++G F PKA RE+ D Q G V+ Sbjct 17 WAPTPAQAFYFEHLCDEDLAEMLTIICGDQG-FRNPKATRELPDPQEGEVD 66 Score = 46.6 bits (109), Expect = 7e-12, Method: Compositional matrix adjust. 
Identities = 22/40 (55%), Positives = 28/40 (70%), Gaps = 1/40 (3%) Query 20 QHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59 Q LCG LV+ L +VCG+RG FY+P A RE+ D Q G V+ Sbjct 106 QRLCGEDLVDTLTMVCGDRG-FYSPTALRELPDPQEGEVD 144 >Sequence19 BAS32722.1 insulin, partial [Varanus exanthematicus] Length=88 Score = 38.5 bits (88), Expect = 2e-09, Method: Compositional matrix adjust. Identities = 22/49 (45%), Positives = 29/49 (59%), Gaps = 2/49 (4%) Query 3 LALLALGAPTPARAFA--NQHLCGSHLVEALYLVCGERGFFYTPKARRE 49 L LLA+ APT A + ++HLCGS LVEAL CG+ G + K + Sbjct 1 LVLLAVLAPTAIYATSENDEHLCGSALVEALVSACGKEGIYSFTKRNEQ 49 >Sequence17 QBX89050.1 insulin, partial [Nephrops norvegicus] Length=178 Score = 19.6 bits (39), Expect = 0.042, Method: Compositional matrix adjust. Identities = 9/25 (36%), Positives = 12/25 (48%), Gaps = 2/25 (8%) Query 20 QHLCGSHLVEALYLVCGERGFFYTP 44 + LCG L L VC +G + P Sbjct 25 RRLCGWRLANKLNRVC--KGVYNNP 47 Score = 13.1 bits (22), Expect = 9.6, Method: Compositional matrix adjust. Identities = 6/19 (32%), Positives = 7/19 (37%), Gaps = 0/19 (0%) Query 32 YLVCGERGFFYTPKARREV 50 YL +R TP E Sbjct 90 YLTFSQRASEDTPSEENEA 108 Lambda K H a alpha 0.324 0.139 0.423 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 57052 Database: DB.fasta Posted date: Dec 3, 2020 5:29 PM Number of letters in database: 2,178 Number of sequences in database: 20 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Neighboring words threshold: 11 Window for multiple hits: 40
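Each hit in the report shows a bit score followed by a raw score in parentheses, for example "91.3 bits (225)". The two are related by the standard BLAST formula S' = (Lambda * S - ln K) / ln 2. The sketch below applies it with the gapped Lambda (0.267) and K (0.0410) printed near the end of the report. That BLAST used exactly these parameters for every hit is an assumption, although the values agree for the hits checked here. The function name bit_score is hypothetical.

```shell
#!/usr/bin/env bash

# Convert a raw alignment score to a bit score: S' = (lambda * S - ln K) / ln 2.
# $1 = raw score, $2 = lambda, $3 = K (both taken from the end of the BLAST report).
bit_score() {
    awk -v s="$1" -v lambda="$2" -v k="$3" \
        'BEGIN { printf "%.1f\n", (lambda * s - log(k)) / log(2) }'
}
```

For example, bit_score 225 0.267 0.0410 prints 91.3 and bit_score 39 0.267 0.0410 prints 19.6, matching the values reported for Sequence16 and Sequence17.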
6. Storing Pairwise Alignment Result
The output of a pairwise alignment can be saved to the local disk using the command below. Here -out names the output file and -html produces HTML output.
./blastp -db DB -query sequence.fasta -outfmt 0 -out output.html -html
The available sequence alignment output formats (-outfmt) are:
0 = Pairwise,
1 = Query-anchored showing identities,
2 = Query-anchored no identities,
3 = Flat query-anchored showing identities,
4 = Flat query-anchored no identities,
5 = BLAST XML,
6 = Tabular,
7 = Tabular with comment lines,
8 = Seqalign (Text ASN.1),
9 = Seqalign (Binary ASN.1),
10 = Comma-separated values,
11 = BLAST archive (ASN.1),
12 = Seqalign (JSON),
13 = Multiple-file BLAST JSON,
14 = Multiple-file BLAST XML2,
15 = Single-file BLAST JSON,
16 = Single-file BLAST XML2,
17 = Sequence Alignment/Map (SAM), and
18 = Organism Report
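The tabular format (6) is convenient for further processing on the command line. The sketch below assumes the default 12 columns of -outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and keeps only the hits at or below an E-value cutoff. The function name filter_hits and the cutoff used in the usage line are illustrative.

```shell
#!/usr/bin/env bash

# Keep tabular (-outfmt 6) hits whose E-value (column 11) is <= $1 and
# print subject ID, percent identity, E-value, and bit score.
filter_hits() {
    awk -F '\t' -v max="$1" '
        # Adding 0 forces a numeric comparison, so values like 3e-41 compare correctly.
        $11 + 0 <= max + 0 { printf "%s\t%s\t%s\t%s\n", $2, $3, $11, $12 }
    '
}
```

Usage would look like ./blastp -db DB -query sequence.fasta -outfmt 6 | filter_hits 1e-10.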
Sequence Files used for Database Creation
The multi-sequence FASTA file DB.fasta is given below:
>Sequence1 NP_001191615.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>Sequence2 AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLE
AGGLQPSALEMILQKRGIVDQCCNNICTFNQLQNYCNVP
>Sequence3 KAB1251309.1 Insulin [Camelus dromedarius]
MALWTRLLALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVELGG
GPGAGGLQPLGPEGRPQKRGIVEQCCASVCSLYQLENYCN
>Sequence4 NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQASALSLSS
STSTWPEGLDATARAPPALVVTANIGQAGGSSSRQFRQRALGTSDSPVLFIHCPGAAGTAQGLEYRGRRV
TTELVWEEVDSSPQPQGSESLPAQPPAQPAPQPEPQQAREPSPEVSCCGLWPRRPQRSQN
>Sequence5 NP_571131.1 insulin preproprotein [Danio rerio]
MAVWLQAGALLVLLVVSSVSTNPGTPQHLCGSHLVDALYLVCGPTGFFYNPKRDVEPLLGFLPPKSAQET
EVADFAFKDHAELIRKRGIVEQCCHKPCSIFELQNYCN
>Sequence6 pir||INCD insulin - cod (Gadus sp.)
MAPPQHLCGSHLVDALYLVCGDRGFFYNPKGIVDQCCHRPCDIFDLQNYCN
>Sequence7 pir||INHY insulin - hamster
FVNQHLCGSHLVEALYLVCGERGFFYTPKSGIVDQCCTSICSLYQLENYCN
>Sequence8 AAB60625.1 insulin [Ovis aries]
MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAG
GPGAGGLEGPPQKRGIVEQCCAGVCSLYQLENYCN
>Sequence9 XP_014388588.1 PREDICTED: insulin [Myotis brandtii]
MALWTRLLPLLALLALWAPTPAQAFYFEHLCDEDLAEMLTIICGDQGFRNPKATRELPDPQEGEVDMGAG
GQKALTLEQLLQNSDIPARLLALWAPAPAPAQSGEQRLCGEDLVDTLTMVCGDRGFYSPTALRELPDPQE
GEVDMGAGGQKALTLEQLLQNSDIVDMCCNNFCSFYQLEYYCN
>Sequence10 AAA19033.1 insulin [Oryctolagus cuniculus]
MASLAALLPLLALLVLCRLDPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKSRREVEELQVGQAELGG
GPGAGGLQPSALELALQKRGIVEQCCTSICSLYQLENYCN
>Sequence11 pir||INMKSQ insulin - common squirrel monkey
FVNQHLCGPHLVEALYLVCGERGFFYAPKTGVVDQCCTSICSLYQLQNYCN
>Sequence12 pir||INOS insulin - ostrich
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>Sequence13 pir||INEL insulin - elephant
FVNQHLCGSHLVEALYLVCGERGFFYTPKTGIVEQCCTGVCSLYQLENYCN
>Sequence14 AAF80383.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>Sequence15 pir||INTK insulin - turkey (tentative sequence)
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>Sequence16 AAA59172.1 insulin [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>Sequence17 QBX89050.1 insulin, partial [Nephrops norvegicus]
VVVVVVGSSRASRRTYPTSEEEPRRRLCGWRLANKLNRVCKGVYNNPGSTGNYLFYRSRRDGESEPGLPP
EEYLDLLPDPEEERGLRHHYLTFSQRASEDTPSEENEAPGSFFGSLSPQDSPHQSAVQEDEASSVQFPFL
TEEEASQMVRVRPRSKRGLSAECCRKVCTVSELVGYCY
>Sequence18 ACQ91106.1 insulin, partial [Haliotis corrugata]
DLHVIISNLCSSLGGNRRFLAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRI
FELAQYCRLPDHFFSRISRTG
>Sequence19 BAS32722.1 insulin, partial [Varanus exanthematicus]
LVLLAVLAPTAIYATSENDEHLCGSALVEALVSACGKEGIYSFTKRNEQSLGHGLLDNEVPFHLGKRGIV
EDCCENICPWSVLQSYCR
>Sequence20 ELK28555.1 Insulin [Myotis davidii]
MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKDRRELPDPQGESSPLTP
RSHPKGTGYLASVDPSQSQDQHLCGDELVNALTITCGDRGFYNPMAPLEQDDLQEEEVEMDEGGLQALTL
EGLLQKRGIVEECCTNVCSLYQLERYCN
The multi-sequence FASTA file DBX.fasta is given below:
>NP_001191615.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLE
AGGLQPSALEMILQKRGIVDQCCNNICTFNQLQNYCNVP
>KAB1251309.1 Insulin [Camelus dromedarius]
MALWTRLLALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVELGG
GPGAGGLQPLGPEGRPQKRGIVEQCCASVCSLYQLENYCN
>NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQASALSLSS
STSTWPEGLDATARAPPALVVTANIGQAGGSSSRQFRQRALGTSDSPVLFIHCPGAAGTAQGLEYRGRRV
TTELVWEEVDSSPQPQGSESLPAQPPAQPAPQPEPQQAREPSPEVSCCGLWPRRPQRSQN
>NP_571131.1 insulin preproprotein [Danio rerio]
MAVWLQAGALLVLLVVSSVSTNPGTPQHLCGSHLVDALYLVCGPTGFFYNPKRDVEPLLGFLPPKSAQET
EVADFAFKDHAELIRKRGIVEQCCHKPCSIFELQNYCN
>pir||INCD insulin - cod (Gadus sp.)
MAPPQHLCGSHLVDALYLVCGDRGFFYNPKGIVDQCCHRPCDIFDLQNYCN
>pir||INHY insulin - hamster
FVNQHLCGSHLVEALYLVCGERGFFYTPKSGIVDQCCTSICSLYQLENYCN
>AAB60625.1 insulin [Ovis aries]
MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAG
GPGAGGLEGPPQKRGIVEQCCAGVCSLYQLENYCN
>XP_014388588.1 PREDICTED: insulin [Myotis brandtii]
MALWTRLLPLLALLALWAPTPAQAFYFEHLCDEDLAEMLTIICGDQGFRNPKATRELPDPQEGEVDMGAG
GQKALTLEQLLQNSDIPARLLALWAPAPAPAQSGEQRLCGEDLVDTLTMVCGDRGFYSPTALRELPDPQE
GEVDMGAGGQKALTLEQLLQNSDIVDMCCNNFCSFYQLEYYCN
>AAA19033.1 insulin [Oryctolagus cuniculus]
MASLAALLPLLALLVLCRLDPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKSRREVEELQVGQAELGG
GPGAGGLQPSALELALQKRGIVEQCCTSICSLYQLENYCN
>pir||INMKSQ insulin - common squirrel monkey
FVNQHLCGPHLVEALYLVCGERGFFYAPKTGVVDQCCTSICSLYQLQNYCN
>pir||INOS insulin - ostrich
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>pir||INEL insulin - elephant
FVNQHLCGSHLVEALYLVCGERGFFYTPKTGIVEQCCTGVCSLYQLENYCN
>AAF80383.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>pir||INTK insulin - turkey (tentative sequence)
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>AAA59172.1 insulin [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>QBX89050.1 insulin, partial [Nephrops norvegicus]
VVVVVVGSSRASRRTYPTSEEEPRRRLCGWRLANKLNRVCKGVYNNPGSTGNYLFYRSRRDGESEPGLPP
EEYLDLLPDPEEERGLRHHYLTFSQRASEDTPSEENEAPGSFFGSLSPQDSPHQSAVQEDEASSVQFPFL
TEEEASQMVRVRPRSKRGLSAECCRKVCTVSELVGYCY
>ACQ91106.1 insulin, partial [Haliotis corrugata]
DLHVIISNLCSSLGGNRRFLAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRI
FELAQYCRLPDHFFSRISRTG
>BAS32722.1 insulin, partial [Varanus exanthematicus]
LVLLAVLAPTAIYATSENDEHLCGSALVEALVSACGKEGIYSFTKRNEQSLGHGLLDNEVPFHLGKRGIV
EDCCENICPWSVLQSYCR
>ELK28555.1 Insulin [Myotis davidii]
MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKDRRELPDPQGESSPLTP
RSHPKGTGYLASVDPSQSQDQHLCGDELVNALTITCGDRGFYNPMAPLEQDDLQEEEVEMDEGGLQALTL
EGLLQKRGIVEECCTNVCSLYQLERYCN