The SeqBag Project
  Bioinformatics Research Lab.
  IBI Biosolutions Pvt. Ltd.

About SeqBag

Biology did not begin with today's genome projects, but those projects have greatly accelerated the accumulation of biological knowledge. Making sense of this enormous amount of data is a major challenge. The first step is to parse, simplify, classify and organize the sequence data, and this has largely been achieved: the results are the many databases now available at a single click. Capturing and reducing complexity is not enough, however; integrating data from diverse sources and analyzing it remains a large task. Sequence analysis therefore plays an important role in biological research today.

SeqBag is a comprehensive yet compact web-based software resource dedicated to sequence analysis. It offers a collection of modular tools that provide different ways to analyze nucleotide and protein sequences. SeqBag reads sequences in raw format. It has seven modules: DNA Properties, Protein Properties, Restriction Analysis, EST Analysis, Sequence Alignment, Six Reading Frame and Pattern Finder.

TASKS AND PROGRAMS
The following section describes the tasks and programs of SeqBag that are useful for sequence analysis:

Sequence Properties: Sequence properties are calculated with the DNA Properties and Protein Properties modules, which let the user retrieve the properties of nucleotide and protein sequences.

1) DNA Properties Module:
The DNA Properties module lets the user analyze a DNA sequence and retrieve its GC% and AT%, length, composition, reverse, complement, reverse complement and transcribed sequence. The module also calculates molecular weight under several assumptions about the sequence, e.g. anhydrous molecular weight assuming no 5' monophosphate, anhydrous molecular weight assuming a 5' monophosphate is present, molecular weight assuming a 5' triphosphate, and molecular weight of single-stranded DNA. It also calculates the basic melting temperature of the query DNA sequence.
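
The composition-based properties listed above can be sketched in a few lines of Python. This is only an illustration, not SeqBag's own code: the function name `dna_properties` is hypothetical, the transcript is assumed to be the input strand with T replaced by U, and the "basic" melting temperature is assumed to follow the Wallace rule (2 °C per A/T, 4 °C per G/C), which is usually applied to short oligonucleotides. The molecular weight variants are left out because the source does not state the formulas used.

```python
# Complement lookup for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dna_properties(seq):
    """Return basic properties of a raw DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty sequence of A, C, G, T")
    counts = {base: seq.count(base) for base in "ACGT"}
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    complement = seq.translate(COMPLEMENT)
    return {
        "length": len(seq),
        "composition": counts,
        "gc_percent": 100.0 * gc / len(seq),
        "at_percent": 100.0 * at / len(seq),
        "reverse": seq[::-1],
        "complement": complement,
        "reverse_complement": complement[::-1],
        # Assumption: the input is the coding strand, so T simply becomes U.
        "transcript": seq.replace("T", "U"),
        # Assumption: Wallace rule, Tm = 2*(A+T) + 4*(G+C) in degrees Celsius.
        "basic_tm_celsius": 2 * at + 4 * gc,
    }


props = dna_properties("ATGCGC")
print(props["reverse_complement"], props["basic_tm_celsius"])
```

For the input ATGCGC the sketch reports a reverse complement of GCGCAT and a Wallace-rule melting temperature of 20 °C.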

2) Protein Properties Module:
The Protein Properties module analyzes a protein sequence. The user can retrieve the composition, length, total molecular weight, exact weight, approximate protein volume, extinction coefficient (note 1) [3], optical density (note 2) and grand average of hydropathy (note 3) of the given protein sequence.
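
Two of these properties can be illustrated with a short Python sketch. It is not SeqBag's implementation: the function names are hypothetical, the grand average of hydropathy is assumed to use the Kyte-Doolittle scale, and the extinction coefficient is assumed to be the 280 nm estimate from tryptophan, tyrosine and cystine counts (5500, 1490 and 125 per M per cm). The source does not say which scale or wavelength SeqBag uses.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Count each residue type present in the sequence."""
    seq = seq.upper()
    return {aa: seq.count(aa) for aa in sorted(set(seq))}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def extinction_coefficient(seq, cystines=0):
    """Estimated molar extinction coefficient at 280 nm (per M per cm).

    Assumption: only Trp, Tyr and disulfide-bonded cystines absorb;
    the number of cystines must be supplied by the caller.
    """
    seq = seq.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


print(composition("WYY"), gravy("AI"), extinction_coefficient("WYY"))
```

A positive hydropathy average suggests a more hydrophobic sequence, a negative one a more hydrophilic sequence.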

3) Restriction Enzyme Analysis:
A restriction enzyme is an enzyme that cuts double-stranded DNA at specific sequences within it, known as restriction sites. The restriction enzyme analysis tool performs its analysis with 30 of the most commonly used restriction enzymes, listed below:

AaatIII
AccIII
Acc65I
AccB7I
AgeI
AluI
Alw44I
ApaI
BalI
BamHI
BbuI
BclI
BglII
BglII
BsaMI
BsrBRI
BsrSI
BssHII
EcoRI
BamHI
HindIII
TaqI
NotI
HinfI
Sau3A
PvuII
SmaI
HaeIII
AluI
EcoRV
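
Locating restriction sites amounts to searching the sequence for each enzyme's recognition sequence. The Python sketch below illustrates the idea for three enzymes from the list; the function name `find_sites` and the output layout are hypothetical, only the recognition site (not the cut position) is reported, and this is not a description of how SeqBag itself performs the search.

```python
import re

# Recognition sequences (5'->3') for three enzymes from the list above.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(seq, enzymes=RECOGNITION_SITES):
    """Return 1-based start positions of each recognition site in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        # A lookahead keeps overlapping occurrences from being skipped.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]
    return hits


print(find_sites("AAGAATTCTTGGATCCAA"))
```

Because all three sites are palindromic, scanning one strand is enough; a non-palindromic site would also need a scan of the reverse complement.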

4) EST Analysis:
Expressed Sequence Tags (ESTs) are short (usually about 300-500 bp), single-pass sequence reads from mRNA (cDNA). They are typically produced in large batches and represent a snapshot of the genes expressed in a given tissue and/or at a given developmental stage. They are tags (some coding, others not) of expression for a given cDNA library. The EST analysis tool lets the user concatenate any number of user-supplied ESTs.

5) Sequence Alignment:
Sequence alignment is a way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that residues with identical or similar characters are aligned in successive columns. The sequence alignment tool computes a pairwise alignment between two sequences and reports the matches, mismatches and similarity.
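
As an illustration of pairwise alignment and the statistics derived from it, the Python sketch below uses Needleman-Wunsch global alignment with a simple linear gap penalty. The source does not say which algorithm or scoring scheme SeqBag uses, so the algorithm choice, the scores (match +1, mismatch -1, gap -1), the function names and the definition of similarity as the percentage of identical columns are all assumptions.

```python
def _pair_score(x, y, match, mismatch):
    return match if x == y else mismatch


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),
                score[i - 1][j] + gap,   # gap in b
                score[i][j - 1] + gap,   # gap in a
            )
    # Trace back from the bottom-right corner, preferring the diagonal.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(
            a[i - 1], b[j - 1], match, mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_stats(al_a, al_b):
    """Count matches, mismatches and gaps over the aligned columns."""
    columns = list(zip(al_a, al_b))
    matches = sum(x == y for x, y in columns if "-" not in (x, y))
    mismatches = sum(x != y for x, y in columns if "-" not in (x, y))
    gaps = sum("-" in (x, y) for x, y in columns)
    similarity = 100.0 * matches / len(columns) if columns else 0.0
    return {"matches": matches, "mismatches": mismatches,
            "gaps": gaps, "similarity_percent": similarity}


aligned = global_align("ACGT", "AGT")
print(aligned, alignment_stats(*aligned))
```

Aligning ACGT with AGT under these scores places a single gap opposite the C, giving three matches over four columns.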

6) Six Reading Frame:
A reading frame is a contiguous, non-overlapping set of three-nucleotide codons in DNA or RNA. There are three possible reading frames in an mRNA strand and six in a double-stranded DNA molecule, because transcription is possible from either of the two strands. The Six Reading Frame tool finds the six possible reading frames of a nucleotide sequence.
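
The six frames can be enumerated by reading codons from offsets 0, 1 and 2 on the given strand and on its reverse complement. The Python sketch below shows this under stated assumptions: the function name `six_frames` and the frame labels (+1 to +3, -1 to -3) are hypothetical, trailing bases that do not fill a codon are dropped, and no translation to amino acids is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frames(seq):
    """Split a DNA sequence into codons for all six reading frames."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for sign, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            # Step through the strand three bases at a time from the offset,
            # keeping only complete codons.
            frames[f"{sign}{offset + 1}"] = [
                strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)
            ]
    return frames


for label, codons in six_frames("ATGAAATAG").items():
    print(label, " ".join(codons))
```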

7) Pattern Finder:
A protein structure is not fully deciphered until all the domains and motifs present in it are recognized. Specific sequence patterns are known to be responsible for the formation of particular domains or motifs. Pattern Finder enables the identification of such patterns and thereby aids comparative studies. It accepts a given pattern (which can be a domain or motif) and locates the sequence regions that match the pattern of interest.
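
Pattern searching of this kind can be sketched with regular expressions. The example below assumes the pattern is supplied as a Python regular expression; the source does not state which pattern syntax SeqBag accepts, and the function name `find_pattern` is hypothetical. The sample pattern is the regular-expression form of the N-glycosylation motif N-{P}-[ST]-{P}.

```python
import re


def find_pattern(seq, pattern):
    """Return (start, end, match) for every occurrence, 1-based inclusive."""
    # Wrapping the pattern in a lookahead reports overlapping matches too.
    regex = re.compile(f"(?=({pattern}))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(seq.upper())
    ]


# N, then any residue except P, then S or T, then any residue except P.
print(find_pattern("MKNASAANGTQ", "N[^P][ST][^P]"))
```

The same function works for nucleotide sequences, since it only depends on the pattern supplied.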


1. Extinction Coefficient: The extinction coefficient for a particular substance is a measure of how well it absorbs electromagnetic radiation.
2. Optical Density: Optical density is the absorbance of an optical element for a given wavelength.
3. Hydropathy: Hydropathy is the hydrophobic character of a sequence, which may be useful in predicting membrane-spanning domains, potential antigenic sites and regions that are likely to be exposed on the protein's surface.