In Silico Experience's And Science: March 2007

Focus On Multiple Sclerosis: Need Of The Hour

Multiple Sclerosis (abbreviated as MS, also known as disseminated sclerosis or encephalomyelitis disseminata) is a chronic, inflammatory and potentially debilitating disease of unknown etiology that affects the Central Nervous System (CNS), which is made up of the brain and spinal cord. Multiple sclerosis is widely believed to be an autoimmune disease, a condition in which the body, through its immune system, launches a defensive attack against its own tissues.

Multiple sclerosis affects neurons, the cells of the brain and spinal cord that carry information, create thought and perception, and allow the brain to control the body. Surrounding and protecting these neurons is the myelin sheath, a fatty layer that insulates the nerve fibers of the brain and spinal cord and helps them carry electrical signals. In multiple sclerosis, the body mistakenly directs antibodies and white blood cells against proteins in the myelin sheath. This results in inflammation and injury to the sheath and ultimately to the nerves that it surrounds. The myelin is broken down in patches throughout the central nervous system, and the damaged patches become scarred. This is where the name comes from: sclerosis means scars and multiple means many. Without the myelin coating, nerve messages cannot travel normally and can become garbled or lost, so the instructions sent by the nervous system to different parts of the body are disrupted, with subsequent axonal degeneration. Eventually, this damage can slow or block the nerve signals that control muscle coordination, strength, sensation and vision. The resulting symptoms vary widely depending upon which signals are interrupted.

Multiple sclerosis is unpredictable and varies in severity. In some people, multiple sclerosis is a mild illness, but it can lead to permanent disability in others. Treatments can modify the course of the disease and relieve symptoms.

Clinical Manifestations:
The onset of MS may be insidious or sudden. Common presenting symptoms include monocular visual impairment with pain (optic neuritis), paresthesias, weakness, and impaired coordination (Table 1). The most common clinical signs and symptoms at presentation include sensory disturbance of the limbs, partial or complete visual loss, acute and subacute motor dysfunction of the limbs, diplopia and gait dysfunction.

MS is frequently overlooked because initial symptoms resolve spontaneously in most patients. Relapses then occur within months or years. In some patients, however, MS has a primary progressive course from onset.

TABLE 1: Common Symptoms and Signs of Multiple Sclerosis

Symptoms:
Depression
Dizziness or vertigo
Fatigue
Heat sensitivity
Lhermitte's sign (electrical sensation down the spine on neck flexion)
Numbness, tingling, pain
Urinary bladder dysfunction
Visual impairment (monocular or diplopia)
Weakness

Signs:
Action tremor
Decreased perception of pain, vibration, or position
Decreased strength
Hyperreflexia, spasticity, Babinski's sign
Impaired coordination and balance
Impaired visual acuity or red color perception with optic disc pallor and afferent pupillary defect; disconjugate eye movements
Nystagmus

These signs and symptoms may occur in isolation or in combination, and have to be present for a minimum of 24 hours to be considered a "clinical attack." As any anatomical location of the CNS may be affected, the clinical presentation of individuals with Multiple Sclerosis is extremely variable.

The course may be relapsing-remitting or progressive, severe or mild, and may involve the entire neuroaxis in a widespread fashion or predominantly affect the spinal cord and optic nerves. Very little is known about the underlying cause of disease course variability in Multiple Sclerosis. Individuals can be stable for many months or years and then suddenly experience a devastating clinical attack. Currently, no biological markers can assist the clinician in predicting the clinical course or the accumulation of disability. Within families, the clinical course of Multiple Sclerosis among affected relatives can span the entire spectrum of possibilities; the clinical course does not run true to type in families. Clinical disease progression is assessed by recording the accumulation of neurological disability with valid methodological tools, including the expanded disability status scale (EDSS).

The Four Clinical Phenotypes Of Multiple Sclerosis:
The course of MS is difficult to predict, and the disease may at times either lie dormant or progress steadily. Several subtypes, or patterns of progression, have been described. Subtypes use the past course of the disease in an attempt to predict the future course. A person diagnosed with a particular subtype may, for unclear reasons, switch from one subtype to another over time. Subtypes are important not only for prognosis but also for therapeutic decisions. In 1996 the National Multiple Sclerosis Society standardized the following four subtype definitions:

Relapsing-Remitting Multiple Sclerosis (RR-MS):
Initially, more than 80% of individuals with MS experience a relapsing-remitting disease course with defined clinical exacerbations of neurological symptoms, followed by complete or incomplete remission. This subtype is characterized by unpredictable attacks (relapses) followed by periods of months to years of relative quiet (remission) with no new signs of disease activity. Deficits suffered during the attacks may either resolve or may be permanent. When deficits always resolve between attacks, this is referred to as "benign" MS.
Approximately ten years after disease onset, an estimated 50% of individuals with RR-MS convert to a progressive clinical course called secondary progressive (SP) MS, which is no longer characterized by clinical attacks and remissions, but by insidious progression of clinical symptoms.

Secondary Progressive Multiple Sclerosis (SP-MS):
Secondary progressive describes around 80% of those with initial relapsing-remitting MS, who then begin to have neurologic decline between their acute attacks without any definite periods of remission. This decline may include new neurologic symptoms, worsening cognitive function, or other deficits. Secondary progressive is the most common type of MS and causes the greatest amount of disability.

Primary-Progressive Multiple Sclerosis (PP-MS):
Primary progressive describes the approximately 10% of individuals who never have remission after their initial MS symptoms. Decline occurs continuously without clear attacks. The primary progressive subtype tends to affect people who are older at disease onset.

Progressive Relapsing Multiple Sclerosis (PR-MS):
Progressive relapsing describes those individuals who, from the onset of their MS, have a steady neurologic decline but also suffer superimposed attacks. This significantly rarer form initially presents as PP-MS; however, during the course of the disease, these individuals develop true neurological exacerbations. Individuals with SP-MS who have clinical exacerbations followed by incomplete remission are included in this category.

Knowledge Is Wealth

The past few decades have seen tremendous growth in the amount of biological data, specifically in the areas of genomics and proteomics. This growth is accompanied by an accelerating increase in the number of biological publications discussing the findings. In the last few years, there has been considerable interest within the scientific community in literature-mining tools that help sort through this abundance of literature and find the information most relevant and useful for a specific analysis.

Several advances in computational and biological methods have increased the scale of biomedical research. Complete genomes can now be sequenced within a short span of time (months). Computational methods hasten the identification of numerous genes within the sequenced data. Several automated tools have been developed for analyzing the properties of these genes and the proteins they code for. Large-scale experimental methods produce large quantities of data which, when processed, can provide information about gene expression patterns, e.g. which genes are expressed in various tissues, and which ones are over- or under-expressed at the onset of a disease or during a specific phase of cell development.

It is to be noted that “The ultimate goal of conducting large-scale biology is to translate these large amounts of information into knowledge of the complex biological processes governing the human body and to utilize this knowledge to advance healthcare and medicine”. Information pertaining to genes, proteins, and their role in biological processes is reported somewhere in the vast amount of published biomedical literature. As a result, advances in genome sequencing techniques are accompanied by a corresponding increase in the literature discussing the discovered genes.
It is therefore necessary to manage the tremendous amount of literature available and to extract meaningful information from it. This is where Text Mining (the knowledge process) plays an important role.

Text mining, alternately referred to as text data mining, generally refers to the process of deriving high-quality information from text. High-quality information is typically derived by discovering patterns and trends through means such as statistical pattern learning. Text mining involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, sentiment analysis, document summarization, and the extraction of relationships between entities. Automated literature mining offers an untapped opportunity to integrate many fragments of information gathered by researchers from multiple fields of expertise into a complete picture exposing the interrelated roles of various genes, proteins, and chemical reactions in cells and organisms.
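The three stages described above (structuring the text, deriving patterns, and evaluating the output) can be illustrated with a very small sketch. This is not a real text mining system: the stop-word list, the function names and the "appears in at least two documents" rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOPWORDS = {"the", "of", "in", "and", "a", "is", "to", "with", "are"}

def structure(text):
    """Stage 1: turn raw text into a list of lowercase word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def document_frequency(docs):
    """Stage 2: count in how many documents each term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(structure(doc)))
    return df

def frequent_terms(docs, min_docs=2):
    """Stage 3: keep terms seen in at least min_docs documents,
    most widespread first, ties broken alphabetically."""
    df = document_frequency(docs)
    kept = [t for t, n in df.items() if n >= min_docs]
    return sorted(kept, key=lambda t: (-df[t], t))

# Made-up example sentences, not taken from any real abstract.
docs = [
    "Myelin damage is seen in multiple sclerosis.",
    "Multiple sclerosis lesions show loss of myelin.",
    "Optic neuritis is a common presenting symptom.",
]
print(frequent_terms(docs))
```

On these three sentences the terms that survive are the ones shared by the first two documents, which is the kind of simple pattern a later, more careful evaluation step would start from.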

The last few years have seen a surge of interest in using the biomedical literature, ranging from relatively modest tasks such as finding reported gene locations on chromosomes to more ambitious attempts to construct putative gene networks based on gene-name co-occurrence within articles. Since the literature covers all aspects of biology, chemistry, and medicine, there is almost no limit to the types of information that may be recovered through careful and exhaustive mining. Some possible applications for such efforts include the reconstruction and prediction of pathways, establishing connections between genes and disease, finding the relationships between genes and specific biological functions, and much more. It is important to note that a single mining strategy is unlikely to address this wide spectrum of goals and needs. Regardless of the explicit goal, there are several major hurdles to overcome when using the biomedical literature for finding information. The most obvious is the sheer number of available articles, which is continuously growing. For instance, the most widely used biomedical literature database, NCBI’s PubMed, contains over 12,000,000 abstracts. A query for abstracts mentioning gene or protein returns about 3,000,000 articles, of which nearly two thirds were published within the past decade. Note that this prolific database by no means covers all the publications in all the areas related to biomedicine, but rather just those meeting certain criteria. Another major problem that arises when searching for the literature relevant to specific entities, such as a gene, a protein, or a disease, is the level of ambiguity seen in both the English language and biomedical jargon: we may miss relevant papers as well as retrieve irrelevant ones.
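To make the idea of a gene network built from gene-name co-occurrence concrete, here is a minimal sketch. It assumes the gene names are already known and appear as exact tokens in the text; the names GENEA, GENEB and GENEC and the example abstracts are placeholders invented for illustration, not real genes or real articles.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(abstracts, gene_names):
    """For each pair of gene names, count the abstracts that mention both.
    The counts can be read as edge weights of a putative gene network."""
    counts = Counter()
    for text in abstracts:
        words = set(re.findall(r"[A-Za-z0-9]+", text))
        present = sorted(g for g in gene_names if g in words)
        counts.update(combinations(present, 2))
    return counts

# Placeholder abstracts and gene names (hypothetical).
abstracts = [
    "GENEA and GENEB are co-expressed.",
    "GENEA regulates GENEB and GENEC.",
    "GENEC was studied alone.",
]
genes = {"GENEA", "GENEB", "GENEC"}
for pair, n in sorted(cooccurrence_counts(abstracts, genes).items()):
    print(pair, n)
```

Exact token matching is the weak point of this sketch: it is exactly where the ambiguity problem described above shows up, since a gene that is written under another name is missed and an unrelated word that happens to match is counted.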

Text mining and knowledge extraction are ways to help researchers cope with information overload. Text mining can be differentiated from information retrieval (IR) and text summarization (TS). Information retrieval and text summarization focus on larger units of text such as documents, while text mining operates at a finer level of granularity and examines the relationships between specific kinds of information contained both within and between documents. Text mining is also differentiated from Natural Language Processing (NLP) in that NLP attempts to understand the meaning of text as a whole, while text mining and knowledge extraction concentrate on solving a specific problem in a specific domain identified a priori (possibly using some NLP techniques in the process). For example, text mining can aid database curators by selecting the articles most likely to contain information of interest, or it can suggest potential new treatments for migraine by looking for pharmacological substances that are associated with biological processes that are in turn associated with migraine.
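The migraine example amounts to a two-step association: disease to biological process, then biological process to substance. A rough sketch of that chaining is shown below. The two lookup tables would in practice be mined from the literature; here they are hand-written placeholders (DISEASE_X, process_1, substance_A and so on are invented names), and ranking by the number of shared processes is an assumption of this example.

```python
def candidate_substances(disease, disease_to_processes, substance_to_processes):
    """Rank substances by how many biological processes they share
    with the disease. Returns (substance, shared_processes) pairs."""
    target = disease_to_processes.get(disease, set())
    scored = []
    for substance, processes in substance_to_processes.items():
        shared = target & processes
        if shared:
            scored.append((substance, sorted(shared)))
    # Most shared processes first, ties broken by substance name.
    scored.sort(key=lambda item: (-len(item[1]), item[0]))
    return scored

# Hypothetical associations, standing in for mined literature links.
disease_to_processes = {"DISEASE_X": {"process_1", "process_2", "process_3"}}
substance_to_processes = {
    "substance_A": {"process_1", "process_2"},
    "substance_B": {"process_3"},
    "substance_C": {"process_9"},
}
for name, shared in candidate_substances(
        "DISEASE_X", disease_to_processes, substance_to_processes):
    print(name, shared)
```

The output is only a list of hypotheses to be checked by a researcher; an indirect link found this way says nothing about whether the substance helps or harms.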

Past
Labour-intensive manual text-mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to advance swiftly during the past decade. Text mining is an interdisciplinary field which draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. As most information (over 80%) is currently stored as text, text mining is believed to have high potential commercial value.

Present Scenario
Present-day text mining tools aim to identify the named entities within a collection of text, for example all the drug names within a group of articles. The idea is that recognizing biological entities within a group of articles allows further extraction of relationships and other information by identifying the key concepts of interest, which can then be represented in a normalized form. This has, however, been challenging for several reasons.

First, there is no complete dictionary for most types of biological entities. Second, some phrases can refer to two different things depending on the context. Third, biological entities often have more than one name. On top of this, many biological entity names consist of several words, which complicates the task of defining name boundaries and of handling candidate names that overlap.
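A dictionary-based recognizer is one simple way to see these difficulties in practice. The sketch below assumes a small hand-made dictionary mapping lowercase names to a normalized form; it tries longer names first so that a multi-word name is preferred over a shorter name contained in it. The example sentence is made up, and a real dictionary would be far larger and still incomplete.

```python
import re

def find_entities(text, dictionary):
    """Return (start, end, matched_text, normalized_name) for each hit.
    dictionary maps lowercase names to a normalized form."""
    # Longest names first, so the regex alternation prefers them.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

# Hand-made dictionary: two names for one entity, plus a shorter
# name that overlaps the multi-word one.
dictionary = {
    "myelin basic protein": "MBP",
    "mbp": "MBP",
    "myelin": "myelin",
}
sentence = "Antibodies against myelin basic protein (MBP) were measured."
for hit in find_entities(sentence, dictionary):
    print(hit)
```

Both the full name and the abbreviation are mapped to the same normalized form, and the shorter entry "myelin" is not reported separately inside the longer match. Context-dependent meanings are not handled at all by this approach.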

Text Classification
Text classification attempts to automatically determine whether articles have the characteristics of the search performed. It extracts information relating to the search, applies it to candidate articles using a decision-making process, and returns results related to the query, thus helping to retrieve connected information pertaining to the query.
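One common decision-making process for this kind of task is a naive Bayes classifier over word counts. The text does not say which method any particular tool uses, so the following is only an illustrative sketch: the labels "relevant" and "irrelevant", the training snippets and the function names are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_docs):
    """labelled_docs is a list of (text, label) pairs.
    Returns per-label word counts and per-label document counts."""
    word_counts, doc_counts = {}, Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def classify(text, model):
    """Pick the label with the highest log-probability, using
    add-one smoothing. Words never seen in training are ignored."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(word_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training snippets.
model = train([
    ("myelin lesion relapse", "relevant"),
    ("relapse remission lesion mri", "relevant"),
    ("stock market prices", "irrelevant"),
    ("football match results", "irrelevant"),
])
print(classify("new lesion seen on mri", model))
```

With so few training examples the result is only as good as the word overlap; a usable classifier would need many labelled articles and a proper evaluation.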

Synonym And Abbreviation
Biological terminology has grown tremendously along with the biological literature. Complicating this, many biological entities have multiple names and abbreviations. Including these synonyms and abbreviations in a search improves the effectiveness of a text mining tool, because articles that use a different name for the same entity are no longer missed. This is an area that has been developed recently, and improvements are still being made. One such improvement is to collect these synonyms and abbreviations in order to help the user perform literature searches.
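A small sketch of how a collected synonym list could be used to widen a literature search follows. The synonym group uses the names for multiple sclerosis given earlier in this post; the function name and the quoted, OR-joined query format are assumptions, and whether a given search engine accepts that syntax has to be checked against its documentation.

```python
def expand_query(term, synonym_groups):
    """Return a query string covering every known name for term.
    Falls back to the term itself if no synonym group contains it."""
    wanted = term.lower()
    for group in synonym_groups:
        if wanted in (name.lower() for name in group):
            return " OR ".join(f'"{name}"' for name in group)
    return f'"{term}"'

synonym_groups = [
    ["multiple sclerosis", "MS", "disseminated sclerosis",
     "encephalomyelitis disseminata"],
]
print(expand_query("MS", synonym_groups))
```

Short abbreviations such as MS are themselves ambiguous, so an expanded query retrieves more relevant articles but can also pull in unrelated ones.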

Relationship Extraction
Relationship extraction detects a specific relationship between two or more named entities. Although the entities themselves are specific, the relationship established between them may be either very specific or general. The relationship is extracted from the text itself, and the approach depends on the types of entities involved. This helps to uncover previously unrecognized relationships between entities.
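A very simple form of relationship extraction looks for a trigger verb with a known entity on each side inside one sentence. The sketch below does only that. The trigger list, the entity names PROTA, PROTB and PROTC, and the example text are all invented for illustration, and the approach ignores negation and sentence structure.

```python
import re

# Hypothetical list of verbs taken to signal a relationship.
TRIGGERS = {"inhibits", "activates", "binds", "regulates"}

def extract_relations(text, entities):
    """Return (entity, trigger, entity) triples found within sentences."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
        for i, tok in enumerate(tokens):
            if tok.lower() not in TRIGGERS:
                continue
            left = [t for t in tokens[:i] if t in entities]
            right = [t for t in tokens[i + 1:] if t in entities]
            if left and right:
                # Nearest entity before and after the trigger verb.
                relations.append((left[-1], tok.lower(), right[0]))
    return relations

text = ("PROTA binds PROTB in vitro. PROTC was not studied. "
        "PROTB strongly activates PROTC.")
print(extract_relations(text, {"PROTA", "PROTB", "PROTC"}))
```

Here the relationship found is specific (binds, activates); dropping the trigger requirement and keeping any two entities that share a sentence would give the more general kind of relationship mentioned above.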

Future Challenges
From all of the foregoing, it is clear that biomedical text mining has great potential. It is unfortunate that this potential is as yet unrealized. Text-mining tools are not part of the standard arsenal of the biomedical researcher in the way that search engines and sequence alignment tools are. The major challenge for the next 5–10 years of text-mining work is the creation of tools that provide a clear benefit to these researchers, allowing them to be more productive despite the increasing challenges of information growth. The focus must be more on helping biomedical researchers solve the real-world problems that are inhibiting the pace of research, and less on evaluations of system output that are independent of user needs. Advances on several fronts are necessary for this to become a reality.

Tuesday, March 20, 2007

Multiple Sclerosis: An Overview

Friday, March 9, 2007

The Knowledge Process