Saturday, April 28, 2007

PRESENT TEXT MINING FOCUS ON ENTITIES

Named Entity Recognizer:

Present-day text mining tools aim to identify named entities within a collection of text. The goal is to identify all of the instances of a name for a specific type of thing: for example, the entire set of drug names within a collection of journal articles, or all of the gene names and symbols within a collection of abstracts.

The idea is that recognizing biological entities within a group of articles identifies the key concepts of interest and allows them to be represented in a normalized form, which in turn allows further extraction of relationships and other information. This has, however, been challenging for several reasons.

First, there is no complete dictionary for most types of biological entities, so simple text matching algorithms do not suffice. Second, some phrases can refer to two different things depending on the context. Third, many biological entities have more than one name. Finally, many biological entity names consist of several words, which complicates the process of defining name boundaries and of resolving overlapping candidate names.
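To make the dictionary problem concrete, here is a minimal sketch of simple dictionary matching. The tiny drug dictionary and the sample sentence are made up for illustration; a real system would use a far larger lexicon and more careful tokenization.

```python
import re

# Hypothetical mini-dictionary of drug names (illustrative only).
DRUG_DICTIONARY = {"aspirin", "ibuprofen", "interferon beta"}

def find_dictionary_matches(text, dictionary):
    """Return (start, end, matched_text) for every dictionary term found in text."""
    matches = []
    for term in dictionary:
        # Whole-word, case-insensitive match of the literal dictionary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), m.group(0)))
    return sorted(matches)

sample = ("Patients received Aspirin or interferon beta; "
          "acetylsalicylic acid was not recorded.")
found = [m[2] for m in find_dictionary_matches(sample, DRUG_DICTIONARY)]
print(found)
```

The matcher finds the two names that are in the dictionary, including the multi-word one, but it misses "acetylsalicylic acid" because that synonym was never listed. This is the incompleteness problem described above.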


Text Classification

Text classification attempts to automatically determine whether a document or part of a document has particular characteristics of interest, usually based on whether the document discusses a given topic or contains a certain type of information. Typically the information of interest is not specified explicitly by the users; instead, they provide a set of documents that have been found to contain the characteristics of interest (the positive training set), and another set that does not (the negative training set). Text classification systems must automatically extract the features that help distinguish positives from negatives and apply those features to candidate documents using some kind of decision-making process. Accurate text classification systems can be especially valuable to database curators, who may have to review many documents to find a few that contain the kind of information they are collecting in their database. Because more biomedical information is being created in text form than ever before, and because there are more ongoing database curation efforts to organise this information into coded databases than before, there is a strong need to find useful ways to apply text classification methods to biomedical text.
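As a rough illustration of learning from a positive and a negative training set, the sketch below uses a small naive Bayes classifier over word counts. Naive Bayes is only one possible decision-making process, chosen here for brevity; the training sentences are invented and far too few for real use.

```python
import math
from collections import Counter

def tokenize(text):
    # Very crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(positive_docs, negative_docs):
    """Count word frequencies separately for the two training sets."""
    counts = {"pos": Counter(), "neg": Counter()}
    for doc in positive_docs:
        counts["pos"].update(tokenize(doc))
    for doc in negative_docs:
        counts["neg"].update(tokenize(doc))
    vocab = set(counts["pos"]) | set(counts["neg"])
    priors = {"pos": len(positive_docs), "neg": len(negative_docs)}
    return counts, vocab, priors

def classify(doc, model):
    """Return 'pos' or 'neg', whichever has the higher log-probability score."""
    counts, vocab, priors = model
    total_docs = priors["pos"] + priors["neg"]
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(counts[label].values())
        score = math.log(priors[label] / total_docs)
        for word in tokenize(doc):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training data: documents a curator would keep vs. discard.
positive = ["gene expression in mouse liver",
            "mutation of the gene alters protein expression"]
negative = ["hospital staffing and budget report",
            "survey of patient satisfaction with hospital care"]
model = train(positive, negative)
print(classify("protein expression of a novel gene", model))
print(classify("hospital budget survey", model))
```

The features here are just individual words; the user never states what makes a document interesting, only which example documents are.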

Synonym And Abbreviation

The growth of the biological literature has been accompanied by tremendous growth in biological terminology. Complicating this, many biological entities have multiple names and abbreviations. Including these synonyms and abbreviations in a search makes a text mining tool more effective. This area has been developed only recently, and improvements are still being made. One approach is to collect synonyms and abbreviations to aid users in performing literature searches.
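One simple way to use a collected synonym list is to expand the user's query before searching. The sketch below assumes a hand-built synonym table and a search engine that accepts quoted phrases joined by OR; both are assumptions made for illustration.

```python
# Hypothetical hand-built synonym table (illustrative, not exhaustive).
SYNONYMS = {
    "multiple sclerosis": ["MS", "disseminated sclerosis",
                           "encephalomyelitis disseminata"],
}

def expand_query(term, synonyms):
    """Build an OR query from a term and any synonyms listed for it."""
    variants = [term] + synonyms.get(term.lower(), [])
    return " OR ".join('"' + v + '"' for v in variants)

query = expand_query("multiple sclerosis", SYNONYMS)
print(query)
```

A term with no entry in the table is passed through unchanged, so the expansion never narrows a search. Short abbreviations such as "MS" are ambiguous, however, and may pull in unrelated articles.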

Relationship Extraction

Relationship extraction detects a specific relationship between two or more named entities. Although the entities themselves are specific, the relationship established between them may be either very specific or general. The relationship is extracted from the text according to the types of the entities involved. This helps to uncover previously unrecognized relationships between entities.
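A very simple form of relationship extraction looks for an "entity, verb, entity" pattern within a sentence. The sketch below assumes the entities have already been recognized and uses a short, hand-picked verb list; the entity list, verb list and sentence are all illustrative.

```python
import re

# Illustrative entity and verb lists; a real system would take the
# entities from a named entity recognizer.
ENTITIES = ["phenothiazines", "dopamine", "serotonin"]
RELATION_VERBS = ["blocks", "block", "inhibits", "activates"]

def extract_relations(sentence, entities, verbs):
    """Return (entity, verb, entity) triples matching 'X verb Y'."""
    entity_alt = "|".join(re.escape(e) for e in entities)
    verb_alt = "|".join(re.escape(v) for v in verbs)
    pattern = re.compile(
        rf"\b({entity_alt})\b\s+({verb_alt})\b\s+({entity_alt})\b",
        re.IGNORECASE,
    )
    return [(m.group(1), m.group(2), m.group(3))
            for m in pattern.finditer(sentence)]

relations = extract_relations("Phenothiazines block dopamine function.",
                              ENTITIES, RELATION_VERBS)
print(relations)
```

Patterns like this are precise but brittle: a relationship expressed in the passive voice, or with words between the entities and the verb, would be missed.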

Natural Language Processing (NLP) For Text Mining

The field of Natural Language Processing is concerned with the analysis of free textual information and has been applied recently in the context of molecular biology. Text-mining approaches involve analyzing and extracting information from large collections of free textual data by using automatic or semiautomatic systems. Currently, text-mining applications are being employed in the identification of biological entities such as protein or gene names, automated protein annotation, analysis of microarrays and extraction of protein–protein interactions. In general, text-mining applications take advantage of a range of domain-independent methods such as part-of-speech (POS) taggers, which label each word with its corresponding part of speech (e.g. noun, verb or adjective), or stemmers, which are algorithms that return the morphological root of a word form. Domain-specific tools and resources such as protein taggers and ontologies are also employed.
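To show what a stemmer does, here is a deliberately crude suffix-stripping sketch. It is not an established algorithm such as Porter's; the suffix list and the minimum stem length are arbitrary choices made for illustration.

```python
# Suffixes are checked in this order, so longer ones come first.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")

def crude_stem(word, min_stem_length=3):
    """Strip the first matching suffix, keeping at least min_stem_length letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word

stems = [crude_stem(w) for w in ["binding", "binds", "expressed"]]
print(stems)
```

Mapping "binding" and "binds" to the same root lets a search or classifier treat them as one term. A stemmer this simple will also produce roots that are not real words, which is usually acceptable as long as related forms collapse together.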

Tuesday, April 3, 2007

Schizophrenia

Schizophrenia is often described in terms of "positive" and "negative" symptoms. Positive symptoms include delusions, auditory hallucinations and thought disorder and are typically regarded as manifestations of psychosis. Negative symptoms are so named because they are considered to be the loss or absence of normal traits or abilities, and include features such as flat, blunted or constricted affect and emotion, poverty of speech and lack of motivation. Additionally, a 'disorganization syndrome' and neurocognitive deficits may be present. These may take the form of reduced or impaired psychological functions such as memory, attention, problem-solving, executive function or social cognition.

Onset of schizophrenia typically occurs in late adolescence or early adulthood, with males tending to show symptoms earlier than females.

In 1893 psychiatrist Emil Kraepelin was the first to draw a distinction between what he termed dementia praecox ("premature dementia") and other psychotic illnesses. In 1908, "dementia praecox" was renamed "schizophrenia" by psychiatrist Eugen Bleuler, who discovered that the disorder is not a form of dementia.

The diagnostic category of schizophrenia has been widely criticised as lacking in scientific validity or reliability, consistent with evidence of poor levels of consistency in diagnostic practices and the use of criteria. One alternative suggests that the problems and issues making up the diagnosis of schizophrenia would be better addressed as individual dimensions along which everyone varies, such that there is a spectrum or continuum rather than a cut-off between normal and ill. This approach appears consistent with research on schizotypy and of a relatively high prevalence of psychotic experiences and delusional beliefs amongst the general public.

Although no common cause of schizophrenia has been identified in all individuals diagnosed with the condition, currently most researchers and clinicians believe it results from a combination of both brain vulnerabilities (either inherited or acquired) and stressful life-events. This widely-adopted approach is known as the 'stress-vulnerability' model, and much scientific debate now focuses on how much each of these factors contributes to the development and maintenance of schizophrenia.

It is also thought that processes in early neurodevelopment are important, particularly prenatal processes. In adult life, particular importance has been placed upon the function (or malfunction) of dopamine in the mesolimbic pathway in the brain. This theory, known as the dopamine hypothesis of schizophrenia, largely resulted from the accidental finding that a drug group which blocks dopamine function, known as the phenothiazines, reduced psychotic symptoms. However, this theory is now thought to be overly simplistic as a complete explanation. These drugs have now been developed further, and antipsychotic medication is commonly used as a first-line treatment. Although effective in many cases, these medications are not well tolerated by some patients due to significant side-effects. The positive symptoms are more responsive to medications; the negative symptoms are less so.

Differences in brain structure have been found between people with schizophrenia and those without. However, these tend only to be reliable on the group level and, due to the significant variability between individuals, may not be reliably present in any particular individual. Significant brain atrophy and enlarged ventricles are the most conspicuous of such differences.

Causes

The causes of schizophrenia are not known. However, an interplay of genetic, biological, environmental, and psychological factors is thought to be involved. We do not yet understand all the causes and other issues involved, but current research is making steady progress towards elucidating and defining the causes of schizophrenia.

In biological models of schizophrenia, genetic (familial) predisposition, infectious agents, allergies, and disturbances in metabolism have all been investigated.

Schizophrenia is known to run in families. For example, the risk of illness in an identical twin of a person with schizophrenia is 40-50%. A child of a parent suffering from schizophrenia has a 10% chance of developing the illness. By comparison, the risk of schizophrenia in the general population is about 1%.

The current concept is that multiple genes are involved in the development of schizophrenia and that factors such as prenatal (intrauterine), perinatal, and nonspecific stressors are involved in creating a disposition or vulnerability to develop the illness. Neurotransmitters (chemicals allowing the communication between nerve cells) have also been implicated in the development of schizophrenia. The list of neurotransmitters under scrutiny is long, but special attention has been given to dopamine, serotonin, and glutamate.

Also, recent studies have identified subtle changes in brain structure and function, indicating that, at least in part, schizophrenia could be a disorder of the development of the brain.

It is important for doctors to investigate all reasonable medical causes for any acute change in someone’s mental health or behavior. Sometimes a medical condition that might be treated easily, if diagnosed, is responsible for symptoms that resemble those of schizophrenia.


Symptoms

Usually with schizophrenia, the person's inner world and behavior change notably. Behavior changes might include the following:

  • Social withdrawal

  • Depersonalization (intense anxiety and a feeling of being unreal)

  • Loss of appetite

  • Loss of hygiene

  • Delusions

  • Hallucinations (eg, hearing things not actually present)

  • The sense of being controlled by outside forces

A person with schizophrenia may not have any outward appearance of being ill. In other cases, the illness may be more apparent, causing bizarre behaviors. For example, a person with schizophrenia may wear aluminum foil in the belief that it will stop his or her thoughts from being broadcast and protect against malicious waves entering the brain.

People with schizophrenia vary widely in their behavior as they struggle with an illness beyond their control. In active stages, those affected may ramble in illogical sentences or react with uncontrolled anger or violence to a perceived threat. People with schizophrenia may also experience relatively passive phases of the illness in which they seem to lack personality, movement, and emotion (also called a flat affect). People with schizophrenia may alternate between these extremes. Their behavior may or may not be predictable.

In order to better understand schizophrenia, the concept of clusters of symptoms is often used. Thus, people with schizophrenia can experience symptoms that may be grouped under the following categories:

  • Positive symptoms - Hearing voices, suspiciousness, feeling under constant surveillance, delusions, or making up words without a meaning (neologisms).

  • Negative (or deficit) symptoms - Social withdrawal, difficulty in expressing emotions (in extreme cases called blunted affect), difficulty in taking care of themselves, inability to feel pleasure (These symptoms cause severe impairment and are often mistaken for laziness.)

  • Cognitive symptoms - Difficulties in attending to and processing information, in understanding the environment, and in remembering simple tasks

  • Affective (or mood) symptoms - Most notably depression, accounting for a very high rate of attempted suicide in people suffering from schizophrenia

Helpful definitions in understanding schizophrenia include the following:

  • Psychosis: Psychosis is defined as being out of touch with reality. During this phase, one can experience delusions or prominent hallucinations. People with psychoses are not aware that what they are experiencing or some of the things that they believe are not real. Psychosis is a prominent feature of schizophrenia but is not unique to this illness.

  • Schizoid: This term is often used to describe a personality disorder characterized by almost complete lack of interest in social relationships and a restricted range of expression of emotions in interpersonal settings, making a person with this disorder appear cold and aloof.

  • Schizotypal: This term defines a more severe personality disorder characterized by acute discomfort with close relationships as well as disturbances of perception and bizarre behaviors, making people with this disorder seem odd and eccentric because of unusual mannerisms.

  • Hallucinations: A person with schizophrenia may have strong sensations of objects or events that are real only to him or her. These may be in the form of things that they believe strongly that they see, hear, smell, taste, or touch. Hallucinations have no outside source, and are sometimes described as "the person's mind playing tricks" on him or her.

  • Illusion: An illusion is a mistaken perception for which there is an actual external stimulus. For example, a visual illusion might be seeing a shadow and misinterpreting it as a person. The words "illusion" and "hallucination" are sometimes confused with each other.

  • Delusion: A person with a delusion has a strong belief about something despite evidence that the belief is false. For instance, a person may listen to a radio and believe the radio is giving a coded message about an impending extraterrestrial invasion. All of the other people who listen to the same radio program would hear, for example, a feature story about road repair work taking place in the area.

Types of schizophrenia are as follows:

  • Paranoid-type schizophrenia is characterized by delusions and auditory hallucinations but relatively normal intellectual functioning and expression of affect. The delusions can often be about being persecuted unfairly or being some other person who is famous. People with paranoid-type schizophrenia can exhibit anger, aloofness, anxiety, and argumentativeness.

  • Disorganized-type schizophrenia is characterized by speech and behavior that are disorganized or difficult to understand, and by flattened or inappropriate emotions. People with disorganized-type schizophrenia may laugh at the changing color of a traffic light or at something not closely related to what they are saying or doing. Their disorganized behavior may disrupt normal activities, such as showering, dressing, and preparing meals.

  • Catatonic-type schizophrenia is characterized by disturbances of movement. People with catatonic-type schizophrenia may keep themselves completely immobile or move all over the place. They may not say anything for hours, or they may senselessly repeat anything you say or do. Either way, the behavior puts these people at high risk because it impairs their ability to take care of themselves.

  • Undifferentiated-type schizophrenia is characterized by some symptoms seen in all of the above types but not enough of any one of them to define it as another particular type of schizophrenia.

  • Residual-type schizophrenia is characterized by a past history of at least one episode of schizophrenia, but the person currently has no positive symptoms (delusions, hallucinations, disorganized speech or behavior). It may represent a transition between a full-blown episode and complete remission, or it may continue for years without any further psychotic episodes.

Tuesday, March 20, 2007

Multiple Sclerosis: An Overview

Focus On Multiple Sclerosis: Need Of The Hour




Multiple Sclerosis (abbreviated as MS, also known as disseminated sclerosis or encephalomyelitis disseminata) is a chronic, inflammatory and potentially debilitating disease of unknown etiology that affects the Central Nervous System (CNS), which is made up of the brain and spinal cord. Multiple sclerosis is widely believed to be an autoimmune disease, a condition in which the body, through its immune system, launches a defensive attack against its own tissues.




Multiple sclerosis affects neurons, the cells of the brain and spinal cord that carry information, create thought and perception, and allow the brain to control the body. Surrounding and protecting these neurons is a fatty layer known as the myelin sheath, which insulates the nerve fibers and helps them carry electrical signals. In multiple sclerosis, the body mistakenly directs antibodies and white blood cells against proteins in the myelin sheath. This results in inflammation and injury to the sheath and ultimately to the nerves that it surrounds. The myelin is broken down in patches throughout the central nervous system, and the damaged patches become scarred (this is where the name comes from: "sclerosis" means scars and "multiple" means many). Without the myelin coating, nerve messages cannot travel normally and can become garbled or lost, so that the instructions sent by the nervous system to different parts of the body are disrupted, and axonal degeneration can follow. Eventually, this damage can slow or block the nerve signals that control muscle coordination, strength, sensation and vision. The symptoms caused by this scarring vary widely depending upon which signals are interrupted.

Multiple sclerosis is unpredictable and varies in severity. In some people, multiple sclerosis is a mild illness, but it can lead to permanent disability in others. Treatments can modify the course of the disease and relieve symptoms.


Clinical Manifestations:
The onset of MS may be insidious or sudden. Common presenting symptoms include monocular visual impairment with pain (optic neuritis), paresthesias, weakness, and impaired coordination (Table 1). The most common clinical signs and symptoms at presentation include sensory disturbance of the limbs, partial or complete visual loss, acute and subacute motor dysfunction of the limbs, diplopia and gait dysfunction.

MS frequently is overlooked because initial symptoms resolve spontaneously in most patients. Relapses occur within months or years. In some patients, however, MS has a primary progressive course from onset.


TABLE 1: Common Symptoms and Signs of Multiple Sclerosis

Symptoms:

  • Depression
  • Dizziness or vertigo
  • Fatigue
  • Heat sensitivity
  • Lhermitte's sign (electrical sensation down the spine on neck flexion)
  • Numbness, tingling, pain
  • Urinary bladder dysfunction
  • Visual impairment (monocular or diplopia)
  • Weakness

Signs:

  • Action tremor
  • Decreased perception of pain, vibration, or position
  • Decreased strength
  • Hyperreflexia, spasticity, Babinski's sign
  • Impaired coordination and balance
  • Impaired visual acuity or red color perception with optic disc pallor and afferent pupillary defect; disconjugate eye movements
  • Nystagmus


These signs and symptoms may occur in isolation or in combination, and have to be present for a minimum of 24 hours to be considered a "clinical attack." As any anatomical location of the CNS may be affected, the clinical presentation of individuals with Multiple Sclerosis is extremely variable.

The course may be relapsing-remitting or progressive, severe or mild, and may involve the entire neuroaxis in a widespread fashion or predominantly affect the spinal cord and optic nerves. Very little is known about the underlying cause of disease course variability in Multiple Sclerosis. Individuals can be stable for many months or years and then suddenly experience a devastating clinical attack. Currently, no biological markers can assist the clinician in predicting the clinical course and/or the accumulation of disability. Within families, the clinical course of Multiple Sclerosis among affected relatives can span the entire spectrum of possibilities; the clinical course does not run true to type in families. Clinical disease progression is assessed by recording the accumulation of neurological disability with valid methodological tools, including the expanded disability status scale (EDSS).

The Four Clinical Phenotypes Of Multiple Sclerosis:
The course of MS is difficult to predict, and the disease may at times either lie dormant or progress steadily. Several subtypes, or patterns of progression, have been described. Subtypes use the past course of the disease in an attempt to predict the future course. A person diagnosed with a particular subtype may, for unclear reasons, switch from one subtype to another over time. Subtypes are important not only for prognosis but also for therapeutic decisions. In 1996 the National Multiple Sclerosis Society standardized the following four subtype definitions.

Relapsing-Remitting Multiple Sclerosis (RR-MS):
Initially, more than 80% of individuals with MS experience a relapsing-remitting disease course with defined clinical exacerbations of neurological symptoms, followed by complete or incomplete remission. This subtype is characterized by unpredictable attacks (relapses) followed by periods of months to years of relative quiet (remission) with no new signs of disease activity. Deficits suffered during the attacks may either resolve or may be permanent. When deficits always resolve between attacks, this is referred to as "benign" MS.
Approximately ten years after disease onset, an estimated 50% of individuals with RR-MS convert to a progressive clinical course called secondary progressive (SP) MS, which is no longer characterized by clinical attacks and remissions, but by insidious progression of clinical symptoms.

Secondary Progressive Multiple Sclerosis:
Secondary progressive describes around 80% of those with initial relapsing-remitting MS, who then begin to have neurologic decline between their acute attacks without any definite periods of remission. This decline may include new neurologic symptoms, worsening cognitive function, or other deficits. Secondary progressive is the most common type of MS and causes the greatest amount of disability.

Primary-Progressive Multiple Sclerosis (PP-MS):
Primary progressive describes the approximately 10% of individuals who never have remission after their initial MS symptoms. Decline occurs continuously without clear attacks. The primary progressive subtype tends to affect people who are older at disease onset.

Progressive Relapsing Multiple Sclerosis (PR-MS):
Progressive relapsing MS is a significantly rarer form. It describes those individuals who, from the onset of their MS, have a steady neurologic decline but also suffer superimposed attacks: the disease initially presents as PP-MS, but during its course these individuals develop true neurological exacerbations. Individuals with SP-MS who have clinical exacerbations followed by incomplete remission are included in this category.

Friday, March 9, 2007

The Knowledge Process

Knowledge Is Wealth


The past few decades have seen tremendous growth in the amount of biological data, specifically in the areas of genomics and proteomics. This growth is accompanied by an accelerated increase in the number of biological publications discussing the findings. In the last few years, there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature and find the information most relevant and useful for specific analyses.

Several advances in computational and biological methods have increased the scale of biomedical research. Complete genomes can now be sequenced within a short span of time (months). Computational methods hasten the identification of numerous genes within the sequenced data. Several automated tools have been developed for analyzing the properties of these genes and the proteins they code for. Large-scale experimental methods produce large quantities of data which, when processed, can provide information about gene expression patterns, e.g. which genes are expressed in various tissues, and which ones are over- or under-expressed at the onset of a disease or during a specific phase of cell development.

It is to be noted that “The ultimate goal of conducting large-scale biology is to translate these large amounts of information into knowledge of the complex biological processes governing the human body and to utilize this knowledge to advance healthcare and medicine”. Information pertaining to genes, proteins, and their role in biological processes is reported somewhere in the vast amount of published biomedical literature. As a result, advances in genome sequencing techniques are accompanied by a corresponding increase in the literature discussing the discovered genes.
Therefore it is necessary to manage the tremendous amount of literature available and to extract meaningful information from it. This is where Text Mining (knowledge process) plays an important role.

Text mining, also referred to as text data mining, generally refers to the process of deriving high-quality information from text. High-quality information is typically derived by identifying patterns and trends through means such as statistical pattern learning. Text mining involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, sentiment analysis, document summarization, and extraction of relationships between entities. Automated literature mining offers an untapped opportunity to integrate many fragments of information gathered by researchers from multiple fields of expertise into a complete picture exposing the interrelated roles of various genes, proteins, and chemical reactions in cells and organisms.

The last few years have seen a surge of interest in using the biomedical literature, ranging from relatively modest tasks such as finding reported gene locations on chromosomes to more ambitious attempts to construct putative gene networks based on gene-name co-occurrence within articles. Since the literature covers all aspects of biology, chemistry, and medicine, there is almost no limit to the types of information that may be recovered through careful and exhaustive mining. Some possible applications for such efforts include the reconstruction and prediction of pathways, establishing connections between genes and disease, finding the relationships between genes and specific biological functions, and much more. It is important to note that a single mining strategy is unlikely to address this wide spectrum of goals and needs. Regardless of the explicit goal, there are several major hurdles to overcome when using the biomedical literature for finding information. The most obvious is the sheer number of available articles, which is continuously growing. For instance, the most widely used biomedical literature database, NCBI’s PubMed, contains over 12,000,000 abstracts. A query for abstracts mentioning gene or protein returns about 3,000,000 articles, of which nearly two thirds were published just within the past decade. Note that this prolific database by no means covers all the publications in all the areas related to biomedicine, but rather just those meeting certain criteria. Another major problem that arises when searching for the literature relevant to specific entities, such as a gene, a protein, or a disease, is the level of ambiguity seen in both the English language and the biomedical jargon, so that we may miss relevant papers as well as retrieve irrelevant ones.
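As a small sketch of the co-occurrence idea mentioned above, the code below counts how often pairs of gene names appear in the same abstract. The gene names and abstracts are invented placeholders, and name matching is reduced to exact token comparison, which sidesteps the ambiguity problem just described.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gene names (placeholders, not real gene symbols).
GENE_NAMES = {"geneA", "geneB", "geneC"}

def cooccurrence_counts(abstracts, gene_names):
    """Count, for each pair of genes, the abstracts that mention both."""
    pair_counts = Counter()
    for abstract in abstracts:
        # Crude tokenization: treat commas and periods as spaces.
        tokens = set(abstract.replace(",", " ").replace(".", " ").split())
        mentioned = sorted(gene_names & tokens)
        for pair in combinations(mentioned, 2):
            pair_counts[pair] += 1
    return pair_counts

abstracts = [
    "geneA regulates geneB in liver.",
    "geneA and geneB are co-expressed, unlike geneC.",
    "geneC is studied alone.",
]
counts = cooccurrence_counts(abstracts, GENE_NAMES)
print(counts)
```

Pairs with high counts become candidate edges in a putative gene network. Co-occurrence alone says nothing about the kind of relationship, or even whether one exists, so such edges are only hypotheses to be checked.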

Text mining and knowledge extraction are ways to aid researchers in coping with the information overload. Text mining can be differentiated from information retrieval (IR) and text summarization (TS). Information retrieval and text summarization focus on larger units of text such as documents, while text mining operates at a finer level of granularity and examines the relationships between specific kinds of information contained both within and between documents. Text mining is also differentiated from Natural Language Processing (NLP) in that NLP attempts to understand the meaning of text as a whole, while text mining and knowledge extraction concentrate on solving a specific problem in a specific domain identified a priori (possibly using some NLP techniques in the process). For example, text mining can aid database curators by selecting the articles most likely to contain information of interest. As another example, potential new treatments for migraine may be identified by looking for pharmacological substances that are associated with the biological processes associated with migraine.

Past
Labour-intensive manual text-mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to advance swiftly during the past decade. Text mining is an interdisciplinary field which draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. As most information (over 80%) is currently stored as text, text mining is believed to have high potential commercial value.

Present Scenario
Present-day text mining tools aim to identify named entities within a collection of text, for example all the drug names within a group of articles. The idea is that recognizing biological entities within a group of articles identifies the key concepts of interest and allows them to be represented in a normalized form, which in turn allows further extraction of relationships and other information. This has, however, been challenging for several reasons.

First, there is no complete dictionary for most types of biological entities. Second, some phrases can refer to two different things depending on the context. Third, many biological entities have more than one name. Finally, many biological entity names consist of several words, which complicates the process of defining name boundaries and of resolving overlapping candidate names.

Text classification attempts to automatically determine whether articles have the characteristics of the search performed. It extracts information relating to the search, applies it to candidate articles using a decision-making process, and returns results related to the query, thus helping to retrieve connected information pertaining to the query.


Synonym And Abbreviation
The growth of the biological literature has been accompanied by tremendous growth in biological terminology. Complicating this, many biological entities have multiple names and abbreviations. Including these synonyms and abbreviations in a search makes a text mining tool more effective. This area has been developed only recently, and improvements are still being made. One approach is to collect synonyms and abbreviations to aid users in performing literature searches.

Relationship Extraction
Relationship extraction detects a specific relationship between two or more named entities. Although the entities themselves are specific, the relationship established between them may be either very specific or general. The relationship is extracted from the text according to the types of the entities involved. This helps to uncover previously unrecognized relationships between entities.

Future Challenges
From all of the foregoing, it is clear that biomedical text mining has great potential. Sadly, that potential is as yet unrealized. Text-mining tools are not part of the standard arsenal of the biomedical researcher in the way that search engines and sequence alignment tools are. The major challenge for the next 5–10 years of text-mining work is the creation of text-mining tools that provide a clear benefit to these researchers, allowing them to be more productive despite the increasing challenges of information growth. The focus must be more on helping biomedical researchers to solve the real-world problems that are inhibiting the pace of research, and less on evaluations based on system output independent of user needs. Advances on several fronts are necessary for this to become a reality.