| |
CURRENT PROJECTS WITH RECENT PUBLICATIONS
Image Interpretation
Image interpretation, which is effortless and instantaneous for human
beings, is the grand challenge of computer vision. The dream is to
build a "description machine" which produces a rich semantic
description of the underlying scene, including the names and poses of
the objects that are present, even "recognizing" other things, such as
actions and context. Mathematical frameworks are advanced from time
to time, but none is yet widely accepted, and none clearly points the
way to closing the gap with natural vision. Together with students
and colleagues in the Center for Imaging Science, I am pursuing an
approach to image interpretation based on hierarchical structures. We
develop practical vision algorithms (e.g., for detecting faces) as
well as theoretical results (e.g., on the efficiency of
hierarchical search).
In our view, no advances in computers or statistical learning will
overcome the "small sample dilemma" in vision; some global
organizational framework is necessary. As in "twenty questions", the
enabling assumption is that interpretations may be grouped into
natural subsets with shared features. This suggests a coarse-to-fine
modeling of both the interpretations and the computational process,
uniting representation and processing using a hierarchy of
classifiers. It also provides for progressive annotation and
concentrates processing on ambiguous areas. Recently, we have
introduced the "trace model," a graphical network based on the history
of processing; using the trace as a sufficient statistic,
coarse-to-fine search can be characterized as an exact implementation
of a likelihood ratio test.
- S. Gangaputra and D. Geman, "A design principle for coarse-to-fine classification,"
Proceedings
CVPR 2006, 2, 1877-1884, 2006.
(pdf)
- S. Gangaputra and D. Geman, "The trace model for object detection and tracking," Towards Category-Level
Object Recognition , Lecture Notes in
Computer Science, 4170, 401-420, 2006.
(pdf)
- H. Sahbi and D. Geman, "A hierarchy of support vector machines
for pattern detection," Journal of Machine Learning Research, 7, 2087-2123, 2006.
(pdf)
- Y. Amit, D. Geman and X. Fan, "A coarse-to-fine strategy for
multi-class shape detection," IEEE Trans. PAMI, 28,
1606-1621,
2004.
(pdf)
Molecular Cancer Diagnosis
The mRNA counts in gene mircroarrays provide a global view of cellular
activity by simultaneously recording the expression levels of
thousands of genes. Nonetheless, there are technical and practical
limitations in using these data for analyzing disease. Accurate
statistical inference is difficult due to the small number of
observations, typically tens, relative to the large number of genes,
typically thousands. Moreover, conventional methods from machine
learning lead to decisions which are usually very difficult to
interpret in simple or biologically meaningful terms.
Together with researchers at the Institute for Computational Medicine
I am investigating a new approach to molecular classification based
entirely on pairwise mRNA (or protein) comparisons ("relative
expression reversals"). This provides simple classification rules
which involve very few genes and generate specific hypotheses for
follow-up studies. Since rank- based methods are largely invariant to
normalization, the small-sample problem can be addressed by
integrating inter-study microarray data, leading to the discovery of
more reliable markers. In particular, in the case of prostate cancer,
cross-platform validation indicates that simply comparing the relative
expression values of HPN and STAT6 achieves high sensitivity and
specificity.
- L. Xu, A-C Tan, D. Naiman, D. Geman and R. Winslow, "Robust prostate cancer marker genes emerge
from direct integration of inter-study microarray data," Bioinformatics 2005 21: 3905-3911.
(pdf)
- A-C Tan, D. Naiman, L. Xu, R. Winslow and D. Geman, "Simple decision rules for classifying human cancers
from gene expression profiles," Bioinformatics 2005 21: 3896-3904.
(pdf)
- D. Geman, C. d'Avignon, D. Naiman and R. Winslow, "Classifying gene
expression profiles from pairwise mRNA comparisons," Statist. Appl. in
Genetics and Molecular Biology, 3, 2004.
(pdf)
Small Sample Learning
Very high-dimensional data sets are ubiquitous in computational
vision, speech and biology, and raise serious challenges in
statistical learning. The difficulties are especially pronounced when
the objective is to uncover a complex statistical dependency structure
within a large number of variables and yet the number of samples
available for learning is relatively limited. This situation
frequently arises in computational biology, for instance in predicting
phenotypes (e.g., diagnosing cancers) from gene expression data or
reverse engineering regulatory networks; when measured against the
complexity of the systems being studied, the amount of data available
for modeling and inference is minuscule. Similarly, in computational
vision, full-scale scene interpretation is likely to be hopeless
without substantial "hardwiring" to compensate for the small number of
samples relative to the large number of possible descriptions.
Simply penalizing model complexity often biases estimation towards the
absence of interactions. Consequently, discovering rich structure
from small samples is nearly impossible unless the search is severely
restricted, in which case statistical learning may become feasible due
to variance reduction. At the Institute for Computational Medicine
and the Center for Imaging Science, we are pursuing several projects
involving a new class of ``decomposable'' graphical models designed
for stepwise dependency pursuit, and for which the number of
parameters grows slowly with the number of variables. In the case of
modeling gene networks, the embedded hierarchical structure is meant
to capture the organization of gene regulation in mammals, especially
cascades of transcription factors.
Twenty Questions Theory
Together with Gilles Blanchard, I have explored the theoretical
foundations of a "twenty questions" approach to pattern
recognition. The object of analysis is the computational process
itself rather than probability distributions (generative modeling) or
decision boundaries (predictive learning). Our formulation is
motivated by applications to scene interpretation in which there are a
great many possible explanations for the data, one ("background") is
statistically dominant, and it is imperative to restrict intensive
computation to genuinely ambiguous regions. We consider sequential
testing strategies in which decisions are made iteratively, based on
past outcomes, about which test to perform next and when to stop
testing.
The key structure is a hierarchy of tests, one binary test
("question") for each cell in a tree-structured, recursive
partitioning of the space of hypotheses. A central role in the
mathematical analysis is played by the ratio of the "cost" of a test,
meaning the amount of computation necessary for execution, to the
"selectivity" of the test, meaning the probability of correcting
declaring background (i.e., not hallucinating). Our main result is
that, under a natural model for total computation and certain
statistical assumptions on the joint distribution of the tests,
coarse-to-fine strategies are optimal whenever the cost-to-selectivity
ratio of the test at every cell is less than the sum of the
cost-to-selectivity ratios for the children of the cell. As might be
expected, under mild assumptions, good designs exhibit a steady
progression from broad scope coupled with low power to high power
coupled with dedication to specific explanations.
- G. Blanchard and D. Geman, "Sequential testing designs for pattern
recognition," Annals of Statistics, 33, 1155-1202, June, 2005.
(pdf)
Mental Image Matching
The standard problem in image retrieval is "query-by-visual-example":
the "query image" resides in an image database and is matched by the
system with other images. At INRIA in Paris, France, I am working
with researchers in the project IMEDIA on a different scenario: the
query image is "external" and the matching is "mental." For instance,
an image resides only in the mind of the user (or the user has an
actual object or photograph), who may seek versions of the same image,
e.g., the same face, or may seek images belonging to the same class,
e.g., similar landscapes, and responds to a sequence of
machine-generated queries designed to accelerate the search. For
example, the user declares which of several displayed images is
"closest" to his query. These similarity decisions are entirely
subjective. Also, since the images are generally indexed by low-level
intensity-based features rather than high-level semantic content, the
answers provided by the user may not cohere with the measures of
proximity employed by the retrieval system.
We are developing an interactive search engine which is based on
information theory and statistical inference. The display algorithm
involves a Bayesian relevance feedback model and an optimality
criterion based on conditional entropy. The modeling challenge is to
account for psycho-visual factors and sources of variability in human
decision making. Performance is measured by the expected number of
iterations necessary to match the identity (target search) or the
class (category search) of the query. Designing metrics and response
models which are consistent with human behavior is essential for
achieving practical results with large databases (art, faces, etc.).
- Y. Fang and D. Geman, "Experiments in mental face retrieval," Proceedings AVBPA 2005, Lecture Notes
in Computer Science, 637-646, July 2005. (Best Student Paper Award)
(pdf)
|
|