Approximate string matching:
Ricardo Baeza-Yates, University of Chile
ABSTRACT
Lecture 1:
String searching algorithms
We will cover all the main paradigms for searching strings with and
without errors, from classical algorithms such as Knuth-Morris-Pratt
and Boyer-Moore to algorithms based on bit parallelism such as
shift-or and shift-and. The first part will be conceptual, mentioning
applications to the Web and other types of data, while the second
part will be more technical.
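As a rough illustration of the bit-parallel idea named in this abstract, here is a minimal sketch of exact matching in the shift-and style. It is not taken from the lecture material: the function name `shift_and_search` and its return convention (a list of start positions) are assumptions made for this example, and it covers only the exact-matching case, not matching with errors.

```python
def shift_and_search(pattern: str, text: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one at bit 0,
        # then keep only the prefixes whose next character really is ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and_search("ana", "banana"))
```

Each text character costs one shift, one OR and one AND, regardless of how many partial matches are alive at that point; that is the sense in which the search runs "in parallel" inside a machine word (Python integers are unbounded, so this sketch does not limit the pattern length).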
Lecture 2:
BIO:
Ricardo Baeza-Yates received his Ph.D. in CS from U. of Waterloo,
Canada, in 1989. In 1993 he received the Organization of American
States award for young researchers in exact sciences. In 1994 he obtained
the Chilean Engineers Institute award for his research work. In 1997,
with two Brazilian colleagues, he obtained the COMPAQ prize for the best
Brazilian CS research article of 1996. In 2002 he became the first computer
scientist inducted into the Chilean Academy of Sciences. He is the
current president of CLEI (Latin American CS Association) and a member
of the board of governors of the IEEE-CS. He is currently professor
and chair of the CS department at the University of Chile, as well
as director of the Center for Web Research. Among other publications,
he is co-author of Modern Information Retrieval (Addison-Wesley, 1999)
and the 2nd edition of the Handbook of Algorithms and Data Structures
(Addison-Wesley, 1991), and co-editor of Information Retrieval: Algorithms
and Data Structures (Prentice-Hall, 1992). His research interests
include algorithms and data structures, text retrieval, web mining,
and visualization applied to databases. He is a member of the ACM, EATCS,
IEEE (senior), SCCC and SIAM.
http://www.dcc.uchile.cl/~rbaeza/
Introduction to Discourse Representation Theory
Alistair Knott, University of Otago
ABSTRACT
In this seminar I will provide an introduction to Hans Kamp's Discourse
Representation Theory (DRT). DRT has been around for over 20 years
now: it originated as a relatively small extension of the predicate
calculus, and as a result of the many people who have used DRT in
their work since then, it has increased in scope quite considerably.
It is now probably the dominant framework for formal treatments of
natural language semantics.
I will begin by outlining the motivations
for 'classical' DRT, which have their origins in hoary philosophical
questions about referring expressions in natural language. I will
then informally describe the syntax and semantics of DRT. Then I will
discuss some extensions of DRT which are now more or less assimilated
into the formalism, in particular treatments of plurality and presupposition.
I will conclude by looking at some current uses of DRT in language
technology applications.
Reference: Hans Kamp, Josef van Genabith
and Uwe Reyle "Discourse Representation Theory: An updated survey".
DRAFT of an article to appear in the new edition of the Handbook of
Philosophical Logic, available at http://www.ims.uni-stuttgart.de/~hans/hpl-drt.pdf
BIO
http://www.cs.otago.ac.nz/staff/ali.html
Text Planning in the Large: Discourse Structure
Text Planning in the Small: Referring Expressions
Robert Dale, Macquarie University
ABSTRACT
In this series of two lectures we will look at what is involved in
getting a computer to plan the content of a text.
Lecture 1:
Text Planning in the Large: Discourse Structure
We introduce the architecture of natural language generation systems,
identifying the key components and the information they use. We focus
in on the task of taking a body of content to be expressed and organising
this into a coherent discourse, and examine a number of different
approaches that have been taken to the structural organisation of
text.
Lecture 2:
Text Planning in the Small: Referring Expressions
Natural language generation systems not only need to plan how to organise
the body of information they want to express into coherent paragraphs.
They also need to reason about the detail within individual sentences,
and nowhere is this more evident than in the generation of referring
expressions. Here the need is to determine how to refer to entities
and sets in the domain of discourse so that the hearer knows what
is being talked about.
BIO
Professor Robert Dale is Director of the Centre for Language Technology
at Macquarie University in Sydney, where he teaches on various aspects
of language technology. After completing his PhD in Computational
Linguistics at the University of Edinburgh in 1989, he taught in the
Centre for Cognitive Science at Edinburgh, before taking up a position
with Microsoft in Sydney in 1994. He was Director of the Microsoft
Research Institute at Macquarie University (1996-1999). His research
interests include intelligent text processing; natural language generation;
spoken language dialog systems; and reference and anaphora. He is
author or editor of five books and around 60 papers on various aspects
of natural language processing, and is editor of the Journal of Computational
Linguistics.
http://www.ics.mq.edu.au/~rdale/
Language technologies and HCI:
Cécile Paris, CSIRO
ABSTRACT
Human-Computer Interaction (HCI) is concerned with studying
how to best design technology so as to ensure that it will fit naturally
into the users' environment and be appropriate for the task at hand,
and that the interaction between humans and the machine will be smooth.
Language is now often employed as a means of interaction. It is thus
important to understand some of the principles and methodologies employed
in HCI to help design systems. In this lecture, we will look at some
of the techniques employed in HCI, such as task analysis and task
centred design. We will also discuss the potential for cross-fertilization
between the two disciplines (HCI and Natural Language Processing).
BIO
Cécile Paris is a Principal Research Scientist at CSIRO/ICT Centre,
leading the area of research concerned with Delivering Information
in Context. She is also an Honorary Associate in the Language Technology
Centre at Macquarie University (Division of Information and Communication
Sciences), and at the School of Information Technologies at Sydney
University. Her main research interests lie in the areas of Language
Technology, User Modelling and HCI. Cécile did her PhD in Computational
Linguistics at Columbia University (New York). She worked for seven years
at USC/ISI (Marina del Rey, Los Angeles) after her PhD, leading a
research programme in text planning and generation. Before coming
to Australia, she was at ITRI (Brighton, UK), leading work on multilingual
generation systems. Cécile is currently the chair of CHISIG, the Computer
Human Interaction Special Interest Group of the Ergonomics Society
of Australia.
http://www.cmis.csiro.au/Cecile.Paris/
Linguistic annotation and the Annotation Graph Toolkit
Steven Bird, University of Melbourne
ABSTRACT
Annotated corpora have been a critical component of research
in the speech and language sciences for some years. Today, these corpora
are being created and deployed for a rapidly expanding set of languages,
disciplines and technologies. A wealth of formats and tools have sprung
up around this enterprise, many of which are documented on the Linguistic
Annotation page [http://www.ldc.upenn.edu/annotation/]. Linguistic
annotation is a term which covers any descriptive or analytic notations
applied to raw language data. The basic data may be in the form of
time functions - audio, video and/or physiological recordings - or
it may be textual. The added notations may include transcriptions
of all sorts (from phonetic features to discourse structures), part-of-speech
and sense tagging, syntactic analysis, "named entity" identification,
co-reference annotation, and so on. This lecture will present a model
of linguistic annotation which provides a simple framework for representing
and manipulating complex, heterogeneous, multi-layered annotations.
The model uses "annotation graphs": directed acyclic graphs having
labels on the edges and time-offsets on the nodes. The lecture will
cover the formalism, the software infrastructure, and practical applications.
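To make the data structure in this abstract concrete, here is a minimal sketch of a graph with labels on the edges and time offsets on the nodes. It is an illustration only and does not reproduce the Annotation Graph Toolkit's API: the class names `Node`, `Arc` and `AnnotationGraph`, the `layer` field, the `labels_at` query, and the choice of seconds as the time unit are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A point on the timeline of the recording."""
    name: str
    offset: float  # time offset in seconds (assumed unit)


@dataclass(frozen=True)
class Arc:
    """A labelled edge covering the stretch of signal between two nodes."""
    start: Node
    end: Node
    layer: str  # hypothetical layer name, e.g. "word" or "phrase"
    label: str


@dataclass
class AnnotationGraph:
    arcs: list = field(default_factory=list)

    def add(self, start: Node, end: Node, layer: str, label: str) -> None:
        # Requiring every arc to run forward in time keeps the graph acyclic.
        if start.offset >= end.offset:
            raise ValueError("an arc must end later than it starts")
        self.arcs.append(Arc(start, end, layer, label))

    def labels_at(self, t: float) -> list:
        """Return (layer, label) pairs for every arc whose span contains t."""
        return [(a.layer, a.label) for a in self.arcs
                if a.start.offset <= t < a.end.offset]


graph = AnnotationGraph()
n0, n1, n2 = Node("n0", 0.0), Node("n1", 0.5), Node("n2", 1.0)
graph.add(n0, n1, "word", "hello")
graph.add(n1, n2, "word", "world")
graph.add(n0, n2, "phrase", "greeting")  # a second layer over the same nodes
print(graph.labels_at(0.3))
```

Because the word layer and the phrase layer share nodes, a single query by time returns annotations from both layers, which is one way to picture the "multi-layered" property mentioned above.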
BIO
Steven Bird is Associate Professor of Computer Science and Software
Engineering, and he teaches human language technology and supervises
several research students working in this area. His research focuses
on formal and computational models for linguistic information, with
application to human language technologies and to the description
of the world's ~7,000 languages. Before coming to Melbourne University
he did doctoral and post-doctoral research at the University of Edinburgh
(1987-94). From 1995-97 he conducted linguistic fieldwork on the languages
of western Cameroon, published a dictionary, and helped develop several
new writing systems. From 1998-2002 he was associate director of the
Linguistic Data Consortium at the University of Pennsylvania, where
he led an R&D team working on open-source software for linguistic
annotation.
http://www.cs.mu.oz.au/~sb/
Linguistic Annotation and the Emu Speech Database System
Steve Cassidy, Macquarie University
ABSTRACT
BIO
http://www.ics.mq.edu.au/~cassidy/
Language in a Social Setting: an agent-based perspective on Language Technology
Peter Wallis, University of Melbourne
ABSTRACT
As described by Lochbaum, Grosz and Sidner, approaches to computer-generated
conversation fall into one of two broad categories: the
intentional model, in which dialogue structure is analysed from the
perspective of user goals, and the more conventional informational
model, in which the purpose of language is to convey information.
This presentation starts with arguments for taking the intentional
perspective and goes on to show how it is applied to developing descriptions
of dialogue structure that explicitly address factors commonly grouped
under the banner of social intelligence. The motivation for the work
described has been the so-called agent-based approach to AI, in which
the focus is on the situated and autonomous nature of some software
entities. Rather than focusing on what was said, the focus should
be on what to say next in order to achieve goals. This line of investigation
has highlighted the need for conversational computers to understand
social conventions. What is the effect on a human if the machine follows
or breaks these conversational norms? The talk is intended for both
practitioners currently working with applied language technology such
as chat bots or automated call handling, and for graduate students
looking for potential research topics in what is traditionally called
conversation analysis.
BIO
Peter Wallis has a bachelor of arts from Flinders University with
majors in Computer Science and Philosophy, and a Ph.D. from RMIT in
semantics for search engines. He has a long history of working on
applied natural language. In his Ph.D. work, he used the Longman
Dictionary of Contemporary English (LDOCE) to produce canonical versions
of text meaning. He then worked at the Defence Science and Technology
Organisation on information extraction, and initiated the "fact extractor"
architecture that has gone operational at several sites within Defence.
Since 1998 he has been developing an interest in dialogue and is currently
trying to commercialize some ideas on dialogue management for the
VoiceXML community. Key publications have been on the evaluation of
Language Technology, and when pressed he will argue for a functional
view of semantics.
http://www.cs.mu.oz.au/~peter/
Deep Lexical Acquisition
Timothy Baldwin, CSLI Stanford
ABSTRACT
Deep processing involves applying "precision grammars"
(i.e. linguistically-precise grammars, such as HPSGs) to natural language
analysis, and has significant advantages over shallow methods in terms
of its ability to capture fine-grained lexical and constructional
interactions and produce a rich semantic representation. The main
limitation of deep processing is coverage, which tends to be restricted
due to the detailed annotation required to encode individual lexical
items in precision grammars. This talk will tackle the question of
how to expand the coverage of a precision grammar through the automatic
acquisition of lexical features and ultimate type classification of
a given word. I will use the English Resource Grammar as a test case,
and outline a range of methods by which new lexical items can be acquired
either directly through the application of the grammar, or indirectly
through techniques drawing on corpus data and/or semantic ontologies.
BIO
Timothy Baldwin is a Senior Researcher at the Centre for the Study of Language and
Information (CSLI), Stanford University. He is a member of the CSLI
LinGO Multiword Expression Project, specialising in the lexical acquisition,
semantic classification and machine translation of multiword expressions.
Other recent research interests include computational lexical semantics,
the interface between theoretical and computational linguistics, and
computer-assisted language learning applications for computational
linguistics.
http://www-csli.stanford.edu/~tbaldwin/