Information Retrieval – A Multilingual Perspective

Since the advent of the World Wide Web (WWW), Information Retrieval (IR), in which the design of search algorithms and search engines is a major topic, has gained attention and popularity. One major impact of the WWW is that users have fundamentally changed their information-seeking expectations and behavior: from “going to a library” to expecting that “the library comes to you”. A consequence is that foreign-language resources from any country are often easily available at one’s desktop. How does IR handle this situation?

This talk will give a brief history of IR research and describe how IR has evolved to handle some Asian languages, such as Chinese, Japanese, and Korean, for native speakers. For many studies and activities, such as commerce, foreign affairs, and science and technology, Westerners also need access to these foreign resources. To address this need, IR has developed into CLIR (Cross-Lingual IR) by combining retrieval with automatic machine translation (MT). For example, users can input English queries to search and access Chinese documents efficiently and effectively. IR has been recognized as an important productivity-enhancing tool since the web became popular in society, and CLIR is also considered significant for bridging the language gap between peoples.
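As a rough illustration of the English-query, Chinese-document scenario described above, the following is a minimal sketch of query-translation CLIR. It is not the speaker's method: the tiny dictionary EN_ZH, the document collection DOCS, and the function names are all hypothetical, and simple substring counting stands in for both a real MT system and a real retrieval model.

```python
# Hypothetical toy English-to-Chinese dictionary (an assumption for this
# sketch); a real CLIR system would use MT or a large bilingual lexicon.
EN_ZH = {
    "computer": ["计算机", "电脑"],
    "network": ["网络"],
    "library": ["图书馆"],
}

# Hypothetical toy Chinese document collection.
DOCS = {
    "d1": "图书馆提供电脑和网络服务",
    "d2": "计算机网络的发展历史",
    "d3": "今天天气很好",
}


def translate_query(query):
    """Map each English query word to its candidate Chinese translations."""
    terms = []
    for word in query.lower().split():
        # Words missing from the dictionary are simply dropped.
        terms.extend(EN_ZH.get(word, []))
    return terms


def search(query):
    """Rank documents by how often the translated terms occur in them."""
    terms = translate_query(query)
    scores = {}
    for doc_id, text in DOCS.items():
        score = sum(text.count(term) for term in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(search("computer network"))
```

Keeping every candidate translation of a word (here both 计算机 and 电脑 for "computer") is one simple way to cope with translation ambiguity; whether that helps or hurts retrieval in practice depends on the dictionary and the collection.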

Author Bio

Presented By:

Kui-Lam Kwok, a Professor of Computer Science at Queens College, received his B.Sc. from Hong Kong University and his Ph.D. from Manchester University, England. His research interests are in Information Retrieval (IR) and associated topics such as automatic indexing, retrieval models, and search methodologies. His work also includes translation and transliteration for cross-language retrieval (CLIR), i.e., posing queries in one language to search documents in another, in particular for the English/Chinese language pair.

About a decade and a half ago, the National Institute of Standards and Technology (NIST) recognized the importance of IR and designed "blind" retrieval environments called TREC (Text REtrieval Conference) in which participants worldwide can experiment with their systems and algorithms in order to push IR technology forward. "Blind" means that no one knows the retrieval results before NIST announces its evaluation at the annual TREC conference. Prof. Kwok and his group have participated in TREC continuously over the years using his in-house PIRCS retrieval algorithm, researching and designing new indexing techniques and retrieval approaches along the way. As a result, his submissions returned top or near-top automatic results in many years among many well-known groups (see http://trec.nist.gov). PIRCS has also been extended to support Chinese monolingual retrieval, where it produced top submissions in TREC-5 and TREC-6.

Over the past five years, the National Institute of Informatics (NII) in Tokyo has taken over responsibility for Asian-language retrieval from TREC and initiated similar conferences called NTCIR (http://research.nii.ac.jp/ntcir-ws2 to ws4). Kwok and his group participated in the last three (NTCIR-2 to 4), studying translation techniques and their influence on retrieval. They again returned top or near-top results for Chinese monolingual retrieval and English/Chinese CLIR among Asian participants.

Prof. Kwok's success in IR research and development has attracted over $1.5 million in funding from U.S. government agencies and programs since 1992.