As the world we live in is getting more and more networked, the need to understand, manage, and harness the data on the web is becoming critical. While data in traditional databases tends to be highly structured, with a clear notion of schema, data on the web is loosely structured (also called se...
As the world we live in is getting more and more networked, the need to understand, manage, and harness the data on the web is becoming critical. While data in traditional databases tends to be highly structured, with a clear notion of schema, data on the web is loosely structured (also called semi-structured), or worse, unstructured, and is often not accompained by any clear notion of schema. What does it mean to query this data? What do you look for when you mine this data? If there are several data sources containing related information, how do you combine the information in them to answer queries involving them all? How can you index such data for efficient storage and retrieval? What do you do when the data you want to analyze is not stored some place but is streaming through?
My research over the past few years has been concerned with addressing these questions. I am also interested in newer applications which challenge the foundations and technology of databases. A good example is bioinformatics.