Recently, a growing number of conference talks, particularly scientific ones, are being recorded for later retrieval. As the amount of recorded data grows, finding the appropriate video, or video sequence, of a talk becomes increasingly difficult. For instance, current search engines are not able to answer complex queries such as “Find a sequence of a recorded talk, in an academic training lecture, in 2007, in Italy, where a colleague of professor X talked about image indexing after the coffee break”.
This example illustrates the so-called semantic gap, which, as defined in [1], “is an important issue in many computer vision systems, but particularly for indexing. It refers to the lack of coincidence between machine low-level digital representations of visual data and the human high-level cognitive understanding of the same data”. This is particularly relevant for information retrieval activities. Low-level features can be automatically extracted during a video indexing stage (color, slide transition, slide animation, etc.), while higher-level features are based on rich human semantics (concepts, topics, people, etc.) and therefore involve human intervention [2], [3], [4].
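To make the distinction concrete, the following sketch (not taken from the thesis; frame data and bin count are hypothetical) computes a quantized color histogram, a typical low-level feature that an indexer can extract automatically from a frame, yet one that carries none of the high-level semantics (topic, speaker, event) a user actually searches for:

```python
from collections import Counter

def colour_histogram(frame, bins=4):
    """Quantise each RGB channel into `bins` levels and count pixel occurrences.

    `frame` is a flat list of (r, g, b) tuples; the result maps each
    quantised colour bucket to its relative frequency in the frame.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for (r, g, b) in frame)
    total = len(frame)
    return {bucket: n / total for bucket, n in counts.items()}

# A tiny synthetic "frame": three dark-blue pixels and one white pixel.
frame = [(10, 20, 200), (12, 18, 210), (8, 25, 199), (250, 250, 250)]
hist = colour_histogram(frame)
```

The histogram tells us the frame is mostly blue, but nothing about whether the segment shows a talk on image indexing; closing that gap is precisely what higher-level annotation is for.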

In order to bridge this semantic gap, information should be indexed according to users' expectations, allowing search engines to find data matching users' requirements. To allow users to submit complex queries such as the one cited above, we argue that the entire conference information, referred to as OHLA (cOnference High-Level informAtion), should be used to enrich video recording indexing. In our work, OHLA refers to all the information related to a conference: the video recording of the talk together with all the information we can extract from its content (video segmentation, keywords, topics, etc.), the presentation of the talk (ppt, pdf, etc.), the speaker and attendee information (name, organization he/she belongs to, publications), the administrative information (conference planning, logistics, etc.), the related demos, the related events, etc. OHLA information is generated throughout the conference life cycle. Such information can be extracted automatically or manually and used to provide richer video content indexing from a user's semantic point of view. Thus, OHLA can bridge the semantic gap.

Manual annotation is accurate because the description is based on human perception of the semantic content of the video. Unfortunately, manual annotation is a labor-intensive, time-consuming and tedious process, especially given the exponential growth of video collections. Automatic annotation, as stated above, often lacks the semantic dimension users need when searching for information. A semi-automatic approach reduces the burden of manual annotation by combining the manual and automatic approaches.
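The information grouped under OHLA could be sketched as a per-talk record such as the one below. This is an illustrative sketch only; the field names and grouping are hypothetical, not a data model taken from the thesis:

```python
from dataclasses import dataclass, field

@dataclass
class Speaker:
    name: str
    organization: str
    publications: list = field(default_factory=list)

@dataclass
class OHLARecord:
    # Content-based information extracted from the recording itself
    video_segments: list
    keywords: list
    topics: list
    # Context information gathered over the conference life cycle
    presentation_file: str        # e.g. the slides, as ppt or pdf
    speaker: Speaker
    schedule_slot: str            # administrative planning information
    related_events: list = field(default_factory=list)

# One talk's OHLA information, combining automatic and manual sources.
talk = OHLARecord(
    video_segments=["00:00-12:30", "12:30-35:00"],
    keywords=["image indexing"],
    topics=["multimedia retrieval"],
    presentation_file="talk.pdf",
    speaker=Speaker("X", "University Y"),
    schedule_slot="after the coffee break",
)
```

Indexing over such a record is what would let a search engine match the query cited above: the topic comes from content analysis, while the schedule slot and speaker affiliation come from administrative context.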

To address the issue of video content description, several standards and annotation formats have been defined (MPEG-7, RDF, etc.). As a consequence, data is annotated using heterogeneous formats, complicating the process of information retrieval even further.
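As a loose illustration of this heterogeneity, the same fact ("segment S1 is annotated as a talk about image indexing") can be expressed once as simplified MPEG-7-style XML and once as RDF-style triples. The element and property names below are simplified stand-ins, not the actual MPEG-7 schema or an RDF vocabulary:

```python
import xml.etree.ElementTree as ET

# XML-style description of the segment (simplified, MPEG-7-like)
seg = ET.Element("VideoSegment", id="S1")
ET.SubElement(seg, "FreeTextAnnotation").text = "talk about image indexing"
xml_view = ET.tostring(seg, encoding="unicode")

# Triple-style description of the very same fact (RDF-like)
triples = [("ex:S1", "ex:annotation", "talk about image indexing")]
```

A retrieval system must reconcile both representations before it can answer a single query over them, which is exactly the integration problem addressed next.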
Several works have been carried out to improve multimedia information retrieval by focusing either on the issue of video content description or on that of heterogeneous metadata formats. What is currently missing is an integrated system that jointly considers these two issues in order to provide efficient information retrieval. Based on this observation, this thesis presents CALIMERA (Conference Advanced Level Information ManagEment & RetrievAl), an integrated framework for content- and context-based video retrieval, built on HELO (High-level modEL Ontology), an integrated conference ontology that models the Conference Integration Information.
[Figure: CALIMERA framework global view]

[1] Andrew P. G., Ph.D. thesis, Queen Mary, University of London, 2006.
[2] Y. Liu et al., “A survey of content-based image retrieval with high-level semantics,” Pattern Recognition, 2007, pp. 262–282.
[3] I. L. Coman and I. K. Sethi, “Mining association rules between low-level image features and high-level concepts,” SPIE Data Mining and Knowledge Discovery, 2001, vol. 3, pp. 279–290.
[4] Jonathon S. et al., “Bridging the semantic gap in multimedia information retrieval,” 2006.