This article addresses video-based educational material in which presenters deliver educational content. The authors developed a system capable of storing educational video clips together with their semantics and retrieving the required video segments efficiently based on those semantics. The system creates profiles of the presenters appearing in the video clips, based on their facial features, and uses these profiles to partition video clips into logically meaningful segments. The article addresses one of the main problems identified in profile construction and proposes a novel approach to profile creation by introducing a profile normalisation algorithm.
At its best, e-Learning is individual, customised learning that allows learners to choose and review material at their own pace, anytime and anywhere. At its worst, it can disempower and demotivate learners by leaving them lost and unsupported in an immensely confusing electronic realm. By leveraging advanced technology, multimedia has raised learners' interest and provides methods to learn effectively. Multimedia includes more than one form of media, such as text, graphics, animation, audio, video and video conferencing. Interactivity (interactive learning) means that a computer is used in the delivery of learning material in the context of education and training. In a computer-based interactive learning environment, a person can navigate through the material, select relevant information, respond to questions using input devices such as a keyboard, mouse, touch screen, or voice command system, complete tasks, communicate with others, and receive assessment feedback. Integration of heterogeneous data as content for e-Learning applications is crucial, since the amount and versatility of processable information is the key to a successful system. Multimedia database systems can be used to organise and manage such heterogeneous multimedia e-Learning content.
In attempting to integrate video clips into e-Learning, we have realised that building an index on top of the video library is a requirement for providing efficient access to it. Such an index offers an easy mechanism for a student to navigate through the available video clips without downloading entire clips, and thus also provides a solution to the limited-bandwidth problem. To provide content-based retrieval of digital video information, we employ a set of tools developed by us to segment video clips semantically into shots using low-level features. We then identify the segments in which presenters appear and extract the relevant information from key frames. This information is then encoded and compared with a database of similarly encoded key frames. The facial feature information in a video frame is represented as an eigenvector, which is considered a profile of a particular person. In our research, we have designed a multimodal multimedia database system to support content-based indexing, archiving, retrieval and on-demand delivery of audiovisual content in an e-Learning environment. In this system, a feature selection and a feature extraction sub-system are used to construct presenter profiles. The feature extraction process transforms the video key-frame data into feature vectors in a multidimensional feature space. These profiles are then used to construct an index over the video clips to support efficient retrieval of video shots, allowing end-users to use the available bandwidth more efficiently.
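The feature extraction step described above can be sketched as follows. This is a minimal illustration, assuming a grayscale key-frame supplied as a NumPy array; the function name and mean-centring choice are our assumptions, not the paper's exact implementation.

```python
import numpy as np

def extract_feature_vector(key_frame: np.ndarray) -> np.ndarray:
    """Flatten a grayscale key-frame into a 1-D feature vector.

    The vector is mean-centred so that subsequent analysis operates on
    variation around the average intensity rather than on absolute
    brightness (a hypothetical but common preprocessing choice).
    """
    vec = key_frame.astype(np.float64).ravel()
    return vec - vec.mean()

# Usage with a synthetic 4x4 "key-frame"
frame = np.arange(16, dtype=np.float64).reshape(4, 4)
fv = extract_feature_vector(frame)
print(fv.shape)  # (16,)
```

Each such vector is one point in the multidimensional feature space from which the presenter profiles are later built.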
The main components of our architecture are: a Media Server, Meta-Data Database, Ontology and Object Profiles, Keyword Extractor, Keyword Organiser, Feature Extractor, Profile Creator and the Query Processor. The main emphasis of this research is on the profile creation and normalisation components of the system. The first step of the profile constructor is to extract features from the video key-frames, which contain most of the static information present in a shot. The main inputs to the profile constructor are these key-frames stored in the multimedia database.
The presenter detection and recognition process detects the faces in a key frame and tries to match them with the presenter profiles available in the profile database. If the presenter in the key-frames matches a profile, the system annotates the video shot with the presenter identification and maps it to the metadata database. On the other hand, if the current presenter's key-frames do not match any of the available profiles, the profile creator creates a new presenter profile and inserts it into the profile database.
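The match-or-create logic above can be sketched as a nearest-neighbour search in feature space. The identifier scheme, the Euclidean distance metric and the threshold value below are illustrative assumptions, not figures from the paper.

```python
import numpy as np

def match_or_create(feature: np.ndarray,
                    profiles: dict,
                    threshold: float = 10.0) -> str:
    """Return the id of the closest stored profile within `threshold`
    (Euclidean distance in feature space); otherwise register the
    feature vector as a new profile and return its new id."""
    best_id, best_dist = None, float("inf")
    for pid, pvec in profiles.items():
        d = float(np.linalg.norm(feature - pvec))
        if d < best_dist:
            best_id, best_dist = pid, d
    if best_id is not None and best_dist <= threshold:
        return best_id                      # annotate shot with this presenter
    new_id = "presenter_%d" % (len(profiles) + 1)
    profiles[new_id] = feature              # insert new profile into database
    return new_id

profiles = {"presenter_1": np.zeros(4)}
print(match_or_create(np.array([0.1, 0.0, 0.0, 0.0]), profiles))   # presenter_1
print(match_or_create(np.array([50.0, 50.0, 50.0, 50.0]), profiles))  # presenter_2
```

In the first call the feature vector lies within the threshold of an existing profile, so the shot would be annotated with that presenter; in the second it does not, so a new profile is created.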
The profile construction is based on Principal Component Analysis (PCA). The idea is to represent a presenter's facial features in a feature space in which the individual features of a presenter are uncorrelated in the eigenspace. The feature space comprises the eigenvectors of the covariance matrix of the key-frame features. PCA is computationally intensive when applied to the full face space. Through the experience gained from our initial experiments, we have realised that the efficiency of the PCA process in this context can be improved substantially by limiting the analysis to the largest eigenvectors of related key-frames instead of all eigenvectors of the key-frames.
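The idea of keeping only the largest eigenvectors can be sketched as below. This is a generic eigenface-style PCA, not the authors' exact code; it also uses the well-known M×M Gram-matrix shortcut, which is far cheaper than diagonalising the full pixel covariance when the number of training key-frames M is much smaller than the number of pixels N.

```python
import numpy as np

def build_eigenspace(faces: np.ndarray, k: int):
    """Compute the k largest eigenfaces from M flattened face vectors
    (the rows of `faces`).  Works on the M x M Gram matrix A A^T
    instead of the full N x N pixel covariance."""
    mean = faces.mean(axis=0)
    A = faces - mean                      # M x N, mean-centred
    gram = A @ A.T                        # M x M instead of N x N
    vals, vecs = np.linalg.eigh(gram)     # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]    # keep the k largest
    eig = A.T @ vecs[:, order]            # map back to pixel space
    eig /= np.linalg.norm(eig, axis=0)    # unit-length eigenfaces (N x k)
    return mean, eig

rng = np.random.default_rng(0)
faces = rng.random((10, 64))              # 10 synthetic 8x8 key-frames
mean, eig = build_eigenspace(faces, k=3)
profile = (faces[0] - mean) @ eig         # 3-D profile of one presenter
print(eig.shape, profile.shape)           # (64, 3) (3,)
```

Projecting a face onto only the top k eigenfaces gives a compact profile vector while discarding the low-variance directions that contribute little to discrimination.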
The profile normaliser acquires the available profiles from the profile database, executes the normalisation algorithm, and returns the normalised profiles to the database. Since key-frames are captured under different lighting conditions, a proper dynamic profile normalisation algorithm is needed to keep the accuracy of the profile matching algorithm at an acceptable level. We therefore concentrate on two descriptors, namely the mean intensity and its standard deviation, of the data set used to construct presenter profiles. After investigating the variation of the illumination and the deviation of the mean intensity and standard deviation across a collection of profiles, we have identified a few parameters on which an algorithm can be based to normalise the profiles with respect to illumination.
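A normalisation on the two descriptors named above (mean intensity and standard deviation) can be sketched as a rescaling to fixed targets. The target values below are illustrative assumptions, not the parameters identified in the paper.

```python
import numpy as np

def normalise_profile(frame: np.ndarray,
                      target_mean: float = 128.0,
                      target_std: float = 64.0) -> np.ndarray:
    """Rescale a key-frame so its mean intensity and standard deviation
    match fixed targets, reducing the effect of lighting differences."""
    mean, std = frame.mean(), frame.std()
    if std == 0:                       # flat frame: nothing to rescale
        return np.full_like(frame, target_mean, dtype=np.float64)
    out = (frame - mean) / std         # zero mean, unit variance
    return out * target_std + target_mean

dark = np.array([[10., 20.], [30., 40.]])   # under-lit frame
bright = dark * 3 + 90                      # same scene, brighter
n1, n2 = normalise_profile(dark), normalise_profile(bright)
print(np.allclose(n1, n2))                  # True: lighting effect removed
```

Because any affine change in illumination (a gain and an offset) is absorbed by the mean and standard deviation, the two differently lit frames normalise to the same image, which is what keeps the profile matching stable.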
In our previous approach, we constructed presenter profiles by averaging the intensity values of the presenters' faces in the key-frames of the training set. From the results gathered, we have realised that our system's performance deteriorates when the video key-frames are captured under different illumination conditions. The effects of illumination changes in key-frames are due to one of two factors: the inherent amount of light reflected off the skin of the presenter, or the non-linear adjustment in internal camera control. Both of these conditions can have a detrimental effect on the accuracy of profile matching.
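The earlier averaging approach can be sketched in a few lines; the function name and the synthetic frames are our assumptions for illustration.

```python
import numpy as np

def average_intensity_profile(key_frames) -> np.ndarray:
    """Earlier approach: a presenter's profile is the pixel-wise mean
    intensity over that presenter's training key-frames.  Sensitive to
    illumination, which motivated the normalisation step above."""
    stack = np.stack([f.astype(np.float64) for f in key_frames])
    return stack.mean(axis=0)

frames = [np.full((2, 2), 10.0), np.full((2, 2), 30.0)]
profile = average_intensity_profile(frames)
print(profile)   # pixel-wise mean: all entries 20.0
```

Because the profile is built directly from raw intensities, a change in lighting or camera gain shifts every pixel of the profile, which explains the deterioration reported above.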