Projects

Real Time Goal Detection in Soccer Match using Hidden Markov Model, LG Soft India

Developed a real-time goal detection tool that generates automatic cheering effects whenever a goal is scored in a soccer match video. The keyword spotter uses individual models for the keywords (the actual models) and "background" or "garbage" models to represent non-keywords. The classifier is built on hidden Markov models (HMMs). The conditional probability of the test signal given the actual model, P(test | actual_model), and the conditional probability of the test signal given the background model, P(test | background_model), are compared against a preset threshold to make the final decision.
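The decision rule can be illustrated with a minimal sketch. It assumes discrete-observation HMMs and a log-likelihood-ratio test, which is one common way to compare P(test | actual_model) with P(test | background_model); the toy model parameters, the function names, and the threshold of 0.0 are hypothetical and are not taken from the project.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in the log domain for a discrete HMM.

    obs   : list of observation symbol indices
    start : start[i]    = P(state i at t = 0)
    trans : trans[i][j] = P(state j at t + 1 | state i at t)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)

def is_keyword(obs, keyword_model, background_model, threshold=0.0):
    """Accept the keyword when the log-likelihood ratio exceeds the threshold."""
    llr = log_likelihood(obs, *keyword_model) - log_likelihood(obs, *background_model)
    return llr > threshold, llr

# Hypothetical toy models: (start, trans, emit) over two symbols {0, 1}.
# The keyword model is left-to-right and expects symbol 0 followed by symbol 1.
keyword_model = ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
# The background model treats every symbol as equally likely.
background_model = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])

accepted, _ = is_keyword([0, 0, 1, 1], keyword_model, background_model)
rejected, _ = is_keyword([1, 1, 0, 0], keyword_model, background_model)
print(accepted, rejected)
```

In a real system the observations would be acoustic feature vectors with continuous emission densities, and the threshold would be tuned on held-out data.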

Development of SMS Compression Techniques for Indian Languages, CDAC Pune, IIT Madras

Developed a lossless compression algorithm that allows more Hindi characters in an SMS text. A novel encoding scheme is proposed, along with several modifications to standard schemes that make them efficient for transmitting Hindi and multilingual text. The efficiency of the proposed schemes is evaluated through experiments on a multilingual database collected from Twitter using dictionary learning. Performance evaluation shows that these encoding schemes allow nearly 160 characters per SMS for messages in both Hindi and multilingual text. [Link]
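For context on the 160-character figure, the short calculation below shows the capacity of a single SMS under the two standard encodings. It is background arithmetic only, not the project's encoding scheme; the function names are hypothetical. A single SMS carries 140 octets of user data, which gives 160 characters with the GSM 7-bit default alphabet but only 70 with UCS-2, the encoding normally needed for Devanagari text.

```python
SMS_PAYLOAD_BITS = 140 * 8  # one SMS carries 140 octets of user data

def max_chars(bits_per_char):
    """Number of fixed-width characters that fit into one SMS."""
    return SMS_PAYLOAD_BITS // bits_per_char

def max_average_bits(target_chars):
    """Average bits per character an encoder may spend to fit target_chars."""
    return SMS_PAYLOAD_BITS / target_chars

gsm7_capacity = max_chars(7)    # GSM 7-bit default alphabet
ucs2_capacity = max_chars(16)   # UCS-2, used for Hindi in plain SMS
budget = max_average_bits(160)  # bit budget needed to reach 160 characters

print(gsm7_capacity, ucs2_capacity, budget)
```

Reaching roughly 160 Hindi characters therefore means averaging about 7 bits per character instead of the 16 that UCS-2 spends.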

Video Analyzer using Command Line Video Quality Metric, LG Soft India

Designed and developed a video analyzer tool that scores the similarity or difference between two videos on the basis of dropped frames, broken macroblocks, etc. The tool performs automated processing on a pair of video files: one contains an original video sequence (e.g., straight from the reference device) and the other contains a processed video sequence (e.g., after coding, transmission, and decoding). It then performs video calibration and video quality estimation.

Audio Analyzer using Dynamic Time Warping Algorithm, LG Soft India

Designed and developed an audio analyzer tool that scores the similarity and difference in audio effects between a reference device and a test device. The algorithm preprocesses the test music or speech signal for silence removal and proper endpointing, then extracts meaningful parameters (feature extraction) for template matching using Dynamic Time Warping (DTW) and Frame Matching (FM) with a sliding window.
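The template-matching step can be illustrated with a minimal DTW sketch. It assumes one-dimensional feature sequences and an absolute-difference local cost; the function name and sample values are hypothetical, and a real analyzer would typically compare multi-dimensional feature vectors such as cepstral coefficients.

```python
def dtw_distance(reference, test):
    """Classic DTW distance between two 1-D sequences.

    cost[i][j] holds the cheapest accumulated cost of aligning the first i
    reference frames with the first j test frames.
    """
    n, m = len(reference), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - test[j - 1])
            # Allowed moves: insertion, deletion, or a diagonal match.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]

# Hypothetical feature tracks: the test track is a time-stretched copy.
reference_track = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched_track = [0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
different_track = [2.0, 2.0, 2.0, 2.0, 2.0]

same_score = dtw_distance(reference_track, stretched_track)
diff_score = dtw_distance(reference_track, different_track)
print(same_score, diff_score)
```

A lower score means the two tracks are more similar after time alignment, so the stretched copy scores better than the unrelated track.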

Domain Specific Speech based Search Engine, DIT

Developed a novel methodology for indexing domain-specific audio archives using linguistic information present in the speech signal. The audio indexing system is phone based and can work under limited training data conditions. A training data set that captures the linguistic information within the Hindi language at the syllable level is first developed. A reduced phone set is then derived from the super syllabic set of the Hindi language. The system is then bootstrapped at the phone level with domain-specific data. The audio indexing itself is then performed using a novel sliding phone protocol technique. The performance of the audio indexing system is evaluated on Indian parliament speech and read news. The proposed bootstrapping method with sliding phone search provides reasonable improvements in phone recognition accuracy and in search retrieval efficiency when compared to conventional methods. [Link]
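The general idea of searching a phone-level index with a sliding window can be sketched as follows. This is only an illustration under assumed details: the text does not describe the sliding phone protocol itself, so the position-wise matching score, the function name, the phone labels, and the 0.6 threshold are all hypothetical.

```python
def sliding_phone_search(query, transcript, min_score=0.75):
    """Slide the query over a recognized phone sequence.

    Returns (start_index, score) for every window whose fraction of
    position-wise matching phones is at least min_score.
    """
    hits = []
    width = len(query)
    for start in range(len(transcript) - width + 1):
        window = transcript[start:start + width]
        matches = sum(1 for q, p in zip(query, window) if q == p)
        score = matches / width
        if score >= min_score:
            hits.append((start, score))
    return hits

# Hypothetical recognizer output and query, written as phone labels.
recognized = ["b", "s", "a", "d", "a", "n", "s", "a", "t", "k"]
query_phones = ["s", "a", "d"]

hits = sliding_phone_search(query_phones, recognized, min_score=0.6)
print(hits)
```

Tolerating partial matches lets the search survive phone recognition errors, at the cost of more false alarms as the threshold is lowered.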

Development of Prosodically Guided Phonetic Engine for Hindi, DIT

Developed a phonetic engine for Hindi using prosodic and phonetic information. A phonetic engine is a system that represents all the information present in speech so that the speech can be reproduced exactly. It is organized into tiers, viz., phonetic transcription, syllabification, pitch marking (pitch index), and break marking (break index). The phonetic transcription tier consists of International Phonetic Alphabet (IPA)-based phonetic symbols along with a few diacritic marks. IPA symbols essentially encode articulatory features, so the phonetic transcription reflects speech production information. Each IPA symbol is modeled as a context-independent monophone by a 3-state hidden Markov model (HMM). A flat-start approach is employed since no phone boundaries are available. The syllabification tier consists of syllables formed from the phonetic transcription and their time alignment with the waveform. Syllable segmentation can be performed using a spectral transition measure or by locating the minima of the short-term energy profile. The pitch marking and break marking tiers consist mostly of suprasegmental information. Prosodic breaks and marks carry important information, but they are difficult to mark at absolute positions, so they are marked relative to the adjacent context. These tiers are based on source characteristics, so F0 estimation is used to design them. To demonstrate this work, a telephone-based audio indexing system for searching through an audio archive in Hindi was developed. [Link]
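The energy-minima option for syllable segmentation can be illustrated with a minimal sketch. It assumes a plain sum-of-squares energy per frame and treats every local minimum as a candidate boundary; the function names, frame sizes, and synthetic signal are hypothetical, and a practical system would smooth the energy contour first.

```python
def short_term_energy(samples, frame_len, hop):
    """Sum of squared samples in each frame of length frame_len."""
    return [
        sum(s * s for s in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

def energy_minima(energy):
    """Frame indices where the energy contour has a local minimum."""
    return [
        i
        for i in range(1, len(energy) - 1)
        if energy[i] < energy[i - 1] and energy[i] <= energy[i + 1]
    ]

# Synthetic signal: two loud bursts separated by a quiet gap.
loud = [1.0, -1.0] * 20    # 40 samples
quiet = [0.1, -0.1] * 10   # 20 samples
signal = loud + quiet + loud

energy = short_term_energy(signal, frame_len=20, hop=10)
boundaries = energy_minima(energy)
print(boundaries)
```

Each returned index marks a frame where the energy dips, which serves as a candidate syllable boundary; multiplying it by the hop size gives the sample position of the frame start.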

Digital Mandi for the Indian Kisan, BSNL

Digital Mandi Application for Indian Kisan is a web and cell phone based multimodal agricultural commodity price retrieval system. It was developed by BITCOE (BSNL IIT-Kanpur Centre of Excellence) at IIT Kanpur. The service sends a registered farmer alerts by SMS and/or voice on his mobile phone about the rates of selected crops at selected mandis. Queries and retrieval can be done via internet kiosks or on any GPRS-enabled cell phone. A farmer can register for the service either with the help of officials at the mandis (assisted mode) or directly from his mobile phone by SMS or on the WAP portal. The registration menu was prepared by IIT Kanpur. The service will be made available on a subscription basis at very economical rates, which will be deducted from his mobile phone balance. The system is multilingual: English and the Indian languages Hindi, Punjabi, and Kannada are supported. Currently, the application is deployed on the BSNL national network in Orissa and Haryana, where it serves around 25000 farmers in Orissa and around 11000 in Haryana. A distinctive aspect of the system is that both the queries and the retrieval are multimodal. The service is now operated mainly from IIT Kanpur itself. [Link]