Content-based Digital Video Processing. Digital Videos Segmentation, Retrieval and Interpretation.
Ipson, Stanley S.
Keyword: Shot boundary detection
Video copy detection
Intellectual property rights (IPR)
Digital video processing
Video highlights indexing
The University of Bradford theses are licenced under a Creative Commons Licence.
Institution: University of Bradford
Department: Department of Computing
Abstract: Recent research approaches in semantics-based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and a finite state machine (FSM). Thirdly, shot detection is implemented using local and global indicators. Fourthly, a context-aware approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies. It is robust to complicated distortions and capable of locating copied segments inside the original videos. Then, objects and events are extracted from MPEG sequences for video highlights indexing and retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.
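The abstract does not describe how its shot cut detector works. As an illustration of the general idea only, the following is a minimal sketch of a widely used baseline: compare grey-level histograms of consecutive frames and flag a cut when they differ sharply. The function names, the 16-bin histogram and the 0.5 threshold are assumptions chosen for this example and are not taken from the thesis.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalised grey-level histogram of one frame (pixel values 0..255)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Sequence[int]],
                threshold: float = 0.5,
                bins: int = 16) -> List[int]:
    """Return indices i where a hard cut is flagged between frame i-1 and i.

    The distance is half the L1 difference of the two histograms, so it
    lies in [0, 1]: 0 for identical histograms, 1 for disjoint ones.
    The threshold is a hypothetical value, not one from the thesis.
    """
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        current = histogram(frames[i], bins)
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Synthetic example: three dark frames followed by three bright frames.
dark = [10] * 64
bright = [200] * 64
print(detect_cuts([dark, dark, dark, bright, bright, bright]))
```

A single pass with one histogram per frame keeps the cost per frame constant, which is why this kind of baseline is often considered for real-time use. It misses gradual transitions, which need more elaborate methods.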
Showing items related by title, author, creator and subject.
Video extraction for fast content access to MPEG compressed videos. Jiang, Jianmin; Weng, Y. (2009-06-09). As existing video processing technology is primarily developed in the pixel domain yet digital video is stored in compressed format, any application of those techniques to compressed videos would require decompression. For discrete cosine transform (DCT)-based MPEG compressed videos, the standard row-by-row and column-by-column inverse DCT (IDCT) of a block of 8×8 elements requires 4096 multiplications and 4032 additions, although practical implementations only require 1024 multiplications and 896 additions. In this paper, we propose a new algorithm to extract videos directly from the MPEG compressed domain (DCT domain) without full IDCT, which is described in three extraction schemes: 1) video extraction in 2×2 blocks with four coefficients; 2) video extraction in 4×4 blocks with four DCT coefficients; and 3) video extraction in 4×4 blocks with nine DCT coefficients. The computing cost incurred is only 8 additions and no multiplication for the first scheme, 2 multiplications and 28 additions for the second scheme, and 47 additions (no multiplication) for the third scheme. Extensive experiments were carried out, and the results reveal that: 1) the extracted video maintains competitive quality in terms of visual perception and inspection and 2) the extracted videos preserve the content well in comparison with fully decompressed ones in terms of histogram measurement. As a result, the proposed algorithm will provide useful tools for bridging the gap between the pixel domain and the compressed domain, facilitating content analysis with low latency and high efficiency in applications such as surveillance video, interactive multimedia, and image processing.
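The operation counts quoted for the 8×8 IDCT can be reproduced with simple arithmetic. The sketch below assumes (this interpretation is not spelled out in the abstract) that the larger pair of figures corresponds to evaluating the full 64-term sum for each of the 64 output samples, and the smaller pair to a separable evaluation using eight row and eight column 1-D transforms. The function names are hypothetical.

```python
from typing import Tuple


def direct_2d_ops(n: int) -> Tuple[int, int]:
    """(multiplications, additions) when every output sample of an
    n x n block is computed as a sum of n*n weighted coefficients."""
    outputs = n * n
    terms = n * n
    return outputs * terms, outputs * (terms - 1)


def row_column_ops(n: int) -> Tuple[int, int]:
    """(multiplications, additions) when the block is processed as
    n row transforms followed by n column transforms, each a plain
    n-point 1-D transform (n multiplications and n-1 additions
    per output sample)."""
    transforms = 2 * n
    mults_per_transform = n * n
    adds_per_transform = n * (n - 1)
    return transforms * mults_per_transform, transforms * adds_per_transform


print(direct_2d_ops(8))   # (4096, 4032)
print(row_column_ops(8))  # (1024, 896)
```

Against either figure, the 8 additions, 2 multiplications plus 28 additions, and 47 additions reported for the three extraction schemes are smaller by well over an order of magnitude.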
Semantic content analysis for effective video segmentation, summarisation and retrieval. Jiang, Jianmin; Ipson, Stanley S.; Ren, Jinchang (University of Bradford, Department of Electronic Imaging and Media Communications, 2010-03-10). This thesis focuses on four main research themes, namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights-based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows. Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance-based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is even robust to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity level measurement, shot and sub-shot are detected for content-adaptive video summarisation. Fourthly, highlights-based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed. Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems.
High-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. As most of the work is implemented in the compressed domain, one additional benefit is the achieved high efficiency, which will be useful for many online applications.
Estimation of LRD present in H.264 video traces using wavelet analysis and proving the paramount of H.264 using OPF technique in wi-fi environment. Jiang, Jianmin; Min, Geyong; Jayaseelan, John (University of Bradford, Department of Electronic Imaging and Media Communications, 2013-11-28). While there has always been a tremendous demand for streaming video over wireless networks, the nature of the application still presents some challenging issues. Applications that transmit coded video sequence data over best-effort networks like the Internet must cope with changing network behaviour; in particular, the source encoder rate should be controlled based on feedback from a channel estimator that explores the network intermittently. The arrival of powerful video compression techniques such as H.264, together with advances in networking and telecommunications, opened up a whole new frontier for multimedia communications. The aim of this research is to transmit H.264 coded video frames over a wireless network with maximum reliability and in a very efficient manner. When H.264 encoded video sequences are transmitted through a wireless network, they face major difficulties in reaching the destination. The characteristics of H.264 coded video sequences are studied fully and their suitability for transmission in wireless networks is examined; a new approach called Optimal Packet Fragmentation (OPF) is framed, and the H.264 coded sequences are tested in a simulated wireless environment. This research comprises three major studies. The first part studies Long Range Dependence (LRD) and the ways in which self-similarity can be estimated. Several studies on estimating LRD are carried out, and a wavelet-based estimator is selected for the research because wavelets capture both time and frequency features in the data and regularly provide a richer picture than classical Fourier analysis.
The wavelet is used to estimate the self-similarity by means of a variable called the Hurst parameter. The Hurst parameter tells the researcher how data can behave inside the network over which it is transmitted. It should be calculated to achieve more reliable transmission in the wireless network. The second part of the research deals with the MPEG-4 and H.264 encoders. The study is carried out to determine which encoder is superior, that is, which can provide better Quality of Service (QoS) and reliability. This study shows, with the help of the Hurst parameter, that H.264 is superior to MPEG-4. The third part of the study is the vital part of this research; it deals with H.264 coded video frames that are segmented into an optimal packet size in the MAC layer for efficient and more reliable transfer in the wireless network. Finally, the H.264 encoded video frames incorporating Optimal Packet Fragmentation are tested in an NS-2 simulated wireless network. The research proves the superiority of the H.264 video encoder and of OPF.
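The abstract does not specify the estimator in detail. As a rough illustration of how a wavelet-based Hurst estimate can be obtained, the sketch below uses a Haar decomposition and a log-scale regression: for a stationary long-range dependent series, the mean squared detail coefficient at octave j is expected to grow roughly like 2^(j(2H-1)), so H can be read from the regression slope as (slope + 1) / 2. The choice of the Haar wavelet, the unweighted least-squares fit, the number of levels and all function names are assumptions made for this example, not details from the thesis.

```python
import math
import random
from typing import List, Sequence


def haar_detail_energies(x: Sequence[float], levels: int) -> List[float]:
    """Mean squared Haar detail coefficient at octaves 1..levels."""
    approx = list(x)
    energies = []
    root2 = math.sqrt(2.0)
    for _ in range(levels):
        pairs = len(approx) // 2
        if pairs == 0:
            break
        detail = [(approx[2 * k] - approx[2 * k + 1]) / root2 for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / root2 for k in range(pairs)]
        energies.append(sum(d * d for d in detail) / pairs)
    return energies


def estimate_hurst(x: Sequence[float], levels: int = 8) -> float:
    """Estimate H from the slope of log2(energy) against octave j.

    Assumes a non-constant series long enough to give at least two
    octaves; a constant series yields zero energy and log2 will fail.
    """
    energies = haar_detail_energies(x, levels)
    octaves = list(range(1, len(energies) + 1))
    logs = [math.log2(e) for e in energies]
    mean_j = sum(octaves) / len(octaves)
    mean_y = sum(logs) / len(logs)
    slope = (sum((j - mean_j) * (y - mean_y) for j, y in zip(octaves, logs))
             / sum((j - mean_j) ** 2 for j in octaves))
    return (slope + 1.0) / 2.0


# White noise has no long-range dependence, so the estimate should be near 0.5.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
print(round(estimate_hurst(noise), 2))
```

Values of H noticeably above 0.5 would indicate long-range dependence in a trace, which is the property the thesis examines for H.264 and MPEG-4 video.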