A Comparative Study of Data Transformations for Efficient XML and JSON Data Compression. An In-Depth Analysis of Data Transformation Techniques, including Tag and Capital Conversions, Character and Word N-Gram Transformations, and Domain-Specific Data Transforms using SMILES Data as a Case Study
PhD Thesis (2.848Mb)
Publication date
2017-12-13
Author
Scanlon, Shagufta A.
Supervisor
Ridley, Mick J.
Cullen, Andrea J.
Rights
The University of Bradford theses are licenced under a Creative Commons Licence.
Institution
University of Bradford
Department
School of Electrical Engineering and Computer Science
Awarded
2015
Abstract
XML is a widely used data exchange format. Its verbose nature creates a need to store and process this type of data efficiently using compression. Various general-purpose transforms and compression techniques exist that can be used to transform and compress XML data. Because of the verbosity of XML, more compact alternatives, namely JSON, have been developed. Similarly, there is a requirement to efficiently store and process SMILES data used in Chemoinformatics. General-purpose transforms and compressors can compress this type of data to a certain extent; however, these techniques are not specific to SMILES data. The primary contribution of this research is to provide developers who use XML, JSON or SMILES data with key knowledge of the best transformation techniques to use with certain types of data, and of which compression techniques would provide the best compressed output size and processing times, depending on their requirements. The main study in this thesis investigates the extent to which applying data transforms prior to data compression can further improve the compression of XML and JSON data. It provides a comparative analysis of applying a variety of data transforms and transform variations to a number of different types of equivalent XML and JSON datasets of various sizes, and of applying different general-purpose compression techniques to the transformed data. A case study is also conducted to investigate the use of data transforms prior to compression to improve the compression of data within a specific data domain.
Type
Thesis
Qualification name
PhD
Notes
The files of software accompanying this thesis are unable to be presented online with the thesis.
Related items
Showing items related by title, author, creator and subject.
-
Video extraction for fast content access to MPEG compressed videos
Jiang, Jianmin; Weng, Y. (2009-06-09)
As existing video processing technology is primarily developed in the pixel domain yet digital video is stored in compressed format, any application of those techniques to compressed videos would require decompression. For discrete cosine transform (DCT)-based MPEG compressed videos, the computing cost of standard row-by-row and column-by-column inverse DCT (IDCT) transforms for a block of 8 × 8 elements requires 4096 multiplications and 4032 additions, although practical implementation only requires 1024 multiplications and 896 additions. In this paper, we propose a new algorithm to extract videos directly from the MPEG compressed domain (DCT domain) without full IDCT, which is described in three extraction schemes: 1) video extraction in 2 × 2 blocks with four coefficients; 2) video extraction in 4 × 4 blocks with four DCT coefficients; and 3) video extraction in 4 × 4 blocks with nine DCT coefficients. The computing cost incurred is only 8 additions and no multiplications for the first scheme, 2 multiplications and 28 additions for the second scheme, and 47 additions (no multiplications) for the third scheme. Extensive experiments were carried out, and the results reveal that: 1) the extracted video maintains competitive quality in terms of visual perception and inspection and 2) the extracted videos preserve the content well in comparison with fully decompressed ones in terms of histogram measurement. As a result, the proposed algorithm will provide useful tools for bridging the gap between the pixel domain and the compressed domain, facilitating content analysis with low latency and high efficiency in applications such as surveillance video, interactive multimedia, and image processing.
-
Concerted Molecular Displacements in a Thermally-induced Solid-State Transformation in Crystals of DL-Norleucine
Anwar, Jamshed; Kendrick, John; Tuble, S.C. (2007)
Martensitic transformations are of considerable technological importance, a particularly promising application being the possibility of using martensitic materials, possibly proteins, as tiny machines. For organic crystals, however, a molecular level understanding of such transformations is lacking. We have studied a martensitic-type transformation in crystals of the amino acid DL-norleucine using molecular dynamics simulation. The crystal structures of DL-norleucine comprise stacks of bilayers (formed as a result of strong hydrogen bonding) that translate relative to each other on transformation. The simulations reveal that the transformation occurs by concerted molecular displacements involving entire bilayers rather than on a molecule-by-molecule basis. These observations can be rationalized on the basis that at sufficiently high excess temperatures, the free energy barriers to concerted molecular displacements can be overcome by the available thermal energy. Furthermore, in displacive transformations, the molecular displacements can occur by the propagation of a displacement wave (akin to a kink in a carpet), which requires the molecules to overcome only a local barrier. Concerted molecular displacements are therefore considered to be a significant feature of all displacive transformations. This finding is expected to be of value toward developing strategies for controlling or modulating martensitic-type transformations.
-
Advanced MIMO-OFDM technique for future high speed broadband wireless communications. A study of OFDM design, using wavelet transform, fractional fourier transform, fast fourier transform, doppler effect, space-time coding for multiple input, multiple output wireless communications systems
Abd-Alhameed, Raed; Jones, Steven M.R.; Anoh, Kelvin O.O. (University of Bradford, School of Engineering and Informatics, 2015)
This work concentrates on the application of diversity techniques and space-time block coding for future high speed mobile wireless communications on multicarrier systems. At first, alternative multicarrier kernels that are robust over high speed doubly-selective fading channels are sought. This includes comparisons of discrete Fourier transform (DFT), fractional Fourier transform (FrFT) and wavelet transform (WT) multicarrier kernels. Different wavelet types, including the raised-cosine spectrum wavelets, are implemented, evaluated and compared. From different wavelet families, orthogonal wavelets are isolated through detailed evaluations and comparisons as suitable for multicarrier applications. The three transforms are compared over a doubly-selective channel, with the WT significantly outperforming the others for high speed conditions up to 300 km/h. Then, a new wavelet is constructed from an ideal filter approximation using established wavelet design algorithms to match any signal of interest; in this case under bandlimited criteria. The new wavelet showed better performance than other traditional orthogonal wavelets. To achieve MIMO communication, orthogonal space-time block coding (OSTBC) is evaluated next. First, the OSTBC is extended to assess the performance of the scheme over extended receiver diversity order. Again, with the extended diversity conditions, the OSTBC is implemented for a multicarrier system over a doubly-selective fading channel.
The MIMO-OFDM systems (implemented using DFT and WT kernels) are evaluated for different operating frequencies, typical of the LTE standard, with Doppler effects. It was found that, at high mobile speeds, it is better to transmit OFDM signals using lower operating frequencies. The information theory for the 2-transmit antenna OSTBC does not support higher order implementation of multi-antenna systems, which is required for future generation wireless communications systems. Instead of the OSTBC, the QO-STBC is usually deployed to support the design of higher order multi-antenna systems other than the 2-transmit antenna scheme. The performance of traditional QO-STBC methods is diminished by some off-diagonal (interference) terms such that the resulting system does not attain full diversity. Some methods for eliminating the interference terms have been discussed earlier. This work follows the construction of cyclic matrices with the Hadamard matrix to derive QO-STBC code constructions that are N times better than interference-free QO-STBC, where N is the number of transmit antenna branches.