A Comparative Study of Data Transformations for Efficient XML and JSON Data Compression. An In-Depth Analysis of Data Transformation Techniques, including Tag and Capital Conversions, Character and Word N-Gram Transformations, and Domain-Specific Data Transforms using SMILES Data as a Case Study
Publication date
2017-12-13
Author
Scanlon, Shagufta A.
Supervisor
Ridley, Mick J.
Cullen, Andrea J.
Rights
The University of Bradford theses are licensed under a Creative Commons Licence.
Institution
University of Bradford
Department
School of Electrical Engineering and Computer Science
Awarded
2015
Abstract
XML is a widely used data exchange format. Its verbose nature creates a requirement to store and process this type of data efficiently using compression. Various general-purpose transforms and compression techniques exist that can be used to transform and compress XML data. Because of the verbosity of XML, more compact alternatives have been developed, notably JSON. Similarly, there is a requirement to efficiently store and process SMILES data used in Chemoinformatics. General-purpose transforms and compressors can compress this type of data to a certain extent; however, these techniques are not specific to SMILES data. The primary contribution of this research is to provide developers who use XML, JSON or SMILES data with key knowledge of the best transformation techniques to use with certain types of data, and of which compression techniques would provide the best compressed output size and processing times, depending on their requirements. The main study in this thesis investigates the extent to which using data transforms prior to data compression can further improve the compression of XML and JSON data. It provides a comparative analysis of applying a variety of data transforms and transform variations to a number of different types of XML and equivalent JSON datasets of various sizes, and of applying different general-purpose compression techniques to the transformed data. A case study is also conducted to investigate whether data transforms applied prior to compression can improve the compression of data within a specific domain.
Type
Thesis
Qualification name
PhD
Notes
The software files accompanying this thesis cannot be presented online with the thesis.
Related items
Showing items related by title, author, creator and subject.
-
Advanced MIMO-OFDM technique for future high speed broadband wireless communications. A study of OFDM design, using wavelet transform, fractional Fourier transform, fast Fourier transform, Doppler effect, space-time coding for multiple input, multiple output wireless communications systems
Abd-Alhameed, Raed; Jones, Steven M.R.; Anoh, Kelvin O.O. (University of Bradford, School of Engineering and Informatics, 2015)
This work concentrates on the application of diversity techniques and space-time block coding for future high speed mobile wireless communications on multicarrier systems. At first, alternative multicarrier kernels robust for high speed doubly-selective fading channels are sought. They include comparisons of discrete Fourier transform (DFT), fractional Fourier transform (FrFT) and wavelet transform (WT) multicarrier kernels. Different wavelet types, including the raised-cosine spectrum wavelets, are implemented, evaluated and compared. From different wavelet families, orthogonal wavelets are isolated from detailed evaluations and comparisons as suitable for multicarrier applications. The three transforms are compared over a doubly-selective channel, with the WT significantly outperforming the others for high speed conditions up to 300 km/h. Then, a new wavelet is constructed from an ideal filter approximation using established wavelet design algorithms to match any signal of interest; in this case under bandlimited criteria. The new wavelet showed better performance than other traditional orthogonal wavelets. To achieve MIMO communication, orthogonal space-time block coding (OSTBC) is evaluated next. First, the OSTBC is extended to assess the performance of the scheme over extended receiver diversity order. Again, with the extended diversity conditions, the OSTBC is implemented for a multicarrier system over a doubly-selective fading channel.
The MIMO-OFDM systems (implemented using DFT and WT kernels) are evaluated for different operating frequencies, typical of the LTE standard, with Doppler effects. It was found that, at high mobile speeds, it is better to transmit OFDM signals using lower operating frequencies. The information theory for the 2-transmit antenna OSTBC does not support higher order implementation of multi-antenna systems, which is required for future generation wireless communications systems. Instead of the OSTBC, the QO-STBC is usually deployed to support the design of higher order multi-antenna systems other than the 2-transmit antenna scheme. The performance of traditional QO-STBC methods is diminished by some off-diagonal (interference) terms, such that the resulting system does not attain full diversity. Some methods for eliminating the interference terms have been discussed earlier. This work uses the construction of cyclic matrices with a Hadamard matrix to derive QO-STBC code constructions that are N times better than interference-free QO-STBC, where N is the number of transmit antenna branches.
-
Design and analysis of Discrete Cosine Transform-based watermarking algorithms for digital images. Development and evaluation of blind Discrete Cosine Transform-based watermarking algorithms for copyright protection of digital images using handwritten signatures and mobile phone numbers.
Qahwaji, Rami S.R.; Al-Ahmad, Hussain; Al-Gindy, Ahmed M.N. (University of Bradford, School of Computing, Informatics and Media, 2012-06-22)
This thesis deals with the development and evaluation of blind discrete cosine transform-based watermarking algorithms for copyright protection of digital still images using handwritten signatures and mobile phone numbers. The new algorithms take into account the perceptual capacity of each low-frequency coefficient inside the Discrete Cosine Transform (DCT) blocks before embedding the watermark information. They are suitable for grey-scale and colour images. Handwritten signatures are used instead of pseudo-random numbers. The watermark is inserted in the green channel of the RGB colour images and the luminance channel of the YCrCb images. Mobile phone numbers are used as watermarks for images captured by mobile phone cameras. The information is embedded multiple times and a shuffling scheme is applied to ensure that no spatial correlation exists between the original host image and the multiple watermark copies. Multiple embedding increases the robustness of the watermark against attacks, since each watermark is individually reconstructed and verified before an averaging process is applied. The averaging process reduced the number of errors in the extracted information. The developed watermarking methods are shown to be robust against JPEG compression, removal attack, additive noise, cropping, scaling, small degrees of rotation, affine transformation, contrast enhancement, low-pass and median filtering, and Stirmark attacks.
The algorithms have been examined using a library of approximately 40 colour images of size 512 × 512 with 24 bits per pixel and their grey-scale versions. Several evaluation techniques were used in the experiment with different watermarking strengths and different signature sizes. These include the peak signal-to-noise ratio, normalized correlation and structural similarity index measurements. The performance of the proposed algorithms has been compared to that of other algorithms, and better invisibility with stronger robustness has been achieved.
-
Cell and tissue engineering of articular cartilage via regulation and alignment of primary chondrocyte using manipulated transforming growth factors and ECM proteins. Effect of transforming growth factor-beta (TGF-β1, 2 and 3) on the biological regulation and wound repair of chondrocyte monolayers with and without presence of ECM proteins.
Youseffi, Mansour; Denyer, Morgan C.T.; Khaghani, Seyed A. (University of Bradford, School of Engineering Design and Technology, 2012-01-31)
Articular cartilage is an avascular and flexible connective tissue found in joints. It produces a cushioning effect at the joints and provides low friction to protect the ends of the bones from wear and tear or damage. It has poor repair capacity, and any injury can result in pain and loss of mobility. One of the common forms of articular cartilage disease that has a huge impact on a patient's life is arthritis. Research on cartilage cell and tissue engineering will help patients to improve their physical activity by replacing or treating the diseased or damaged cartilage tissue. The cartilage cell, called the chondrocyte, is embedded in the matrix (lacunae) and has a round shape in vivo. In vitro monolayer culture of primary chondrocytes causes a morphological change characterized as dedifferentiation. Transforming growth factor-beta (TGF-β), a cytokine superfamily, regulates cell function, including differentiation and proliferation. The effect of TGF-β1, 2 and 3, and their manipulated forms, on the biological regulation of primary chondrocytes was investigated in this work. A novel method was developed to isolate and purify primary chondrocytes from the knee joint of neonate Sprague-Dawley rats, and the effect of some supplements, such as hyaluronic acid and antibiotics, was also investigated to provide the most appropriate conditions for in vitro culture of chondrocyte cells.
Addition of 0.1 mg/ml hyaluronic acid to chondrocyte culture media resulted in an increase in primary chondrocyte proliferation and helped the cells to maintain chondrocytic morphology. TGF-β1, 2 and 3 caused chondrocytes to acquire a fibroblastic phenotype, alongside an increase in apoptosis. The healing process in the wound closure assay of chondrocyte monolayers was slowed down by all three isoforms of TGF-β. All three types of TGF-β negatively affected the strength of chondrocyte adhesion. TGF-β1, 2 and 3 upregulated the expression of collagen type-II, but decreased synthesis of collagen type-I, chondroitin sulfate glycoprotein, and laminin. They did not show any significant change in production of S-100 protein and fibronectin. TGF-β2 and 3 did not change expression of integrin-β1 (CD29), but TGF-β1 decreased the secretion of this adhesion protein. Manipulated TGF-β showed a huge impact on formation of fibroblast-like morphology of chondrocytes with chondrocytic phenotype. These isoforms also decreased the expression of laminin, chondroitin sulfate glycoprotein, and collagen type-I, but they increased production of collagen type-II and did not induce synthesis of fibronectin and S-100 protein. In addition, the strength of cell adhesion on a solid surface was reduced by manipulated TGF-β. Only the manipulated forms of TGF-β1 and 2 could increase the proliferation rate. Manipulation of TGF-β did not upregulate the expression of integrin-β1 in a planar culture system. The implications of this R&D work are that the manipulation of TGF-β by combination of TGF-β1, 2 and 3 can be utilized in production of the superficial zone of cartilage and perichondrium. Collagen, fibronectin and hyaluronic acid could be recruited for the fabrication of a biodegradable scaffold that promotes chondrocyte growth for autologous chondrocyte implantation or for formation of cartilage.