Loading...
Thumbnail Image
Publication

Multi agent system for web database processing, on data extraction from online social networks.

Abdulrahman, Ruqayya
Publication Date
2013-02-08
End of Embargo
Rights
Creative Commons License
The University of Bradford theses are licenced under a Creative Commons Licence.
Peer-Reviewed
Open Access status
Accepted for publication
Institution
University of Bradford
Department
Department of Computing
Awarded
2012
Embargo end date
Collections
Additional title
Abstract
In recent years, there has been a ood of continuously changing information from a variety of web resources such as web databases, web sites, web services and programs. Online Social Networks (OSNs) represent such a eld where huge amounts of information are being posted online over time. Due to the nature of OSNs, which o er a productive source for qualitative and quantitative personal information, researchers from various disciplines contribute to developing methods for extracting data from OSNs. However, there is limited research which addresses extracting data automatically. To the best of the author's knowledge, there is no research which focuses on tracking the real time changes of information retrieved from OSN pro les over time and this motivated the present work. This thesis presents di erent approaches for automated Data Extraction (DE) from OSN: crawler, parser, Multi Agent System (MAS) and Application Programming Interface (API). Initially, a parser was implemented as a centralized system to traverse the OSN graph and extract the pro- le's attributes and list of friends from Myspace, the top OSN at that time, by parsing the Myspace pro les and extracting the relevant tokens from the parsed HTML source les. A Breadth First Search (BFS) algorithm was used to travel across the generated OSN friendship graph in order to select the next pro le for parsing. The approach was implemented and tested on two types of friends: top friends and all friends. In case of top friends, 500 seed pro les have been visited; 298 public pro les were parsed to get 2197 top friends pro les and 2747 friendship edges, while in case of all friends, 250 public pro les have been parsed to extract 10,196 friends' pro les and 17,223 friendship edges. This approach has two main limitations. The system is designed as a centralized system that controlled and retrieved information of each user's pro le just once. This means that the extraction process will stop if the system fails to process one of the pro les; either the seed pro le ( rst pro le to be crawled) or its friends. To overcome this problem, an Online Social Network Retrieval System (OSNRS) is proposed to decentralize the DE process from OSN through using MAS. The novelty of OSNRS is its ability to monitor pro les continuously over time. The second challenge is that the parser had to be modi ed to cope with changes in the pro les' structure. To overcome this problem, the proposed OSNRS is improved through use of an API tool to enable OSNRS agents to obtain the required elds of an OSN pro le despite modi cations in the representation of the pro le's source web pages. The experimental work shows that using API and MAS simpli es and speeds up the process of tracking a pro le's history. It also helps security personnel, parents, guardians, social workers and marketers in understanding the dynamic behaviour of OSN users. This thesis proposes solutions for web database processing on data extraction from OSNs by the use of parser and MAS and discusses the limitations and improvements.
Version
Citation
Link to publisher’s version
Link to published version
Link to Version of Record
Type
Thesis
Qualification name
PhD
Notes

Version History

Now showing 1 - 2 of 2
VersionDateSummary
2*
2025-04-09 13:18:18
Edited author entries
2013-02-08 18:00:18
* Selected version