Browsing Theses by Subject "Data extraction"
Now showing items 1-1 of 1
Multi agent system for web database processing, on data extraction from online social networks.In recent years, there has been a ood of continuously changing information from a variety of web resources such as web databases, web sites, web services and programs. Online Social Networks (OSNs) represent such a eld where huge amounts of information are being posted online over time. Due to the nature of OSNs, which o er a productive source for qualitative and quantitative personal information, researchers from various disciplines contribute to developing methods for extracting data from OSNs. However, there is limited research which addresses extracting data automatically. To the best of the author's knowledge, there is no research which focuses on tracking the real time changes of information retrieved from OSN pro les over time and this motivated the present work. This thesis presents di erent approaches for automated Data Extraction (DE) from OSN: crawler, parser, Multi Agent System (MAS) and Application Programming Interface (API). Initially, a parser was implemented as a centralized system to traverse the OSN graph and extract the pro- le's attributes and list of friends from Myspace, the top OSN at that time, by parsing the Myspace pro les and extracting the relevant tokens from the parsed HTML source les. A Breadth First Search (BFS) algorithm was used to travel across the generated OSN friendship graph in order to select the next pro le for parsing. The approach was implemented and tested on two types of friends: top friends and all friends. In case of top friends, 500 seed pro les have been visited; 298 public pro les were parsed to get 2197 top friends pro les and 2747 friendship edges, while in case of all friends, 250 public pro les have been parsed to extract 10,196 friends' pro les and 17,223 friendship edges. This approach has two main limitations. The system is designed as a centralized system that controlled and retrieved information of each user's pro le just once. This means that the extraction process will stop if the system fails to process one of the pro les; either the seed pro le ( rst pro le to be crawled) or its friends. To overcome this problem, an Online Social Network Retrieval System (OSNRS) is proposed to decentralize the DE process from OSN through using MAS. The novelty of OSNRS is its ability to monitor pro les continuously over time. The second challenge is that the parser had to be modi ed to cope with changes in the pro les' structure. To overcome this problem, the proposed OSNRS is improved through use of an API tool to enable OSNRS agents to obtain the required elds of an OSN pro le despite modi cations in the representation of the pro le's source web pages. The experimental work shows that using API and MAS simpli es and speeds up the process of tracking a pro le's history. It also helps security personnel, parents, guardians, social workers and marketers in understanding the dynamic behaviour of OSN users. This thesis proposes solutions for web database processing on data extraction from OSNs by the use of parser and MAS and discusses the limitations and improvements.