Applying intelligence amplification to the problem of schema matching

Buis, J.T.P. (2017) Applying intelligence amplification to the problem of schema matching.

Abstract: Schema matching is a task that occurs often at CAPE Groep. Schema matching is the problem of finding pairs of attributes (or groups of attributes) from a source schema and a target schema such that the pairs are likely to be related. At present, this time-consuming task is done manually. This thesis explores the possibilities for partially automating this process, thus saving time and, eventually, money. Fully automating the task of schema matching has proved to be difficult. We therefore apply the concept of Intelligence Amplification (IA) to the problem of schema matching. Intelligence Amplification is a field which focuses on a symbiotic relationship between human and machine. A clear definition is currently lacking in the literature, and after assessing and extracting key features we created our own definition: “Intelligence Amplification focusses on a close collaboration, with complementary contributions, between human and machine to empower humans in the decision-making process”. For the problem of schema matching we found two major moments where interaction between human and machine occurs: during the pre-processing stage and during the matching stage. Pre-processing happens at the beginning of a matching scenario and includes steps such as expanding abbreviations or translating attribute names. In the matching stage, a machine calculates a set of candidate mappings. In our IA-driven approach, the user can opt to invoke several software agents, either to get better results or to use a different software agent for a subset of the matching scenario. A reference architecture was developed to aid the development of such tools. Using this reference architecture, we developed our own prototype. This prototype contained a machine learning approach: we trained a neural network to predict candidate mappings.
Evaluation of this method has shown there is still room for improvement, as for some scenarios the neural network was not able to generate any candidate mappings. The prototype was evaluated using two criteria: effectiveness and efficiency. For effectiveness we look at precision and recall. Precision is a measure of the quality of the results: it indicates the fraction of the predictions made by the machine that are correct. Recall is a measure of the completeness of the results: it indicates the fraction of the correct mappings that the machine actually predicted. The second evaluation criterion, efficiency, concerns time. First a baseline is established. In our case this is the time it takes a user to manually complete a matching scenario. When using an automated approach, we again measure the total time it takes to complete a scenario and compare this against the baseline. From this comparison a performance improvement score is calculated. It was found that the prototype needs several improvements. We tried one approach using a trained neural network and one using a heuristic to create candidate mappings. We have not found a single approach which works best for every situation. For CAPE Groep, we recommend as the most important next step improving the user interface so that it is better able to handle the input of an auto-mapping application. Sliders for the various metrics should be included. These allow users to see directly the effect of any change they make and to tweak the settings to fit the scenario they work on. This should then be extended with further pre-processing steps, to research the benefit of particular pre-processing actions.
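The two evaluation criteria described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the example attribute pairs, and the timings are hypothetical. In particular, the abstract does not state how the performance improvement score is computed, so the relative time saving used here is an assumption, as is the convention of returning 0.0 when no predictions are made.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for a set of predicted mappings.

    Each mapping is a (source_attribute, target_attribute) pair.
    Assumption: when there are no predictions (or no reference
    mappings), the corresponding metric is reported as 0.0.
    """
    predicted = set(predicted)
    reference = set(reference)
    correct = predicted & reference  # predictions that are actually right

    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


def time_improvement(baseline_seconds, assisted_seconds):
    """Relative time saved compared with the manual baseline.

    Assumption: the score is (baseline - assisted) / baseline; the
    abstract only says that a score is derived from the comparison.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_seconds - assisted_seconds) / baseline_seconds


# Hypothetical matching scenario: three predictions, four correct mappings.
predicted = {
    ("cust_nm", "customerName"),
    ("addr", "address"),
    ("tel", "fax"),            # wrong prediction
}
reference = {
    ("cust_nm", "customerName"),
    ("addr", "address"),
    ("tel", "phone"),
    ("zip", "postalCode"),
}

precision, recall = precision_recall(predicted, reference)
improvement = time_improvement(baseline_seconds=600, assisted_seconds=450)
```

In this made-up scenario two of the three predictions are correct, and two of the four reference mappings are found. A scenario in which the neural network produces no candidate mappings corresponds to an empty `predicted` set.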
Item Type: Essay (Master)
Clients: CAPE Groep
Faculty: BMS: Behavioural, Management and Social Sciences
Subject: 54 computer science
Programme: Business Information Technology MSc (60025)
Link to this item: http://purl.utwente.nl/essays/73499