Automated extraction of feature and variability information from natural language requirement specifications

Li, Yang

Please use this identifier to cite or link to this item: http://dx.doi.org/10.25673/35702

Title:	Automated extraction of feature and variability information from natural language requirement specifications
Author(s):	Li, Yang
Referee(s):	Saake, Gunter; Nürnberger, Andreas
Granting Institution:	Otto-von-Guericke-Universität Magdeburg, Fakultät für Informatik
Issue Date:	2020
Extent:	xviii, 132 pages
Type:	University thesis
Type:	PhDThesis
Exam Date:	2020
Language:	English
URN:	urn:nbn:de:gbv:ma9:1-1981185920-359205
Subjects:	Software engineering; Text processing; Language processing
Abstract:	Software product lines support the structured reuse of software artifacts to manage the maintenance and evolution of a typically large number of variants, which promotes the industrialization of software development, especially for software-intensive products. Extracting feature and variability information from different artifacts is an indispensable activity for systematically integrating single software systems into software product lines. However, for a legacy system, it is non-trivial to gain information about the commonalities and differences of the variants. Beyond manual extraction of commonalities and variabilities, a variety of approaches, such as feature location in source code and feature extraction from requirements, have been proposed to identify features and their variation points automatically. Compared with source code, requirements contain more complete variability information and provide traceability links to other artifacts from early development phases. In this thesis, we provide a systematic literature review that gives a multi-dimensional overview of approaches to feature extraction from natural language documents. Based on the observations from this review, we propose feasible and accurate approaches to improve the efficiency of feature extraction. To achieve this goal, we first explore the application of deep learning technologies to feature extraction. Second, we propose a hybrid approach based on multiple natural language processing and data mining techniques to extract features and variability information. Third, to provide understandable notations for features, we propose an approach that combines keyword extraction and machine learning methods to predict feature-related terms. Fourth, we apply the proposed feature extraction approaches to analyze requirements from a real-world scenario, adjusting the framework and combining it with other algorithms to account for the specific characteristics of real-world requirements. We show empirically how the proposed approaches can be used to extract features and variation points; the results indicate that using them can benefit the extraction process.
URI:	https://opendata.uni-halle.de//handle/1981185920/35920 http://dx.doi.org/10.25673/35702
Open Access:	Open access publication
License:	(CC BY-SA 4.0) Creative Commons Attribution ShareAlike 4.0
Appears in Collections:	Fakultät für Informatik

Files in This Item:

File	Description	Size	Format
Li_Yang_Dissertation_2020.pdf	Dissertation	1.99 MB	Adobe PDF