Skip navigation
Please use this identifier to cite or link to this item: https://libeldoc.bsuir.by/handle/123456789/38712
Full metadata record
DC FieldValueLanguage
dc.contributor.authorNasr S.-
dc.contributor.authorGerman, O. V.-
dc.contributor.authorГерман, О. В.-
dc.date.accessioned2020-03-17T12:44:50Z-
dc.date.available2020-03-17T12:44:50Z-
dc.date.issued2020-
dc.identifier.citationNasr, S. A searching algorithm for text with mistakes / Nasr S., German O. V. // Доклады БГУИР. – 2020. – № 18 (1). – С. 29 – 34. – http://dx.doi.org/10.35596/1729-7648-2020-18-1-29-34.ru_RU
dc.identifier.urihttps://libeldoc.bsuir.by/handle/123456789/38712-
dc.description.abstractThe paper contains a new text searching method representing modification of the Boyer-Moore algorithm and enabling a user to find the places in the text where the given substring occurs maybe with possible errors, that is the string in text and a query may not coincide but nevertheless are identical. The idea consists in division of the searching process in two phases: at the first phase a fuzzy variant of the Boyer–Moore algorithm is performed; at the second phase the Dice metrics is used. The advantage of suggested technique in comparison with the known methods using the fixed value of the mistakes number is that it 1) does not perform precomputation of the auxiliary table of the sizes comparable to the original text sizes and 2) it more flexibly catches the semantics of the erroneous text substrings even for a big number of mistakes. This circumstance extends possibilities of the Boyer–Moore method by addmitting a bigger amount of possible mistakes in text and preserving text semantics. The suggested method provides also more accurate regulation of the upper boundary for the text mistakes which differs it from the known methods with fixed value of the maximum number of mistakes not depending on the text sizes. Moreover, this upper boundary is defined as Levenshtein distance not suitable for evaluating a relevance of the founded text and a query, while the Dice metrics provides such a relevance. In fact, if maximum Levenshtein distanse is 3 then how one can judge if this value is big or small to provide relevance of the search results. Consequently, the suggested method is more flexible, enables one to find relevant answers even in case of a big number of mistakes in text. The efficiency of the suggested method in the worst case is O(nc) with constant c defining the biggest allowable number of mistakes.ru_RU
dc.language.isoenru_RU
dc.publisherБГУИРru_RU
dc.subjectпубликации ученыхru_RU
dc.subjectsearching algorithmru_RU
dc.subjecttext processingru_RU
dc.subjectfinding text with mistakesru_RU
dc.subjectfuzzy Boyer–Moore methodru_RU
dc.subjectmetrics of similarityru_RU
dc.titleA searching algorithm for text with mistakesru_RU
dc.typeСтатьяru_RU
Appears in Collections:№18 (1)

Files in This Item:
File Description SizeFormat 
Nasr_A.pdf477,18 kBAdobe PDFView/Open
Show simple item record


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.