Publications

Scientific publications

Roubtsov D.N., Barakhnin V.B.
On the possibility of combating duplicates when querying heterogeneous bibliographic sources
Roubtsov D.N., Barakhnin V.B. On the possibility of duplicates struggle when performing queries to heterogeneous bibliographic sources // Digital Libraries: Advanced Methods and Technologies, Digital Collections: Proceedings of the XI All-Russian Research Conference RCDL'2009. Petrozavodsk: KRC RAS, 2009. Pp. 293-298
When queries are run against multiple heterogeneous bibliographic sources, duplicate records appear in the combined results. This paper analyzes the problems that arise in detecting a fuzzy match between two records. Existing methods and algorithms for duplicate elimination are reviewed, in particular approaches to defining and computing a string similarity function.
Given the requirements of a specific task, the modernization of the information system “Mathematicians of SB RAS”, a solution was implemented that uses the longest common subsequence (LCS) as the string similarity function. The method was tested on three SB RAS databases: the database of publications of the journal “Computational Technologies”, the database of publications by staff of the Institute of Computational Technologies SB RAS, and the database of publications of “Web-resources of the mathematical content”. The tests showed the method to be highly effective, and it has been applied in the information system “Mathematicians of SB RAS” and in the integrated system for remote access to heterogeneous bibliographic resources, which is currently under development.
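To illustrate the idea of using the longest common subsequence as a string similarity function, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not state how the LCS length is normalized, how strings are preprocessed, or what threshold marks two records as duplicates. The normalization 2·LCS/(|a|+|b|), the helper `normalize`, the function names, and the threshold 0.9 are all hypothetical choices made for this example.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic dynamic programming: O(len(a) * len(b)) time, O(min(len)) memory.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the DP row
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def normalize(s: str) -> str:
    # Hypothetical preprocessing: lowercase, replace punctuation with spaces,
    # collapse runs of whitespace. \w is Unicode-aware, so Cyrillic is kept.
    s = re.sub(r"[^\w\s]", " ", s.lower())
    return " ".join(s.split())


def lcs_similarity(a: str, b: str) -> float:
    # Assumed normalization: 2 * LCS / (|a| + |b|), giving a value in [0, 1].
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    # The threshold is a hypothetical tuning parameter.
    return lcs_similarity(a, b) >= threshold


def deduplicate(titles: list[str], threshold: float = 0.9) -> list[str]:
    # Keep the first occurrence of each group of near-identical titles.
    kept: list[str] = []
    for t in titles:
        if not any(is_probable_duplicate(t, k, threshold) for k in kept):
            kept.append(t)
    return kept
```

For example, `deduplicate(["Duplicate detection in bibliographic databases", "Duplicate detection in bibliographic databases.", "Longest common subsequence"])` keeps the first and third titles, because the second differs from the first only by punctuation. The pairwise loop is quadratic in the number of records, which is acceptable for a sketch; a production system would likely compare only candidate pairs selected by cheaper criteria first.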

On the possibility of duplicates struggle when performing queries to heterogeneous bibliographic sources (244 KB)

Last modified: October 16, 2009