Result: LSH SimilarityJoin pattern in FastFlow

Title:
LSH SimilarityJoin pattern in FastFlow
Contributors:
University of Pisa Italy = Università di Pisa Italia = Université de Pise Italie (UniPi), Laboratoire d'Informatique Fondamentale d'Orléans (LIFO), Université d'Orléans (UO)-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)
Source:
16TH INTERNATIONAL SYMPOSIUM ON HIGH-LEVEL PARALLEL PROGRAMMING AND APPLICATIONS
https://hal.science/hal-04575238
16TH INTERNATIONAL SYMPOSIUM ON HIGH-LEVEL PARALLEL PROGRAMMING AND APPLICATIONS, Jun 2023, Cluj-Napoca, Romania
Publisher Information:
CCSD
Publication Year:
2023
Collection:
Université d'Orléans: HAL
Document Type:
Conference object
Language:
English
Accession Number:
edsbas.D1231951
Database:
BASE

Further information

International audience; Similarity joins are recognized to be among the most used data processing and analysis operations. In this work we introduce a C++-based high-level parallel pattern implemented on top of FastFlow Building Blocks to provide the programmer with ready-to-use similarity joins computations. The SimilarityJoin pattern is implemented according to the MapReduce paradigm enriched with Locality Sensitive Hashing (LSH) to optimize the whole computation. The new parallel pattern can be used with any C++ serializable data structure and executed on shared- and distributed-memory machines. We present some experimental validation of the proposed solution on two different clusters using the original hand-tuned Hadoop implementation of the LSH-based similarity join algorithms as a reference baseline.
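
Illustrative note: the general idea of a MapReduce-style similarity join pruned by LSH can be sketched in plain, sequential C++ as below. This is not the FastFlow pattern described in the abstract and does not use any FastFlow API. It assumes records are sets of integer tokens, Jaccard similarity as the measure, and MinHash with banding as the LSH scheme; the abstract specifies none of these. All names (Record, hash_token, lsh_similarity_join, the bands and rows parameters) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Record = std::set<std::uint32_t>;              // assumption: a record is a set of token ids
using Pair   = std::pair<std::size_t, std::size_t>;  // indices (i < j) of two similar records

// Exact Jaccard similarity |A ∩ B| / |A ∪ B|, used to verify candidates.
double jaccard(const Record& a, const Record& b) {
    if (a.empty() && b.empty()) return 1.0;
    std::vector<std::uint32_t> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    const double inter = static_cast<double>(common.size());
    return inter / (static_cast<double>(a.size() + b.size()) - inter);
}

// One member of a simple seeded hash family (illustrative, not tuned).
std::uint64_t hash_token(std::uint32_t token, std::uint64_t seed) {
    std::uint64_t x = token + seed * 0x9E3779B97F4A7C15ULL;
    x ^= x >> 33; x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33; x *= 0xC4CEB9FE1A85EC53ULL;
    x ^= x >> 33;
    return x;
}

std::vector<Pair> lsh_similarity_join(const std::vector<Record>& records,
                                      double threshold,
                                      std::size_t bands = 16,
                                      std::size_t rows = 2) {
    // "Map" step: for every record and band, emit (band, band key) -> record index.
    std::map<std::pair<std::size_t, std::uint64_t>, std::vector<std::size_t>> buckets;
    for (std::size_t i = 0; i < records.size(); ++i) {
        for (std::size_t b = 0; b < bands; ++b) {
            std::uint64_t band_key = 0;
            for (std::size_t r = 0; r < rows; ++r) {
                const std::uint64_t seed = b * rows + r + 1;
                std::uint64_t min_h = std::numeric_limits<std::uint64_t>::max();
                for (std::uint32_t t : records[i])
                    min_h = std::min(min_h, hash_token(t, seed));  // MinHash value
                band_key = band_key * 1099511628211ULL + min_h;    // combine the band's rows
            }
            buckets[std::make_pair(b, band_key)].push_back(i);
        }
    }
    // "Reduce" step: records sharing a bucket are candidates; verify each exactly.
    std::set<Pair> result;  // a set removes pairs found in more than one band
    for (const auto& entry : buckets) {
        const std::vector<std::size_t>& ids = entry.second;  // ascending by construction
        for (std::size_t x = 0; x < ids.size(); ++x)
            for (std::size_t y = x + 1; y < ids.size(); ++y)
                if (jaccard(records[ids[x]], records[ids[y]]) >= threshold)
                    result.emplace(ids[x], ids[y]);
    }
    return std::vector<Pair>(result.begin(), result.end());
}
```

In this sketch only records that collide in at least one band are compared, which is where the saving over an all-pairs join comes from. The price is that a truly similar pair may be missed if it never collides, so the bands and rows values trade recall against the number of comparisons. In a parallel or distributed setting the bucket key would act as the shuffle key between the map and reduce stages.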