LSH SimilarityJoin pattern in FastFlow
https://hal.science/hal-04575238
16th International Symposium on High-Level Parallel Programming and Applications, Jun 2023, Cluj-Napoca, Romania
Similarity joins are recognized to be among the most used data processing and analysis operations. In this work we introduce a C++-based high-level parallel pattern implemented on top of FastFlow Building Blocks to provide the programmer with ready-to-use similarity join computations. The SimilarityJoin pattern is implemented according to the MapReduce paradigm enriched with Locality Sensitive Hashing (LSH) to optimize the whole computation. The new parallel pattern can be used with any C++ serializable data structure and executed on shared- and distributed-memory machines. We present an experimental validation of the proposed solution on two different clusters, using the original hand-tuned Hadoop implementation of the LSH-based similarity join algorithms as a reference baseline.
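To make the idea of a MapReduce-style similarity join with LSH concrete, the following is a minimal sequential sketch in plain C++. It is not the paper's SimilarityJoin pattern and does not use the FastFlow API. The abstract does not say which LSH family or similarity measure is used, so the sketch assumes MinHash with banding over sets of integer tokens and an exact Jaccard check. All names (`Record`, `min_hash`, `map_phase`, `lsh_similarity_join`) and the hash constants are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Assumption: a record is a sorted, duplicate-free set of integer tokens.
using Record = std::vector<std::uint32_t>;
using Pair = std::pair<std::size_t, std::size_t>;  // (index in R, index in S)
// A bucket is identified by the band index and the MinHash values of that band.
using BucketKey = std::pair<std::size_t, std::vector<std::uint64_t>>;
using Buckets = std::map<BucketKey, std::vector<std::size_t>>;

// Exact Jaccard similarity of two sorted sets.
double jaccard(const Record& a, const Record& b) {
    if (a.empty() && b.empty()) return 1.0;
    std::vector<std::uint32_t> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    const double inter = static_cast<double>(common.size());
    return inter / (static_cast<double>(a.size() + b.size()) - inter);
}

// MinHash of a record under the i-th hash function (illustrative constants).
std::uint64_t min_hash(const Record& r, std::uint64_t i) {
    const std::uint64_t p = 2147483647ULL;  // prime 2^31 - 1
    const std::uint64_t a = 2 * i + 1;
    const std::uint64_t b = 7 * i + 3;
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (std::uint32_t x : r) best = std::min(best, (a * x + b) % p);
    return best;
}

// Map phase: emit one (bucket key, record index) entry per band.
// Grouping entries by key in a std::map stands in for the shuffle step.
void map_phase(const std::vector<Record>& data, std::size_t bands,
               std::size_t rows, Buckets& buckets) {
    for (std::size_t id = 0; id < data.size(); ++id) {
        for (std::size_t band = 0; band < bands; ++band) {
            std::vector<std::uint64_t> sig;
            for (std::size_t row = 0; row < rows; ++row)
                sig.push_back(min_hash(data[id], band * rows + row));
            buckets[BucketKey(band, sig)].push_back(id);
        }
    }
}

// Reduce phase: only records that share a bucket are compared exactly.
// The std::set removes pairs that collide in more than one band.
std::set<Pair> lsh_similarity_join(const std::vector<Record>& R,
                                   const std::vector<Record>& S,
                                   double threshold, std::size_t bands,
                                   std::size_t rows) {
    assert(bands > 0 && rows > 0);
    Buckets r_buckets, s_buckets;
    map_phase(R, bands, rows, r_buckets);
    map_phase(S, bands, rows, s_buckets);

    std::set<Pair> result;
    for (const auto& entry : r_buckets) {
        const auto it = s_buckets.find(entry.first);
        if (it == s_buckets.end()) continue;  // no candidate from S here
        for (std::size_t r : entry.second)
            for (std::size_t s : it->second)
                if (jaccard(R[r], S[s]) >= threshold) result.emplace(r, s);
    }
    return result;
}
```

Calling `lsh_similarity_join(R, S, 0.9, 4, 2)` returns the index pairs whose Jaccard similarity is at least 0.9 among the candidates that share at least one bucket. Because LSH is probabilistic, similar pairs that never collide are missed, while identical records always collide. In a parallel setting, the map phase would be spread over workers and the buckets partitioned by key; this sketch keeps everything in one thread.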