本文介绍了如何通过MapReduce实施LSH?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我们希望通过MapReduce实现本地敏感哈希(LSH).具体来说,假设签名矩阵的大块由列组成,元素是键-值对,其中键是列号,值是签名本身(即值的向量).

Suppose we wish to implement Local Sensitive Hashing(LSH) by MapReduce. Specifically, assume chunks of the signature matrix consist of columns, and elements are key-value pairs where the key is the column number and the value is the signature itself (i.e., a vector of values).

(a)显示如何将所有频段的存储桶作为单个输出输出MapReduce进程.提示:请记住,一个Map函数可以产生单个元素中的几个键值对.

(a) Show how to produce the buckets for all the bands as output of a singleMapReduce process. Hint: Remember that a Map function can produceseveral key-value pairs from a single element.

(b)显示另一个MapReduce进程如何将(a)的输出转换为需要比较的对的列表.具体来说,对于第i列,应该有一个列j> i的列表,我需要与它们比较.

(b) Show how another MapReduce process can convert the output of (a) toa list of pairs that need to be compared. Specifically, for each column i,there should be a list of those columns j > i with which i needs to becompared.

推荐答案

(a)

  • 地图:作为输入的元素及其签名产生键/值对(bucket_id,元素)
  • Reduce:产生所有频段的存储桶作为输出,即(bucket_id,列表(元素))
  • Map: the elements and its signature as input, produce the key-value pairs (bucket_id, element)
  • Reduce: produce the buckets for all the bands as output, i.e.(bucket_id, list(elements))
map(key, value: element):
    split item to bands
    for band in bands:
        for sig in band:
            key = hash(sig) // key = bucket id
        collect(key, value)

reduce(key, values):
    collect(key, values)

(b)

  • 地图:将(a)的输出作为输入,以相同的方式产生组合列表桶,即(bucket_id,list(elements))->(bucket_id,组合(列表(元素))),其中组合()是任意两个元素从同一桶中选择.
  • 减少:输出需要配对的项目对相比之下,具体地说,对于第i列,应该有一个需要与之进行比较的j> i列.
  • Map: output of (a) as input, produce the list of combination in samebucket, i.e. (bucket_id, list(elements)) -> (bucket_id,combination(list(elements))), which combination() is any two elementschosen from same bucket.
  • Reduce: output the item pairs need to becompared, Specifically, for each column i, there should be a list ofthose columns j > i with which i needs to be compared.
map(key, value):
    for itemA, itemB in combinations(value)
        key = (itemA.id, itemB.id)
        collect(key, [itemA, itemB])

reduce(key, values):
    collect(key, values)

这篇关于如何通过MapReduce实施LSH?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-28 21:50