An inverted index is commonly used in searching. Each term has a record which is a list of documents(doc ids) that the term appears in. The list of document IDs is sorted.
cat 1, 11, 15, 33, 45
dog 2, 11, 33, 46
pig 3, 12, 46
wild 1, 15, 20
Results from intersections are:
wild cat: 1, 15
cat dog: 11, 33
The inverted index is very large and requires N servers to store.
1. How will you partition the inverted index onto N nodes?
2. What's the pros and cons of proposed partition scheme?
3. How will the intersection operation be performed distributively?