Inverted Index Partitioning

An inverted index is commonly used in searching. Each term has a record which is a list of documents(doc ids) that the term appears in. The list of document IDs is sorted.

for example:
cat 1, 11, 15, 33, 45
dog 2, 11, 33, 46
pig  3, 12, 46
wild 1, 15, 20

Results from intersections are:
wild cat: 1, 15
cat dog: 11, 33

The inverted index is very large and requires N servers to store.

1. How will you partition the inverted index onto N nodes?
2. What's the pros and cons of proposed partition scheme?
3. How will the intersection operation be performed distributively?

Login to see Answer and Coaching Session More interview questions