INDIAN INSTITUTE OF MANAGEMENT CALCUTTA
WORKING PAPER SERIES
WPS No.724 / February 2013
A Ranking Algorithm for Online Social Network Search
by
Divya Sharma
Doctoral student, IIM Calcutta D. H. Road, Joka, Kolkata 700104, India
Parthasarathi Dasgupta
Professor, IIM Calcutta, Diamond Harbour Road, Joka, Kolkata 700104, India
&
Debashis Saha
Professor, IIM Calcutta, Diamond Harbour Road, Joka P.O., Kolkata 700 104 India
A Ranking Algorithm for Online Social Network Search
Divya Sharma*, Parthasarathi Dasgupta and Debashis Saha
*Fellow Programme student, MIS Group, IIM Calcutta
Abstract
Online Social Networks have become the new arena for people to stay in touch, pursue their interests
and collaborate. In October 2012, Facebook reported a whopping 1 billion users which is testimony of
fact how online social networks have proliferated and made inroads into the real world. Some of the
obvious advantages of social networks are (1) 24X7 availability allowing users to stay in virtual touch
at any time of the day, (2) get in touch with people who have similar interests and collaborate with
them and (3) the ability to be able to search for users to add to your friend circle. The third
advantage is the focus of this paper. With the increasing number of users on online social networks,
it is important that when a user searches for another user, appropriate results are returned. This
paper identifies three criteria – proximity, similarity and interaction ‐ which can be used to rank
search results so that more appropriate results are presented to the searching user. Also, this
algorithm allows the search ranking to be customized according to the nature of the online social
network in question.
Keywords
Algorithm, Online Social Network, Search Ranking
Introduction
Social networks consist of sites that allow people to interact and share social experiences by
exchange of multimedia objects (text, audio, and video) associated with the people themselves and
their actions [1]. Every user on a social network possesses a user profile that contains all the
information about the user ranging from basic information like name, date of birth, gender, location
to more detailed information capturing educational and professional information and areas of
interest. Online social networks have become abstractions of the real world where users interact,
exchange and keep in touch. A major challenge, therefore, is the ability of a social network site to
allow users to search for potential friends and in doing this, return relevant results. The need for
such a search technique arises due to the inherent structure of social networks and the behaviour of
users on the network. Most searches on a social network are queries containing names of users and
a multitude of users may share the same name, which makes the trivial task of searching for online
friends very cumbersome.
Here we discuss social network search ranking by means of an algorithm which takes into account
three important factors that make search results relevant to a user – proximity, similarity and
interaction. Based on these three factors, search ranks are allotted to search results when one user
searches for another user by name. Also this algorithm takes into account the fact that all social
networks are not of the same type. When a user searches for friends on a network of classmates, it
is more relevant to rank results which are closer to the searching user in the virtual space higher in
the results list. On the other hand, on a social network for football fans, ranking must be done on the
basis of sharing common interests like being fans of the same football club or the same football
players. Still another example is a social network of acquaintances, where ranking on the basis of the
level of interaction among users might be a more useful criteria.
Related Work
Recently, there has been lots of interest in the field of online social network search and ranking. The
work in [1] focuses on the problem of how to improve the search experience of the users. It suggests
seed based ranking instead of text‐based ranking by measuring shortest distances between the
nodes in a friendship graph. This is the first work that makes use of the friendship graph in a big
social network to improve the search experience. The paper in [2] presents a novel social search
model for finding a friend with common interests in OSN (Online Social Network) with the
introduction of trust value and popularity value. The trust value is calculated by the improved
shortest path algorithm with a trust threshold, and the popularity value is obtained by the page rank
algorithm iteratively. In order to ensure more accurate search results, [3] demonstrates an algorithm
called E.L.I.T.E. which has five essential components ‐ Engagement‐U, Lifetime, Impression,
Timeframe and Engagement‐O. Engagement‐U is the affinity between users which is measured by
their relationships and other related interests between them, Lifetime is a trace of users’ past based
on their positive, neutral and even negative interactions and actions with other users, Impression is
the weight of each object determined by the number of positive responses from users, Timeframe is
the timeline scoring technique in which an object naturally loses its value as time passes and
Engagement‐O is the attraction of users to objects which is measured between objects and
associated interests of users.
Emphasizing on tree based search techniques, [4]compares the efficiency of reliable searching
between Maximum Reliable Tree (MRT) algorithm and Optimum Branching Tree (OBT) algorithm and
proposes the use of the MRT algorithm that is newly developed based on a graph‐based method, as
a generic technique which facilitates effective social network search and that can be the most
reliable social network search method for the promptly appearing smart phone
technologies.[5]explores the correlation between preferences of web search results and similarities
among users by presenting an efficient search system called SMART finder. The work provides more
information about SMART finder and publishes a quantitative evaluation of how SMART finder
improves web searching compared to a baseline ranking algorithm. The concept of user tag feedback
scores is employed in [6]. Based on this concept, a tag‐based feedback web ranking algorithm is
designed. The algorithm can efficiently use the user’s feedback.
Motivation
Searching for another person is one of the most fundamental uses to which social networks are put.
However, most of the research work until now does not specifically look into this area. A small
quantum of research work which does delve into this topic either provides just a conceptual
framework or does not discuss the implementation of the framework in detail.
Considering that online social networks are an abstraction of real‐world social networks, it is
important to understand the factors that influence the associations between the users. From
experience it can be said that one person’s association with another person is a function of how
closely linked the two users are, how many interests they share and how often they interact. We call
these three factors proximity, similarity and interaction respectively. Though this is not an
exhaustive list of factors that might influence association between two persons, it does give a fairly
good idea about the same. This work tries to use this concept of association between users to rank
search results when a particular user searches for another user on an online social network.
When any framework is to be implemented, it is required that the various factors which are being
considered are quantified. The proposed algorithm in this work provides such a means to quantify
the factors like proximity, similarity and interaction and then combine them to give a value for
association between two persons. Along with providing a conceptual framework, this work also
details out the implementation of the framework in practice.
Ranking Metrics
This paper defines three metrics for the purpose of ranking search results:
1. Proximity: On a social network, a user may be linked directly and indirectly to millions of
users. A social network is a myriad web of interconnections and a node which is closer to the
searching node in terms of number of hops is likely to be a better search result in
comparison to a node which is several hops further away. Proximity measures the closeness
of a node from the searching node. It is calculated by running a shortest distance algorithm
and finding the minimum distance of the various nodes in the search result from the
searching node. Let dab be the shortest distances between nodes a and b, then proximity pab
is calculated as
ൌ 1݀
2. Similarity: Social networks provide users a platform for interacting with other users who
share similar interests, listen to the same kind of music, read books from the same author,
follow the same sport, share the same hobbies, etc. On a social network all these details are
captured in the user profiles. When a user issues a search for another user, a user who is
more similar to the searching user is likely to be a more relevant result in comparison to a
user who is less similar. We define a user profile of user a as
ܲ ൌ ሼ݇, ݈, ݉, ݊ … ሽ
where k, l, m, n … are the interests. Similarity S, between two nodes a and b in a list of
search results with n nodes is defined as
ܵ ൌ ܥܽݎ݈݀݅݊ܽ݅ݐݕ
ሺ ܲ ת ܲሻ
ܥܽݎ݈݀݅݊ܽ݅ݐݕሺ ܲ ܲ ܲ … ܲሻ
3. Interaction: Social networks provide users different means for interacting with one another.
Most online social networks allow users to interact via textual comments, exchange links
through shares and like the posts of other users. When two users interact on a social
network, there are two factors that can be used to gauge the closeness of these two users –
frequency of interaction and recency of interaction. Frequency captures volume of
interaction (for each of comment, share, like) between two users within a given span of
time. This span of time is defined by window size w, defined further in the discussion.
Recency captures the time gap between the time of issuance of the search query and the
most recent interaction (for each of comment, share, like) between the searching user and
the user being searched. Frequency of an interaction of type T between user a and b is
defined as
݂
் ൌ 1 െ 1
ܸ
்
where Ti is the type of interaction (i = 1 for comment, i = 2 for share and i = 3 for like) and
Vab
Ti is the volume of interaction between users a and b of type Ti. Recency of interaction
between users a and b is defined as
ݎ் ൌ 1 െ
ݓ்
ݐ െ ݐ்
where to is the time instance at which the search query was issued, tabTi is the time instance
of the most recent interaction between user a and b of type Ti and window size wTi is defined
as
ݓ் ൌ max൫ݐ െ ݐ் , ݐ െ ݐ் , ݐ െ ݐௗ் , … . , ݐ െ ݐ் ൯
where users b, c, d, … ,n are the search results of the query. The frequency and recency
metrics are then used to define interaction i of type Ti between two users a and b as
்݅ ൌ ߙݎ் ሺ1 െ ߙሻ ்݂
where 0 ≤ α ≤ 1. α is defined as the relative importance of recency in comparison to
frequency when quantifying a particular interaction.
The weighted interaction metric I between two users a and b is defined as the weighted sum
of the three types of interactions (comment, share, like) as
ܫ ൌ ߚ݅భ் ߛ݅మ் ߜ݅య்
where β + γ + δ = 1. Here β, γ and δ define the percentage importance of the three types of
interaction i.e. comment, share and like in the overall interaction metric I.
The Association Function
Once the three metrics proximity, similarity and interaction have been defined, the next step is to
define the association between the user and the search results based on these three parameters.
Association captures the effect of the three metrics in question and returns a composite value which
describes how closely the user is associated with each of the search results. As discussed earlier, the
importance of each of these metrics may vary according to the nature of the online social network.
Therefore, weights are defined to give different weights to each of these factors while calculating
the rank of the search results.
The weighted association of a search result, b, being searched by user a is defines as
ܣ ൌ ߤଵ ߤଶܵ ߤଷܫ
where µ1, µ2 and µ3 are the weights assigned to each of the metrics pab, Sab and Iab. It is important to
note that
μଵ μଶ μଷ ൌ 1
Therefore, µ1, µ2 and µ3 can be defined as the percentage importance of proximity, similarity and
interaction in calculating the association according to the nature of the social network. For example,
in social network catering to football fans, where similarity is more important than proximity and
interaction, µ2 may have a higher value, while on a social network for classmates, proximity is more
important, so, µ1may have a higher value.
The Rank Function
The ranks of the search results are subsequently obtained by sorting the results by the weighted
association values. The search with the highest weighted association value is given rank 1, the search
result with the second highest weighted association value is give rank 2, and so on, until the search
result with the lowest weighted association value is given rank n.
Algorithm for Computing Search Ranks
The proposed algorithm is a ranking algorithm and, therefore, allows different search algorithms to
be used to identify the unranked search results. Once the unranked search results are obtained, they
are then ranked using the proposed algorithm. The association function is central to the calculation
of the ranks of the search results.
We consider that n potential search results are returned by the searching algorithm for a given
query. Once this list is obtained, the next steps are to calculate the association and then rank the list
on the basis of the association values.
Pseudo Code:
1. Initialize the network N, searching user s, search query q
2. ranked_search_results(N, s, q)
3. begin
4. search_results[1 … n] ←search(q, N)
5. ranked_list[1 … n] ←compute_ranks(s, search_results[1 … n])
6. end
7. compute_ranks(s, search_results[1 … n])
8. begin
9. window_sizes←get_window_sizes(s, search_results[1 … n])
10. common_interests_cardinality←get_common_interests_cardinality(s,search_results[1 … n])
11. association[1 … n] ←compute_association(s, search_results[1 … n], window_sizes,
common_interests_cardinality)
12. ranked_results[1 … n] ← sort(association[1 … n])
13. end
14. get_window_sizes(s, search_results[1 … n])
15. begin
16. comment_recency_window←0
17. share_recency_window←0
18. like_recency_window←0
19. for i = 1 to n
comment_recency_window←max (comment_recency_window, comment_recencysi)
20. share_recency_window←max(share_recency_window, share_recencysi)
21. like_recency_window←max(like_recency_window, like_recencysi)
22. end
23. End
24. get_common_interests_cardinality(s, search_results[1 … n])
25. begin
26. common_interests←interestss
27. for i = 1 to n
28. common_interests←common_interests interestsi
29. end
30. common_interests_cardinality←cardinality(common_interests)
31. end
32. compute_association(s, search_results[1 … n], window_sizes, common_interests_cardinality)
33. begin
34. for i = 1 to n
35. proximitysi←get_proximity(s, search_resultsi)
36. similaritysi←get_similarity(s, search_resultsi)/common_interests_cardinality
37. interactionsi← get_interaction(s, search_resultsi, window_sizes)
38. associationsi← µ1proximitysi+ µ2 similaritysi+ µ3interactionsi
39. end
40. end
41. get_proximity(s,i)
42. begin
43. proximity ←1/disctance(s,i)
44. end
45. get_similarity(s, i)
46. begin
47. similarity ←interestss interestsi
48. end
49. get_interaction(s, i, window_sizes)
50. begin
51. frequency ←1 – 1/interaction_volume(s,i)
52. recencycomment← 1 – window_sizecomment/recency_of_comment(s,i)
53. recencyshare← 1 – window_sizeshare/recency_of_share(s,i)
54. recencylike← 1 – window_sizelike/recency_of_like(s,i)
55. recency←β recencycomment + γrecencyshare+ δ recencylike
56. interaction ← α recency + (1 – α) frequency
57. end
Lemma 1: ܶ݅݉݁ ܥ݈݉݁ݔ݅ݐݕ ൌ ܱሺ݊ሻ
Proof: Time complexity of interest is that of the function compute_ranks. We assume that the
sorting algorithm used has a worst case complexity of n log n.
The Framework
We propose to model a social network in the form of a graph, where the nodes correspond to the
users and the edges correspond to the friendship between them. We have considered a simple
network shown in Fig.1 where a user John issues a search for another user Maria.
The search query will return Maria a, Maria b and Maria c. The task now is to rank these results
according to relevance to user John.
Proof:
begin
get_window_sizes n steps
get_common_interests_cardinality n steps
compute_association n steps
sort n log n steps
end
compute_ranks (3n + n log n) steps
Time Complexity ൎ Number of Steps = 3n + n log n = O(n log n)
Simulation and Results
We now return to the framework described in Fig. 1. The profiles of users we are interested in i.e.
John, Mariaa, Mariab and Mariac are defined as follows:
ܲ ൌ ሼ݇, ݉, ݊ሽ
ெܲೌ ൌ ሼ݇, ݉ሽ
ெ್ܲ ൌ ሼ݉, ݊ሽ
ெܲ ൌ ሼ݇, ݈ሽ
The interactions between John and Mariaa, Mariab and Mariac are defined in Table 1.
User Comment Share Like
Volume Most Recent Volume Most Recent Volume Most Recent
Mariaa 3 05/08/2012 1 05/08/2012 12 18/09/2012
Mariab ‐ ‐ ‐ ‐ ‐ ‐
Mariac 9 24/09/2012 10 18/07/2012 11 02/10/2012
Table 1
The ranking algorithm was then simulated by varying the weights of the different parameters α, β, γ,
δ, µ1, µ2 and µ3.
Results and Comparisons
John
Mariab
Mariac
Mariaa
Figure 1
Some illustrative results of the proposed ranking algorithm for different weights are mentioned
along with a comparison of search results from other popular social network sites and recent
research work in this area in Table 2.
S.No
Proposed Algorithm Rank results
α Β γ Δ µ1 µ2 µ3 Rank 1 Rank 2 Rank 3
1. 0.5 0.5 0.3 0.2 0.34 0.33 0.33 Mariac Mariaa Mariab
2. 0.5 0.5 0.3 0.2 0.5 0 0.5 Mariac Mariaa Mariab
3. 0 0 0 0 0.5 0.5 0 Mariaa Mariac Mariab
4. 0 0 0 0 0 1 0 Mariaa Mariab Mariac
5. 0.5 1 0 0 0.5 0 0.5 Mariac Mariaa Mariab
6. 0 1 0 0 0.5 0 0.5 Mariac Mariaa Mariab
7. Facebook Mariac Mariaa Mariab
8. Google + Mariaa Mariac Mariab
9. Rank Algorithm Based on Trust and Popularity [2] Mariaa Mariac Mariab
Table 2
The small simulation network which was used for computing the results of the proposed algorithm
was also reconstructe
本文档为【A Ranking Algorithm for Online Social Network Search】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。