4 How does Netflix recommend movies?

We have just seen three beautiful equations in the last three chapters, each used at least a billion times every single day:

$$p_i[t+1] = \frac{\gamma_i}{\mathrm{SIR}_i[t]}\, p_i[t], \qquad U_i(b) = v_i - p_i(b), \qquad \pi^{*T} = \pi^{*T} G.$$

We now continue with our first block of four chapters that present four fundamental algorithms: distributed power control, second price auction, PageRank, and now, collaborative filtering. These four chapters also introduce the basic languages of optimization, game, graph, and learning theories.

4.1 A Short Answer

Netflix started its DVD rental business in 1997: instead of you going to rental stores, DVDs come to you in the mail. Instead of incurring a late fee for each day you hold a DVD beyond its return date, you can keep holding it as long as you continue to pay the monthly subscription fee, but you cannot receive a new DVD without returning the old one. This is similar in spirit to the sliding window mechanism of congestion control in Chapter 14, and to the tit-for-tat incentive mechanism in P2P networks in Chapter 15. Netflix created an attractive web interface and maintained a remarkable inventory control and mail delivery system. It operated with great scalability (the per-customer cost drops as the number of customers goes up) and stickiness (users are reluctant to change the service). By 2008, there were about 9 million users in North America.

As Netflix's business grew, it moved on to the next mode of delivering entertainment: streaming movies and TV programs from video servers, through the Internet and wireless networks, to your Internet-connected devices: game consoles, set-top boxes, smartphones, and tablets. With its branding, choice of content, and aggressive pricing, Netflix's subscriber base nearly tripled to 23 million by April 2011. Netflix video streaming generated so much Internet traffic that more than one in every four bits going through the Internet that month was Netflix traffic.

In September 2011, Netflix announced that it would separate the DVD rental and online streaming businesses. We will later look at cloud-based video distribution technology, including Netflix, Amazon Prime, Hulu, HBO Go, etc., in Chapter 17, and at how it is changing the future of both entertainment and networking. In this chapter, we instead focus on the social networking dimension of Netflix: how does it recommend movies for you to watch (either by mail or by streaming)? It is like trying to read your mind and predict your movie ratings. An effective recommendation system is important to Netflix: it enhances user experience, increases loyalty and volume, and helps with inventory control.

Recommendation systems are a helpful feature for many applications beyond video distribution. Like the search engines in Chapter 3, recommendation systems give rise to structure in the sea of raw data and reduce the impact of the information explosion. You must have also seen how Amazon recommends products to you based on your purchase and viewing history, adjusting its recommendations each time you browse a product. That recommendation runs content-based filtering, in contrast to the collaborative filtering used by Netflix. There is a related question: when can you trust the averaged rating of a product on Amazon? That is a different variant of the recommendation problem, and it will be taken up in Chapter 5. You must also have been swayed by the recommendations on YouTube that follow each of the videos you watch.
We will look at YouTube recommendation in Chapter 7. You may also have used Pandora's online music selection, where the recommendations are developed by experts, but you get to thumb-up or thumb-down each one: an explicit binary feedback.

Netflix wants a recommendation system that does not depend on any expert, but instead uses the rich history of all its users' behavior to profile each user's taste in movies. This system has the following input, output, and criterion of success:

• Among the inputs to this system is the history of star ratings across all users and all movies. Each data point consists of four numbers: (a) the user ID, indexed by $u$, (b) the movie title, indexed by $i$, (c) the number of stars, 1-5, of the rating, denoted by $r_{ui}$, and (d) the date of the rating, denoted by $t_{ui}$. This is a really large data set: think of millions of users and tens of thousands of movies. Only a fraction of the users actually watched any given movie, and only a fraction of that fraction actually bothered to rate it. Still, the size of this input is on the order of billions for Netflix.

• The output is, first of all, a set of predictions $\hat{r}_{ui}$, one for each movie $i$ that user $u$ has not watched yet. These can be real numbers, not just integers; we can interpret a predicted rating of, say, 4.2 as saying that the user will rate this movie 4 stars with 80% probability and 5 stars with 20% probability. The actual output is a short, ranked list of movies recommended to each user $u$, presumably those movies receiving $\hat{r}_{ui} \geq 4$, or the top 5 movies with the highest predicted $\hat{r}_{ui}$.

• The real test of this mind-reading system is whether user $u$ actually likes the recommended movies. That information, however, is hard to collect, so Netflix uses a proxy: the Root Mean Squared Error (RMSE), measured over those $(u, i)$ pairs for which we have both the prediction and the actual rating. Let us say there are $C$ such pairs. Each rating prediction's error is squared, $(r_{ui} - \hat{r}_{ui})^2$, and then averaged over all the predictions. Since a square was taken, a square root is taken over this average to scale the numerical value back down:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i)} \left(r_{ui} - \hat{r}_{ui}\right)^2}{C}}.$$

Netflix could have used the absolute value of the error instead of the squared error, but for our purposes we will stick with RMSE as the metric that quantifies the accuracy of a recommendation system. The smaller the RMSE, the better the recommendation system.
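To make the metric concrete, here is a minimal sketch of the RMSE computation in Python with NumPy; the rating values are made-up toy numbers, not Netflix data.

```python
import numpy as np

def rmse(actual, predicted):
    """RMSE over the (u, i) pairs where both an actual rating r_ui
    and a prediction rhat_ui are available."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Square each error, average over the C pairs, then take the
    # square root to bring the scale back down to stars.
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Toy example: C = 4 known ratings and their predictions.
r = [4, 3, 5, 2]               # actual stars r_ui
r_hat = [4.2, 2.8, 4.5, 2.5]   # predicted stars rhat_ui
print(rmse(r, r_hat))          # about 0.38
```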
Can recommendation accuracy be improved by 10% over what Netflix was using? That was the question Netflix threw at the research community in October 2006, with a prize of $1 million, through an open, online, international competition called the Netflix Prize.

The competition's mechanism is interesting in its own right. Netflix made available a set of over 100 million ratings, part of its record from 1999 to 2005. That amount of data could just fit in the memory of a standard desktop in 2006, making it easy for anyone in the world to participate in the competition. These ratings come from more than 480,000 users and 17,770 movies. On average, each movie is rated by more than 5000 users and each user rates more than 200 movies. But those averages disguise the real difficulty here: many users rate only a few movies, and very few users rate a huge number of movies (one user rated over 17,000 movies). Whatever recommendation system we use, it must be able to predict well for all these users.

The exact distribution of the data is shown in Figure 4.1:

• A little less than 100 million ratings were made public as the training set.

• About 1.4 million additional ratings were also made public; they have statistical properties similar to those of the quiz set and the test set described next. This is called the probe set, which competitors in the Netflix Prize can use to test their algorithms.

• About 1.4 million additional ratings were hidden from the competitors, called the quiz set. Each competing team can submit an algorithm that runs on the quiz set, but not more than once a day. The RMSE scores continuously updated on the leaderboard of the Netflix Prize website are based on this set's data.

• Another 1.4 million ratings, also hidden from the competitors, are called the test set. This is the real test: RMSE scores on this set determine the winner.

[Figure 4.1: The Netflix Prize's four data sets. The training set (about 100 million ratings) and the probe set (about 1.4 million) are publicly released, whereas the quiz set and the test set (about 1.4 million each) are hidden from the public. The probe, quiz, and test sets have similar statistical properties, but the probe set can be used by each competing team as often as they want, the quiz set at most once a day, and the final decision is based on a comparison of RMSE on the test set.]

Each competing team first comes up with a model for its recommendation system, then decides its model parameters' values by minimizing the RMSE between the known ratings in the training set and its model's predictions, and finally uses this model with the tuned parameters to predict the unknown ratings in the quiz set. Of course Netflix knows the actual ratings in the quiz set, and can evaluate the RMSE between those ratings and the predictions from each team. This is a smart arrangement: no team can reverse engineer the actual test set, since only scores on the quiz set are shown. It is also helpful to have a probe set on which the competing teams can run their own tests as many times as they want.

Netflix had its own algorithm, called Cinematch, which gave an RMSE of 0.9514 on the quiz set when its parameters were tuned on the training set. Improving RMSE by even 0.01 can sometimes make a real difference in the top-10 recommendations for a user. Improving the recommendation accuracy by 10% over Cinematch means pushing the RMSE down to 0.8563 on the quiz set, and 0.8572 on the test set.

The Netflix Prize ignited the most intense and high-profile surge of activity in the research communities of machine learning, data mining, and information retrieval in recent years. To some researchers, the quality and sheer size of the available data were as attractive as the hype and the prize. Over 5000 teams worldwide entered more than 44,000 submissions, and both Netflix and these research fields benefited from the three-year quest towards the goal of 10%. It turns out that setting the target at a 10% improvement was a really good decision: for the given training set and quiz set, getting an 8% improvement was reasonably easy, but getting 11% would have been extremely difficult.
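As a quick sanity check on these numbers, the 10% target follows directly from Cinematch's quiz-set RMSE; a couple of lines confirm the arithmetic:

```python
cinematch = 0.9514            # Cinematch's RMSE on the quiz set
print(cinematch * 0.90)       # 0.85626: the 0.8563 target for 10% improvement

def improvement(rmse_score):
    """Percentage improvement over Cinematch, the contest's yardstick."""
    return 100 * (cinematch - rmse_score) / cinematch

print(improvement(0.8553))    # about 10.1%: The Ensemble's final quiz-set score
```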
Here are a few highlights in the history of the Netflix Prize, also shown in the timeline in Figure 4.2:

[Figure 4.2: The Netflix Prize's timeline and some of its highlight events (a)-(f), plotted as RMSE on the quiz set (from 0.9514 in October 2006 down through 0.8728, 0.8712, 0.8616, 0.8563, and 0.8553) against time. It lasted almost three years, and the final decision came down to a twenty-minute differential.]

(a) Within a week of the start of the competition, Cinematch was beaten. (b) By early September 2007, team BellKor had made an 8.26% improvement over Cinematch, but first place changed hands a couple of times, until (c) in the last hour before the first year of competition ended, the same team got an 8.43% improvement and won the $50,000 annual progress prize for leading the pack during the first year. Then teams started merging. BellKor and BigChaos, two of the leading teams, merged, and (d) received the 2008 progress prize for pushing the RMSE down to 0.8616. They further merged with Pragmatic Theory, and (e) in June 2009 the new team, BellKor's Pragmatic Chaos, became the first team to achieve more than 10% improvement, beating Cinematch by 10.06% on the quiz set. The competition then entered the "last call" period: all teams had 30 days to make their final submissions. (f) At the end of this period, two teams had beaten Cinematch by more than 10% on the quiz set: BellKor's Pragmatic Chaos with an RMSE of 0.8554, and The Ensemble with an RMSE of 0.8553, slightly better. The final winner was to be declared by comparing their RMSEs on the test set. Here is the grand finale: both teams beat Cinematch by more than 10% on the test set, and they actually got the same RMSE on that set: 0.8567. But BellKor's Pragmatic Chaos submitted its algorithm 20 minutes earlier, and thus became the winner of the grand prize. A world-class science race lasting almost three years concluded with a twenty-minute differential.

You must be wondering what algorithm BellKor's Pragmatic Chaos used in the final winning solution. The answer is documented in detail in a set of three reports, one from each component of this composite team, linked from the Netflix Prize website. But what you will find is that the winning solution is really a cocktail of many methods combined, with hundreds of ingredient algorithms blended together and thousands of model parameters fine-tuned specifically to the training set provided by Netflix. That is what it took to get the last 1% of improvement. But if you are only interested in the main approaches, the big ideas, and getting an 8-9% improvement over Cinematch, there are actually just a few key methodologies, and those are what we will focus on in the rest of this chapter.

To start with, take a look at the table in Figure 4.3.

[Figure 4.3: A recommendation system's problem: predicting missing ratings from the given ratings in a large yet sparse table. In this toy example of 6 users (rows) and 8 movies (columns), there are 18 known ratings as a training set, and 4 unknown ratings, marked by question marks, to be predicted. Real problems are much larger, with billions of entries in the table, and sparse: only about 1% are filled with known ratings.]

Each column in this table is a movie (or, in general, an item), each row is a user, and each cell's number is the star rating by that user for that movie. Most cells are empty, since each user rates only a few of the movies. We can also think of the table as a matrix R, or as a weighted bipartite graph where the user nodes are on the left and the movie nodes are on the right, with a link connecting user node $u$ and movie node $i$ if there is such a rating, the value of the rating being the weight of the link. In Chapter 8 we will discuss other matrices that describe the structure of different graphs.
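To fix ideas in code, here is one minimal way to represent such a table: a dense NumPy matrix R with NaN marking unrated cells. This suffices for a 6 x 8 toy example with 18 known ratings like the one in Figure 4.3, though the particular star values below are made up, not the figure's exact entries; a real system with billions of cells would use a sparse structure such as scipy.sparse instead.

```python
import numpy as np

nan = np.nan  # an empty cell: this user has not rated this movie

# Toy ratings table: 6 users (rows) x 8 movies (columns), with
# 18 known ratings. Values are illustrative, not Figure 4.3's.
R = np.array([
    [5, nan, 4, nan, nan, 3, nan, 1],
    [nan, 2, nan, 4, nan, nan, 5, nan],
    [4, nan, nan, nan, 1, nan, nan, 5],
    [nan, 3, nan, 2, nan, 4, nan, nan],
    [1, nan, 2, nan, 3, nan, nan, nan],
    [nan, nan, nan, 3, nan, nan, 2, nan],
])

known = ~np.isnan(R)                 # boolean mask of observed ratings
print(known.sum(), "known ratings")  # 18, the toy training set
```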
You are given a large yet sparse table like this, and asked to predict some missing entries, like the four indicated by question marks in the last two rows of the table in Figure 4.3. This is a visualization of the problem at hand.

There are two main camps of techniques for any recommendation system: content-based filtering and collaborative filtering. Content-based filtering looks at each row in isolation and attaches labels to the columns: if you like one comedy starring Rowan Atkinson, you will probably like another comedy starring Rowan Atkinson. This straightforward solution is often inadequate for Netflix. In contrast, collaborative filtering exploits all the data in the entire table, across all the rows and all the columns, trying to discover structure in the pattern and values of the entries. To draw an imperfect analogy with the search engines of Chapter 3: content-based filtering is like the relevance score of individual webpages, and collaborative filtering is like the importance score determined by the connections among the webpages.

Within the camp of collaborative filtering, there are in turn two main approaches.

• The intuitively simpler one is the neighborhood model. Here, two users are "neighbors" if they share similar tastes in movies. If Alice and Bob both like "Schindler's List" and "Life is Beautiful", but not so much "E.T." and "The Lion King", then knowing that Alice likes "Dr. Zhivago" a lot would make us think that Bob likes "Dr. Zhivago" a lot, too. In the neighborhood method, we first compute a similarity score between each pair of users: the larger the score, the closer the two users' tastes. Then, for a given user whose opinion of some unwatched movie we would like to predict, we select some number, say 50, of the most similar users who have rated that movie, and take a weighted sum of their ratings as our prediction (a minimal sketch follows this list).

• The second approach is the latent factor model. It assumes that underneath the billions of ratings out there, there are only a few hundred key factors on which users and movies interact: statistical similarities among users or among movies are actually due to some hidden, low-dimensional structure in the data. This is a big assumption (there are far fewer types of people and movies than there are people and movies), but it sounds about right. It turns out that one way to represent a low-dimensional model is to factorize the table into two sets of short vectors.

Determining baseline predictors for the neighborhood model, and finding just the right short vectors in the latent factor model, boil down to solving least squares problems, also called linear regression.
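Here is the minimal sketch promised above, operating on the toy NaN-masked matrix R from the earlier snippet. It scores user-user similarity with plain cosine similarity over commonly rated movies and predicts with a similarity-weighted average of the K most similar raters; this is a bare-bones illustration of the neighborhood idea, not the tuned variant (with baseline correction, shrinkage, and so on) used in the Prize-winning systems.

```python
import numpy as np

def predict_neighborhood(R, u, i, K=2):
    """Predict user u's rating of movie i from the K most similar
    users who rated movie i, using cosine similarity computed over
    the movies that u and each candidate neighbor both rated."""
    scores = []
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue                       # need another user who rated i
        shared = ~np.isnan(R[u]) & ~np.isnan(R[v])
        if not shared.any():
            continue                       # no commonly rated movies
        a, b = R[u, shared], R[v, shared]
        sim = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        scores.append((sim, R[v, i]))
    scores.sort(key=lambda s: s[0], reverse=True)
    top = scores[:K]                       # the K nearest neighbors
    if not top:
        return np.nan                      # no usable neighbors at all
    sims = np.array([s for s, _ in top])
    ratings = np.array([r for _, r in top])
    return (sims @ ratings) / sims.sum()   # similarity-weighted sum

print(predict_neighborhood(R, u=4, i=7))   # about 2.5 on the toy table
```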
Most of the mileage in the leading solutions to the Netflix Prize was obtained by combining variants of these two approaches, supplemented by a whole bag of tricks. Two of these supplementary ideas are particularly interesting. One is implicit feedback: a user does not have to rate a movie to tell us something about her mind. Which movies she browsed, which she watched, and which she bothered to rate at all are all helpful hints. For example, it is useful to leverage the information in a binary table where each entry simply indicates whether a given user rated a given movie or not.

Another idea played an important role in pushing the improvement to 9% in the Netflix Prize: incorporating temporal dynamics. The model parameters become time-dependent. This allows the model to capture changes in a person's taste and in the trends of the movie market, as well as the mood of the day on which a user rates movies, at the expense of dramatically increasing the number of model parameters to optimize over. One interesting observation is that when a user rates many movies on the same day, she tends to give similar ratings to all of them. By discovering and discounting these temporal features, the truly long-term structures in the training set are better quantified.

In the next section, we will present baseline predictor training and the neighborhood method, leaving latent factor models to the Advanced Material.

4.2 A Longer Answer

Before diving into any specific predictors, let us take a look at the generic workflow, consisting of the two phases shown in Figure 4.4:

• Training: we put a model with its parameters to work on observation data, and compare its predictions with the ground truth that we know for the training data. We then tune the model parameters so as to minimize some error metric, like RMSE, relative to that known ground truth.

• Prediction: we now use the optimized model parameters to work on observation data beyond those in the training set. These predictions are then used in practice (or, in the case of the Netflix Prize, compared against a quiz set of ground truth that Netflix knows).

[Figure 4.4: The main flow of the two steps: first training, then prediction. The training module optimizes over the model parameters, and the prediction module uses the optimized model parameters to make predictions.]

4.2.1 Baseline predictor: least squares

Collaborative filtering extracts and leverages structures from the data set, like those in the table in Figure 4.3. But here is an easy method that does not even require any understanding of the structures in the table.
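As a preview, here is a minimal sketch of such a baseline predictor, under a standard formulation that we assume here (the text develops the details): predict $\hat{r}_{ui} = \bar{r} + b_u + b_i$, where $\bar{r}$ is the overall average rating and the user and movie biases $b_u$, $b_i$ are chosen by least squares over the known ratings.

```python
import numpy as np

def fit_baseline(R):
    """Fit rhat_ui = rbar + b_u + b_i to the known entries of R by
    least squares (a standard baseline-predictor formulation; this
    sketch assumes that form rather than the text's exact setup)."""
    users, movies = np.nonzero(~np.isnan(R))
    r = R[users, movies]               # the C known ratings
    rbar = r.mean()                    # overall average rating
    n_u, n_i = R.shape
    # One indicator column per user bias and per movie bias.
    A = np.zeros((len(r), n_u + n_i))
    A[np.arange(len(r)), users] = 1.0
    A[np.arange(len(r)), n_u + movies] = 1.0
    b, *_ = np.linalg.lstsq(A, r - rbar, rcond=None)
    return rbar, b[:n_u], b[n_u:]      # rbar, user biases, movie biases

rbar, bu, bi = fit_baseline(R)         # R: the toy table from before
print(rbar + bu[4] + bi[7])            # baseline guess for user 4, movie 7
```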