[Massive Data Analysis] Recommendation Systems

Spring 2016 Massive Data Analysis Lecture Notes
Ch9. Recommendation Systems
Instructor: Jia-Shung Wang

Advertising Opportunities and Overview of Recommendation Systems

Content-based systems examine properties of the items recommended.
Collaborative filtering systems recommend items based on similarity measures between users and/or items.
Formal model
- $X$ = set of Customers, $S$ = set of Items —> Utility function $u: X × S \rightarrow R$
- 但是透過Utility Matrix僅能得到很少量的資訊，一般解法：cluster items and/or users.
Key Problems
- Gathering known ratings for matrix：How to collect the data in the utility matrix? 訪問、根據行為推斷（如被買得多代表high rating?）
- Extrapolate unknown ratings from the known ones (Utility matrix $U$ is sparse) 大部分的使用者都不會打分數，相對地，大部分的影片都沒有被打分數，且新影片沒人打分數，新使用者也沒打過分數
  - Mainly interested in high unknown ratings
  - We are not interested in knowing what you don’t like but what you like
  - Evaluating extrapolation methods：How to measure success/performance of recommendation methods
Three approaches to recommender systems
- Content-based
- Collaborative
- Latent factor based

Content-based Recommender Systems

Collaborative Filtering

簡述：the techniques for using one person’s behavior to predict what other people will do.
User-user collaborative filtering：對於一個用戶X，首先找到與其相似的一個用戶集合，這個相似是透過他們的rating來判定的，rating的likes和dislikes越相似，用戶就越相似。然後推薦這些相似用戶及裡的用戶喜歡的items並且預測X評分最高的items給用戶X
- 計算使用者相似度的方法 Let $r_{x}$ be the vector of user $x$’s ratings,
  - Jaccard similarity: 沒有考慮評分，容易導致相似度計算錯誤
  - Cosine similarity: 將沒有評分的 item 設為 0
  - Pearson correlation coefficient：$S_{xy} =$ items rated by both users $x$ and $y$ 減掉每行平均
  - From similarity metric to recommendations 評分預測：迭代求出與 $X$ 最相似的 $k$ 個用戶，預測 $X$ 對這 $k$ 個用戶也評分過的 items 的分數
    - Let $r_{x}$ be the vector of user $x$’s ratings, and $N$ be the set of $k$ users most similar to $x$ who have rated item $i$
Item-item collaborative filtering：推薦給用戶那些和他們之前喜歡的物品相似的物品。比如說，他會因為你買過 “Data Mining” 而推薦給你 “Machine Learning”。不過，Item CF並不利用物品內容的屬性季換物品之間的相似度，而是透過分析用戶行為紀錄來計算物品之間的相似度。基於 Item CF，可以利用用戶的歷史行為對推薦結果提供解釋，比如說推薦給你”天龍八部”的原因可能是因為你之前喜歡”射雕英雄傳”
Common Practice of CF
- Define similarity $s_{ij}$ of items $i$ and $j$, select $k$ nearest neighbors $N(i; x)$
- Items most similar to $i$, that were rated by $x$
- Estimate rating $r_{xi}$ as the weighted average：
Item-Item vs. User-User
- In practice, it has been observed that item-item often works better than user-user because items are simpler, users have multiple tastes
- UserCF 的推薦結果著重於反應和用戶興趣相似的小群體熱點，而 ItemCF 則著重於維繫用戶的歷史興趣。換句話說，UserCF 的推薦更社會化，反映了用戶所在的小型興趣群體中物品的熱門程度，ItemCF 的推薦更加個性化，反映了用戶自己的興趣傳承。
Pros/Cons of Collaborative Filtering
- Pro:
  1. Works for any kind of item
  2. No feature selection needed
- Cons:
  1. Cold Start: Need enough users in the system to find a match 需要夠多的使用者才能找到 match
  2. Sparsity: The user/ratings matrix is sparse. Hard to find users that have rated the same items
  3. First rater: Cannot recommend an item that has not been previously rated. New items, Esoteric items 無法推薦沒有被評分過的物品
  4. Popularity bias: Cannot recommend items to someone with unique taste. Tends to recommend popular items 傾向於推薦熱門的物品
Hybrid Methods: 實作兩個或多個不同的推薦系統然後將他們合併，或是加入 content-based 的方法到 CF 中，這樣新的物品就有 item profile，也有辦法處理新的使用者
Evaluation using RMSE

Recommendations via Optimization

目標：Make good recommendations
- Lower RMSE –> better recommendations
- 想要推薦一些使用者沒看過的物品 –> 很難做到好的推薦
- 所以就來建一個系統，可以在已知 rating 的物品或是使用者上做出好的推薦，並期望這個系統也能對未知 rating 的物品有很好的預測結果
Idea：設定一組 $w$ 值，使得 $w$ 可以對已知 rating 的物品或是使用者做出好的推薦 work well on known (user, item) ratings
- 如何找到 $w$ ? Define an objective function and solve the optimization problem
- Find $w_{ij}$ that minimize SSE on training data (Think of $w$ as a vector of numbers)（p.62有找min.的方法）

Dimensionality Reduction

資料的低維表示空間中的點不是完全隨機分布的，它們分布在某個子空間中。我們的目標就是找到這個可以有效表示所有資料的子空間。
UV-decomposition：矩陣分解，像下圖 $M$ 可以分解成 $U \times V$
- 完整的UV-decomposition演算法：
  1. Preprocessing of the matrix $M$.
  2. Initializing $U$ and $V$.
  3. Ordering the optimization of the elements of $U$ and $V$.
  4. Ending the attempt at optimization.

Singular-Value Decomposition (SVD)

Allows an exact representation of any matrix, and also makes it easy to eliminate the less important parts of that representation to produce an approximate representation with any desired number of dimensions. 一個能適用於任意的矩陣的一種矩陣分解的方法
Definition of SVD: Let $M$ be an $m \times n$ matrix, and let the rank of $M$ be $r$. Recall that the rank of a matrix is the largest number of rows (or equivalently columns) we can choose for which no nonzero linear combination of the rows is the all-zero vector 0.
目標：find matrices $U$, $S$, and $V$ with the following properties
- $U$ $m \times r$ column-orthonormal matrix 每欄都是一個單位向量，欄和欄內積會是0
- $V$ $n \times r$ column-orthonormal matrix. 通常 $V$ 會使用轉置的型態 $V^{T}$，所以 $V^{T}$的列是正交的
- $S$ is a diagonal matrix; 對角矩陣，不在對角線上的元素都是 0. The elements of $S$ are called the singular values of $M$.
The SVD of a matrix $M$ is strongly connected to the eigenvalues of the symmetric matrices $M^{T}M$ and $MM^{T}$
Interpretation of SVD
SVD Example: Users-to-Movies (users-movies 矩陣，其中 row 代表使用者，column 代表一部電影)
- concepts 就是 SVD 分解要告訴我的，用戶是 sci-fi lover 和 romance lover 類型，電影是 sci-fi 和 romance 等類型。也就是不同的 genres, or topics。
- 我们可以將 $U$ 的列看成 concepts，如U的第一欄對應 Sci-Fi concept，第二欄對應 romance concept（第三欄可能代表其它的什麼，但不一定能用一個類别来描述和解釋）。我們從這裡可以看到，前 4 個使用者喜歡 sic-fi，後 3 個使用者喜歡 romance。
- 於是我們可以將 $U$ 矩陣看成是 user to concept similarity matrix。其中元素代表某個使用者對某個 concepts 的感興趣程度。這裡是說第一個使用者很喜歡 sci-fi concept (0.13)，而第五個使用者喜歡 romance concept (-0.59)
- Sigma 矩陣中的值可看做是 concepts 的強度，如 sci-fi concept 強度 (12.4) 就比 romance concept 的強度 (9.5)強。注意這裡還有第三種 concepts，但是其強度相較前面兩者太小，可以忽略。
- 同樣的，我们可以將 $V$ 矩陣看成是 movies to concept similarity matrix。從 $V$ 矩陣的第一欄可以看到，第一部電影與第一個 concept 和第三個 concept 有較高的相關度，然而第三個 concept 強度過低，所以對解釋資料來說並不重要。
總歸來說，SVD 個矩陣可以這樣表示 ‘movies’, ‘users’ 和 ‘concepts’
- $U$ user-to-concept similarity matrix
- $V$ movie-to-concept similarity matrix
- $S$ its diagonal elements, ‘strength’ of each concept

Latent Factor Models

簡介：LFM 屬於一種隱含語意分析技術，目的是找出潛在的主題或是分類。在推薦系統中它能夠基於使用者個行為對 item 進行自動分類，也就是把 item 劃分到不同的類別或是主題，這些類別或是主體可以理解為使用者的興趣。
見上圖 For now let’s assume we can approximate the rating matrix $R$ as a product of “thin” $Q · P^{T}$
- $R$ 有 missing entries 但這裡可以忽略
- Basically, we will want the reconstruction error to be small on known ratings and we don’t care about the values on the missing ones
- 與 SVD 不同之處：SVD 堅持 $U$ 和 $V$ 是正交的，但是 LFM 不在意
- “SVD” on Netflix data: $R ≈ Q · P^{T}（A = R, Q = U, P^{T} = SV^{T})$
- How to estimate the missing rating of user $x$ for item $i$?
見上圖，使用者和電影在同一個二維空間裡，使用者所在的位置代表他對於在某個 dimension 的電影的喜好（或是討厭）程度。比如那個戴帽子的先生在第四象限，我們可以推論他喜歡 male-oriented movies，而且討厭 serious movie，於是我們可以推薦給他 “Dumb and Dumber”，而不會給他 “The Color Purple”。但是在同一張圖上，正中間那位先生就沒有被很好地分類，或許我們還需要另外的 dimension
參考：http://goo.gl/yXsL66

Slides

from CS246 Mining Massive Data Sets, Stanford University
Part 1: http://i.stanford.edu/~jure/pub/talks2/07-recsys1.pdf
Part 2: http://i.stanford.edu/~jure/pub/talks2/08-recsys2.pdf