[Massive Data Analysis] Advertising on the Web
Spring 2016 Massive Data Analysis Lecture Notes
Ch8. Advertising on the Web
Instructor: Jia-Shung Wang
Advertising Opportunities and Online Algorithms
- post ads directly, displayed ads, search ads
- Each of these methods, and many others like these, raise enormous privacy issues 各類型的線上廣告衍伸出隱私問題
- Online Algorithms
- Classic model of algorithms
- You get to see the entire input, then compute some function of it
- In this context, “offline algorithm”
- Online Algorithms
- You get to see the input one piece at a time, and need to make irrevocable(經決定後不可改變的) decisions along the way
- Similar to the data stream model
- Classic model of algorithms
Online Bipartite Matching
- Bipartite Matching:一張二分圖上的匹配稱作二分匹配,理所當然所有的匹配邊都是這橫跨這兩群點的邊,就像是連連看一樣。
- Max. Matching 最大匹配:二分圖G的所有匹配中,edges數量最多的匹配
- p.7-9動畫:
- 男生一群、女生一群,p.7 的連線指的是他們彼此間的 Preferences,而目標是要找到一個 match 方式使得男生女生都可以剛剛好連到有 preference 的對象 (Match boys to girls so that maximum number of preferences is satisfied)
- 先連 2-b 和 4-a,因為 2 和 4 除了 b 和 a 以外沒有其他的 preference 了;接著連 1-c 和 3-d(不能重複連到女生,一個配舞伴的概念,一個女生不會有兩個舞伴)
- Matching Algorithm
- Problem:Find a maximum matching for a given bipartite graph
- 第一步:初始化,只有男生組
- 第二步開始:在每個round,決定其中一個女生的選擇 (girl’s edges are revealed)
- At that time, we have to decide to either 兩種選擇
- Pair the girl with a boy 配對
- Do not pair the girl with any boy 不配對
- At that time, we have to decide to either 兩種選擇
- Greedy algorithm for the online graph matching problem: 把女生配給任何有資格的男生 (If there is none, do not pair girl) 一開始就把一些明顯可以配對的點給配對起來(能連的先連一連),這樣就不必替每一個未匹配點建立交錯樹了。這個手段在圖很龐大的時候可以發揮作用。
最佳配對方法與 greedy algorithm 的比較
- Competitive Ratio: For input $I$, suppose greedy produces matching $M_{greedy}$ while an optimal matching is $M_{opt}$, Competitive ratio = $\textrm{min_all_possible_inputs } I (\left |M_{greedy} \right | / \left |M_{opt} \right |)$ —> greedy’s worst performance over all possible inputs $I$
Consider a case: $M_{greedy} ≠ M_{opt}$ 當 greedy 所算出來的配對不是最佳時
- Consider the set $G$ of girls matched in $M_{opt}$ but not in $M_{greedy}$ 女生配到的男生不是最佳的
- Then every boy $B$ adjacent to girls in $G$ is already matched in $M_{greedy}$
- If there would exist such non-matched (by $M_{greedy}$) boy adjacent to a non-matched girl then greedy would have matched them
- Since boys $B$ are already matched in $M_{greedy}$ then $\left |M_{greedy} \right | \geq \left | B \right |$
Summary so far: $G$ 表示在 $M$ 中存在 $M_{opt}$,但在 $M_{greedy}$ 中不存在的右節點的集合,$B$ 表示 $G$ 中有節點的鄰接邊的左節點的集合 —> $\left |M_{greedy} \right | \geq \left | B \right |$
There are at least $\left | G \right |$ such boys ($\left | G \right | \leq \left | B \right |$) otherwise the optimal algorithm couldn’t have matched all girls in $G$. So $\left | G \right | \leq \left | B \right | \leq \left |M_{greedy} \right |$
- By definition of $G$ also: $\left |M_{opt} \right | \leq \left |M_{greedy} \right | + \left | G \right |$
- Worst case is when $\left | G \right |$ = $\left | B \right |$ = |$M_{greedy}$|
- $\left |M_{opt} \right | \leq 2 \left |M_{greedy} \right |$ then $\left |M_{greedy} \right | / \left |M_{opt} \right | \geq \frac{1}{2}$
Web Advertising
- Google’s Adwords System
- Show only a limited number of ads with each query 針對每次搜尋,Google 只會顯示一部分廣告
- Decide which ads to show, as well as the order in which to show them 所以Google必須決定哪些廣告要show以及show的順序
- Users of the Adwords system specified a budget 而每個廣告商都有一定的預算,這就是 Adwords 要解決的問題
- The revenue of a selection of ads is the total value of the ads selected 廣告收入來自於廣告的價值
- 而廣告的價值來自於投標的商品 (the product of the bid) 以及廣告點擊率
- on-line algorithm 的價值則是用一個月內(因為廣告商的預算通常以一個月為週期)獲取的總收入來衡量
- 所以對所有演算法要有統一的衡量標準:competitive ratio = the minimum total revenue for that algorithm, on any sequence of search queries / the revenue of the optimum off-line algorithm 該演算法的最小收入 / off-line algorithm 的最大收入
- 又回到 Google Adwords 要解決的問題
- A stream of queries arrives at the search engine: $q_{1}, q_{2}, …$ 今天搜尋引擎收到一個 query stream
- Several advertisers bid on each query 而 stream 裡每個 query 都有多個廣告商競標
- When query $q_{i}$ arrives, search engine must pick a subset of advertisers whose ads are shown 當 query 被送進來,搜尋引擎必須決定要 show 哪些廣告
- Goal: Maximize search engine’s revenues 目標當然是為搜尋引擎帶來最大收益
- Simple solution: Instead of raw bids, use the “expected revenue per click” (i.e., Bid*CTR) 去計算每次的廣告點擊能有多少的預期收入
- Expected revenue per click
- Budget: 每個廣告商都有預算的限制,搜尋引擎必須保證廣告的投放不會超過他們的預算
- CTR 是未知的:每個廣告都有不同的可能性會被點到
- Clickthrough rate (CTR) is measured historically
- Exploration vs. exploitation 是一個很困難的問題
- Exploit: Should we keep showing an ad for which we have good estimates of click-through rate 要持續曝光廣告使得它能擁有較好的 CTR
- Explore: Shall we show a brand new ad to get a better sense of its click-through rate 還是要放全新的廣告,讓這個廣告的CTR比較好預測
- Greedy Algorithm
- 假設在一個很簡單的環境下:每個 query 都只有一個廣告、每個廣告商的預算相同、每個廣告被點擊的機會一樣、每個廣告的價值也一樣
- For a query pick any advertiser who has bid 1 for that query
- Worst case: Two advertisers $A$ and $B$, $A$ bids on query $x$, $B$ bids on $x$ and $y$, same budgets of $4
- Query stream: $x x x x y y y y$
- Worst case greedy choice: $B B B B$ _ _ _ _—> $\mathrm{Competitive ratio} = \frac{4}{8} = \frac{1}{2}$
- Optimal: $A A A A B B B B$
- Greedy algorithm is deterministic – it always resolves draws in the same way
- 所以有人提出了 BALANCE Algoritm 來處理這些問題
BALANCE Algoritm
- 概念:For each query, pick the advertiser with the largest unspent budget 對於每次 query,選擇出價最高且剩餘預算最多的廣告商。如果多個廣告商剩餘預相同,則任意挑選一個,並盡可能的平衡所有廣告商的消費。
- 與 optimal 比較:Two advertisers $A$ and $B$, $A$ bids on query $x$, $B$ bids on $x$ and $y$, same budgets of $4
- Query stream: $x x x x y y y y$
- BALANCE choice: $A B A B B B$ _ _
- Optimal: $A A A A B B B B$
- In general, for BALANCE on 2 advertisers —> $\mathrm{Competitive ratio} = \frac{6}{8} = \frac{3}{4}$
- 另一個case (w.l.o.g.): 2 advertisers, $A_{1}$ and $A_{2}$, each with budget $B (\geq 1)$
- 最佳解會把兩方的預算都用完,而 BALANCE 則是必須用完至少一方的預算
- 如果用不完,可以分配更多的 queries. Whenever BALANCE makes a mistake (both advertisers bid on the query), advertiser’s unspent budget only decreases. Since optimal exhausts both budgets, one will for sure get exhausted
- Assume BALANCE exhausts $A_{2}$’s budget, but allocates $x$ queries fewer than the optimal
- Revenue: $BAL = 2B - x$
- In the general case, worst competitive ratio of BALANCE is $1–1/e = approx. 0.63$
- Balance Algorithm with Many Bidders
- There are $N$ advertisers, $A_{1}, A_{2}, … , A_{N}$. 有 $N$ 個廣告商
- Each advertiser has a budget $B = N$ 預算都是 $N$
- There are $N$ queries $q_{1}, q_{2}, … , q_{N}$. 有 $N$ 個queries
- Advertiser $A_{i}$ bids on queries $q_{1}, q_{2}, … , q_{i}$ and no other queries. 廣告商 $A_{i}$ 投錢在 $q_{1}, q_{2}, … , q_{i}$ 上
- The query sequence consists of $N$ rounds. The $i$th round consists of $B$ occurrences of query $q_{i}$ and nothing else. 查詢序列有 $N$ 個 round,第 $i$ 個 round 包含了 $B$ 個 $q_{i}$
- The optimum off-line algorithm assigns the $B$ queries _q__$i$ in the $i$th round to $A_{i}$ for all $i$. Thus, all queries are assigned to a bidder, and the total revenue of the optimum algorithm is $N$_$B$. 最佳的演算法會把第 $i$ 個 round 的 $B$ 個 $q_{i}$ 分配給 $A_{i}$ (for all $i$). 所以所有的 queries 都會被分配到一個廣告商,總收入會是 $N_{B}$
- Worst case for BALANCE
- $N$ advertisers: $A_{1}, A_{2}, … , A_{N}$, each with budget $B > N$
- Queries: $N \cdot B$ queries appear in $N$ rounds of ($B$ queries each)
- Bidding
- Round 1 queries: bidders $A_{1}, A_{2}, … , A_{N}$
- Round 2 queries: bidders $A_{2}, A_{3}, … , A_{N}$
- Round $i$ queries: bidders $A_{i}, A_{i+1}, … , A_{N}$
- Optimum allocation: Allocate round $i$ queries to $A_{i}$ ($B$ queries each)
- Optimum revenue $N \cdot B$
- BALANCE 分析
- So after the first $k = N(1-1/e)$ rounds, we cannot allocate a query to any advertiser
- $\mathrm{Revenue} = B \cdot N(1-1/e)$ (queries of the first $j$ rounds.)
- $\mathrm{Competitive ratio} = 1-1/e$
Generalized Version
- BALANCE 在 bids 是 0 或 1 的時候表現很好,但是實務上 bids 不可能只是 0 或 1,可能是任意值(預算也是任意的),而此時 BALANCE 就無法適當地表現 (fails to weight the sizes of the bids properly)
舉例:
- 有十個 $q$: 最佳演算法會把十個 $q$ 都給 $A_{2}$,然後得到 100 塊收入
- 但是 BALANCE 會把十個都給 $A_{1}$ 因為 $A_{1}$ 的預算比較高,但結果是只有 10 塊收入
- There is no competitive ratio higher than 0 that holds for the Balance Algorithm.
Slides
- from CS246 Mining Massive Data Sets, Stanford University
- http://web.stanford.edu/class/cs246/slides/advertising.pdf