Spring 2016 Massive Data Analysis Lecture Notes
Ch8. Advertising on the Web
Instructor: Jia-Shung Wang



Advertising Opportunities and Online Algorithms

  • post ads directly, displayed ads, search ads
  • Each of these methods, and many others like these, raise enormous privacy issues 各類型的線上廣告衍伸出隱私問題
  • Online Algorithms
    • Classic model of algorithms
      • You get to see the entire input, then compute some function of it
      • In this context, “offline algorithm”
    • Online Algorithms
      • You get to see the input one piece at a time, and need to make irrevocable(經決定後不可改變的) decisions along the way
      • Similar to the data stream model

Online Bipartite Matching

  • Bipartite Matching:一張二分圖上的匹配稱作二分匹配,理所當然所有的匹配邊都是這橫跨這兩群點的邊,就像是連連看一樣。
  • Max. Matching 最大匹配:二分圖G的所有匹配中,edges數量最多的匹配
  • p.7-9動畫:
    • 男生一群、女生一群,p.7 的連線指的是他們彼此間的 Preferences,而目標是要找到一個 match 方式使得男生女生都可以剛剛好連到有 preference 的對象 (Match boys to girls so that maximum number of preferences is satisfied)
    • 先連 2-b 和 4-a,因為 2 和 4 除了 b 和 a 以外沒有其他的 preference 了;接著連 1-c 和 3-d(不能重複連到女生,一個配舞伴的概念,一個女生不會有兩個舞伴)
  • Matching Algorithm
    • Problem:Find a maximum matching for a given bipartite graph
    • 第一步:初始化,只有男生組
    • 第二步開始:在每個round,決定其中一個女生的選擇 (girl’s edges are revealed)
      • At that time, we have to decide to either 兩種選擇
        • Pair the girl with a boy 配對
        • Do not pair the girl with any boy 不配對
  • Greedy algorithm for the online graph matching problem: 把女生配給任何有資格的男生 (If there is none, do not pair girl) 一開始就把一些明顯可以配對的點給配對起來(能連的先連一連),這樣就不必替每一個未匹配點建立交錯樹了。這個手段在圖很龐大的時候可以發揮作用。
  • 最佳配對方法與 greedy algorithm 的比較

    • Competitive Ratio: For input $I$, suppose greedy produces matching $M_{greedy}$ while an optimal matching is $M_{opt}$, Competitive ratio = $\textrm{min_all_possible_inputs } I (\left |M_{greedy} \right | / \left |M_{opt} \right |)$ —> greedy’s worst performance over all possible inputs $I$
    • Consider a case: $M_{greedy} ≠ M_{opt}$ 當 greedy 所算出來的配對不是最佳時

      • Consider the set $G$ of girls matched in $M_{opt}$ but not in $M_{greedy}$ 女生配到的男生不是最佳的
      • Then every boy $B$ adjacent to girls in $G$ is already matched in $M_{greedy}$
      • If there would exist such non-matched (by $M_{greedy}$) boy adjacent to a non-matched girl then greedy would have matched them
      • Since boys $B$ are already matched in $M_{greedy}$ then $\left |M_{greedy} \right | \geq \left | B \right |$
      • Summary so far: $G$ 表示在 $M$ 中存在 $M_{opt}$,但在 $M_{greedy}$ 中不存在的右節點的集合,$B$ 表示 $G$ 中有節點的鄰接邊的左節點的集合 —> $\left |M_{greedy} \right | \geq \left | B \right |$

      • There are at least $\left | G \right |$ such boys ($\left | G \right | \leq \left | B \right |$) otherwise the optimal algorithm couldn’t have matched all girls in $G$. So $\left | G \right | \leq \left | B \right | \leq \left |M_{greedy} \right |$

      • By definition of $G$ also: $\left |M_{opt} \right | \leq \left |M_{greedy} \right | + \left | G \right |$
      • Worst case is when $\left | G \right |$ = $\left | B \right |$ = |$M_{greedy}$|
      • $\left |M_{opt} \right | \leq 2 \left |M_{greedy} \right |$ then $\left |M_{greedy} \right | / \left |M_{opt} \right | \geq \frac{1}{2}$

Web Advertising

  • Google’s Adwords System
    • Show only a limited number of ads with each query 針對每次搜尋,Google 只會顯示一部分廣告
    • Decide which ads to show, as well as the order in which to show them 所以Google必須決定哪些廣告要show以及show的順序
    • Users of the Adwords system specified a budget 而每個廣告商都有一定的預算,這就是 Adwords 要解決的問題
    • The revenue of a selection of ads is the total value of the ads selected 廣告收入來自於廣告的價值
      • 而廣告的價值來自於投標的商品 (the product of the bid) 以及廣告點擊率
      • on-line algorithm 的價值則是用一個月內(因為廣告商的預算通常以一個月為週期)獲取的總收入來衡量
      • 所以對所有演算法要有統一的衡量標準:competitive ratio = the minimum total revenue for that algorithm, on any sequence of search queries / the revenue of the optimum off-line algorithm 該演算法的最小收入 / off-line algorithm 的最大收入
    • 又回到 Google Adwords 要解決的問題
      • A stream of queries arrives at the search engine: $q_{1}, q_{2}, …$ 今天搜尋引擎收到一個 query stream
      • Several advertisers bid on each query 而 stream 裡每個 query 都有多個廣告商競標
      • When query $q_{i}$ arrives, search engine must pick a subset of advertisers whose ads are shown 當 query 被送進來,搜尋引擎必須決定要 show 哪些廣告
      • Goal: Maximize search engine’s revenues 目標當然是為搜尋引擎帶來最大收益
      • Simple solution: Instead of raw bids, use the “expected revenue per click” (i.e., Bid*CTR) 去計算每次的廣告點擊能有多少的預期收入
    • Expected revenue per click
      • Budget: 每個廣告商都有預算的限制,搜尋引擎必須保證廣告的投放不會超過他們的預算
      • CTR 是未知的:每個廣告都有不同的可能性會被點到
        • Clickthrough rate (CTR) is measured historically
        • Exploration vs. exploitation 是一個很困難的問題
          • Exploit: Should we keep showing an ad for which we have good estimates of click-through rate 要持續曝光廣告使得它能擁有較好的 CTR
          • Explore: Shall we show a brand new ad to get a better sense of its click-through rate 還是要放全新的廣告,讓這個廣告的CTR比較好預測
    • Greedy Algorithm
      • 假設在一個很簡單的環境下:每個 query 都只有一個廣告、每個廣告商的預算相同、每個廣告被點擊的機會一樣、每個廣告的價值也一樣
      • For a query pick any advertiser who has bid 1 for that query
      • Worst case: Two advertisers $A$ and $B$, $A$ bids on query $x$, $B$ bids on $x$ and $y$, same budgets of $4
        • Query stream: $x x x x y y y y$
        • Worst case greedy choice: $B B B B$ _ _ _ _—> $\mathrm{Competitive ratio} = \frac{4}{8} = \frac{1}{2}$
        • Optimal: $A A A A B B B B$
      • Greedy algorithm is deterministic – it always resolves draws in the same way
    • 所以有人提出了 BALANCE Algoritm 來處理這些問題

BALANCE Algoritm

  • 概念:For each query, pick the advertiser with the largest unspent budget 對於每次 query,選擇出價最高且剩餘預算最多的廣告商。如果多個廣告商剩餘預相同,則任意挑選一個,並盡可能的平衡所有廣告商的消費。
  • 與 optimal 比較:Two advertisers $A$ and $B$, $A$ bids on query $x$, $B$ bids on $x$ and $y$, same budgets of $4
    • Query stream: $x x x x y y y y$
    • BALANCE choice: $A B A B B B$ _ _
    • Optimal: $A A A A B B B B$
    • In general, for BALANCE on 2 advertisers —> $\mathrm{Competitive ratio} = \frac{6}{8} = \frac{3}{4}$
  • 另一個case (w.l.o.g.): 2 advertisers, $A_{1}$ and $A_{2}$, each with budget $B (\geq 1)$
    • 最佳解會把兩方的預算都用完,而 BALANCE 則是必須用完至少一方的預算
    • 如果用不完,可以分配更多的 queries. Whenever BALANCE makes a mistake (both advertisers bid on the query), advertiser’s unspent budget only decreases. Since optimal exhausts both budgets, one will for sure get exhausted
    • Assume BALANCE exhausts $A_{2}$’s budget, but allocates $x$ queries fewer than the optimal
    • Revenue: $BAL = 2B - x$
  • In the general case, worst competitive ratio of BALANCE is $1–1/e = approx. 0.63$
  • Balance Algorithm with Many Bidders
    • There are $N$ advertisers, $A_{1}, A_{2}, … , A_{N}$. 有 $N$ 個廣告商
    • Each advertiser has a budget $B = N$ 預算都是 $N$
    • There are $N$ queries $q_{1}, q_{2}, … , q_{N}$. 有 $N$ 個queries
    • Advertiser $A_{i}$ bids on queries $q_{1}, q_{2}, … , q_{i}$ and no other queries. 廣告商 $A_{i}$ 投錢在 $q_{1}, q_{2}, … , q_{i}$ 上
    • The query sequence consists of $N$ rounds. The $i$th round consists of $B$ occurrences of query $q_{i}$ and nothing else. 查詢序列有 $N$ 個 round,第 $i$ 個 round 包含了 $B$ 個 $q_{i}$
    • The optimum off-line algorithm assigns the $B$ queries _q__$i$ in the $i$th round to $A_{i}$ for all $i$. Thus, all queries are assigned to a bidder, and the total revenue of the optimum algorithm is $N$_$B$. 最佳的演算法會把第 $i$ 個 round 的 $B$ 個 $q_{i}$ 分配給 $A_{i}$ (for all $i$). 所以所有的 queries 都會被分配到一個廣告商,總收入會是 $N_{B}$
  • Worst case for BALANCE
    • $N$ advertisers: $A_{1}, A_{2}, … , A_{N}$, each with budget $B > N$
    • Queries: $N \cdot B$ queries appear in $N$ rounds of ($B$ queries each)
    • Bidding
      • Round 1 queries: bidders $A_{1}, A_{2}, … , A_{N}$
      • Round 2 queries: bidders $A_{2}, A_{3}, … , A_{N}$
      • Round $i$ queries: bidders $A_{i}, A_{i+1}, … , A_{N}$
    • Optimum allocation: Allocate round $i$ queries to $A_{i}$ ($B$ queries each)
    • Optimum revenue $N \cdot B$
  • BALANCE 分析
    • So after the first $k = N(1-1/e)$ rounds, we cannot allocate a query to any advertiser
    • $\mathrm{Revenue} = B \cdot N(1-1/e)$ (queries of the first $j$ rounds.)
    • $\mathrm{Competitive ratio} = 1-1/e$
  • Generalized Version

    • BALANCE 在 bids 是 0 或 1 的時候表現很好,但是實務上 bids 不可能只是 0 或 1,可能是任意值(預算也是任意的),而此時 BALANCE 就無法適當地表現 (fails to weight the sizes of the bids properly)
    • 舉例:

      • 有十個 $q$: 最佳演算法會把十個 $q$ 都給 $A_{2}$,然後得到 100 塊收入
      • 但是 BALANCE 會把十個都給 $A_{1}$ 因為 $A_{1}$ 的預算比較高,但結果是只有 10 塊收入
      • There is no competitive ratio higher than 0 that holds for the Balance Algorithm.

Slides