We regard the reviewer’s review as a forecast of whether the paper will be accepted, rather than an input to the decision. This has the advantage that the reviewer is not required to make a value judgment on the paper; a forecast is just a guess as to whether the paper will be accepted. The obvious problem is that if the forecast gets used to help with the decision, it is at risk of becoming a self-fulfilling prophecy.
We model the paper selection process as follows. Any paper has an associated quantity p ∈ [0,1] that represents the probability with which it ought to be accepted, and furthermore, this value is revealed to a careful reader. In an ideal world, the PC passes the paper to a reviewer, who reads it and reports this number p to the PC, and the PC proceeds to accept the paper with probability p. (We assume that the PC has access to an unlimited source of randomness, which appears to be a realistic assumption.)
Suppose now that a reviewer knows in advance that his review will be ignored by the program committee, who will instead read the paper and accept it with the correct probability. In that case, if the reviewer reports probability p, he should be given a reward of log(p) if the paper is accepted, and log(1-p) if it is rejected. (These quantities are negative, but we do not claim that reviewing papers is rewarding.) This is the logarithmic scoring rule, which is strictly proper: these rewards incentivize the reviewer to report the correct probability.
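Here is a minimal numerical sketch of that claim (the true probability 0⋅7 below is just an illustrative choice): if the paper is accepted with the correct probability p, a report of x earns expected reward p log(x) + (1-p) log(1-x), and this peaks at x = p.

```python
import numpy as np

def expected_log_reward(report, true_p):
    # PC accepts with the *correct* probability true_p;
    # reviewer gets log(report) on accept, log(1 - report) on reject.
    return true_p * np.log(report) + (1 - true_p) * np.log(1 - report)

reports = np.linspace(0.01, 0.99, 99)
best = reports[np.argmax(expected_log_reward(reports, true_p=0.7))]
print(best)  # ~0.7: the honest report maximizes the expected reward
```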
Now, suppose that when the PC has received the review (the number p), they then read the paper with some probability r. If they read the paper, they accept with the correct probability, and if they don’t read it, they accept with probability p. The problem is that if r is very small, and the reviewer finds that the paper should be accepted with probability about 1/2, the above rewards tempt him to go to extremes and report (say) 0⋅01 or 0⋅99: when the PC doesn’t read the paper, the acceptance probability equals the reported number itself, so a report of x earns expected reward x log(x) + (1-x) log(1-x), which grows as x moves towards 0 or 1. Important note: the reward should depend only on the review (the reported p) and the (binary) acceptance decision, since you don’t want to reveal any secret discussions to the reviewer. So the PC doesn’t have the option to read the paper with some small probability and punish him if they then find he lied.
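A small sketch of this temptation (the read probabilities below and the true value 1/2 are illustrative): for r = 1 the best report is the truthful 1/2, but as r shrinks the optimal report drifts to an extreme.

```python
import numpy as np

def expected_reward(report, r, true_p=0.5):
    # With probability r the PC reads and accepts with true_p;
    # otherwise it accepts with the reported probability.
    accept = r * true_p + (1 - r) * report
    return accept * np.log(report) + (1 - accept) * np.log(1 - report)

reports = np.linspace(0.001, 0.999, 999)
for r in (1.0, 0.5, 0.05):
    best = reports[np.argmax(expected_reward(reports, r))]
    print(f"r = {r}: best report = {best:.3f}")
# r = 1.0 and r = 0.5 keep the optimum at 0.5, but at r = 0.05 the
# optimum jumps to an extreme report (the problem is symmetric, so
# reports near 0 and near 1 are equally tempting).
```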
Given 2 reviewers, we can exploit their professional rivalry to make them tell the truth, by carefully aggregating their forecasts into a single probability, as follows. A basic requirement for a reward scheme is that if a reviewer has obtained value p from a paper, and the other reviewer is truthfully reporting that value p, then the remaining reviewer should also have the incentive to do likewise. Suppose we use the logarithmic rewards above, and the PC uses the average of the 2 reported probabilities to decide on the paper. The following problem arises: suppose it’s a great paper and p=0⋅999. A reviewer might be tempted to report (say) p=0⋅5, since that way, the PC will use a probability of about 0⋅75 to accept the paper, exposing the other reviewer to a big risk of a large penalty (log(0⋅001) ≈ -6⋅9 if the paper is rejected). The assumption here is that a reviewer aims to get a higher reward than the other one (his professional rival), the reward being some sort of credit or esteem rather than money.
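A numerical sketch of that attack (using the numbers above, which are illustrative): the rival truthfully reports 0⋅999, the PC averages the two reports, and lowballing with 0⋅5 gives our reviewer a large expected advantage over the rival.

```python
import numpy as np

def advantage(report, rival=0.999):
    # PC accepts with the average of the two reports; both reviewers
    # are rewarded on the same (accept/reject) outcome.
    accept = (report + rival) / 2
    mine = accept * np.log(report) + (1 - accept) * np.log(1 - report)
    theirs = accept * np.log(rival) + (1 - accept) * np.log(1 - rival)
    return mine - theirs

print(advantage(0.999))  # 0.0: matching the rival gives no advantage
print(advantage(0.5))    # ~1.04: lowballing beats the truthful rival
```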
Let q be the other reviewer’s probability, and we seek a function f(p,q) that the PC should use as the probability of accepting the paper; we have noted that (p+q)/2, in conjunction with the logarithmic rewards, is not a good choice of f.
The reviewer’s utility u(p) is his expected reward minus his opponent’s expected reward:

u(p) = f(p,q)(log(p) - log(q)) + (1 - f(p,q))(log(1-p) - log(1-q))
We now notice that the above must be identically zero: the rivalry should not push the reviewer to move his report when q happens to be correct, nor should it push him to keep it when q is incorrect; this branch of the scheme has to be incentive-neutral. Setting the above to zero (writing X = log(p/q) and Y = log((1-p)/(1-q)), the condition fX + (1-f)Y = 0 gives f = -Y/(X-Y)) tells us the function f should be

f(p,q) = log((1-q)/(1-p)) / (log(p/q) + log((1-q)/(1-p)))
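In code (a sketch; the pairs below are arbitrary test values), this f lies between p and q, and it makes the utility difference vanish identically; at p = q the formula reads 0/0, but its limit is p, as discussed in the comments below.

```python
import numpy as np

def f(p, q):
    # The acceptance probability derived above; at p = q the formula
    # is 0/0, but its limit is p, so we special-case it.
    if p == q:
        return p
    num = np.log((1 - q) / (1 - p))
    return num / (np.log(p / q) + num)

def utility(p, q):
    # Reviewer's expected reward minus the rival's, when the PC
    # accepts with probability f(p, q).
    a = f(p, q)
    return a * (np.log(p) - np.log(q)) + \
           (1 - a) * (np.log(1 - p) - np.log(1 - q))

for p, q in [(0.3, 0.8), (0.999, 0.5), (0.6, 0.61)]:
    # f lies between p and q; utility is 0 up to floating point
    print(round(f(p, q), 4), utility(p, q))
```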
It just remains for the PC to read the paper with any probability ε>0, and in the event that they read it, accept with the correct probability. If they read the paper, the reviewers are incentivized to tell the truth, and if they don’t (and use the above f), the reviewers have no incentive to lie; so overall their incentive will indeed be to tell the truth.
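Putting the pieces together (a sketch; the true value 0⋅7 and ε = 0⋅1 are illustrative, and the rival is assumed to report truthfully): the read branch strictly rewards honesty and the f branch is neutral, so the truthful report maximizes the overall advantage.

```python
import numpy as np

def f(p, q):
    # Acceptance probability from the previous sketch.
    if p == q:
        return p
    num = np.log((1 - q) / (1 - p))
    return num / (np.log(p / q) + num)

def diff(report, other, accept):
    # Expected (my reward - rival's reward) for a given acceptance prob.
    return accept * (np.log(report) - np.log(other)) + \
           (1 - accept) * (np.log(1 - report) - np.log(1 - other))

def total_advantage(report, true_p=0.7, eps=0.1):
    # With prob eps the PC reads (accepts with true_p); otherwise it
    # accepts with f(report, true_p), the rival reporting truthfully.
    return eps * diff(report, true_p, true_p) + \
           (1 - eps) * diff(report, true_p, f(report, true_p))

reports = np.linspace(0.01, 0.99, 99)
best = max(reports, key=total_advantage)
print(best)  # ~0.7: the truthful report is optimal for any eps > 0
```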
(Added 6.6.11: at the iAGT workshop, I heard about 2 papers that relate (so now, an answer to the first comment below). “Only Valuable Experts Can Be Valued” by Moshe Babaioff, Liad Blumrosen, Nicolas S. Lambert and Omer Reingold, about contracts that will be accepted by self-proclaimed experts, provided that they really do have expert knowledge (and will be declined by a rational charlatan). And “Tight Bounds for Strategyproof Classification” by Reshef Meir, Shaull Almagor, Assaf Michaely and Jeff Rosenschein, about learning classifiers where the class labels of data have been provided by agents who may try to lie about the labels. The latter paper is closer to the “self-fulfilling prophecy” situation described above.)
(Added 22.8.11: This article on “decision fatigue” suggests another reason why it may be better to ask people to try to predict the outcome than to influence it (assuming you believe it puts less strain on someone to make a prediction than to make a decision). It does sometimes stress me out a bit to make an accept/reject recommendation for a paper.)
6 comments:
It sounds like an interesting algorithmic mechanism design problem. Is there any literature related to it?
Special case p=q: f(p,q)=0/0.
I am guessing f(p,p)=p, for all p?
Betfair for academic venues? So, we are going to have a prediction market committee for the conferences? :-)
"Big bucks this year in ACM EC... Many underpriced good papers... Get in while it is still possible..."
Troels, I guess I was assuming that no-one would ever use the values 0 and 1. (so, should have used the open interval (0,1), not [0,1], sorry about that.)
Indeed, f(p,p)=p (the reviewers would be annoyed if they both submit the same p and the PC uses a different one.)
The special case is for the entire interval. Whenever the two reviewers report the same probability of accept, both the numerator and denominator of f evaluate to 0.
Troels, stop making difficulties. If p=q we obviously should use p and there's no need for a complicated formula. If they differ, f is well-defined and produces something between p and q.