ARTICLE IN PRESS. Contents lists available at ScienceDirect. Electronic Commerce Research and Applications


Forecasting market prices in a supply chain game ☆

Christopher Kiekintveld a,*, Jason Miller b, Patrick R. Jordan a, Lee F. Callender a, Michael P. Wellman a

a Computer Science and Engineering, University of Michigan, 2260 Hayward Avenue, Ann Arbor, MI 48109-2121, USA

b Department of Mathematics, Stanford University, Building 380, Stanford, CA 94305, USA

Article info

Article history:
Received 17 October 2007
Received in revised form 4 November 2008
Accepted 5 November 2008
Available online xxxx

Keywords: Forecasting; Markets; Price prediction; Trading agent competition; Supply chain management; Machine learning

Abstract

Predicting the uncertain and dynamic future of market conditions on the supply chain, as reflected in prices, is an essential component of effective operational decision-making. We present and evaluate methods used by our agent, Deep Maize, to forecast market prices in the trading agent competition supply chain management game (TAC/SCM). We employ a variety of machine learning and representational techniques to exploit as many types of information as possible, integrating well-known methods in novel ways. We evaluate these techniques through controlled experiments as well as performance in both the main TAC/SCM tournament and supplementary Prediction Challenge. Our prediction methods demonstrate strong performance in controlled experiments and achieved the best overall score in the Prediction Challenge.

1. Introduction

The quality of economic decisions made now depends on market conditions that will prevail in the future. This is particularly true of supply chain environments, which evolve dynamically based on complex interactions among multiple players across several tiers. We present and evaluate methods of forecasting market conditions developed to predict prices in the trading agent competition supply chain management game (TAC/SCM). In TAC/SCM, automated trading agents compete to maximize their profits in a simulated supply chain environment. They face many challenges, including strategic interactions and uncertainty about market conditions. Though complex, this domain offers a relatively controlled environment amenable to extensive experimentation. TAC also attracts a diverse pool of participants, which facilitates benchmarking and evaluation. This is particularly important for prediction tasks, where the quality of predictions depends in part on the actions of opponents. Research competitions such as TAC offer the opportunity to test predictions in environments where the opposing agents are designed and selected by others, mitigating an important source of potential bias relative to using opponents of one's own design (Wellman et al., 2007).

Our agent for the SCM game, Deep Maize, relies on price predictions in both the upstream and downstream markets to make purchasing, production, and sales decisions. Fig. 1 shows an overview of the daily decision process. Predictions about market conditions feed into an approximate optimization procedure that uses values to decompose the decision problem. The architecture is described in detail elsewhere (Kiekintveld et al., 2006). Since our predictions are integrated into a task-situated agent, we can evaluate the benefits of improved predictions on overall performance in addition to prediction error metrics.

We apply machine learning to forecast prices in two different markets with distinct negotiation mechanisms and information-revelation rules. As in many real markets, the available information on market conditions comes from a variety of heterogeneous sources. Integrating different learning techniques and representations to exploit all of the available information is a central theme of our approach. We find that representing the current market conditions well is particularly important for effective learning.

We begin with a discussion of related work and an overview of the TAC/SCM game, highlighting the benefits and drawbacks of the different forms of information available for making predictions. The first set of methods we describe is used for predicting prices in the downstream consumer market. Central to all of these [...] game data to induce the relationship between market indicators and prices. We test variations that employ two different representations for price distributions: an absolute representation, and a relative representation that exploits local price stability. Finally, we develop an online learning procedure that combines and shifts predictions using additional observations within a game instance. Our evaluation of these methods includes both prediction error and supply chain performance. We show substantial improvements over a baseline predictor, with online adaptation exhibiting especially strong performance improvements.

doi:10.1016/j.elerap.2008.11.005

Journal homepage: www.elsevier.com/locate/ecra

☆ This is a revised and substantially extended version of a paper presented at the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (Kiekintveld et al., 2007).

* Corresponding author. Tel.: +1 734 818 0259.

E-mail addresses: [email protected] (C. Kiekintveld), [email protected] (J. Miller), [email protected] (P.R. Jordan), [email protected] (L.F. Callender), [email protected] (M.P. Wellman).

URLs: http://teamcore.usc.edu/kiekintveld (C. Kiekintveld), http://www.eecs.umich.edu/~prjordan (P.R. Jordan), http://ai.eecs.umich.edu/people/wellman (M.P. Wellman).

We proceed to discuss methods for making predictions in the market for component supplies. There are two elements to our approach in this market. The first is a linear interpolation model that uses recent observations to make relatively accurate predictions about the current spot market. The second is a learning model that regresses prices against market indicators. This learning model can be used to make absolute price predictions, or to predict residuals for the linear interpolation model. We show that combining the linear interpolation model for current market prices with residual predictions improves overall performance, particularly for long-term predictions.

We close with a discussion of the performance of both prediction algorithms based on the results of TAC/SCM tournaments and the inaugural TAC/SCM Prediction Challenge. Deep Maize has been a perennial finalist in the primary competition, and placed first in the Prediction Challenge. Our forecasting methods demonstrate strong performance on prediction error measures and contribute to strong overall performance in the context of decision-making.

2. Related work

Forecasting is studied across many disciplines, and many approaches have been developed for making predictions based on historical data. We do not attempt to survey all of the various methods here, but refer the reader to one of many introductory texts that describe exponential smoothing, GARCH, ARIMA, spectral analysis, neural networks, Kalman filters, and other popular methods (Brockwell and Davis, 1996; Chatfield, 2001; Hastie et al., 2001). Pindyck and Rubinfeld (1998) is a good introduction to many forecasting methods used in the context of economic predictions and modeling. The proliferation of techniques is in part an indication that the best method for any particular problem may depend on many factors, including the type and quantity of information available and the properties of the domain.

Many applications focus on comparing methods to identify the most effective approach for a particular problem (for example, see the application of forecasting to retail sales by Alon et al. or the application to predict electricity prices by Nogales et al. (2002)). Our approach for price prediction focuses on combining predictions made using different sources of information and methods. Various methods have been developed for combining predictions, often motivated by the problem of combining the opinions of experts (Bates and Granger, 1969; Clemen and Winkler, 1999). Recently, prediction markets have become a popular method for aggregating information (Hanson, 2003). Our approach shares some features with these markets, including the use of logarithmic scoring rules to evaluate the quality of predictions.

Price forecasting is a very important element of decision-making in a wide variety of market settings, including financial markets (Kaufman, 1998) and energy markets (Nogales et al., 2002). The rise of internet auctions has led to an increasing interest in computational methods for price forecasting in auction environments (e.g., Ghani, 2005). Brooks et al. (2002) discuss many issues related to learning price models in the context of online markets. Price prediction methods have also been used as the basis for successful bidding strategies in many specific types of auctions, including simultaneous ascending auctions (Osepayshvili et al., 2005). It is not surprising then that price prediction has played an important role in the design of agents for the trading agent competition since its inception. For the TAC travel game (Wellman et al., 2007), a crucial factor in agent performance was the ability to predict the closing price of auctions for hotel rooms. Participants explored many approaches for making these predictions, spanning statistical analysis, machine learning, and economic modeling (Cheng et al., 2005; Stone et al., 2003; Wellman et al., 2004). Post-tournament analysis revealed that the common thread among the most effective methods was not the particular technique applied, but rather the information taken into account by the predictor (Wellman et al., 2004); specifically, the best predictors accounted for the initial auction prices.

Most agents for the TAC/SCM game also employ some form of price prediction, but vary significantly both in what quantities are predicted and the methods used for making predictions. Many agents only estimate prices for the current simulation day, and do not attempt to forecast changes in prices over time (Benisch et al., 2004, 2006; Burke et al., 2006; He et al., 2006). One of the most successful agents in the tournament, TacTex, has applied a variety of sophisticated approaches for price prediction. The specific methods have evolved in successive tournaments, but generally rely on a combination of statistical machine learning applied to historical data and online adaptation (Pardoe and Stone, 2005, 2006, 2007a,b). The most recent version of TacTex specifically uses machine learning to predict both current and future prices. The MinneTAC team has also devoted significant effort to developing prediction methods for the TAC/SCM domain (Ketter, 2007; Ketter et al., 2007). Their approach is somewhat unique in that it focuses on predicting changes in the economic regimes, which reflect over-supply, under-supply, or other characteristic market conditions. Our use of market state is similar in spirit to the idea of an economic regime, but we do not distinguish among a discrete set of possible regimes. The first version of Deep Maize used an analytic solution based on competitive equilibrium to predict market prices (Kiekintveld et al., 2004). This approach became infeasible when the game rules were modified, due to the greater complexity of the supplier pricing model. Another unique approach was used by RedAgent, the winner of the first TAC/SCM competition (Keller et al., 2004). This agent used an internal market to coordinate decisions and make predictions; to our knowledge no current agents use this approach.

[Figure 1: block diagram with the labels State Estimation, Long-term Production Schedule, Supply Market Prediction, Customer Market Prediction, Market Predictions, Component Values, PC Values, Customer Sales, Supply Purchasing, Factory Schedule, High-level Decisions, and Low-level Decisions.]

Fig. 1. Overview of Deep Maize's decision process on each TAC/SCM day.


3. The TAC supply chain game

Agents in TAC/SCM play the role of personal computer (PC) manufacturers (Collins et al., 2007; Eriksson et al., 2006).¹ The six automated agents compete to maximize their profits on a simulated supply chain, shown in Fig. 2. Agents assemble 16 different types of PCs for sale, defined by compatible combinations of CPU, motherboard, hard disk, and memory components. Each day agents negotiate with suppliers to purchase components, negotiate with customers to sell finished products, and create manufacturing and shipping schedules for the next day. Each agent has an identical factory with limited capacity for production. Each game has 220 simulated days, each of which takes 15 s of real time.

3.1. The personal computer market

The 16 PC types are separated into three market segments representing high-, mid-, and low-grade PCs. The average demand level for each segment varies separately according to a random walk with a stochastic trend. Each day, the customer demand process generates a set of RFQs (requests for quotes) representing the aggregate demand. Each customer request is valid for a single day and specifies a PC type, due date, reserve price, and penalty for late delivery. The negotiations for these requests use a simultaneous sealed-bid auction mechanism. Agents submit price offers for a subset of the RFQs. The lowest bidder for each request is awarded the order, and must deliver the product by the due date or pay the penalty specified in the contract. Due to the configuration of the market, there are strong interactions among auctions both within and across days.
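The clearing rule for a single customer RFQ can be sketched as follows. This is only an illustration, not the game server's implementation: the function name `clear_rfq`, the rejection of bids above the reserve price, and the alphabetical tie-break are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def clear_rfq(reserve_price: float, bids: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (winning agent, price) for one RFQ, or None if no bid qualifies.

    Assumption: a bid above the customer's reserve price is not eligible.
    """
    eligible = {agent: price for agent, price in bids.items() if price <= reserve_price}
    if not eligible:
        return None
    # The lowest offer wins; ties are broken by agent name (an assumption,
    # since the text does not say how ties are resolved).
    winner = min(eligible, key=lambda agent: (eligible[agent], agent))
    return winner, eligible[winner]


# Three agents bid on an RFQ with a reserve price of 2000.
result = clear_rfq(2000.0, {"A": 1900.0, "B": 1850.0, "C": 2100.0})
```

Because every RFQ is cleared independently but all draw on the same factory capacity and inventory, an agent's bid on one request changes what it can profitably offer on the others, which is one source of the interactions among auctions noted above.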

3.2. The component market

To procure components to build PCs, agents must negotiate with eight suppliers whose behavior is defined in the game specification. Each supplier sells two distinct grades of a single component type. CPU components have a unique supplier, whereas all other types are provided by two suppliers. Suppliers have a limited daily production capacity for each component they sell. These capacities vary according to a mean-reverting random walk, and are not directly observable by the manufacturers.

Manufacturers negotiate with suppliers via an RFQ mechanism. Each day, a manufacturer may send up to five RFQs per component to each supplier specifying type, due date, quantity, and reserve price. Any future day may be specified as the due date. The suppliers respond the following day with offers specifying a proposed quantity, price, and delivery date.² Manufacturers may opt to accept a subset of the offers by responding with orders. Suppliers determine offer prices based on the ratio of their committed and available production capacity. This pricing function accounts for inventory, previous commitments, future capacity, and the demand represented in the current set of customer RFQs. In general, component prices rise as demand rises and suppliers become more constrained. In some cases, the supplier may be capacity constrained and unable to offer components at any price. Prices depend on both the requested due date and the day the request is made. Price predictions must be made for this entire space of possibilities to support important decisions about the optimal time to make component purchases.

3.3. Observable market data

Agents may utilize several complementary forms of evidence to make predictions about market prices. Each provides unique insights into current activity and how the market may evolve. The first category of evidence comprises observations of the markets during a game instance. This information is the most timely and specific to the current situation, but is of relatively low fidelity. More detailed observations of market activity are available in game logs after each game instance has completed.³ Logs from previous rounds are publicly available, and additional data may be generated in private simulations.

During a game, agents receive daily observations from participating in the PC and component markets. In the PC market, agents observe the set of requests and whether or not each bid resulted in an order. They also receive a daily report for each PC type listing the maximum and minimum selling price from the previous day. The details of individual market transactions and opponents' bids are not revealed. In the component market, agents receive price quotes for each request they send, some of which may be price probes. Neither the supplier's state (inventory, capacity, and commitments) nor other agents' requests are revealed. Every 20 simulation days, manufacturers receive a report containing aggregate market data for the preceding 20-day period. This data includes the mean selling price and quantity sold for each PC type, and the mean selling price, quantity ordered, and quantity shipped for each component type. The report also includes the mean production capacity for each supply line. The importance of this evidence is that it is specific to the current market conditions and current set of manufacturing agents. However, these observations say relatively little about how market conditions are likely to evolve, especially if similar conditions have not been observed earlier in the game.

The historical logs reveal the entire transaction history in each market (including all bids, offers, and orders), as well as supplier capacities and other state attributes hidden during a game instance. This precise information is potentially very useful for predicting the evolution of market prices over time. However, conditions in the current game are unlikely to be identical to conditions in historical games. A particularly difficult issue is that the pool of competing agents is constantly changing as teams update their agents, so historical data may not be representative of the current strategic environment. Selecting relevant historical data is an important challenge.

[Figure 2: supply chain diagram. Labels: suppliers Pintel, IMD, Basus, Macrostar, Mec, Queenmax, Watergate, and Mintor; component types CPU, Motherboard, Memory, and Hard Disk; manufacturers Agent 1 through Agent 6; Customer; message flows component RFQ, supplier offer, offer acceptance, PC RFQ, and bid.]

Fig. 2. TAC/SCM supply chain configuration.

¹ We present an overview of the game rules here. The full rule specifications for each competition, tournament results, and general TAC information are available at http://www.sics.se/tac.

² Suppliers may offer either a partial quantity or a later delivery if they are unable to meet the initial request due to capacity constraints.

³ A rule change in 2007 explicitly disallows agents from collecting and utilizing these logs during a tournament round, but logs from previous rounds or simulations may still be used.


3.4. The Prediction Challenge

The 2007 TAC event introduced two challenge competitions to complement the original TAC/SCM scenario. In the TAC/SCM Prediction Challenge (Pardoe, 2007), agents compete to make the most accurate price predictions in four categories: short- and long-term, for both component and PC markets (see Table 1). The challenge isolates the prediction component of the game by having participants make predictions on behalf of the same designated agent, PAgent. Simulations including a PAgent are run before the challenge to generate log files. During the challenge, the log files are used to simulate the experience of playing the game for the predictors by providing the exact sequence of messages observed by PAgent during the game. The predictions made based on this information by competitors in the challenge do not affect the behavior of the PAgent. This simulation framework allows all predictors to make an identical set of predictions, and also allows predictions to be made many times for the same set of game instances. The predictions are scored against actual prices using the root mean squared (RMS) error, normalizing by component and PC base prices.
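As a rough illustration of the scoring rule, the sketch below computes an RMS error after dividing each prediction error by the base price of the item concerned. The exact normalization used by the challenge software is not given in the text, so dividing each error by its base price before squaring is an assumption, as is the function name `normalized_rms_error`.

```python
import math
from typing import Sequence


def normalized_rms_error(
    predicted: Sequence[float],
    actual: Sequence[float],
    base_prices: Sequence[float],
) -> float:
    """RMS of prediction errors, each expressed as a fraction of the item's base price."""
    if not predicted or not (len(predicted) == len(actual) == len(base_prices)):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [((p - a) / b) ** 2 for p, a, b in zip(predicted, actual, base_prices)]
    return math.sqrt(sum(squared) / len(squared))


# Two predictions, each off by 10% of the corresponding base price.
score = normalized_rms_error([1100.0, 450.0], [1000.0, 500.0], [1000.0, 500.0])
```

Normalizing in this way puts errors on cheap components and expensive PCs on a common scale, so that no single product dominates the score.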

Each round of the challenge used three sets of 16 game instances, with different agents competing with PAgent on the supply chain. The agents comprising the competitive environment were selected from those in the public agent repository.⁴ Since tests can be run independently and repeated, the software framework for the challenge also provides a useful toolkit for conducting one's own prediction tests. We took advantage of this in developing and testing new prediction methods for the component market in preparation for the 2007 TAC/SCM tournament and Prediction Challenge.

4. Methods for PC market predictions

Each day, Deep Maize estimates the underlying demand state in the customer market and creates predictions of the price curves it faces.⁵ For the current simulation day, the set of requests is known. Estimating the price curve in this case is equivalent to estimating the probability that a given bid on an RFQ will result in an order, denoted $\Pr(\mathrm{win} \mid \mathrm{bid})$. To predict the price curve for future simulation days, it is also necessary to predict the set of requests that will be generated. We combine predictions about the set of requests and the distribution of selling prices for each request (i.e., $\Pr(\mathrm{win} \mid \mathrm{bid})$) to estimate the effective demand curve for each future day. The effective demand curve represents the expected marginal selling prices for each additional PC sold. The agent uses these forecast demand curves to make customer bidding decisions and create a projected manufacturing schedule. We begin by discussing our methods for estimating a key element of the game state, and proceed with a discussion of our methods for predicting customer market prices.

4.1. Customer demand projection

The customer demand processes that determine the number of RFQs to generate exert a great influence on market prices in the SCM game. There are three stochastic demand processes, corresponding to the respective market segments. Each process is captured by two state variables: a mean $Q$ and a trend $\tau$. The number of RFQs generated in each market segment on day $d$ is determined by a draw from a Poisson distribution with mean $Q_d$. The initial demand levels for each segment are drawn uniformly from known intervals. The mean demand evolves according to the daily update rule

$$Q_{d+1} = \min(Q_{\max}, \max(Q_{\min}, \tau_d Q_d)). \tag{1}$$

Both the minimum and maximum values of $Q$ are given for each market segment in the specification. The trend $\tau$ is initially neutral, $\tau_0 = 1$, and is updated by the stochastic perturbation $\epsilon \sim U[-0.01, 0.01]$:

$$\tau_{d+1} = \max(0.95, \min(1/0.95, \tau_d + \epsilon)). \tag{2}$$

When the demand $Q$ reaches a maximum or minimum value, the trend resets to the neutral level.
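The demand process of Eqs. (1) and (2) can be simulated in a few lines. The sketch below is illustrative only: the function name `simulate_demand` and the bounds and starting level used in the example call are assumptions (the actual per-segment values come from the game specification, which is not reproduced here), and the Poisson draw of the daily RFQ count is omitted so that only the hidden state $(Q_d, \tau_d)$ is tracked.

```python
import random
from typing import List, Tuple


def simulate_demand(
    q0: float, q_min: float, q_max: float, days: int, seed: int = 0
) -> List[Tuple[float, float]]:
    """Return the hidden state (Q_d, tau_d) for each simulated day."""
    rng = random.Random(seed)
    q, tau = q0, 1.0  # the trend starts at the neutral level
    path = []
    for _ in range(days):
        path.append((q, tau))
        q_next = min(q_max, max(q_min, tau * q))       # Eq. (1)
        eps = rng.uniform(-0.01, 0.01)                 # stochastic perturbation
        tau = max(0.95, min(1 / 0.95, tau + eps))      # Eq. (2)
        if q_next <= q_min or q_next >= q_max:
            tau = 1.0                                  # reset at the demand bounds
        q = q_next
    return path


# Hypothetical segment parameters, chosen only for illustration.
path = simulate_demand(q0=100.0, q_min=25.0, q_max=200.0, days=220)
```

Because the trend changes by at most 0.01 per day, demand tends to drift in one direction for extended periods, which is what makes an estimate of $\tau$ informative about future demand.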

The state variables $Q$ and $\tau$ are hidden from the manufacturers, but can be estimated based on the observed demand. We estimate the underlying demand process using a Bayesian model, illustrated in Fig. 3.⁶ Note that the updated demand, $Q_{d+1}$, is a deterministic function of $Q_d$ and $\tau_d$, as specified in Eq. (1). This deterministic relationship simplifies the update of our beliefs about the demand state given a new observation. Let $\Pr(Q_d, \tau_d)$ denote the probability distribution over demand states given our observations through day $d$. We can incorporate a new observation $\hat{Q}_{d+1}$ as follows:

$$\Pr(Q_d, \tau_d \mid \hat{Q}_{d+1}) \propto \Pr(\hat{Q}_{d+1} \mid Q_d, \tau_d)\,\Pr(Q_d, \tau_d). \tag{3}$$

Eq. (3) exploits the fact that the observation at $d+1$ is conditionally independent of prior observations given the demand state $(Q_d, \tau_d)$. We can elaborate the first term of the RHS using the demand update (1):

$$\Pr(\hat{Q}_{d+1} \mid Q_d, \tau_d) = \mathrm{Poisson}(\min(Q_{\max}, \max(Q_{\min}, \tau_d Q_d))).$$

After calculating the RHS of Eq. (3) for all possible demand states, we normalize to recover the updated probability distribution. We can then project forward our beliefs over $(Q_d, \tau_d)$ using the demand and trend update Eqs. (1) and (2), to obtain a probability distribution over $(Q_{d+1}, \tau_{d+1})$.

A sequence of updates by Eq. (3) represents an exact posterior distribution over demand state given a series of observations. To maintain a finite encoding of this joint distribution, we consider only integer values for demand and divide the trend range into $T$ discrete values, evenly spaced (Deep Maize uses $T = 9$ for tournament play). Given a distribution over demand states, we can project this distribution forward to estimate future demand.
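A minimal sketch of this discretized filter follows. It is not the released Deep Maize source code. The demand bounds, the rounding of projected demand to an integer, and the three-point kernel that stands in for the $U[-0.01, 0.01]$ trend perturbation on the coarse trend grid are all assumptions made for illustration, as are the names `observe`, `project`, and `uniform_prior`.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Q_MIN, Q_MAX = 25, 100           # hypothetical segment bounds, not taken from the specification
T = 9                            # number of discrete trend values, as in the text
TAUS = [0.95 + i * (1 / 0.95 - 0.95) / (T - 1) for i in range(T)]
NEUTRAL = min(range(T), key=lambda i: abs(TAUS[i] - 1.0))  # grid point closest to tau = 1

Belief = Dict[Tuple[int, int], float]  # (Q, trend index) -> probability


def poisson_pmf(n: int, mean: float) -> float:
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def uniform_prior() -> Belief:
    """Demand uniform over its range, trend at the neutral level."""
    n_states = Q_MAX - Q_MIN + 1
    return {(q, NEUTRAL): 1.0 / n_states for q in range(Q_MIN, Q_MAX + 1)}


def observe(belief: Belief, n_rfqs: int) -> Belief:
    """Eq. (3): reweight each (Q_d, tau_d) by the Poisson likelihood of the new count."""
    post = {}
    for (q, t), p in belief.items():
        mean = min(Q_MAX, max(Q_MIN, TAUS[t] * q))     # Eq. (1)
        post[(q, t)] = p * poisson_pmf(n_rfqs, mean)
    total = sum(post.values())
    return {state: p / total for state, p in post.items()}


def project(belief: Belief, p_stay: float = 0.5) -> Belief:
    """Push beliefs over (Q_d, tau_d) forward to (Q_{d+1}, tau_{d+1})."""
    out = defaultdict(float)
    for (q, t), p in belief.items():
        raw = TAUS[t] * q
        q_next = int(round(min(Q_MAX, max(Q_MIN, raw))))
        if raw <= Q_MIN or raw >= Q_MAX:
            out[(q_next, NEUTRAL)] += p                # trend resets at the bounds
            continue
        # Assumed kernel: stay with probability p_stay, else move one grid step.
        for step, w in ((-1, (1 - p_stay) / 2), (0, p_stay), (1, (1 - p_stay) / 2)):
            out[(q_next, min(T - 1, max(0, t + step)))] += p * w
    return dict(out)


belief = uniform_prior()
for count in [60, 62, 65, 63, 68]:                     # made-up daily RFQ counts
    belief = project(observe(belief, count))
expected_q = sum(q * p for (q, _), p in belief.items())
```

Calling `project` repeatedly without further observations gives a distribution over demand several days ahead, which is how such a belief state could support the forward projection described above.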

4.2. Learning prices from historical data

We use a $k$-nearest-neighbors ($k$-NN) algorithm to induce the relationship between indicators of market conditions (features) and likely distributions of current and future prices. The idea is simple: given a set of games, find situations that most closely resemble the current situation and predict based on the observed outcomes. When prices are not changing much from day to day, learning prices from historical data has little additional benefit over simply estimating the prices in the current game instance. However, when prices are changing, historical data may help the agent to predict how the prices will move based on features of the current situation.

Table 1
Four categories of predictions made in the Prediction Challenge.

Prediction                 Description
Current PC prices          Lowest price offered by manufacturers for each current RFQ
Future PC prices           Median lowest price offered for each PC type in 20 days
Current component prices   Offer price for each component RFQ sent on current day
Future component prices    Offer price for each component RFQ sent in 20 days

⁴ The TAC agent repository (http://www.sics.se/tac/showagents.php) stores binary implementations of many agents from past TAC tournaments, released by their developers.

⁵ The methods described in this section were developed for the 2005 TAC/SCM tournament, and used with minor updates and new data sets in the 2006 competition. The version used in the 2007 tournament and Prediction Challenge was identical to the 2006 version (including the data sets).

⁶ We have released source code for this estimation method. It is available in the TAC agent repository.

For each PC type and day, we define the state using a vector of features $F = (f_0, \ldots, f_n)$.⁷ The difference between two states $f^1$ and $f^2$ is defined by weighted Euclidean distance:

$$d = \sqrt{\sum_{i=0}^{n} w_i \left(f_i^1 - f_i^2\right)^2},$$

where $w_i$ is a weight associated with each feature. Additionally, we bound the maximum distance for any individual feature, so $d = \infty$ if $|f_i^1 - f_i^2|$ is greater than the bound. To predict the prices during a game, the agent first computes the current state features. It then finds the $k$ state vectors in a database of historical game instances that are most similar to the current state, using $d$ as the metric. To ensure diversity in the data set, we select at most one of the $k$ vectors from each historical game instance. Associated with each state in the historical database are the observed price distributions in that game instance. We average these for the $k$ observations to predict prices in the current game. Preliminary experiments varying the value of $k$ showed the best results for values between 15 and 20; we use a value of 18 for all competitions and experimental results presented below.
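The lookup described above can be sketched as follows. This is an illustrative reading of the procedure rather than the agent's actual code: the database layout (a list of game identifier, feature vector, and price distribution triples), the function names, and the choice to keep the closest state per game before taking the $k$ best are assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# One historical record: (game id, feature vector, price distribution on a fixed grid).
Record = Tuple[str, Sequence[float], Sequence[float]]


def bounded_distance(
    f1: Sequence[float], f2: Sequence[float],
    weights: Sequence[float], bounds: Sequence[float],
) -> float:
    """Weighted Euclidean distance, infinite if any single feature differs by more than its bound."""
    total = 0.0
    for a, b, w, bound in zip(f1, f2, weights, bounds):
        if abs(a - b) > bound:
            return math.inf
        total += w * (a - b) ** 2
    return math.sqrt(total)


def predict_distribution(
    query: Sequence[float], database: List[Record],
    weights: Sequence[float], bounds: Sequence[float], k: int = 18,
) -> Optional[List[float]]:
    """Average the distributions of the k nearest states, at most one per historical game."""
    best_per_game: Dict[str, Tuple[float, Sequence[float]]] = {}
    for game_id, features, dist in database:
        d = bounded_distance(query, features, weights, bounds)
        if math.isinf(d):
            continue
        if game_id not in best_per_game or d < best_per_game[game_id][0]:
            best_per_game[game_id] = (d, dist)
    neighbors = sorted(best_per_game.values(), key=lambda item: item[0])[:k]
    if not neighbors:
        return None
    n_points = len(neighbors[0][1])
    return [sum(dist[j] for _, dist in neighbors) / len(neighbors) for j in range(n_points)]


# Tiny made-up database: two states from game g1, one from g2, one out-of-bounds state from g3.
db: List[Record] = [
    ("g1", (1.0, 0.0), (0.2, 0.8)),
    ("g1", (0.5, 0.0), (0.4, 1.0)),
    ("g2", (3.0, 4.0), (0.0, 0.6)),
    ("g3", (20.0, 0.0), (1.0, 1.0)),
]
prediction = predict_distribution((0.0, 0.0), db, weights=(1.0, 1.0), bounds=(10.0, 10.0), k=2)
```

In this example the closer of the two g1 states and the single g2 state are averaged, while the g3 state is excluded because its first feature exceeds the bound.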

The potential features we experimented with included the current simulation day, statistics about current prices, and estimates of underlying supply, demand, and inventory conditions. Many of these were motivated by analysis showing that related measures were strongly correlated with customer market prices in previous competitions (Kiekintveld et al., 2005). We tested a range of possible features, weights, and bounds in preliminary experiments, and selected those which had the lowest prediction error after extensive but ad hoc manual exploration. All of the following results use equal weights and bounds. For predicting the current day's prices, we selected the following features: average high and average low selling price from the previous five days, estimated mean customer demand ($Q$), the simulation day, and the linear trend of selling prices from the previous ten days. For predicting future prices, we used the same set of features with the addition of the estimated customer demand trend ($\tau$).

4.2.1. Data sets

We employ two distinct data sets for making predictions, one based on tournament games and one based on internal simulations. The tournament data set contained games from earlier rounds in the tournament. As new rounds were played, we added the additional games to the training set. During the 2005 tournament we automatically added new games within a round to the data set; this was disallowed by the 2007 rule change mentioned above.

The second data set consisted of games run using six copies of Deep Maize. We played a sequence of roughly 100 games, initially using just the tournament data set to make predictions. After each game, we replaced the oldest log in the data set with the new one, so eventually the data set consisted entirely of games played by the six Deep Maize agents. The final 45 games of this sequence were selected as the 'self-play' data set. Ideally, as the data sets contain more data from this specific environment, the predictions will come to reflect the true market prices and agents' behavior will improve. In the limit, we can imagine agents converging to a competitive equilibrium, which may be closer to the environment of the final round than games from less competitive rounds early in the tournament. As it happened, our analysis indicates that predictions based on this data set were not very accurate.

4.3. Representing and estimating price distributions

We experiment with two different representations for the distribution of PC prices. One uses absolute price levels, and the other is relative to recent price observations. The advantage of absolute prices is that they translate readily across games and can easily represent changes in the overall level of prices. The relative distribution is centered on moving averages of the high and low selling price reports for the previous five days. Between these two averages we use 60 uniformly distributed points; above the average of the highs and below the average of the lows we use 20 points distributed uniformly. The strength of this representation is in its ability to exploit the fact that prices tend to be stable over short periods of time in the game (a few simulation days). It effectively shifts observations from historical games into the relevant range of prices for the current game, which is often small. When extracting data from historical games we adjust for the number of opponent agents by randomly selecting one agent and removing that agent's bid, if it exists. This accounts for the effects of the agent's own bid on the price; it is effectively learning the price that would be expected based on only the competitors' bids. We also smooth the observed distributions by considering all requests in the range $d - 10, \ldots, d + 10$.

To compute the estimated price distribution we take the average of the distributions stored for the k-nearest-neighbors, as described above. For each price point j, we take the predicted probabilities for this point in each of the historical instances and average them. For relative predictions, these points represent pricing statistics relative to current prices; for absolute, they represent absolute prices. After averaging, we enforce monotonicity in the distribution. Starting from the low end of the distribution, we take the maximum of the estimated probability for the price point and the estimated probability of the immediately preceding price point.
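The averaging and monotonicity steps can be sketched as follows. This is a minimal illustration rather than the agent's actual code: the function name and array layout are hypothetical, the neighbors are assumed to be weighted equally, and the monotonicity rule is read as a running maximum taken from the low end of the distribution.

```python
import numpy as np

def estimate_distribution(neighbor_distributions):
    """Average the k neighbor distributions, then enforce monotonicity.

    neighbor_distributions: array of shape (k, n_points); row r holds the
    probability stored for each price point j in historical instance r
    (a hypothetical layout chosen for this sketch).
    """
    stacked = np.asarray(neighbor_distributions, dtype=float)
    averaged = stacked.mean(axis=0)  # equal-weight average per price point (assumption)
    # Running maximum from the low end: each point becomes the larger of its
    # own estimate and the already-adjusted preceding point.
    return np.maximum.accumulate(averaged)
```

For example, averaging the two rows `[0.2, 0.6, 0.3, 1.0]` and `[0.0, 0.4, 0.3, 0.8]` gives a dip at the third price point, which the running maximum lifts to the value of the second point.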

To forecast prices i days into the future, we project based on what happened i days into the future in each historical game. The agent requires predictions for all future days, so using this method requires that the historical instances have at least as many days remaining in the game as the current game does. We enforce the tighter constraint that the historical predictions must start on exactly the same simulation day. For relative price predictions, we also account for changes in the overall price levels over time by modifying the pricing statistics. For each price statistic, we compute the average change over the intervening i days in the historical data. Pricing statistics for the future days are modified by adding the average change. The absolute distributions do not need

Fig. 3. Bayesian network model of demand evolution. [Diagram nodes: Q_d, τ_d, Q_{d+1}, τ_{d+1}.]

⁷ When initially computing the values of features, we normalize the values for each feature by the standard deviation of the feature in the training set.


to account for this, since the price points are based on absolute values.

4.4. Online learning

We employ an online learning procedure that optimizes predictions according to a logarithmic scoring rule.⁸ First, we produce baseline predictions $P_{\text{tournament}}$ and $P_{\text{self-play}}$ based on the two data sets described in Section 4.2.1. We then construct adjusted predictions of the form

$$a\,\bigl(b\,P_{\text{tournament}} + (1-b)\,P_{\text{self-play}}\bigr) + c, \qquad (4)$$

where b weights the two prediction sources based on their accuracy in the current environment, and the coefficients a and c define an affine transformation allowing us to correct for systematic errors. We determine the values of a, b and c using a brute-force search to find the set of values (from a discretized range) that minimizes the scoring rule. The possible transformations are scored using the recent history of bids won and lost, a valuable but otherwise unused source of information. Each possible set of values is scored on the measure $\sum_{\text{bids}} \log(\alpha_{\text{bid}})$, where $\alpha_{\text{bid}}$ is the magnitude of the difference between the predicted probability of winning and the result of the bid, which is 1 if the bid resulted in a win and 0 otherwise. To prevent very large changes in the prediction we loosely bound the possible values of a, b and c. This optimization is performed using the actual bids submitted by the agent in recent days, and modifies the distribution only within the range of prices where bids are made. This typically leaves the extremes of the distribution (very high or low probability bids) unchanged, since there is little bidding activity at these extremes.
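The brute-force search over a, b and c might look like the sketch below. It follows the measure as the text states it (the sum over bids of the log of the error magnitude, minimized over a discretized grid). The function names, the grids, the clipping of adjusted predictions to [0, 1], and the small floor `eps` that keeps the logarithm finite are all assumptions made for illustration.

```python
import itertools
import numpy as np

def score(a, b, c, p_tournament, p_self_play, outcomes, eps=1e-6):
    """Sum over bids of log(alpha_bid), following the measure as stated in the text."""
    adjusted = a * (b * p_tournament + (1.0 - b) * p_self_play) + c
    adjusted = np.clip(adjusted, 0.0, 1.0)      # keep valid probabilities (assumption)
    alpha = np.abs(adjusted - outcomes)         # |predicted win probability - result|
    return float(np.sum(np.log(np.maximum(alpha, eps))))  # eps avoids log(0) (assumption)

def fit_transformation(p_tournament, p_self_play, outcomes, a_grid, b_grid, c_grid):
    """Brute-force search over a bounded, discretized range of (a, b, c)."""
    p_t = np.asarray(p_tournament, dtype=float)
    p_s = np.asarray(p_self_play, dtype=float)
    y = np.asarray(outcomes, dtype=float)       # 1 for a won bid, 0 for a lost bid
    return min(itertools.product(a_grid, b_grid, c_grid),
               key=lambda abc: score(*abc, p_t, p_s, y))
```

The loose bounds mentioned in the text correspond here to the end points of `a_grid`, `b_grid` and `c_grid`; an exhaustive search is practical because only three parameters are searched over a small grid.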

5. Evaluation of PC market predictions

We evaluate our forecasting techniques using both prediction error and supply chain performance. We evaluate several variations of the predictor that employ subsets of the functionality described in Section 4 to assess the contributions of each. As a baseline we use a heuristic predictor described by Pardoe and Stone (2005). The baseline predictor uses only information contained in the current price level, and relies completely on local price stability for predictive power. This is similar to what our predictor might look like using the relative representation without any historical data or online transformations. This predictor produces surprisingly accurate predictions despite its simplicity. The reason is that prices in the TAC/SCM game tend to be quite stable for long periods, especially if the start and end game conditions are excluded.⁹ One of the primary reasons we use this as a baseline is that we believe it is reflective of the default heuristic methods used by many of the agents to make decisions (especially in the early years of the competition). In particular, it makes use of only local information about prices in the current game, and does not attempt to project changes in price levels over time.

The baseline method predicts the distribution of PC prices using a weighted average of uniform densities between the low and high prices from the previous five days.¹⁰ We project this into the future by assuming that the distribution of prices does not change and adjusting for the predicted change in the number of RFQs generated, as given by the model of customer demand described in Section 4.1.
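One way to read the baseline is as a mixture of five uniform distributions, one per day, weighted as in footnote 10. The sketch below evaluates the cumulative distribution of that mixture at a given price. The function name, the most-recent-first ordering of the inputs, and the requirement that each day's high exceeds its low are assumptions of this illustration.

```python
import numpy as np

# Weights from footnote 10, ordered from the most recent day to the oldest.
WEIGHTS = np.array([0.3, 0.3, 0.2, 0.1, 0.1])

def baseline_price_cdf(price, lows, highs, weights=WEIGHTS):
    """Pr(selling price <= price) under a weighted mixture of uniform densities.

    lows, highs: low and high selling prices for the previous five days,
    most recent first (this ordering is an assumption); each high must
    exceed the corresponding low.
    """
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    # CDF of Uniform(low, high) at `price`, clipped to [0, 1] outside the range.
    per_day = np.clip((price - lows) / (highs - lows), 0.0, 1.0)
    return float(np.dot(weights, per_day))
```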

We present results for the baseline and four variations of our k-NN-based predictor. These variations use either the relative or absolute representation, and either apply online affine transformations or not. All results use both the tournament and self-play data sets, which contain 95 and 45 game instances respectively. Regardless of whether overall affine transformations are applied we use the online scoring method to weight the predictions from these two sources. At the end of this section we briefly discuss the effects of varying the data sets. All error results are derived from the 14 clean finals games from the 2005 tournament.¹¹

5.1. Prediction error

We discuss two different types of prediction error. The first is for predictions made about the probability of winning the current day’s auctions, and the second is for predictions about the effective demand curve on future days.

As we discuss the prediction error results, it is important to keep in mind that there is a substantial amount of uncertainty inherent in the domain model; we do not expect any prediction method to achieve anywhere near perfect predictions. To demonstrate this and provide some calibration, we consider the process that generates the overall level of customer demand in the SCM game. The level of customer demand (number of requests generated each day) is one of the most important factors influencing market prices. Recall that we use a Bayesian model to track the mean customer demand and trend; this model is the theoretically optimal predictor of the process. We ran tests comparing this predictor to a naive predictor that predicts that the mean demand on all future days will remain exactly the demand observed today. The optimal Bayesian model improved on the naive model, but the difference was relatively small. The naive model had an RMS error of approximately 35, and the Bayesian model achieved an RMS error of approximately 30, a reduction on the order of 15%. Although this may seem a relatively small improvement in prediction quality, such small improvements can translate into substantial differences in overall agent performance.

5.1.1. Current day prediction error

Predicting the conditional distribution Pr(win | bid) for the current day’s RFQ set is an important special case because these estimates are used for making bidding decisions. It is also unique because it is the only day for which we have the actual set of customer RFQs available; for future days, there is substantial additional uncertainty since the quantities, PC types, due dates, penalties, and reserve prices are all unknown.

We measure RMS error over a range of possible prices (i.e., potential bid values). The error may vary over the distribution, and this allows us to present a more detailed picture of the errors each predictor makes. The range of values we consider is relative to the recent high and low selling prices in the game. We distributed 125 price points over the range from 10% below the average low selling price from the previous five simulation days, to 10% above the average of the high selling price over the same window. Bids towards the high and low ends of this distribution are very rare, since they are either very high or low relative to the market. Deep Maize’s bids are centered around price point 62, with a large majority falling between price points 50 and 80.
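The evaluation grid described above is straightforward to construct. The following sketch assumes evenly spaced points (consistent with the caption of Fig. 4); the function name and argument names are hypothetical.

```python
import numpy as np

def evaluation_price_points(recent_lows, recent_highs, n_points=125, margin=0.10):
    """Evenly spaced price points from 10% below the average low selling price
    to 10% above the average high selling price of the previous five days."""
    lower = (1.0 - margin) * float(np.mean(recent_lows))
    upper = (1.0 + margin) * float(np.mean(recent_highs))
    return np.linspace(lower, upper, n_points)
```

With this layout, price point 62 (counting from zero) sits at the midpoint of the range, which matches the observation that the agent's bids are centered there.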

⁸ The logarithmic scoring rule is strictly proper, in that the optimal score is obtained only for the true probability. Scoring according to rules that are not proper (including RMS error) can lead to biased estimates.

⁹ A linear regression on the average selling prices from one day to the next results in an R² value on the order of 0.99, so current prices are extremely predictive of prices in the near future.

¹⁰ We use weights of 0.3 for the most recent two days, 0.2 for the middle day, and 0.1 for the oldest two days.

¹¹ In two games (3718 and 4254) the University of Michigan network lost connectivity and this team’s agent was unable to connect to the game server for a large part of the game. This distorts the results for uninteresting reasons, so we omit these games.


To compute the error measure, we process each RFQ separately. For each of the 125 price points, we measure the error as the difference between the predicted probability and the actual outcome. The outcome is either 1 if the lowest bid was higher than the price point, or 0 if the lowest bid was lower than the price point (corresponding to whether a bid at that price would have won or lost the auction). A subtlety of measuring error based on individual RFQs in this way is that any probabilistic prediction that is not either 1 or 0 will, by definition, show error on this measure. To provide a benchmark for this effect, we also plot the error for the optimal distribution. This is a distribution derived from the actual prices for the PC type and day being measured, but without RFQ-specific features.
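The per-RFQ error measure can be sketched as below. The function name and array shapes are hypothetical, and treating a tie (lowest bid exactly equal to the price point) as a loss is an assumption, since the text does not say how ties are handled.

```python
import numpy as np

def rms_error_by_price_point(predicted_win_prob, lowest_bids, price_points):
    """RMS error at each price point, taken over RFQs.

    predicted_win_prob: shape (n_rfqs, n_points), predicted Pr(win | bid = price point).
    lowest_bids: shape (n_rfqs,), the lowest bid actually submitted for each RFQ.
    price_points: shape (n_points,), the grid of candidate bid prices.
    """
    predicted = np.asarray(predicted_win_prob, dtype=float)
    lowest = np.asarray(lowest_bids, dtype=float)[:, None]
    points = np.asarray(price_points, dtype=float)
    # Outcome is 1 when a bid at the price point would have won, i.e. the
    # lowest actual bid was higher than the price point (ties count as losses).
    outcome = (lowest > points).astype(float)
    return np.sqrt(np.mean((predicted - outcome) ** 2, axis=0))
```

A prediction of 0.5 at a price point where one RFQ was won and another lost yields an RMS error of 0.5, which illustrates why even a well-calibrated probabilistic prediction shows nonzero error on this measure.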

Fig. 4 shows the results of this error measure computed for each of the five candidate predictors, as well as the optimal distribution. We compute errors averaged over all games from the 2005 tournament final round, restricted to predictions made for the middle part of the game from simulation day 20 to 200. This restriction is primarily because the baseline predictor is not designed for border cases, such as when no recent price observations are available. The error of the baseline is artificially high in these cases, which would distort the overall error results in favor of our methods.

Of the candidate predictors, the relative predictor with online learning has the lowest overall prediction error. All four of our predictors have lower error than the baseline in the center part of the distribution. The two absolute predictors have greater error when making predictions for very high or very low prices. Online learning reduces prediction error across the entire distribution for both the absolute and relative representations. The relative predictors generally perform better than the absolute predictors. However, the absolute predictor with online learning has lower error than the relative predictor without online learning in the center of the price distribution. This is where online learning should have the greatest impact, since more bids are placed in this region and it has more data to make updates. The differences in error between our predictors and the baseline are quite substantial. In the center of the distribution, the difference between the baseline method and the optimal method is about 0.3, and the difference between the optimal and the relative predictor with online learning is about 0.2, an improvement of roughly 30%.

Conceptually, errors towards the center of the distribution are likely to be much more important because bids are centered there. However, it is not clear how errors should be weighted, since bid densities may vary in different situations (and certainly for different agents). These predictions are also used for making non-bidding decisions, which may use the information in different ways. Using supply chain performance (as we do in Section 5.2 below) is one way to resolve this dilemma, since we can test the predictors as they are used by a real decision-making procedure.

5.1.2. Future forecasting error

A distinctive feature of Deep Maize is that it explicitly forecasts changes in future market conditions. For future days, the agent translates its predictions into an effective demand curve for use in production scheduling. This demand curve specifies the price the agent would need to bid to win a particular quantity of PCs, on average. These prices are given in an ordered list, with the first price representing the bid to win one PC, the second price the bid to win two PCs, and so on. The predictor gives the probability of winning a bid, Pr(win | bid), which we combine with the predicted overall demand to determine the expected quantity won for any given bid level. Inverting this function yields the desired prices.¹²
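The inversion can be illustrated with the binary search mentioned in footnote 12. This sketch assumes that the expected quantity is the predicted demand multiplied by Pr(win | bid) and that the win probability does not increase with the bid price; the function and parameter names are hypothetical.

```python
def effective_demand_curve(win_prob, expected_demand, price_low, price_high,
                           max_quantity, tol=1e-6):
    """Bid price needed to win 1, 2, ..., max_quantity PCs on average.

    win_prob: function mapping a bid price to Pr(win | bid), assumed to be
    non-increasing in the bid price.
    expected_demand: predicted number of PCs requested on the day.
    """
    def expected_quantity(bid):
        return expected_demand * win_prob(bid)

    curve = []
    for quantity in range(1, max_quantity + 1):
        if expected_quantity(price_low) < quantity:
            break  # even the lowest allowed bid is not expected to win this many
        lo, hi = price_low, price_high
        # Binary search for the highest bid whose expected quantity is still >= quantity.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if expected_quantity(mid) >= quantity:
                lo = mid
            else:
                hi = mid
        curve.append(lo)
    return curve
```

The resulting list is ordered as in the text: the first entry is the bid expected to win one PC, the second the bid expected to win two, and so on, so prices fall as the target quantity rises.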

We measure the error for this effective demand curve, which combines errors from the price predictions and demand predictions. For each simulation day, we compute the actual demand curve by sorting the observed prices for each PC. We then measure the RMS error between the prices in this list and the predicted demand curve, summing the errors for each PC in the list up to a maximum of 200 values per day.¹³ Fig. 5 compares the five predictors on this aggregate measure of future forecasting error out to a prediction horizon of 50 days.
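A sketch of this comparison for a single day follows. It assumes that the observed prices are sorted from highest to lowest (so that the first entry corresponds to winning one PC) and that the comparison is truncated to the shorter of the two lists; neither detail is stated in the text, and the function name is hypothetical.

```python
import numpy as np

def demand_curve_rms_error(predicted_curve, observed_prices, max_values=200):
    """RMS error between a predicted demand curve and the realized one for one day."""
    # Realized curve: observed selling prices, highest first (assumption).
    actual_curve = np.sort(np.asarray(observed_prices, dtype=float))[::-1]
    predicted = np.asarray(predicted_curve, dtype=float)
    n = min(len(actual_curve), len(predicted), max_values)
    if n == 0:
        return 0.0
    diff = predicted[:n] - actual_curve[:n]
    return float(np.sqrt(np.mean(diff ** 2)))
```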

The pattern of results is similar to the results for the current day. The two relative predictors score the best over all horizons, followed by the baseline and then the two absolute predictors. The online affine transformations are beneficial regardless of the representation used. We note that this measure averages error over the distribution (each data point represents a compression of the full distribution shown in Fig. 4). We present information in this way partially for easy visualization, but also because future

Fig. 4. RMS prediction error for current day predictions in the TAC/SCM final round games. The points on the horizontal axis represent a uniform distribution of points over a range defined by the recent high and low selling prices in the market. [Plot: “Current Day Prediction Error”; x-axis: relative price; y-axis: RMS error; series: relative with online learning, relative, absolute with online learning, absolute, baseline heuristic, optimal distribution.]

¹² Our functional form allows simple inversion of this function, but this inversion can also be computed using a simple binary search to find the necessary bid level.

¹³ An agent typically projects selling 100 or fewer PCs of any type on a given day, so this is roughly twice the number of values the agent actually uses to make decisions.

Please cite this article in press as: Kiekintveld C et al., Forecasting market prices in a supply chain game, Electron. Comm. Res. Appl. (2008),


planning may be more dependent on errors over the entire distribution than bidding decisions. However, we should note that the absolute predictor shows higher overall error on this measure primarily because of errors at the extreme parts of the distribution, as in the short-term error results.

5.2. Supply chain performance

Since our forecasting methods were specifically designed to support decision-making in a fully implemented and automated agent, we have the opportunity to assess how forecasts impact agent performance. This is particularly interesting because of the complex interactions between prediction error and decisions. As discussed above, it can be difficult to determine how to weight different types of prediction errors. Another issue with online predictions is that the decisions made affect the information available for making future predictions.

We ran simulations using a profile of two MinneTAC-05 agents, two TacTex-05 agents, one static version of Deep Maize, and one modified version. MinneTAC-05 and TacTex-05 were finalists in 2005 that were among the first to release agent binaries. We ran four sets of games with a minimum of 35 samples. The static version of Deep Maize used the relative affine predictor. We present results as differences between the static and modified versions of Deep Maize. This pairing reduces variance but introduces bias to the extent that varying the sixth agent affects the performance of the static version. Another important caveat is that these results are for a single profile of strategies. More systematic approaches to selecting test environments may be better justified on game-theoretic grounds (Jordan et al., 2007), but we make do with a single convenient albeit ad hoc profile here.

In addition to scores we give three measures of customer sales performance: average selling price (ASP), “selling efficiency”, and “timing efficiency”. Selling efficiency is the ratio of achieved revenue to the maximum possible revenue for identical daily sales, given perfect information about opponents’ bids and the option to partially fill orders. Timing efficiency measures how well the agent distributed sales over time. Let the t-optimal same day selling price be the highest ASP the agent could achieve by selling each PC at most t days after it was actually delivered and no earlier than it was actually produced. PCs are labeled according to a FIFO queuing policy. Timing efficiency is the ratio of the t-optimal same day selling price to the selling efficiency. This factors out the effects of bidding policy and leaves the effects of deciding which days to sell on. The appeal of these additional metrics is that they allow us to compare a specific element of decision performance against an optimal value, though we must remain cognizant of the additional constraints when interpreting the results.

The simulation results are in Table 2. The most striking point is that all of our k-NN-based predictors comfortably outperform the same agent using the baseline predictor. To calibrate, the difference between the top and bottom scores in the 2005 finals was 6.5 M, less than the advantage of any of our predictors over the baseline. Clearly, effective prediction can have a sizable impact on agent performance when the decision-making architecture is capable of exploiting the information. The affine transformations again showed benefits for both representations.

The absolute predictors scored better than the relative predictors, albeit by fairly small margins. This is somewhat unexpected, given the error results. Earlier results using a smaller data set for the predictors actually showed a slight advantage for the relative predictor. We expect a larger data set to benefit the absolute representation disproportionately because some of the features are based on current prices. With more data, more similar instances are available, and the absolute representation should predict more like the relative representation. We conjecture that the relative representation may have greater advantages for smaller data sets, but have not verified this experimentally. Another likely explanation for this discrepancy is that the absolute representation better reflects the actual uncertainty present because it is able to spread predictions over a broader range of prices. This may cause the agent to delay making purchasing decisions until more information is available, potentially improving performance.

In general, the differences between the agent variations on the sales metrics are relatively small, but we note some interesting features. The timing measure correlates quite well with the overall scores. The baseline predictor performs very poorly on this measure and also has a low ASP. The selling efficiency numbers do not show a strong pattern, and the ASP numbers for the k-NN variations are also ambiguous. This suggests that one of the more consistent benefits of better forecasts for Deep Maize is the usefulness of these forecasts in timing sales activity.

5.3. Tournament and self-play data

We also generated error scores for the four predictor variants using only the tournament data set and only the self-play data set (a total of 12 different variations, counting the combined data sets). We do not present full data here, but can describe the pattern of results fairly easily. Using only the tournament data resulted in almost exactly equivalent performance to using the combined data sets. The results using only the self-play data were almost always worse than either the tournament only or combined settings. One conclusion we draw from this is that the process we used to generate the self-play data did not produce the most relevant experience for making predictions. Another is that the online scoring procedure was effective at detecting the source of more salient information to use in the combined predictor. It does this by placing a very high value on the variable b in Eq. (4), weighting the tournament data set much more heavily than the self-play data set. As the agent plays a game, we often observe b set to the maximum value within its allowable range.

Fig. 5. RMS error between predicted and actual effective demand curves out to a horizon of 50 days, computed on the 2005 finals. [Plot: “Future Prediction Error”; x-axis: prediction horizon (days), 0 to 50; y-axis: RMS error, 0.00 to 0.20; series: relative with online learning, relative, absolute with online learning, absolute, baseline heuristic.]

Table 2
Performance measures in simulated games. The numbers represent the difference between two versions of Deep Maize using the relative affine predictor and the indicated predictor.

Predictor             Score    ASP     Selling efficiency   20-Day timing
Relative, no affine   2.3 M    0.004   0.007                0.015
Absolute, affine      0.3 M    0.001   0.008                0.000
Absolute, no affine   1.2 M    0.007   0.014                0.009
Baseline              9.1 M    0.032   0.002                0.032


6. Methods for component market predictions

As for the PC market, Deep Maize uses price forecasts to determine how to procure components in the supply market. Component price forecasts take the form $p^{(s,c)}_{i,j}(q)$, denoting the predicted offer price for a quantity q of component type c, requested from supplier s on day i, with due date j. The price predictions for each possible pair of request date and due date form a surface, as depicted in Fig. 6. These predictions are made each day for each supplier-component pair (s, c). This contrasts with the PC market, where we model a price curve with a single time parameter (the request date).¹⁴

Each day, manufacturers receive offers from suppliers in response to the RFQs sent the previous day. Information from these offers may be used as input to price predictions. Note that it is generally not possible to get a direct price quote for each future date, due to the limitation of five RFQs per day for each supplier and component combination. Thus, there is significant uncertainty regarding even the current price curve.

We consider three classes of prediction methods for the component market. The first creates an estimate of the current price curve using the observed spot price quotes and linear interpolation. This method does not attempt to predict changes in prices over time. The second approach augments the first by using market indicators to predict residual changes from the estimated price curve over time. This method exploits local price stability in the short term by using the linear interpolation model as a base, but still allows for modeling price variability over time. The final method applies regression to predict prices directly based on market indicators. This method allows for the greatest flexibility in predicting price changes.

6.1. Linear interpolation price predictor

As noted in many accounts of TAC/SCM agents, market prices in the game tend to exhibit local price stability over short horizons. The degree of volatility may vary depending on the particular agents playing and random variations in the underlying game state. However, there are structural reasons to expect such stability. One is that the underlying supplier capacity and customer demand levels change over time, but rarely exhibit drastic changes over short time periods. In addition, the outstanding commitments and inventory of suppliers and manufacturers tend to exert a damping effect on component price changes.

Deep Maize estimates the current market state based on price quotes from recent days. We employ linear interpolation to estimate prices for due dates with no recent observations. Fig. 7 shows an example price curve estimated using this method, which we term the linear interpolation price predictor (LIPP). Due to local price stability, we expect LIPP to be a reasonably good predictor for prices in the near future. Empirical results below support this proposition.
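A LIPP-style estimate can be sketched with NumPy's one-dimensional interpolation. The function name is hypothetical, the sketch assumes at most one quote per due date, and holding the nearest quoted price constant outside the quoted range is an assumption, since the text does not describe how the agent extrapolates.

```python
import numpy as np

def lipp_price_curve(quote_due_dates, quote_prices, query_due_dates):
    """Estimate prices for arbitrary due dates by linearly interpolating recent quotes.

    Outside the range of quoted due dates, np.interp returns the nearest
    quoted price (an assumption about extrapolation made for this sketch).
    """
    due = np.asarray(quote_due_dates, dtype=float)
    prices = np.asarray(quote_prices, dtype=float)
    order = np.argsort(due)  # np.interp requires increasing x values
    return np.interp(query_due_dates, due[order], prices[order])
```

For instance, with quotes of 500, 540 and 600 for due dates 110, 120 and 130, the estimated price for due date 115 is 520.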

6.2. Relative component price predictor

In analyzing the performance of LIPP, we note a tendency for predictions over the entire price curve to systematically overestimate or underestimate prices. There are several plausible reasons for this. Recall that suppliers’ current capacity varies according to a random walk during the game. Suppliers construct prices in part based on projecting their current production capacity into the future to determine their available capacity. Since available capacity is used in pricing for all request dates, fluctuations in capacity cause prices for all of these dates to rise or fall in unison. This pattern could also result from manufacturing agents increasing or decreasing order quantities, but maintaining approximately the same timing in their purchase schedules. This might result, for instance, from rising or falling customer demand levels.

LIPP does not take into account any information beyond recent price quotes that might help to predict these sorts of price fluctuations. We introduce the relative component price predictor (RCPP) as a way of exploiting these additional forms of information. RCPP predicts the residual error of LIPP by applying regression to technical attributes summarizing the observable market state. A list of the attributes we used is given in Table 3. The predictions made by RCPP are formed by adding the predicted residual errors to the LIPP predictions.
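The structure of RCPP (a regression on LIPP residuals, added back to the LIPP prediction) can be illustrated as follows. Ordinary least squares is used here only as a stand-in, since this passage does not specify the regression method; the function names and the attribute matrix layout are likewise hypothetical.

```python
import numpy as np

def fit_residual_model(attributes, lipp_predictions, actual_prices):
    """Least-squares fit of LIPP residuals (actual - LIPP) on market attributes.

    attributes: 2-D array, one row of market-state attributes per training example.
    """
    X = np.asarray(attributes, dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    residuals = (np.asarray(actual_prices, dtype=float)
                 - np.asarray(lipp_predictions, dtype=float))
    coef, *_ = np.linalg.lstsq(X, residuals, rcond=None)
    return coef

def rcpp_predict(coef, attributes, lipp_predictions):
    """RCPP-style prediction: LIPP prediction plus the predicted residual."""
    X = np.asarray(attributes, dtype=float)
    X = np.column_stack([np.ones(len(X)), X])
    return np.asarray(lipp_predictions, dtype=float) + X @ coef
```

Fitting the same kind of model directly to `actual_prices`, without subtracting the LIPP predictions, would give the corresponding absolute variant.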

The intuition for this approach is that the RCPP algorithm identifies game states that are likely to result in predictable price shifts, leading to inaccurate LIPP estimates. The residuals modify the current price estimates to account for these shifts. This approach exploits the short-term accuracy of the linear price predictor, while using historical data and market indicators to adapt future forecasts. We note that there is a parallel between this approach and the relative representation used in predicting prices in the PC market.

6.3. Absolute component price predictor

The final class of predictors we consider is the absolute component price predictor (ACPP). ACPP applies regression to directly learn prices from the technical indicators. Unlike RCPP, this approach does not bootstrap the short-term accuracy of LIPP, so it

Fig. 6. Sample price predictions $p_{i,i+d}(1)$ for a single component, where i is the current day and d is the relative due date. [Surface plot; axes: relative due date, current day, price relative to base price.]

Fig. 7. A sample linear interpolation of recent price quotes. [Plot; x-axis: day due; y-axis: price.]

¹⁴ We have found empirically that the due dates for PC requests have minor effects on the price, so we do not model these directly.


is not relative to the estimated current prices. The potential advantage of this approach is that it offers greater flexibility for modeling price shifts, since it is not constrained by the LIPP prediction. If there are prevailing long-term price trends not captured in the RCPP residuals, ACPP may yield more accurate predictions at long horizons. We note that there is a parallel between this approach and the absolute representation used in predicting prices in the PC market.

7. Evaluation of component market predictions

We evaluate the relative performance of the three classes of predictors on the basis of prediction error using the Prediction Challenge framework. We present the results of experiments run after the competition using the data set from the final round of the 2007 Prediction Challenge.¹⁵ The data consists of a total of 48 games containing the designated PAgent, divided into three sets of 16 games played against the same set of opponents. All opponents were selected from the agent repository. The predictors are scored using normalized RMS error on all required component price predictions.
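As a rough illustration of the scoring rule, the sketch below computes an RMS error over a set of predictions after normalizing each error. Dividing by the component's base price is our assumption about the normalization; the function name and the sample numbers are hypothetical.

```python
import math

def normalized_rms_error(predicted, actual, base_prices):
    """RMS of (predicted - actual) / base_price over all predictions.

    Dividing by the base price is an assumption about the normalization.
    """
    if not (len(predicted) == len(actual) == len(base_prices)):
        raise ValueError("inputs must have equal length")
    if not predicted:
        raise ValueError("need at least one prediction")
    squared = [((p - a) / b) ** 2
               for p, a, b in zip(predicted, actual, base_prices)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: two components with very different base prices.
# Both predictions are off by 5% of the base price.
score = normalized_rms_error([550.0, 95.0], [500.0, 100.0], [1000.0, 100.0])
print(round(score, 4))  # prints 0.05
```

Normalizing in this way keeps expensive components from dominating the score, which matters when errors are pooled across all components.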

7.1. Setting expiration intervals for LIPP

We begin by exploring an important parameter of the LIPP method, which we then fix for subsequent analysis. Recall that manufacturers are limited in the number of RFQs they may submit each day. We can increase the number of observations available for estimating the price curve by including RFQ responses from previous days, with the caveat that these quotes may represent stale information. We represent this tradeoff using an expiration interval that determines how many days of previous observations are used to construct the LIPP estimate. Using a longer interval increases the amount of data available, along with the risk that the data no longer accurately reflects the current situation. To determine the best setting of the expiration interval, we tested intervals from one to five days. The results are shown in Fig. 8.

The mean accuracy of the LIPP predictions decreases monotonically as the length of the expiration interval is increased. This result implies that the day-to-day price volatility in TAC/SCM is high enough that the benefit of having data points for additional due dates is outweighed by the reduced accuracy of the older data points. It is possible that adapting this expiration interval dynamically during a game based on observed price volatility or other factors could yield more accurate predictions, but we have not yet fully explored this possibility.
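The role of the expiration interval can be made concrete with a small sketch. It is illustrative only and rests on several assumptions of ours: an interval of 1 is taken to mean that only the current day's quotes are used, the most recent quote wins when a due date has several, and the nearest quoted price is held outside the quoted range. The function name and the quote data are hypothetical.

```python
from bisect import bisect_left

def lipp_estimate(quotes, today, due_day, expiration=1):
    """Interpolate a price for due_day from quotes at most `expiration` days old.

    quotes: iterable of (day_observed, quote_due_day, price) tuples.
    expiration=1 uses only today's quotes; 2 adds yesterday's; and so on.
    """
    latest = {}  # quote_due_day -> (day_observed, price), newest quote wins
    for observed, q_due, price in quotes:
        if 0 <= today - observed < expiration:
            if q_due not in latest or observed > latest[q_due][0]:
                latest[q_due] = (observed, price)
    if not latest:
        raise ValueError("no unexpired quotes")
    days = sorted(latest)
    prices = [latest[d][1] for d in days]
    # Outside the quoted range, hold the nearest price (an assumption).
    if due_day <= days[0]:
        return prices[0]
    if due_day >= days[-1]:
        return prices[-1]
    hi = bisect_left(days, due_day)
    lo = hi - 1
    frac = (due_day - days[lo]) / (days[hi] - days[lo])
    return prices[lo] + frac * (prices[hi] - prices[lo])

quotes = [
    (99, 110, 560.0),   # yesterday's quote
    (100, 105, 520.0),  # today's quotes
    (100, 125, 600.0),
]
fresh = lipp_estimate(quotes, today=100, due_day=115, expiration=1)
with_stale = lipp_estimate(quotes, today=100, due_day=115, expiration=2)
print(fresh, round(with_stale, 2))
```

With the longer interval, yesterday's quote adds a third point to the curve and changes the interpolated value; whether that extra point helps or hurts depends on how far prices have moved since it was observed.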

7.2. Estimating attribute quality for regression models

As a preliminary step to regression, we seek to evaluate the quality of various attributes that may be used to predict current and future component prices. Unlike for classification problems, few measures of attribute quality for regression are available in the current literature. For example, information gain, J-measure, and χ² cannot be used for estimating attribute quality in a regression. Three measures that have been proposed for regression with numerical attributes are mean absolute and mean squared error (Breiman et al., 1984), and RReliefF (Robnik-Šikonja and Kononenko, 1997). Because the error measures assume independence among attributes, they are less appropriate given strong interdependencies like those introduced in our analysis. In contrast, RReliefF has been shown empirically to perform well in situations where there are strong interdependencies, such as parity problems. A short explanation of the RReliefF values is rather elusive (see Robnik-Šikonja and Kononenko, 2003, for a thorough exposition), but we attempt here to give some intuition as to what the values represent.

Table 3
Attributes considered by RCPP and their RReliefF values for current and future days. The first five attributes correspond to parameters of an individual RFQ and the remainder of market state.

Attribute                 Current   Future    Description
Day                       0.000603  0.000842  Current day
Sent-day                  0.000603  0.000842  Day the RFQ will be sent
Due-day                   0.000124  0.001038  Day the RFQ will be due
Duration                  0.000209  0.000603  Due-day minus sent-day
Quantity                  0.011297  0.013593  RFQ quantity
Linear-price              0.019816  0.010123  LIPP price prediction
Nearest-price-point-diff  0.001689  0.002899  Nearest known price point distance (days from due date)
Days-since-data           0.006886  0.006228  Days since due date had a price quote
Market-capacity           0.000496  0.00023   Last supplier capacity from market report
Estimated-capacity        0.000317  0.001029  Estimate of supplier capacity using market reports and trend
Surplus                   0.000514  0.000157  Available capacity minus agent demand in recent period
Aggregate-surplus         0.000618  0.000295  Available capacity minus agent demand in total
Mean-demand               0.001663  0.000741  Mean customer demand for component
Mean-low-demand           0.002198  0.001023  Mean customer demand for low-end components
Mean-mid-demand           0.0014    0.0009    Mean customer demand for mid-range components
Mean-high-demand          0.002     0.000787  Mean customer demand for high-end components
5-Day-demand              0.001258  0.000852  Customer demand for last 5 days

Fig. 8. RMS error for LIPP with various expiration intervals. Axes: expiration interval (EX1 to EX5) and RMS error.

15 Many of these analyses were also performed before the Challenge using other test sets; we repeat all of the experiments using this data set for consistency