UNCORRECTED
PROOF
Forecasting market prices in a supply chain game ☆

Christopher Kiekintveld a,*, Jason Miller b, Patrick R. Jordan a, Lee F. Callender a, Michael P. Wellman a

a Computer Science and Engineering, University of Michigan, 2260 Hayward Avenue, Ann Arbor, MI 48109-2121, USA
b Department of Mathematics, Stanford University, Building 380, Stanford, CA 94305, USA
Article info

Article history:
Received 17 October 2007
Received in revised form 4 November 2008
Accepted 5 November 2008
Available online xxxx

Keywords:
Forecasting
Markets
Price prediction
Trading agent competition
Supply chain management
Machine learning
Abstract

Predicting the uncertain and dynamic future of market conditions on the supply chain, as reflected in prices, is an essential component of effective operational decision-making. We present and evaluate methods used by our agent, Deep Maize, to forecast market prices in the trading agent competition supply chain management game (TAC/SCM). We employ a variety of machine learning and representational techniques to exploit as many types of information as possible, integrating well-known methods in novel ways. We evaluate these techniques through controlled experiments as well as performance in both the main TAC/SCM tournament and supplementary Prediction Challenge. Our prediction methods demonstrate strong performance in controlled experiments and achieved the best overall score in the Prediction Challenge.

© 2008 Elsevier B.V. All rights reserved.
1. Introduction

The quality of economic decisions made now depends on market conditions that will prevail in the future. This is particularly true of supply chain environments, which evolve dynamically based on complex interactions among multiple players across several tiers. We present and evaluate methods of forecasting market conditions developed to predict prices in the trading agent competition supply chain management game (TAC/SCM). In TAC/SCM, automated trading agents compete to maximize their profits in a simulated supply chain environment. They face many challenges, including strategic interactions and uncertainty about market conditions. Though complex, this domain offers a relatively controlled environment amenable to extensive experimentation. TAC also attracts a diverse pool of participants, which facilitates benchmarking and evaluation. This is particularly important for prediction tasks, where the quality of predictions depends in part on the actions of opponents. Research competitions such as TAC offer the opportunity to test predictions in environments where the opposing agents are designed and selected by others, mitigating an important source of potential bias relative to using opponents of one's own design (Wellman et al., 2007).
Our agent for the SCM game, Deep Maize, relies on price predictions in both the upstream and downstream markets to make purchasing, production, and sales decisions. Fig. 1 shows an overview of the daily decision process. Predictions about market conditions feed into an approximate optimization procedure that uses values to decompose the decision problem. The architecture is described in detail elsewhere (Kiekintveld et al., 2006). Since our predictions are integrated into a task-situated agent, we can evaluate the benefits of improved predictions on overall performance in addition to prediction error metrics.
We apply machine learning to forecast prices in two different markets with distinct negotiation mechanisms and information-revelation rules. Like many real markets, the available information on market conditions comes from a variety of heterogeneous sources. Integrating different learning techniques and representations to exploit all of the available information is a central theme of our approach. We find that representing the current market conditions well is particularly important for effective learning.
We begin with a discussion of related work and an overview of the TAC/SCM game, highlighting the benefits and drawbacks of the different forms of information available for making predictions. The first set of methods we describe are used for predicting prices in the downstream consumer market. Central to all of these methods is a nearest-neighbors learning algorithm that uses historical game data to induce the relationship between market indicators and prices. We test variations that employ two different representations for price distributions: an absolute representation, and a relative representation that exploits local price stability. Finally, we develop an online learning procedure that combines and shifts predictions using additional observations within a game instance. Our evaluation of these methods includes both prediction error and supply chain performance. We show substantial improvements over a baseline predictor, with online adaptation exhibiting especially strong performance improvements.

Contents lists available at ScienceDirect
Electronic Commerce Research and Applications
journal homepage: www.elsevier.com/locate/ecra

1567-4223/$34.00 © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.elerap.2008.11.005

☆ This is a revised and substantially extended version of a paper presented at the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (Kiekintveld et al., 2007).
* Corresponding author. Tel.: +1 734 818 0259.
E-mail addresses: [email protected] (C. Kiekintveld), [email protected]. edu (J. Miller), [email protected] (P.R. Jordan), [email protected] (L.F. Callender), [email protected] (M.P. Wellman).
URLs: http://teamcore.usc.edu/kiekintveld (C. Kiekintveld), http://www.eecs.umich.edu/~prjordan (P.R. Jordan), http://ai.eecs.umich.edu/people/wellman (M.P. Wellman).
We proceed to discuss methods for making predictions in the market for component supplies. There are two elements to our approach in this market. The first is a linear interpolation model that uses recent observations to make relatively accurate predictions about the current spot market. The second is a learning model that regresses prices against market indicators. This learning model can be used to make absolute price predictions, or to predict residuals for the linear interpolation model. We show that combining the linear interpolation model for current market prices with residual predictions improves overall performance, particularly for long-term predictions.
We close with discussion of the performance of both prediction algorithms based on the results of TAC/SCM tournaments and the inaugural TAC/SCM Prediction Challenge. Deep Maize has been a perennial finalist in the primary competition, and placed first in the Prediction Challenge. Our forecasting methods demonstrate strong performance on prediction error measures and contribute to strong overall performance in the context of decision-making.
2. Related work

Forecasting is studied across many disciplines, and many approaches have been developed for making predictions based on historical data. We do not attempt to survey all of the various methods here, but refer the reader to one of many introductory texts that describe exponential smoothing, GARCH, ARIMA, spectral analysis, neural networks, Kalman filters, and other popular methods (Brockwell and Davis, 1996; Chatfield, 2001; Hastie et al., 2001). Pindyck and Rubinfeld (1998) is a good introduction to many forecasting methods used in the context of economic predictions and modeling. The proliferation of techniques is in part an indication that the best method for any particular problem may depend on many factors, including the type and quantity of information available and the properties of the domain.
Many applications focus on comparing methods to identify the most effective approach for a particular problem (for example, see the application of forecasting to retail sales by Alon et al. or the application to predict electricity prices by Nogales et al. (2002)). Our approach for price prediction focuses on combining predictions made using different sources of information and methods. Various methods have been developed for combining predictions, often motivated by the problem of combining the opinions of experts (Bates and Granger, 1969; Clemen and Winkler, 1999). Recently, prediction markets have become a popular method for aggregating information (Hanson, 2003). Our approach shares some features with these markets, including the use of logarithmic scoring rules to evaluate the quality of predictions.
Price forecasting is a very important element of decision-making in a wide variety of market settings, including financial markets (Kaufman, 1998) and energy markets (Nogales et al., 2002). The rise of internet auctions has led to an increasing interest in computational methods for price forecasting in auction environments (e.g., Ghani, 2005). Brooks et al. (2002) discuss many issues related to learning price models in the context of online markets. Price prediction methods have also been used as the basis for successful bidding strategies in many specific types of auctions, including simultaneous ascending auctions (Osepayshvili et al., 2005). It is not surprising then that price prediction has played an important role in the design of agents for the trading agent competition since its inception. For the TAC travel game (Wellman et al., 2007), a crucial factor in agent performance was the ability to predict the closing price of auctions for hotel rooms. Participants explored many approaches for making these predictions, spanning statistical analysis, machine learning, and economic modeling (Cheng et al., 2005; Stone et al., 2003; Wellman et al., 2004). Post-tournament analysis revealed that the common thread among the most effective methods was not the particular technique applied, but rather the information taken into account by the predictor (Wellman et al., 2004); specifically, the best predictors accounted for the initial auction prices.
Most agents for the TAC/SCM game also employ some form of price prediction, but vary significantly both in what quantities are predicted and the methods used for making predictions. Many agents only estimate prices for the current simulation day, and do not attempt to forecast changes in prices over time (Benisch et al., 2004, 2006; Burke et al., 2006; He et al., 2006). One of the most successful agents in the tournament, TacTex, has applied a variety of sophisticated approaches for price prediction. The specific methods have evolved in successive tournaments, but generally rely on a combination of statistical machine learning applied to historical data and online adaptation (Pardoe and Stone, 2005, 2006, 2007a,b). The most recent version of TacTex specifically uses machine learning to predict both current and future prices. The MinneTAC team has also devoted significant effort to developing prediction methods for the TAC/SCM domain (Ketter, 2007; Ketter et al., 2007). Their approach is distinctive in that it focuses on predicting changes in the economic regimes, which reflect over-supply, under-supply, or other characteristic market conditions. Our use of market state is similar in spirit to the idea of an economic regime, but we do not distinguish among a discrete set of possible regimes. The first version of Deep Maize used an analytic solution based on competitive equilibrium to predict market prices (Kiekintveld et al., 2004). This approach became infeasible when the game rules were modified due to the greater complexity of the supplier pricing model. Another unique approach was used by RedAgent, the winner of the first TAC/SCM competition (Keller et al., 2004). This agent used an internal market to coordinate decisions and make predictions; to our knowledge no current agents use this approach.
Fig. 1. Overview of Deep Maize's decision process on each TAC/SCM day. [Diagram labels: State Estimation; Market Predictions (Supply Market Prediction, Customer Market Prediction); High-level Decisions; Low-level Decisions; Long-term Production Schedule; Component Values; PC Values; Customer Sales; Supply Purchasing; Factory Schedule.]
3. The TAC supply chain game

Agents in TAC/SCM play the role of personal computer (PC) manufacturers (Collins et al., 2007; Eriksson et al., 2006).¹ Six automated agents compete to maximize their profits on a simulated supply chain, shown in Fig. 2. Agents assemble 16 different types of PCs for sale, defined by compatible combinations of CPU, motherboard, hard disk, and memory components. Each day agents negotiate with suppliers to purchase components, negotiate with customers to sell finished products, and create manufacturing and shipping schedules for the next day. Each agent has an identical factory with limited capacity for production. Each game has 220 simulated days, each of which takes 15 s of real time.

3.1. The personal computer market
The 16 PC types are separated into three market segments representing high-, mid-, and low-grade PCs. The average demand level for each segment varies separately according to a random walk with a stochastic trend. Each day, the customer demand process generates a set of RFQs (requests for quotes) representing the aggregate demand. Each customer request is valid for a single day and specifies a PC type, due date, reserve price, and penalty for late delivery. The negotiations for these requests use a simultaneous sealed-bid auction mechanism. Agents submit price offers for a subset of the RFQs. The lowest bidder for each request is awarded the order, and must deliver the product by the due date or pay the penalty specified in the contract. Due to the configuration of the market, there are strong interactions among auctions both within and across days.
3.2. The component market

To procure components to build PCs, agents must negotiate with eight suppliers whose behavior is defined in the game specification. Each supplier sells two distinct grades of a single component type. CPU components have a unique supplier, whereas all other types are provided by two suppliers. Suppliers have a limited daily production capacity for each component they sell. These capacities vary according to a mean-reverting random walk, and are not directly observable by the manufacturers.
Manufacturers negotiate with suppliers via an RFQ mechanism. Each day, a manufacturer may send up to five RFQs per component to each supplier specifying type, due date, quantity, and reserve price. Any future day may be specified as the due date. The suppliers respond the following day with offers specifying a proposed quantity, price, and delivery date.² Manufacturers may opt to accept a subset of the offers by responding with orders. Suppliers determine offer prices based on the ratio of their committed and available production capacity. This pricing function accounts for inventory, previous commitments, future capacity, and the demand represented in the current set of customer RFQs. In general, component prices rise as demand rises and suppliers become more constrained. In some cases, the supplier may be capacity constrained and unable to offer components at any price. Prices depend on both the requested due date and the day the request is made. Price predictions must be made for this entire space of possibilities to support important decisions about the optimal time to make component purchases.
3.3. Observable market data

Agents may utilize several complementary forms of evidence to make predictions about market prices. Each provides unique insights into current activity and how the market may evolve. The first category of evidence comprises observations of the markets during a game instance. This information is the most timely and specific to the current situation, but is of relatively low fidelity. More detailed observations of market activity are available in game logs after each game instance has completed.³ Logs from previous rounds are publicly available, and additional data may be generated in private simulations.
During a game, agents receive daily observations from participating in the PC and component markets. In the PC market, agents observe the set of requests and whether or not each bid resulted in an order. They also receive a daily report for each PC type listing the maximum and minimum selling price from the previous day. The details of individual market transactions and opponents' bids are not revealed. In the component market, agents receive price quotes for each request they send, some of which may be price probes. Neither the supplier's state (inventory, capacity, and commitments) nor other agents' requests are revealed. Every 20 simulation days, manufacturers receive a report containing aggregate market data for the preceding 20-day period. This data includes the mean selling price and quantity sold for each PC type, and the mean selling price, quantity ordered, and quantity shipped for each component type. The report also includes the mean production capacity for each supply line. The importance of this evidence is that it is specific to the current market conditions and current set of manufacturing agents. However, these observations say relatively little about how market conditions are likely to evolve, especially if similar conditions have not been observed earlier in the game.
The historical logs reveal the entire transaction history in each market (including all bids, offers, and orders), as well as supplier capacities and other state attributes hidden during a game instance. This precise information is potentially very useful for predicting the evolution of market prices over time. However, conditions in the current game are unlikely to be identical to conditions in historical games. A particularly difficult issue is that the pool of competing agents is constantly changing as teams update their agents, so historical data may not be representative of the current strategic environment. Selecting relevant historical data is an important challenge.

Fig. 2. TAC/SCM supply chain configuration. [Diagram labels: suppliers Pintel, IMD, Basus, Macrostar, Mec, Queenmax, Watergate, and Mintor; component types CPU, Motherboard, Memory, and Hard Disk; manufacturers Agent 1 to Agent 6; Customer; message flows component RFQ, supplier offer, offer acceptance, PC RFQ, and bid.]
¹ We present an overview of the game rules here. The full rule specifications for each competition, tournament results, and general TAC information are available at http://www.sics.se/tac.
² Suppliers may offer either a partial quantity or a later delivery if they are unable to meet the initial request due to capacity constraints.
³ A rule change in 2007 explicitly disallows agents from collecting and utilizing these logs during a tournament round, but logs from previous rounds or simulations may still be used.
3.4. The Prediction Challenge

The 2007 TAC event introduced two challenge competitions to complement the original TAC/SCM scenario. In the TAC/SCM Prediction Challenge (Pardoe, 2007), agents compete to make the most accurate price predictions in four categories: short- and long-term, for both component and PC markets (see Table 1). The challenge isolates the prediction component of the game by having participants make predictions on behalf of the same designated agent, PAgent. Simulations including a PAgent are run before the challenge to generate log files. During the challenge, the log files are used to simulate the experience of playing the game for the predictors by providing the exact sequence of messages observed by PAgent during the game. The predictions made based on this information by competitors in the challenge do not affect the behavior of the PAgent. This simulation framework allows all predictors to make an identical set of predictions, and also allows predictions to be made many times for the same set of game instances. The predictions are scored against actual prices using the root mean squared (RMS) error, normalizing by component and PC base prices.
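The scoring rule can be illustrated with a minimal sketch. It assumes that "normalizing by base prices" means dividing each prediction error by the base price of the corresponding item before squaring; the exact normalization is defined by the challenge specification, and the function name `normalized_rms_error` is hypothetical.

```python
import math


def normalized_rms_error(predicted, actual, base_prices):
    """RMS of (predicted - actual) / base_price over paired observations.

    Assumption: each error is normalized by the base price of its own item.
    """
    if not predicted or not (len(predicted) == len(actual) == len(base_prices)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a, b in zip(predicted, actual, base_prices):
        total += ((p - a) / b) ** 2
    return math.sqrt(total / len(predicted))
```

Under this assumption, two predictions that each miss by 10% of the base price yield a score of 0.1, regardless of whether the items are cheap components or expensive PCs.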
Each round of the challenge used three sets of 16 game instances, with different agents competing with PAgent on the supply chain. The agents comprising the competitive environment were selected from those in the public agent repository.⁴ Since tests can be run independently and repeated, the software framework for the challenge also provides a useful toolkit for conducting one's own prediction tests. We took advantage of this in developing and testing new prediction methods for the component market in preparation for the 2007 TAC/SCM tournament and Prediction Challenge.
4. Methods for PC market predictions

Each day, Deep Maize estimates the underlying demand state in the customer market and creates predictions of the price curves it faces.⁵ For the current simulation day, the set of requests is known. Estimating the price curve in this case is equivalent to estimating the probability that a given bid on an RFQ will result in an order, denoted $\Pr(\mathrm{win} \mid \mathrm{bid})$. To predict the price curve for future simulation days, it is also necessary to predict the set of requests that will be generated. We combine predictions about the set of requests and the distribution of selling prices for each request (i.e., $\Pr(\mathrm{win} \mid \mathrm{bid})$) to estimate the effective demand curve for each future day. The effective demand curve represents the expected marginal selling prices for each additional PC sold. The agent uses these forecast demand curves to make customer bidding decisions and create a projected manufacturing schedule. We begin with a discussion of our methods for estimating a key element of the game state, and proceed with a discussion of our methods for predicting customer market prices.
4.1. Customer demand projection

The customer demand processes that determine the number of RFQs to generate exert a great influence on market prices in the SCM game. There are three stochastic demand processes, corresponding to the respective market segments. Each process is captured by two state variables: a mean $Q$ and a trend $\tau$. The number of RFQs generated in each market segment on day $d$ is determined by a draw from a Poisson distribution with mean $Q$. The initial demand levels for each segment are drawn uniformly from known intervals. The mean demand evolves according to the daily update rule

\[ Q_{d+1} = \min(Q_{\max}, \max(Q_{\min}, \tau_d Q_d)). \tag{1} \]

Both the minimum and maximum values of $Q$ are given for each market segment in the specification. The trend $\tau$ is initially neutral, $\tau_0 = 1$, and is updated by the stochastic perturbation $\epsilon \sim U[-0.01, 0.01]$:

\[ \tau_{d+1} = \max(0.95, \min(1/0.95, \tau_d + \epsilon)). \tag{2} \]

When the demand $Q$ reaches a maximum or minimum value, the trend resets to the neutral level.
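To make the dynamics of Eqs. (1) and (2) concrete, the following is a minimal simulation sketch of the mean demand for a single market segment. The bounds passed as `q_min` and `q_max` and the starting level are illustrative assumptions (the actual values are given in the game specification), and the function names are hypothetical.

```python
import random

TREND_MIN, TREND_MAX = 0.95, 1 / 0.95  # trend bounds from Eq. (2)


def step_demand(q, tau, q_min, q_max, rng):
    """Apply Eqs. (1) and (2) once and return (q_next, tau_next)."""
    q_next = min(q_max, max(q_min, tau * q))  # Eq. (1)
    epsilon = rng.uniform(-0.01, 0.01)
    tau_next = max(TREND_MIN, min(TREND_MAX, tau + epsilon))  # Eq. (2)
    if q_next in (q_min, q_max):
        tau_next = 1.0  # trend resets to neutral when demand hits a bound
    return q_next, tau_next


def simulate_mean_demand(q0, days, q_min, q_max, seed=0):
    """Return the list of mean demand levels Q_0, ..., Q_days."""
    rng = random.Random(seed)
    q, tau = q0, 1.0  # the trend starts at the neutral level
    path = [q]
    for _ in range(days):
        q, tau = step_demand(q, tau, q_min, q_max, rng)
        path.append(q)
    return path
```

For example, `simulate_mean_demand(150.0, 220, 80.0, 320.0)` produces one 220-day trajectory under the assumed bounds. Because the trend compounds multiplicatively, small perturbations can carry demand steadily toward a bound before the reset pulls the trend back to neutral.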
The state variables $Q$ and $\tau$ are hidden from the manufacturers, but can be estimated based on the observed demand. We estimate the underlying demand process using a Bayesian model, illustrated in Fig. 3.⁶ Note that the updated demand, $Q_{d+1}$, is a deterministic function of $Q_d$ and $\tau_d$, as specified in Eq. (1). This deterministic relationship simplifies the update of our beliefs about the demand state given a new observation. Let $\Pr(Q_d, \tau_d)$ denote the probability distribution over demand states given our observations through day $d$. We can incorporate a new observation $\hat{Q}_{d+1}$ as follows:

\[ \Pr(Q_d, \tau_d \mid \hat{Q}_{d+1}) \propto \Pr(\hat{Q}_{d+1} \mid Q_d, \tau_d) \Pr(Q_d, \tau_d). \tag{3} \]

Eq. (3) exploits the fact that the observation at $d+1$ is conditionally independent of prior observations given the demand state $(Q_d, \tau_d)$. We can elaborate the first term of the RHS using the demand update (1):

\[ \Pr(\hat{Q}_{d+1} \mid Q_d, \tau_d) = \mathrm{Poisson}(\min(Q_{\max}, \max(Q_{\min}, \tau_d Q_d))). \]

After calculating the RHS of Eq. (3) for all possible demand states, we normalize to recover the updated probability distribution. We can then project forward our beliefs over $(Q_d, \tau_d)$ using the demand and trend update Eqs. (1) and (2), to obtain a probability distribution over $(Q_{d+1}, \tau_{d+1})$.

A sequence of updates by Eq. (3) represents an exact posterior distribution over demand state given a series of observations. To maintain a finite encoding of this joint distribution, we consider only integer values for demand and divide the trend range into $T$ discrete values, evenly spaced (Deep Maize uses $T = 9$ for tournament play). Given a distribution over demand states, we can project this distribution forward to estimate future demand.
4.2. Learning prices from historical data
386 We use ak-nearest-neighbors (k-NN) algorithm to induce the
387 relationship between indicators of market conditions (features)
388 and likely distributions of current and future prices. The idea is
Table 1
Four categories of predictions made in the Prediction Challenge.
Prediction Description
Current PC prices Lowest price offered by manufacturers for each current RFQ
Future PC prices Median lowest price offered for each PC type in 20 days Current component
prices
Offer price for each component RFQ sent on current day Future component
prices
Offer price for each component RFQ sent in 20 days
4 The TAC agent repository
http://www.sics.se/tac/showagents.phpstores binary implementations of many agents from past TAC tournaments, released by their developers.
5
The methods described in this section were developed for the 2005 TAC/SCM tournament, and used with minor updates and new data sets in the2006 competition. The version used in the 2007 tournament and Prediction Challenge were identical to the 2006 version (including the data sets).
6
We have released source code for this estimation method. It is available in the TAC agent repository.
UNCORRECTED
PROOF
389 simple: given a set of games, find situations that most closely
390 resemble the current situation and predict based on the observed
391 outcomes. When prices are not changing much from day to day,
392 learning prices from historical data has little additional benefit
393 over simply estimating the prices in the current game instance.
394 However, when prices are changing, historical data may help the
395 agent to predict how the prices will move based on features of
396 the current situation.
397 For each PC type and day, we define the state using a vector of
398 featuresF¼ ðf0;. . .;fnÞ.7The difference between two statesf1and
399 f2is defined by weighted Euclidean distance:
d¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Xn i¼0 wi ðfi1fi2Þ 2 v u u t ; 401 401
402 wherewiis a weight associated with each feature. Additionally, we
403 bound the maximum distance for any individual feature, sod¼ 1if
404 f1
i fi2 is greater than the bound. To predict the prices during a
405 game, the agent first computes the current state features. It then
406 finds thekstate vectors in a database of historical game instances
407 that are most similar to the current state, usingdas the metric.
408 To ensure diversity in the data set, we select at most one of thek
409 vectors from each historical game instance. Associated with each
410 state in the historical database are the observed price distributions
411 in that game instance. We average these for thekobservations to
412 predict prices in the current game. Preliminary experiments varying
413 the value ofkshowed the best results for values between 15 and 20;
414 we use a value of 18 for all competitions and experimental results
415 presented below.
416 The potential features we experimented with included the
cur-417 rent simulation day, statistics about current prices, and estimates
418 of underlying supply, demand, and inventory conditions. Many of
419 these were motivated by analysis showing that related measures
420 were strongly correlated with customer market prices in previous
421 competitions (Kiekintveld et al., 2005). We tested a range of
possi-422 ble features, weights, and bounds in preliminary experiments, and
423 selected those which had the lowest prediction error after
exten-424 sive but ad hoc manual exploration. All of the following results
425 use equal weights and bounds. For predicting the current days’
426 prices, we selected the following features: average high and
aver-427 age low selling price from the previous five days, estimated mean
428 customer demand (Q), the simulation day, and the linear trend of
429 selling prices from the previous ten days. For predicting future
430 prices, we used the same set of features with the addition of the
431 estimated customer demand trend (
s
).432 4.2.1. Data sets
We employ two distinct data sets for making predictions, one based on tournament games and one based on internal simulations. The tournament data set contained games from earlier rounds in the tournament. As new rounds were played, we added the additional games to the training set. During the 2005 tournament we automatically added new games within a round to the data set; this was disallowed by the 2007 rule change mentioned above.
The second data set consisted of games run using six copies of Deep Maize. We played a sequence of roughly 100 games, initially using just the tournament data set to make predictions. After each game, we replaced the oldest log in the data set with the new one, so eventually the data set consisted entirely of games played by the six Deep Maize agents. The final 45 games of this sequence were selected as the 'self-play' data set. Ideally, as the data sets contain more data from this specific environment, the predictions will come to reflect the true market prices and agents' behavior will improve. In the limit, we can imagine agents converging to a competitive equilibrium, which may be closer to the environment of the final round than games from less competitive rounds early in the tournament. As it happened, our analysis indicates that predictions based on this data set were not very accurate.
456
4.3. Representing and estimating price distributions
We experiment with two different representations for the distribution of PC prices. One uses absolute price levels, and the other is relative to recent price observations. The advantage of absolute prices is that they translate readily across games and can easily represent changes in the overall level of prices. The relative distribution is centered on moving averages of the high and low selling price reports for the previous five days. Between these two averages we use 60 uniformly distributed points; above the average of the highs and below the average of the lows we use 20 points distributed uniformly. The strength of this representation is in its ability to exploit the fact that prices tend to be stable over short periods of time in the game (a few simulation days). It effectively shifts observations from historical games into the relevant range of prices for the current game, which is often small. When extracting data from historical games we adjust for the number of opponent agents by randomly selecting one agent and removing that agent’s bid, if it exists. This accounts for the effects of the agent’s own bid on the price; it is effectively learning the price that would be expected based on only the competitors’ bids. We also smooth the observed distributions by considering all requests in the range $d-10, \ldots, d+10$.
To compute the estimated price distribution we take the average of the distributions stored for the k-nearest-neighbors, as described above. For each price point j, we take the predicted probabilities for this point in each of the historical instances and average them. For relative predictions, these points represent pricing statistics relative to current prices; for absolute, they represent absolute prices. After averaging, we enforce monotonicity in the distribution. Starting from the low end of the distribution, we take the maximum of the estimated probability for the price point and the estimated probability of the immediately preceding price point.
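The averaging and monotonicity steps can be sketched in a few lines. This is an illustration under stated assumptions rather than the agent's implementation: it assumes each neighbor's distribution is stored as an array over the same ordered price points (low to high), and that the maximum is applied sequentially, so the result is a running maximum. The function name is hypothetical.

```python
import numpy as np


def estimate_distribution(neighbor_distributions):
    """Average the neighbors' distributions point by point, then make the
    result non-decreasing starting from the low-price end."""
    # Shape (k, n_points): one row per historical neighbor.
    stacked = np.asarray(neighbor_distributions, dtype=float)
    averaged = stacked.mean(axis=0)
    # Running maximum: each point becomes max(itself, preceding adjusted point).
    return np.maximum.accumulate(averaged)
```

For example, two neighbors with values `[0.1, 0.5, 0.2, 0.9]` and `[0.3, 0.3, 0.2, 0.7]` average to `[0.2, 0.4, 0.2, 0.8]`; the dip at the third point is raised to the preceding value.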
To forecast prices i days into the future, we project based on what happened i days into the future in each historical game. The agent requires predictions for all future days, so using this method requires that the historical instances have at least as many days remaining in the game as the current game does. We enforce the tighter constraint that the historical predictions must start on exactly the same simulation day. For relative price predictions, we also account for changes in the overall price levels over time by modifying the pricing statistics. For each price statistic, we compute the average change over the intervening i days in the historical data. Pricing statistics for the future days are modified by adding the average change. The absolute distributions do not need
Fig. 3. Bayesian network model of demand evolution.
7 When initially computing the values of features, we normalize the values for each feature by the standard deviation of the feature in the training set.
to account for this, since the price points are based on absolute values.
4.4. Online learning
We employ an online learning procedure that optimizes predictions according to a logarithmic scoring rule.⁸ First, we produce baseline predictions $P_{\mathrm{tournament}}$ and $P_{\text{self-play}}$ based on the two data sets described in Section 4.2.1. We then construct adjusted predictions of the form

$$a\,\bigl(b\,P_{\mathrm{tournament}} + (1-b)\,P_{\text{self-play}}\bigr) + c, \qquad (4)$$

where $b$ weights the two prediction sources based on their accuracy in the current environment, and the coefficients $a$ and $c$ define an affine transformation allowing us to correct for systematic errors. We determine the values of $a$, $b$ and $c$ using a brute-force search to find the set of values (from a discretized range) that minimizes the scoring rule. The possible transformations are scored using the recent history of bids won and lost, a valuable but otherwise unused source of information. Each possible set of values is scored on the measure $\sum_{\mathrm{bids}} \log(a_{\mathrm{bid}})$, where $a_{\mathrm{bid}}$ is the magnitude of the difference between the predicted probability of winning and the result of the bid, which is 1 if the bid resulted in a win and 0 otherwise. To prevent very large changes in the prediction we loosely bound the possible values of $a$, $b$ and $c$. This optimization is performed using the actual bids submitted by the agent in recent days, and modifies the distribution only within the range of prices where bids are made. This typically leaves the extremes of the distribution (very high or low probability bids) unchanged, since there is little bidding activity at these extremes.
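The brute-force search over the three coefficients can be sketched as follows. Several details here are assumptions made for illustration: the grids of candidate values are hypothetical, and the score is the standard negative log-likelihood form of the logarithmic rule (the negative log of the probability assigned to the realized outcome), which may differ in detail from the measure stated above. Function names are likewise hypothetical.

```python
import itertools
import math


def log_loss(predictions, outcomes, eps=1e-6):
    """Negative log-likelihood of win (1) / loss (0) outcomes.

    Predictions are clipped into (0, 1) so the logarithm stays finite.
    """
    total = 0.0
    for p, won in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p if won else 1.0 - p)
    return total


def fit_adjustment(p_tournament, p_self_play, outcomes, a_grid, b_grid, c_grid):
    """Exhaustively search discretized (a, b, c) values for the lowest loss.

    Each argument p_* holds the predicted win probability of a recent bid
    under one data set; outcomes holds 1 for a win and 0 for a loss.
    """
    best = None
    for a, b, c in itertools.product(a_grid, b_grid, c_grid):
        adjusted = [a * (b * pt + (1.0 - b) * ps) + c
                    for pt, ps in zip(p_tournament, p_self_play)]
        score = log_loss(adjusted, outcomes)
        if best is None or score < best[0]:
            best = (score, a, b, c)
    return best[1], best[2], best[3]
```

Keeping the grids small and bounded plays the role of the loose bounds on the coefficients: the search cannot move the prediction further than the most extreme grid values allow.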
5. Evaluation of PC market predictions
We evaluate our forecasting techniques using both prediction error and supply chain performance. We evaluate several variations of the predictor that employ subsets of the functionality described in Section 4 to assess the contributions of each. As a baseline we use a heuristic predictor described by Pardoe and Stone (2005). The baseline predictor uses only information contained in the current price level, and relies completely on local price stability for predictive power. This is similar to what our predictor might look like using the relative representation without any historical data or online transformations. This predictor produces surprisingly accurate predictions despite its simplicity. The reason is that prices in the TAC/SCM game tend to be quite stable for long periods, especially if the start and end game conditions are excluded.⁹ One of the primary reasons we use this as a baseline is that we believe it is reflective of the default heuristic methods used by many of the agents to make decisions (especially in the early years of the competition). In particular, it makes use of only local information about prices in the current game, and does not attempt to project changes in price levels over time.
The baseline method predicts the distribution of PC prices using a weighted average of uniform densities between the low and high prices from the previous five days.¹⁰ We project this into the future by assuming that the distribution of prices does not change and adjusting for the predicted change in the number of RFQs generated, as given by the model of customer demand described in Section 4.1.
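As an illustration of the weighted average of uniform densities, the sketch below computes the implied probability that the market price lies above a given bid, using the day weights stated in footnote 10. Treating this quantity as the probability of winning, ordering the inputs most recent day first, and the function name are assumptions made here; the adjustment for the number of RFQs is omitted.

```python
def baseline_win_probability(bid, lows, highs,
                             weights=(0.3, 0.3, 0.2, 0.1, 0.1)):
    """Probability that the price exceeds `bid` under a weighted mixture of
    uniform densities, one per day, each spanning that day's [low, high].

    lows and highs list the reported prices for the previous five days,
    most recent day first, matching the order of the weights.
    """
    prob = 0.0
    for low, high, weight in zip(lows, highs, weights):
        if bid <= low:
            fraction_above = 1.0      # the whole day's range is above the bid
        elif bid >= high:
            fraction_above = 0.0      # the whole day's range is below the bid
        else:
            fraction_above = (high - bid) / (high - low)
        prob += weight * fraction_above
    return prob
```

With a constant range of 100 to 200 on all five days, a bid of 150 yields a probability of 0.5, and the result falls linearly from 1 to 0 as the bid moves across the range.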
We present results for the baseline and four variations of our k-NN-based predictor. These variations use either the relative or absolute representation, and either apply online affine transformations or not. All results use both the tournament and self-play data sets, which contain 95 and 45 game instances respectively. Regardless of whether overall affine transformations are applied, we use the online scoring method to weight the predictions from these two sources. At the end of this section we briefly discuss the effects of varying the data sets. All error results are derived from the 14 clean finals games from the 2005 tournament.¹¹
5.1. Prediction error
We discuss two different types of prediction error. The first is for predictions made about the probability of winning the current day’s auctions, and the second is for predictions about the effective demand curve on future days.
As we discuss the prediction error results, it is important to keep in mind that there is a substantial amount of uncertainty inherent in the domain model; we do not expect any prediction method to achieve anywhere near perfect predictions. To demonstrate this and provide some calibration, we consider the process that generates the overall level of customer demand in the SCM game. The level of customer demand (number of requests generated each day) is one of the most important factors influencing market prices. Recall that we use a Bayesian model to track the mean customer demand and trend; this model is the theoretically optimal predictor of the process. We ran tests comparing this predictor to a naive predictor that predicts that the mean demand on all future days will remain exactly the demand observed today. The optimal Bayesian model improved on the naive model, but the difference was relatively small. The naive model had an RMS error of approximately 35, and the Bayesian model achieved an RMS error of approximately 30, a reduction on the order of 15%. Whereas this may seem a relatively small improvement in prediction quality, these small improvements can translate into substantial differences in overall agent performance.
5.1.1. Current day prediction error
Predicting the conditional distribution $\Pr(\mathrm{win} \mid \mathrm{bid})$ for the current day’s RFQ set is an important special case because these estimates are used for making bidding decisions. It is also unique because it is the only day for which we have the actual set of customer RFQs available; for future days, there is substantial additional uncertainty since the quantities, PC types, due dates, penalties, and reserve prices are all unknown.
We measure RMS error over a range of possible prices (i.e., potential bid values). The error may vary over the distribution, and this allows us to present a more detailed picture of the errors each predictor makes. The range of values we consider is relative to the recent high and low selling prices in the game. We distributed 125 price points over the range from 10% below the average low selling price from the previous five simulation days, to 10% above the average of the high selling price over the same window. Bids towards the high and low ends of this distribution are very rare, since they are either very high or low relative to the market. Deep Maize’s bids are centered around price point 62, with a large majority falling between price points 50 and 80.
8 The logarithmic scoring rule is strictly proper, in that the optimal score is obtained only for the true probability. Scoring according to rules that are not proper (including RMS error) can lead to biased estimates.
9 A linear regression on the average selling prices from one day to the next results in an $R^2$ value on the order of 0.99, so current prices are extremely predictive of prices in the near future.
10 We use weights of 0.3 for the most recent two days, 0.2 for the middle day, and 0.1 for the oldest two days.
11 In two games (3718 and 4254) the University of Michigan network lost connectivity and this team’s agent was unable to connect to the game server for a large part of the game. This distorts the results for uninteresting reasons, so we omit these games.
To compute the error measure, we process each RFQ separately. For each of the 125 price points, we measure the error as the difference between the predicted probability and the actual outcome. The outcome is either 1 if the lowest bid was higher than the price point, or 0 if the lowest bid was lower than the price point (corresponding to whether a bid at that price would have won or lost the auction). A subtlety of measuring error based on individual RFQs in this way is that any probabilistic prediction that is not either 1 or 0 will, by definition, show error on this measure. To provide a benchmark for this effect, we also plot the error for the optimal distribution. This is a distribution derived from the actual prices for the PC type and day being measured, but without RFQ-specific features.
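The per-RFQ error measure can be sketched as below. This is a minimal illustration, assuming the 125 price points are spaced evenly across the stated range and that predictions are stored as one row of win probabilities per RFQ; the function and parameter names are hypothetical.

```python
import numpy as np


def price_points(avg_low, avg_high, n_points=125):
    """Evenly spaced points from 10% below the average low selling price
    to 10% above the average high selling price."""
    return np.linspace(0.9 * avg_low, 1.1 * avg_high, n_points)


def rfq_outcomes(points, lowest_bid):
    """1.0 where a bid at the price point would have won (the lowest actual
    bid was higher than the point), 0.0 otherwise."""
    return (lowest_bid > points).astype(float)


def rms_error_by_point(predicted, lowest_bids, points):
    """RMS error at each price point, taken over all RFQs.

    predicted has shape (n_rfqs, n_points); lowest_bids has length n_rfqs.
    """
    outcomes = np.array([rfq_outcomes(points, bid) for bid in lowest_bids])
    squared = (np.asarray(predicted, dtype=float) - outcomes) ** 2
    return np.sqrt(squared.mean(axis=0))
```

The subtlety noted above is visible here: a predictor that always outputs 0.5 scores an RMS error of exactly 0.5 at every price point, regardless of the actual bids, because each individual outcome is either 0 or 1.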
Fig. 4 shows the results of this error measure computed for each of the five candidate predictors, as well as the optimal distribution. We compute errors averaged over all games from the 2005 tournament final round, restricted to predictions made for the middle part of the game from simulation day 20 to 200. This restriction is primarily because the baseline predictor is not designed for border cases, such as when no recent price observations are available. The error of the baseline is artificially high in these cases, which would distort the overall error results in favor of our methods.
Of the candidate predictors, the relative predictor with online learning has the lowest overall prediction error. All four of our predictors have lower error than the baseline in the center part of the distribution. The two absolute predictors have greater error when making predictions for very high or very low prices. Online learning reduces prediction error across the entire distribution for both the absolute and relative representations. The relative predictors generally perform better than the absolute predictors. However, the absolute predictor with online learning has lower error than the relative predictor without online learning in the center of the price distribution. This is where online learning should have the greatest impact, since more bids are placed in this region and it has more data to make updates. The differences in error between our predictors and the baseline are quite substantial. In the center of the distribution, the difference between the baseline method and the optimal method is approximately 0.3, and the difference between the optimal and the relative predictor with online learning is approximately 0.2, an improvement of roughly 30%.
Conceptually, errors towards the center of the distribution are likely to be much more important because bids are centered there. However, it is not clear how errors should be weighted, since bid densities may vary in different situations (and certainly for different agents). These predictions are also used for making non-bidding decisions, which may use the information in different ways. Using supply chain performance (as we do in Section 5.2 below) is one way to resolve this dilemma, since we can test the predictors as they are used by a real decision-making procedure.
5.1.2. Future forecasting error
A distinctive feature of Deep Maize is that it explicitly forecasts changes in future market conditions. For future days, the agent translates its predictions into an effective demand curve for use in production scheduling. This demand curve specifies the price the agent would need to bid to win a particular quantity of PCs, on average. These prices are given in an ordered list, with the first price representing the bid to win one PC, the second price the bid to win two PCs, and so on. The predictor gives the probability of winning a bid, $\Pr(\mathrm{win} \mid \mathrm{bid})$, which we combine with the predicted overall demand to determine the expected quantity won for any given bid level. Inverting this function yields the desired prices.¹²
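The inversion can be illustrated with the binary search mentioned in footnote 12. The sketch below rests on simplifying assumptions: the expected quantity is taken to be predicted demand multiplied by the win probability, the win probability is assumed to be non-increasing in the bid, and the search bounds and tolerance are hypothetical parameters.

```python
def bid_for_quantity(quantity, demand, win_probability, low, high, tol=0.01):
    """Highest bid in [low, high] whose expected sales reach `quantity`.

    win_probability is a callable mapping a bid to Pr(win | bid).
    Returns None if the quantity is unattainable even at the lowest bid.
    """
    if demand * win_probability(low) < quantity:
        return None
    # Invariant: the bid `low` always achieves the required quantity.
    while high - low > tol:
        mid = (low + high) / 2.0
        if demand * win_probability(mid) >= quantity:
            low = mid
        else:
            high = mid
    return low


def effective_demand_curve(demand, win_probability, low, high, max_quantity):
    """Ordered list of prices: entry q-1 is the bid expected to win q PCs."""
    curve = []
    for quantity in range(1, max_quantity + 1):
        price = bid_for_quantity(quantity, demand, win_probability, low, high)
        if price is None:
            break
        curve.append(price)
    return curve
```

Because higher quantities require lower bids, the resulting list is non-increasing in price, which is the ordering the text describes.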
We measure the error for this effective demand curve, which combines errors from the price predictions and demand predictions. For each simulation day, we compute the actual demand curve by sorting the observed prices for each PC. We then measure the RMS error between the prices in this list and the predicted demand curve, summing the errors for each PC in the list up to a maximum of 200 values per day.¹³ Fig. 5 compares the five predictors on this aggregate measure of future forecasting error out to a prediction horizon of 50 days.
The pattern of results is similar to the results for the current day. The two relative predictors score the best over all horizons, followed by the baseline and then the two absolute predictors. The online affine transformations are beneficial regardless of the representation used. We note that this measure averages error over the distribution (each data point represents a compression of the full distribution shown in Fig. 4). We present information in this way partially for easy visualization, but also because future

[Figure 4 plot: "Current Day Prediction Error". Horizontal axis: Relative Price (price points 0 to 120). Vertical axis: RMS Error (0.0 to 0.6). Series: Relative with online learning; Relative; Absolute with online learning; Absolute; Baseline Heuristic; Optimal Distribution.]
Fig. 4. RMS prediction error for current day predictions in the TAC/SCM final round games. The points on the horizontal axis represent a uniform distribution of points over a range defined by the recent high and low selling prices in the market.
12 Our functional form allows simple inversion of this function, but this inversion can also be computed using a simple binary search to find the necessary bid level.
13 An agent typically projects selling 100 or fewer PCs of any type on a given day, so this is roughly twice the number of values the agent actually uses to make decisions.
Please cite this article in press as: Kiekintveld C et al., Forecasting market prices in a supply chain game, Electron. Comm. Res. Appl. (2008),
planning may be more dependent on errors over the entire distribution than bidding decisions. However, we should note that the absolute predictor shows higher overall error on this measure primarily because of errors at the extreme parts of the distribution, as in the short-term error results.
5.2. Supply chain performance
Since our forecasting methods were specifically designed to support decision-making in a fully implemented and automated agent, we have the opportunity to assess how forecasts impact agent performance. This is particularly interesting because of the complex interactions between prediction error and decisions. As discussed above, it can be difficult to determine how to weight different types of prediction errors. Another issue with online predictions is that the decisions made affect the information available for making future predictions.
We ran simulations using a profile of two MinneTAC-05 agents, two TacTex-05 agents, one static version of Deep Maize, and one modified version. MinneTAC-05 and TacTex-05 were finalists in 2005 that were among the first to release agent binaries. We ran four sets of games with a minimum of 35 samples. The static version of Deep Maize used the relative affine predictor. We present results as differences between the static and variable versions of Deep Maize. This pairing reduces variance but introduces bias to the extent that varying the sixth agent affects the performance of the static version. Another important caveat is that these results are for a single profile of strategies. More systematic approaches to selecting test environments may be better justified on game-theoretic grounds (Jordan et al., 2007), but we make do with a single convenient albeit ad hoc profile here.
In addition to scores we give three measures of customer sales performance: average selling price (ASP), “selling efficiency”, and “timing efficiency”. Selling efficiency is the ratio of achieved revenue to the maximum possible revenue for identical daily sales, given perfect information about opponents’ bids and the option to partially fill orders. Timing efficiency measures how well the agent distributed sales over time. Let the t-optimal same day selling price be the highest ASP the agent could achieve by selling each PC at most t days after it was actually delivered and no earlier than it was actually produced. PCs are labeled according to a FIFO queuing policy. Timing efficiency is the ratio of the t-optimal same day selling price to the selling efficiency. This factors out the effects of bidding policy and leaves the effects of deciding which days to sell on. The appeal of these additional metrics is that they allow us to compare a specific element of decision performance against an optimal value, though we must remain cognizant of the additional constraints when interpreting the results.
The simulation results are in Table 2. The most striking point is that all of our k-NN-based predictors comfortably outperform the same agent using the baseline predictor. To calibrate, the difference between the top and bottom scores in the 2005 finals was 6.5 M, less than the advantage of any of our predictors over the baseline. Clearly, effective prediction can have a sizable impact on agent performance when the decision-making architecture is capable of exploiting the information. The affine transformations again showed benefits for both representations.
The absolute predictors scored better than the relative predictors, albeit by fairly small margins. This is somewhat unexpected, given the error results. Earlier results using a smaller data set for the predictors actually showed a slight advantage for the relative predictor. We expect a larger data set to benefit the absolute representation disproportionately because some of the features are based on current prices. With more data, more similar instances are available, and the absolute representation should predict more like the relative representation. We conjecture that the relative representation may have greater advantages for smaller data sets, but have not verified this experimentally. Another likely explanation for this discrepancy is that the absolute representation better reflects the actual uncertainty present because it is able to spread predictions over a broader range of prices. This may cause the agent to delay making purchasing decisions until more information is available, potentially improving performance.
In general, the differences between the agent variations on the sales metrics are relatively small, but we note some interesting features. The timing measure correlates quite well with the overall scores. The baseline predictor performs very poorly on this measure as well as having a low ASP. The selling efficiency numbers do not show a strong pattern, and the ASP numbers for the k-NN variations are also ambiguous. This suggests that one of the more consistent benefits of better forecasts for Deep Maize is the usefulness of these forecasts in timing sales activity.
5.3. Tournament and self-play data
We also generated error scores for the four predictor variants using only the tournament data set and only the self-play data set (a total of 12 different variations). We do not present full data here, but can describe the pattern of results fairly easily. Using only the tournament data resulted in almost exactly equivalent performance to using the combined data sets. The results using only the self-play data were almost always worse than either the tournament only or combined settings. One conclusion we draw from this is that the process we used to generate the self-play data did not produce the most relevant experience for making predictions. Another is that the online scoring procedure was effective at detecting the source of more salient information to use in the combined predictor. It does this by placing a very high value on the variable $b$ in Eq. (4), weighting the tournament data set much more heavily than the self-play data set. As the agent plays a game, we often observe $b$ set to the maximum value within the allowable range.
[Figure 5 plot: "Future Prediction Error". Horizontal axis: Prediction Horizon (days), 0 to 50. Vertical axis: RMS Error (0.00 to 0.20). Series: Relative with online learning; Relative; Absolute with online learning; Absolute; Baseline Heuristic.]
Fig. 5. RMS error between predicted and actual effective demand curves out to a horizon of 50 days, computed on the 2005 finals.
Table 2
Performance measures in simulated games. The numbers represent the difference between two versions of Deep Maize using the relative affine predictor and the indicated predictor.

Predictor             Score    ASP     Selling efficiency   20-Day timing
Relative, no affine   2.3 M    0.004   0.007                0.015
Absolute, affine      0.3 M    0.001   0.008                0.000
Absolute, no affine   1.2 M    0.007   0.014                0.009
Baseline              9.1 M    0.032   0.002                0.032
6. Methods for component market predictions
As for the PC market, Deep Maize uses price forecasts to determine how to procure components in the supply market. Component price forecasts take the form $p^{(s,c)}_{i,j}(q)$, denoting the predicted offer price for a quantity $q$ of component type $c$, requested from supplier $s$ on day $i$, with due date $j$. The price predictions for each possible pair of request date and due date form a surface, as depicted in Fig. 6. These predictions are made each day for each supplier-component pair $(s, c)$. This contrasts with the PC market, where we model a price curve with a single time parameter (the request date).¹⁴
Each day, manufacturers receive offers from suppliers in response to the RFQs sent the previous day. Information from these offers may be used as input to price predictions. Note that it is generally not possible to get a direct price quote for each future date, due to the limitation of five RFQs per day for each supplier and component combination. Thus, there is significant uncertainty regarding even the current price curve.
We consider three classes of prediction methods for the component market. The first creates an estimate of the current price curve using the observed spot price quotes and linear interpolation. This method does not attempt to predict changes in prices over time. The second approach augments the first by using market indicators to predict residual changes from the estimated price curve over time. This method exploits local price stability in the short term by using the linear interpolation model as a base, but still allows for modeling price variability over time. The final method applies regression to predict prices directly based on market indicators. This method allows for the greatest flexibility in predicting price changes.
6.1. Linear interpolation price predictor
As noted in many accounts of TAC/SCM agents, market prices in the game tend to exhibit local price stability over short horizons. The degree of volatility may vary depending on the particular agents playing and random variations in the underlying game state. However, there are structural reasons to expect such stability. One is that the underlying supplier capacity and customer demand levels change over time, but rarely exhibit drastic changes over short time periods. In addition, the outstanding commitments and inventory of suppliers and manufacturers tend to exert a damping effect on component price changes.
Deep Maize estimates the current market state based on price quotes from recent days. We employ linear interpolation to estimate prices for due dates with no recent observations. Fig. 7 shows an example price curve estimated using this method, which we term the linear interpolation price predictor (LIPP). Due to local price stability, we expect LIPP to be a reasonably good predictor for prices in the near future. Empirical results below support this proposition.
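A minimal sketch of a linear-interpolation price curve is given below. It is an illustration rather than the agent's code: quotes are assumed to arrive as (due date, price) pairs in chronological order so that a newer quote for the same due date overrides an older one, and due dates outside the observed range are assumed to take the nearest observed price, which is the default behavior of `numpy.interp`. The function name is hypothetical.

```python
import numpy as np


def lipp_curve(quotes, due_dates):
    """Estimate prices for the requested due dates by linear interpolation
    between recently quoted (due_date, price) pairs."""
    latest = {}
    for due_date, price in quotes:
        latest[due_date] = price  # later quotes replace earlier ones
    known_dates = np.array(sorted(latest), dtype=float)
    known_prices = np.array([latest[d] for d in sorted(latest)], dtype=float)
    # np.interp requires increasing x values and holds the end prices
    # constant outside the observed range.
    return np.interp(due_dates, known_dates, known_prices)
```

With quotes of 500 for day 100, 540 for day 120, and 600 for day 140, the estimate for day 110 is 520 and for day 130 is 570.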
6.2. Relative component price predictor
In analyzing the performance of LIPP, we note a tendency for predictions over the entire price curve to systematically overestimate or underestimate prices. There are several plausible reasons for this. Recall that suppliers’ current capacity varies according to a random walk during the game. Suppliers construct prices in part based on projecting their current production capacity into the future to determine their available capacity. Since available capacity is used in pricing for all request dates, fluctuations in capacity cause prices for all of these dates to rise or fall in unison. This pattern could also result from manufacturing agents increasing or decreasing order quantities, but maintaining approximately the same timing in their purchase schedules. This might result, for instance, from rising or falling customer demand levels.
LIPP does not take into account any information beyond recent price quotes that might help to predict these sorts of price fluctuations. We introduce the relative component price predictor (RCPP) as a way of exploiting these additional forms of information. RCPP predicts the residual error of LIPP by applying regression to technical attributes summarizing the observable market state. A list of the attributes we used is given in Table 3. RCPP forms its predictions by adding the predicted residual errors to the LIPP predictions.
The intuition for this approach is that the RCPP algorithm identifies game states that are likely to result in predictable price shifts, leading to inaccurate LIPP estimates. The residuals modify the current price estimates to account for these shifts. This approach exploits the short-term accuracy of the linear price predictor, while using historical data and market indicators to adapt future forecasts. We note that there is a parallel between this approach and the relative representation used in predicting prices in the PC market.
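The residual-prediction idea can be sketched as follows. The regression method is an assumption made for illustration: this passage does not specify one, so ordinary least squares with an intercept is used here as a stand-in. The feature matrix stands for the market attributes, and all function names are hypothetical.

```python
import numpy as np


def _design_matrix(features):
    """Prepend a column of ones so the model includes an intercept."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(features)), features])


def fit_residual_model(features, lipp_predictions, actual_prices):
    """Fit least-squares coefficients mapping market attributes to the
    residual (actual price minus LIPP prediction)."""
    residuals = (np.asarray(actual_prices, dtype=float)
                 - np.asarray(lipp_predictions, dtype=float))
    coefficients, *_ = np.linalg.lstsq(_design_matrix(features), residuals,
                                       rcond=None)
    return coefficients


def rcpp_predict(coefficients, features, lipp_predictions):
    """Relative prediction: LIPP estimate plus the predicted residual."""
    predicted_residual = _design_matrix(features) @ coefficients
    return np.asarray(lipp_predictions, dtype=float) + predicted_residual
```

An absolute predictor in the same style would regress the actual prices directly on the attributes, without the LIPP term.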
6.3. Absolute component price predictor
The final class of predictors we consider is the absolute component price predictor (ACPP). ACPP applies regression to directly learn prices from the technical indicators. Unlike RCPP, this approach does not bootstrap the short-term accuracy of LIPP, so it

[Figure 6 plot: surface over Relative Due Date and Current Day. Vertical axis: Price Relative to Base Price (0.6 to 1.4).]
Fig. 6. Sample price predictions $p_{i,i+d}(1)$ for a single component, where $i$ is the current day and $d$ is the relative due date.
[Figure 7 plot: Horizontal axis: Day Due (100 to 140). Vertical axis: Price (500 to 600).]
Fig. 7. A sample linear interpolation of recent price quotes.
14 We have found empirically that the due dates for PC requests have minor effects on the price, so we do not model these directly.
is not relative to the estimated current prices. The potential advantage of this approach is that it offers greater flexibility for modeling price shifts, since it is not constrained by the LIPP prediction. If there are prevailing long-term price trends not captured in the RCPP residuals, ACPP may yield more accurate predictions at long horizons. We note that there is a parallel between this approach and the absolute representation used in predicting prices in the PC market.
7. Evaluation of component market predictions
We evaluate the relative performance of the three classes of predictors on the basis of prediction error using the Prediction Challenge framework. We present the results of experiments run after the competition using the data set from the final round of the 2007 Prediction Challenge.¹⁵ The data consists of a total of 48 games containing the designated PAgent, divided into three sets of 16 games played against the same set of opponents. All opponents were selected from the agent repository. The predictors are scored using normalized RMS error on all required component price predictions.
893 We begin by exploring an important parameter of the LIPP
894 method, which we then fix for subsequent analysis. Recall that
895 manufacturers are limited in the number of RFQs they may submit
896 each day. We can increase the number of observations available for
897 estimating the price curve by including RFQ responses from
previ-898 ous days, with the caveat that these quotes may represent stale
899 information. We represent this tradeoff using an expiration
inter-900 val that determines how many days of previous observations are
901 used to construct the LIPP estimate. Using a longer interval
in-902 creases the amount of data available, along with the risk that the
903 data no longer accurately reflects the current situation. To
deter-904 mine the best setting of the expiration interval, we tested intervals
905 from one to five days. The results are shown inFig. 8.
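The role of the expiration interval can be sketched as follows. This is a minimal illustration, not the Deep Maize implementation: the quote layout `(day_received, due_day, price)`, the function name, the convention that an interval of 1 keeps only the current day's quotes, and the flat extrapolation outside the observed due dates are all assumptions.

```python
def lipp_estimate(quotes, today, expiration_interval, due_day):
    """Linearly interpolate a price for due_day from unexpired quotes.

    quotes: list of (day_received, due_day, price) tuples (hypothetical layout).
    A quote is used only if it was received within the last
    `expiration_interval` days, counting today as the first day.
    """
    latest = {}  # due_day -> (day_received, price), keeping the freshest quote
    for received, due, price in quotes:
        if received > today or today - received >= expiration_interval:
            continue  # not yet observed, or expired (stale)
        if due not in latest or received > latest[due][0]:
            latest[due] = (received, price)
    if not latest:
        return None  # no usable observations
    points = sorted((due, price) for due, (_, price) in latest.items())
    # Outside the observed range, hold the nearest endpoint price (assumption).
    if due_day <= points[0][0]:
        return points[0][1]
    if due_day >= points[-1][0]:
        return points[-1][1]
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if d0 <= due_day <= d1:
            return p0 + (p1 - p0) * (due_day - d0) / (d1 - d0)

# Hypothetical quotes: two received today (day 10), one received on day 8.
quotes = [(10, 20, 500.0), (10, 30, 600.0), (8, 25, 400.0)]
fresh_only = lipp_estimate(quotes, today=10, expiration_interval=1, due_day=25)
with_older = lipp_estimate(quotes, today=10, expiration_interval=3, due_day=25)
```

With the one-day interval, the estimate for due day 25 is interpolated between today's two quotes; with the three-day interval, the older quote for due day 25 enters the curve and determines the estimate directly, which is the staleness risk described above.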
The mean accuracy of the LIPP predictions decreases monotonically as the length of the expiration interval increases. This result implies that the day-to-day price volatility in TAC/SCM is high enough that the benefit of having data points for additional due dates is outweighed by the reduced accuracy of the older data points. It is possible that adapting this expiration interval dynamically during a game based on observed price volatility or other factors could yield more accurate predictions, but we have not yet fully explored this possibility.
7.2. Estimating attribute quality for regression models
As a preliminary step to regression, we seek to evaluate the quality of various attributes that may be used to predict current and future component prices. In contrast to classification problems, few measures of attribute quality for regression are available in the current literature. For example, information gain, j-measure, and χ² cannot be used for estimating attribute quality in a regression. Three measures that have been proposed for regression with numerical attributes are mean absolute error and mean squared error (Breiman et al., 1984), and RReliefF (Robnik-Šikonja and Kononenko, 1997). Because the error measures assume independence among attributes, they are less appropriate given strong interdependencies like those introduced in our analysis. In contrast, RReliefF has been shown empirically to perform well in situations where there are strong interdependencies, such as parity problems. A short explanation of the RReliefF values is rather elusive (see Robnik-Šikonja and Kononenko, 2003, for a thorough exposition), but we attempt here to give some intuition as to what the values represent.
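For readers who prefer code, the following is a simplified sketch of how RReliefF weights can be computed. It is not the implementation used in our experiments. The simplifications are assumptions made for brevity: numeric attributes only, Manhattan distance on range-normalized values, equal weighting of the k nearest neighbors instead of rank-based weighting, and one pass over every instance instead of random sampling. The function name is hypothetical.

```python
def rrelieff(X, y, k=2):
    """Simplified RReliefF: return one quality weight per attribute."""
    n, a = len(X), len(X[0])

    def span(values):
        # Range used for normalization; 1.0 avoids division by zero.
        return (max(values) - min(values)) or 1.0

    attr_span = [span([row[j] for row in X]) for j in range(a)]
    y_span = span(y)

    def diff_attr(j, r, s):
        return abs(X[r][j] - X[s][j]) / attr_span[j]

    n_dc = 0.0           # weight of "neighbors have different target values"
    n_da = [0.0] * a     # weight of "neighbors differ on attribute j"
    n_dc_da = [0.0] * a  # weight of both differences occurring together

    for r in range(n):
        others = sorted((s for s in range(n) if s != r),
                        key=lambda s: sum(diff_attr(j, r, s) for j in range(a)))
        neighbors = others[:k]
        w = 1.0 / len(neighbors)  # equal neighbor weights (simplification)
        for s in neighbors:
            d_y = abs(y[r] - y[s]) / y_span
            n_dc += w * d_y
            for j in range(a):
                d_a = diff_attr(j, r, s)
                n_da[j] += w * d_a
                n_dc_da[j] += w * d_y * d_a

    if not 0.0 < n_dc < n:
        raise ValueError("degenerate target differences")
    return [n_dc_da[j] / n_dc - (n_da[j] - n_dc_da[j]) / (n - n_dc)
            for j in range(a)]

# Toy data: the target equals attribute 0 and ignores attribute 1.
X = [[x0, x1] for x0 in range(4) for x1 in range(4)]
y = [row[0] for row in X]
weights = rrelieff(X, y, k=2)
```

On this toy data the first attribute receives the higher weight, because differences in it coincide with differences in the target among near neighbors, while differences in the second attribute do not.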
Table 3
Attributes considered by RCPP and their RReliefF values for current and future days. The first five attributes correspond to parameters of an individual RFQ and the remainder of market state.

Attribute                  Current    Future     Description
Day                        0.000603   0.000842   Current day
Sent-day                   0.000603   0.000842   Day the RFQ will be sent
Due-day                    0.000124   0.001038   Day the RFQ will be due
Duration                   0.000209   0.000603   Due-day minus sent-day
Quantity                   0.011297   0.013593   RFQ quantity
Linear-price               0.019816   0.010123   LIPP price prediction
Nearest-price-point-diff   0.001689   0.002899   Nearest known price point distance (days from due date)
Days-since-data            0.006886   0.006228   Days since due date had a price quote
Market-capacity            0.000496   0.00023    Last supplier capacity from market report
Estimated-capacity         0.000317   0.001029   Estimate of supplier capacity using market reports and trend
Surplus                    0.000514   0.000157   Available capacity minus agent demand in recent period
Aggregate-surplus          0.000618   0.000295   Available capacity minus agent demand in total
Mean-demand                0.001663   0.000741   Mean customer demand for component
Mean-low-demand            0.002198   0.001023   Mean customer demand for low-end components
Mean-mid-demand            0.0014     0.0009     Mean customer demand for mid-range components
Mean-high-demand           0.002      0.000787   Mean customer demand for high-end components
5-Day-demand               0.001258   0.000852   Customer demand for last 5 days
20-Day-demand              0.000687   0.00164    Customer demand for last 20 days
40-Day-demand              0.000833   0.0014     Customer demand for last 40 days
Fig. 8. RMS error for LIPP with various expiration intervals (RMS error plotted against expiration interval, EX1 to EX5).
15 Many of these analyses were also performed before the Challenge using other test sets; we repeat all of the experiments using this data set for consistency