Price Discrimination in Online Commerce
In 2012, The Wall Street Journal began probing e-commerce platforms to identify instances of potential price discrimination: the provision of different prices to different people.29 By polling different websites, the Journal was able to spot several vendors, such as Staples, Home Depot, Rosetta Stone, and Orbitz, that were adjusting prices dynamically based on factors like user geography, browser history, or mobile-browser use. In the case of Staples, it found that the input most strongly correlated with price was the distance to a rival’s store, which explained about 90 percent of the pricing pattern.
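The phrase "explained about 90 percent of the pricing pattern" is most naturally read as a coefficient of determination (R²). The Python sketch below shows one way to compute that figure for a single input, such as distance to a rival’s store, using a simple least-squares line. It is an illustration under that assumption, not the Journal’s actual analysis, and the numbers in the usage lines are invented toy values.

```python
# A minimal sketch, assuming "explains X percent" means R^2 of a one-input
# least-squares fit. The data in the __main__ block are toy values only.

def r_squared(x: list[float], y: list[float]) -> float:
    """Return R^2 for the simple linear fit y ~ a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("input or output has no variation")
    # With a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical observations: distance to a rival's store (km) and quoted price.
    distance_km = [2.0, 8.0, 15.0, 30.0, 45.0]
    price = [18.0, 18.0, 20.0, 20.0, 20.0]
    print(f"R^2 = {r_squared(distance_km, price):.2f}")
```

A value near 0.9 would mean a line in that one input accounts for about 90 percent of the variance in observed prices. By itself it says nothing about causation, or about other inputs that happen to correlate with distance.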
To get the story, the WSJ had to simulate visits to the various sites from different computers and browsers in different geographies.30 Initially this required proxy servers that made it appear as though the website was being loaded from different locations. The publication’s staff also created several archetypal users and built user profiles with cookies to see how those profiles might affect the prices recorded. This case again mimics Figure 1(A), in which both inputs and outputs are fully observable. Yet it was more complex than the autocomplete case because no straightforward API was available. Instead, the journalists had to painstakingly construct profiles that simulated inputs to the algorithm and check whether any of the variables in those profiles led to significant differences in output (prices).
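The profile-based approach can be sketched in Python as a grid of simulated users plus a comparison of average prices across the values of each profile variable. Everything here is hypothetical: the profile dimensions (`LOCATIONS`, `HISTORIES`, `DEVICES`), the `fetch_price` callback (which in a real audit would load the product page through a proxy in the right location, carrying that profile’s cookies), and the `toy_vendor` used to exercise it.

```python
import itertools
from collections.abc import Callable
from statistics import mean

# Hypothetical profile dimensions for illustration only.
LOCATIONS = ["zip_A", "zip_B", "zip_C"]
HISTORIES = ["none", "bargain_shopper", "luxury_shopper"]
DEVICES = ["desktop", "mobile"]

Profile = dict[str, str]


def build_profiles() -> list[Profile]:
    """One simulated user for every combination of location, history, and device."""
    return [
        {"location": loc, "history": hist, "device": dev}
        for loc, hist, dev in itertools.product(LOCATIONS, HISTORIES, DEVICES)
    ]


def price_spread_by_variable(
    profiles: list[Profile], fetch_price: Callable[[Profile], float]
) -> dict[str, float]:
    """For each variable, the gap between the highest and lowest mean price across its values."""
    prices = [fetch_price(p) for p in profiles]
    spread: dict[str, float] = {}
    for var in profiles[0]:
        groups: dict[str, list[float]] = {}
        for profile, price in zip(profiles, prices):
            groups.setdefault(profile[var], []).append(price)
        group_means = [mean(v) for v in groups.values()]
        spread[var] = max(group_means) - min(group_means)
    return spread


def toy_vendor(profile: Profile) -> float:
    """Hypothetical vendor that charges more in one location and ignores everything else."""
    return 20.0 if profile["location"] == "zip_A" else 18.0
```

Running `price_spread_by_variable(build_profiles(), toy_vendor)` attributes the whole price gap to `location` and none to `history` or `device`. Against a real site, a large gap for one variable is only a lead, to be confirmed with repeated and more targeted measurements.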
Reverse engineering at the scale of the Web raises several challenges, as underscored both by the WSJ story and by academic efforts to reverse engineer personalization in Web search.31 One issue is that sites like Staples might be using A/B testing to analyze whether subtle differences on their websites are useful to them. In other words, they are already running experiments on their own sites, and to a reverse engineer those experiments might look like noise or confusing irregularities. Algorithms may also be unstable and change over time, or have randomness built into them, which makes patterns in their input-output relationship much harder to discern. If you suspect the algorithm is extremely dynamic and time-sensitive, you may need to submit all of your inputs in parallel to minimize the impact of the algorithm changing while you collect data.
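One way to act on these points is to query every profile at the same time and to repeat each query several times, so that an A/B test or built-in randomness shows up as spread within a profile rather than being mistaken for a difference between profiles. The Python sketch below assumes a hypothetical `fetch_price(profile)` callable that performs one request and a list of distinct, hashable profiles; `toy_ab_vendor` is an invented stand-in for a real site. Threads suit this I/O-bound work, and running requests concurrently narrows, but does not eliminate, the window in which the algorithm can change.

```python
import random
from collections.abc import Callable, Hashable
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def sample_in_parallel(
    profiles: list[Hashable],
    fetch_price: Callable[[Hashable], float],
    repeats: int = 5,
    max_workers: int = 16,
) -> dict[Hashable, list[float]]:
    """Submit every (profile, repeat) request at once and group observed prices by profile."""
    jobs = [p for p in profiles for _ in range(repeats)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in job order, so zip pairs each price with its profile.
        prices = list(pool.map(fetch_price, jobs))
    observed: dict[Hashable, list[float]] = {p: [] for p in profiles}
    for profile, price in zip(jobs, prices):
        observed[profile].append(price)
    return observed


def summarize(observed: dict[Hashable, list[float]]) -> dict[Hashable, tuple[float, float]]:
    """Mean price and within-profile spread (population std. dev.) for each profile."""
    return {p: (mean(v), pstdev(v)) for p, v in observed.items()}


def toy_ab_vendor(profile: str) -> float:
    """Hypothetical vendor: price depends on the profile, plus a random A/B-test bump."""
    base = 20.0 if profile == "far_from_rival" else 18.0
    return base + random.choice([0.0, 0.5])
```

Comparing the gap between profile means with the within-profile spread gives a rough sense of whether a difference is signal or experiment noise; a careful audit would follow up with a proper statistical test.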