NUK - logo
E-viri
Recenzirano Odprti dostop
  • Filtered poisson process ba...
    Grant, James A.; Szechtman, Roberto

    European journal of operational research, 12/2021, Letnik: 295, Številka: 2
    Journal Article

    •We propose a continuum-armed bandit model of surveillance resource allocation.•Poisson process data is given as feedback, with filtering depending on the actions.•We derive an effective upper confidence bound approach with adaptive discretisation.•We show matching upper and lower bounds on the regret. We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O˜(T2/3) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.