A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration: rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs (Kearns, Mansour, & Ng, Neural Information Processing Systems 13, to appear).
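As a concrete illustration, here is a minimal Python sketch of the sparse-sampling recursion described in this abstract. The names generative_model, C (the per-action sample width), and h (the look-ahead horizon) are illustrative choices, not the paper's notation; a real implementation would add caching and tuned parameters.

```python
import random

def sparse_sample_value(s, h, C, gamma, actions, generative_model):
    """Estimate V*(s) with a depth-h sparse look-ahead tree.

    generative_model(s, a) -> (next_state, reward) is assumed to be a
    user-supplied stochastic simulator. Runtime is O((|A| * C) ** h),
    independent of the number of states.
    """
    if h == 0:
        return 0.0
    return max(sparse_sample_q(s, a, h, C, gamma, actions, generative_model)
               for a in actions)

def sparse_sample_q(s, a, h, C, gamma, actions, generative_model):
    """Average over C sampled successors to estimate Q*(s, a)."""
    total = 0.0
    for _ in range(C):
        s_next, r = generative_model(s, a)
        total += r + gamma * sparse_sample_value(
            s_next, h - 1, C, gamma, actions, generative_model)
    return total / C

def plan(s, h, C, gamma, actions, generative_model):
    """On-line planner: return the greedy action at the current state."""
    return max(actions, key=lambda a: sparse_sample_q(
        s, a, h, C, gamma, actions, generative_model))
```

Note how the sampled tree branches only on the C drawn successors per action, which is the sense in which it covers a vanishing fraction of the full look-ahead tree.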
This paper studies an extension of the distribution-free model of learning introduced by Valiant (Comm. ACM, 27 (1984), pp. 1134-1142) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation; thus, a worst-case model of errors is studied. The results of this research include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems.
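To make the error model concrete, the sketch below simulates one call to a malicious example oracle with error rate beta. The helpers draw_instance, target, and adversary are hypothetical stand-ins for exposition, not constructs from the paper.

```python
import random

def malicious_example_oracle(beta, draw_instance, target, adversary):
    """One call to a malicious-error example oracle (illustrative).

    With probability beta the adversary returns an arbitrary labeled
    pair (in the worst-case model it may inspect anything, including
    the learner's history); otherwise a correct example (x, c(x)) is
    drawn from the underlying distribution D.
    """
    if random.random() < beta:
        return adversary()      # arbitrary, possibly mislabeled pair
    x = draw_instance()
    return (x, target(x))       # clean example labeled by the target
```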
Kearns, Neel, Roth, and Wu (ICML 2018) recently proposed a notion of rich subgroup fairness intended to bridge the gap between statistical and individual notions of fairness. Rich subgroup fairness picks a statistical fairness constraint (say, equalizing false positive rates across protected groups), but then asks that this constraint hold over an exponentially or infinitely large collection of subgroups defined by a class of functions with bounded VC dimension. They give an algorithm guaranteed to learn subject to this constraint, under the condition that it has access to oracles for perfectly learning absent a fairness constraint. In this paper, we undertake an extensive empirical evaluation of the algorithm of Kearns et al. On four real datasets for which fairness is a concern, we investigate the basic convergence of the algorithm when instantiated with fast heuristics in place of learning oracles, measure the tradeoffs between fairness and accuracy, and compare this approach with the recent algorithm of Agarwal, Beygelzimer, Dudik, Langford, and Wallach (ICML 2018), which implements weaker and more traditional marginal fairness constraints defined by individual protected attributes. We find that in general, the Kearns et al. algorithm converges quickly, large gains in fairness can be obtained with mild costs to accuracy, and that optimizing accuracy subject only to marginal fairness leads to classifiers with substantial subgroup unfairness. We also provide a number of analyses and visualizations of the dynamics and behavior of the Kearns et al. algorithm. Overall we find this algorithm to be effective on real data, and rich subgroup fairness to be a viable notion in practice.
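For intuition about what the auditing step in this framework measures, here is a hedged sketch of a size-weighted false-positive-rate disparity over candidate subgroups. It is a simplified proxy for the learning-oracle-based auditor of Kearns et al., and all function names are illustrative.

```python
import numpy as np

def subgroup_fp_disparity(y_true, y_pred, members):
    """Weighted false-positive-rate gap for one candidate subgroup.

    members is a boolean mask over examples. The disparity weights the
    subgroup's FPR gap by its share of negative examples, a simplified
    proxy for the rich-subgroup auditing objective described above.
    """
    neg = (y_true == 0)
    overall_fpr = y_pred[neg].mean()
    sub_neg = neg & members
    if sub_neg.sum() == 0:
        return 0.0
    sub_fpr = y_pred[sub_neg].mean()
    return (sub_neg.sum() / neg.sum()) * abs(sub_fpr - overall_fpr)

def audit(y_true, y_pred, candidate_subgroups):
    """Heuristic auditor: return the worst candidate subgroup, standing
    in for the learning oracle assumed in the theoretical analysis."""
    return max(candidate_subgroups,
               key=lambda g: subgroup_fp_disparity(y_true, y_pred, g))
```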
Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which ...addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that by optimizing its performance via reinforcement learning, NJFun measurably improves system performance.
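As a hedged illustration of the general recipe (not the exact NJFun procedure), the sketch below estimates Q-values from logged exploratory dialogues by Monte Carlo averaging and then acts greedily; the data format and names are assumptions for exposition.

```python
from collections import defaultdict

def estimate_q(dialogues, gamma=1.0):
    """Monte Carlo Q-value estimates from logged exploratory dialogues.

    Each dialogue is assumed to be a list of (state, action, reward)
    triples, with the task-success reward delivered at the end. This
    mirrors the standard batch approach to dialogue-policy
    optimization; the NJFun system's procedure may differ in detail.
    """
    returns = defaultdict(list)
    for episode in dialogues:
        g = 0.0
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            returns[(state, action)].append(g)
    return {sa: sum(v) / len(v) for sa, v in returns.items()}

def greedy_policy(q, actions):
    """Choose the action with the highest estimated Q in each state."""
    return lambda state: max(actions, key=lambda a: q.get((state, a), 0.0))
```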
In this paper we investigate a new formal model of machine learning in which the concept (Boolean function) to be learned may exhibit uncertain or probabilistic behavior; thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [28] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts.
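A p-concept can be made concrete in a few lines: it is simply a map from instances to label probabilities, so the same instance can receive either label across draws. The sketch below is illustrative; draw_instance and the weather-style example are invented for exposition.

```python
import random

def sample_labeled_example(draw_instance, p_concept):
    """Draw one example from a probabilistic concept.

    p_concept(x) in [0, 1] gives the probability that x is labeled
    positive, so repeated draws of the same x can yield both labels.
    """
    x = draw_instance()
    label = 1 if random.random() < p_concept(x) else 0
    return x, label

# Example: a 'weather' p-concept on [0, 1], where higher humidity
# makes rain more likely but never certain.
rainy = lambda humidity: 0.1 + 0.8 * humidity
x, y = sample_labeled_example(random.random, rainy)
```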
PARP inhibitors (PARPi) have demonstrated potent therapeutic efficacy in patients with ovarian cancer. However, acquired resistance to PARPi is a major challenge in the clinic. Therefore, understanding the resistance mechanism and developing new treatment approaches are important to overcome therapeutic resistance to PARP inhibition. Using our genetically engineered mouse (GEM) model of Brca1-deficient ovarian tumors, we uncovered a new mechanism underlying secondary resistance to PARP inhibition mediated by tumor-associated macrophages (TAMs) in the tumor microenvironment (TME). Mechanistically, PARP inhibition induced the STAT3 signaling pathway in tumor cells, which in turn promoted pro-tumor polarization of TAMs in the TME of ovarian cancer. Ablation of STAT3 in tumor cells mitigated polarization of M2-like pro-tumor macrophages in the TME and increased tumor-infiltrating T cells in response to PARP inhibition. Furthermore, we demonstrated that STING agonists reprogrammed myeloid cells in the TME of ovarian tumors into antitumor M1-like macrophages and activated dendritic cells (DCs) in a myeloid cell STING-dependent manner. These findings were recapitulated in patient-derived PARPi-resistant ovarian tumor models. Finally, we show that STING agonism was able to overcome the secondary resistance to PARPi in ovarian cancer rendered by the immunosuppressive TME. Our study elucidates a new mechanism of PARPi resistance and provides a new treatment strategy to overcome TME-induced therapeutic resistance to PARP inhibition in ovarian cancers.
Citation Format: Liya Ding, Qiwei Wang, Michael Kearns, Tao Jiang, Xin Cheng, Changli Qian, Jean Zhao. Upregulation of STAT3 signaling in tumor cells fosters a TME-dependent secondary resistance of BRCA1-deficient HGSOC to PARP inhibition [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr LB004.
Minimax Group Fairness: Algorithms and Experiments
Diana, Emily; Gill, Wesley; Kearns, Michael ...
Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 07/2021 (conference proceeding).
We consider a recently introduced framework in which fairness is measured by worst-case outcomes across groups, rather than by the more standard differences between group outcomes. In this framework, we provide provably convergent oracle-efficient learning algorithms (or equivalently, reductions to non-fair learning) for minimax group fairness. Here the goal is that of minimizing the maximum loss across all groups, rather than equalizing group losses. Our algorithms apply to both regression and classification settings and support both overall error and false positive or false negative rates as the fairness measure of interest. They also support relaxations of the fairness constraints, thus permitting study of the tradeoff between overall accuracy and minimax fairness. We compare the experimental behavior and performance of our algorithms across a variety of fairness-sensitive data sets and show empirical cases in which minimax fairness is strictly and strongly preferable to equal outcome notions.
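To illustrate the flavor of such an oracle-efficient reduction (a sketch under assumptions, not the paper's exact algorithm), the code below plays multiplicative weights over groups against a reweighted-learning oracle. The names fit_weighted and losses_by_group are hypothetical stand-ins for the non-fair learning oracle and the per-group evaluation.

```python
import numpy as np

def minimax_fair_train(fit_weighted, losses_by_group, groups,
                       rounds=50, eta=0.5):
    """No-regret sketch of a minimax-group-fairness reduction.

    fit_weighted(weights) stands in for the non-fair learning oracle:
    it trains on examples reweighted by group and returns a model.
    losses_by_group(model) returns one loss per group. A 'regulator'
    plays multiplicative weights over groups while the learner best
    responds; the averaged play approximates a minimax equilibrium.
    """
    w = np.ones(len(groups)) / len(groups)
    models = []
    for _ in range(rounds):
        model = fit_weighted(w)           # best response to group weights
        losses = losses_by_group(model)   # vector of per-group losses
        w *= np.exp(eta * losses)         # upweight the worst-off groups
        w /= w.sum()
        models.append(model)
    return models                         # use as a randomized classifier
```

Relaxed fairness constraints, as mentioned above, would correspond to capping how far the group weights can drift from uniform, trading some minimax protection for overall accuracy.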