The algorithm selection problem (Rice, 1976) seeks to answer the question: which algorithm is likely to perform best for my problem? Recognizing the problem as a learning task in the early 1990s, the machine learning community has developed the field of meta-learning, focused on learning about learning algorithm performance on classification problems. But there has been only limited generalization of these ideas beyond classification, and many related attempts have been made in other disciplines (such as AI and operations research) to tackle the algorithm selection problem in different ways, introducing different terminology, and overlooking the similarities of approaches. In this sense, there is much to be gained from a greater awareness of developments in meta-learning, and how these ideas can be generalized to learn about the behaviors of other (non-learning) algorithms. In this article we present a unified framework for considering the algorithm selection problem as a learning problem, and use this framework to tie together the cross-disciplinary developments in tackling the algorithm selection problem. We discuss the generalization of meta-learning concepts to algorithms focused on tasks including sorting, forecasting, constraint satisfaction, and optimization, and the extension of these ideas to bioinformatics, cryptography, and other fields.
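To make the "algorithm selection as a learning problem" framing concrete, the sketch below trains a classifier that maps instance meta-features to the best-performing algorithm in a small portfolio. The meta-features, the synthetic labels, and the choice of random forest are illustrative assumptions, not the framework proposed in the article.

```python
# Hedged sketch: algorithm selection cast as a supervised learning problem.
# Meta-features of problem instances (hypothetical values) are used to predict
# which algorithm in a 3-algorithm portfolio performs best.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical meta-features: [problem size, feature correlation, noise level]
X_meta = rng.random((200, 3))
# Hypothetical labels: index of the best algorithm, synthesized from a simple
# rule purely for illustration.
y_best = (X_meta[:, 0] > 0.5).astype(int) + (X_meta[:, 2] > 0.7).astype(int)

selector = RandomForestClassifier(n_estimators=100, random_state=0)
selector.fit(X_meta, y_best)

# Given meta-features of a new instance, recommend an algorithm.
new_instance = np.array([[0.8, 0.3, 0.9]])
print("Recommended algorithm index:", selector.predict(new_instance)[0])
```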
Outliers due to technical errors in water‐quality data from in situ sensors can reduce data quality and have a direct impact on inference drawn from subsequent data analysis. However, outlier detection through manual monitoring is infeasible given the volume and velocity of data the sensors produce. Here we introduce an automated procedure, named oddwater, that provides early detection of outliers in water‐quality data from in situ sensors caused by technical issues. The oddwater procedure first identifies the data features that differentiate outlying instances from typical behaviors. Then, statistical transformations are applied to make the outlying instances stand out in a transformed data space. Unsupervised outlier scoring techniques are applied to the transformed data space, and an approach based on extreme value theory is used to calculate a threshold for each potential outlier. Using two data sets obtained from in situ sensors in rivers flowing into the Great Barrier Reef lagoon, Australia, we show that oddwater successfully identifies outliers involving abrupt changes in turbidity, conductivity, and river level, including sudden spikes, sudden isolated drops, and level shifts, while maintaining very low false detection rates. We have implemented this oddwater procedure in the open source R package oddwater.
Key Points
Feature‐based procedure starts by applying different statistical transformations to data to highlight outliers in high‐dimensional space
Density‐ and distance‐based unsupervised outlier scoring techniques were applied to detect outliers due to technical issues with the sensors
An approach based on extreme value theory was then used to calculate outlier thresholds
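The sketch below illustrates this transform-score-threshold pipeline on a synthetic series. The actual implementation is the R package oddwater; the log-difference transformation, the k-nearest-neighbour score, and the generalized Pareto threshold used here are illustrative choices, not the paper's exact ones.

```python
# Hedged sketch of a feature-based outlier-detection pipeline in the spirit of
# oddwater: transform, score with an unsupervised distance-based method, then
# set a threshold using extreme value theory. All choices below are illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.stats import genpareto

rng = np.random.default_rng(1)

# Synthetic turbidity-like series with a few injected spikes (hypothetical data).
series = np.cumsum(rng.normal(0, 0.1, 1000)) + 10
series[[200, 550, 900]] += 8.0

# Step 1: transformation that highlights abrupt changes (one-step log differences).
transformed = np.diff(np.log(np.clip(series, 1e-6, None)))

# Step 2: unsupervised distance-based outlier score (k-nearest-neighbour distance).
X = transformed.reshape(-1, 1)
nn = NearestNeighbors(n_neighbors=10).fit(X)
distances, _ = nn.kneighbors(X)
scores = distances[:, -1]          # distance to the 10th neighbour

# Step 3: EVT-based threshold: fit a generalized Pareto distribution to score
# exceedances over a high quantile and take an extreme quantile as the cut-off.
u = np.quantile(scores, 0.95)
exceedances = scores[scores > u] - u
c, loc, scale = genpareto.fit(exceedances, floc=0)
threshold = u + genpareto.ppf(0.999, c, loc=0, scale=scale)

outliers = np.where(scores > threshold)[0] + 1   # +1 for the differencing offset
print("Flagged indices:", outliers)
```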
Studies of Real-Time Control (RTC) in Rainwater Harvesting (RWH) systems have to date been limited to the control of single storages, leaving the potential benefits of operating multiple storages in a coordinated manner largely untested. In this study, we aimed to design an optimization-based RTC strategy that can operate multiple storages in a coordinated manner to achieve multiple objectives. We modeled the long-term performance of this coordinated approach (termed coordinated control) across a range of storage sizes and compared it with a strategy that optimized the operation of each storage individually, ignoring the state of the other storages within the system. Our results show that coordinated control delivered a synergy benefit in achieving better baseflow restoration, with almost no detriment to the water supply and flood protection (overflow reduction) performance. The efficiency achieved through coordinated control allows large storages to compensate for smaller, underperforming systems, to achieve higher overall performance. Such a finding suggests a general control principle in building coordination among multiple storages, which can potentially be adapted to mitigate flooding risks, and also applied to other stormwater control measures. This also opens up a new opportunity for practitioners to construct a future “smart rainwater grid” using a network of distributed storages, in combination with centralized large storages, to manage urban stormwater in a range of contexts and for a range of environmental objectives.
Plain Language Summary
“Smart tanks” based on Real-Time Control (RTC) technology are increasingly applied in rainwater harvesting systems to address water shortages, urban flooding and streams depleted of flow. However, most applications of this technology have involved single tanks, without testing the potential of a network controlled in a coordinated manner to better address these environmental problems. To understand the effect of such coordination, we designed a control strategy accordingly and modeled its performance using a customized model. We found that a network of smart tanks can, in most cases, deliver a synergy benefit in restoring streamflow compared to systems that only work on their own. More importantly, this coordination allows large tanks to compensate for smaller, underperforming tanks, to achieve higher overall performance. It suggests a general control principle in building coordination among multiple storages, which can potentially be adapted to mitigate flooding risks, and also applied to other stormwater control measures. It opens up a smart future for managing urban water in a range of contexts and for a range of environmental objectives.
Key Points
Multiple rainwater storages can be operated in a coordinated manner by Real-Time Control (RTC) technology for multiple objectives
This coordinated RTC delivers synergy benefits in restoring baseflow
Large storages compensate for small underperforming storages within the network
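As a hedged illustration of the coordination idea, the sketch below splits a system-wide baseflow release target across several tanks in proportion to their stored volume, so that larger or fuller tanks take on more of the release. The paper's strategy is optimization-based RTC; this proportional heuristic, and all tank names and volumes, are assumptions made only to illustrate how large storages can compensate for smaller ones.

```python
# Hedged sketch: a simple coordinated release rule across multiple rainwater
# tanks. This proportional-allocation heuristic only illustrates the idea that
# large or full storages can compensate for smaller, emptier ones when meeting
# a shared environmental (baseflow) release target.
from dataclasses import dataclass

@dataclass
class Tank:
    name: str
    capacity_kl: float   # storage capacity (kL)
    volume_kl: float     # current stored volume (kL)

def coordinated_release(tanks, baseflow_target_kl):
    """Split a system-wide baseflow release target across tanks in
    proportion to their currently stored volume."""
    total = sum(t.volume_kl for t in tanks)
    if total == 0:
        return {t.name: 0.0 for t in tanks}
    releases = {}
    for t in tanks:
        share = baseflow_target_kl * t.volume_kl / total
        releases[t.name] = min(share, t.volume_kl)  # cannot release more than stored
    return releases

tanks = [Tank("small_roof", 2, 0.4), Tank("medium_lot", 10, 6.0), Tank("large_central", 50, 35.0)]
print(coordinated_release(tanks, baseflow_target_kl=3.0))
```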
This article presents a method for the objective assessment of an algorithm’s strengths and weaknesses. Instead of examining the performance of only one or more algorithms on a benchmark set, or generating custom problems that maximize the performance difference between two algorithms, our method quantifies both the nature of the test instances and the algorithm performance. Our aim is to gather information about possible phase transitions in performance, that is, the points at which a small change in problem structure produces algorithm failure. The method is based on the accurate estimation and characterization of the algorithm footprints, that is, the regions of instance space in which good or exceptional performance is expected from an algorithm. A footprint can be estimated for each algorithm and for the overall portfolio. To this end, we select a set of features to generate a common instance space, which we validate by constructing a sufficiently accurate prediction model. We characterize the footprints by their area and density. Our method identifies complementary performance between algorithms, quantifies the common features of hard problems, and locates regions where a phase transition may lie.
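The sketch below estimates a footprint's area and density in a 2D instance space as the convex hull of instances on which an algorithm performed well. The 2D projection, the performance threshold, and the hull-based estimate are illustrative assumptions, not the footprint algorithm developed in the article.

```python
# Hedged sketch: estimating a footprint's area and density in a 2D instance
# space as the convex hull of instances where an algorithm performed "well".
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(2)

# Hypothetical 2D instance-space coordinates and algorithm performance values.
coords = rng.random((300, 2))
performance = 1.0 - np.linalg.norm(coords - [0.3, 0.7], axis=1) + rng.normal(0, 0.05, 300)

good = coords[performance > 0.6]          # instances with "good" performance
hull = ConvexHull(good)

area = hull.volume                        # in 2D, .volume is the hull area
density = len(good) / area                # good instances per unit area
print(f"Footprint area: {area:.3f}, density: {density:.1f} instances/unit^2")
```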
This article analyses ubiquitous flow structures which affect the dynamics of stable atmospheric boundary layers. These structures introduce non‐stationarity and intermittency to turbulent mixing, thus invalidating the usual scaling laws and numerical model parametrizations, but their characteristics and generating mechanisms are still generally unknown. Detecting these unknown events from time series requires techniques that do not assume particular geometries or amplitudes of the flow structures. We use a recently developed method of this kind, with some modifications, to study the night‐time structures over a three‐month period during the FLOSSII experiment.
The structures cover about 26% of the dataset, and can be categorized using clustering into only three classes with similar characteristics. The largest class, including about 50% of the events, contains smooth structures, often with wave‐like shapes, which occur in stronger winds and weak stability. The second class, including sharper structures with large kurtosis, is characterized by weaker winds and stronger stability. The smallest class, including about 20% of the events, contains predominantly sharp step‐like structures, or microfronts. They occur in the weakest winds with strong stability.
Sharper, and particularly shallower, structures are related to transient low‐level wind maxima which create inflection points and may affect generation of turbulence. Furthermore, large wind directional shear, which is another source of transient inflection points, is generated even by deep coherent structures when the background wind is weaker than the structure intensity.
These results show that the complexity of structures can be reduced for the purpose of further analysis using a proper classification. Mapping common characteristics of such events leads to their better understanding, which, if combined with similar analyses of other boundary‐layer data, could lead to improved representation of their effects in numerical models.
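As a hedged illustration of the classification step, the sketch below groups detected events into three classes using per-event summary features and k-means. The feature set (shape kurtosis, wind speed, a stability index), the synthetic values, and the use of k-means are assumptions made for illustration; the study's own feature set and clustering method may differ.

```python
# Hedged sketch: grouping detected flow structures into a small number of
# classes from per-event summary features using k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Hypothetical per-event features: [shape kurtosis, mean wind speed (m/s), stability index]
events = np.column_stack([
    rng.gamma(2.0, 1.5, 500),     # kurtosis-like shape statistic
    rng.normal(4.0, 2.0, 500),    # wind speed
    rng.normal(0.1, 0.05, 500),   # stability (e.g., a bulk Richardson number)
])

X = StandardScaler().fit_transform(events)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

for k in range(3):
    members = events[labels == k]
    print(f"class {k}: {len(members)} events, "
          f"mean kurtosis {members[:, 0].mean():.2f}, "
          f"mean wind {members[:, 1].mean():.2f} m/s")
```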
We introduce a hybrid Gegenbauer (ultraspherical) integration method (HGIM) for solving boundary value problems (BVPs), integral and integro-differential equations. The proposed approach recasts the original problems into their integral formulations, which are then discretized into linear systems of algebraic equations using Gegenbauer integration matrices (GIMs). The resulting linear systems are well-conditioned and can be easily solved using standard linear system solvers. A study on the error bounds of the proposed method is presented, and the spectral convergence is proven for two-point BVPs (TPBVPs). Comparisons with other competitive methods in the recent literature are included. The proposed method results in an efficient algorithm, and spectral accuracy is verified using eight test examples addressing the aforementioned classes of problems. The proposed method can be applied on a broad range of mathematical problems while producing highly accurate results. The developed numerical scheme provides a viable alternative to other solution methods when high-order approximations are required using only a relatively small number of solution nodes.
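To illustrate the integral-reformulation step, the standard textbook identity below (stated here as context, not quoted from the article) recasts a simple two-point BVP in a form where only integrals of the data appear, which is what an integration matrix then discretizes at the solution nodes.

```latex
% Hedged illustration (standard identity, assumed for context): the TPBVP
%   y''(x) = f(x),  y(0) = a,  y(1) = b,
% has the equivalent integral form
\[
  y(x) \;=\; a + (b-a)\,x
  \;+\; (x-1)\int_0^{x} t\, f(t)\,\mathrm{d}t
  \;+\; x\int_x^{1} (t-1)\, f(t)\,\mathrm{d}t .
\]
% Approximating the two integrals at collocation nodes with an integration
% matrix (here, a Gegenbauer integration matrix) replaces differentiation by
% integration and yields a well-conditioned linear system in the nodal values.
```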
In 2005, David Pisinger asked the question “where are the hard knapsack problems?”. Noting that the classical benchmark test instances were limited in difficulty due to their selected structure, he proposed a set of new test instances for the 0–1 knapsack problem with characteristics that made them more challenging for dynamic programming and branch-and-bound algorithms. This important work highlighted the influence of diversity in test instances to draw reliable conclusions about algorithm performance. In this paper, we revisit the question in light of recent methodological advances – in the form of Instance Space Analysis – enabling the strengths and weaknesses of algorithms to be visualised and assessed across the broadest possible space of test instances. We show where the hard instances lie, and objectively assess algorithm performance across the instance space to articulate the strengths and weaknesses of algorithms. Furthermore, we propose a method to fill the instance space with diverse and challenging new test instances with controllable properties to support greater insights into algorithm selection, and drive future algorithmic innovations.
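The sketch below generates 0–1 knapsack instances with a controllable weight-profit correlation, in the spirit of Pisinger's classical instance classes (strongly correlated instances are typically hard for branch-and-bound). The parameterization and the interpolation between classes are illustrative assumptions, not the instance generation method proposed in the paper.

```python
# Hedged sketch: generating 0-1 knapsack instances with a controllable
# weight-profit correlation, loosely following Pisinger's classical classes.
import numpy as np

def knapsack_instance(n, R=1000, correlation=0.0, rng=None):
    """Return (weights, profits, capacity) for an n-item 0-1 knapsack instance.
    correlation=0 gives uncorrelated profits; correlation=1 gives strongly
    correlated profits (p_i = w_i + R/10)."""
    rng = rng or np.random.default_rng()
    weights = rng.integers(1, R + 1, size=n)
    uncorrelated = rng.integers(1, R + 1, size=n)
    strongly_correlated = weights + R // 10
    profits = np.round((1 - correlation) * uncorrelated
                       + correlation * strongly_correlated).astype(int)
    capacity = int(0.5 * weights.sum())   # capacity at half the total weight
    return weights, profits, capacity

w, p, c = knapsack_instance(n=100, correlation=1.0, rng=np.random.default_rng(4))
print("capacity:", c, "first items (w, p):", list(zip(w[:3], p[:3])))
```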
This paper treats definite integrations numerically using Gegenbauer quadratures. The novel numerical scheme introduces the idea of exploiting the strengths of the Chebyshev, Legendre, and Gegenbauer polynomials through a unified approach, and using a unique numerical quadrature. In particular, the developed numerical scheme employs the Gegenbauer polynomials to achieve rapid rates of convergence of the quadrature for a small number of spectral expansion terms. For a large number of expansion terms, the numerical quadrature has the advantage of converging to the optimal Chebyshev and Legendre quadratures in the L∞-norm and L2-norm, respectively. The key idea is to construct the Gegenbauer quadrature through discretizations at some optimal sets of points of the Gegenbauer–Gauss (GG) type in a certain optimality sense. We show that the Gegenbauer polynomial expansions can produce higher-order approximations to the definite integrals $\int_{-1}^{x_i} f(x)\,\mathrm{d}x$ of a smooth function $f(x) \in C^\infty[-1,1]$ for the small range by minimizing the quadrature error at each integration point $x_i$ through a pointwise approach. The developed Gegenbauer quadrature can be applied for approximating integrals with any arbitrary sets of integration nodes. Exact integrations are obtained for polynomials of any arbitrary degree n if the number of columns in the developed Gegenbauer integration matrix (GIM) is greater than or equal to n. The error formula for the Gegenbauer quadrature is derived. Moreover, a study on the error bounds and the convergence rate shows that the optimal Gegenbauer quadrature exhibits very rapid convergence rates, faster than any finite power of the number of Gegenbauer expansion terms. Two efficient computational algorithms are presented for optimally constructing the Gegenbauer quadrature. We illustrate the high-order approximations of the optimal Gegenbauer quadrature through extensive numerical experiments, including comparisons with conventional Chebyshev, Legendre, and Gegenbauer polynomial expansion methods. The present method is broadly applicable and represents a strong addition to the arsenal of numerical quadrature methods.
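For orientation, the sketch below approximates an unweighted integral over [-1, 1] with a conventional Gauss-Gegenbauer rule from SciPy, dividing out the Gegenbauer weight at the nodes. This is the standard rule used here only as a point of comparison; it is not the paper's optimal pointwise-constructed quadrature, and the value alpha = 0.7 is an arbitrary illustrative choice.

```python
# Hedged sketch: approximating an integral over [-1, 1] with a standard
# Gauss-Gegenbauer rule, dividing out the weight (1 - x^2)^(alpha - 1/2).
import numpy as np
from scipy.special import roots_gegenbauer

def gegenbauer_integral(f, n=20, alpha=0.7):
    """Approximate the unweighted integral of f over [-1, 1] using an n-point
    Gauss-Gegenbauer rule with parameter alpha (> -0.5)."""
    x, w = roots_gegenbauer(n, alpha)
    weight = (1.0 - x**2) ** (alpha - 0.5)
    return np.sum(w * f(x) / weight)

approx = gegenbauer_integral(np.exp, n=20)
exact = np.exp(1) - np.exp(-1)
print(f"approx = {approx:.12f}, error = {abs(approx - exact):.2e}")
```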
• Presenting a set of mathematical models for green flowshop scheduling problems.
• Considering common carbon reduction policies and their corresponding scenarios.
• Providing analysis of models and insights to design a framework for green flowshops.
• Industries can use our cost-effective strategies to meet limits mandated by government.
• Government can explore our integrated framework to gauge impacts of carbon policies.
In this paper we consider, from an environmental policy-maker perspective, how carbon reduction policies impact the economic competitiveness of the manufacturing sector. Specifically, we focus on flowshop scheduling – which typically aims to minimize makespan for purely economic objectives – and consider how three common carbon reduction policies – namely, taxes on emissions, baselines on emissions, and emissions trading schemes – can create competitive green flowshops that balance minimization of makespan and carbon emissions. The goal is to enable policy-makers to understand how to set policies and control parameters to achieve environmental objectives while ensuring global economic competitiveness of industry. We initially present a set of mixed-integer linear programming (MILP) models for flowshop scheduling problems operating in a regulated environment in which each carbon reduction policy is adopted. We then introduce a bi-objective scheduling framework for the corresponding problem to obtain alternative solutions under each policy. These models and their computational results, however, are not the main focus of the study, but are presented as a means to demonstrate how green policies co-exist with economic objectives, with policy-makers in control of the balance. To this end, based on financial data from Australia’s carbon emissions profile, we provide a policy-oriented analysis of the models, and some managerial insights into the effect of scheduling strategies on carbon emissions under different reduction policies. These insights offer support to both environmental policy-makers and corporate production and sustainability managers to determine whether it is technically feasible and profitable to replace traditional scheduling strategies with environmentally friendly scheduling strategies.
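As a hedged illustration of how a carbon price can enter a scheduling objective, a generic carbon-tax variant of a flowshop objective might weight makespan against taxed emissions as sketched below. The symbols and the linear combination are assumptions made for illustration, not the MILP models formulated in the paper.

```latex
% Hedged, generic illustration of a carbon-tax flowshop objective; all symbols
% below are assumptions, not the paper's formulation.
\[
  \min \;\; \lambda\, C_{\max}
  \;+\; (1-\lambda)\, \tau \sum_{j \in J} \sum_{m \in M} \sum_{s \in S} e_{jms}\, x_{jms},
\]
% where C_max is the makespan, tau is the carbon tax rate ($/tonne CO2-e),
% x_{jms} = 1 if job j is processed on machine m at speed level s, e_{jms} is
% the corresponding emission, and lambda in [0,1] trades off the economic and
% environmental objectives.
```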
This paper investigates event extraction and early event classification in contiguous spatio-temporal data streams, where events need to be classified using partial information, i.e., while the event is ongoing. The framework incorporates an event extraction algorithm and an early event classification algorithm. We apply this framework to synthetic and real problems and demonstrate its reliability and broad applicability. The algorithms and data are available in the R package eventstream, and other code in the supplementary material.
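The sketch below illustrates the two stages on a synthetic 1-D stream: events are extracted as contiguous above-threshold regions, and an ongoing event is classified from features of only its observed-so-far portion. The actual implementation is the R package eventstream; the threshold, features, labels, and classifier here are illustrative assumptions.

```python
# Hedged sketch of the two stages: (1) extract events as contiguous
# above-threshold regions of a 1-D stream, (2) classify an ongoing event from
# features of its observed-so-far portion.
import numpy as np
from scipy.ndimage import label
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Synthetic stream with two bump-shaped events of different amplitudes.
stream = rng.normal(0, 0.2, 600)
stream[100:160] += np.hanning(60) * 3.0   # class-0 event
stream[400:480] += np.hanning(80) * 6.0   # class-1 event

# Stage 1: event extraction as contiguous regions exceeding a threshold.
mask = stream > 1.0
regions, n_events = label(mask)
events = [np.where(regions == k)[0] for k in range(1, n_events + 1)]

# Stage 2: early classification from partial information -- features computed
# on only the first half of each event (illustrative training labels).
def partial_features(idx, fraction=0.5):
    seen = stream[idx[: max(1, int(len(idx) * fraction))]]
    return [seen.max(), seen.mean(), len(seen)]

X = np.array([partial_features(e) for e in events])
y = np.array([0, 1])                       # hypothetical labels for the two events
clf = LogisticRegression().fit(X, y)
print("Early predictions:", clf.predict(X))
```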