Opportunities offered by new neuro-technologies are threatened by the lack of coherent plans to analyze, manage, and understand the data. High-performance computing will allow exploratory analysis of massive datasets stored in standardized formats, hosted in open repositories, and integrated with simulations.
Bouchard et al. propose that, to leverage advances in neurotechnologies effectively, neuroscience will require new high-performance computing platforms and practices for data management and analysis.
Neuroscience initiatives aim to develop new technologies and tools to measure and manipulate neuronal circuits. To deal with the massive amounts of data generated by these tools, the authors envision the co-location of open data repositories in standardized formats together with high-performance computing hardware utilizing open source optimized analysis codes.
Ranking the tens of thousands of retrieved webpages for a user query on a Web search engine such that the most informative webpages are on the top is a key information retrieval technology. A popular ranking algorithm is the HITS algorithm of Kleinberg. It explores the reinforcing interplay between authority and hub webpages on a particular topic by taking into account the structure of the Web graphs formed by the hyperlinks between the webpages. In this paper, we give a detailed analysis of the HITS algorithm through a unique combination of probabilistic analysis and matrix algebra. In particular, we show that to first-order approximation, the ranking given by the HITS algorithm is the same as the ranking by counting inbound and outbound hyperlinks. Using Web graphs of different sizes, we also provide experimental results to illustrate the analysis.
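The mutual reinforcement between hubs and authorities can be illustrated with a minimal sketch of the standard HITS power iteration, shown next to plain in-degree and out-degree counts. The toy graph `EDGES`, the function names, and the iteration count are assumptions made for illustration; they are not taken from the paper.

```python
def normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else v


def hits(edges, n, iterations=50):
    """Standard HITS power iteration on a directed graph with nodes 0..n-1."""
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages that link to it.
        new_auth = [0.0] * n
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages it links to.
        new_hub = [0.0] * n
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def degree_counts(edges, n):
    """In-degree and out-degree of every node."""
    indeg, outdeg = [0] * n, [0] * n
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg


# Hypothetical toy graph: pages 0-2 act as hubs, pages 3-5 as authorities.
EDGES = [(0, 3), (1, 3), (2, 3), (0, 4), (1, 4), (0, 5)]
```

On this small graph the authority ordering and the in-degree ordering coincide, which is the kind of agreement the abstract describes to first order; larger graphs need not agree exactly.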
Low-rank approximation of large and/or sparse matrices is important in many applications, and the singular value decomposition (SVD) gives the best low-rank approximations with respect to unitarily-invariant norms. In this paper we show that good low-rank approximations can be directly obtained from the Lanczos bidiagonalization process applied to the given matrix without computing any SVD. We also demonstrate that a so-called one-sided reorthogonalization process can be used to maintain an adequate level of orthogonality among the Lanczos vectors and produce accurate low-rank approximations. This technique reduces the computational cost of the Lanczos bidiagonalization process. We illustrate the efficiency and applicability of our algorithm using numerical examples from several application areas.
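As a rough illustration, the sketch below runs k steps of Golub-Kahan (Lanczos) bidiagonalization and forms the approximation U B Vᵀ without computing an SVD. Reorthogonalizing only the right vectors is one possible reading of "one-sided"; which side the authors use, the random starting vector, and all names here are assumptions. The code also assumes no breakdown (no zero norm) occurs.

```python
import numpy as np


def lanczos_bidiag(A, k, seed=0):
    """k steps of Golub-Kahan bidiagonalization, so that A @ V = U @ B.

    Only the right Lanczos vectors (columns of V) are reorthogonalized.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = np.zeros((m, k))
    V = np.zeros((n, k))
    alphas = np.zeros(k)
    betas = np.zeros(k - 1)

    v = rng.standard_normal(n)
    V[:, 0] = v / np.linalg.norm(v)
    u = A @ V[:, 0]
    alphas[0] = np.linalg.norm(u)
    U[:, 0] = u / alphas[0]

    for j in range(1, k):
        r = A.T @ U[:, j - 1] - alphas[j - 1] * V[:, j - 1]
        # One-sided reorthogonalization against all previous right vectors.
        r -= V[:, :j] @ (V[:, :j].T @ r)
        betas[j - 1] = np.linalg.norm(r)
        V[:, j] = r / betas[j - 1]

        p = A @ V[:, j] - betas[j - 1] * U[:, j - 1]
        alphas[j] = np.linalg.norm(p)
        U[:, j] = p / alphas[j]

    # Upper bidiagonal matrix: alphas on the diagonal, betas above it.
    B = np.diag(alphas) + np.diag(betas, 1)
    return U, B, V


def low_rank_approx(A, k):
    """Rank-k approximation U @ B @ V.T taken directly from the process."""
    U, B, V = lanczos_bidiag(A, k)
    return U @ B @ V.T
```

Because A V = U B, the product U B Vᵀ equals A projected onto the span of the right Lanczos vectors, so the Frobenius error cannot grow as k increases with the same starting vector.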
We develop new SVD-updating algorithms for three types of updating problems arising from latent semantic indexing (LSI) for information retrieval to deal with rapidly changing text document collections. We also provide theoretical justification for using a reduced-dimension representation of the original document collection in the updating process. Numerical experiments using several standard text document collections show that the new algorithms give higher (interpolated) average precisions than the existing algorithms, and the retrieval accuracy is comparable to that obtained using the complete document collection.
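The abstract does not spell out the algorithms, so the following is only a sketch of one well-known way to handle one updating problem (appending new document columns D to a rank-k representation U Σ Vᵀ): orthogonalize D against U, take the SVD of a small (k+p)×(k+p) matrix, and rotate the bases. The function name and argument layout are hypothetical, and the sketch assumes at least as many terms (rows) as new documents.

```python
import numpy as np


def update_svd_with_documents(U, s, Vt, D):
    """Rank-k SVD of [A_k, D], where A_k = U @ diag(s) @ Vt.

    U: (m, k), s: (k,), Vt: (k, n), D: (m, p) new document columns, m >= p.
    """
    k = s.size
    n = Vt.shape[1]
    p = D.shape[1]

    UtD = U.T @ D
    # Part of the new documents outside the current term subspace.
    Q, R = np.linalg.qr(D - U @ UtD)

    # Small matrix whose SVD carries all the information of [A_k, D].
    small = np.block([[np.diag(s), UtD],
                      [np.zeros((p, k)), R]])
    F, s_new, Gt = np.linalg.svd(small)

    U_new = np.hstack([U, Q]) @ F[:, :k]
    W = np.block([[Vt.T, np.zeros((n, p))],
                  [np.zeros((p, k)), np.eye(p)]])
    V_new = W @ Gt.T[:, :k]
    return U_new, s_new[:k], V_new.T
```

The result is the best rank-k approximation of [A_k, D], not of the full original collection with D appended; the abstract's point about theoretical justification concerns exactly this use of the reduced-dimension representation.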
We show that K-means and spectral clustering objective functions can be written as a trace of quadratic forms. Instead of relaxation by eigenvectors, we propose a novel relaxation that maintains the nonnegativity of the cluster indicators, which then give the cluster posterior probabilities and resolve the cluster assignment difficulty of spectral relaxation. We derive a multiplicative updating algorithm to solve the nonnegative relaxation problem. The method is briefly extended to semi-supervised classification and semi-supervised clustering.
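The trace formulation can be checked numerically. The sketch below compares the usual K-means sum of squared distances with the form Tr(G) − Tr(Hᵀ G H), where G = X Xᵀ is the Gram matrix and H holds normalized cluster indicators (entry 1/√n_k for members of cluster k). This is the standard identity, written under the assumption that rows of X are data points and every cluster is nonempty; the multiplicative update of the paper is not reproduced here.

```python
import numpy as np


def kmeans_objective_direct(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def kmeans_objective_trace(X, labels):
    """Same objective written as Tr(G) - Tr(H^T G H) with G = X X^T."""
    clusters = np.unique(labels)
    # Indicator matrix (n x K), each column scaled to unit length.
    H = (labels[:, None] == clusters[None, :]).astype(float)
    H /= np.sqrt(H.sum(axis=0))
    G = X @ X.T
    return np.trace(G) - np.trace(H.T @ G @ H)
```

Since Tr(G) does not depend on the clustering, minimizing the K-means objective amounts to maximizing Tr(Hᵀ G H) over nonnegative, orthonormal indicator matrices H.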
How Good is Recursive Bisection? Simon, Horst D.; Teng, Shang-Hua
SIAM Journal on Scientific Computing, 09/1997, Volume 18, Issue 5
Journal Article
Peer reviewed
The most commonly used p-way partitioning method is recursive bisection (RB). It first divides a graph or a mesh into two equal-sized pieces using a "good" bisection algorithm and then recursively divides the two pieces. Ideally, we would like to use an optimal bisection algorithm. Because the optimal bisection problem, which partitions a graph into two equal-sized subgraphs to minimize the number of edges cut, is NP-complete, practical RB algorithms use more efficient heuristics in place of an optimal bisection algorithm. Most such heuristics are designed to find the best possible bisection within the allowed time. We show that the RB method, even when an optimal bisection algorithm is assumed, may produce a p-way partition that is very far from the optimal one. Our negative result is complemented by two positive ones: first, we show that for some important classes of graphs that occur in practical applications, such as well-shaped finite-element and finite-difference meshes, RB is within a constant factor of the optimal one "almost always." Second, we show that if the balance condition is relaxed so that the size of each block in the p-way partition is bounded by 2n/p, where n is the number of vertices of the graph, then a modified RB finds an approximately balanced p-way partition whose cost is within an O(log p) factor of the cost of the optimal p-way partition.
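The RB scheme described above can be sketched on a tiny graph, with a brute-force search standing in for the "optimal bisection algorithm". The exhaustive search is exponential and only meant for illustration; the function names, the restriction of p to powers of two, and the grid example are assumptions, not part of the paper.

```python
from itertools import combinations


def cut_size(edges, side_a):
    """Number of edges with exactly one endpoint in side_a."""
    a = set(side_a)
    return sum((u in a) != (v in a) for u, v in edges)


def optimal_bisection(vertices, edges):
    """Exhaustive minimum-cut split into two halves (tiny graphs only)."""
    vertices = sorted(vertices)
    vs = set(vertices)
    inner = [(u, v) for u, v in edges if u in vs and v in vs]
    half = len(vertices) // 2
    first = vertices[0]
    best_cut, best_side = None, None
    # Fixing the first vertex on side A skips mirror-image duplicates.
    for rest in combinations(vertices[1:], half - 1):
        side_a = (first,) + rest
        c = cut_size(inner, side_a)
        if best_cut is None or c < best_cut:
            best_cut, best_side = c, side_a
    a = set(best_side)
    return sorted(a), sorted(vs - a)


def recursive_bisection(vertices, edges, p):
    """Split into p blocks (p a power of two) by repeated bisection."""
    if p == 1:
        return [sorted(vertices)]
    a, b = optimal_bisection(vertices, edges)
    return (recursive_bisection(a, edges, p // 2)
            + recursive_bisection(b, edges, p // 2))


def partition_cost(edges, blocks):
    """Number of edges whose endpoints lie in different blocks."""
    block_of = {v: i for i, blk in enumerate(blocks) for v in blk}
    return sum(block_of[u] != block_of[v] for u, v in edges)


def grid_edges(rows, cols):
    """Edges of a rows x cols mesh, vertices numbered row by row."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges
```

For example, `recursive_bisection(range(16), grid_edges(4, 4), 4)` splits a 4×4 mesh into four blocks of four vertices each. Well-shaped meshes like this one belong to the class for which the abstract reports that RB does well; the negative result concerns other graphs.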
With the exponential growth of information on the World Wide Web, there is great demand for developing efficient methods for effectively organizing the large amount of retrieved information. Document clustering plays an important role in information retrieval and taxonomy management for the Web. In this paper we examine three clustering methods: K-means, multi-level METIS, and the recently developed normalized-cut method using a new approach of combining textual information, hyperlink structure and co-citation relations into a single similarity metric. We found the normalized-cut method with the new similarity metric is particularly effective, as demonstrated on three datasets of web query results. We also explore some theoretical connections between the normalized-cut method and the K-means method.
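For orientation, here is a minimal sketch of a two-way normalized cut on a given symmetric similarity matrix W, using the common spectral relaxation (second eigenvector of D^{-1/2} W D^{-1/2}, thresholded at zero). How the paper builds W from text, hyperlinks, and co-citations is not stated in the abstract, so W is simply assumed as input; the function names are hypothetical, and every node is assumed to have positive total similarity.

```python
import numpy as np


def ncut_value(W, labels):
    """Normalized cut: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    a = np.asarray(labels).astype(bool)
    cut = W[a][:, ~a].sum()
    return cut / W[a].sum() + cut / W[~a].sum()


def spectral_ncut_bisection(W):
    """Two-way split from the relaxed normalized-cut eigenproblem."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetrically normalized similarity matrix.
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; take the second largest.
    _, vecs = np.linalg.eigh(M)
    z = d_inv_sqrt * vecs[:, -2]
    return (z > 0).astype(int)
```

Thresholding at zero is the simplest choice; other split points along the eigenvector could be compared with `ncut_value` to pick the one with the smallest normalized cut.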