This open access book covers the use of data science, including advanced machine learning, big data analytics, Semantic Web technologies, natural language processing, social media analysis, and time series analysis, among others, for applications in economics and finance. In addition, it shows some successful applications of advanced data science solutions used to extract new knowledge from data in order to improve economic forecasting models. The book starts with an introduction to the use of data science technologies in economics and finance and is followed by thirteen chapters showing success stories of the application of specific data science methodologies, touching on particular topics related to novel big data sources and technologies for economic analysis (e.g. social media and news); big data models leveraging supervised/unsupervised (deep) machine learning; natural language processing to build economic and financial indicators; and forecasting and nowcasting of economic variables through time series analysis. This book is relevant to all stakeholders involved in digital and data-intensive research in economics and finance, helping them to understand the main opportunities and challenges, become familiar with the latest methodological findings, and learn how to use and evaluate the performance of novel tools and frameworks. It primarily targets data scientists and business analysts exploiting data science technologies, and it will also be a useful resource to research students in disciplines and courses related to these topics. Overall, readers will learn modern and effective data science solutions to create tangible innovations for economic and financial applications.
Utilize R to uncover hidden patterns in your Big Data. Perform computational analyses on Big Data to generate meaningful results. Get practical knowledge of the R programming language while working on Big Data platforms like Hadoop, Spark, H2O and SQL/NoSQL databases. Explore fast, streaming, and scalable data analysis with the most cutting-edge technologies in the market.
Who This Book Is For
This book is intended for data analysts, scientists, data engineers, statisticians, and researchers who want to integrate R with their current or future Big Data workflows. It is assumed that readers have some experience in data analysis and an understanding of data management and algorithmic processing of large quantities of data; however, they may lack specific skills related to R.
What You Will Learn
Learn about the current state of Big Data processing using the R programming language and its powerful statistical capabilities. Deploy Big Data analytics platforms with selected Big Data tools supported by R in a cost-effective and time-saving manner. Apply the R language to real-world Big Data problems on a multi-node Hadoop cluster, e.g. electricity consumption across various socio-demographic indicators and bike share scheme usage. Explore the compatibility of R with Hadoop, Spark, SQL and NoSQL databases, and the H2O platform.
In Detail
Big Data analytics is the process of examining large and complex data sets that often exceed available computational capabilities. R is a leading programming language of data science, offering powerful functions to tackle problems related to Big Data processing. The book begins with a brief introduction to the Big Data world and its current industry standards, followed by an introduction to the R language covering its development, structure, real-world applications, and shortcomings. It then progresses to a revision of the major R functions for data management and transformations. Readers will be introduced to cloud-based Big Data solutions (e.g. Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters), and the book provides guidance on R connectivity with relational and non-relational databases such as MongoDB and HBase. It further expands to include Big Data tools such as the Apache Hadoop ecosystem, HDFS and MapReduce frameworks, as well as other R-compatible tools such as Apache Spark, its machine learning library Spark MLlib, and H2O.
Style and approach
This book will serve as a practical guide to tackling Big Data problems using the R programming language and its statistical environment. Each section of the book will present you with concise and easy-to-follow steps on how to process, transform and analyse large data sets.
This book provides a comprehensive reference for the many different types and methods of compression. Included are a detailed and helpful taxonomy, analysis of most common methods, and discussions on the use and comparative benefits of methods and description of "how to" use them. Detailed descriptions and explanations of the most well-known and frequently used compression methods are covered in a self-contained fashion, with an accessible style and technical level for specialists and nonspecialists. This 4th edition of this successful volume contains significant additional material as there has been tremendous progress in this field, especially in audio compression such as FLAC, AAC, WavPack, ALS and Dolby AC3, which are all covered. Additional key features include: RAR, Tunstall code, Differential and Hyperspectral Compression, LZMA, H.264, 3D data sets, PDF. This book provides an invaluable reference and guide for all researchers and practitioners needing a comprehensive compilation for a broad range of compression methods.
The culture of academic medicine may foster mistreatment that disproportionately affects individuals who have been marginalized within a given society (minoritized groups) and compromises workforce vitality. Existing research has been limited by a lack of comprehensive, validated measures, low response rates, and narrow samples as well as comparisons limited to the binary gender categories of male or female assigned at birth (cisgender).
To evaluate academic medical culture, faculty mental health, and their relationship.
A total of 830 faculty members in the US received National Institutes of Health career development awards from 2006-2009, remained in academia, and responded to a 2021 survey that had a response rate of 64%. Experiences were compared by gender, race and ethnicity (using the categories of Asian, underrepresented in medicine defined as race and ethnicity other than Asian or non-Hispanic White, and White), and lesbian, gay, bisexual, transgender, queer (LGBTQ+) status. Multivariable models were used to explore associations between experiences of culture (climate, sexual harassment, and cyber incivility) with mental health.
Minoritized identity based on gender, race and ethnicity, and LGBTQ+ status.
Three aspects of culture were measured as the primary outcomes: organizational climate, sexual harassment, and cyber incivility using previously developed instruments. The 5-item Mental Health Inventory (scored from 0 to 100 points with higher values indicating better mental health) was used to evaluate the secondary outcome of mental health.
Of the 830 faculty members, there were 422 men, 385 women, 2 in a nonbinary gender category, and 21 who did not identify gender; there were 169 Asian respondents, 66 respondents underrepresented in medicine, 572 White respondents, and 23 respondents who did not report their race and ethnicity; and there were 774 respondents who identified as cisgender and heterosexual, 31 as having LGBTQ+ status, and 25 who did not identify status. Women rated general climate (5-point scale) more negatively than men (mean, 3.68 [95% CI, 3.59-3.77] vs 3.96 [95% CI, 3.88-4.04], respectively, P < .001). Diversity climate ratings differed significantly by gender (mean, 3.72 [95% CI, 3.64-3.80] for women vs 4.16 [95% CI, 4.09-4.23] for men, P < .001) and by race and ethnicity (mean, 4.0 [95% CI, 3.88-4.12] for Asian respondents, 3.71 [95% CI, 3.50-3.92] for respondents underrepresented in medicine, and 3.96 [95% CI, 3.90-4.02] for White respondents, P = .04). Women were more likely than men to report experiencing gender harassment (sexist remarks and crude behaviors) (71.9% [95% CI, 67.1%-76.4%] vs 44.9% [95% CI, 40.1%-49.8%], respectively, P < .001). Respondents with LGBTQ+ status were more likely to report experiencing sexual harassment than cisgender and heterosexual respondents when using social media professionally (13.3% [95% CI, 1.7%-40.5%] vs 2.5% [95% CI, 1.2%-4.6%], respectively, P = .01). Each of the 3 aspects of culture and gender were significantly associated with the secondary outcome of mental health in the multivariable analysis.
High rates of sexual harassment, cyber incivility, and negative organizational climate exist in academic medicine, disproportionately affecting minoritized groups and affecting mental health. Ongoing efforts to transform culture are necessary.
ALEX: An Updatable Adaptive Learned Index
Ding, Jialin; Minhas, Umar Farooq; Yu, Jia ...
Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
06/2020
Conference Proceeding
Open access
Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+ tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X on performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+ trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.
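The idea that an index is a model predicting a key's position can be illustrated with a minimal sketch. This is not ALEX and not the index from Kraska et al.: it is a single least-squares line over a static, sorted list of distinct numeric keys, with a bounded binary search to correct the prediction. The class name `LinearLearnedIndex` and its methods are hypothetical names chosen for illustration, and the sketch assumes a non-empty key list and no inserts, updates, or deletes.

```python
import bisect


class LinearLearnedIndex:
    """Static, read-only learned index sketch (hypothetical, for illustration)."""

    def __init__(self, keys):
        # Assumption: keys is a non-empty collection of distinct numbers.
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys; it bounds the
        # window that a lookup has to search around the predicted position.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Model output, rounded and clamped to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])
position = index.lookup(11)   # position of 11 in the sorted key array
missing = index.lookup(4)     # None, because 4 is not stored
```

The model replaces tree traversal with a prediction plus a local search whose cost depends on the model's worst-case error. Supporting inserts and deletes, which this sketch does not attempt, is the problem the abstract says ALEX addresses.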
Most designers know that yellow text presented against a blue background reads clearly and easily, but how many can explain why, and what really are the best ways to help others and ourselves clearly see key patterns in a bunch of data? When we use software, access a website, or view business or scientific graphics, our understanding is greatly enhanced or impeded by the way the information is presented. This book explores the art and science of why we see objects the way we do. Based on the science of perception and vision, the author presents the key principles at work for a wide range of applications, resulting in visualization of improved clarity, utility, and persuasiveness. The book offers practical guidelines that can be applied by anyone: interaction designers, graphic designers of all kinds (including web designers), data miners, and financial analysts. * Complete update of the recognized source in industry, research, and academia for applicable guidance on information visualization * Includes the latest research and state-of-the-art information on multimedia presentation * More than 160 explicit design guidelines based on vision science * A new final chapter that explains the process of visual thinking and how visualizations help us to think about problems * Packed with over 400 informative full-color illustrations, which are key to understanding the subject
Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research. The book is targeted at information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, data analysts, data modelers, database R&D professionals, data warehouse engineers, and data mining professionals. The book will also be useful for professors and students of upper-level undergraduate and graduate-level data mining and machine learning courses who want to incorporate data mining as part of their data management knowledge base and expertise. * Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects * Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods * Includes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization