In this paper, we present a stacking model to detect phishing webpages using URL and HTML features. We design lightweight URL and HTML features and introduce HTML string embedding without relying on third-party services, making it possible to build real-time detection applications. Furthermore, we devise a stacking model that combines GBDT, XGBoost, and LightGBM in multiple layers, allowing the models to complement one another and thereby improving phishing webpage detection performance. In particular, we collect two real-world datasets for evaluation, named 50K-PD and 50K-IPD. 50K-PD contains 49,947 webpages with URLs and HTML code; 50K-IPD contains 53,103 webpages with screenshots in addition to URLs and HTML code. The proposed approach outperforms a range of machine learning models on multiple metrics, achieving 97.30% accuracy, a 4.46% missing alarm rate, and a 1.61% false alarm rate on the 50K-PD dataset. On the 50K-IPD dataset, it achieves 98.60% accuracy, a 1.28% missing alarm rate, and a 1.54% false alarm rate.
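The stacking idea described above can be sketched with scikit-learn. This is a minimal illustration, not the paper's implementation: it substitutes scikit-learn's built-in gradient boosting and random forest for the paper's GBDT/XGBoost/LightGBM stack, collapses the multi-layer design into one base layer plus a meta-learner, and uses synthetic data in place of real URL/HTML feature vectors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for URL/HTML feature vectors (1 = phishing, 0 = benign).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base learners feed out-of-fold predictions to a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("gbdt", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
acc = stack.score(X_test, y_test)
```

The meta-learner sees cross-validated predictions from each base model, which is what lets complementary models correct one another's errors.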
•A real-time phishing webpage detection system that protects users from phishing attacks is proposed.
•HTML string embedding is proposed to extract features from HTML code automatically.
•A stacking model combines multiple machine learning models for better performance.
ChIPseeker is an R package for annotating ChIP-seq data. It supports annotating ChIP peaks and provides functions to visualize peak coverage over chromosomes and profiles of peaks binding to TSS regions. Comparison of ChIP peak profiles and annotations is also supported. Moreover, it can evaluate significant overlap among ChIP-seq datasets. Currently, ChIPseeker includes information on 15,000 BED files from the GEO database. These datasets can be downloaded and compared with the user's own data to identify datasets with significant overlap, for inferring co-regulation or transcription factor complexes for further investigation.
ChIPseeker is released under the Artistic-2.0 License. The source code and documentation are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/ChIPseeker.html).
The HTML and CSS Workshop equips you with the practical knowledge to create modern responsive websites. From mastering simple HTML markup and CSS tags to integrating media queries and animations for a rich, engaging user experience, you'll build your skills with the help of hands-on examples and activities.
Education, or learning, is essentially a process of optimizing a child's potential toward achieving certain abilities as a standard of learning outcomes, through tasks of growth and development reflected in the selection of life skills. Generating motivation regarding the learning material increases learners' interest in participating in lessons. Attention to learning can be fostered through learning media, which continue to develop rapidly thanks to computer and information technology. The purpose of this project was to examine how HTML may be used to enable interactive learning in Linear Algebra courses. The study employed an experimental method with a One Group Pre-test Post-test Design. Data were analyzed using the Wilcoxon test because they were not normally distributed. The results indicate that student scores improved significantly between the pre-test and post-test. Interactive media affect online learning today, so it is hoped that teachers and lecturers will make greater use of interactive media in their teaching.
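The Wilcoxon signed-rank test used above can be illustrated with SciPy. This is a sketch with invented pre/post scores (the study's actual data are not given here); it shows the paired, non-parametric comparison the abstract describes.

```python
from scipy.stats import wilcoxon

# Hypothetical paired scores for ten students (not the study's real data).
pre_test  = [55, 60, 58, 62, 57, 65, 59, 61, 54, 63]
post_test = [70, 72, 68, 75, 66, 80, 71, 74, 69, 78]

# Wilcoxon signed-rank test: a non-parametric paired test, appropriate
# when score differences are not normally distributed.
statistic, p_value = wilcoxon(pre_test, post_test)
```

A p-value below the chosen significance level (typically 0.05) indicates that post-test scores differ significantly from pre-test scores.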
In the era of globalization, information technology is developing very rapidly and reaches every aspect of life, from educational institutions, companies, and government to daily life. By leveraging this technology, information can be delivered quickly and efficiently. One example is this independent hajj guide application, built to help Muslims obtain information on the technical aspects of the hajj journey in an effective and efficient way. At the Balai Penelitian dan Pengembangan Agama in Semarang, the technical guide for the hajj journey is still distributed as a printed book, which is considered less effective because it is inconvenient in urgent situations. To address this problem, this Android-based independent hajj guide application was developed. The application was built using the waterfall method, whose stages proceed sequentially: requirements analysis, design, coding, testing, and maintenance. The application displays materials related to the technical aspects of the hajj journey. The goal of this research is the design of an Android-based independent hajj guide application, covering use cases, activity diagrams, and the application's interface design.
Electronic documents are becoming increasingly popular in various industries and sectors as they provide greater convenience and cost-efficiency than physical documents. PDF is a widely used format for creating and sharing electronic documents, while HTML is commonly used in mobile environments as the foundation for web pages displayed on devices such as smartphones and tablets. HTML is becoming an increasingly important document format as mobile environments have become the primary communication channel. However, HTML lacks a standard content-integrity feature, and an HTML-based electronic document consists of a set of related files; it is therefore vulnerable as a medium for reliable electronic documents. We previously proposed Document HTML, a single independent file with extended meta tags, as a reliable electronic document format, and Chained Document, a single independent file backed by a blockchain network, to secure content integrity and delivery assurance. In this paper, we improve the definition of Document HTML and study certified electronic document intermediaries. Additionally, we design and validate an electronic document distribution service using the enhanced Document HTML for real-world usability. Moreover, we conduct an experimental verification using a tax notification electronic document, one of the highest-volume document types distributed in Korea, to confirm how Document HTML provides content-integrity verification. Document HTML can be used by enterprises that must send reliable electronic documents to customers through an electronic document delivery service provider.
PESummary is a Python software package for processing and visualizing data from any parameter estimation code. The easy-to-use Python executable scripts and extensive online documentation have resulted in PESummary becoming a key component of the international gravitational-wave analysis toolkit. PESummary has been developed to be more than just a post-processing tool, with all outputs fully self-contained. It has become central to making gravitational-wave inference analyses open and easily reproducible.
The R package compareGroups provides functions to facilitate the construction of bivariate tables (descriptives of several variables for comparison between groups) and generates reports in several formats (LaTeX, HTML, or plain-text CSV). Moreover, bivariate tables can be viewed directly on the R console in a readable format. A graphical user interface (GUI) has been implemented so that users unfamiliar with R can build the bivariate tables more easily. New functions and methods have been incorporated in the newest version of the compareGroups package (version 1.x) to deal with time-to-event variables, stratify tables, merge several tables, and revise the statistical methods used. The GUI has also been improved, making it much easier and more intuitive to set the inputs for building the bivariate tables. The first version (version 0.x) and the current version were presented at the 2010 useR! conference (Sanz, Subirana, and Vila 2010) and the 2011 useR! conference (Sanz, Subirana, and Vila 2011), respectively. Package compareGroups is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=compareGroups.
Extracting data from user-friendly HTML tables is difficult because of their varied layouts, formats, and encoding problems. In this article, we present a new proposal that first applies several pre-processing heuristics to clean the tables, then performs functional analysis, and finally applies some post-processing heuristics to produce the output. Our most important contribution concerns functional analysis, which we address by projecting the cells onto a high-dimensional feature space in which a standard clustering technique separates the meta-data cells from the data cells. We experimented with two large repositories of real-world HTML tables, and our results confirm that our proposal can extract data from them with an F1 score of 89.50% in just 0.09 CPU seconds per table. We compared our proposal with several competitors, and the statistical analysis confirmed its superiority in terms of effectiveness while remaining very competitive in terms of efficiency.
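The core functional-analysis step above can be sketched as follows. This is a toy illustration under simplifying assumptions, not the article's feature set: each cell is projected onto just two hand-picked features (digit fraction and whether the text is purely alphabetic), and k-means with k=2 plays the role of the "standard clustering technique" that separates meta-data (header) cells from data cells.

```python
import numpy as np
from sklearn.cluster import KMeans

# A small table flattened into cells: three header cells, then six data cells.
cells = ["Year", "Revenue", "Profit",
         "2021", "1,200", "300",
         "2022", "1,450", "410"]

def cell_features(text):
    """Project a cell onto a tiny feature space (a stand-in for the
    high-dimensional projection the article describes)."""
    digit_fraction = sum(ch.isdigit() for ch in text) / max(len(text), 1)
    return [digit_fraction, float(text.isalpha())]

X = np.array([cell_features(c) for c in cells])

# Cluster cells into two groups: meta-data vs. data.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

With these features, the header cells land at (0, 1) and the numeric data cells near (1, 0), so the two clusters are well separated; real tables require many more features to handle mixed-type columns.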
Data collection has become a necessity today, especially since many sources of data on the internet can be used for various needs. The main activity in data collection is gathering quality information that can be analyzed and used to support decisions or provide evidence. The process of retrieving data from the internet is also known as web scraping, and various methods are commonly used for it. Given the amount of data scattered across the internet, web scraping done at a large scale can be quite time-consuming. By applying parallelism, a multi-processing approach can help complete the job. This study aimed to determine the performance of web scraping methods when multi-processing is applied. Testing was done by scraping data from a predetermined target website. Four web scraping methods, CSS Selector, HTML DOM, Regex, and XPath, were selected for the experiment and measured on CPU usage, memory usage, execution time, and bandwidth usage. Based on the experimental data, the Regex method has the lowest CPU and memory usage, XPath requires the least time, and the CSS Selector method uses the least bandwidth compared to the other methods. Applying multi-processing to each web scraping method is shown to save memory, reduce execution time, and reduce bandwidth usage compared to single processing.
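The multi-processing approach described above can be sketched with Python's standard library. This is a minimal illustration under assumptions: the HTML documents are hypothetical in-memory strings standing in for fetched pages (no network access), and only the Regex extraction method is shown; in a real scraper each worker would fetch and parse a live URL.

```python
import re
from multiprocessing import Pool

# Hypothetical pre-fetched HTML documents standing in for scraped pages.
PAGES = [
    "<html><head><title>Page 1</title></head><body></body></html>",
    "<html><head><title>Page 2</title></head><body></body></html>",
    "<html><head><title>Page 3</title></head><body></body></html>",
]

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S)

def extract_title(html):
    """Regex-based extraction: pull the <title> text from one page."""
    match = TITLE_RE.search(html)
    return match.group(1) if match else None

def scrape_all(pages, workers=2):
    """Distribute per-page extraction across a pool of worker processes."""
    with Pool(workers) as pool:
        return pool.map(extract_title, pages)

if __name__ == "__main__":
    print(scrape_all(PAGES))
```

`Pool.map` splits the page list across worker processes, which is the parallel pattern the study evaluates; the same structure works for the CSS Selector, DOM, or XPath methods by swapping the extraction function.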