In this paper, we present a stacking model that detects phishing webpages using URL and HTML features. We design lightweight URL and HTML features and introduce HTML string embedding, none of which relies on third-party services, making it possible to develop real-time detection applications. Furthermore, we devise a stacking model that combines GBDT, XGBoost and LightGBM in multiple layers, which lets the different models complement one another and thus improves performance on phishing webpage detection. For evaluation, we collect two real-world datasets, named 50K-PD and 50K-IPD. 50K-PD contains 49,947 webpages with URLs and HTML code. 50K-IPD contains 53,103 webpages with screenshots in addition to URLs and HTML code. The proposed approach outperforms a number of machine learning models on multiple metrics, achieving 97.30% accuracy, a 4.46% missing alarm rate, and a 1.61% false alarm rate on the 50K-PD dataset. On the 50K-IPD dataset, it achieves 98.60% accuracy, a 1.28% missing alarm rate, and a 1.54% false alarm rate.
• A real-time phishing webpage detection system that can be used to protect users from phishing attacks is proposed.
• HTML string embedding is proposed for extracting features from HTML code automatically.
• Stacking model combines multiple machine learning models for better performance.
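The three metrics reported for the phishing detector can be made concrete with a small sketch. The abstract does not spell out its formulas, so the definitions below are assumptions: missing alarm rate is taken as the share of phishing pages classified as legitimate (false negative rate), and false alarm rate as the share of legitimate pages classified as phishing (false positive rate). The function name `detection_metrics` and the label convention (1 = phishing, 0 = legitimate) are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return accuracy, missing alarm rate and false alarm rate.

    Assumed label convention: 1 = phishing, 0 = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed phishing
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Assumed definition: missed phishing pages / all phishing pages.
        "missing_alarm_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        # Assumed definition: false alarms / all legitimate pages.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

Under these definitions the two error rates are computed per class, so a low false alarm rate does not by itself imply a low missing alarm rate.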
ChIPseeker is an R package for annotating ChIP-seq data. It supports annotating ChIP peaks and provides functions to visualize ChIP peak coverage over chromosomes and profiles of peaks binding to TSS regions. Comparison of ChIP peak profiles and annotations is also supported. Moreover, it supports evaluating significant overlap among ChIP-seq datasets. Currently, ChIPseeker contains information on 15 000 BED files from the GEO database. These datasets can be downloaded and compared with the user's own data to find significantly overlapping datasets, from which co-regulation or transcription factor complexes can be inferred for further investigation.
ChIPseeker is released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/ChIPseeker.html).
The HTML and CSS Workshop equips you with the practical knowledge to create modern responsive websites. From mastering simple HTML markup and CSS tags, through to integrating media queries and animations to create a rich, engaging user experience, you'll build your skills with the help of hands-on examples and activities.
Education or learning is essentially a process of optimizing a child's potential towards the achievement of certain abilities as a standard for learning outcomes, guided by the tasks of growth and development, which are reflected in the selection of life skills. Generating motivation for the learning material increases learners' interest in participating. Learners' attention can be engaged through learning media. Learning media are well developed and continue to expand thanks to computer information technology. The purpose of this project was to examine how HTML may be used to enable interactive learning in Linear Algebra courses. The experimental research method employed in this study was the One Group Pre-test Post-test Design. Data were analyzed with the Wilcoxon test because they were not normally distributed. The results indicate that student scores improved significantly between the pre-test and the post-test. Interactive media affect online learning today, so it is hoped that teachers and lecturers will make greater use of interactive media in their teaching.
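For readers unfamiliar with the Wilcoxon test in a one-group pre-test/post-test design, the following is a minimal sketch of how the signed-rank sums are computed from paired scores. It is not the study's analysis code; the function name `wilcoxon_signed_rank` is hypothetical, zero differences are dropped, tied absolute differences share their average rank, and the p-value lookup is left out.

```python
def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus, n) for paired pre-test/post-test scores."""
    # Paired differences; pairs with no change are dropped.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)   # score increases
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)  # score decreases
    return w_plus, w_minus, len(diffs)
```

A large gap between the two rank sums, with most of the rank mass on the positive side, is what a significant pre-test to post-test improvement looks like in this statistic.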
In the era of globalization, information technology is developing very rapidly and reaching into every aspect of life, from educational institutions, companies, and government to everyday life. With this technology, information can be presented quickly and efficiently. One example is this independent hajj guide application, which was built to help Muslims obtain information on the technical aspects of the hajj journey in a form that can be accessed effectively and efficiently. At Balai Penelitian dan Pengembangan Agama Semarang, the technical guide for the hajj journey still takes the form of a printed book. This approach is considered ineffective because it is cumbersome in urgent situations. To address this problem, an Android-based independent hajj guide application was created. The application was built using the waterfall method, because its development stages proceed in sequence: requirements analysis, design, coding, testing, and maintenance. In operation, the application displays material on the technical aspects of the hajj journey. The goal of this research is to produce a design for an Android-based independent hajj guide application, covering use cases, activity diagrams, and the application's interface design.
PESummary is a Python software package for processing and visualizing data from any parameter estimation code. The easy-to-use Python executable scripts and extensive online documentation have resulted in PESummary becoming a key component in the international gravitational-wave analysis toolkit. PESummary has been developed to be more than just a post-processing tool, with all outputs fully self-contained. PESummary has become central to making gravitational-wave inference analysis open and easily reproducible.
Extracting data from user-friendly HTML tables is difficult because of their different layouts, formats, and encoding problems. In this article, we present a new proposal that first applies several pre-processing heuristics to clean the tables, then performs functional analysis, and finally applies some post-processing heuristics to produce the output. Our most important contribution concerns functional analysis, which we address by projecting the cells onto a high-dimensional feature space in which a standard clustering technique is used to separate the meta-data cells from the data cells. We experimented with two large repositories of real-world HTML tables, and our results confirm that our proposal can extract data from them with an F1 score of 89.50% in just 0.09 CPU seconds per table. We compared our proposal with several competitors, and the statistical analysis confirmed its superiority in terms of effectiveness, while it remains very competitive in terms of efficiency.
The R package compareGroups provides functions meant to facilitate the construction of bivariate tables (descriptives of several variables for comparison between groups) and generates reports in several formats (LaTeX, HTML or plain text CSV). Moreover, bivariate tables can be viewed directly on the R console in a readable format. A graphical user interface (GUI) has been implemented so that users who are not familiar with the R software can build the bivariate tables more easily. Some new functions and methods have been incorporated in the newest version of the compareGroups package (version 1.x) to deal with time-to-event variables, stratifying tables, merging several tables, and revising the statistical methods used. The GUI has also been improved, making it much easier and more intuitive to set the inputs for building the bivariate tables. The first version (version 0.x) and this version were presented at the 2010 useR! conference (Sanz, Subirana, and Vila 2010) and the 2011 useR! conference (Sanz, Subirana, and Vila 2011), respectively. Package compareGroups is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=compareGroups.
Data collection has become a necessity today, especially since many data sources on the internet can be used for various needs. The main activity in data collection is gathering quality information that can be analyzed and used to support decisions or provide evidence. The process of retrieving data from the internet is known as web scraping, and several scraping methods are in common use. Because of the volume of data scattered across the internet, large-scale web scraping is time-consuming. Applying parallelism through a multi-processing approach can help complete the job. This study aimed to determine the performance of web scraping methods when multi-processing is applied. Testing was done by scraping data from a predetermined target website. Four web scraping methods (CSS Selector, HTML DOM, Regex, and XPath) were selected for the experiment and measured on CPU usage, memory usage, execution time, and bandwidth usage. Based on the experimental data, the Regex method has the lowest CPU and memory usage, XPath requires the least time, and the CSS Selector method uses the least bandwidth. Applying multi-processing to each web scraping method was shown to save memory, reduce execution time, and reduce bandwidth usage compared with single processing.
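To illustrate the kind of comparison described, here is a minimal sketch that extracts link targets from HTML in two ways and distributes the work over a process pool. It is not the study's benchmark: it uses only the Python standard library, `html.parser` stands in for a DOM-style method, CSS Selector and XPath are omitted because they need third-party packages, and all names (`links_regex`, `links_parser`, `scrape_parallel`) are hypothetical. The pages are passed in as strings, so no network access is involved.

```python
import re
from html.parser import HTMLParser
from multiprocessing import Pool

# Simplified pattern: double-quoted href attributes on <a> tags only.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]*)"', re.IGNORECASE)


def links_regex(html):
    """Regex method: pull href values straight out of the markup."""
    return LINK_RE.findall(html)


class _LinkCollector(HTMLParser):
    """Parser method: collect href values while walking the tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_parser(html):
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def scrape_parallel(pages, extractor=links_regex, workers=4):
    """Apply one extractor to many pages using a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(extractor, pages)
```

When `scrape_parallel` is called from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.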
On a web page, a table is an important part of the issue explained in an article. Tables on web pages differ from tables in a database: they tend to have no rules or standard form. One non-standard table form found on web pages is column-row wise. This study offers an approach for extracting table contents such that the meaning of the relationship between two attributes and the data in a column-row wise table is not lost. The extracted data is stored in a database as three tables: a table that stores the first attribute, a table that stores the second attribute, and a table that stores the first attribute, the second attribute, and the data belonging to the first and second attributes. The study produces an algorithm for extracting data from column-row wise tables on a web page. The algorithm is expected to be implementable in various programming languages. For testing, the algorithm was implemented in the Python programming language and successfully extracted tables and stored them in a database.
Abstract
Tables are an important part of a web page. A table contains tabulated data or information that the web page is meant to convey. This tabulated data can be used for comparison with similar tables or as a trigger for action. However, the form of a table on a web page is up to the page's author; there is no standard form or layout. One of the table layouts found on web pages is column-row wise. This study offers an approach for extracting table contents such that the meaning of the linkage between two attributes and a data item in a column-row wise table is not lost. The extracted data is stored in a database as three tables, i.e. the table that stores the first attribute, the table that stores the second attribute, and the table that stores the first attribute, the second attribute, and the data belonging to the two attributes. The output of this research is an algorithm to extract data from a column-row wise table in a web page. The algorithm is expected to be implementable in various programming languages. For testing, the algorithm was implemented in Python and succeeded in extracting tables and saving the data into a database. The cyclomatic complexity number of the proposed algorithm is 12, which means that the complexity of the proposed algorithm is still high.
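The three-table output described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the table is assumed to be already parsed into a rectangular grid, the top row (after the corner cell) is taken to hold the first attribute, the left column is taken to hold the second attribute, and the function name `extract_column_row_wise` is hypothetical.

```python
def extract_column_row_wise(grid):
    """Split a column-row wise table into three relations.

    grid: list of rows; grid[0] is the header row, and the first cell of
    every other row is that row's header (assumed layout).
    Returns (first_attrs, second_attrs, facts).
    """
    header, *body = grid
    # Relation 1: first attribute, one entry per column header.
    first_attrs = [(i, name) for i, name in enumerate(header[1:], start=1)]
    # Relation 2: second attribute, one entry per row header.
    second_attrs = [(j, row[0]) for j, row in enumerate(body, start=1)]
    # Relation 3: each data cell keyed by both attribute ids, so the link
    # between the two attributes and the value is preserved.
    facts = [
        (i, j, value)
        for j, row in enumerate(body, start=1)
        for i, value in enumerate(row[1:], start=1)
    ]
    return first_attrs, second_attrs, facts
```

Keeping the two attribute lists separate from the cell values mirrors a normalized database layout: each cell in the third relation refers to its column and row by id rather than repeating the header text.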