  • Building Python-Based Topol...
    Martínez-Castaño, Rodrigo; Pichel, Juan C.; Losada, David E.

    Proceedings of the 5th Spanish Conference on Information Retrieval, 06/2018
    Conference Proceeding

    In this paper we propose a streaming approach for real-time processing of huge amounts of data. CATENAE is a library for easily building and executing Python topologies (e.g., a web crawler or a classifier). Topologies are designed to be deployed inside Docker containers, so horizontal scaling, granular resource assignment and isolation can be achieved easily. Furthermore, each micromodule can have its own dependencies (including the Python version), and the user can limit resources such as CPU or memory per instance. We describe an implementation of a use case composed of two topologies: (1) a crawler for tracking users in social media and (2) an early risk detector of depression. We also explain how CATENAE topologies can be connected to non-Python systems.
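
    The idea of a topology built from small chained modules can be illustrated with a minimal sketch. This is not CATENAE's API: the names run_stage, run_topology, normalise and flag_risk are hypothetical, in-process threads and queues stand in for containerised micromodules and their message transport, and the keyword check is only a placeholder for a real classifier.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream


def run_stage(transform, inbox, outbox):
    """Consume items from inbox, apply transform, forward non-None results."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        result = transform(item)
        if result is not None:
            outbox.put(result)


def run_topology(stages, items):
    """Chain stages with queues, feed items in, collect the final outputs."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stages, loosely mirroring the crawler -> detector use case.
def normalise(post):
    text = post.strip().lower()
    return text or None  # drop empty posts


def flag_risk(text):
    # Placeholder rule, assumed for illustration only.
    return (text, "sad" in text)


if __name__ == "__main__":
    print(run_topology([normalise, flag_risk], ["  I feel SAD ", "", "Nice day"]))
```

    Because each stage only reads from one queue and writes to another, a stage could in principle be moved into its own process or container without changing the others, which is the property the abstract attributes to micromodules.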