<!DOCTYPE html>
<html>
<head>
    <title>A. Molina CVC Page</title>
    <link rel="stylesheet" href="./static/index_whimsical.css">
</head>
<body>
    <div id="header">
        <h2>A. Molina Research Papers</h2>
        <nav>
            <a href="./index.html">Home</a>
        </nav>
    </div>

    <div class="section" id="photo">
        <h3>Historical Photography Management</h3>
        <div class="subsection" id="date-estimation-paper">
            <h4>Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach</h4>
            <p class="abstract">
                Date estimation of historical document images is a challenging problem, and several contributions in the literature lack the ability to generalize from one dataset to others. This paper presents a robust date estimation system based on a retrieval approach that generalizes well across heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordering of documents for each problem. One of the main uses of the presented approach is as a tool for historical contextual retrieval: scholars can perform comparative analysis of historical images from large datasets in terms of the period in which they were produced. We provide experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.
            </p>
            <a href="https://link.springer.com/chapter/10.1007/978-3-030-86331-9_20">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>


        <div class="subsection" id="date-estimation-paper">
            <h4>The Role of Generative Systems in Historical Photography Management: A Case Study on Catalan Archives</h4>
            <p class="abstract">
                The use of image analysis in automated photography management is a growing trend in heritage institutions. Such tools alleviate the human cost associated with the manual and expensive annotation of new data sources while facilitating fast access for citizens through online indexes and search engines. However, available tagging and description tools are usually designed around modern photographs in English, neglecting historical corpora in minoritized languages, each of which exhibits intrinsic particularities. The primary objective of this research is to study the quantitative contribution of generative systems in the description of historical sources. This is done by contextualizing the task of captioning historical photographs from the Catalan archives as a case study. Our findings provide practitioners with tools and directions on transfer learning for captioning models based on visual adaptation and linguistic proximity.
            </p>
            <a href="https://arxiv.org/abs/2409.03911">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>


        <div class="subsection" id="date-estimation-paper">
            <h4>A Transformer-Based Object-Centric Approach for Date Estimation of Historical Photographs</h4>
            <p class="abstract">
                The accurate estimation of the creation date of cultural heritage photographic assets is a challenging and complex task, typically requiring the expertise of qualified archivists, with significant implications for archival and preservation purposes. This paper introduces a new dataset for image date estimation, which complements existing datasets, thus creating a more balanced and realistic training set for deep learning models. On this dataset, we present a set of modern strong baselines that outperform previous state-of-the-art methods for this task. Additionally, we propose a novel approach that leverages “dating indicators” or “dating clues” through object detection and a self-attention based Transformer encoder. Our experiments demonstrate that the proposed approach has promising applicability in real scenarios and that incorporating “dating indicators” through object detection can improve the performance of image date estimation models.
            </p>
            <a href="https://link.springer.com/chapter/10.1007/978-3-031-56063-7_9">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>

    </div>
    <div class="section" id="document-understanding">
        <h3>Document Understanding</h3>
        <div class="subsection" id="document-understanding">
            <h4>Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval</h4>
            <p class="abstract">
                This paper introduces Fetch-A-Set (FAS), a comprehensive benchmark tailored for legislative historical document analysis systems, addressing the challenges of large-scale document retrieval in historical contexts. The benchmark comprises a vast repository of documents dating back to the 17th century, serving both as a training resource and an evaluation benchmark for retrieval systems. It fills a critical gap in the literature by focusing on complex extractive tasks within the domain of cultural heritage. The proposed benchmark tackles the multifaceted problem of historical document analysis, including text-to-image retrieval for queries and image-to-text topic extraction from document fragments, all while accommodating varying levels of document legibility. This benchmark aims to spur advancements in the field by providing baselines and data for the development and evaluation of robust historical document retrieval systems, particularly in scenarios characterized by a wide historical spectrum.
            </p>
            <a href="https://link.springer.com/chapter/10.1007/978-3-031-70442-0_21">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>
        <div class="subsection" id="learning-to-rank-words-paper">
            <h4>Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting</h4>
            <p class="abstract">
                In this paper, we explore and evaluate the use of ranking-based objective functions for simultaneously learning a word string encoder and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score is set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both handwritten and real scene word images. We also provide results for query-by-example word spotting, although it is not the main focus of this work.
            </p>
            <a href="https://link.springer.com/chapter/10.1007/978-3-030-86331-9_25">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>
        <div class="subsection" id="generic-image-retrieval-paper">
            <h4>A Generic Image Retrieval Method for Date Estimation of Historical Document Collections</h4>
            <p class="abstract">
                Date estimation of historical document images is a challenging problem, and several contributions in the literature lack the ability to generalize from one dataset to others. This paper presents a robust date estimation system based on a retrieval approach that generalizes well across heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordering of documents for each problem. One of the main uses of the presented approach is as a tool for historical contextual retrieval: scholars can perform comparative analysis of historical images from large datasets in terms of the period in which they were produced. We provide experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.
            </p>
            <a href="https://link.springer.com/chapter/10.1007/978-3-031-06555-2_39">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>


        <div class="subsection" id="generic-image-retrieval-paper">
            <h4>Structured Analysis of Alphabets in Historical Handwritten Ciphers</h4>
            <p class="abstract">
                Historical ciphered manuscripts are documents that were typically used in sensitive communications within military and diplomatic contexts or among members of secret societies. These secret messages were concealed by inventing a method of writing employing symbols from diverse sources such as digits, alchemy signs and Latin or Greek characters. When studying a new, unseen cipher, the automatic search and grouping of ciphers with a similar alphabet can aid the scholar in its transcription and cryptanalysis because it indicates a probability that the underlying cipher is similar. In this study, we address this need by proposing the CSI metric, a novel way of comparing pairs of ciphered documents. We assess their effectiveness in an unsupervised clustering scenario utilising visual features, including SIFT, pre-trained learnt embeddings, and OCR descriptors.
            </p>
            <a href="https://arxiv.org/abs/2410.21913">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>

        <div class="subsection" id="generic-image-retrieval-paper">
            <h4>Faster extraction of matrimonial advertisements from digital archives using a signal processing pipeline: a case study on a 20th-Century Spanish newspaper</h4>
            <p class="abstract">
                With the exponential growth of digitized historical materials, historians and social scientists face the daunting task of navigating vast online collections. While online archives do provide search tools to query their materials, these usually do not meet the scholars’ need to trace and store all relevant materials from the online archive. This case study shows how computer vision methods, and more specifically a signal processing approach, can be used to identify, classify and extract relevant information items from a vast historical data collection. More specifically, this study reports the results of the construction of a data pipeline that extracts matrimonial advertisements from the digitized Catalan newspaper La Vanguardia. Matrimonial advertisements can provide genuine insights into the evolution of partner preferences over time, but they are hard to collect as they are scattered over millions of digitized historical newspapers and magazines. Moreover, to study variation in partner preferences by, for example, sex, social class, matrimonial status and time period, it is necessary to store the data in a database. The pipeline that we have constructed extracts matrimonial advertisements in a stepwise fashion, encompassing identification, through binarisation and segmentation, and classification based on Optical Character Recognition. By way of a comprehensive evaluation, both qualitative and quantitative, the efficacy of the pipeline for the extraction of matrimonial advertisements is demonstrated. The findings not only underscore the viability of the signal processing approach but also highlight its potential for advancing research in historical demography, family history, and economic history, as similar pipelines can be set up to extract other relevant newspaper items, such as marriage, birth, death and moving announcements, job vacancies or business announcements, from digitized source collections.
            </p>
            <a href="https://doi.org/10.1080/1081602X.2024.2416946">
                <button class="paper-button">Read Paper</button>
            </a>
        </div>
    </div>
    <div id="contact">
        <h2>Contact</h2>
        <ul>
            <li><a href="mailto:amolina@cvc.uab.cat?Subject=[ASIGNATURA]+ASUMPTE">Mail: amolina@cvc.uab.cat</a></li>
            <li><a href="https://www.linkedin.com/in/adri%C3%A0-molina-927865174/">LinkedIn: Adrià Molina Rodríguez</a></li>
            <li><a href="https://orcid.org/0000-0003-0167-8756">ORCID: 0000-0003-0167-8756 (research profile)</a></li>
        </ul>
    </div>

</body>
</html>