fix: find-in-text returning a string instead of a zip-loc #89

Merged (2 commits) on Oct 16, 2024

@Mertzenich Mertzenich commented Mar 2, 2024

Find in Text Selector Return Value

This pull request fixes #83, which is caused by the find-in-text selector
returning a string rather than a zip-loc as expected. I have also added
a test assertion that checks for the expected behavior.

Problem Description

When find-in-text is used in the last position of a child selector, no
result is found. The expected behavior is restored when you wrap
find-in-text with the and selector. Here is an example of the problem:

(require '[hickory.core :as h]
         '[hickory.select :as s])

(def html "<div><span>Apocalypse</span></div>")

(def htree (-> html
               h/parse
               h/as-hickory))

(s/select
 (s/child (s/tag :div)
          (s/find-in-text #"Apocalypse"))
 htree)
;; Expected Result => [{:type :element
;;                      :attrs nil
;;                      :tag :span
;;                      :content ["Apocalypse"]}]
;; Actual Result   => []

(s/select
 (s/child (s/tag :div)
          ;; Wrapping `find-in-text` in `and`
          (s/and (s/find-in-text #"Apocalypse")))
 htree)
;; We get the expected result:
;; => [{:type :element
;;      :attrs nil
;;      :tag :span
;;      :content ["Apocalypse"]}]

Solution

The current version of find-in-text is supposed to return a function that takes
a zip-loc and returns that zip-loc if the node's content contains a string
matching the provided regular expression. The current body of the function is as
follows (with minor readability changes):

(defn find-in-text
  [re]
  (fn [hzip-loc]
    (some #(re-find re %)
          (->> (zip/node hzip-loc)
               :content
               (filter string?)))))

The issue is in the return value of the returned function. Running
(some #(re-find re %) ...) returns the first truthy result of re-find, which is
the matched text (a string, or a vector when the pattern has capture groups),
rather than the zip-loc. For example:

(some #(re-find #"Luke" %)
      ["Matthew" "Mark" "Luke" "John"])
;; => "Luke"
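
To make the shape of that return value concrete, here is a small sketch that uses only clojure.core. The `names` var is introduced purely for illustration. It shows that the value coming back is whatever re-find produces, never a zip-loc:

```clojure
;; `names` is a hypothetical helper var for this illustration.
(def names ["Matthew" "Mark" "Luke" "John"])

;; No capture groups: re-find returns the matched text only.
(some #(re-find #"Lu" %) names)
;; => "Lu"

;; With a capture group: re-find returns a vector of the whole match and the groups.
(some #(re-find #"(Lu)ke" %) names)
;; => ["Luke" "Lu"]

;; No element matches: some returns nil.
(some #(re-find #"Paul" %) names)
;; => nil
```

In every truthy case the caller receives match data where a selector is expected to hand back a zip-loc.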

The solution is quite simple: instead of returning the result of (some ...)
directly, we use it only as a test, and when it is truthy we return the
provided zip-loc. The result is as follows:

(defn find-in-text
  [re]
  (fn [hzip-loc]
    (when (some #(re-find re %)
                (->> (zip/node hzip-loc)
                     :content
                     (filter string?)))
      hzip-loc)))
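
The difference between the two versions can be checked without hickory itself. The sketch below is only an approximation: `hickory-like-zip` is a hypothetical stand-in built with clojure.zip/zipper (it is not hickory's own zipper), and `find-in-text-old` and `find-in-text-fixed` are illustrative names for the two bodies shown above.

```clojure
(require '[clojure.zip :as zip])

;; Hypothetical stand-in for a hickory zipper (assumption, not hickory's API):
;; element maps are branches whose children live under :content; strings are leaves.
(defn hickory-like-zip [root]
  (zip/zipper map?
              :content
              (fn [node children] (assoc node :content (vec children)))
              root))

;; Body before the fix: returns the re-find result.
(defn find-in-text-old [re]
  (fn [hzip-loc]
    (some #(re-find re %)
          (->> (zip/node hzip-loc) :content (filter string?)))))

;; Body after the fix: returns the zip-loc when some string child matches.
(defn find-in-text-fixed [re]
  (fn [hzip-loc]
    (when (some #(re-find re %)
                (->> (zip/node hzip-loc) :content (filter string?)))
      hzip-loc)))

;; Zip-loc pointing at the <span> inside the <div>.
(def span-loc
  (-> {:type :element :tag :div :attrs nil
       :content [{:type :element :tag :span :attrs nil
                  :content ["Apocalypse"]}]}
      hickory-like-zip
      zip/down))

((find-in-text-old #"Apocalypse") span-loc)
;; => "Apocalypse"   (a string; it cannot be navigated with clojure.zip)

(= span-loc ((find-in-text-fixed #"Apocalypse") span-loc))
;; => true           (the same zip-loc comes back)

(:tag (zip/node (zip/up ((find-in-text-fixed #"Apocalypse") span-loc))))
;; => :div           (the result can still be navigated as a zip-loc)
```

Because the fixed version hands back a real zip-loc, a combinator that keeps navigating from the selector's result has something valid to work with.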

@Mertzenich Mertzenich force-pushed the find-in-text-last-argument branch 2 times, most recently from 2677840 to d5fea4b Compare March 9, 2024 05:01
@Mertzenich Mertzenich changed the title Fix find-in-text returning a string instead of a zip-loc fix: find-in-text returning a string instead of a zip-loc Mar 22, 2024
@slipset slipset merged commit 93b40d1 into clj-commons:master Oct 16, 2024
1 check passed
@Mertzenich Mertzenich deleted the find-in-text-last-argument branch October 16, 2024 15:59
Development

Successfully merging this pull request may close these issues.

find-in-text selector does not seem to work when it's the last argument of a child selector