When I get "page_source" from Selenium and then search for elements with pyquery, the selectors do not match anything. But when I use "doc = pq(url='https://xxxxx')" directly, it works well. Code below:
part one:
from pyquery import PyQuery as pq
doc = pq(url='https://search.jd.com/Search?keyword=%E7%A9%BA%E6%B0%94%E5%87%80%E5%8C%96%E5%99%A8&enc=utf-8&suggest=1.def.0.V18&wq=kongqijingh&pvid=60c4120a5787482e8337c64c2fd4184d')
for item in doc('.gl-i-wrap').items():
    price = item('.p-price strong i').text()
    print('price:', price)
This works well!
part two:
html = self.driver.page_source
doc = pq(html)
for item in doc('.gl-i-wrap').items():
    price = item('.p-price strong i').text()
    print('price:', price)
This does not work!
This issue affects me too. Try printing the first 200 characters of page_source, then remove the xmlns attribute from the <html> tag. In my case, I have to do this for CSS selectors to work when scraping Facebook WAP.
html = b.page_source.replace('<html xmlns="http://www.w3.org/1999/xhtml">', '<html>')
doc = pq(html)
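The exact-string replace above only matches when the <html> tag looks precisely like that. A slightly more general version might be sketched as below. This is a minimal sketch, not pyquery's own API: the helper name strip_xhtml_namespace is hypothetical, and it assumes the problem really is the default xmlns attribute on the opening <html> tag (a likely explanation, not confirmed in this thread, is that the namespaced source gets parsed as XML, so plain CSS selectors find nothing).

```python
import re

# Matches a default xmlns="..." (or xmlns='...') attribute inside the opening
# <html> tag. Prefixed declarations such as xmlns:fb="..." are left untouched.
_XMLNS_ATTR = re.compile(
    r'(<html\b[^>]*?)\s+xmlns\s*=\s*("[^"]*"|\'[^\']*\')',
    re.IGNORECASE,
)


def strip_xhtml_namespace(html: str) -> str:
    """Return html with the default xmlns attribute removed from the first <html> tag.

    Hypothetical helper for illustration; the input is returned unchanged
    when no such attribute is present.
    """
    return _XMLNS_ATTR.sub(r'\1', html, count=1)


if __name__ == '__main__':
    sample = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>'
    # Print the start of the cleaned source to check the <html> tag by eye.
    print(strip_xhtml_namespace(sample)[:200])
```

With Selenium this would be used as doc = pq(strip_xhtml_namespace(driver.page_source)). It may also be worth trying pq(html, parser='html'), which, as far as I know, asks pyquery to use the HTML parser instead of guessing.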