# sample_async.py
import sys
from functools import partial

# The wildcard import supplies BrowseUrlT and QtWidgets used below.
from hlspy.hls import *


def hello(i, web, url):
    """
    Do something with the task
    """
    print('hello completed {0}:{1}'.format(i, url))
    obj = web[i]
    # print(obj.getcookie_string())  # uncomment to print cookie
    # print(obj.gethtml())  # uncomment to print html
    obj.get_window_object().close()  # close widget
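

# A minimal sketch for the show_window=False case described in the notes at
# the end of this file, where close() should be called only once, after every
# task has finished. CompletionTracker is a hypothetical helper, not part of
# hlspy: it records finished task indexes and runs a callback a single time
# once all of them have reported in.
class CompletionTracker:
    def __init__(self, total, on_all_done):
        self.total = total              # number of tasks that must finish
        self.on_all_done = on_all_done  # called once, after the last task
        self.finished = set()           # indexes of tasks seen so far
        self.fired = False

    def mark_done(self, index):
        """Record task `index`; return True once the callback has run."""
        self.finished.add(index)
        if not self.fired and len(self.finished) == self.total:
            self.fired = True
            self.on_all_done()
        return self.fired


# Possible usage (an assumption, untested against hlspy): build
# tracker = CompletionTracker(len(url_arr), close_widget), where close_widget
# is your own function, and call tracker.mark_done(i) from hello() in place
# of the direct close() call.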


if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    url_arr = [
        'https://en.wikipedia.org', 'https://duckduckgo.com',
        'https://www.google.com'
    ]
    web_arr = []
    for i, url in enumerate(url_arr):
        web_obj = BrowseUrlT(
            url, out_file=False, quit_now=False, show_window=True,
            window_dim='500',
            js_file='console.log("javascript hello world")')
        web_arr.append(web_obj)
        # Pass the whole list; hello() looks up its own object by index.
        web_obj.loadFinished.connect(partial(hello, i, web_arr, url))
        web_obj.start_loading()
    ret = app.exec_()
    sys.exit(ret)
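

# The notes at the end of this file say that a fully headless machine needs
# xvfb. A minimal sketch of building that command line from Python, intended
# for a separate launcher script rather than for this sample itself.
# build_xvfb_command is a hypothetical helper, not part of hlspy; the screen
# geometry is the one quoted in the notes.
import sys


def build_xvfb_command(script_path, screen="0 640x480x16",
                       python=sys.executable):
    """Return the argument list that runs script_path under xvfb-run."""
    return ["xvfb-run", "--server-args=-screen " + screen,
            python, script_path]


# The returned list can be handed to subprocess.run(); passing the arguments
# as a list avoids shell quoting of the value that contains spaces.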
"""
In GUI mode, when BrowseUrlT is created with show_window=True, the
asynchronous code works as expected: the widgets close one by one, each
when its own get_window_object().close() call runs. Pass the optional
argument window_dim='min' to minimize all widgets. In a completely
headless environment, users need xvfb and should run the application by
prefixing its command with
'xvfb-run --server-args="-screen 0 640x480x16"'.

To run headlessly in GUI mode, set show_window=False. This has a known
problem: as soon as one task completes and calls
get_window_object().close(), the program quits without waiting for the
other tasks to complete. In this case, call close() only once, and only
after all tasks have completed, to free memory. Alternatively, set the
'timeout' field and leave the closing of widgets to the library. Even
then, closing a single widget closes all of them in GUI mode, so with
show_window=False choose a timeout long enough for every task to finish.

List of arguments to BrowseUrlT

1. quit_now=False (Mandatory when using the program as a library. The
   command-line version is designed to quit as soon as its task completes;
   setting quit_now=False overrides that default so that the program
   continues with the next task asynchronously.)
2. set_cookie=COOKIE_FILE_NAME (absolute path)
3. use_cookie=COOKIE_FILE_NAME (absolute path)
4. end_point=COOKIE_ID_NAME (the program waits until this cookie id
   appears)
5. domain_name=DOMAIN_NAME_WHOSE_COOKIE_WILL_BE_FETCHED
6. user_agent=USER_AGENT_STRING
7. tmp_dir=TEMP_DIR (for storing temporary data such as cache)
8. js_file=JS_FILE (Absolute path of a JavaScript file that is executed
   last, once the original page has finished loading. A JavaScript string
   can be supplied directly instead of a file name.)
9. out_file=None, False or the absolute path of a file (By default the HTML
   output is printed on the terminal. If an absolute file path is supplied,
   the output is written to that file. If False is used, the output is
   neither shown on the terminal nor written to any file.)
10. wait_for_cookie=None or True (don't quit until the cookie is obtained;
    use in conjunction with the end_point field)
11. print_request=None or True (print requested resource URLs on the
    terminal in real time)
12. print_cookies=None or True (print cookies from all domains on the
    terminal in real time)
13. timeout=IN_SECONDS (Wait this many seconds after loading has finished
    before closing. It has no effect if the widget is closed manually with
    get_window_object().close() before the timeout expires.)
14. block_request= (Comma-separated list of resources to block. A requested
    URL that contains any of these substrings is blocked,
    e.g. block_request=.jpg,.png,.css,.ads)
15. default_block=None or True (enables a simple default adblock for
    headless browsing)
16. select_request= (Print only a particular request on the terminal.
    Currently supports only one field, e.g. select_request=.css)
17. show_window=True or False (default is False)
18. window_dim=wxh, min or max (e.g. window_dim=800x600, window_dim=400,
    window_dim=min or window_dim=max)
19. grab_window=file_name (Take a screenshot of the page and save it as
    file_name. The file name should be an absolute path.)
20. print_pdf=file_name (Convert the page to PDF and save it as file_name.
    The file name should be an absolute path.)

Note: when the application is used from the command line, file paths can
be relative or absolute; when it is used as a library, all file paths must
be absolute.
"""