Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* feat: initial commit
* chore: cleanup
* chore: reorg
* feat: final and cleanup
* chore: final touchups
* feat: added build script
* style: style fix
* feat: updated readme and moved to src
* feat: search engine added
* feat: finsihed search Engine
* chore: search engine Done
1 parent b08d233, commit af4294f
Showing 18 changed files with 402 additions and 384 deletions.
@@ -164,3 +164,5 @@ cython_debug/
logs.txt
index.json
indexed.json
titles.json
.archive
This file was deleted.
@@ -1,2 +1,20 @@
# Phantom
Distributed Crawler Indexing Engine
# Phantom Search
Lightweight Python-based search engine

## Set-up
1) Open `crawl.sh` and update the parameters:

```shell
python phantom.py --num_threads 8 --urls "site1.com" "site2.com"
```
2) Run `crawl.sh`:
```shell
./crawl.sh
```
This crawls the web and saves the indices to `index.json`.

3) Run `build.sh` to process the indices and start the query engine.

4) After that, you can start the query engine at any time by running `query_engine.py`.
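The crawl step described in the set-up could look roughly like the sketch below. This is not Phantom's actual crawler, which this diff does not show: the `FAKE_WEB` dictionary stands in for real HTTP fetching, and the names `tokenize`, `fetch_and_tokenize`, and `crawl`, as well as the `{url: [tokens]}` layout of the index file, are assumptions made for illustration.

```python
import json
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-in for the network: URL -> page text.
FAKE_WEB = {
    "site1.com": "Python search engine tutorial",
    "site2.com": "Crawling the web with threads",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fetch_and_tokenize(url):
    """Fetch one page (here: a lookup in FAKE_WEB) and tokenize it."""
    return url, tokenize(FAKE_WEB.get(url, ""))


def crawl(urls, num_threads, index_path):
    """Process the URLs on a thread pool and save {url: tokens} as JSON."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        index = dict(pool.map(fetch_and_tokenize, urls))
    Path(index_path).write_text(json.dumps(index, indent=2))
    return index


# Usage: write the index into a temporary directory instead of the repo.
with tempfile.TemporaryDirectory() as tmp:
    index = crawl(["site1.com", "site2.com"], num_threads=8,
                  index_path=Path(tmp) / "index.json")
```

A thread pool suits this kind of work because fetching pages is I/O-bound, so threads spend most of their time waiting on the network rather than competing for the interpreter.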
@@ -0,0 +1,9 @@
source .env/bin/activate

pip install -r requirements.txt
clear
echo "Installation done"
python3 -m src.phantom_indexing
echo "Phantom Processing done"
clear
python3 -m src.query_engine
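The build script runs `src.phantom_indexing` before starting the query engine, but this diff does not show what that module computes. One common way to process a raw crawl index is TF-IDF weighting, sketched below as an assumption rather than as Phantom's actual method. The function name `build_tfidf` and the `{doc: [tokens]}` input layout are hypothetical.

```python
import math
from collections import Counter


def build_tfidf(raw_index):
    """Turn {doc: [tokens]} into {doc: {term: tf-idf weight}}."""
    n_docs = len(raw_index)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in raw_index.values():
        doc_freq.update(set(tokens))

    weighted = {}
    for doc, tokens in raw_index.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty pages
        weighted[doc] = {
            # term frequency * inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weighted


raw = {
    "a.html": ["python", "search", "python"],
    "b.html": ["search", "engine"],
}
processed = build_tfidf(raw)
```

With this unsmoothed formula, a term that occurs in every document (here `search`) gets weight zero, so it contributes nothing to ranking. Smoothed variants avoid that if it is not the desired behaviour.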
@@ -1,6 +1,5 @@
python3 -m venv .env
source .env/bin/activate

cd phantom_crawler
pip install -r requirements.txt
python3 phantom_engine.py
python3 -m src.phantom --num_threads 10 --urls "https://www.geeksforgeeks.org/" "https://stackoverflow.com/questions" --show_logs True --print_logs True --sleep 60
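The flags passed to `src.phantom` above could be parsed with `argparse` along the lines of the sketch below. The flag names come from the command line in the script. The defaults, the `str_to_bool` helper, and the `build_parser` function are assumptions, and the unit of `--sleep` is not stated in the source. The helper is there because `--show_logs True` passes a string, and `type=bool` would treat any non-empty string, including `"False"`, as true.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False'-style strings into real booleans."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser for the flags used in the crawl script (defaults assumed)."""
    parser = argparse.ArgumentParser(description="Crawler CLI sketch")
    parser.add_argument("--num_threads", type=int, default=1)
    parser.add_argument("--urls", nargs="+", required=True)
    parser.add_argument("--show_logs", type=str_to_bool, default=False)
    parser.add_argument("--print_logs", type=str_to_bool, default=False)
    parser.add_argument("--sleep", type=int, default=0)
    return parser


# Parse the same arguments the script passes on the command line.
args = build_parser().parse_args([
    "--num_threads", "10",
    "--urls", "https://www.geeksforgeeks.org/",
    "https://stackoverflow.com/questions",
    "--show_logs", "True",
    "--print_logs", "True",
    "--sleep", "60",
])
```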
@@ -0,0 +1,26 @@
from flask import Flask, render_template, request
from src.query_engine import Phantom_Query
from src.phantom_engine import Parser

app = Flask(__name__)
engine = Phantom_Query("src/indexed.json", titles="src/titles.json")
parser = Parser()

@app.route('/', methods=['GET', 'POST'])
def home():
    input_text = ""
    if request.method == 'POST':
        input_text = request.form.get('input_text')
        result = process_input(input_text)
        return render_template('result.html', result=result, input_text=input_text)
    return render_template('home.html', input_text=input_text)

def process_input(input_text):
    result = engine.query(input_text, count=20)
    # (doc, score, title)
    print("results ; \n\n")
    print(result)
    return result

if __name__ == '__main__':
    app.run()
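The comment in `process_input` suggests that `engine.query` returns `(doc, score, title)` tuples, with `count` limiting how many come back. The implementation of `Phantom_Query` is not part of this diff, so the sketch below is only a hypothetical stand-in showing how such a call might rank documents. The free function `query`, the precomputed `{doc: {term: weight}}` index, and the sample data are all assumptions.

```python
def query(index, titles, text, count=20):
    """Return up to `count` (doc, score, title) tuples, best score first."""
    terms = text.lower().split()
    scored = []
    for doc, weights in index.items():
        # A document's score is the sum of the weights of the query terms it contains.
        score = sum(weights.get(term, 0.0) for term in terms)
        if score > 0:
            scored.append((doc, score, titles.get(doc, "")))
    # Sort by score descending; ties are broken by document name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:count]


# Hypothetical pre-weighted index and title lookup.
index = {
    "a.html": {"python": 0.5, "search": 0.25},
    "b.html": {"search": 0.5},
    "c.html": {"flask": 1.0},
}
titles = {"a.html": "Python search", "b.html": "Search basics", "c.html": "Flask"}

results = query(index, titles, "python search", count=2)
```

Returning plain tuples keeps the result easy to loop over in a template such as `result.html`, which receives it as `result`.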