0.9.2 | Search Improvements
Changelog:

* chore: made consistent versioning

* fix: do not send data when both title and value are None

* feat: do not show result if score=0

* fix: fixed unittests
AnsahMohammad authored May 29, 2024
1 parent 7a79d37 commit f88ac3d
Showing 4 changed files with 27 additions and 12 deletions.
19 changes: 12 additions & 7 deletions README.md
@@ -89,13 +89,18 @@ This project is licensed under the terms of the Apache License. See the LICENSE
 
 ## Development and Maintanence
 
-### Bump 0.9.1
+### 0.9.2
+
+- [X] restrict send to db if EMPTY title and content
+- [X] Do not show result when score is 0
+
+### 0.9.1
 
 - [X] Error handling
 - [X] Consistency in logs
 - [X] Local db enable
 
-### Bump 0.10+
+### 0.10+
 
 - [ ] Distributed query processing
 - [ ] Caching locally
@@ -104,35 +109,35 @@ This project is licensed under the terms of the Apache License. See the LICENSE
 - [ ] Use unified crawler system in master-slave arch
 - [ ] Create Storage abstraction classes for local and remote client
 
-### Bump 9
+### 0.9
 
 - [X] TF-idf only on title
 - [X] Better similarity measure on content
 - [X] Generalize Storage Class
 
-### Bump 8
+### 0.8
 
 - [X] Optimize the deployment
 - [X] Remove the nltk processing
 - [X] Refactor the codebase
 - [X] Migrate from local_db to cloud Phase-1
 - [X] Optimize the user interface
 
-### Bump 7
+### 0.7
 
 - [X] Replace content with meta data (perhaps?)
 - [X] Extract background worker sites from env
 - [X] AI support Beta
 - [X] Template optimizations
 
-### Bump 6
+### 0.6
 
 - [ ] Extract timestamp and sort accordingly
 - [X] Remote crawler service (use background workers)
 - [X] Analyze the extractable metadata
 - [X] Error Logger to supabase for analytics
 
-### Bump 5-
+### 0.5-
 
 - [X] Don't download everytime query engine is started
 - [ ] Crawler doesn't follow the schema of remote_db
7 changes: 7 additions & 0 deletions phantom/core/query_engine.py
@@ -34,6 +34,7 @@ def __init__(self, filename="indexed", title_path=None):
 
         self.CONTENT_WEIGHT = int(os.environ.get("CONTENT_WEIGHT", 1))
         self.TITLE_WEIGHT = int(os.environ.get("TITLE_WEIGHT", 2))
+        self.TIME_WEIGHT = int(os.environ.get("TIME_WEIGHT", 5))
 
         self.data = {}
         self.load(filename)
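The hunk above reads each ranking weight with `int(os.environ.get(...))`, which raises `ValueError` if a variable such as `TIME_WEIGHT` holds a non-integer string. A minimal sketch of a guarded reader follows; `env_int` is a hypothetical helper that is not part of this commit, and silently falling back to the default is an assumed design choice rather than the project's behavior.

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer from the environment, falling back to `default`.

    Hypothetical helper: the commit itself calls int(os.environ.get(...)) directly.
    """
    raw = os.environ.get(name)
    # Treat an unset or blank variable as "use the default".
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumed policy: fall back instead of crashing on a malformed value.
        return default


# Same names and defaults as in the hunk above.
CONTENT_WEIGHT = env_int("CONTENT_WEIGHT", 1)
TITLE_WEIGHT = env_int("TITLE_WEIGHT", 2)
TIME_WEIGHT = env_int("TIME_WEIGHT", 5)
```

Raising an error on a bad value is an equally reasonable choice if a misconfigured deployment should fail loudly.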
@@ -84,6 +85,12 @@ def query(self, query, count=10):
         # Return the top n results
         final_results = []
         for doc, score in ranked_docs[:count]:
+
+            # send only if score > 0
+            # Hence may get results < count
+            if score == 0:
+                continue
+
             try:
                 title = self.titles[doc] if self.title_table else doc
                 final_results.append((doc, score, title))
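The second hunk drops zero-scored documents after slicing the top `count` ranked results, so a query can return fewer than `count` results. A standalone sketch of that loop, assuming hypothetical `scores` and `titles` dictionaries in place of the engine's internal ranking output and title table:

```python
from typing import Dict, List, Tuple


def top_results(
    scores: Dict[str, float],
    titles: Dict[str, str],
    count: int = 10,
) -> List[Tuple[str, float, str]]:
    """Return up to `count` (doc, score, title) tuples, best first, skipping zero scores."""
    # Rank documents by score, highest first.
    ranked_docs = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    final_results = []
    for doc, score in ranked_docs[:count]:
        # A zero score means the document matched nothing in the query,
        # so it is skipped; the result may therefore be shorter than `count`.
        if score == 0:
            continue
        # Fall back to the document key when no title is known.
        final_results.append((doc, score, titles.get(doc, doc)))
    return final_results


results = top_results({"a": 0.5, "b": 0.0, "c": 0.9}, {"a": "Page A"})
# → [("c", 0.9, "c"), ("a", 0.5, "Page A")]
```

Assuming scores are never negative, any zero scores sit at the tail of the sorted slice, so filtering after slicing drops only non-matching documents and never displaces a positive result.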
5 changes: 4 additions & 1 deletion phantom/utils/storage.py
@@ -31,7 +31,10 @@ def __init__(self, table_name="index", resume=False, remote_db=True):
         print("DB Ready")
 
     def add(self, key, value, title=None):
-        # TODO: Genaralize the function to accept any table name
+        # TODO: Generalize the function to accept any table name
+        if not value and not title:
+            return False
+
         if self.remote_db:
             try:
                 data, count = (
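The new guard in `add` returns `False` before touching either backend when both `value` and `title` are falsy, so `None` and the empty string are treated alike. A minimal sketch of that behavior, using a hypothetical in-memory class in place of the project's local and remote storage:

```python
from typing import Dict, Optional


class InMemoryStorage:
    """Hypothetical in-memory stand-in for the project's storage class."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def add(self, key: str, value: Optional[str], title: Optional[str] = None) -> bool:
        # Mirror the commit's guard: `not` treats None and "" the same way,
        # so a row with neither content nor title is rejected.
        if not value and not title:
            return False
        self.rows[key] = {"value": value, "title": title}
        return True


store = InMemoryStorage()
store.add("k1", None)            # rejected: no content, no title
store.add("k2", "", "")          # rejected: both empty strings
store.add("k3", "some content")  # stored
store.add("k4", None, "Title")   # stored: a title alone is enough
```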
8 changes: 4 additions & 4 deletions tests/test_utils.py
@@ -11,8 +11,8 @@
 class TestLogger(unittest.TestCase):
     def test_log(self):
         logger = Logger(show_logs=True, author="John")
-        logger.log("Test log", param1="value1", param2="value2")
-        expected_log = f"{time.strftime('%H:%M:%S')} : John : Test log | {{'param1': 'value1', 'param2': 'value2'}}"
+        logger.log("Test log", param1="value1", param2="value2", origin="test")
+        expected_log = f"{time.strftime('%H:%M:%S')} : John-test : Test log | {{'param1': 'value1', 'param2': 'value2'}}"
         self.assertIn(expected_log, logger.logs)
 
     def test_save(self):
@@ -23,8 +23,8 @@ def test_save(self):
         with open("test_logs.txt", "r") as f:
             saved_logs = f.readlines()
         expected_logs = [
-            f"{time.strftime('%H:%M:%S')} : Jane : Test log 1 | {{'param1': 'value1'}}\n",
-            f"{time.strftime('%H:%M:%S')} : Jane : Test log 2 | {{'param2': 'value2'}}\n",
+            f"{time.strftime('%H:%M:%S')} : Jane- : Test log 1 | {{'param1': 'value1'}}\n",
+            f"{time.strftime('%H:%M:%S')} : Jane- : Test log 2 | {{'param2': 'value2'}}\n",
         ]
         self.assertEqual(saved_logs, expected_logs)
 
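The updated tests imply a log line format of `<HH:MM:SS> : <author>-<origin> : <message> | <kwargs>`, where `origin` is consumed separately from the other keyword arguments and defaults to an empty string (hence `Jane-` when no origin is passed). The sketch below is inferred from those assertions alone; `LoggerSketch` is a hypothetical class, and the repository's actual `Logger` may be implemented differently.

```python
import time
from typing import List


class LoggerSketch:
    """Hypothetical logger whose entries match the format the updated tests expect."""

    def __init__(self, show_logs: bool = False, author: str = "") -> None:
        self.show_logs = show_logs
        self.author = author
        self.logs: List[str] = []

    def log(self, message: str, origin: str = "", **kwargs) -> None:
        # `origin` is bound to its own parameter, so it does not appear in kwargs;
        # an empty origin still leaves the trailing "-" after the author.
        entry = f"{time.strftime('%H:%M:%S')} : {self.author}-{origin} : {message} | {kwargs}"
        self.logs.append(entry)
        if self.show_logs:
            print(entry)


logger = LoggerSketch(author="John")
logger.log("Test log", param1="value1", param2="value2", origin="test")
```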