Merge pull request #411 from LokasWiki/missing_topics-v3.1
Missing topics v3.1
loka1 authored Feb 8, 2025
2 parents 5c11d8b + 27d6e51 commit 51cadc7
Showing 15 changed files with 470 additions and 297 deletions.
177 changes: 154 additions & 23 deletions tasks/missingtopics/README.md
@@ -12,12 +12,15 @@ The project follows Clean Architecture principles and is organized into the foll

### 2. Use Cases
- `UpdateMissingTopicsUseCase`: Contains the business logic for updating missing topics pages
- Implements batch processing and rate limiting
- Uses observer pattern for progress monitoring
- Supports dynamic bot name configuration
- Implements batch processing and rate limiting
- Uses observer pattern for progress monitoring
- Includes dynamic timestamp updates
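
The batch processing and rate limiting listed above could look roughly like the sketch below. It is not the project's implementation: `iter_batches`, `process_in_batches`, and the injectable `sleep` parameter are hypothetical names chosen for illustration, and only the `batch_size=50` and `delay_seconds=3` defaults are taken from the usage example in this README.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def process_in_batches(
    items: Sequence[T],
    handle: Callable[[T], None],
    batch_size: int = 50,
    delay_seconds: float = 3,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> int:
    """Handle every item and pause between batches, but not after the last one."""
    batch_count = 0
    for batch in iter_batches(items, batch_size):
        if batch_count > 0:
            sleep(delay_seconds)
        for item in batch:
            handle(item)
        batch_count += 1
    return batch_count
```

Passing `sleep` as a parameter is a design choice of this sketch: it keeps the delay configurable and lets a test substitute a recorder for `time.sleep`.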

### 3. Repositories
- `ArticleRepository`: Abstract interface for article operations
- `WikiArticleRepository`: Concrete implementation for fetching missing articles
- Configurable database connections for different wikis
- `TopicRepository`: Abstract interface for topic operations
- `WikiTopicRepository`: Concrete implementation using Pywikibot

@@ -33,6 +36,7 @@ The project implements several design patterns to achieve flexibility and mainta
- Abstracts data access layer
- Makes it easy to change data sources
- Provides clean interfaces for data operations
- Supports configurable database connections

2. **Observer Pattern**
- Monitors update progress
@@ -49,25 +53,32 @@ The project implements several design patterns to achieve flexibility and mainta
- Centralizes API configuration
- Makes it easy to modify API parameters
- Supports different environments
- Configurable database settings

## 🚀 Usage

### Basic Usage

```python
from tasks.missingtopics.repositories.article_repository import WikiArticleRepository, MissingTopicsConfig
from tasks.missingtopics.repositories.article_repository import WikiArticleRepository, MissingTopicsConfig, DatabaseConfig
from tasks.missingtopics.repositories.topic_repository import WikiTopicRepository
from tasks.missingtopics.use_cases.update_missing_topics import UpdateMissingTopicsUseCase
from tasks.missingtopics.observers.logging_observer import LoggingObserver

# Initialize repositories
# Initialize repositories with custom configurations
db_config = DatabaseConfig(
    host="enwiki.analytics.db.svc.wikimedia.cloud",
    db_name="enwiki_p"
)

article_repository = WikiArticleRepository(db_config=db_config)
topic_repository = WikiTopicRepository()
article_repository = WikiArticleRepository()

# Create use case
# Create use case with custom settings
use_case = UpdateMissingTopicsUseCase(
    topic_repository=topic_repository,
    article_repository=article_repository,
    bot_name="CustomBot",  # Configure custom bot name
    batch_size=50,
    delay_seconds=3
)
@@ -93,7 +104,18 @@ config = MissingTopicsConfig(
    limitnum=1
)

article_repository = WikiArticleRepository(config=config)
# Configure database settings
db_config = DatabaseConfig(
    host="custom.host",
    db_name="custom_wiki",
    db_port=3306,
    charset='utf8mb4'
)

article_repository = WikiArticleRepository(
    config=config,
    db_config=db_config
)
```
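
The optional `db_config` argument shown above relies on a "use defaults unless overridden" idiom. The following is a minimal, self-contained sketch of that idiom. `ExampleDatabaseConfig` and `ExampleRepository` are hypothetical stand-ins with hard-coded defaults; the real `DatabaseConfig` reads its defaults from the Pywikibot configuration rather than from literals.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ExampleDatabaseConfig:
    """Hypothetical stand-in for DatabaseConfig with literal defaults."""
    host: str = "enwiki.analytics.db.svc.wikimedia.cloud"
    db_name: str = "enwiki_p"
    db_port: int = 3306
    charset: str = "utf8mb4"


class ExampleRepository:
    """Hypothetical repository that only stores its configuration."""

    def __init__(self, db_config: Optional[ExampleDatabaseConfig] = None) -> None:
        # Fall back to the default configuration when none is supplied.
        self.db_config = db_config or ExampleDatabaseConfig()


# No argument: the defaults apply.
default_repo = ExampleRepository()

# Override only the fields that differ; the rest keep their defaults.
custom_config = replace(ExampleDatabaseConfig(), host="custom.host", db_name="custom_wiki")
custom_repo = ExampleRepository(custom_config)
```

Making the sketch's dataclass frozen is an extra assumption; it prevents one repository from mutating a configuration object that another repository shares.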

## 📋 Requirements
@@ -124,12 +146,23 @@ The system can be configured through various components:
- Project
- Depth and other parameters

2. **Performance Settings**:
2. **Database Configuration**:
- Host and port
- Database name
- Character set
- Connection settings

3. **Performance Settings**:
- Batch size for processing
- Delay between requests
- Rate limiting

3. **Logging**:
4. **Bot Configuration**:
- Custom bot name
- Dynamic timestamps
- Update messages

5. **Logging**:
- Configurable logging levels
- Multiple observers support
- Detailed error tracking
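
As an illustration of the "multiple observers" idea, here is a minimal observer sketch. The interface and method names (`ProgressObserver`, `on_topic_updated`, `on_error`, `ExampleSubject`) are assumptions made for this example and may differ from the project's actual `LoggingObserver` API.

```python
import logging
from abc import ABC, abstractmethod
from typing import List


class ProgressObserver(ABC):
    """Hypothetical observer interface; the method names are assumptions."""

    @abstractmethod
    def on_topic_updated(self, topic_name: str, article_count: int) -> None:
        """Called after a topic page has been updated."""

    @abstractmethod
    def on_error(self, topic_name: str, error: Exception) -> None:
        """Called when updating a topic fails."""


class ExampleLoggingObserver(ProgressObserver):
    """Forwards progress events to a standard library logger."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger

    def on_topic_updated(self, topic_name: str, article_count: int) -> None:
        self.logger.info("Updated %s (%d articles)", topic_name, article_count)

    def on_error(self, topic_name: str, error: Exception) -> None:
        self.logger.error("Failed to update %s: %s", topic_name, error)


class ExampleSubject:
    """Keeps a list of observers and notifies each one in registration order."""

    def __init__(self) -> None:
        self._observers: List[ProgressObserver] = []

    def add_observer(self, observer: ProgressObserver) -> None:
        self._observers.append(observer)

    def notify_updated(self, topic_name: str, article_count: int) -> None:
        for observer in self._observers:
            observer.on_topic_updated(topic_name, article_count)

    def notify_error(self, topic_name: str, error: Exception) -> None:
        for observer in self._observers:
            observer.on_error(topic_name, error)
```

Because the subject only depends on the abstract interface, additional observers (for example one that collects metrics) can be registered without changing the update logic.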
@@ -169,38 +202,134 @@ tasks/missingtopics/
- Type hints for better IDE support
- Configurable API settings
- Detailed logging system
- Dynamic bot name configuration
- Real-time timestamp updates
- Configurable database connections

## 🧪 Testing

The project includes a comprehensive test suite:
The project includes a comprehensive test suite covering all components:

### Running Tests

```bash
# Run all tests
pytest tests/tasks/missingtopics/

# Run specific test file
pytest tests/tasks/missingtopics/test_article_repository.py

# Run with coverage
pytest tests/tasks/missingtopics/ --cov=tasks.missingtopics

# Run with verbose output
pytest -v tests/tasks/missingtopics/
```

### Test Structure

1. **Unit Tests**:
- Repository tests
- Use case tests
- Entity tests
- Observer tests

2. **Mock Objects**:
- API responses
1. **Entity Tests** (`test_topic_entity.py`):
- Article class tests
- Creation and properties
- English version handling
- Wiki link formatting
- Topic class tests
- Creation and properties
- Article management
- Edge cases

2. **Repository Tests**:
- Article Repository (`test_article_repository.py`)
- Missing articles retrieval
- English version lookups
- Database configuration
- Error handling
- Connection management
- Topic Repository (`test_topic_repository.py`)
- Topic retrieval
- Page saving
- Query formatting
- Error scenarios

3. **Use Case Tests** (`test_update_missing_topics.py`):
- Update process
- Batch processing
- Bot name configuration
- Content generation
- Error handling
- Observer notifications

4. **Observer Tests** (`test_logging_observer.py`):
- Event handling
- Log message formatting
- Error logging
- Integration with logging system

### Test Coverage

Each component is tested for:

1. **Happy Path**:
- Normal operation scenarios
- Expected inputs and outputs
- Successful operations

2. **Error Handling**:
- Invalid inputs
- Network failures
- Database errors
- API errors

3. **Edge Cases**:
- Empty collections
- Boundary conditions
- Special characters
- Resource cleanup

4. **Integration Points**:
- Database interactions
- API calls
- File operations
- External services

### Mocking Strategy

The test suite uses mocking to isolate components:

1. **External Services**:
- Database connections
- API endpoints
- File systems
- Wiki interactions

2. **Internal Components**:
- Repositories
- Observers
- Configuration
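
A sketch of how a database connection might be mocked with `unittest.mock` follows. `ExampleTitleLookup`, its injected `connect` factory, the table name, and the column names are all hypothetical; the real repository opens its connection through `pymysql.connect` and the project's `Database` helper, and its tests may be structured differently.

```python
from typing import Callable, Dict, List
from unittest.mock import MagicMock


class ExampleTitleLookup:
    """Hypothetical class that receives its connection factory as a dependency."""

    # Placeholder query; the table and column names are made up for this sketch.
    QUERY = "SELECT title, en_title FROM example_titles"

    def __init__(self, connect: Callable[[], object]) -> None:
        self._connect = connect

    def english_titles(self, titles: List[str]) -> Dict[str, str]:
        if not titles:
            return {}  # no connection is opened for empty input
        connection = self._connect()
        try:
            with connection.cursor() as cursor:
                cursor.execute(self.QUERY)
                rows = cursor.fetchall()
        finally:
            connection.close()
        wanted = set(titles)
        return {row["title"]: row["en_title"] for row in rows if row["title"] in wanted}


def test_english_titles_uses_mocked_connection() -> None:
    # Arrange: a fake cursor that returns two canned rows.
    fake_cursor = MagicMock()
    fake_cursor.fetchall.return_value = [
        {"title": "A", "en_title": "A_en"},
        {"title": "B", "en_title": "B_en"},
    ]
    fake_connection = MagicMock()
    fake_connection.cursor.return_value.__enter__.return_value = fake_cursor
    connect = MagicMock(return_value=fake_connection)

    # Act
    result = ExampleTitleLookup(connect).english_titles(["A"])

    # Assert: only the requested title is returned and the connection is closed.
    assert result == {"A": "A_en"}
    connect.assert_called_once_with()
    fake_connection.close.assert_called_once_with()
```

Injecting the connection factory is one way to keep the test free of any real database; patching `pymysql.connect` with `unittest.mock.patch` would be an alternative for code that calls it directly.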

### Test Fixtures

Common test fixtures provide:

1. **Mock Data**:
- Sample topics
- Test articles
- Database results
- Wiki pages
- API responses

2. **Configuration**:
- Database settings
- API parameters
- Test environment setup

### Best Practices

The test suite follows these principles:

3. **Test Coverage**:
- Aims for high coverage
- Covers error cases
- Tests edge conditions
1. **Isolation**: Each test is independent
2. **Readability**: Clear arrange-act-assert pattern
3. **Maintainability**: DRY principles with fixtures
4. **Coverage**: Comprehensive testing of all features

## 🔮 Future Improvements

@@ -212,6 +341,8 @@ pytest tests/tasks/missingtopics/ --cov=tasks.missingtopics
6. Support more languages
7. Add metrics collection
8. Implement rate limiting strategies
9. Add database connection pooling
10. Enhance timestamp formatting options

## 📝 License

4 changes: 2 additions & 2 deletions tasks/missingtopics/entities/topic_entity.py
@@ -18,7 +18,7 @@ def has_english_version(self) -> bool:
         return bool(self.en_title)

     def format_wiki_link(self) -> str:
-        return f"[[{self.title}]]"
+        return f"{self.title}"

     def format_en_wiki_link(self) -> str:
-        return f"[[:en:{self.en_title}]]" if self.has_english_version else ""
+        return f"[[:en:{self.en_title}]]" if self.has_english_version else "\n"
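
To make the effect of the two changed return values concrete, here is a small stand-alone sketch. `ExampleArticle` is a hypothetical stand-in for the `Article` entity: its fields and the assumption that `has_english_version` is a property are inferred from the diff, not confirmed by it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleArticle:
    """Hypothetical stand-in for the Article entity (fields are assumed)."""
    title: str
    en_title: Optional[str] = None

    @property
    def has_english_version(self) -> bool:
        # Assumed to be a property, since the diff reads it without calling it.
        return bool(self.en_title)

    def format_wiki_link(self) -> str:
        # New behaviour in this commit: the bare title, without [[...]] brackets.
        return f"{self.title}"

    def format_en_wiki_link(self) -> str:
        # New behaviour in this commit: a newline instead of an empty string
        # when there is no English title.
        return f"[[:en:{self.en_title}]]" if self.has_english_version else "\n"


with_en = ExampleArticle(title="Example", en_title="Example (en)")
without_en = ExampleArticle(title="Example")
```

With these definitions, `with_en.format_en_wiki_link()` produces an interwiki link, while `without_en.format_en_wiki_link()` produces only a line break.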
48 changes: 33 additions & 15 deletions tasks/missingtopics/repositories/article_repository.py
@@ -8,6 +8,7 @@
 import wikitextparser as wtp
 from pywikibot import config as _config
 from pymysql.converters import escape_string
+from pymysql.err import Error as PyMySQLError

 from core.utils.wikidb import Database
 from entities.topic_entity import Article
@@ -23,6 +24,15 @@ class MissingTopicsConfig:
     nosingles: int = 1
     limitnum: int = 1

+@dataclass
+class DatabaseConfig:
+    """Configuration for database connection"""
+    host: str = _config.db_hostname_format.format("enwiki")
+    db_name: str = _config.db_name_format.format("enwiki")
+    db_port: int = _config.db_port
+    charset: str = 'utf8mb4'
+    read_default_file: str = _config.db_connect_file
+
 class ArticleRepository(ABC):
     @abstractmethod
     def get_missing_articles(self, topic_name: str) -> List[Article]:
@@ -33,8 +43,13 @@ def get_english_versions(self, titles: List[str]) -> Dict[str, str]:
         pass

 class WikiArticleRepository(ArticleRepository):
-    def __init__(self, config: Optional[MissingTopicsConfig] = None):
+    def __init__(
+        self,
+        config: Optional[MissingTopicsConfig] = None,
+        db_config: Optional[DatabaseConfig] = None
+    ):
         self.config = config or MissingTopicsConfig()
+        self.db_config = db_config or DatabaseConfig()
         self.headers = {
             "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
         }
@@ -64,12 +79,15 @@ def get_english_versions(self, titles: List[str]) -> Dict[str, str]:
         Returns:
             Dictionary mapping article titles to their English versions
+        Raises:
+            PyMySQLError: If there's a database connection error
         """
         if not titles:
             return {}
-        with self._get_enwiki_db_connection() as en_db:
-            return self._query_english_titles(en_db, titles)

+        db = self._get_db_connection()
+        return self._query_english_titles(db, titles)

     def _fetch_missing_topics_data(self, topic_name: str) -> str:
         """Makes HTTP request to MissingTopics API"""
@@ -106,18 +124,18 @@ def _parse_missing_topics_response(self, response_text: str) -> List[Article]:
             for row in data
         ]

-    def _get_enwiki_db_connection(self) -> Database:
-        """Creates connection to English Wikipedia database"""
-        en_db = Database()
-        en_db.connection = pymysql.connect(
-            host=_config.db_hostname_format.format("arwiki"),
-            read_default_file=_config.db_connect_file,
-            db=_config.db_name_format.format("arwiki"),
-            charset='utf8mb4',
-            port=_config.port,
+    def _get_db_connection(self) -> Database:
+        """Creates database connection with custom configuration"""
+        db = Database()
+        db.connection = pymysql.connect(
+            host=self.db_config.host,
+            read_default_file=self.db_config.read_default_file,
+            db=self.db_config.db_name,
+            charset=self.db_config.charset,
+            port=self.db_config.db_port,
             cursorclass=pymysql.cursors.DictCursor,
         )
-        return en_db
+        return db

     def _query_english_titles(self, db: Database, titles: List[str]) -> Dict[str, str]:
         """Queries database for English article titles"""
@@ -130,7 +148,7 @@ def _query_english_titles(self, db: Database, titles: List[str]) -> Dict[str, st
             WHERE page.page_title IN ({titles_string})
             AND page.page_namespace = 0
         """
-        db.get_content_from_database()
+        db.get_content_from_database()  # This method handles connection closing

         return {
             title: str(row['p_title'], 'utf-8')
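
The query in this hunk interpolates an escaped title list into the `IN (...)` clause. As a possible alternative, not something this commit does, the titles could be passed as driver-side parameters. The sketch below only builds the SQL text and the parameter list; `build_in_clause_query` is a hypothetical helper and the `SELECT` is deliberately simplified, with only the `WHERE` conditions mirroring the diff.

```python
from typing import List, Sequence, Tuple


def build_in_clause_query(titles: Sequence[str]) -> Tuple[str, List[str]]:
    """Return (sql, params) with one %s placeholder per title."""
    if not titles:
        raise ValueError("titles must not be empty")
    placeholders = ", ".join(["%s"] * len(titles))
    # Hypothetical, simplified query: the real one selects other columns
    # that are not visible in this hunk.
    sql = (
        "SELECT page.page_title FROM page "
        f"WHERE page.page_title IN ({placeholders}) "
        "AND page.page_namespace = 0"
    )
    return sql, list(titles)


sql, params = build_in_clause_query(["Cairo", "O'Neill"])
```

With PyMySQL, the pair would be passed as `cursor.execute(sql, params)`, leaving quoting to the driver. Whether the project's `Database` helper accepts parameters in this form is not shown in the diff.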
2 changes: 1 addition & 1 deletion tasks/missingtopics/repositories/topic_repository.py
@@ -45,4 +45,4 @@ def get_all_topics(self) -> List[Topic]:
     def save_topic_page(self, topic: Topic, content: str):
         page = pywikibot.Page(self.site, topic.page_name)
         page.text = content
-        page.save("بوت:تحديث مقالات مطلوبة حسب الاختصاص v2.0.0")
+        page.save("بوت:تحديث مقالات مطلوبة حسب الاختصاص v3.1.0")
7 changes: 0 additions & 7 deletions tasks/missingtopics/tests/conftest.py

This file was deleted.
