Adds methods to set the IContentParser #24

Merged
merged 2 commits into from
Nov 21, 2023
16 changes: 9 additions & 7 deletions WebReaper/Builders/ScraperEngineBuilder.cs
@@ -8,6 +8,7 @@
using WebReaper.Core.CookieStorage.Abstract;
using WebReaper.Core.LinkTracker.Abstract;
using WebReaper.Core.LinkTracker.Concrete;
using WebReaper.Core.Parser.Abstract;
using WebReaper.Core.Scheduler.Abstract;
using WebReaper.Core.Scheduler.Concrete;
using WebReaper.Domain;
@@ -35,9 +36,14 @@

        private IScheduler Scheduler { get; set; } = new InMemoryScheduler();
        private IScraperConfigStorage? ConfigStorage { get; set; } = new InMemoryScraperConfigStorage();

        protected IProxyProvider? ProxyProvider { get; set; }

        public ScraperEngineBuilder WithContentParser(IContentParser contentParser)
        {
            SpiderBuilder.WithContentParser(contentParser);
            return this;
        }

        public ScraperEngineBuilder AddSink(IScraperSink sink)
        {
            SpiderBuilder.AddSink(sink);
@@ -186,7 +192,6 @@
            ConfigBuilder.GetWithBrowser(startUrls, actionBuilder?.Invoke(new PageActionBuilder()));
            return this;
        }

        public ScraperEngineBuilder GetWithBrowser(params string[] startUrls)
        {
            ConfigBuilder.GetWithBrowser(startUrls);
@@ -201,7 +206,7 @@

        public ScraperEngineBuilder FollowWithBrowser(
            string linkSelector,
            Func<PageActionBuilder,
                List<PageAction>>? actionBuilder = null)
        {
            ConfigBuilder.FollowWithBrowser(linkSelector, actionBuilder?.Invoke(new PageActionBuilder()));
@@ -278,7 +283,6 @@
                logger);
            return this;
        }

        public ScraperEngineBuilder WithFileCookieStorage(string fileName)
        {
            SpiderBuilder.WithFileCookieStorage(fileName);
@@ -334,13 +338,11 @@

        public async Task<ScraperEngine> BuildAsync()
        {
            SpiderBuilder.WithConfigStorage(ConfigStorage);

Check warning on line 341 in WebReaper/Builders/ScraperEngineBuilder.cs (GitHub Actions / build): Possible null reference argument for parameter 'scraperConfigStorage' in 'SpiderBuilder SpiderBuilder.WithConfigStorage(IScraperConfigStorage scraperConfigStorage)'.

            var config = ConfigBuilder.Build();
            var spider = SpiderBuilder.Build();

            await ConfigStorage.CreateConfigAsync(config);

            return new ScraperEngine(_parallelismDegree, ConfigStorage, Scheduler, spider, Logger);
        }
    }
}
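
Note on the build warning in `BuildAsync`: it comes from passing the nullable `ConfigStorage` property to a parameter declared as non-nullable. One possible way to address it, sketched here with stand-in types rather than the real WebReaper classes, is to copy the property into a local and fail fast when it is null. `EngineBuilderSketch`, the string-typed `CreateConfigAsync` argument, and the exception message are assumptions made for illustration only; this PR does not make that change.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Stand-in for IScraperConfigStorage. The string parameter is a simplification
// for this sketch; the real method receives the built config object.
public interface IScraperConfigStorage
{
    Task CreateConfigAsync(string config);
}

public sealed class InMemoryScraperConfigStorage : IScraperConfigStorage
{
    public string? Stored { get; private set; }

    public Task CreateConfigAsync(string config)
    {
        Stored = config;
        return Task.CompletedTask;
    }
}

// Hypothetical, reduced builder that only models the nullable storage property.
public class EngineBuilderSketch
{
    public IScraperConfigStorage? ConfigStorage { get; set; } = new InMemoryScraperConfigStorage();

    public async Task<IScraperConfigStorage> BuildAsync()
    {
        // Copy to a local and fail fast. After this statement the compiler
        // treats 'storage' as non-null, so no nullable warning is raised below.
        var storage = ConfigStorage
            ?? throw new InvalidOperationException("A config storage must be set before building.");

        await storage.CreateConfigAsync("config");
        return storage;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var built = await new EngineBuilderSketch().BuildAsync();
        Console.WriteLine(built is InMemoryScraperConfigStorage); // prints "True"

        var broken = new EngineBuilderSketch { ConfigStorage = null };
        try
        {
            await broken.BuildAsync();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("rejected null storage");
        }
    }
}
```

Declaring the property as non-nullable would be another option, since it already has an in-memory default; which one fits depends on whether callers are meant to clear the storage.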
10 changes: 8 additions & 2 deletions WebReaper/Builders/SpiderBuilder.cs
@@ -25,7 +25,7 @@

public class SpiderBuilder
{
    private Func<Metadata, JObject, Task> PostProcessor { get; set; }

Check warning on line 28 in WebReaper/Builders/SpiderBuilder.cs (GitHub Actions / build): Non-nullable property 'PostProcessor' must contain a non-null value when exiting constructor. Consider declaring the property as nullable.

    private List<IScraperSink> Sinks { get; } = new();

@@ -33,7 +33,7 @@

    private ILinkParser LinkParser { get; } = new LinkParserByCssSelector();

    private IScraperConfigStorage ScraperConfigStorage { get; set; }

Check warning on line 36 in WebReaper/Builders/SpiderBuilder.cs (GitHub Actions / build): Non-nullable property 'ScraperConfigStorage' must contain a non-null value when exiting constructor. Consider declaring the property as nullable.

    private IVisitedLinkTracker SiteLinkTracker { get; set; } = new InMemoryVisitedLinkTracker();

@@ -49,8 +49,14 @@

    private ICookiesStorage CookieStorage { get; set; } = new InMemoryCookieStorage();

    protected event Action<ParsedData> ScrapedData;

Check warning on line 52 in WebReaper/Builders/SpiderBuilder.cs (GitHub Actions / build): Non-nullable event 'ScrapedData' must contain a non-null value when exiting constructor. Consider declaring the event as nullable.

    public SpiderBuilder WithContentParser(IContentParser contentParser)
    {
        ContentParser = contentParser;
        return this;
    }

    public SpiderBuilder WithLogger(ILogger logger)
    {
        Logger = logger;
@@ -166,7 +172,7 @@
        CookieStorage = new RedisCookieStorage(connectionString, redisKey, Logger);
        return this;
    }

    public SpiderBuilder WithFileCookieStorage(string fileName)
    {
        CookieStorage = new FileCookieStorage(fileName, Logger);
@@ -235,4 +241,4 @@

        return spider;
    }
}
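
For reviewers, the two `WithContentParser` methods in this diff follow a simple delegation pattern: `ScraperEngineBuilder` forwards the parser to its inner `SpiderBuilder` and returns itself so calls can be chained. The sketch below reproduces that pattern with reduced stand-in classes so it can run on its own. The `Parse` member on `IContentParser`, the `UpperCaseParser` class, and the public `ContentParser` property are hypothetical; the real interface in `WebReaper.Core.Parser.Abstract` may look different.

```csharp
#nullable enable
using System;

// Stand-in for WebReaper.Core.Parser.Abstract.IContentParser.
// The Parse member is hypothetical; the real interface may differ.
public interface IContentParser
{
    string Parse(string html);
}

// Hypothetical custom parser, used only for this illustration.
public sealed class UpperCaseParser : IContentParser
{
    public string Parse(string html) => html.ToUpperInvariant();
}

// Reduced stand-in for SpiderBuilder: it stores the parser it is given.
// The property is public here only so the example can inspect it.
public class SpiderBuilder
{
    public IContentParser? ContentParser { get; private set; }

    public SpiderBuilder WithContentParser(IContentParser contentParser)
    {
        ContentParser = contentParser;
        return this;
    }
}

// Reduced stand-in for ScraperEngineBuilder: it forwards to the inner
// builder and returns itself so further calls can be chained.
public class ScraperEngineBuilder
{
    public SpiderBuilder SpiderBuilder { get; } = new();

    public ScraperEngineBuilder WithContentParser(IContentParser contentParser)
    {
        SpiderBuilder.WithContentParser(contentParser);
        return this;
    }
}

public static class Program
{
    public static void Main()
    {
        var parser = new UpperCaseParser();
        var builder = new ScraperEngineBuilder().WithContentParser(parser);

        // The outer builder handed the same instance to the inner builder.
        Console.WriteLine(ReferenceEquals(builder.SpiderBuilder.ContentParser, parser)); // prints "True"
    }
}
```

Keeping the setter on `SpiderBuilder` and only forwarding from `ScraperEngineBuilder` matches how `AddSink` and `WithFileCookieStorage` are wired in the same files.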