A strongly typed C# client for calling a scrapyrt (Scrapy real-time) HTTP endpoint.
See the scrapyrt documentation for complete details on making requests.
You can create a new client by passing the base address of your running scrapyrt server:
```csharp
var client = new ScrapyRTClient("http://localhost:9080");
```
... or by passing your own `HttpClient` if you want more control over outgoing requests:

```csharp
var client = new ScrapyRTClient(new HttpClient { BaseAddress = new Uri("http://localhost:9080") });
```
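Supplying your own `HttpClient` is where you would set a request timeout or default headers. The sketch below uses only `System.Net.Http`; the `CreateScrapyRtHttpClient` helper, the 60-second timeout, and the `X-Client-Name` header are illustrative assumptions, not part of this library.

```csharp
using System;
using System.Net.Http;

public static class HttpClientSetup
{
    // Hypothetical helper: builds an HttpClient pointed at a scrapyrt server.
    public static HttpClient CreateScrapyRtHttpClient(string baseAddress)
    {
        var httpClient = new HttpClient
        {
            // scrapyrt listens on port 9080 by default.
            BaseAddress = new Uri(baseAddress),
            // Crawls can be slow; 60 seconds is an arbitrary example value.
            Timeout = TimeSpan.FromSeconds(60)
        };
        // Hypothetical header sent on every call to the scrapyrt server itself.
        httpClient.DefaultRequestHeaders.Add("X-Client-Name", "ScrapyRTClient-demo");
        return httpClient;
    }
}
```

You would then pass it in with `new ScrapyRTClient(HttpClientSetup.CreateScrapyRtHttpClient("http://localhost:9080"))`. Note that default headers set here travel to scrapyrt, not to the site Scrapy scrapes.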
Assume we have an item model that matches the structure of the items scraped by Scrapy:

```csharp
public class CountryItem
{
    public string CountryName { get; set; }
}
```
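scrapyrt responds with JSON whose `items` array holds one object per scraped item, so the model's property names need to line up with the item's field names. The sketch below deserializes a made-up scrapyrt-style payload with `System.Text.Json` to illustrate that mapping; the sample JSON and the `ScrapyRtEnvelope` type are hypothetical, and this library may use a different serializer or naming rules internally.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class CountryItem
{
    public string CountryName { get; set; }
}

// Hypothetical type covering only the parts of scrapyrt's JSON used here.
public class ScrapyRtEnvelope<T>
{
    public string Status { get; set; }
    public List<T> Items { get; set; }
}

public static class ItemMappingDemo
{
    // Made-up payload shaped like a scrapyrt /crawl.json response.
    public const string SampleJson =
        "{\"status\": \"ok\", \"items\": [{\"CountryName\": \"Andorra\"}, {\"CountryName\": \"Belgium\"}]}";

    public static ScrapyRtEnvelope<CountryItem> Parse(string json)
    {
        // Case-insensitive matching lets "status" and "items" bind to Status and Items.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<ScrapyRtEnvelope<CountryItem>>(json, options);
    }
}
```

If your spider emits snake_case fields such as `country_name`, `System.Text.Json` would need an attribute like `[JsonPropertyName("country_name")]` on the property; whether this client honors that attribute depends on the serializer it uses.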
The simplest way to get items from the scrapyrt endpoint is with a `GET` request. The following examples call `ExampleSpider` with the URL to scrape.

Get a single item:

```csharp
CountryItem response = await client.GetSpiderSingleItemAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```

Get a list of items:

```csharp
List<CountryItem> response = await client.GetSpiderItemsAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```

Get the complete crawl response, including crawl stats:

```csharp
CrawlResponse<CountryItem> response = await client.GetSpiderCrawlAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```
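For reference, scrapyrt's GET API is a request to `/crawl.json` with `spider_name` and `url` in the query string. The helper below is a minimal sketch of building that URL by hand, for example to check the endpoint from a browser; `BuildCrawlGetUrl` is hypothetical and is not necessarily how `ScrapyRTClient` composes its requests.

```csharp
using System;

public static class ScrapyRtGetUri
{
    // Hypothetical helper: builds a scrapyrt GET URL of the form
    // {baseAddress}/crawl.json?spider_name=...&url=...
    public static string BuildCrawlGetUrl(string baseAddress, string spiderName, string url)
    {
        // Escape both values so characters such as '&' or '?' inside the target URL
        // are not mistaken for query-string separators.
        string query = "spider_name=" + Uri.EscapeDataString(spiderName)
                     + "&url=" + Uri.EscapeDataString(url);
        // scrapyrt serves the crawl endpoint at /crawl.json.
        return baseAddress.TrimEnd('/') + "/crawl.json?" + query;
    }
}
```

Calling it with `"http://localhost:9080"`, `"ExampleSpider"` and `"http://example.webscraping.com"` yields `http://localhost:9080/crawl.json?spider_name=ExampleSpider&url=http%3A%2F%2Fexample.webscraping.com`.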
A `POST` request lets you specify more advanced configuration for each call. The following examples call `ExampleSpider` with the URL to scrape.

Get a single item:

```csharp
CountryItem response = await client.PostSpiderSingleItemAsync<CountryItem>(new CrawlRequest
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```

Get a list of items:

```csharp
List<CountryItem> response = await client.PostSpiderItemsAsync<CountryItem>(new CrawlRequest
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```

Get the complete crawl response, including crawl stats:

```csharp
CrawlResponse<CountryItem> response = await client.PostSpiderCrawlAsync<CountryItem>(new CrawlRequest
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```
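On the wire, a scrapyrt POST sends a JSON body to `/crawl.json` with `spider_name` at the top level and the Scrapy request parameters nested under `request`. The sketch below builds that body with `System.Text.Json`; `CrawlBody` and `RequestBody` are hypothetical DTOs meant to show the wire format, not this library's `CrawlRequest` and `TwistedRequest` types.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO for the nested "request" object (Scrapy request arguments).
public class RequestBody
{
    [JsonPropertyName("url")]
    public string Url { get; set; }
}

// Hypothetical DTO for the top-level POST /crawl.json body.
public class CrawlBody
{
    [JsonPropertyName("spider_name")]
    public string SpiderName { get; set; }

    [JsonPropertyName("request")]
    public RequestBody Request { get; set; }
}

public static class PostBodyDemo
{
    // Serializes the body for a single spider run against one start URL.
    public static string Build(string spiderName, string url) =>
        JsonSerializer.Serialize(new CrawlBody
        {
            SpiderName = spiderName,
            Request = new RequestBody { Url = url }
        });
}
```

To send such a body yourself, you could wrap it in `new StringContent(json, Encoding.UTF8, "application/json")` and pass it to `HttpClient.PostAsync("/crawl.json", content)`.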
Many other options are available to customize the request that Scrapy (which is built on the Twisted networking library) makes on your behalf. Here we specify an `X-Example-Header` header that Scrapy should send when it downloads the page, and limit the crawl to at most 3 requests (scrapyrt's `max_requests` caps the number of requests the spider may schedule, not the number of items returned):

```csharp
List<CountryItem> response = await client.PostSpiderItemsAsync<CountryItem>(new CrawlRequest
{
    SpiderName = "ExampleSpider",
    MaxRequests = 3,
    Request = new TwistedRequest
    {
        Url = new Uri("http://example.webscraping.com"),
        Headers = new Dictionary<string, object>
        {
            { "X-Example-Header", "Scrapy" }
        }
    }
});
```
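These options correspond to fields in scrapyrt's JSON body: `max_requests` sits at the top level, while `headers` goes inside `request` next to `url`. The sketch below uses `System.Text.Json.Nodes` (.NET 6 or later) to build the body for the example above; the shape follows scrapyrt's POST API, while `AdvancedPostBodyDemo` is a hypothetical helper and how this library actually serializes `CrawlRequest` and `TwistedRequest` is an assumption.

```csharp
using System.Text.Json.Nodes;

public static class AdvancedPostBodyDemo
{
    // Hypothetical helper: builds a scrapyrt POST body that caps the crawl at
    // maxRequests requests and adds one header to the page download.
    public static string Build(string spiderName, string url, int maxRequests,
                               string headerName, string headerValue)
    {
        var body = new JsonObject
        {
            ["spider_name"] = spiderName,
            // Limits how many requests the spider may schedule, not the item count.
            ["max_requests"] = maxRequests,
            ["request"] = new JsonObject
            {
                ["url"] = url,
                // Passed through as a Scrapy Request argument, so it reaches the scraped site.
                ["headers"] = new JsonObject { [headerName] = headerValue }
            }
        };
        return body.ToJsonString();
    }
}
```

Building the JSON tree directly keeps the example short; typed DTOs with `[JsonPropertyName]` attributes would produce the same body.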