HtmlToText

Version

1.0

Summary

A compact C# library for extracting the text content of news articles.

Support

Text can be extracted from the following tags:

  • p
  • div
  • h1 - h6
  • meta
    • og:site_name
    • og:url
    • og:title
    • og:description
    • og:image
    • og:image:alt
    • article:author
    • article:section
    • article:tag
    • article:published_time
    • article:modified_time
  • script
    • application/ld+json
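Each meta entry above refers to a tag of the form <meta property="og:title" content="...">, where the text lives in the content attribute. As an illustration only (this is not HtmlToText's code or API), the sketch below collects those property/content pairs with a regular expression. The names MetaSketch and ExtractMeta are hypothetical, and the pattern assumes double-quoted attributes with property written before content. Values are kept in lists because properties such as article:tag can appear more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlToText.
// Collects <meta property="..." content="..."> pairs (e.g. og:title, article:tag).
// Assumption: attributes are double-quoted and property comes before content.
public static class MetaSketch
{
    private static readonly Regex MetaTag = new Regex(
        "<meta\\s+[^>]*?property=\"(?<prop>[^\"]+)\"[^>]*?content=\"(?<content>[^\"]*)\"",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Maps each property name (case-insensitive) to every content value found for it.
    public static Dictionary<string, List<string>> ExtractMeta(string html)
    {
        var result = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in MetaTag.Matches(html))
        {
            string prop = m.Groups["prop"].Value;
            // Decode entities such as &amp; so callers get plain text.
            string content = WebUtility.HtmlDecode(m.Groups["content"].Value);
            if (!result.TryGetValue(prop, out var values))
            {
                values = new List<string>();
                result[prop] = values;
            }
            values.Add(content);
        }
        return result;
    }
}
```

A regular expression keeps the sketch short, but it will miss tags that use single quotes, list content before property, or use name instead of property; a real HTML parser handles those cases more robustly.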

Example

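To parse an article, create an HtmlParser, pass the page URL to ParseUrl, and then read the results: AllExceptions first, followed by the text collected from p tags (Paragraph) and div tags (Div).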

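    using System;
    // Also add a using directive for the HtmlToText library's namespace.

    HtmlParser hp = new HtmlParser();
    hp.ParseUrl(@"SAMPLE URL HERE"); // replace with the article URL

    // Exceptions reported by the parser
    foreach (var item in hp.AllExceptions)
        Console.WriteLine(item);

    // Text collected from <p> tags
    foreach (var item in hp.Paragraph)
        Console.WriteLine(item);

    // Text collected from <div> tags
    foreach (var item in hp.Div)
        Console.WriteLine(item);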
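The example above prints only paragraph and div text. The application/ld+json script listed under Support carries structured data as JSON; on news sites this is typically a schema.org NewsArticle object with properties such as headline. The sketch below is not part of HtmlToText: it shows one way to read a single top-level string property from those blocks using System.Text.Json. JsonLdSketch and ReadProperty are hypothetical names, and the sketch assumes each block is one JSON object, so array roots and properties nested under @graph are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlToText.
// Finds <script type="application/ld+json"> blocks and reads one top-level
// string property (e.g. "headline") from each block whose root is a JSON object.
public static class JsonLdSketch
{
    private static readonly Regex LdJsonScript = new Regex(
        "<script[^>]*type=\"application/ld\\+json\"[^>]*>(?<json>.*?)</script>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ReadProperty(string html, string propertyName)
    {
        var values = new List<string>();
        foreach (Match m in LdJsonScript.Matches(html))
        {
            try
            {
                using JsonDocument doc = JsonDocument.Parse(m.Groups["json"].Value);
                JsonElement root = doc.RootElement;
                if (root.ValueKind == JsonValueKind.Object &&
                    root.TryGetProperty(propertyName, out JsonElement prop) &&
                    prop.ValueKind == JsonValueKind.String)
                {
                    values.Add(prop.GetString() ?? string.Empty);
                }
            }
            catch (JsonException)
            {
                // Skip malformed JSON-LD blocks instead of failing the whole page.
            }
        }
        return values;
    }
}
```

For example, JsonLdSketch.ReadProperty(html, "headline") returns the headline from every well-formed JSON-LD object on the page that has one.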