Introduction


If you are looking for a .NET solution to extract text from an HTML document, you are in the right place. To illustrate, here is the simplest C# code:

SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();

string htmlString = "Hello World!";
string outputFile = @"c:\Test\result.txt";

// Load the HTML from a string, then save its text to a file.
if (h.OpenHtml(htmlString))
{
    bool ok = h.ToText(outputFile);

    // Open the result for demonstration purposes.
    if (ok)
    {
        System.Diagnostics.Process.Start(
            new System.Diagnostics.ProcessStartInfo(outputFile) { UseShellExecute = true });
    }
}

You may be surprised by the amount of built-in functionality, including the ability to convert to other formats.

Download


To evaluate our SDK, download the latest «HTML to RTF .Net» with code examples (32.6 MB).

Restrictions:

The free version of «HTML to RTF .Net» adds the notice "Created by an unlicensed version of «HTML to RTF .Net»" and randomly inserts the words "TRIAL VERSION".

Three examples of extracting text from HTML in C# and VB.NET

1. Simple extraction of text from an HTML file in C#:

using System.IO;
using SautinSoft;

SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();

string htmlFile = @"d:\Resurrection.html";
// Write the text next to the source file, with a .txt extension.
string textFile = Path.ChangeExtension(htmlFile, ".txt");

h.OutputFormat = HtmlToRtf.eOutputFormat.TextUnicode;
h.ConvertFile(htmlFile, textFile);

2. Convert HTML to text in memory using C#:

using System.IO;
using SautinSoft;

SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();

string htmlFile = @"d:\Resurrection.html";
string htmlString = File.ReadAllText(htmlFile);

// Select plain-text output and start the conversion.
h.OutputFormat = HtmlToRtf.eOutputFormat.TextAnsi;
string textString = h.ConvertString(htmlString);

3. Extract text from HTML in memory using VB.NET:

Imports System.IO
Imports SautinSoft

Dim h As New SautinSoft.HtmlToRtf()

Dim htmlFile As String = "d:\Resurrection.html"
Dim htmlString As String = File.ReadAllText(htmlFile)

' Select plain-text output and start the conversion.
h.OutputFormat = HtmlToRtf.eOutputFormat.TextUnicode
Dim textString As String = h.ConvertString(htmlString)
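
The in-memory examples above handle a single file. To process a whole folder, the same steps can be wrapped in a loop. The following is only a minimal sketch: `BatchTextExtractor` and `ConvertFolder` are hypothetical names that are not part of «HTML to RTF .Net», and the converter is passed in as a delegate so the helper itself relies only on standard .NET file APIs.

```csharp
using System;
using System.IO;
using System.Text;

public static class BatchTextExtractor
{
    // Hypothetical helper (not part of the SautinSoft API).
    // Reads every *.html file in sourceDir, passes its content to the
    // supplied converter, and writes the result next to it as *.txt.
    // Returns the number of text files written.
    public static int ConvertFolder(string sourceDir, Func<string, string> convert)
    {
        if (convert == null)
            throw new ArgumentNullException(nameof(convert));

        int written = 0;
        foreach (string htmlFile in Directory.GetFiles(sourceDir, "*.html"))
        {
            string html = File.ReadAllText(htmlFile);
            string text = convert(html);

            // Skip files for which the converter returned nothing.
            if (string.IsNullOrEmpty(text))
                continue;

            string textFile = Path.ChangeExtension(htmlFile, ".txt");
            File.WriteAllText(textFile, text, Encoding.UTF8);
            written++;
        }
        return written;
    }
}
```

Assuming `ConvertString` accepts and returns a string, as in example 2, a call might look like `BatchTextExtractor.ConvertFolder(@"d:\Pages", h.ConvertString)`, where the folder path is purely illustrative. Passing the converter as a delegate also lets the loop be tested without the library installed.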

Technical information and requirements


The library requires only .NET Framework 4.0 or higher, or .NET Core 2.0 or higher. It is compatible with all .NET languages and supports every operating system where .NET Framework or .NET Core can be used.

Note that «HTML to RTF .Net» is written entirely in managed C#, which makes it a fully standalone, independent library.

.NET Framework, .NET Core
  • .NET Framework 4.0, 4.5, 4.6.1 and higher.
  • .NET Standard 2.0
  • .NET Core 2.0 and higher.

Multi-platform component, runs on:

  • Windows
  • Linux
  • Mac OS

Our component has proven itself on cloud platforms and services:

  • SharePoint
  • Google Cloud Platform
  • Amazon Web Services (AWS)
  • Microsoft Azure
  • Docker etc.