A rust RTF parser & lexer designed for speed and memory efficiency

pvichivanives/rtf-parser

 
 

rtf-parser

A Rust RTF parser & lexer library designed for speed and memory efficiency.

The library is split into 2 main components:

  1. The lexer
  2. The parser

The lexer scans the document and returns a Vec<Token> that represents the RTF file in a form that code can work with. To use it:

use rtf_parser::{Lexer, Parser, Token};

let tokens: Vec<Token> = Lexer::scan("<rtf>")?;
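To make the idea of lexing RTF concrete, here is a minimal, self-contained sketch of a tokenizer. It does not use this crate: `SketchToken` and `sketch_scan` are hypothetical names invented for illustration, and the real `Token` type and `Lexer::scan` may be shaped quite differently. The sketch only recognises braces, control words with an optional integer parameter, and plain text. It ignores control symbols and escapes such as `\'e9`.

```rust
/// Hypothetical token type for this sketch only; not the crate's `Token`.
#[derive(Debug, PartialEq)]
enum SketchToken<'a> {
    OpeningBracket,
    ClosingBracket,
    /// Control word name and its optional numeric parameter, e.g. `\rtf1`.
    ControlWord(&'a str, Option<i32>),
    PlainText(&'a str),
}

/// Splits RTF source into tokens that borrow from the input (no copies).
fn sketch_scan(src: &str) -> Vec<SketchToken<'_>> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'{' => {
                tokens.push(SketchToken::OpeningBracket);
                i += 1;
            }
            b'}' => {
                tokens.push(SketchToken::ClosingBracket);
                i += 1;
            }
            b'\\' => {
                // Control word: ASCII letters right after the backslash.
                let name_start = i + 1;
                let mut j = name_start;
                while j < bytes.len() && bytes[j].is_ascii_alphabetic() {
                    j += 1;
                }
                let name_end = j;
                // Optional signed integer parameter.
                if j + 1 < bytes.len() && bytes[j] == b'-' && bytes[j + 1].is_ascii_digit() {
                    j += 1;
                }
                while j < bytes.len() && bytes[j].is_ascii_digit() {
                    j += 1;
                }
                let param = src[name_end..j].parse::<i32>().ok();
                tokens.push(SketchToken::ControlWord(&src[name_start..name_end], param));
                // A single space after a control word is a delimiter, not text.
                if j < bytes.len() && bytes[j] == b' ' {
                    j += 1;
                }
                i = j;
            }
            _ => {
                // Plain text runs until the next brace or backslash.
                let start = i;
                while i < bytes.len() && !matches!(bytes[i], b'{' | b'}' | b'\\') {
                    i += 1;
                }
                tokens.push(SketchToken::PlainText(&src[start..i]));
            }
        }
    }
    tokens
}
```

Calling `sketch_scan(r"{\rtf1\ansi Hello {\b world}}")` yields a flat token list in which the group braces, the `rtf`, `ansi` and `b` control words, and the two text runs each appear as separate tokens.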

These tokens can then be passed to the parser, which converts them into a structured document, an RtfDocument.

// `Parser::new` is chained directly into `parse()`, as in the complete example under Examples.
let doc: RtfDocument = Parser::new(tokens).parse()?;

An RtfDocument is composed of:

  • the header, which contains, among other things, the font table and the encoding.
  • the body, which is a Vec<StyleBlock>

A StyleBlock contains all the formatting information for a specific block of text.
It holds a Painter and the text (&str). The Painter is defined below; how it is rendered is left to the user. For now, it only supports font, font size, bold, italic and underline.

struct Painter {
    font_ref: FontRef,
    font_size: u16,
    bold: bool,
    italic: bool,
    underline: bool,
}
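Since rendering is left to the user, the following sketch shows one way a consumer might turn styled blocks into simple HTML. It is self-contained and does not import the crate: the `Painter` and `StyleBlock` definitions below are local stand-ins that mirror the fields described above, `FontRef` is assumed to be an integer alias, and `render_html` and `escape_html` are hypothetical helpers, not part of the library.

```rust
// Local stand-ins mirroring the README's description; the crate's own
// definitions may differ. `FontRef = u16` is an assumption.
type FontRef = u16;

#[allow(dead_code)]
struct Painter {
    font_ref: FontRef,
    font_size: u16,
    bold: bool,
    italic: bool,
    underline: bool,
}

struct StyleBlock<'a> {
    painter: Painter,
    text: &'a str,
}

/// Escapes the characters that are significant in HTML text content.
fn escape_html(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical renderer: wraps each block in <b>/<i>/<u> tags according
/// to its painter. Font and font size are ignored in this sketch.
fn render_html(blocks: &[StyleBlock<'_>]) -> String {
    let mut html = String::new();
    for block in blocks {
        let p = &block.painter;
        if p.bold { html.push_str("<b>"); }
        if p.italic { html.push_str("<i>"); }
        if p.underline { html.push_str("<u>"); }
        html.push_str(&escape_html(block.text));
        // Close tags in reverse order so the markup stays well nested.
        if p.underline { html.push_str("</u>"); }
        if p.italic { html.push_str("</i>"); }
        if p.bold { html.push_str("</b>"); }
    }
    html
}
```

Because each block carries its complete style, the renderer needs no state between blocks. A terminal or GUI backend could follow the same loop with different output.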

You can also extract the text without any formatting information, using the to_text() method of the RtfDocument struct.

let rtf = r#"{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard Voici du texte en {\b gras}.\par}"#;
let tokens = Lexer::scan(rtf)?;
let document = Parser::new(tokens).parse()?;
let text = document.to_text();
assert_eq!(text, "Voici du texte en gras.");

Examples

A complete example of RTF parsing is presented below:

use rtf_parser::{Lexer, Parser};

let rtf_text = r#"{ \rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard Voici du texte en {\b gras}.\par }"#;
let tokens = Lexer::scan(rtf_text)?;
let doc = Parser::new(tokens).parse()?;

assert_eq!(
    doc.header,
    RtfHeader {
        character_set: Ansi,
        font_table: FontTable::from([
            (0, Font { name: "Helvetica", character_set: 0, font_family: Swiss })
        ])
    }
);
assert_eq!(
    doc.body,
    [
        StyleBlock {
            painter: Painter { font_ref: 0, font_size: 0, bold: false, italic: false, underline: false },
            text: "Voici du texte en ",
        },
        StyleBlock {
            painter: Painter { font_ref: 0, font_size: 0, bold: true, italic: false, underline: false },
            text: "gras",
        },
        StyleBlock {
            painter: Painter { font_ref: 0, font_size: 0, bold: false, italic: false, underline: false },
            text: ".",
        },
    ]
);
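The body above shows that bold applies only to "gras" and is dropped again once the `{\b gras}` group closes. One common way to get this behaviour is to keep a stack of style states and push or pop it at group boundaries. The sketch below illustrates that technique in isolation. It is not the crate's parser: `SketchToken`, `SketchStyle`, `SketchBlock` and `sketch_parse` are hypothetical names, and the token set is reduced to the minimum needed.

```rust
/// Hypothetical, simplified tokens for this sketch only.
#[derive(Debug, PartialEq)]
enum SketchToken<'a> {
    GroupStart,
    GroupEnd,
    Bold(bool),
    Italic(bool),
    Text(&'a str),
}

/// Style state tracked while walking the tokens.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct SketchStyle {
    bold: bool,
    italic: bool,
}

/// A run of text together with the style that was active when it was read.
#[derive(Debug, PartialEq)]
struct SketchBlock<'a> {
    style: SketchStyle,
    text: &'a str,
}

/// Walks the tokens once, saving the style on `{` and restoring it on `}`.
fn sketch_parse<'a>(tokens: &[SketchToken<'a>]) -> Vec<SketchBlock<'a>> {
    let mut stack: Vec<SketchStyle> = Vec::new();
    let mut current = SketchStyle::default();
    let mut blocks = Vec::new();
    for token in tokens {
        match token {
            // Entering a group: remember the style to restore later.
            SketchToken::GroupStart => stack.push(current),
            // Leaving a group: formatting set inside it no longer applies.
            SketchToken::GroupEnd => {
                if let Some(previous) = stack.pop() {
                    current = previous;
                }
            }
            SketchToken::Bold(on) => current.bold = *on,
            SketchToken::Italic(on) => current.italic = *on,
            // Text is emitted with a snapshot of the current style.
            SketchToken::Text(text) => blocks.push(SketchBlock { style: current, text: *text }),
        }
    }
    blocks
}
```

Feeding it the tokens for `{Voici du texte en {\b gras}.}` produces three blocks whose bold flags are false, true and false, matching the shape of the expected body shown above.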
