Skip to content

HTML parsing and querying with CSS selectors

License

Notifications You must be signed in to change notification settings

nacht-org/kuren

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kuren

crates.io downloads test

HTML parsing and querying with CSS selectors.

kuren is on Crates.io and GitHub.

kuren provides an interface to Servo's html5ever and selectors crates, for browser-grade parsing and querying.

Examples

Parsing a document

use kuren::Html;

let html = r#"
    <!DOCTYPE html>
    <meta charset="utf-8">
    <title>Hello, world!</title>
    <h1 class="foo">Hello, <i>world!</i></h1>
"#;

let document = Html::parse_document(html);

Parsing a fragment

use kuren::Html;
let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");

Parsing a selector

use kuren::Selector;
let selector = Selector::parse("h1.foo").unwrap();

Selecting elements

use kuren::{Html, Selector};

let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let fragment = Html::parse_fragment(html);
let selector = Selector::parse("li").unwrap();

for element in fragment.select(&selector) {
    assert_eq!("li", element.value().name());
}

Selecting descendent elements

use kuren::{Html, Selector};

let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let fragment = Html::parse_fragment(html);
let ul_selector = Selector::parse("ul").unwrap();
let li_selector = Selector::parse("li").unwrap();

let ul = fragment.select(&ul_selector).next().unwrap();
for element in ul.select(&li_selector) {
    assert_eq!("li", element.value().name());
}

Accessing element attributes

use kuren::{Html, Selector};

let fragment = Html::parse_fragment(r#"<input name="foo" value="bar">"#);
let selector = Selector::parse(r#"input[name="foo"]"#).unwrap();

let input = fragment.select(&selector).next().unwrap();
assert_eq!(Some("bar"), input.value().attr("value"));

Serializing HTML and inner HTML

use kuren::{Html, Selector};

let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");
let selector = Selector::parse("h1").unwrap();

let h1 = fragment.select(&selector).next().unwrap();

assert_eq!("<h1>Hello, <i>world!</i></h1>", h1.html());
assert_eq!("Hello, <i>world!</i>", h1.inner_html());

Accessing descendent text

use kuren::{Html, Selector};

let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");
let selector = Selector::parse("h1").unwrap();

let h1 = fragment.select(&selector).next().unwrap();
let text = h1.text().collect::<Vec<_>>();

assert_eq!(vec!["Hello, ", "world!"], text);

Manipulating the DOM

use html5ever::tree_builder::TreeSink;
use kuren::{Html, Selector};

let html = "<html><body>hello<p class=\"hello\">REMOVE ME</p></body></html>";
let selector = Selector::parse(".hello").unwrap();
let mut document = Html::parse_document(html);
let node_ids: Vec<_> = document.select(&selector).map(|x| x.id()).collect();
for id in node_ids {
    document.remove_from_parent(&id);
}
assert_eq!(document.html(), "<html><head></head><body>hello</body></html>");

Contributing

Please feel free to open pull requests. If you're planning on implementing something big (i.e. not fixing a typo, a small bug fix, minor refactor, etc) then please open an issue first.

About

HTML parsing and querying with CSS selectors

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Rust 98.0%
  • Roff 2.0%