alturkovic/robots-txt

Robots.txt

Java library for reading and querying robots.txt files.

Using the library in Kotlin

  1. Parse robots.txt:
val robotsTxt = RobotsTxtReader.read(inputStream)
  2. Query robotsTxt:
val grant = robotsTxt.query("GoogleBot", "/path")
val canAccess = grant.allowed
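when (grant) {
    is MatchedGrant -> {
        // A rule group in robots.txt matched this user agent
        val crawlDelay = grant.matchedRuleGroup.crawlDelay
    }
    is NonMatchedAllowedGrant -> {
        // No rule group matched; replace this placeholder with your own handling
        TODO("Not matched in robots.txt")
    }
}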

Using the library in Java

  1. Parse robots.txt:
RobotsTxt robotsTxt = RobotsTxtReader.read(inputStream);
  2. Query robotsTxt:
Grant grant = robotsTxt.query("GoogleBot", "/path");
boolean canAccess = grant.getAllowed();
if (grant instanceof MatchedGrant) {
  Duration crawlDelay = ((MatchedGrant) grant).getMatchedRuleGroup().getCrawlDelay();
}
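
The snippets above assume an inputStream is already at hand. As a minimal sketch of where one might come from, the helper below works out the conventional robots.txt location for a page URL and wraps robots.txt text held in memory as a stream. RobotsTxtSource, locate and fromText are hypothetical names introduced for illustration and are not part of this library; only the JDK is used.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the robots-txt library.
final class RobotsTxtSource {
    private RobotsTxtSource() {}

    // Resolves "/robots.txt" against the scheme and host of the given page URL.
    static URI locate(String pageUrl) {
        return URI.create(pageUrl).resolve("/robots.txt");
    }

    // Wraps robots.txt text that is already in memory as a UTF-8 stream.
    static InputStream fromText(String robotsTxtContent) {
        return new ByteArrayInputStream(robotsTxtContent.getBytes(StandardCharsets.UTF_8));
    }
}
```

The stream returned by fromText can then be passed to RobotsTxtReader.read(...) as in step 1. Fetching the content from the URI returned by locate is left to whichever HTTP client your project already uses.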

Importing into your project

Maven

Add the JitPack repository to your pom.xml:

<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

Add the following dependency to your <dependencies> section:

<dependencies>
  <dependency>
    <groupId>com.github.alturkovic</groupId>
    <artifactId>robots-txt</artifactId>
    <version>[insert latest version here]</version>
  </dependency>
</dependencies>