SiteMapper

Map all links on a given site.
SiteMapper will try to respect /robots.txt.

Works great with Wayback Archiver, a gem that crawls your site and submits each URL to the Internet Archive (Wayback Machine).
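
For illustration, respecting robots.txt roughly means fetching /robots.txt and skipping any path matched by a Disallow rule. Below is a minimal, simplified sketch of such a check; it ignores User-agent sections and is not the gem's actual lib/robots.rb implementation:

require 'net/http'
require 'uri'

# Return true if robots.txt does not disallow the given path.
# Simplified: treats every Disallow rule as applying to all user agents.
def allowed?(host, path)
  body = Net::HTTP.get(URI("http://#{host}/robots.txt"))
  rules = body.scan(/^Disallow:\s*(\S+)/i).flatten
  rules.none? { |rule| path.start_with?(rule) }
rescue StandardError
  true # if robots.txt cannot be fetched, assume crawling is allowed
end

puts allowed?('example.com', '/some/path')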

Installation

Install the gem:

gem install site_mapper
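
If your project uses Bundler, you can instead add the gem to your Gemfile and run bundle install:

gem 'site_mapper'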

Usage

Command line usage:

# Crawl all links found on pages
# within the example.com domain
site_mapper example.com
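
Assuming the command prints each discovered URL to standard output, the result can be redirected to a file:

# Write each discovered URL to urls.txt
site_mapper example.com > urls.txt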

Ruby usage:

# Crawl all links found on pages
# within the example.com domain
require 'site_mapper'
SiteMapper.map('example.com') do |new_url|
  puts "New URL found: #{new_url}"
end
# Log to STDOUT
SiteMapper.map('example.com', logger: :system) do |new_url|
  puts "New URL found: #{new_url}"
end
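
The block can also be used to collect every discovered URL, for example to write a simple plain-text sitemap. This sketch uses only the block API shown above; sitemap.txt is an arbitrary filename:

require 'site_mapper'

# Gather each URL as it is discovered
urls = []
SiteMapper.map('example.com') do |new_url|
  urls << new_url
end

# Write one URL per line
File.write('sitemap.txt', urls.join("\n"))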

Docs

You can find the docs online on RubyDoc.

This gem is documented using yard (run the command below from the root of this repository).

yard # Generates documentation to doc/

Contributing

Contributions, feedback and suggestions are very welcome.

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Notes

  • Special thanks to the robots gem, which provided the bulk of the code in lib/robots.rb

Alternatives

There are a couple of great alternatives that are more mature and have more features than this gem. Please feel free to check them out.

License

MIT License
