Updated docs and cli code

Chris Watson 2019-06-30 20:38:18 -07:00
parent 2eae3a250d
commit 2ac33f6e40
No known key found for this signature in database
GPG Key ID: 37DAEF5F446370A4
4 changed files with 46 additions and 55 deletions


@@ -4,6 +4,9 @@ Arachnid is a fast and powerful web scraping framework for Crystal. It provides
- [Arachnid](#Arachnid)
- [Installation](#Installation)
- [The CLI](#The-CLI)
- [Summarize](#Summarize)
- [Sitemap](#Sitemap)
- [Examples](#Examples)
- [Usage](#Usage)
- [Configuration](#Configuration)
@@ -65,6 +68,45 @@ Arachnid is a fast and powerful web scraping framework for Crystal. It provides
2. Run `shards install`
To build the CLI
1. Run `shards build --release`
2. Add the `./bin` directory to your `PATH`, or symlink `./bin/arachnid` with `sudo ln -s /home/path/to/arachnid /usr/local/bin`.
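The symlink step can be sketched as a small shell function. This is only an illustration: the function name `link_arachnid` and both of its arguments are hypothetical, and the binary path should be absolute so the link resolves from any working directory.

```shell
# Sketch: symlink a built arachnid binary into a directory that is on PATH.
# link_arachnid and its arguments are hypothetical names, not part of Arachnid.
link_arachnid() {
  bin_src=$1   # absolute path to the built binary, e.g. "$PWD/bin/arachnid"
  link_dir=$2  # a directory on PATH, e.g. "$HOME/.local/bin"

  # -s creates a symbolic link, -f replaces an existing link of the same name
  ln -sf "$bin_src" "$link_dir/arachnid"
}
```

For example, `link_arachnid "$PWD/bin/arachnid" "$HOME/.local/bin"` avoids `sudo`, assuming that directory exists and is on your `PATH`. Linking into `/usr/local/bin` usually requires running `ln` itself under `sudo`.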
## The CLI
Arachnid provides a CLI for basic scanning tasks. Here is what you can do with it so far:
### Summarize
The `summarize` subcommand allows you to generate a report for a website. It can give you the number of pages, the internal and external links for every page, and a list of pages and their status codes (helpful for finding broken pages).
You can use it like this:
```
arachnid summarize https://crystal-lang.org --ilinks --elinks -c 404 503
```
This generates a report for crystal-lang.org that includes every page with its internal and external links, plus a list of every page that returned a 404 or 503 status. For complete help, use `arachnid summarize --help`.
### Sitemap
Arachnid can also generate an XML or JSON sitemap for a website by scanning the entire site and following internal links. To do so, use the `arachnid sitemap` subcommand.
```
# XML sitemap
arachnid sitemap https://crystal-lang.org --xml
# JSON sitemap
arachnid sitemap https://crystal-lang.org --json
# Custom output file
arachnid sitemap https://crystal-lang.org --xml -o ~/Desktop/crystal-lang.org-sitemap.xml
```
Full help is available with `arachnid sitemap --help`.
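To generate sitemaps for several sites in one go, the commands above can be wrapped in a loop. The following is a minimal sketch that uses only the `--xml` and `-o` flags shown above; the helper name `sitemap_all` and the `<host>-sitemap.xml` naming scheme are assumptions made for illustration.

```shell
# Sketch: write one XML sitemap per site into a chosen directory.
# sitemap_all and the output file naming are hypothetical, not part of Arachnid.
sitemap_all() {
  out_dir=$1   # directory that receives the sitemap files
  shift        # remaining arguments are the site URLs

  for site in "$@"; do
    host=${site#*://}   # drop the scheme, e.g. "https://"
    host=${host%%/*}    # drop any path after the host name
    arachnid sitemap "$site" --xml -o "$out_dir/$host-sitemap.xml"
  done
}
```

For example, `sitemap_all ~/Desktop https://crystal-lang.org https://forum.crystal-lang.org` would run the `sitemap` subcommand once per site. Passing an absolute output directory avoids any ambiguity about where relative paths end up.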
## Examples
Arachnid provides an easy-to-use, powerful DSL for scraping websites.


@@ -44,8 +44,8 @@ module Arachnid
if args.empty?
STDERR.puts "At least one site is required"
else
-count = Arachnid::Cli::Count.new
-count.run(opts, args)
+summarize = Arachnid::Cli::Summarize.new
+summarize.run(opts, args)
end
end
end


@@ -1,51 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://forum.crystal-lang.org</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://forum.crystal-lang.org/privacy</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://forum.crystal-lang.org/tos</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://forum.crystal-lang.org/guidelines</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://forum.crystal-lang.org/categories</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://forum.crystal-lang.org/c/offtopic</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://forum.crystal-lang.org/c/offtopic?page=1</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://forum.crystal-lang.org/c/offtopic?page=2</loc>
<lastmod>2019-06-30</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
</urlset>


@@ -5,7 +5,7 @@ require "json"
module Arachnid
class Cli < Clim
-class Count < Cli::Action
+class Summarize < Cli::Action
def run(opts, urls)
spinner = Spinner::Spinner.new("Wait...")
@@ -65,7 +65,7 @@ module Arachnid
report["codes"] = codes if codes
if outfile
-File.write(outfile.to_s, report.to_json, mode: "w+")
+File.write(File.expand_path(outfile.to_s, __DIR__), report.to_json, mode: "w+")
puts "Report saved to #{outfile}"
else
pp report