Updated docs and cli code
parent 2eae3a250d
commit 2ac33f6e40
42
README.md
@@ -4,6 +4,9 @@ Arachnid is a fast and powerful web scraping framework for Crystal. It provides
 
 - [Arachnid](#Arachnid)
 - [Installation](#Installation)
+- [The CLI](#The-CLI)
+- [Summarize](#Summarize)
+- [Sitemap](#Sitemap)
 - [Examples](#Examples)
 - [Usage](#Usage)
 - [Configuration](#Configuration)
@@ -65,6 +68,45 @@ Arachnid is a fast and powerful web scraping framework for Crystal. It provides
 
 2. Run `shards install`
 
+To build the CLI:
+
+1. Run `shards build --release`
+
+2. Add the `./bin` directory to your `PATH`, or symlink `./bin/arachnid` with `sudo ln -s /home/path/to/arachnid /usr/local/bin`
+
+## The CLI
+
+Arachnid provides a CLI for basic scanning tasks. Here is what you can do with it so far:
+
+### Summarize
+
+The `summarize` subcommand generates a report for a website. It can give you the number of pages, the internal and external links for every page, and a list of pages and their status codes (helpful for finding broken pages).
+
+You can use it like this:
+
+```
+arachnid summarize https://crystal-lang.org --ilinks --elinks -c 404 503
+```
+
+This will generate a report for crystal-lang.org that includes every page and its internal and external links, plus a list of every page that returned a 404 or 503 status. For complete help, use `arachnid summarize --help`.
+
+### Sitemap
+
+Arachnid can also generate an XML or JSON sitemap for a website by scanning the entire site and following internal links. To do so, use the `arachnid sitemap` subcommand.
+
+```
+# XML sitemap
+arachnid sitemap https://crystal-lang.org --xml
+
+# JSON sitemap
+arachnid sitemap https://crystal-lang.org --json
+
+# Custom output file
+arachnid sitemap https://crystal-lang.org --xml -o ~/Desktop/crystal-lang.org-sitemap.xml
+```
+
+Full help is available with `arachnid sitemap --help`.
+
 ## Examples
 
 Arachnid provides an easy to use, powerful DSL for scraping websites.

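For context on the `--xml` output described in the README changes above: the sample sitemap file deleted later in this commit uses the sitemaps.org `urlset` layout with `loc`, `lastmod`, `changefreq`, and `priority` per page. The following is a minimal sketch of producing that layout with Crystal's standard `XML.build`. It is not taken from Arachnid's source; `build_sitemap` and its parameters are hypothetical, and the fixed `never` / `0.5` values simply mirror the deleted sample.

```crystal
require "xml"

# Hypothetical helper (an assumption, not Arachnid's actual implementation):
# renders a list of page URLs as a sitemaps.org `urlset` document.
def build_sitemap(urls : Array(String), lastmod : String) : String
  XML.build(encoding: "UTF-8", indent: "  ") do |xml|
    xml.element("urlset", xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9") do
      urls.each do |url|
        xml.element("url") do
          xml.element("loc") { xml.text url }
          xml.element("lastmod") { xml.text lastmod }
          # Fixed values copied from the sample file removed in this commit.
          xml.element("changefreq") { xml.text "never" }
          xml.element("priority") { xml.text "0.5" }
        end
      end
    end
  end
end

puts build_sitemap(["https://crystal-lang.org"], "2019-06-30")
```
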
@@ -44,8 +44,8 @@ module Arachnid
         if args.empty?
           STDERR.puts "At least one site is required"
         else
-          count = Arachnid::Cli::Count.new
-          count.run(opts, args)
+          summarize = Arachnid::Cli::Summarize.new
+          summarize.run(opts, args)
         end
       end
     end

@@ -1,51 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
-  <url>
-    <loc>https://forum.crystal-lang.org</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-  <url>
-    <loc>https://forum.crystal-lang.org/privacy</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-  <url>
-    <loc>https://forum.crystal-lang.org/tos</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-  <url>
-    <loc>https://forum.crystal-lang.org/guidelines</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-  <url>
-    <loc>https://forum.crystal-lang.org/categories</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-  <url>
-    <loc>https://forum.crystal-lang.org/c/offtopic</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-  <url>
-    <loc>https://forum.crystal-lang.org/c/offtopic?page=1</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-  <url>
-    <loc>https://forum.crystal-lang.org/c/offtopic?page=2</loc>
-    <lastmod>2019-06-30</lastmod>
-    <changefreq>never</changefreq>
-    <priority>0.5</priority>
-  </url>
-</urlset>

@@ -5,7 +5,7 @@ require "json"
 
 module Arachnid
   class Cli < Clim
-    class Count < Cli::Action
+    class Summarize < Cli::Action
 
       def run(opts, urls)
         spinner = Spinner::Spinner.new("Wait...")
@@ -65,7 +65,7 @@ module Arachnid
         report["codes"] = codes if codes
 
         if outfile
-          File.write(outfile.to_s, report.to_json, mode: "w+")
+          File.write(File.expand_path(outfile.to_s, __DIR__), report.to_json, mode: "w+")
           puts "Report saved to #{outfile}"
         else
           pp report
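
The changed `File.write` line resolves `outfile` with `File.expand_path(outfile.to_s, __DIR__)`. In Crystal, `__DIR__` expands at compile time to the directory of the source file, so a relative `-o` path would presumably be resolved against that directory rather than the directory the command is run from. A minimal sketch of how the second argument affects the result follows; `resolve_outfile` is a hypothetical helper used only for illustration, not part of Arachnid.

```crystal
# Hypothetical helper (an assumption for illustration): resolve an output
# path against an explicit base directory.
def resolve_outfile(outfile : String, base : String) : String
  File.expand_path(outfile, base)
end

# A relative path is joined onto the base directory.
puts resolve_outfile("report.json", "/tmp/reports")

# An absolute path is returned unchanged; the base directory is ignored.
puts resolve_outfile("/var/log/report.json", "/tmp/reports")

# Passing Dir.current resolves against the directory the program runs from.
puts resolve_outfile("report.json", Dir.current)
```

If resolving against the caller's working directory is the intent, `Dir.current` may be the more fitting base than `__DIR__`.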