Updated readme

Chris Watson 2019-06-26 21:16:43 -07:00
parent 41291aa54b
commit 6f63c92780
No known key found for this signature in database
GPG Key ID: 37DAEF5F446370A4
1 changed files with 14 additions and 18 deletions


@@ -153,7 +153,6 @@ Arachnid has a ton of configration options which can be passed to the mehthods l
 - **history** - Links that should not be visited
 - **limit** - Maximum number of resources to visit
 - **max_depth** - Maximum crawl depth
-- **filter_options** - Passed to [`initialize_filters`]()
 There are also a few class properties on `Arachnid` itself which are used as the defaults, unless overrided.
@@ -185,27 +184,24 @@ Arachnid provides 3 interfaces to use for crawling:
 Arachnid has the concept of **filters** for the purpose of filtering urls before visiting them. They are as follows:
-- **schemes**
-- [visit_schemes_like(pattern : String | Regex)]()
-- [ignore_schemes_like(pattern : String | Regex)]()
 - **hosts**
-- [visit_hosts_like(pattern : String | Regex)]()
-- [ignore_hosts_like(pattern : String | Regex)]()
+- [visit_hosts_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#visit_hosts_like%28pattern%29-instance-method)
+- [ignore_hosts_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#ignore_hosts_like%28pattern%29-instance-method)
 - **ports**
-- [visit_ports_like(pattern : String | Regex)]()
-- [ignore_ports_like(pattern : String | Regex)]()
+- [visit_ports_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#visit_ports-instance-method)
+- [ignore_ports_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#ignore_ports-instance-method)
 - **ports**
-- [visit_ports_like(pattern : String | Regex)]()
-- [ignore_ports_like(pattern : String | Regex)]()
+- [visit_ports_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#visit_ports_like%28pattern%29-instance-method)
+- [ignore_ports_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#ignore_ports_like%28pattern%29-instance-method)
 - **links**
-- [visit_links_like(pattern : String | Regex)]()
-- [ignore_links_like(pattern : String | Regex)]()
+- [visit_links_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#visit_links_like(pattern)-instance-method)
+- [ignore_links_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#ignore_links_like(pattern)-instance-method)
 - **urls**
-- [visit_urls_like(pattern : String | Regex)]()
-- [ignore_urls_like(pattern : String | Regex)]()
+- [visit_urls_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#visit_urls_like%28pattern%29-instance-method)
+- [ignore_urls_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#ignore_urls_like%28pattern%29-instance-method)
 - **exts**
-- [visit_exts_like(pattern : String | Regex)]()
-- [ignore_exts_like(pattern : String | Regex)]()
+- [visit_exts_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#visit_exts_like%28pattern%29-instance-method)
+- [ignore_exts_like(pattern : String | Regex)](https://watzon.github.io/arachnid/Arachnid/Agent.html#ignore_exts_like%28pattern%29-instance-method)
 All of these methods have the ability to also take a block instead of a pattern, where the block returns true or false. The only difference between `links` and `urls` in this case is with the block argument. `links` receives a `String` and `urls` a `URI`. Honestly I'll probably get rid of `links` soon and just make it `urls`.
@@ -319,11 +315,11 @@ Passes every origin and destination URI of each link to a given block.
 ### Content Types
-Every resource has an associated content type and the `Resource` class itself provides several easy methods to check it. You can find all of them [here]().
+Every resource has an associated content type and the `Resource` class itself provides several easy methods to check it. You can find all of them [here](https://watzon.github.io/arachnid/Arachnid/Resource/ContentTypes.html).
 ### Parsing HTML
-Every HTML/XML resource has full access to the suite of methods provided by [Crystagiri]() allowing you to more easily search by css selector.
+Every HTML/XML resource has full access to the suite of methods provided by [Crystagiri](https://github.com/madeindjs/Crystagiri/) allowing you to more easily search by css selector.
 ## Contributing