Knowledgebase: LinkCrawler Rules
What is a LinkCrawler Rule?
Posted by pspzockerscene psp, Last modified by pspzockerscene psp on 07 January 2021 07:02 PM

LinkCrawler Rules automatically process added URLs from websites which are not supported by a plugin/by default and perform specified actions on them.
You can add as many rules as you like, and they can also be chained, e.g. the results of rule 1 get processed again by rule 2.
LinkCrawler Rules are part of JDownloader's advanced features.
You can find them under Settings -> Advanced Settings -> LinkCrawler: LinkCrawlerRules
Click in the Value field so you can modify the field and replace the content with your rule(s).
Also make sure that the "LinkCrawlerRules Checkbox" (first setting in screenshot below) is enabled.
Screenshot:

There is no GUI available for this feature.
If you are only here to find out how to add a pre-made LinkCrawler Rule to JD, you can stop reading here. If you want to learn how to create your own LinkCrawler Rules, continue reading.
Here is a list of LinkCrawler Rule types with simple examples of what they can be used for.

  • DEEPDECRYPT: Auto-deep-scan URLs of websites which are not supported by any plugin
  • REWRITE: Change URLs added to JD according to rewriteReplaceWith
  • DIRECTHTTP: Make JD accept certain URLs as direct-downloadable URLs, e.g. URLs that do not contain a file extension
    Can also be used to make JD accept URLs containing unsupported/rare file extensions
  • FOLLOWREDIRECT: Allows JD to accept unsupported URLs that simply redirect from website/location A to B
  • SUBMITFORM: Allows JD to accept certain URLs and submit an HTTP form found inside the HTML code which matches the pattern formPattern

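The simplest type is REWRITE. A minimal sketch of such a rule might look like this (the domains and patterns here are placeholders, not a real rule):

[{
"enabled" : true,
"name" : "example rewrite rule",
"pattern" : "https?://(?:www\\.)?example\\.com/redirect\\?url=(.+)",
"rule" : "REWRITE",
"rewriteReplaceWith" : "https://example2.com/$1"
}]

Any added URL matching "pattern" is rewritten: the part captured by the first group (.+) is inserted at $1 in rewriteReplaceWith.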
No matter which type of rule you use, JD will afterwards auto-grab URLs matching your defined "pattern" (see below), also via clipboard observation.
A basic knowledge of Regular Expressions is recommended before you get started.
You can easily test your regular expressions with the regex101.com online tool.

Our knowledgebase contains common examples but if you need to create "more complicated" rules you may find examples in our support forum and of course you can contact our staff if you get stuck.

Basic example of the structure of a LinkCrawler Rule:

[{
"enabled" : true,
"cookies" : [ ["key", "value"] ],
"updateCookies" : true,
"logging" : false,
"maxDecryptDepth" : 1,
"id" : 0000001540111,
"name" : "example rule",
"pattern" : "https://(?:www\\.)?example\\.com/(.+)",
"rule" : "DEEPDECRYPT",
"packageNamePattern" : "<title>(.*?)</title>",
"passwordPattern" : null,
"formPattern" : null,
"deepPattern" : null,
"rewriteReplaceWith" : "https://example2.com/$1"
}]

LinkCrawler Rules are stored as a JSON array. Especially if you have multiple rules, it can be a good idea to use a JSON editor to work on them.
JD will only accept rules with a valid JSON structure!
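For example, two independent rules in one array might look like this (domains and patterns are placeholders; fields not needed by a rule type can be omitted):

[{
"enabled" : true,
"name" : "example directhttp rule",
"pattern" : "https?://(?:www\\.)?example\\.com/files/.+\\.dat",
"rule" : "DIRECTHTTP"
},
{
"enabled" : true,
"name" : "example followredirect rule",
"pattern" : "https?://(?:www\\.)?example\\.com/out/\\d+",
"rule" : "FOLLOWREDIRECT"
}]

Note the comma between the two rule objects: a missing or extra comma is a common reason why JD rejects a rule set.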

Explanation of all possible fields:
Depending on the type of your LinkCrawler rule, only some of these fields are required.

enabled
  Enables/disables this rule.
  Data-type: boolean
  Relevant for: ALL

cookies
  Here you can put in your personal cookies, e.g. login cookies for websites which JD otherwise fails to parse. If "updateCookies" is enabled, JD will update these with all cookies it receives from the website(s) that match pattern.
  Data-type: List[List[String, String]]
  Example: "cookies" : [ ["phpssid", "ffffffffffvoirg7ffffffffff"] ]
  Relevant for: DIRECTHTTP, DEEPDECRYPT, SUBMITFORM, FOLLOWREDIRECT

updateCookies
  If the target website returns new cookies, save them inside this rule and update the rule.
  Data-type: boolean
  Relevant for: DIRECTHTTP, DEEPDECRYPT, SUBMITFORM, FOLLOWREDIRECT

logging
  Enable this for support purposes. Logs of your LinkCrawler Rules can be found in your JD install dir under /logs/: LinkCrawlerRule.<RuleID>.log.0 and /LinkCrawlerDeep.*
  Data-type: boolean
  Relevant for: ALL

maxDecryptDepth
  How many layers deep your rule is allowed to crawl (e.g. if the rule returns URLs matching the same rule again, how often is this chain allowed to happen?).
  Data-type: int
  Relevant for: ALL

id
  Auto-generated ID of the rule.
  Data-type: int
  Relevant for: ALL

name
  Name of the rule.
  Data-type: String
  Relevant for: ALL

pattern
  RegEx: This rule will be applied to all URLs matching this pattern.
  Data-type: String
  Example: https://(?:www\\.)?example\\.com/(.+)
  Relevant for: ALL

rule
  Type of the rule: DEEPDECRYPT, REWRITE, DIRECTHTTP, FOLLOWREDIRECT or SUBMITFORM.
  Data-type: String
  Relevant for: ALL

packageNamePattern
  HTML RegEx: All URLs crawled by this rule will go into one package if the RegEx returns a result.
  Data-type: String
  Example: <title>(.*?)</title>
  Relevant for: DEEPDECRYPT

passwordPattern
  HTML RegEx: Pattern to find extraction passwords.
  Data-type: String
  Example: password:([^>]+)>
  Relevant for: DEEPDECRYPT

formPattern
  HTML RegEx: Finds the HTML form to submit.
  Data-type: String
  Example: <form id="example">(.*?)</form>
  Relevant for: SUBMITFORM

deepPattern
  HTML RegEx: Defines which URLs this rule should return from the HTML code. null = auto-scan and return all supported URLs found in the HTML code.
  Data-type: String
  Example: src="(https?://[^\"]+)"
  Relevant for: DEEPDECRYPT

rewriteReplaceWith
  Pattern for the new URL.
  Data-type: String
  Example: https://example2.com/$1
  Relevant for: REWRITE
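As a closing example, a SUBMITFORM rule combining pattern and formPattern might look like this (the domain and form id are placeholders, not from a real website):

[{
"enabled" : true,
"name" : "example submitform rule",
"pattern" : "https?://(?:www\\.)?example\\.com/download/\\d+",
"rule" : "SUBMITFORM",
"formPattern" : "<form id=\"download\">(.*?)</form>"
}]

JD loads the matching page, searches its HTML for a form matching formPattern, submits that form, and then processes the resulting page.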



Attachments 
 
 settings_advanced_settings_linkcrawlerrules.png (17.02 KB)