The folder /extension/lib/recorder/filters contains custom filter files that extend the recorder's capabilities for specific sites.
Simple URL filter by string or regular expression
The common file for filters of this type is validators.js.
KellyPageWatchdog.validators[] - contains filters that need no additional logic; they assign image URLs to categories by matching the URL against a string or regular expression.
Example:
    KellyPageWatchdog.validators.push({
        url: 'deviantart',
        host: 'deviantart.com', /* array or string, currently used only for recommendations, may be used to filter urls by host in the future */
        templates: [['images-wixmp', 'imageAny']],
        patternsDoc: [['original-doc', 'imageOriginal']]
    });
Matches all URLs that contain the substring "deviantart".
URLs that contain the substring "images-wixmp" are marked with the "imageAny" category.
During the "load related doc" procedure, all images in the related document whose URLs contain the substring "original-doc" are marked with the "imageOriginal" category.
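The matching rules described here can be sketched as a small standalone function. This is only an illustration and not the extension's actual code: the helpers matchesPattern and matchCategories are hypothetical, and the sketch assumes that the validator's url key is tested against the page URL while the templates patterns are tested against the image URL.

```javascript
// Hypothetical helpers, not part of KellyPageWatchdog.
// A pattern may be a plain substring or a RegExp.
function matchesPattern(url, pattern) {
    return pattern instanceof RegExp ? pattern.test(url) : url.indexOf(pattern) !== -1;
}

// Returns the categories assigned to imageUrl by every validator that applies to pageUrl.
function matchCategories(validators, pageUrl, imageUrl) {
    var categories = [];

    validators.forEach(function (validator) {
        // assumption: validator.url is compared with the page url
        if (!matchesPattern(pageUrl, validator.url)) return;

        (validator.templates || []).forEach(function (template) {
            // template = [pattern, category]
            if (matchesPattern(imageUrl, template[0]) && categories.indexOf(template[1]) === -1) {
                categories.push(template[1]);
            }
        });
    });

    return categories;
}

var validators = [{
    url: 'deviantart',
    host: 'deviantart.com',
    templates: [['images-wixmp', 'imageAny']],
    patternsDoc: [['original-doc', 'imageOriginal']]
}];

var matched = matchCategories(validators, 'https://www.deviantart.com/gallery', 'https://images-wixmp-ed30.example/pic.jpg');
var unmatched = matchCategories(validators, 'https://www.deviantart.com/gallery', 'https://cdn.example/pic.jpg');

console.log(matched, unmatched);
```

The same lookup could be repeated with patternsDoc instead of templates for images found in a related document.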
KellyPageWatchdog.bannedUrls[] - contains URL strings used to exclude specific URLs from the parsing process.
Preset categories used in associations
imageOriginal - original image
imagePreview - preview
imageAny - useful media picture (possibly original or preview)
imageByDocument - original image (same as imageOriginal, but used to identify original images in "Load related links" process)
Custom filter class with callbacks and manifest
All filter classes are stored in the folder /extension/lib/recorder/filters, each in a separate file.
KellyRecorderFilterExample.manifest - contains several keys used to check filter compatibility with the "Load related links" feature; in the future the host parameter may also be used to exclude a filter from a page.
Available keys :
- host - array of strings or a string - a list of hosts that are relevant for the filter
- detectionLvl - array listing the supported detection options: 'imageAny' - common group for previews and originals; 'imagePreview' - the filter can detect preview images; 'imageOriginal' - the filter can detect original images; 'imageByDocument' - the filter can detect original images in related documents
Example :
    KellyRecorderFilterExample.manifest = {
        host : 'example.com',
        detectionLvl : ['imageAny', 'imagePreview', 'imageOriginal', 'imageByDocument']
    };

    /* enables the filter and adds it to the common filters list */
    KellyPageWatchdog.filters.push(KellyRecorderFilterExample);
The manifest declares that the filter works on sites with the host example.com, recognizes original image links (automatically adds the imageOriginal category) and preview image links (imagePreview), and is able to find originals from a preview (imageByDocument).
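As a rough illustration of how such a manifest could be consulted, the sketch below checks whether a filter declares support for a given hostname and detection level. The helper filterSupports is hypothetical and is not part of the extension; treating subdomains as matching the declared host is also an assumption.

```javascript
// Hypothetical helper, not part of the extension API.
function filterSupports(manifest, hostname, level) {
    // manifest.host may be a single string or an array of strings
    var hosts = Array.isArray(manifest.host) ? manifest.host : [manifest.host];

    var hostMatches = hosts.some(function (host) {
        // assumption: the declared host also covers its subdomains
        return hostname === host || hostname.endsWith('.' + host);
    });

    return hostMatches && manifest.detectionLvl.indexOf(level) !== -1;
}

var manifest = {
    host : 'example.com',
    detectionLvl : ['imageAny', 'imagePreview']
};

console.log(filterSupports(manifest, 'cdn.example.com', 'imagePreview'));
console.log(filterSupports(manifest, 'example.com', 'imageByDocument'));
```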
addItemByDriver
KellyRecorderFilterExample.addItemByDriver( handler?: KellyPageWatchdog, data?: object {el?: DOM element, item?: object {}}, )
The method is called before the default parsing of the data.el DOM element by kellyPageWatchDog.parseItem.
Arguments:
- handler - instance of the kellyPageWatchDog class
- data - object with the structure {el, item}, where el is the DOM element being processed on the page (every Node.ELEMENT_NODE element of the document is checked, except 'script', 'iframe', 'frame', 'include-fragment', 'svg') and item is the object currently being processed, created from the DOM of el, in which the collected images are stored
Returns one of the handler.addDriverAction values.
You can search for image links with the kellyPageWatchDog methods:
handler.addSrcFromAttributes(el, item, excludeAttributes)
handler.addSrcFromStyle(el, item, groups)
handler.addSingleSrc(item, src, contextStr, el, groups)
In the built-in methods, the quality of image links (item.relatedSrc) is determined from several conditions:
- Level of trust for a specific tag (IMG | SOURCE | DIV ...) depending on the tag name and attribute key
- Element attribute name
- Ability to check the file extension based on the link string
Example:
    KellyRecorderFilterExample.addItemByDriver = function(handler, data) {

        /* check that this is the required site url, check data.el */
        if (handler.url.indexOf('example.com') != -1 &&
            (data.el.classList.contains('photo') || data.el.classList.contains('thumb'))) {

            /* add a single image to item.relatedSrc from data.el */
            handler.addSingleSrc(data.item, data.el.src, 'addSrcFromAttributes-src', data.el, 'imagePreview');

            /* on success add data.item to the common handler.imagesPool array; if no images were added, return handler.addDriverAction.SKIP to prevent the default behavior */
            return data.item.relatedSrc.length > 0 ? handler.addDriverAction.ADD : handler.addDriverAction.SKIP;
        }
    }
parseImagesDocByDriver
KellyRecorderFilterExample.parseImagesDocByDriver( handler?: KellyPageWatchdog, data?: object {thread}, )
The method is called when a related link document (item.relatedDoc) has loaded successfully.
Arguments:
- handler - instance of the kellyPageWatchDog class
- data - object {thread}, where thread is the result object that contains the request controller with the result data
Example:
    KellyRecorderFilterExample.parseImagesDocByDriver = function(handler, data) {

        /* url matches and the response is a non-null object (response "content-type" = json) */
        if (handler.url.indexOf('example.ru') != -1 &&
            data.thread.response &&
            typeof data.thread.response == 'object' &&
            handler.url.indexOf('original-image-page') != -1) {

            if (data.thread.response.originalImage) {

                /* add the image directly to the images array (skip handler.addSingleSrc and such) */
                handler.imagesPool.push({relatedSrc : [data.thread.response.originalImage]});

                /* prevent the default related document parser */
                return true;
            }

        /* url matches and the response is set */
        } else if (handler.url.indexOf('example.ru') != -1 && data.thread.response) {

            var parser = new DOMParser();
            var doc = parser.parseFromString(data.thread.response, 'text/html');

            /* the meta tag may be absent, so check the element before reading the attribute */
            var meta = doc.querySelector('[property="og:image"]');
            var src = meta ? meta.getAttribute('content') : null;

            if (src) handler.imagesPool.push({relatedSrc : [src]});
        }
    }
onInitDocLoader
KellyRecorderFilterExample.onInitDocLoader( handler?: KellyPageWatchdog, data?: object {docLoader?: KellyLoadDocControll, hostList?: []}, )
The event is called before the related documents loading process starts. Some docLoader configuration parameters can be updated here. Returning false stops the related documents loading process.
Arguments:
- handler - instance of the kellyPageWatchDog class. A default instance is used; its location (url, host, hostname) is set from the first recorded tab
- data - object {docLoader, hostList}, where docLoader is the related documents request controller and hostList is an array of all hosts for the current image list
Example:
    KellyRecorderFilterExample.onInitDocLoader = function(handler, data) {

        data.docLoader.parser.updateCfg({
            pauseEvery : '50',
            pauseTimer : '1.2,1.8,2,2.4,2.8',
            maxThreads : '1',
        });
    }
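A variation that also uses data.hostList and the false return value might look like the sketch below. The host names are hypothetical, the exact format of the hostList entries is an assumption, and createFakeDocLoader is only a stand-in for the real docLoader so that the callback can be tried in isolation.

```javascript
// Stand-in for the real filter object so the sketch runs on its own.
var KellyRecorderFilterExample = {};

KellyRecorderFilterExample.onInitDocLoader = function (handler, data) {

    // 'slow.example.com' is a hypothetical host that needs gentle request pacing
    if (data.hostList.indexOf('slow.example.com') !== -1) {
        data.docLoader.parser.updateCfg({ maxThreads : '1', pauseEvery : '10' });
        return;
    }

    // hypothetical rule: never load related documents for this host
    if (data.hostList.indexOf('private.example.com') !== -1) return false;
};

// Minimal fake docLoader that only records the values passed to updateCfg.
function createFakeDocLoader() {
    var cfg = {};
    return {
        cfg : cfg,
        parser : { updateCfg : function (values) { Object.assign(cfg, values); } }
    };
}

var slowLoader = createFakeDocLoader();
var slowResult = KellyRecorderFilterExample.onInitDocLoader({}, { docLoader : slowLoader, hostList : ['slow.example.com'] });
var privateResult = KellyRecorderFilterExample.onInitDocLoader({}, { docLoader : createFakeDocLoader(), hostList : ['private.example.com'] });

console.log(slowLoader.cfg, slowResult, privateResult);
```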
onInitLocation
KellyRecorderFilterExample.onInitLocation( handler?: KellyPageWatchdog, data?: object {url?: string, host?: string, hostname?: string}, )
The event is called after the parser controller is initialized (click on "Record" / "Images from Tab").
Arguments:
- handler - instance of the kellyPageWatchDog class
- data - object {url, host, hostname}, url - current parsed document (tab or related document link) url, host - url.hostname + protocol, hostname - url.hostname
Example:
    KellyRecorderFilterExample.onInitLocation = function(handler, data) {
        /* handler.url, handler.host; see all accessible variables in kellyPageWatchDog */
    }
onInitOptions
KellyRecorderFilterExample.onInitOptions( handler?: KellyPageWatchdog, data?: object {options?: KellyOptions}, )
The event is called when the options page is shown.
Arguments:
- handler - instance of the kellyPageWatchDog class. A default instance is used; its location (url, host, hostname) is set from the first recorded tab
- data - object {options}, where options - instance of KellyOptions (KellyOptions.js)
validateByDriver
KellyRecorderFilterExample.validateByDriver( handler?: KellyPageWatchdog, data?: object {item?: object {}}, )
Called after the whole parsing process (parseItem, addItemByDriver), when data.item is ready.
Arguments:
- handler - instance of the kellyPageWatchDog class
- data - object {item}
Example:
    KellyRecorderFilterExample.validateByDriver = function(handler, data) {
        /*
            make any changes to data.item {relatedSrc, relatedGroups, ...}

            return false; - prevents adding data.item to the common pool
        */
    }
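A filled-in version of this callback could look like the following sketch. It drops unwanted links from data.item while keeping relatedSrc and relatedGroups aligned, and rejects the item when nothing is left. The 'sprite' marker and the example.com check are hypothetical, and the handler and item objects at the bottom are plain stand-ins used only to try the callback out.

```javascript
// Stand-in for the real filter object so the sketch runs on its own.
var KellyRecorderFilterExample = {};

KellyRecorderFilterExample.validateByDriver = function (handler, data) {

    if (handler.url.indexOf('example.com') === -1) return;

    var srcs = [], groups = [];

    data.item.relatedSrc.forEach(function (src, index) {
        // 'sprite' is a hypothetical marker for UI graphics on the example site
        if (src.indexOf('sprite') !== -1) return;

        // keep the group keys at the same index as the link they belong to
        srcs.push(src);
        groups.push(data.item.relatedGroups[index]);
    });

    data.item.relatedSrc = srcs;
    data.item.relatedGroups = groups;

    // returning false prevents adding data.item to the common pool
    if (srcs.length === 0) return false;
};

var fakeHandler = { url : 'https://example.com/gallery' };

var mixedItem = {
    relatedSrc : ['https://example.com/sprite.png', 'https://example.com/a.jpg'],
    relatedGroups : [['imageAny'], ['imagePreview']]
};
var spriteOnlyItem = {
    relatedSrc : ['https://example.com/sprite.png'],
    relatedGroups : [['imageAny']]
};

var mixedResult = KellyRecorderFilterExample.validateByDriver(fakeHandler, { item : mixedItem });
var spriteResult = KellyRecorderFilterExample.validateByDriver(fakeHandler, { item : spriteOnlyItem });

console.log(mixedItem, mixedResult, spriteResult);
```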
onStartRecord
KellyRecorderFilterExample.onStartRecord( handler?: KellyPageWatchdog, data?: object {context?: string}, )
Called before the parsing process starts.
Arguments:
- handler - instance of the kellyPageWatchDog class
- data - object {context}, where context is a string that represents the initiating action ("parseImages" - click on "Images from Tab", "isRecorded" - continue after the page / tab loads, "startRecord" - click on "Record")
Example:
    KellyRecorderFilterExample.onStartRecord = function(handler, data) {

        if (handler.url.indexOf('example.ru') == -1) return;

        handler.additionCats = {
            example_comment : {name : 'Comment', color : '#b7dd99'},
            example_post : {name : 'Post', color : '#b7dd99'},
        };
    }
The kellyPageWatchDog.parseItem method is a private, inaccessible method of kellyPageWatchDog.
If addItemByDriver does not return a handler.addDriverAction value or returns handler.addDriverAction.CONTINUE, parseItem by default collects image links from the attributes of data.el (excluded attributes: 'name', 'class', 'id', 'type', 'alt', 'title', 'data-md5') into item.relatedSrc and tries to find the document link associated with the picture (item.relatedDoc) by searching for structures like <a href="[related link]"> ... <img src="preview"> ... </a>.
handler.addSingleSrc is used internally by handler.addSrcFromAttributes and handler.addSrcFromStyle. It adds a single image from the src string to item.relatedSrc.
contextStr is a string that describes where the image was taken from, in the format "addSrcFromAttributes|addSrcFromStyle-[attribute-name]".
Success depends on validation of the [contextStr + el.tagName + src string] values, which together represent the overall quality of the image url source. Use the "addSrcFromAttributes-src" value and el.tagName = 'IMG' for trusted links.
thread - request controller with response data
thread = {response, request {XMLHttpRequest : contentType, status, ...}, job : {url, data}, rules}
where :
- response: string or object (json), type depends on response contentType
- request: XMLHttpRequest object
- rules: array of custom XMLHttpRequest settings taken from the relatedDoc ##FETCH_RULES## suffix, for example ['mark_comments=1', 'method=GET', 'responseType=json', ...]
- job.url: requested url
- job.data: related item {relatedSrc, relatedDoc, ...} object, that initiate related link request
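Because thread.rules is a flat array of "key=value" strings, a callback that needs a single setting may find it convenient to convert the array into an object first. The helper below is a hypothetical sketch and is not provided by the extension.

```javascript
// Hypothetical helper: turns ['method=GET', 'mark_comments=1'] into { method : 'GET', mark_comments : '1' }.
function rulesToObject(rules) {
    var result = {};

    (rules || []).forEach(function (rule) {
        var pos = rule.indexOf('=');

        // a rule without "=" is stored with an empty value
        if (pos === -1) {
            result[rule] = '';
            return;
        }

        // split on the first "=" only, so values may contain "=" themselves
        result[rule.slice(0, pos)] = rule.slice(pos + 1);
    });

    return result;
}

var rules = rulesToObject(['mark_comments=1', 'method=GET', 'responseType=json']);

console.log(rules.mark_comments, rules.method, rules.responseType);
```

Inside parseImagesDocByDriver the call would be rulesToObject(data.thread.rules).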
Array of group indexes. Default groups :
imageOriginal - original image
imagePreview - preview
imageAny - useful media picture (possibly original or preview)
imageByDocument - original image (same as imageOriginal, but also taken into account when processing through "Load related links")
The list can be extended by setting handler.additionCats
handler.additionCats format
{groupKey1 : {name : 'Test group', color : '#b7dd99', selected : 90, nameTpl : true}, groupKey2 : {...}, ...}
handler.addDriverAction possible values :
- handler.addDriverAction.SKIP - skip data.el and do not include data.item in the general data array for images
- handler.addDriverAction.CONTINUE - continue with the default processing by kellyPageWatchDog.parseItem. Same behaviour if there is no return value
- handler.addDriverAction.ADD - add data.item and continue walking through the list of DOM elements
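The way these three values steer the processing of one element can be pictured with the simplified dispatcher below. It is an illustrative sketch only: the numeric constants, processElement and defaultParse are hypothetical stand-ins (defaultParse plays the role of kellyPageWatchDog.parseItem), and the real implementation may differ.

```javascript
// Stand-in constants; the real values live in handler.addDriverAction.
var addDriverAction = { SKIP : 1, CONTINUE : 2, ADD : 3 };

// Hypothetical dispatcher for a single DOM element.
function processElement(filter, handler, data, defaultParse) {
    var action = filter.addItemByDriver ? filter.addItemByDriver(handler, data) : undefined;

    // SKIP: ignore the element, data.item is not added
    if (action === addDriverAction.SKIP) return 'skipped';

    // ADD: the filter has filled data.item itself
    if (action === addDriverAction.ADD) {
        handler.imagesPool.push(data.item);
        return 'added by filter';
    }

    // CONTINUE or no return value: fall back to the default parser
    defaultParse(handler, data);
    if (data.item.relatedSrc.length > 0) {
        handler.imagesPool.push(data.item);
        return 'added by default parser';
    }

    return 'nothing found';
}

var handler = { imagesPool : [] };
var defaultParse = function (handler, data) { data.item.relatedSrc.push('https://example.com/default.jpg'); };

var skipFilter = { addItemByDriver : function () { return addDriverAction.SKIP; } };
var addFilter = { addItemByDriver : function (handler, data) {
    data.item.relatedSrc.push('https://example.com/custom.jpg');
    return addDriverAction.ADD;
} };
var silentFilter = { addItemByDriver : function () {} };

console.log(processElement(skipFilter, handler, { item : { relatedSrc : [] } }, defaultParse));
console.log(processElement(addFilter, handler, { item : { relatedSrc : [] } }, defaultParse));
console.log(processElement(silentFilter, handler, { item : { relatedSrc : [] } }, defaultParse));
```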
Parser controller that invokes the callbacks of additional filters. It can usually be accessed through the handler variable
Accessible variables :
kellyPageWatchDog = {url : string, host : string, hostname : string, imagesPool : [], additionCats : [], srcs : [] }
where
- url - current parsed document (tab or related document link) url
- host - url.hostname + protocol
- hostname - url.hostname
- imagesPool - array of added valid data.item's
- additionCats - additional categories created by a custom filter during the parsing process
- srcs - log array of all data.item.relatedSrc links added from all data.items during the current parsing session
data.item object structure :
data.item = { relatedDoc, relatedSrc, relatedGroups, referrer }
where :
- relatedDoc: string (link) - link to the related document (by default it is taken from the parent or child DOM element [A] of data.el); see the optional ##FETCH_RULES## request params
- relatedSrc: [array of strings (links)] - an array of absolute links to images
- relatedGroups: [array of arrays of group keys] - each array of group keys must have the same index as the link in relatedSrc it is associated with. The predeclared group keys can be used (see groups). Additional categories can be declared in handler.additionCats on the onInitLocation event
- referrer: string - auto-generated referrer url. By default the host of the tab from which the picture was requested is used; it is needed for correct requests from the extension page
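The index relationship between relatedSrc and relatedGroups is easy to break when an item is filled in by hand. One way to keep the two arrays aligned is to always append to both through a single helper, as in this sketch. addSrcWithGroups is hypothetical and is not part of the extension, and example_post stands for a custom key declared in handler.additionCats.

```javascript
// Hypothetical helper: appends a link and its group keys at the same index.
function addSrcWithGroups(item, src, groups) {

    // do not add the same link twice
    if (item.relatedSrc.indexOf(src) !== -1) return false;

    item.relatedSrc.push(src);
    // accept a single group key or an array of keys
    item.relatedGroups.push(Array.isArray(groups) ? groups : [groups]);

    return true;
}

var item = { relatedDoc : '', relatedSrc : [], relatedGroups : [] };

addSrcWithGroups(item, 'https://example.com/preview.jpg', 'imagePreview');
addSrcWithGroups(item, 'https://example.com/original.jpg', ['imageOriginal', 'example_post']);
var duplicateAdded = addSrcWithGroups(item, 'https://example.com/preview.jpg', 'imageAny');

console.log(item.relatedSrc, item.relatedGroups, duplicateAdded);
```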
item.relatedDoc - [string] - related link
The default request configuration for XMLHttpRequest is {method : "GET", responseType : 'text'}
A related link can carry additional parameters after the url string to change the default request configuration
Example :
http://example.com/original-image-page?get=1&get2=234##FETCH_RULES##method=POST&responseType=json&contentType=application/x-www-form-urlencoded&xRequestedWith=XMLHttpRequest&mark_comment=1
This makes a POST request with GET params {get : 1, get2 : 234}, responseType = json, and headers {contentType : application/x-www-form-urlencoded, x-requested-with : XMLHttpRequest}
The parameter {mark_comment : 1} and any other parameter with the mark_ prefix are not used in requests, but can be read after the request finishes in KellyRecorderFilterExample.parseImagesDocByDriver (the request configuration is stored in thread.rules - array ['mark_comment=1', 'responseType=json', ...])
POST data is currently not supported
Additional headers are currently not supported
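To make the notation concrete, the sketch below splits a related link into the plain url, the raw rules array, the resulting request configuration and the mark_ parameters. splitFetchRules is a hypothetical helper written for illustration; it is not the extension's parser, and the shape of the returned object is an assumption.

```javascript
// Hypothetical parser for the "url##FETCH_RULES##key=value&..." notation.
function splitFetchRules(relatedDoc) {
    var marker = '##FETCH_RULES##';
    var pos = relatedDoc.indexOf(marker);

    // defaults as stated for XMLHttpRequest in the text above
    var config = { method : 'GET', responseType : 'text' };
    var result = { url : relatedDoc, rules : [], config : config, marks : {} };

    if (pos === -1) return result;

    result.url = relatedDoc.slice(0, pos);
    result.rules = relatedDoc.slice(pos + marker.length).split('&').filter(Boolean);

    result.rules.forEach(function (rule) {
        var eq = rule.indexOf('=');
        var key = eq === -1 ? rule : rule.slice(0, eq);
        var value = eq === -1 ? '' : rule.slice(eq + 1);

        // mark_ parameters are not sent with the request, keep them separately
        if (key.indexOf('mark_') === 0) result.marks[key] = value;
        else config[key] = value;
    });

    return result;
}

var parsed = splitFetchRules(
    'http://example.com/original-image-page?get=1&get2=234' +
    '##FETCH_RULES##method=POST&responseType=json&contentType=application/x-www-form-urlencoded' +
    '&xRequestedWith=XMLHttpRequest&mark_comment=1'
);

console.log(parsed.url, parsed.config, parsed.marks);
```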
KellyLoadDocControll - related documents requests controller
To update the request speed parameters dynamically:
    KellyLoadDocControll.parser.updateCfg({
        pauseEvery : '50',
        pauseTimer : '1.2,1.8,2,2.4,2.8',
        timeout : '5',
        timeoutOnEnd : '0.8',
        maxThreads : '1',
    })
Full list of parameters. Parameters with the "img" prefix affect the image proportions loading step
    KellyLoadDocControll.threadOptions = {
        pauseEvery : '50',
        pauseTimer : '1.2,1.8,2,2.4,2.8',
        timeout : 5,
        timeoutOnEnd : '0.8',
        maxThreads : 1,
        imgLoaderTimeout : 25,
        imgLoaderMaxThreads : 3,
    }