DirectoryDownloader

MacroModule
author Wolf Spindler
package FMEwork/ReleaseMeVis
definition DirectoryDownloader.def
see also HTTPFileDownload
keywords web, protocol, file, http, https, directory, download, mirror

Purpose

DirectoryDownloader downloads files from a web “directory”. The files to download are determined by parsing a given web page; they are stored in Destination Directory. The module is useful for downloading or mirroring files from a web site.

Notes:

  • The HTTP(S) protocol does not provide explicit folder or directory access that would allow the files shown on a page to be listed automatically. The only way to download a web “directory” is to parse the downloaded web page that displays the “directory” for file links, for example by searching for the strings that introduce (see Start Sub String) and terminate (see End Sub String) hyperlinks and extracting the strings in between. The candidate strings found are additionally checked against the suffixes in Supported Suffixes and rejected if they are obviously invalid file names (for example, a trailing slash or a “*” or “?” character), so that only the desired file types are downloaded. This is a weak approach that may fail on a number of web pages; for many simple sites, however, it works out of the box.
  • If the web page links to files located in directories other than its own, DirectoryDownloader replaces forward and backward slashes in their paths with underscores “_” and stores the files under the resulting names in Destination Directory.
  • The web page containing the web “directory” is downloaded synchronously, so its download blocks the application GUI for a while. Therefore ensure that Remote Directory Check Interval is not too small.
  • The files found are downloaded asynchronously in the background, and the GUI is updated after each download completes or is interrupted. These downloads therefore have little effect on responsiveness.
  • Web/HTML pages are usually transferred via the HTTP or HTTPS protocol. Other protocols (such as FTP) are currently not supported.
  • Explicit authentication is not yet supported.
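
The parsing approach described in the notes above can be illustrated with a minimal Python sketch. It is not the module's actual implementation; the function names extract_candidates and looks_like_file are hypothetical, and the exact validity rules of the module may differ from the ones assumed here.

```python
def extract_candidates(page_text, start_sub, end_sub):
    """Return all strings found between start_sub and end_sub in page_text.

    Hypothetical helper: mirrors the idea of Start Sub String and
    End Sub String, not the module's real code.
    """
    if not start_sub or not end_sub:
        raise ValueError("start_sub and end_sub must be non-empty")
    candidates = []
    pos = 0
    while True:
        start = page_text.find(start_sub, pos)
        if start == -1:
            break
        begin = start + len(start_sub)
        end = page_text.find(end_sub, begin)
        if end == -1:
            break
        candidates.append(page_text[begin:end])
        pos = end + len(end_sub)
    return candidates


def looks_like_file(name):
    """Reject obviously invalid names (assumed rules): empty names,
    names with a trailing slash, and names containing '*' or '?'."""
    if not name or name.endswith("/"):
        return False
    return not any(ch in name for ch in "*?")


page = '<html><a href="a.dcm">a</a> <a href="sub/">sub</a> <a href="b.txt">b</a></html>'
links = extract_candidates(page, '<a href="', '">')
files = [name for name in links if looks_like_file(name)]
```

In this example the link to "sub/" is found as a candidate but dropped by the validity check because of its trailing slash.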

Windows

Default Panel

../../../Projects/DirectoryDownloader/Modules/mhelp/Images/Screenshots/DirectoryDownloader._default.png

Parameter Fields

Visible Fields

Web Directory URL

name: webDirectoryURL, type: String

The URL of the web file “directory” which shall be parsed for (new) files.

Start Sub String

name: startSubString, type: String, default: <a href="

The string that introduces a file name to download; see Purpose for parsing details.

End Sub String

name: endSubString, type: String, default: ">

The string that follows a file name to download; see Purpose for parsing details.

Supported Suffixes

name: supportedSuffixes, type: String

A space-separated, case-insensitive list of file suffixes to download from Web Directory URL. If Supported Suffixes is empty, all files are downloaded, including those without a suffix.
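
A minimal Python sketch of such a suffix check follows. The function name has_supported_suffix is hypothetical, and it is an assumption that suffixes are compared against the end of the file name; whether the module expects a leading dot in each suffix is not stated here.

```python
def has_supported_suffix(file_name, supported_suffixes):
    """Return True if file_name ends with one of the space-separated,
    case-insensitive suffixes; an empty list accepts every file name.
    (Hypothetical helper, assumed matching rule.)"""
    suffixes = supported_suffixes.lower().split()
    if not suffixes:
        return True
    return file_name.lower().endswith(tuple(suffixes))
```

For example, has_supported_suffix("IMG.DCM", ".dcm .png") accepts the file, while has_supported_suffix("notes.txt", ".dcm .png") rejects it.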

Include Start And End Sub Strings

name: includeStartAndEndSubStrings, type: Bool, default: FALSE

If disabled, Start Sub String and End Sub String are not included in the paths extracted from the analyzed web page. If enabled, they are included in the paths as well.

Suppress File Paths With

name: suppressFilePathsWith, type: String, default: "<>

Some characters are technically allowed in file names, but when they appear in parsed file paths they probably indicate wrongly detected files. Parsing results are therefore usually better if file paths containing any of them are rejected.
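
A minimal Python sketch of this filter is shown below. It assumes that the field value is treated as a set of individual characters, which the default value suggests but which is not confirmed here; the function name is_suppressed is hypothetical.

```python
def is_suppressed(file_path, suppress_chars='"<>'):
    """Return True if file_path contains any of the characters in
    suppress_chars (assumption: each character is checked on its own)."""
    return any(ch in file_path for ch in suppress_chars)
```

With the default characters, a wrongly detected candidate such as 'index.html" class="x' would be rejected, while 'scan_001.dcm' would pass.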

Remote Directory Check Interval

name: remoteDirectoryCheckInterval, type: Float, default: 60, minimum: 0.1, deprecated name: checkTimeInSeconds

The time in seconds between two checks for new files in the remote directory. It should not be too short, to avoid a high workload.

File Download Time Interval

name: fileDownloadTimeInterval, type: Float, default: 5, minimum: 0.1

The waiting time in seconds before a file download is started.

Skip Check Time

name: skipCheckTime, type: Float, default: 0.5, minimum: 0.05

If Skip Existing Files is checked, this is the waiting time in seconds between skipping the download of an existing file and checking the next file. The pause between two file checks leaves time for user interaction with the application. If Skip Existing Files is not checked, Skip Check Time is deactivated.

Destination Directory

name: destinationDirectory, type: String

The directory in which the matching files from Web Directory URL are stored. The directory must exist; otherwise an error occurs.
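
How a link path could be turned into a local file path is sketched below in Python, following the slash replacement described under Purpose. The function names local_file_name and destination_path are hypothetical, and the kind of error raised for a missing directory is an assumption.

```python
import os


def local_file_name(link_path):
    """Flatten a link path by replacing forward and backward slashes
    with underscores, as described under Purpose."""
    return link_path.replace("/", "_").replace("\\", "_")


def destination_path(destination_directory, link_path):
    """Build the local target path; the directory must already exist.
    (Hypothetical helper; the module's actual error handling may differ.)"""
    if not os.path.isdir(destination_directory):
        raise FileNotFoundError(
            "Destination directory does not exist: " + destination_directory)
    return os.path.join(destination_directory, local_file_name(link_path))
```

For example, the link path "images/2020/a.png" is stored under the file name "images_2020_a.png".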

Start

name: start, type: Trigger

Manually starts checking Web Directory URL for files and downloads them into Destination Directory.

Stop

name: stop, type: Trigger

Stops checking Web Directory URL for new files.

Log Level

name: logLevel, type: Integer, default: 0, minimum: 0, maximum: 2, deprecated name: consoleLogging

Log Level 0 logs only warnings and errors to the console; 1 additionally logs general processing information; 2 also logs less important or more voluminous information.

Auto Stop

name: autoStop, type: Bool, default: TRUE

If checked, Stop is performed automatically after all files in the found file list have been downloaded. Otherwise Web Directory URL is checked regularly for new files.

Skip Existing Files

name: skipExistingFiles, type: Bool, default: TRUE

If checked, a file that already exists in Destination Directory is not downloaded again; if not checked, the existing file is silently overwritten.
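
The decision can be sketched in Python as follows. The function name should_download is hypothetical, and testing for existence by path alone (without comparing size or date) is an assumption.

```python
import os


def should_download(target_path, skip_existing_files=True):
    """Return False only if skipping is enabled and the target already
    exists; otherwise the file is downloaded and may overwrite an
    existing one. (Hypothetical helper, assumed existence check.)"""
    if skip_existing_files and os.path.exists(target_path):
        return False
    return True
```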

Is Running

name: isRunning, type: Bool, persistent: no

Output only: enabled while the module checks Web Directory URL for new files every Remote Directory Check Interval; disabled otherwise.

File Download In Progress

name: fileDownloadInProgress, type: Bool, persistent: no

Output only: checked while a file is being downloaded; otherwise not checked.

New Filename

name: newFilename, type: String, persistent: no

Output only: the full path of the most recently downloaded file or empty otherwise.

New File

name: newFile, type: Trigger

Output only: triggered when a new file has been downloaded successfully; its full path is then available in New Filename.

Check cycle finished

name: checkCycleFinished, type: Trigger

Trigger field which is notified after each file check cycle, regardless of whether any file was downloaded or any download succeeded.

Hidden Fields

downloadProgress

name: downloadProgress, type: Float, default: 0, minimum: 0, maximum: 1

Output only: shows the progress of the current file download as a value between 0 and 1.