Writing a new File Processor

Required Knowledge

Knowledge of the following topics is recommended:

A file processor is a construct that locates files with a specific name in the directory tree and reads file patterns from them; those patterns are translated into RAT include or exclude expressions. The restrictions in such a file normally apply only to files at the same directory level as the processed file or below. When these files are processed the result is a MatcherSet indicating the files to be explicitly included and the files to be excluded. The include and exclude collections together form an org.apache.rat.config.exclusion.MatcherSet. MatcherSets are built by an org.apache.rat.config.exclusion.MatcherSet.Builder.

MatcherSet

The matcher set comprises two collections of patterns, one to include and one to exclude. These collections are implemented as DocumentNameMatcher instances. The DocumentNameMatcher patterns are fully qualified to the directory in which the document specified by the DocumentName is found.

The order of the match patterns is retained. Multiple MatcherSets may be combined into a single MatcherSet.
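The following is a minimal sketch of the include/exclude pairing and of combining several sets while keeping pattern order. It uses only the Java standard library: MatcherSetSketch, merge and accepts are hypothetical names, not the RAT API, and the evaluation rule (an explicit include wins, otherwise a name is kept unless it is excluded) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a matcher set; not org.apache.rat.config.exclusion.MatcherSet.
class MatcherSetSketch {
    final List<Predicate<String>> includes = new ArrayList<>();
    final List<Predicate<String>> excludes = new ArrayList<>();

    MatcherSetSketch include(Predicate<String> p) { includes.add(p); return this; }
    MatcherSetSketch exclude(Predicate<String> p) { excludes.add(p); return this; }

    // Combines several sets into one, keeping the patterns in the order given.
    static MatcherSetSketch merge(List<MatcherSetSketch> sets) {
        MatcherSetSketch merged = new MatcherSetSketch();
        for (MatcherSetSketch set : sets) {
            merged.includes.addAll(set.includes);
            merged.excludes.addAll(set.excludes);
        }
        return merged;
    }

    // Assumption: an explicit include wins; otherwise the name is kept unless excluded.
    boolean accepts(String name) {
        if (includes.stream().anyMatch(p -> p.test(name))) {
            return true;
        }
        return excludes.stream().noneMatch(p -> p.test(name));
    }

    public static void main(String[] args) {
        MatcherSetSketch root = new MatcherSetSketch().exclude(n -> n.endsWith(".log"));
        MatcherSetSketch sub = new MatcherSetSketch().include(n -> n.endsWith("keep.log"));
        MatcherSetSketch all = merge(Arrays.asList(sub, root));
        System.out.println(all.accepts("/dir/keep.log"));   // true: explicitly included
        System.out.println(all.accepts("/dir/other.log"));  // false: excluded
        System.out.println(all.accepts("/dir/readme.txt")); // true: not mentioned
    }
}
```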

DocumentNameMatcher

The document name matcher is, as the name says, used to determine if a document name is matched. It comprises a Predicate to match the file name, the name of the DocumentNameMatcher and a flag to indicate if the matcher is a collection of matchers.

The name is used to provide feedback that identifies where the restriction comes from. For example, the pattern “/**/foo.txt” may use the pattern itself as the name of the DocumentNameMatcher, while a DocumentNameMatcher of exclusions generated by an exclude file called /MyExcludeFile may be called “excluded /MyExcludeFile”.

Multiple DocumentNameMatchers may be combined together using the DocumentNameMatcher.Or or DocumentNameMatcher.And classes. Additionally, DocumentNameMatchers may be negated by use of the DocumentNameMatcher.Not class.
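As an illustration of the idea only, the sketch below pairs a name with a Predicate and derives combined names for and, or and not. NamedMatcher and its naming scheme are hypothetical; the real class is org.apache.rat.document.DocumentNameMatcher, which works on DocumentName instances rather than strings, and the collection flag is omitted here.

```java
import java.util.function.Predicate;

// Hypothetical named matcher built on java.util.function.Predicate.
class NamedMatcher {
    final String name;
    final Predicate<String> predicate;

    NamedMatcher(String name, Predicate<String> predicate) {
        this.name = name;
        this.predicate = predicate;
    }

    boolean matches(String documentName) {
        return predicate.test(documentName);
    }

    // Each combinator records its operands in the name so feedback stays traceable.
    static NamedMatcher or(NamedMatcher a, NamedMatcher b) {
        return new NamedMatcher("or(" + a.name + ", " + b.name + ")", a.predicate.or(b.predicate));
    }

    static NamedMatcher and(NamedMatcher a, NamedMatcher b) {
        return new NamedMatcher("and(" + a.name + ", " + b.name + ")", a.predicate.and(b.predicate));
    }

    static NamedMatcher not(NamedMatcher a) {
        return new NamedMatcher("not(" + a.name + ")", a.predicate.negate());
    }

    public static void main(String[] args) {
        NamedMatcher logs = new NamedMatcher("**/*.log", n -> n.endsWith(".log"));
        NamedMatcher build = new NamedMatcher("/build/**", n -> n.startsWith("/build/"));
        NamedMatcher combined = and(logs, not(build));
        System.out.println(combined.name + " -> " + combined.matches("/src/app.log"));
    }
}
```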

AbstractFileProcessorBuilder

In many cases a file processor should process multiple files in the source tree, for example the .gitignore or .hgignore files. To implement a file processor that walks down the source tree, extend the AbstractFileProcessorBuilder.

The AbstractFileProcessorBuilder constructor takes a file name, one or more comment prefixes, and a flag to indicate whether the file name should be listed in the exclude list. The file name is normally that of a file hidden on Linux systems, such as “.gitignore” or “.hgignore”. The AbstractFileProcessorBuilder will scan the directories looking for files with the specified name. If one is found, it is passed to the process(DocumentName) method, which reads the document and returns a MatcherSet.

Classes that extend the AbstractFileProcessorBuilder have two main extension points: modifyEntry(DocumentName, String) and process(DocumentName).

Extension Points

modifyEntry

The modifyEntry method accepts the source DocumentName and a non-comment string. It is expected to process the string and return an exclude expression or null if the line does not result in an exclude expression. The default implementation simply returns the string argument.

An example of modifyEntry is found in the BazaarIgnoreBuilder where lines that start with “RE:” are regular expressions and all other lines are standard exclude patterns. The BazaarIgnoreBuilder.modifyEntry method converts “RE:” prefixed strings into the standard exclude regular expression string.
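A rough sketch of such a hook is shown below. It is not the BazaarIgnoreBuilder source: the class name is hypothetical, the DocumentName argument is left out for brevity, and the "%regex[...]" wrapper is an assumption about how a regular expression is written in the target pattern syntax.

```java
// Hypothetical sketch of a modifyEntry-style hook.
class ModifyEntrySketch {
    static final String REGEX_PREFIX = "RE:";

    // Returns the exclude expression for a non-comment line,
    // or null when the line does not produce one.
    static String modifyEntry(String entry) {
        String trimmed = entry.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        if (trimmed.startsWith(REGEX_PREFIX)) {
            // Assumption: regular expressions are marked as %regex[...]
            String regex = trimmed.substring(REGEX_PREFIX.length()).trim();
            return "%regex[" + regex + "]";
        }
        // Any other line is passed through as a standard exclude pattern.
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(modifyEntry("RE:.*\\.tmp")); // %regex[.*\.tmp]
        System.out.println(modifyEntry("*.bak"));       // *.bak
        System.out.println(modifyEntry("   "));         // null
    }
}
```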

process

In many cases the process method does not need to be modified. In general the process method:

  • Opens a File on the DocumentName
  • Reads each line in the file
  • Calls modifyEntry on the line.
  • If the result is not null:
    • Uses the FileProcessor.localizePattern() to create a DocumentName for the pattern with the baseName specified as the name of the file being read.
    • Stores the new document name in the list of names being returned.
  • Repeats until all the lines in the input file have been read.
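The steps above can be sketched with the standard library alone. Everything in this example is hypothetical: it takes the lines as an in-memory list instead of opening a file, works on plain strings instead of DocumentName, and its localize helper simply anchors a pattern to the directory of the file being read, which is only an approximation of what FileProcessor.localizePattern() does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical outline of the default process loop.
class ProcessSketch {
    // Stand-in for modifyEntry: returns the pattern to keep, or null to drop the line.
    static String modifyEntry(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    // Stand-in for pattern localization: anchors the pattern to the given directory.
    static String localize(String baseDir, String pattern) {
        String relative = pattern.startsWith("/") ? pattern.substring(1) : pattern;
        return baseDir.endsWith("/") ? baseDir + relative : baseDir + "/" + relative;
    }

    static List<String> process(String baseDir, List<String> lines, String commentPrefix) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(commentPrefix)) {
                continue; // comment lines never reach modifyEntry
            }
            String entry = modifyEntry(line);
            if (entry != null) {
                result.add(localize(baseDir, entry));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# build output", "target", "", "/dist");
        // prints [/src/module/target, /src/module/dist]
        System.out.println(process("/src/module", lines, "#"));
    }
}
```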

Classes that override the process method generally do so because they have some special cases. For example the GitIgnoreBuilder has some specific rules about when to add wildcard paths and when the paths are literal. Thus a special process is required.

Theory of Operation

The AbstractFileProcessorBuilder creates MatcherSets for each instance of the target file it finds in the source tree. Those MatcherSets are organized into levels based on how far down the tree the target file is. MatcherSets generated from files in the root of the tree are at level zero, those from files found in a subdirectory of root are at level 1, those from subdirectories of subdirectories of root are at level 2, and so on.

The builder constructs a list of MatcherSets, starting with the combined MatcherSets from the deepest level, followed by those from the next deepest level, and so on up to the shallowest level. This ensures that later files override earlier files.

If files outside the source tree need to be processed, the builder must override the process method to add the processed files at the appropriate level. An example of this can be seen in the GitIgnoreBuilder code where a global ignore file is added at level -1 because it must be processed after all the explicit includes and excludes found in the source tree.
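The level ordering can be illustrated with a small sketch. The class and method names are hypothetical and strings stand in for MatcherSets; the only behaviour taken from the text is that deeper levels come first and that an entry registered at level -1 therefore ends up last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of ordering per-level results, deepest level first.
class LevelOrderSketch {
    // Level of a file is the number of directories between the root and the file.
    static int level(String pathRelativeToRoot) {
        int depth = 0;
        for (char c : pathRelativeToRoot.toCharArray()) {
            if (c == '/') {
                depth++;
            }
        }
        return depth;
    }

    // Flattens the per-level lists from the highest level number down to the lowest.
    static List<String> order(Map<Integer, List<String>> byLevel) {
        TreeMap<Integer, List<String>> sorted = new TreeMap<>(Comparator.<Integer>reverseOrder());
        sorted.putAll(byLevel);
        List<String> result = new ArrayList<>();
        for (List<String> sets : sorted.values()) {
            result.addAll(sets);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(level(".gitignore"));     // 0
        System.out.println(level("a/b/.gitignore")); // 2
    }
}
```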

Debugging

Debugging a DocumentNameMatcher might be difficult due to the nested Predicate nature of the structure. However, the decompose() method provides a view into the inner operation of the class without having to execute a stepwise debugging session.

Assuming there is a candidate document name that needs to be checked, the following code block outputs the call tree of the DocumentNameMatcher and shows exactly what the result of each test is.

    DocumentNameMatcher matcher = ...;
    DocumentName candidate = DocumentName.builder()
            .setName(dirName+"/dir1/file1.log")
            .setBaseName(dirName).build();
    System.out.println("Decomposition for " + candidate);
    matcher.decompose(candidate).forEach(System.out::println);

The result will list the name of the test, the result of the test, the name of the document being tested, and the predicate being executed. If the predicate is a CompoundPredicate then each of the matchers from the CompoundPredicate will be decomposed as well. The result is a display of all the predicates and an indication of which one, if any, fired.
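To show how such a decomposition can be produced, the sketch below walks a small tree of named predicates and emits one indented line per node. It is a hypothetical model, not the RAT implementation: DecomposeSketch and anyOf are invented names, candidates are plain strings, and the line layout only imitates the style of the output described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical tree of named predicates with a decompose-style report.
class DecomposeSketch {
    final String name;
    final Predicate<String> test;
    final List<DecomposeSketch> children;

    DecomposeSketch(String name, Predicate<String> test, DecomposeSketch... children) {
        this.name = name;
        this.test = test;
        this.children = Arrays.asList(children);
    }

    // Compound node that matches when any of its parts matches.
    static DecomposeSketch anyOf(String name, DecomposeSketch... parts) {
        return new DecomposeSketch(name,
                c -> Arrays.stream(parts).anyMatch(p -> p.test.test(c)), parts);
    }

    List<String> decompose(String candidate) {
        List<String> lines = new ArrayList<>();
        decompose(candidate, 0, lines);
        return lines;
    }

    private void decompose(String candidate, int depth, List<String> lines) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        // Only the top line repeats the candidate name.
        String suffix = depth == 0 ? " " + candidate : "";
        lines.add(indent + name + ": >>" + test.test(candidate) + "<<" + suffix);
        for (DecomposeSketch child : children) {
            child.decompose(candidate, depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        DecomposeSketch tree = anyOf("patterns",
                new DecomposeSketch("**/test1*", c -> c.contains("/test1")),
                new DecomposeSketch("**/*Name", c -> c.endsWith("Name")));
        tree.decompose("/testName").forEach(System.out::println);
    }
}
```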

Examples

All the examples below use /testName as the candidate name to match.

FileFilter

A DocumentNameMatcher created as:

    DocumentNameMatcher matcher1 = new DocumentNameMatcher("FileFilterTest", new NameFileFilter("File.name"));

will produce:

FileFilterTest: >>false<< /testName
  NameFileFilter(File.name) >>false<<

Multiple patterns

A DocumentNameMatcher created as:

    DocumentNameMatcher matcher2 = new DocumentNameMatcher("MatchPatternsTest", MatchPatterns.from("/", "**/test1*", "**/*Name"));

will produce:

MatchPatternsTest: >>true<< /testName
  **/test1*: >>false<<
    org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@465232e9 >>false<<
  **/*Name: >>true<<
    org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@798162bc >>true<<

Combined patterns

If the two matchers above are combined into a single DocumentNameMatcher as:

    DocumentNameMatcher.matcherSet(matcher1, matcher2);

it will produce:

matcherSet(FileFilterTest, MatchPatternsTest): >>false<< /testName
  FileFilterTest: >>false<<
    NameFileFilter(File.name) >>false<<
  MatchPatternsTest: >>true<<
    **/test1*: >>false<<
      org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@6f36c2f0 >>false<<
    **/*Name: >>true<<
      org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@f58853c >>true<<