Writing a new File Processor
Required Knowledge
Knowledge of the following topics is recommended:
- DocumentName: The DocumentName class that is used to identify files.
- RAT Exclude Expressions: The expressions that are used to match file names.
A file processor locates files with a specific name in the directory tree and reads file patterns from them. Those patterns are translated into RAT include or exclude expressions. The files are normally found within the source directory tree, and their restrictions normally apply only to files at the same directory level as the processed file or below. Processing such a file identifies the files to be explicitly included and the files to be excluded. Together, the includes and excludes are called an org.apache.rat.config.exclusion.MatcherSet. MatcherSets are built by an org.apache.rat.config.exclusion.MatcherSet.Builder.
MatcherSet
The matcher set comprises two collections of patterns, one to include and one to exclude. These collections are implemented as DocumentNameMatcher instances. The DocumentNameMatcher patterns are fully qualified to the directory in which the document specified by the DocumentName is found.
The order of the match patterns is retained. Multiple MatcherSets may be combined into a single MatcherSet.
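As a rough illustration of how an include/exclude pair could be evaluated, the following self-contained sketch uses plain java.util.function.Predicate objects in place of DocumentNameMatcher instances. The class MatcherSetSketch and its methods are hypothetical and are not part of the RAT API. The rule that an explicit include takes precedence over an exclude is an assumption made for this example.

```java
import java.util.function.Predicate;

/** Hypothetical stand-in for a matcher set; this is not the RAT MatcherSet class. */
final class MatcherSetSketch {
    private final Predicate<String> includes;
    private final Predicate<String> excludes;

    MatcherSetSketch(Predicate<String> includes, Predicate<String> excludes) {
        this.includes = includes;
        this.excludes = excludes;
    }

    /** Assumed rule: an explicit include wins; otherwise the file is kept unless excluded. */
    boolean keeps(String documentName) {
        return includes.test(documentName) || !excludes.test(documentName);
    }

    /** Combines two sets by OR-ing the include patterns and OR-ing the exclude patterns. */
    MatcherSetSketch combine(MatcherSetSketch other) {
        return new MatcherSetSketch(includes.or(other.includes), excludes.or(other.excludes));
    }

    public static void main(String[] args) {
        MatcherSetSketch logs = new MatcherSetSketch(
                name -> name.endsWith("/keep.log"),  // explicitly included
                name -> name.endsWith(".log"));      // excluded
        System.out.println(logs.keeps("/src/build.log")); // false
        System.out.println(logs.keeps("/src/keep.log"));  // true
        System.out.println(logs.keeps("/src/Main.java")); // true
    }
}
```

Keeping the two collections separate until evaluation time is what makes it possible to combine several sets before deciding about any single file.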
DocumentNameMatcher
The document name matcher is, as the name says, used to determine whether a document name is matched. It comprises a Predicate to match the file name, the name of the DocumentNameMatcher, and a flag to indicate whether the matcher is a collection of matchers.
The name provides feedback that identifies where the restriction comes from. For example, a matcher for the pattern “/**/foo.txt” may use the pattern itself as its name, while a DocumentNameMatcher of exclusions generated by an exclude file called /MyExcludeFile may be called “excluded /MyExcludeFile”.
Multiple DocumentNameMatchers may be combined using the DocumentNameMatcher.Or or DocumentNameMatcher.And classes. Additionally, DocumentNameMatchers may be negated by use of the DocumentNameMatcher.Not class.
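The idea of a named predicate that can be combined and negated can be sketched without the RAT classes. The example below is a simplified stand-in built on java.util.function.Predicate. NamedMatcherSketch and its static or, and, and not helpers are hypothetical names chosen for illustration and do not reproduce the signatures of DocumentNameMatcher.Or, DocumentNameMatcher.And, or DocumentNameMatcher.Not.

```java
import java.util.function.Predicate;

/** Hypothetical named predicate; a simplified stand-in for DocumentNameMatcher. */
final class NamedMatcherSketch {
    final String name;
    final Predicate<String> predicate;

    NamedMatcherSketch(String name, Predicate<String> predicate) {
        this.name = name;
        this.predicate = predicate;
    }

    boolean matches(String documentName) {
        return predicate.test(documentName);
    }

    /** Matches when either argument matches; the name records both sources. */
    static NamedMatcherSketch or(NamedMatcherSketch a, NamedMatcherSketch b) {
        return new NamedMatcherSketch("or(" + a.name + ", " + b.name + ")", a.predicate.or(b.predicate));
    }

    /** Matches only when both arguments match. */
    static NamedMatcherSketch and(NamedMatcherSketch a, NamedMatcherSketch b) {
        return new NamedMatcherSketch("and(" + a.name + ", " + b.name + ")", a.predicate.and(b.predicate));
    }

    /** Inverts the result of the argument. */
    static NamedMatcherSketch not(NamedMatcherSketch a) {
        return new NamedMatcherSketch("not(" + a.name + ")", a.predicate.negate());
    }

    public static void main(String[] args) {
        NamedMatcherSketch logs = new NamedMatcherSketch("**/*.log", n -> n.endsWith(".log"));
        NamedMatcherSketch inBuild = new NamedMatcherSketch("/build/**", n -> n.startsWith("/build/"));
        NamedMatcherSketch combined = and(logs, not(inBuild));
        System.out.println(combined.name);                      // and(**/*.log, not(/build/**))
        System.out.println(combined.matches("/src/app.log"));   // true
        System.out.println(combined.matches("/build/app.log")); // false
    }
}
```

Carrying the name through each combination is what lets a later diagnostic report which restriction produced a result.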
AbstractFileProcessorBuilder
In many cases a file processor should process multiple files in the source tree, for example the .gitignore or .hgignore files. To implement a file processor that walks down the source tree, the AbstractFileProcessorBuilder is used.
The AbstractFileProcessorBuilder constructor takes a file name, one or more comment prefixes, and a flag to indicate whether the file name should be listed in the exclude list. The file name is normally that of a file that is hidden by default on Linux systems, such as “.gitignore” or “.hgignore”. The AbstractFileProcessorBuilder scans the directories looking for files with the specified name. Each file found is passed to the process(DocumentName) method, which reads the document and returns a MatcherSet.
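The scan for target files can be pictured with a small, self-contained sketch. It uses java.nio.file.Files.walk rather than RAT's own traversal, and the class TargetFileScanSketch and its findTargets method are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of locating every file with a given name below a root directory. */
final class TargetFileScanSketch {

    /** Returns every regular file under root whose file name equals targetName. */
    static List<Path> findTargets(Path root, String targetName) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals(targetName))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Each path printed here is a file that would be handed to a process step.
        for (Path found : findTargets(root, ".gitignore")) {
            System.out.println(root.relativize(found));
        }
    }
}
```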
Classes that extend the AbstractFileProcessorBuilder have two main extension points: modifyEntry(DocumentName, String) and process(DocumentName).
Extension Points
modifyEntry
The modifyEntry method accepts the source DocumentName and a non-comment string. It is expected to process the string and return an exclude expression, or null if the line does not result in an exclude expression. The default implementation simply returns the string argument.
An example of modifyEntry is found in the BazaarIgnoreBuilder, where lines that start with “RE:” are regular expressions and all other lines are standard exclude patterns. The BazaarIgnoreBuilder.modifyEntry method converts “RE:” prefixed strings into the standard exclude regular expression string.
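A conversion of this kind might look like the sketch below. It is not the BazaarIgnoreBuilder source. The class ModifyEntrySketch is hypothetical, the method takes only the line text and omits the DocumentName argument, and the %regex[...] wrapper is an assumption about how the standard exclude regular expression string is written.

```java
/** Hypothetical sketch of a modifyEntry-style conversion; not the BazaarIgnoreBuilder code. */
final class ModifyEntrySketch {
    private static final String REGEX_MARKER = "RE:";

    /**
     * Converts one non-comment line into an exclude expression.
     * Returns null when the line yields no expression.
     * Assumption: regular expressions are expressed as %regex[...].
     */
    static String modifyEntry(String entry) {
        String trimmed = entry.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        if (trimmed.startsWith(REGEX_MARKER)) {
            String regex = trimmed.substring(REGEX_MARKER.length()).trim();
            return "%regex[" + regex + "]";
        }
        // Any other line is treated as a standard exclude pattern and passed through.
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(modifyEntry("RE:.*\\.tmp")); // %regex[.*\.tmp]
        System.out.println(modifyEntry("*.bak"));       // *.bak
        System.out.println(modifyEntry("   "));         // null
    }
}
```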
process
In many cases the process method does not need to be modified. In general the process method:
- Opens a File on the DocumentName.
- Reads each line in the file.
- Calls modifyEntry on the line.
- If the result is not null:
  - Uses FileProcessor.localizePattern() to create a DocumentName for the pattern, with the baseName specified as the name of the file being read.
  - Stores the new document name in the list of names being returned.
- Repeats until all the lines in the input file have been read.
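The steps above can be sketched as a simple loop. The example works on a list of lines instead of opening a file, and its localize helper is a hypothetical simplification that prefixes the pattern with the directory of the file being read. It does not reproduce the behaviour of FileProcessor.localizePattern(). ProcessSketch and its method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the read, modify, localize loop; not the RAT implementation. */
final class ProcessSketch {

    /** Simplified localisation: qualifies the pattern with the directory of the file being read. */
    static String localize(String baseDir, String pattern) {
        String dir = baseDir.endsWith("/") ? baseDir : baseDir + "/";
        return dir + (pattern.startsWith("/") ? pattern.substring(1) : pattern);
    }

    /** Skips blank and comment lines, applies modifyEntry, and localizes each non-null result. */
    static List<String> process(String baseDir, List<String> lines, String commentPrefix,
                                UnaryOperator<String> modifyEntry) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith(commentPrefix)) {
                continue;
            }
            String entry = modifyEntry.apply(trimmed);
            if (entry != null) {
                result.add(localize(baseDir, entry));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# build output", "*.log", "", "/build");
        System.out.println(process("/project/src", lines, "#", UnaryOperator.identity()));
        // [/project/src/*.log, /project/src/build]
    }
}
```

Passing modifyEntry as a function mirrors the way a subclass can change how a single line is interpreted without touching the loop itself.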
Classes that override the process method generally do so because they have some special cases. For example, the GitIgnoreBuilder has specific rules about when to add wildcard paths and when the paths are literal, so a special process implementation is required.
Theory of Operation
The AbstractFileProcessorBuilder creates a MatcherSet for each instance of the target file it finds in the source tree. Those MatcherSets are organized into levels based on how far down the tree the target file is. MatcherSets generated from files in the root of the tree are at level zero, those from files found in a subdirectory of root are at level 1, those from subdirectories of subdirectories of root are at level 2, and so on.
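One way to picture the level of a target file is as the number of directories between the root and the file. The sketch below computes it with java.nio.file.Path. The class LevelSketch and the method levelOf are hypothetical, and the calculation is an illustration of the numbering described above rather than the code RAT uses.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of deriving a level from the depth of a target file below the root. */
final class LevelSketch {

    /** Level 0 for a file directly in root, 1 for a file in a subdirectory of root, and so on. */
    static int levelOf(Path root, Path targetFile) {
        Path relative = root.relativize(targetFile);
        return relative.getNameCount() - 1;
    }

    public static void main(String[] args) {
        Path root = Paths.get("project");
        System.out.println(levelOf(root, Paths.get("project", ".gitignore")));           // 0
        System.out.println(levelOf(root, Paths.get("project", "a", ".gitignore")));      // 1
        System.out.println(levelOf(root, Paths.get("project", "a", "b", ".gitignore"))); // 2
    }
}
```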
The builder constructs a list of MatcherSets with the MatcherSets from the deepest level combined followed by the MatcherSets from the next deepest level and so on to the shallowest level. This ensures that later files override earlier files.
If files outside the source tree need to be processed, the builder must override the process method to add the processed files at the appropriate level. An example can be seen in the GitIgnoreBuilder code, where a global ignore file is added at level -1 because it must be processed after all the explicit includes and excludes found in the source tree.
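The ordering by level, including an entry registered at level -1, can be sketched with a sorted map. Strings stand in for MatcherSets here, and the class LevelOrderSketch and its deepestFirst method are hypothetical names. The sketch shows only the deepest-to-shallowest ordering described above, not how the builder stores its data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of flattening per-level entries from the deepest level to the shallowest. */
final class LevelOrderSketch {

    /** Returns all entries ordered by descending level, so level -1 entries come last. */
    static List<String> deepestFirst(Map<Integer, List<String>> byLevel) {
        TreeMap<Integer, List<String>> sorted = new TreeMap<>(byLevel);
        List<String> ordered = new ArrayList<>();
        for (List<String> entries : sorted.descendingMap().values()) {
            ordered.addAll(entries);
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byLevel = new HashMap<>();
        byLevel.put(0, Arrays.asList("root"));
        byLevel.put(2, Arrays.asList("a/b"));
        byLevel.put(1, Arrays.asList("a"));
        byLevel.put(-1, Arrays.asList("global")); // a file from outside the source tree
        System.out.println(deepestFirst(byLevel)); // [a/b, a, root, global]
    }
}
```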
Debugging
Debugging a DocumentNameMatcher can be difficult because of the nested Predicate structure. However, the decompose() method provides a view into the inner operation of the class without requiring a stepwise debugging session.
Assuming there is a candidate document name that needs to be checked, the following code block will output the call tree of the DocumentNameMatcher and show exactly what the result of each test is.
DocumentNameMatcher matcher = ...;
DocumentName candidate = DocumentName.builder()
        .setName(dirName + "/dir1/file1.log")
        .setBaseName(dirName).build();
System.out.println("Decomposition for " + candidate);
matcher.decompose(candidate).forEach(System.out::println);
The result will list the name of the test, the result of the test, the name of the document being tested, and the predicate being executed. If the predicate is a CompoundPredicate then each of the matchers from the CompoundPredicate will be decomposed as well. The result is a display of all the predicates and an indication of which one, if any, fired.
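To make the idea of a decomposition concrete, the sketch below builds a small tree of named predicates and lists the result of each one for a candidate name. It is a self-contained illustration only. DecomposeSketch, leaf, and anyOf are hypothetical names, and the output format only loosely resembles what the real decompose() method prints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of a decompose-style listing over nested named predicates. */
final class DecomposeSketch {
    final String name;
    final Predicate<String> predicate;
    final List<DecomposeSketch> children;

    private DecomposeSketch(String name, Predicate<String> predicate, List<DecomposeSketch> children) {
        this.name = name;
        this.predicate = predicate;
        this.children = children;
    }

    static DecomposeSketch leaf(String name, Predicate<String> predicate) {
        return new DecomposeSketch(name, predicate, Collections.<DecomposeSketch>emptyList());
    }

    /** A compound matcher that matches when any child matches. */
    static DecomposeSketch anyOf(String name, DecomposeSketch... matchers) {
        List<DecomposeSketch> list = Arrays.asList(matchers);
        return new DecomposeSketch(name, c -> list.stream().anyMatch(m -> m.predicate.test(c)), list);
    }

    /** Lists this matcher's result followed by the indented results of its children. */
    List<String> decompose(String candidate) {
        List<String> out = new ArrayList<>();
        decompose(candidate, 0, out);
        return out;
    }

    private void decompose(String candidate, int depth, List<String> out) {
        String indent = String.join("", Collections.nCopies(depth, "  "));
        out.add(indent + name + ": >>" + predicate.test(candidate) + "<<");
        for (DecomposeSketch child : children) {
            child.decompose(candidate, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        DecomposeSketch matcher = anyOf("any",
                leaf("endsWithLog", c -> c.endsWith(".log")),
                leaf("startsWithTmp", c -> c.startsWith("/tmp")));
        matcher.decompose("/dir1/file1.log").forEach(System.out::println);
    }
}
```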
Examples
All the examples below use /testName as the candidate name to match.
FileFilter
A DocumentNameMatcher created as: DocumentNameMatcher matcher1 = new DocumentNameMatcher("FileFilterTest", new NameFileFilter("File.name"));
will produce:
FileFilterTest: >>false<< /testName
NameFileFilter(File.name) >>false<<
Multiple patterns
A DocumentNameMatcher created as: DocumentNameMatcher matcher2 = new DocumentNameMatcher("MatchPatternsTest", MatchPatterns.from("/", "**/test1*", "**/*Name"));
will produce:
MatchPatternsTest: >>true<< /testName
**/test1*: >>false<<
org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@465232e9 >>false<<
**/*Name: >>true<<
org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@798162bc >>true<<
Combined patterns
If the two matchers above are combined into a single DocumentNameMatcher as: DocumentNameMatcher.matcherSet(matcher1, matcher2);
it will produce:
matcherSet(FileFilterTest, MatchPatternsTest): >>false<< /testName
FileFilterTest: >>false<<
NameFileFilter(File.name) >>false<<
MatchPatternsTest: >>true<<
**/test1*: >>false<<
org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@6f36c2f0 >>false<<
**/*Name: >>true<<
org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@f58853c >>true<<