MISS_HIT includes a simple style checker (mh_style). It can detect and correct (when the --fix option is given) a number of coding style issues, most of which are configurable.
Using MISS_HIT Style
The easiest way to use the style checker is to just invoke it on the command-line:
To analyse one or more files:
$ mh_style my_file.m
It is also possible to style-check and fix code embedded inside Simulink® models. To do this you need to use a special command-line flag. Once the feature is stable enough, this flag will be removed.
$ mh_style --process-slx --fix my_model.slx
To analyse all files in a directory tree:
$ mh_style src/
To analyse all files in the current directory tree:
$ mh_style
Setting up configuration in your project (a worked example)
However, it is very likely that you will not like all of the default options. MISS_HIT can be configured for projects using configuration files, which must be named miss_hit.cfg (or .miss_hit; this alternative exists for people who prefer not to have them visible).
The configuration system is based on inheriting options. This is best explained by example. Let's say we have a project that has the following structure:
foo/
foo/foo_main.m
foo/lib/potato.m
foo/lib/kitten.m
foo/external/some_toolkit.m
We have a main program and some library code, and we also use an external toolkit that we have included for convenience.
Let's say we want to configure a tab-width of 8 for our project. We then place a new file in the tree at foo/miss_hit.cfg that contains the following:
tab_width: 8
However, now we get tons of warnings for the external toolkit if we just run mh_style in the project root. We can exclude this directory by adding the following to our config file:
exclude_dir: "external"
Finally, we want to relax the line length to 120 characters for our library, but not for anything else. To do this we create another config file in foo/lib/miss_hit.cfg and write:
line_length: 120
Note that we do not have to repeat the tab-width; this setting is inherited from foo/miss_hit.cfg.
Our tree now looks like this:
foo/
foo/miss_hit.cfg
foo/foo_main.m
foo/lib/miss_hit.cfg
foo/lib/potato.m
foo/lib/kitten.m
foo/external/some_toolkit.m
Now, when running the style checker on a file or directory the correct settings are automatically applied, using the entire tree of configuration.
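To make the inheritance concrete, here is a minimal Python sketch of how settings could be resolved for a single file. It is not MISS_HIT's implementation. The CONFIGS dictionary is a hypothetical stand-in for the two miss_hit.cfg files in the example tree, and effective_config is a hypothetical helper. The sketch only illustrates the "closest file wins" behaviour: it applies the settings from the outermost directory first and the file's own directory last.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the miss_hit.cfg files in the example tree:
# directory -> settings read from that directory's config file.
CONFIGS = {
    "foo": {"tab_width": 8},
    "foo/lib": {"line_length": 120},
}

# Documented defaults for these two options.
DEFAULTS = {"tab_width": 4, "line_length": 80}


def effective_config(file_path, configs=CONFIGS, defaults=DEFAULTS):
    """Merge settings from the outermost directory down to the file's own
    directory, so the closest configuration file wins."""
    settings = dict(defaults)
    directory = PurePosixPath(file_path).parent
    # .parents lists the nearest ancestor first; reverse it to start at the root.
    chain = list(directory.parents)[::-1] + [directory]
    for d in chain:
        settings.update(configs.get(str(d), {}))
    return settings


print(effective_config("foo/lib/potato.m"))  # {'tab_width': 8, 'line_length': 120}
print(effective_config("foo/foo_main.m"))    # {'tab_width': 8, 'line_length': 80}
```

The same closest-wins merge is what lets a nested configuration override an outer one for any option.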
Configuration on the command-line
Some options (like line length) can also be configured on the command-line. Command-line options are intended to be temporary, and they take precedence over any options specified in config files.
Options are usually read from miss_hit.cfg configuration files. This behaviour can be disabled entirely with the --ignore-config option.
Configuration file syntax reference
In general the config files follow a simple syntax:
key: value
The key is some identifier like tab_width, and the value is the configuration for that key. Integers are written directly, and strings are enclosed in double quotes. Comments start with #.
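Here is a simplified Python sketch of a reader for the syntax just described. It is not MISS_HIT's actual configuration parser. It assumes that every value is either a decimal integer or a double-quoted string, and it does not handle any other value forms the real tool may accept. The helper names strip_comment and parse_config are hypothetical.

```python
def strip_comment(line):
    """Remove a trailing '#' comment, ignoring '#' inside double quotes."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == "#" and not in_string:
            return line[:i]
    return line


def parse_config(text):
    """Parse 'key: value' lines into a list of (key, value) pairs.

    A list rather than a dict is returned because some keys (for example
    copyright_entity) may legitimately appear more than once.
    """
    entries = []
    for raw_line in text.splitlines():
        line = strip_comment(raw_line).strip()
        if not line:
            continue  # blank or comment-only line
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw_line!r}")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            entries.append((key, value[1:-1]))  # string value
        else:
            entries.append((key, int(value)))  # integer value
    return entries
```

The parser returns ordered pairs rather than a dictionary so that repeated keys are preserved and a later stage can decide how to combine them.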
Enable/disable analysis ("enable")
A special entry "enable" in a miss_hit.cfg can be used to enable or disable analysis for the subtree.
For example if you have a lot of legacy code you can put this into your root configuration:
enable: 0
line_length: 100
And then enable analysis again for some subdirectories, e.g. in foo/new_code/miss_hit.cfg you can write:
enable: 1
Like any other option, the "closest one" takes precedence. Specifically, this means you can disable analysis for a large tree, and enable it again for specific sub-trees.
Excluding directories ("exclude_dir")
You can also specify the special "exclude_dir" property in configuration files. This property must name a directory directly inside the directory the configuration file resides in (i.e. you can't specify foo/bar). This is especially useful when including an external repository, over which we have limited control.
This is a much more drastic option than "enable: 0", since this permanently excludes a tree from analysis. It cannot be turned on again since the tool will never search excluded directories.
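Here is a minimal Python sketch, assuming a hypothetical excluded_by_dir mapping from each directory to the names its configuration lists under exclude_dir. It shows why exclusion is permanent. When os.walk runs top-down, removing a name from dirnames means the walk never enters that directory, so no configuration file underneath it is ever read. The function find_m_files is illustrative only and is not part of MISS_HIT.

```python
import os


def find_m_files(root, excluded_by_dir):
    """Yield .m files under root, never descending into excluded directories.

    excluded_by_dir maps a normalised directory path to the set of names
    listed as exclude_dir in that directory's configuration (hypothetical).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        excluded = excluded_by_dir.get(os.path.normpath(dirpath), set())
        # Pruning dirnames in place stops os.walk from visiting them at all,
        # so nothing below an excluded directory can re-enable analysis.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in sorted(filenames):
            if name.endswith(".m"):
                yield os.path.join(dirpath, name)
```

In contrast, "enable: 0" would still let the walk enter the subtree and read nested configuration files.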
Here is a more realistic root configuration:
file_length: 1000
line_length: 120
copyright_entity: "Potato Inc."
# We include the delightful
# miss_hit tools in our repo,
# but don't want to accidentally
# check their weird test cases
exclude_dir: "miss_hit"
Style issues can be justified by placing "mh:ignore_style" into a comment or line continuation. The justification applies to all style issues on that line. Please refer to MISS_HIT Pragmas for a full description of all pragmas understood by MISS_HIT.
% we normally get a message
% about no whitespace
% surrounding the =
x=5; % mh:ignore_style
Justifications that are useless generate a warning.
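Here is a rough Python sketch of how line-level justifications could be applied and how useless ones could be detected. It rests on a simplifying assumption: a line counts as justified if "mh:ignore_style" appears after a % on that line. A real implementation would use a lexer, so that a % inside a string is not mistaken for a comment and justifications in line continuations are also found. The function apply_justifications and its (line_number, message) issue format are hypothetical.

```python
PRAGMA = "mh:ignore_style"


def apply_justifications(lines, issues):
    """Return (reported_issues, useless_justification_lines).

    lines:  source lines, where index 0 is line 1
    issues: list of (line_number, message) pairs with 1-based line numbers
    """
    justified = set()
    for number, text in enumerate(lines, start=1):
        comment_start = text.find("%")  # naive: ignores strings and '...'
        if comment_start != -1 and PRAGMA in text[comment_start:]:
            justified.add(number)
    # A justification suppresses every issue on its line.
    reported = [(n, msg) for n, msg in issues if n not in justified]
    used = {n for n, _ in issues if n in justified}
    # Justifications that suppressed nothing are the "useless" ones.
    useless = sorted(justified - used)
    return reported, useless
```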
There are three types of rules:
- Mandatory rules: they are always active and can be automatically fixed
- Autofix rules: they are optional and can be automatically fixed
- Rules: they are optional and cannot be automatically fixed
Rules with a name (for example "whitespace_keywords") can be individually suppressed or re-enabled in configuration files. For example:
suppress_rule: "operator_whitespace"
enable_rule: "file_length"
By default all rules are active.
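Here is a small Python sketch of how suppress_rule and enable_rule entries could combine into the set of active rules. It assumes that the entries are applied in order, from the root configuration down to the closest one, so that a later entry overrides an earlier one. The function active_rules is hypothetical. Rejecting unknown rule names is a choice made for this sketch only.

```python
def active_rules(all_rules, directives):
    """Compute the active rule set from (key, rule_name) directives.

    directives are applied in order (outermost configuration first),
    so a later entry overrides an earlier one for the same rule.
    """
    active = set(all_rules)  # by default every rule is active
    for key, rule in directives:
        if rule not in all_rules:
            raise ValueError(f"unknown rule {rule!r}")  # sketch-only check
        if key == "suppress_rule":
            active.discard(rule)
        elif key == "enable_rule":
            active.add(rule)
    return active
```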
Mandatory rules
These rules are always active. The technical reason for this is that it would be too difficult to autofix issues without autofixing these. If you pay me an excessive amount of money I could look into this but I'd rather keep the lexer vaguely sane. All of them are automatically fixed by mh_style when the --fix option is specified.
Trailing newlines at end of file
This mandatory rule makes sure there is a single trailing newline at the end of a file.
Consecutive blank lines
This rule allows a maximum of one blank line to separate code blocks. Comments are not considered blank lines.
Use of tab
This rule enforces the absence of the tabulation character *everywhere*. When auto-fixing, a tab-width of 4 is used by default, but this can be configured with the option 'tab_width'.
Note that the fix replaces tabs everywhere, including in string literals. This means that a string literal with a tab character between a and b might be fixed to one with spaces between a and b instead.
This may or may not be what you originally intended. I am not sure if this is a bug or a feature, but either way it would be *painful* to change so I am going to leave this as is.
- tab_width: Tab-width, by default 4.
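The Python sketch below shows why fixing a tab inside a string literal changes the string's value. It compares two common ways of expanding tabs: column-aware tab stops, using Python's str.expandtabs, and a fixed-width replacement. This section does not say which strategy mh_style uses. The helper names are hypothetical. Either way, the characters inside the literal are different after the fix.

```python
def expand_to_tab_stops(line, tab_width=4):
    """Replace each tab with spaces up to the next multiple of tab_width."""
    return line.expandtabs(tab_width)


def replace_each_tab(line, tab_width=4):
    """Replace each tab with exactly tab_width spaces, ignoring the column."""
    return line.replace("\t", " " * tab_width)


literal = '"a\tb"'  # a MATLAB string literal containing a tab
print(repr(expand_to_tab_stops(literal)))  # '"a  b"'
print(repr(replace_each_tab(literal)))     # '"a    b"'
```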
Trailing whitespace
This rule enforces that there is no trailing whitespace in your files. You *really* want to do this, even if the MATLAB default editor makes this really hard. The reason is that it minimises conflicts when using modern version control systems.
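As an illustration of the whitespace-only mandatory rules (trailing whitespace, consecutive blank lines, and a single trailing newline), here is a minimal Python sketch of a text-level fixer. It is not how mh_style works internally, since mh_style operates on lexed tokens. The function fix_mandatory_whitespace is hypothetical, and tabs are assumed to be handled separately.

```python
def fix_mandatory_whitespace(text):
    """Strip trailing whitespace, collapse runs of blank lines to one,
    and end the file with exactly one newline."""
    lines = [line.rstrip() for line in text.splitlines()]
    fixed = []
    for line in lines:
        # A line holding only a comment is not blank, so it is always kept.
        if line == "" and fixed and fixed[-1] == "":
            continue  # allow at most one blank line in a row
        fixed.append(line)
    # Drop blank lines at the end, then terminate with a single newline.
    while fixed and fixed[-1] == "":
        fixed.pop()
    return "\n".join(fixed) + "\n"
```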
Autofix rules
The following rules are automatically fixed by mh_style when the --fix option is specified.
File should not start with whitespace ("no_starting_newline")
This rule makes sure the first line in a file is not whitespace.
Whitespace surrounding commas ("whitespace_comma")
This rule enforces whitespace after commas, and no whitespace before commas, e.g. 'foo, bar, baz'.
Whitespace surrounding semicolons ("whitespace_semicolon")
This rule enforces whitespace after semicolons, and no whitespace before semicolons, e.g. 'x = [foo; bar; baz]'.
Whitespace surrounding colon ("whitespace_colon")
This rule enforces no whitespace around colons, except after commas.
Whitespace around assignment ("whitespace_assignment")
This rule enforces whitespace around the assignment operation (=).
Whitespace surrounding brackets ("whitespace_brackets")
This rule enforces no whitespace after opening square and round brackets, and no whitespace before their closing counterparts. For example: [foo, bar]
Whitespace after some words ("whitespace_keywords")
This rule makes sure there is whitespace after some words such as "if" or "properties".
Whitespace in comments ("whitespace_comments")
This rule makes sure there is some whitespace between the comment character (%) and the rest of the comment. The exception is "divisor" comments like "%%%%%%%%%%%%%%" and the pragmas such as "%#codegen".
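Here is a simplified Python sketch of the check described above, including its two stated exceptions. The function comment_needs_space is hypothetical. The sketch also assumes that section markers such as "%% Section" are acceptable, because it strips all leading % characters before looking for whitespace. Annotations (%|) and block comments (%{ ... %}) are not handled here.

```python
def comment_needs_space(comment):
    """Return True if a '%' comment should get whitespace after its marker."""
    if not comment.startswith("%"):
        raise ValueError("expected a comment starting with '%'")
    if comment.startswith("%#"):
        return False  # pragma such as %#codegen
    body = comment.lstrip("%")
    if body == "":
        return False  # divisor comment such as %%%%%%%%
    # Annotations (%|) and block comments (%{ %}) are not handled here.
    return not body[0].isspace()
```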
Whitespace in continuation ("whitespace_continuation")
This rule makes sure there is some whitespace between the last thing on a line and a line continuation.
Continuations followed by terminators ("useless_continuation")
This rule flags up line continuations that are followed by things that would end the statement anyway. For example:
if potato ...
    x = 1;
end
Dangerously misleading continuations ("dangerous_continuation")
This rule identifies continuations that are one code change away from introducing difficult-to-find bugs. In the MATLAB and Octave languages, statements are usually terminated by a ; and a newline, but there are a few places where nothing is required. Consider this example:
if potato ...
    if kitten
        x = 1;
    end
end
Since the expression for the if-guard does not need termination, the continuation here just happens to work. This rule removes these continuations (or replaces them with comments).
Whitespace around operators ("operator_whitespace")
This rule makes sure binary operators (except for the power operators) are surrounded by whitespace, and unary operators are not followed by whitespace. Like so:
x = -foo + bar;
y = x^2;
Whitespace around functions ("whitespace_around_functions")
This rule makes sure functions (including nested functions and class methods) are surrounded by whitespace. In other words:
% (c) Copyright 2020 Florian Schanda
function Test_05
    x = 12;
    % This is a function
    function Potato
        disp(x);
    end
    Potato;
end
Is changed to this:
% (c) Copyright 2020 Florian Schanda

function Test_05
    x = 12;

    % This is a function
    function Potato
        disp(x);
    end

    Potato;
end
This also works for functions without the end keyword.
Consistent semicolons ("end_of_statements")
This rule enforces consistent statement terminators. It effectively bans commas and requires semicolons + newline at the end of most statements. The exceptions are things like 'return' or the end of compound statements such as 'if'.
x = y, y = z; % commas not allowed
x = y; y = z;; % newline required, and spurious semicolon
if foo; % useless semicolon
    disp hello % missing semicolon
end; % useless semicolon
All of these issues can be auto-fixed, if the indentation rule is enabled. Otherwise only the subset of issues that does not require adding newlines can be fixed.
Indentation ("indentation")
This rule enforces consistent indentation and line continuations. It fixes indentation, but leaves the exact amount of extra whitespace added for continuations untouched (for now).
While indentation is usually a popular religious flame-war topic, for the MATLAB language there is not so much room for creativity. The main reason for this is that the language lacks brackets for blocks. If you do feel that you have a specific indentation style that is not catered for here please raise an issue and I will see what I can do. For now there is just one style.
if potato
    disp (['Hello', ...
           ' World!']);
end
In the above example there is no indentation for the if since it is the top-level statement in a script. The call to disp is indented, since it is part of a compound statement. The continuation is indented to the level of the opening square bracket.
x = some + ...
   expression;
The continuation in the above example is offset 3 spaces, and this offset will be preserved. This means that if you change the setting of tab_width at any point, the continuation is still aligned as chosen by the programmer.
The following constructs cause indentation:
- Any compound statements (e.g. if, switch, etc.)
- Function and class definitions
- The four special blocks (properties, methods, enumeration, or events) inside classes
- The argument validation block for functions
- tab_width: Indent by this many spaces. By default this is 4.
- align_round_brackets: Align continuations inside normal brackets to the opening brace. By default this is true.
- align_other_brackets: Align continuations inside matrix or cell expressions to the opening brace. By default this is true.
Redundant brackets ("redundant_brackets")
This rule removes some brackets that are clearly useless: top-level brackets and double brackets. Brackets that have been added to clarify operator precedence are not touched.
This is an example of redundant brackets:
if (potato)
    x = ((x + 1));
end
These brackets are technically redundant due to operator precedence, but they are left alone since they were probably added for clarity:
x = (a * b) + (b * c);
Spurious commas inside cells and rows ("spurious_row_comma")
This rule complains about unnecessary commas inside matrix and cell expressions. Specifically, both of these mean the same thing, but the trailing comma in a and the leading comma in b are spurious.
a = [1, 2,];
b = [, 1, 2];
Spurious semicolons inside cells and rows ("spurious_row_semicolon")
This rule complains about unnecessary semicolons inside matrix and cell expressions. For example here the semicolon in the first row is useless, because the newline also introduces a new row.
a = [1, 0;
     0, 1];
Annotation whitespace ("annotation_whitespace")
This rule enforces whitespace after the annotation indicator, i.e. we make sure things look like this:
%| pragma Potato;
Rules
These rules cannot be auto-fixed because there is no "obvious" fix.
Copyright notice ("copyright_notice")
This rule looks for a copyright notice (by default in the docstring of the primary entity). The list of acceptable copyright holders can be configured with copyright_entity. This option can be given more than once to permit a set of valid copyright holders. If this option is not set, the rule just looks for any copyright notice.
- copyright_location: The desired location for copyright notices. This can take one of the following values:
  - docstring - The default. Search the primary function, class, or script docstring for copyright information.
  - file_header - Look only at the first line in each file.
- copyright_primary_entity: Can be specified only once; multiple uses override each other. This is supposed to be the key copyright holder. For the style checker this setting means the same as copyright_entity (below), but it has special significance for the MH Copyright tool.
- copyright_entity: Can be specified more than once. Make sure each copyright notice mentions one of these. These should all be your legal entities.
- copyright_3rd_party_entity: Can be specified more than once. These are other copyright holders (e.g. for other code that you have integrated into your project). For the style checker this has no special meaning (it means the same as above), but the copyright utility treats these differently.
- copyright_in_embedded_code: Normally this rule is not enabled on MATLAB code embedded in Simulink® models, since most models carry their copyright notice elsewhere. This flag can be used to turn on this rule for embedded code.
- copyright_regex: The magic regex to detect copyright and years. I very strongly suggest that you do not change this. If you absolutely must have a different notice than the default, then the regex must include at least these named groups: copy, ystart, yend, and org.
The default is the highly readable:
(?P<copy>(Copyright \([cC]\))|((\([cC]\) )?Copyright)) +((?P<ystart>\d\d\d\d)(-| - ))?(?P<yend>\d\d\d\d)( by)? *(?P<org>.*)
Again, please do not change this. Right now the tools don't validate this and you will get strange behaviour if you mess this up. Please. Just don't.
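To show what the named groups capture, here is a short Python sketch that compiles the default regex quoted above and extracts the years and the copyright holder from a comment line. The helper parse_copyright is hypothetical. Using re.search, which finds the first match anywhere in the line, is an assumption of this sketch and not necessarily how MISS_HIT applies the regex.

```python
import re

# The default copyright_regex, split across two raw-string pieces.
COPYRIGHT_REGEX = re.compile(
    r"(?P<copy>(Copyright \([cC]\))|((\([cC]\) )?Copyright))"
    r" +((?P<ystart>\d\d\d\d)(-| - ))?(?P<yend>\d\d\d\d)( by)? *(?P<org>.*)"
)


def parse_copyright(line):
    """Return (ystart, yend, org) for the first notice on a line, else None."""
    match = COPYRIGHT_REGEX.search(line)
    if match is None:
        return None
    return match.group("ystart"), match.group("yend"), match.group("org")


print(parse_copyright("% (c) Copyright 2021 Florian Schanda"))
# (None, '2021', 'Florian Schanda')
```

A checker could then compare the org group against the configured copyright_entity values.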
For example, an acceptable copyright notice using the docstring approach looks like this:
function rv = Byte_Add_One(x)
    % BYTE_ADD_ONE This adds one to the input
    % Note: on overflow, it saturates
    %
    % (c) Copyright 2021 Florian Schanda
    rv = x;
    if x < 255
        rv = rv + 1;
    end
end
With the file_header approach, our notice should look like this:
% (c) Copyright 2021 Florian Schanda
function rv = Byte_Add_One(x)
    % BYTE_ADD_ONE This adds one to the input
    % Note: on overflow, it saturates
    rv = x;
    if x < 255
        rv = rv + 1;
    end
end
Note that if a function or class does not contain a docstring, then we look at the docstring of the file instead, so generally speaking the docstring setting is a superset of, and compatible with, the file_header setting. However, if your file has a copyright notice in *both* the file header and the primary function or class, then this is not accepted.
Naming scheme for classes ("naming_classes")
This rule enforces a consistent naming for all user-defined classes.
- regex_class_name: A regular expression that every class must match. By default it is:
([A-Z]+|[A-Z][a-z]*)(_([A-Z]+|[A-Z][a-z]*|[0-9]+))*
This regular expression encodes the "Ada" naming scheme, which is in my opinion probably the most descriptive and consistent naming scheme. It requires underscore-separated acronyms or capitalised words. Good class names under this scheme are, for example, Potato_Farmer and LASER_Actuator. Bad class names include:
- PotatoFarmer (no underscore)
- hamster_Monitor (not capitalised)
- LASERActuator (no underscore)
- Sharks_ (trailing underscore)
- Bad__Name (double underscore)
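Here is a small Python sketch that checks candidate names against the default regex_class_name. It assumes that the whole name must match, so it uses re.fullmatch. A name such as "PotatoFarmerX" would otherwise be accepted by a prefix match. The helper is_good_class_name is hypothetical.

```python
import re

# Default regex_class_name ("Ada" naming scheme).
CLASS_NAME_REGEX = re.compile(r"([A-Z]+|[A-Z][a-z]*)(_([A-Z]+|[A-Z][a-z]*|[0-9]+))*")


def is_good_class_name(name):
    # Assumption: the entire name must match, hence fullmatch.
    return CLASS_NAME_REGEX.fullmatch(name) is not None


for name in ["Potato_Farmer", "LASER_Actuator", "PotatoFarmer", "Sharks_"]:
    print(name, is_good_class_name(name))
```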
Naming scheme for functions ("naming_functions")
This rule enforces a consistent naming for all user-defined functions and methods.
- regex_function_name: A regular expression that every ordinary function must match. The default is the same as it is for classes. (See above.)
- regex_nested_name: A regular expression that every nested function must match. The default is the same as it is for classes. (See above.)
- regex_method_name: A regular expression that every class method must match. The default is:
[a-z]+(_[a-z]+)*
This means all lower-case, underscore-separated names.
Naming scheme for scripts ("naming_scripts")
This rule enforces a consistent naming for all script files. Note that function files and class files are not covered by this rule, only pure script files.
- regex_script_name: A regular expression that every script file (without .m extension) must match. The default is the same as it is for classes. (See above.)
Naming scheme for parameters ("naming_parameters")
This rule enforces a consistent naming for input and output parameters of functions and methods.
- regex_parameter_name: A regular expression that every parameter must match. The default is all lower-case with underscores.
Naming scheme for enumerations ("naming_enumerations")
This rule enforces a consistent naming for enumerations in a class definition.
- regex_enumeration_name: A regular expression that every enumeration must match. The default is the same as for classes. (See above.)
Non-ASCII characters in source ("unicode")
This rule enforces that source files only contain ASCII characters. This is generally a good idea, because allowing non-ASCII characters creates all sorts of annoying portability issues.
- enforce_encoding: A string that can be any valid Python encoding to enforce. By default this is "ASCII".
- enforce_encoding_comments: A boolean, by default true. This controls if the rule also checks comments and continuations, not just program text.
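Here is a minimal Python sketch of an encoding check in the spirit of enforce_encoding. It relies on the fact that Python's str.encode raises UnicodeEncodeError for characters the codec cannot represent. The function non_encodable_characters is hypothetical. It checks every character, so it does not model enforce_encoding_comments being false; a real checker would need a lexer to skip comments and continuations.

```python
def non_encodable_characters(text, encoding="ascii"):
    """Return (line, column, char) for characters the encoding cannot hold."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            try:
                char.encode(encoding)
            except UnicodeEncodeError:
                problems.append((line_number, column, char))
    return problems


print(non_encodable_characters("x = 1;\ns = 'Käse';"))  # [(2, 7, 'ä')]
```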
Note: currently nothing can be auto-fixed here, but I plan to add support to automatically convert from one valid encoding to another. However even then, characters that are outside the valid set will never be auto-fixed (e.g. it is impossible to decide if ä should be translated as a or ae or something else entirely).
Maximum file length ("file_length")
This is configurable with 'file_length'. It is a good idea to keep the length of your files under some limit, since doing so helps your project avoid the worst spaghetti code.
- file_length: Maximum lines in a file, 1000 by default.
Max characters per line ("line_length")
This is configurable with 'line_length', default is 80. It is a good idea for readability to avoid overly long lines. This can help you avoid extreme levels of nesting and having to scroll around.
- line_length: Maximum characters per line, 80 by default.
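Here is a short Python sketch of the two length limits, using the documented defaults of 80 characters per line and 1000 lines per file. It assumes that a line's length is simply its number of characters (Python's len) and that the limits themselves are allowed, so only longer lines or files are reported. The function length_issues and its message wording are hypothetical.

```python
def length_issues(text, line_length=80, file_length=1000):
    """Report lines longer than line_length and files longer than file_length."""
    lines = text.splitlines()
    issues = []
    for number, line in enumerate(lines, start=1):
        if len(line) > line_length:
            issues.append((number, f"line too long ({len(line)} > {line_length})"))
    if len(lines) > file_length:
        issues.append((len(lines), f"file too long ({len(lines)} > {file_length})"))
    return issues
```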