MISS_HIT includes a simple style
checker (mh_style). It can detect
and correct (when the --fix option is given) a number of coding
style issues, most of which are configurable.
Configuration
Using MISS_HIT Style
The easiest way to use the style checker is to just invoke it
on the command-line:
To analyse one or more files:
$ mh_style my_file.m
It is possible to also style-check and fix code embedded
inside Simulink® models. To do so you need to use a special
command-line flag. Once the feature is stable enough, this
flag will be removed.
$ mh_style --process-slx --fix my_model.slx
To analyse all files in a directory tree:
$ mh_style src/
To analyse all files in the current directory tree:
$ mh_style
Setting up configuration in your project (a worked example)
However, it is very likely that you do not like all default
options. MISS_HIT can be configured for projects using
configuration files which must be
named miss_hit.cfg
(or .miss_hit, this alternative
exists for people who do not want to have them visible).
The configuration system is based on inheriting options. This
is best explained by example. Let's say we have a project that
has the following structure:
foo/
foo/foo_main.m
foo/lib/potato.m
foo/lib/kitten.m
foo/external/some_toolkit.m
We have a main program and some library code, but we also use an external toolkit that we've included for convenience.
Let's say we want to configure a tab-width of 8 for our
project. We then place a new file in the tree at
foo/miss_hit.cfg that contains the
following:
tab_width: 8
However, now we get tons of warnings for the external toolkit
if we just run mh_style in the project root. We can exclude
this directory by adding the following to our config file:
exclude_dir: "external"
Finally, we want to relax the line length to 120 characters
for our library, but not for anything else. To do this we
create another config file
in foo/lib/miss_hit.cfg and write:
line_length: 120
Note that we do not have to repeat the tab-width; this setting is inherited from foo/miss_hit.cfg.
Our tree now looks like this:
foo/
foo/miss_hit.cfg
foo/foo_main.m
foo/lib/miss_hit.cfg
foo/lib/potato.m
foo/lib/kitten.m
foo/external/some_toolkit.m
Now, when running the style checker on a file or directory the
correct settings are automatically applied, using the entire
tree of configuration.
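The inheritance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MISS_HIT is implemented: the two config files are modelled as an in-memory dictionary (CONFIG_FILES), the function name effective_options is hypothetical, and exclude_dir is left out for brevity. The defaults are the ones quoted elsewhere in this document.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory stand-in for the two config files of the example.
CONFIG_FILES = {
    "foo": {"tab_width": 8},
    "foo/lib": {"line_length": 120},
}

# Defaults as documented for tab_width and line_length.
DEFAULTS = {"tab_width": 4, "line_length": 80}


def effective_options(file_path, config_files=CONFIG_FILES, defaults=DEFAULTS):
    """Merge options from the outermost directory down to the file's own."""
    options = dict(defaults)
    directory = PurePosixPath(file_path).parent
    # parents is ordered innermost-first, so reverse it to apply the root first
    chain = [directory, *directory.parents]
    for current in reversed(chain):
        options.update(config_files.get(str(current), {}))
    return options


print(effective_options("foo/lib/potato.m"))
print(effective_options("foo/foo_main.m"))
```

Settings closer to the file are applied last, so they override anything inherited from further up the tree.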
Configuration on the command-line
Some options (like line length) can also be configured on the
command-line. Command-line options are intended to be
temporary, and they take precedence over any options specified
in config files.
Options are usually read from configuration
files miss_hit.cfg. This behaviour
can be disabled entirely with the --ignore-config option.
Configuration file syntax reference
In general the config files follow a simple syntax:
key: value
The key is some identifier like tab_width, and the value is the configuration for that key. Integers are written directly, and strings are enclosed in double quotes. Comments start with #.
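To make the syntax concrete, here is a deliberately small Python sketch that reads such lines. It is an assumption-laden illustration, not the parser MISS_HIT uses: the function name parse_config is hypothetical, only whole-line comments are handled, and anything that is neither a quoted string nor an integer raises a ValueError. It returns a list of pairs rather than a dictionary, since some keys may be given more than once.

```python
def parse_config(text):
    """Parse 'key: value' lines into a list of (key, value) pairs."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and whole-line comments
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            parsed = value[1:-1]   # string: drop the double quotes
        else:
            parsed = int(value)    # integer: written directly
        entries.append((key.strip(), parsed))
    return entries


example = '''
# project-wide settings
tab_width: 8
exclude_dir: "external"
'''
print(parse_config(example))
```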
Enable/disable analysis ("enable")
A special entry "enable" in
a miss_hit.cfg can be used to enable
or disable analysis for the subtree.
For example if you have a lot of legacy code you can put this
into your root configuration:
enable: 0
line_length: 100
And then enable analysis for some subdirectories, e.g. in
foo/new_code/miss_hit.cfg you can
write:
enable: 1
Like any other option, the "closest one" takes
precedence. Specifically this means you can disable for a
large tree, and enable again for specific sub-trees.
Excluding directories ("exclude_dir")
You can also specify the special "exclude_dir" property in
configuration files. This property must name a directory
directly inside (i.e. you can't
specify foo/bar) the same directory
the configuration file resides in. This is especially useful
when including an external repository, over which we have
limited control.
This is a much more drastic option than "enable: 0", since
this permanently excludes a tree from analysis. It cannot be
turned on again since the tool will never search excluded
directories.
Below is given a more realistic root configuration:
file_length: 1000
line_length: 120
copyright_entity: "Potato Inc."

# We include the delightful
# miss_hit tools in our repo,
# but don't want to accidentally
# check their weird test cases
exclude_dir: "miss_hit"
Justifications
Style issues can be justified by placing "mh:ignore_style"
into a comment or line continuation. The justification applies
to all style issues on that line. Please refer
to MISS_HIT Pragmas for a full
description of all pragmas understood by MISS_HIT.
% we normally get a message
% about no whitespace
% surrounding the =
x=5; % mh:ignore_style
Justifications that are useless generate a warning.
Style Rules
There are three types of rules:
- Mandatory rules: they are always active and can be automatically fixed
- Autofix rules: they are optional and can be automatically fixed
- Rules: they are optional and cannot be automatically fixed
Rules with a name (for example "whitespace_keywords") can be
individually suppressed or re-enabled in configuration files. For
example:
suppress_rule: "operator_whitespace"
enable_rule: "file_length"
By default all rules are active.
Mandatory rules
These rules are always active. The technical reason for this is
that it would be too difficult to autofix issues without
autofixing these. If you pay me an excessive amount of money I
could look into this but I'd rather keep the lexer vaguely
sane. All of them are automatically fixed
by mh_style when the --fix option
is specified.
Trailing newlines at end of file
This mandatory rule makes sure there is a single trailing newline
at the end of a file.
Consecutive blank lines
This rule allows a maximum of one blank line to separate code blocks.
Comments are not considered blank lines.
Use of tab
This rule enforces the absence of the tabulation character
*everywhere*. When auto-fixing, a tab-width of 4 is used by default,
but this can be configured with the option 'tab_width'.
Note that the fix replaces the tab everywhere, including in string
literals. This means
"a<tab>b"
"a<tab>b"
might be fixed to
"a b"
"a b"
which may or may not be what you had intended originally. I am not sure
if this is a bug or a feature, but either way it would be *painful* to
change so I am going to leave this as is.
Configuration parameters:
- tab_width: Tab-width, by default 4.
Trailing whitespace
This rule enforces that there is no trailing whitespace in your files.
You *really* want to do this, even if the MATLAB default editor makes
this really hard. The reason is that it minimises conflicts when using
modern version control systems.
Newlines
This rule enforces consistent newlines in your files, with the
default being your platform's newlines.
Configuration parameters:
- newline_style: "native" (the default), "lf", "crlf", or "cr"
It is strongly advised to not change this, and instead
configure git correctly to deal with this problem.
Autofix rules
The following rules are automatically fixed
by mh_style when the --fix option
is specified.
File should not start with whitespace ("no_starting_newline")
This rule makes sure the first line in a file is not whitespace.
Whitespace surrounding commas ("whitespace_comma")
This rule enforces whitespace after commas, and no whitespace
before commas, e.g. 'foo, bar, baz'.
Whitespace surrounding semicolons ("whitespace_semicolon")
This rule enforces whitespace after semicolons, and no whitespace
before semicolons, e.g. 'x = [foo; bar; baz]'.
Whitespace surrounding colon ("whitespace_colon")
This rule enforces no whitespace around colons, except after
commas.
Whitespace around assignment ("whitespace_assignment")
This rule enforces whitespace around the assignment operation
(=).
Whitespace surrounding brackets ("whitespace_brackets")
This rule enforces no whitespace after opening square and round
brackets, and no whitespace before their closing counterpart. For
example: [foo, bar]
Whitespace after some words ("whitespace_keywords")
This rule makes sure there is whitespace after some words such
as "if" or "properties".
Whitespace in comments ("whitespace_comments")
This rule makes sure there is some whitespace between the
comment character (%) and the rest of the comment. The exception
is "divisor" comments like "%%%%%%%%%%%%%%" and the pragmas such
as "%#codegen".
Whitespace in continuation ("whitespace_continuation")
This rule makes sure there is some whitespace between the last
thing on a line and a line continuation.
Continuations followed by terminators ("useless_continuation")
This rule flags up line continuations that are followed by
things that would end the statement anyway. For example:
if potato ...
    x = 1;
end
Dangerously misleading continuations ("dangerous_continuation")
This rule identifies continuations that are one code change
away from introducing difficult-to-find bugs. In the MATLAB
and Octave languages, statements are usually terminated by a ;
and a newline, but there are a few places where nothing is
required. Consider this example:
if potato ...
    if kitten
        x = 1;
    end
end
Since the expression for the if-guard does not need
termination, the continuation here just happens to work. This
rule removes these continuations (or replaces them with
comments).
Whitespace around operators ("operator_whitespace")
This rule makes sure binary operators (except for the power
operators) are surrounded by whitespace, and unary operators
are not followed by whitespace. Like so:
x = -foo + bar;
y = x^2;
Whitespace around functions ("whitespace_around_functions")
This rule makes sure functions (including nested functions and
class methods) are surrounded by whitespace. In other words:
% (c) Copyright 2020 Florian Schanda
function Test_05
    x = 12;
    % This is a function
    function Potato
        disp(x);
    end
    Potato;
end
Is changed to this:
% (c) Copyright 2020 Florian Schanda

function Test_05
    x = 12;

    % This is a function
    function Potato
        disp(x);
    end

    Potato;
end
This also works for functions without the end keyword.
Consistent semicolons ("end_of_statements")
This rule enforces consistent statement terminators. It
effectively bans commas and requires semicolons + newline at
the end of most statements. The exceptions are things like
'return' or the end of compound statements such as 'if'.
x = y, y = z;  % commas not allowed
x = y; y = z;; % newline required, and spurious semicolon
if foo;        % useless semicolon
    disp hello % missing semicolon
end;           % useless semicolon
All of these issues can be auto-fixed, if the indentation rule
is enabled. Otherwise only the subset of issues that does not
require adding newlines can be fixed.
Indentation ("indentation")
This rule enforces consistent indentation and line
continuations. It fixes indentation, but leaves the exact
amount of extra whitespace added for continuations untouched
(for now).
While indentation is usually a popular religious flame-war
topic, for the MATLAB language there is not so much room for
creativity. The main reason for this is that the language
lacks brackets for blocks. If you do feel that you have a
specific indentation style that is not catered for here please
raise an issue and I will see what I can do. For now there is
just one style.
if potato
    disp (['Hello', ...
           ' World!']);
end
In the above example there is no indentation for the if since
it is the top-level statement in a script. The call to disp is
indented, since it is part of a compound statement. The
continuation is indented to the level of the opening square
bracket.
x = some + ...
   expression;
The continuation in the above example is offset 3 spaces, and
this offset will be preserved. If you change the setting of
tab_width at any point, this means that the continuation is
still properly aligned as chosen by the programmer.
The following constructs cause indentation:
- Any compound statements (e.g. if, switch, etc.)
- Function and class definitions
- The four special blocks (properties, methods, enumeration, or events) inside classes
- The argument validation block for functions
Configuration parameters:
- tab_width: Indent by this many spaces. By default this is 4.
- indent_function_file_body: Indent the body of the top-level functions in a function file. By default this is true. What you get with this is sensibly and consistently indented code. If you set it to false, then you get the odd convention of NOT indenting the function body, which appears to be somewhat common in the MATLAB world. This only applies to functions in function files. Functions in e.g. classes are always indented normally.
- align_round_brackets: Align continuations inside normal brackets to the opening brace. By default this is true.
- align_other_brackets: Align continuations inside matrix or cell expressions to the opening brace. By default this is true.
Redundant brackets ("redundant_brackets")
This rule removes some brackets that are clearly
useless: top-level brackets and double brackets. Brackets that
have been added to clarify operator precedence are not
touched.
This is an example of redundant brackets:
if (potato)
    x = ((x + 1));
end
These brackets are technically redundant due to operator
precedence, but they are left alone since they were probably
added for clarity:
x = (a * b) + (b * c);
Spurious commas inside cells and rows ("spurious_row_comma")
This rule complains about unnecessary commas inside matrix and
cell expressions. Specifically, both of these mean the same
thing, but the trailing and starting comma for a
and b respectively are spurious.
a = [1, 2,];
b = [, 1, 2];
Spurious semicolons inside cells and rows ("spurious_row_semicolon")
This rule complains about unnecessary semicolons inside matrix
and cell expressions. For example here the semicolon in the
first row is useless, because the newline also introduces a
new row.
a = [1, 0;
     0, 1];
Annotation whitespace ("annotation_whitespace")
This rule enforces whitespace after the annotation indication,
i.e. we make sure things look like this:
%| pragma Potato;
Rules
These rules cannot be auto-fixed because there is no "obvious"
fix.
Copyright notice ("copyright_notice")
This rule looks for a copyright notice (by default in the
docstring of the primary entity). The list of acceptable
copyright holders can be configured with
copyright_entity. This option can be given more than once to
permit a set of valid copyright holders. If this option is
not set, the rule just looks for any copyright notice.
Configuration parameters:
- copyright_location: The desired format for copyright notices. This can take one of the following values:
  - docstring - The default. Search the primary function, class, or script docstring for copyright information.
  - file_header - Look only at the first line in each file.
- copyright_primary_entity: Can be specified only once, multiple uses of this override each other. This is supposed to be the key copyright holder. This setting is the same as below for the style checker, but has special significance for the MH Copyright tool.
- copyright_entity: Can be specified more than once. Make sure each copyright notice mentions one of these. These should all be your legal entities.
- copyright_3rd_party_entity: Can be specified more than once. These are other copyright holders (e.g. for other code that you have integrated into your project). For the style checker this has no special meaning (it means the same as above), but the copyright utility treats these differently.
- copyright_in_embedded_code: Normally this rule is not enabled on MATLAB code embedded in Simulink® models, since most models carry their copyright notice elsewhere. This flag can be used to turn on this rule for embedded code too.
- copyright_regex: The magic regex to detect copyright and years. I very strongly suggest that you do not change this. If you absolutely must have a different notice than the default, then the regex must include at least these named groups: copy, ystart, yend, and org. The default is the highly readable:
  (?P<copy>(Copyright \([cC]\))|((\([cC]\) )?Copyright)) +((?P<ystart>\d\d\d\d)(-| - ))?(?P<yend>\d\d\d\d)( by)? *(?P<org>.*)
  Again, please do not change this. Right now the tools don't validate this and you will get strange behaviour if you mess this up. Please. Just don't.
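To see what the four named groups capture, the default regex can be tried out directly in Python. This is a small experiment rather than MISS_HIT's own code: the helper name parse_notice is hypothetical, and using re.search on a single line of comment text is an assumption about how the pattern is applied.

```python
import re

# The default copyright_regex quoted above, split over several lines
# for readability only (the pieces concatenate to the same pattern).
COPYRIGHT_REGEX = re.compile(
    r"(?P<copy>(Copyright \([cC]\))|((\([cC]\) )?Copyright)) +"
    r"((?P<ystart>\d\d\d\d)(-| - ))?"
    r"(?P<yend>\d\d\d\d)( by)? *"
    r"(?P<org>.*)"
)


def parse_notice(comment_text):
    """Return the named groups of the first notice found, or None."""
    match = COPYRIGHT_REGEX.search(comment_text)
    if match is None:
        return None
    return {name: match.group(name)
            for name in ("copy", "ystart", "yend", "org")}


print(parse_notice("(c) Copyright 2021 Florian Schanda"))
print(parse_notice("Copyright (c) 2019-2021 Potato Inc."))
```

For a notice with a single year, ystart is None and the year ends up in yend; with a year range both groups are set.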
For example, an acceptable copyright notice using the docstring
approach looks like this:
function rv = Byte_Add_One(x)
    % BYTE_ADD_ONE This adds one to the input
    %   Note: on overflow, it saturates
    %
    % (c) Copyright 2021 Florian Schanda

    rv = x;
    if x < 255
        rv = rv + 1;
    end
end
With the file_header approach, our notice should look
like this:
% (c) Copyright 2021 Florian Schanda

function rv = Byte_Add_One(x)
    % BYTE_ADD_ONE This adds one to the input
    %   Note: on overflow, it saturates

    rv = x;
    if x < 255
        rv = rv + 1;
    end
end
Note that if a function or class does not contain a docstring,
then we look at the docstring of the file instead, so
generally speaking the docstring setting is a
superset of, and compatible with, the file_header
setting. However, if your file has copyright notices in
*both* the file header and the primary function or class, then
this is not accepted.
Naming scheme for classes ("naming_classes")
This rule enforces a consistent naming for all user-defined
classes.
Configuration parameters:
- regex_class_name: A regular expression that every class must match. By default it is:
  ([A-Z]+|[A-Z][a-z]*)(_([A-Z]+|[A-Z][a-z]*|[0-9]+))*
  This regular expression encodes the "Ada" naming scheme, which is in my opinion probably the most descriptive and consistent naming scheme. It requires underscore-separated acronyms or capitalised words.
Good class names under this scheme are:
- Kitten_Class
- LASER
- OS_Monitor
Bad class names are:
- PotatoFarmer (no underscore)
- hamster_Monitor (not capitalised)
- LASERActuator (no underscore)
- Sharks_ (trailing underscore)
- Bad__Name (double underscore)
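If you want to check a candidate name (or your own replacement regex) against the examples listed here, a quick Python experiment does the job. The helper name is_valid_class_name is hypothetical, and the use of re.fullmatch reflects the assumption that the whole name must match the expression.

```python
import re

# Default regex_class_name as given above.
CLASS_NAME_REGEX = r"([A-Z]+|[A-Z][a-z]*)(_([A-Z]+|[A-Z][a-z]*|[0-9]+))*"


def is_valid_class_name(name, regex=CLASS_NAME_REGEX):
    """True if the entire name matches the naming regex."""
    return re.fullmatch(regex, name) is not None


for name in ["Kitten_Class", "LASER", "OS_Monitor",
             "PotatoFarmer", "hamster_Monitor", "LASERActuator",
             "Sharks_", "Bad__Name"]:
    verdict = "ok" if is_valid_class_name(name) else "rejected"
    print(f"{name}: {verdict}")
```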
Naming scheme for functions ("naming_functions")
This rule enforces a consistent naming for all user-defined
functions, methods, getters, and setters.
Configuration parameters:
- regex_function_name: A regular expression that every ordinary function must match. The default is the same as it is for classes. (See above.)
- regex_nested_name: A regular expression that every nested function must match. The default is the same as it is for classes. (See above.)
- regex_method_name: A regular expression that every class method must match. The default is
  [a-z]+(_[a-z]+)*
  This is all lower-case, underscore-separated names.
- regex_attribute_name: A regular expression that every class attribute and associated getter/setter must match. The default is the same as it is for classes. (See above.)
Naming scheme for scripts ("naming_scripts")
This rule enforces a consistent naming for all script
files. Note that function files and class files are not
covered by this rule, only pure script files.
Configuration parameters:
- regex_script_name: A regular expression that every script file (without .m extension) must match. The default is the same as it is for classes. (See above.)
Naming scheme for parameters ("naming_parameters")
This rule enforces a consistent naming for input and output
parameters of functions and methods.
Configuration parameters:
- regex_parameter_name: A regular expression that every parameter must match. The default is all lower-case with underscores.
Naming scheme for enumerations ("naming_enumerations")
This rule enforces a consistent naming for enumerations in a class
definition.
Configuration parameters:
- regex_enumeration_name: A regular expression that every enumeration must match. The default is the same as for classes. (See above.)
Non-ASCII characters in source ("unicode")
This rule enforces source files to only contain ASCII
characters. This is generally a good idea, because allowing
non-ASCII characters creates all sorts of annoying portability
issues.
Configuration parameters:
- enforce_encoding: A string that can be any valid Python encoding to enforce. By default this is "ASCII".
- enforce_encoding_comments: A boolean, by default true. This controls if the rule also checks comments and continuations, not just program text.
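Since enforce_encoding accepts any valid Python encoding, the check itself is easy to picture in Python. The following is a rough sketch only: the function name find_encoding_violations is hypothetical, and unlike the real rule it makes no distinction between program text and comments (so it ignores enforce_encoding_comments).

```python
def find_encoding_violations(source_text, encoding="ascii"):
    """Return (line, column, character) for every character that cannot
    be represented in the given encoding. Lines and columns start at 1."""
    violations = []
    for line_no, line in enumerate(source_text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            try:
                char.encode(encoding)
            except UnicodeEncodeError:
                violations.append((line_no, col, char))
    return violations


source = "x = 1;\n% ärger\n"
print(find_encoding_violations(source))             # default: ASCII
print(find_encoding_violations(source, "latin-1"))  # ä is valid here
```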
Note: currently nothing can be auto-fixed here, but I plan to
add support to automatically convert from one valid encoding
to another. However even then, characters that are outside the
valid set will never be auto-fixed (e.g. it is impossible to
decide if ä should be translated as a or ae or something
else entirely).
Maximum file length ("file_length")
This is configurable with 'file_length'. It is a good idea to keep
the length of your files under some limit since it forces your
project into avoiding the worst spaghetti code.
Configuration parameters:
- file_length: Maximum lines in a file, 1000 by default.
Max characters per line ("line_length")
This is configurable with 'line_length', default is 80. It is a
good idea for readability to avoid overly long lines. This can help
you avoid extreme levels of nesting and avoids having to scroll
around.
Configuration parameters:
- line_length: Maximum characters per line, 80 by default.
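Both length limits boil down to simple counting. The Python sketch below shows the idea using the documented defaults; the function name check_lengths and the message wording are made up for this illustration and do not correspond to what mh_style actually prints.

```python
def check_lengths(source_text, line_length=80, file_length=1000):
    """Return a list of messages for lines or files that are too long."""
    lines = source_text.splitlines()
    messages = []
    if len(lines) > file_length:
        messages.append(
            f"file exceeds {file_length} lines ({len(lines)})")
    for number, line in enumerate(lines, start=1):
        if len(line) > line_length:
            messages.append(
                f"line {number} exceeds {line_length} characters ({len(line)})")
    return messages


print(check_lengths("x = 1;\n" + "y" * 81 + "\n"))
```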