Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coding Style #6831

Open
guserav opened this issue Jul 22, 2019 · 10 comments
Open

Coding Style #6831

guserav opened this issue Jul 22, 2019 · 10 comments

Comments

@guserav
Copy link
Contributor

guserav commented Jul 22, 2019

Background information

After @jsquyres comment on my Pull Request I have looked into the Coding Style and noticed the first comment:

NEVER use actual tab characters; always use spaces. Both emacs and vim have secret mojo that can automatically use spaces when you hit the key. This makes the code look the same in every browser, regardless of individual tab display settings.

That seemed a little bit odd to me as I was working on some files which definitely contained a funny mix of tabs and spaces.

At the time I didn't say anything about it and thought it was no big deal. Later I had a boring day and decided to grep for tabs and noticed that there are quite a few in the current code base. After mentioning this to @hkuno she mentioned it in a side note in an email to @jsquyres. This resulted in a small email thread and some accurate numbers (I was counting files not maintained in this repo) for c files from @jsquyres:

# Total number of .c/.h non-imported files
$ find . | grep -v .git/ | egrep '\.[ch]$' | egrep -v 'hwloc/|libevent/|pmix4x/|treematch/' |  wc -l
    3919
# How many of those have tabs
$ ack '\t' --ignore-dir=hwloc --ignore-dir=treematch --ignore-dir=libevent --ignore-dir=pmix4x --type=cc  -l | wc -l
     925

So ~23% of the code base has mixed indentation files (Thus violating the Coding Style). As @jsquyres agrees that this is a issue he asked me to open this issue so that the community can discuss how to handle the problem.

Solution approach

The best solution for this would be to make a big Pull request (Or four one for (ompi, orte, opal, oshmen) as requested by @jsquyres) that fixes this issue and then have a CI check running that flags PRs introducing <tab> characters.

@jsquyres has asked me to open these PRs but I found two completely different approaches of fixing the <tab> situation. Therefor I would like to hold off before doing that until this discussion has yielded a decision.

Possibilities

As described above, this started with just checking for the tabs in the code base, but while looking into CI tools that can check for tabs and the best way of replacing the tabs in the existing code base I noticed that it would also be possible to decide on a complete coding style. Therefore in the below possibilities I also go into clang-format.

Only tabs

Replacing them in the existing files

Possibly the best solution would be to just replace them with a small command like:

find . \( -name '.git' -o -name 'hwloc' -o -name 'libevent' -o -name 'pmix4x' -o -name 'treematch' \) -prune -o \( \( -name '*.h' -o -name '*.c' \) -type f -exec bash -c 'expand -t 8 "$0" | sponge "$0"' \{\} \; \)

I am assuming (based on the small number of files I saw) that tabs are used instead of 8 spaces.

Checking further PRs

No for the more interesting list of possibilities on how to prevent further <Tabs>.

@jsquyres has proposed to use Probotfor that. As I had never heard of Probot I searched for a existing bot doing what we want and found eslint-disable-probot which could be easily changed into checking for tabs.

As I only did a short search on this topic I would be glad if anyone could point out an existing tools which checks for PRs introducing <Tabs>. This would be nice so there is no need in maintaining a custom solution to this problem.

Complete Coding Style with clang-format

Because I think that there is the possibility that there are some files using tabs instead of 4 spaces I also looked into the possibility of having a tool that can replace the existing <Tabs> whilst being aware of the surrounding code and thus the correct indentation.

I found clang-format:

A tool to format C/C++/Java/JavaScript/Objective-C/Protobuf/C# code.

And some wrappers around it that can be used in a CI solution and development work.
git clang-format

A clang-format integration for git.

run-clung-format

A wrapper script around clang-format, suitable for linting multiple files and to use for continuous integration

The "problem" with clang-format is that it seems to be not possible to just correct indentation in files as it will also enforce other coding styles aspects even when they are not specified in the style file (At least I didn't find a option for really only doing the config). This means when using clang-format it is probably best to decide on a coding style and then to make the PRs to change the current code base to reflect the decision.

Discussion

In general it has to be decided whether to only enforce the small portion of guidelines that are pointed out in the existing Coding Style or to decide on a complete Coding Style and enforce it by tools like clang-format. As this decision can be influenced by the tools that are used to enforce it I would be glad if you would point out other tools that can help with the problem of coding style and tabs.

@rhc54
Copy link
Contributor

rhc54 commented Jul 22, 2019

Please look at the existing tools for this problem in the contrib directory:

https://github.com/open-mpi/ompi/blob/master/contrib/purge-tab-indents.pl

https://github.com/open-mpi/ompi/blob/master/contrib/whitespace-purge.sh

https://github.com/open-mpi/ompi/blob/master/contrib/purge-trailing-blank-lines.pl

https://github.com/open-mpi/ompi/blob/master/contrib/fix_indent.pl

https://github.com/open-mpi/ompi/blob/master/contrib/update-my-copyright.pl

The truth is that we have developed multiples of these tools over the years, and some of us remember to use them - but not everyone does. I have several of these activated in my git commit configuration so everything I edit gets updated. Easy enough to just run a scan across the code base.

Completely changing our coding style just to swat this mosquito seems excessive.

@ggouaillardet
Copy link
Contributor

@guserav note ompi/mca/io/romio321/romio is "imported" from MPICH (with some modifications).

FWIW, MPICH policy is not to replace tabs by white spaces only for the sake of it. But if a file is updated in a PR, it should have all leading tabs replaced by white spaces in order to be accepted. That sounds like a sane and pragmatic policy to me, and as far as I am concerned, I will welcome a bot/CI test that flags PR contains new code with leading tabs. There should be exceptions though. Code review can sometimes be made much easier by splitting code changes and tabs cleanup in distinct commits.

@rhc54
Copy link
Contributor

rhc54 commented Jul 23, 2019

The problem I have encountered with some of these policies is that they tend to discourage automation like we have developed (as cited above). Every time I edit a file, the automation goes thru and cleans up the whitespace violations...and then someone complains that the diff includes things other than code changes. This pushes one to turn off the automation - and opens the door to committing code style violations.

It would be easier if we just learned to live with a mix of code and whitespace changes for the few times they are encountered, or just do a pass thru the code base and make one giant cleanup commit. Then we can encourage people to add the automation to their git config, and add CI automation to check the style (at least for whitespace problems).

@jsquyres
Copy link
Member

Good discussion on the webex today. I think we landed on a few things:

  1. Let's restrict a first round of this stuff to just doing whitespace. No copyright or other stylistic checks.
  2. It would be good to have a solution that checks for whitespace problems in both git hooks and CI.
  3. For git hooks / CI, we only want to check whitespace for the new changes in a PR (vs. all the whitespace in a file)
  4. Brian would like a shell script that, given the first and last git commit hashes, can check each of those commits for the whitespace issues and return 0 for success and 1 for failure.
    • This shell script should not actually make any changes -- it just checks for whitespace issues.
    • This shell script can be easily launched via our existing Jenkins infrastructure (specifically: yes, disregard anything Jeff previously said about Probot!).
    • Brian: do we need some kind of stdout/log so that people can click on "Details" and see exactly where the whitespace issues are in their commit(s)?
  5. We need to define exactly what whitespace issues should be flagged:
    • Tabs
    • ...anything else? (e.g., any whitespace at the end of a line, extra [effectively] blank lines at the end of a file, ...)
  6. It would be ok to have PRs for the following branches to fix all the tab->8 spaces instances on the following branches:
    • master
    • v3.0.x
    • v3.1.x
    • v4.0.x

@guserav
Copy link
Contributor Author

guserav commented Jul 23, 2019

@jsquyres Thank you for the summary.
The script would be as easy as:

#! /bin/bash
#\t has to be replaces with a literal tab character
#.* is included because some files contain comments with tabs or use it to separate type and variable name
git diff $1 $2 | grep -C $3 -E "\+.*\t+"
#Just invert the exit code of grep
if [[ $? -eq 1 ]]
then
    exit 0
else
    exit 1
fi

The -C would be for context if Brian wants to have some in the details. If that is not needed then the -C can be omitted and the output of grep could be piped to /dev/null respectively.

Depending on whether you want to include the changes made in $1 you might want to use $1^ (I don't know what the github API gives Jenkins as a start commit)

I didn't see any good point to include the above script in the .travis.yml so I hesitated on opening a PR.

To point 6: @jsquyres you mentioned in the offline discussion that you want to have one PR for every code tree that would make a total of 16 PRs? That seems a lot, am I correct in assuming that you want one PR per branch.

@jsquyres
Copy link
Member

Does the regexp need to be:

... | egrep -C $3 -E "^\+.*\t"

I.e., a tab anywhere on a diff line that begins with +.

Also, just had a thought: what about committing in the directories that we know are allowed to have tabs (libevent and friends)?

@guserav
Copy link
Contributor Author

guserav commented Jul 23, 2019

Thank you. Forget the directories and the ^. How about:

#! /usr/bin/env bash
foundTab=0
for file in $(git diff --name-only $1 $2 | grep -vE "/(hwloc|libevent|pmix4x|treematch|romio)/" | grep -E "(\.c|\.h)$")
do
    git diff $1 $2 -- $file | grep -C $3 -E "^\+.*\t+"
    if [[ $? -eq 0 ]]
    then
        foundTab=1
    fi
done
if [[ $foundTab -ne 0 ]]
then
    exit 1
fi

@jsquyres
Copy link
Member

That looks feasible. How's this for a minor addition (disclaimer: typed here in the github UI; untested):

#! /bin/bash

config_file=.whitespace-checker-config.txt
if [[ -r $config_file ]]; then
    exclude_dirs=`cat $config_file`
else
    exclude_dirs='hwloc|libevent|pmix4x|treematch|romio'
fi

foundTab=0
for file in $(git diff --name-only $1 $2 | grep -vE "/($exclude_dirs)/")
do
    git diff $1 $2 -- $file | grep -C $3 -E "^\+.*\t+"
    if [[ $? -ne 0 ]]
    then
        foundTab=1
    fi
done
if [[ $foundTab -ne 0 ]]
then
    exit 1
fi

@guserav
Copy link
Contributor Author

guserav commented Jul 25, 2019

That is a good addition.

I believe you missed the | grep -E "(\.c|\.h)$") in the for loop header, because GitHub introduced a scroll bar for the long line...

When someone tells what to do for the following questions I could open the PR for enabling Travis to check the white space:

  • Where does that script need to be included? contrib/?
  • How to integrate it into the .travis.yml?

@jsquyres
Copy link
Member

Yes, I think I did (miss that additional grep -E).

This can probably go in github.com/openmpi/ompi-scripts, because this shell script will get launched via Open MPI's Jenkins instance (we got tired of the the instability of Travis and moved away from it a while ago).

Assume that you get 2 CLI params: the oldest git hash and the newest git hash (it'll probably always be HEAD, but hey, we might as well have the 2nd param there if we need it for something someday).

guserav added a commit to guserav/ompi-scripts that referenced this issue Jul 26, 2019
In order to prevent violations of the coding style in ompi it was
decided to check any future PRs for white space violations. This script
checks whether a certain commit range has introduced a tab character.

See open-mpi/ompi#6831 for details.

Signed-off-by: guserav <erik.zeiske@hpe.com>
bwbarrett pushed a commit to open-mpi/ompi-scripts that referenced this issue Jan 7, 2020
In order to prevent violations of the coding style in ompi it was
decided to check any future PRs for white space violations. This script
checks whether a certain commit range has introduced a tab character.

See open-mpi/ompi#6831 for details.

Signed-off-by: guserav <erik.zeiske@hpe.com>
bwbarrett pushed a commit to bwbarrett/ompi-scripts that referenced this issue Jan 8, 2020
In order to prevent violations of the coding style in ompi it was
decided to check any future PRs for white space violations. This script
checks whether a certain commit range has introduced a tab character.

See open-mpi/ompi#6831 for details.

Signed-off-by: guserav <erik.zeiske@hpe.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants