Copyright © 2002 Eric Laroche
All rights reserved
The author's C programming language coding guidelines [EL,1998] can be found at <URL:http://www.lrdev.com/lr/c/ccgl.html>.
Introduction
This paper discusses the author's guidelines in C programming
concerning their applicability or non-applicability in the Java
programming language.
The original paper gave a comprehensive insight into the author's
guidelines in C programming. Many aspects that were needed to define
those guidelines were thoroughly discussed. That paper provided much
background information needed for the decisions about do's and don't's
in C coding.
As in the original paper [EL,1998], some of the focus goes to the abstractions and considerations that lead to coding guidelines.
The original paper's considerations that are also valid for Java, either
because of the similarities between C and Java or because they are
language independent, are typically kept in brackets [].
The paper follows the original paper's structure.
Aim of coding
General coding aims do not depend on the programming language used, so
it is not astonishing that C and Java aim for the same.
[Software implementations should have error free design, error free algorithms and an error free code architecture.]
[Software defects can be a security problem.]
Software defects are a security problem in Java too. However, buffer overflows, one of the more common flaws in C and C++, are caught in Java: they either get handled properly or end in a graceful process termination triggered by an uncaught exception. The latter can of course be exploited for denial of service, but that is much less of a threat than the classical buffer overflow exploits.
[Then, software engineering aims are maintainability (ease of expanding or correcting code) and reusability.]
The fact that all Java code is class method code and that class design is typically considered library design may lead implicitly to a better code reusability. This fact too implies that any Java code fragment is part of a kind of software library (and hence follows somewhat stricter design requirements).
With an object oriented design approach, one generally tends not to make a distinction between small projects, which could be done without much software engineering consideration, and larger ones which definitely require such.
In addition to (easy) reusability, there is the easy usability of
classes (and interfaces, i.e. libraries), since one Java pattern is
to offer a large number of classes/interfaces.
Aim of the original paper
[The intention of the original paper was to summarize
considerations made about fulfilling the coding aims mentioned above and
making (C) code less defective, more robust (against
changes in code architecture) and more readable (for easier
maintenance).]
[Design issues were not covered much, since they are often independent of the programming language used. Efficiency and economy of a programming language or runtime environment were not covered either. The ability of runtime environments to compile functions (possibly from virtual machine bytecode into native binary code, maybe in a hotspot manner) was however mentioned.]
Since runtime efficiency was not part of the considerations,
the difference of C/C++'s (optional) non-virtualness in method
calls versus Java's enforced virtualness, as well as dynamic
link issues are not considered here.
Considered C language definition
The C programming language definition considered in the original paper
was ISO/IEC 9899:1990. (Older names were ANSI-C or
K&R2.) [BK/DR,1988]
The compared Java version is the Java Language
Specification Second Edition. [JG/BJ/GS/GB,2000]
Internationalization
Coding character set
[One should limit code to the ASCII character set. Comments,
strings and character literals should not contain non-ASCII characters.]
The Unicode character representation should be used for
non-ASCII characters, e.g. '\u00e8'. In C, the '\xe8' notation was to be used.
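A minimal sketch of the escape notation in Java follows. Only the escape itself comes from the text above; the class and member names are made up for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class UnicodeEscapeDemo {
    // The source file stays pure ASCII; the escape denotes U+00E8
    // (small letter e with grave accent).
    static final char E_GRAVE = '\u00e8';

    // Returns the numeric value of the escaped character.
    static int codePoint() {
        return (int) E_GRAVE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(codePoint()));
    }
}
```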
Coding language
[English is still the language best understood by most software
engineers, and should exclusively be used.]
[Locales typically control a program's user interface
customization to a local language.]
Names
[Names are used to identify functions, variables,
types, members, macros, etc.]
In Java, functions are manifested as class members (which would by the way not affect their naming, considering typical C++ naming rules not mentioned here), and additionally there is the package construct.
[There were many styles of choosing identifier names in C. The important rule was to be consistent about naming.]
Java introduced a (strict) naming scheme. That naming scheme builds on the use of mixed case letters and the distinction of types (classes) from variables, methods and packages by letting them begin with an upper case letter.
[The smaller the scope of a name, the less important a good choice of the name is.]
That is still true for Java variable name choice. Member naming should however be strict, even with private members. Parameter names seem less critical, since a parameter's semantics are often adequately defined by member name and parameter type (provided those follow the same strict naming conventions, i.e. are 'speaking').
In Java, the absence of restrictions on name lengths (imposed in C e.g. by linkers or by older, narrow terminals; see e.g. the C specification samples) allows consistent non-abbreviated naming. The importance of non-abbreviated names lies in the intuitive grasp of both a name and its semantics, which makes interfaces quicker to understand; this is especially important with the many classes/interfaces seen in Java.
There have been further approaches to distinguish data member
names from local variable and parameter names. One
approach often seen is to prefix data member names with m_.
The use of such a prefix allows unprefixed names to be used in the same
class context without (superficial) ambiguities, and makes uses of the
class state (data members) quick to spot.
(Note that otherwise, the underscore character is not seen in names.)
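The prefix convention can be sketched as follows. The class PrefixedCounter and its members are hypothetical and only serve to show how the prefix keeps member and parameter names apart.

```java
// Hypothetical class illustrating the m_ data member prefix.
class PrefixedCounter {
    private int m_count; // class state, recognizable by its prefix

    // The parameter uses the unprefixed name without shadowing the member.
    void setCount(int count) {
        m_count = count;
    }

    int getCount() {
        return m_count;
    }
}
```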
[The naming should still be in English only, with
internationalization done at a totally different location.]
Name styles
[In C, other schemes than the Java naming scheme were used too (use of
underscore characters, use of natural language context, etc).]
Type prefixes were sometimes used in C with primitive types, e.g. sz (zero terminated string), p (pointer), etc. These are of less use in an object oriented approach, and in Java especially.
Package prefixes used in function names, such as t_ (in the TLI
API) are not needed in Java or other package/namespace using
programming languages.
Name restrictions
Problems with case-insensitive linkers or restrictions on the
name length are not known to be present in Java. However, it
is imaginable that for really small environments (e.g. small embedded
systems) names could be shortened (but it is assumed that that
step takes place after bytecode compiling, therefore not considered a
Java language relevant issue).
Namespace
[A program's identifier namespace is not partitioned in the C
programming language, unlike in Lisp, Java or C++ 3rd Ed..]
In C, libraries tended to include package membership information in
names (e.g. t_open, t_bind, etc. of the TLI network
interface library) to avoid name collisions.
However, it was nearly impossible to find short unique package
prefixes.
Java does not suffer from the problem of finding unique package
names, since there is a proposed unique package naming scheme,
based on another unique hierarchical namespace, namely the
internet domain name scheme.
File names
In C, the naming of source files, header files, libraries, directories
and projects was not a programming language issue. In Java, source
file names are bound to class names and directory names are bound to
package names. Header files are not present. However, library (jar)
naming seems not (yet) specified.
Since file names and directory names are bound to class names and
package names respectively, the allowed character set for file names and
directory names is essentially limited to letters, digits and underscore.
C file name policies were (a little bit) more open (see original paper
for the suggested character set).
Compiler warnings
[It is demanded that the sources compile without warnings at
the highest compiler warning level.]
This of course holds true for Java as well.
Hardware-near C warnings such as "possible bad alignment" are of no
relevance in Java.
Style warnings
[Some compiler warnings revealed bad coding style rather than
errors or incompatibilities. These bad styles should be avoided.]
The warning "assignment in conditional expression" is not so relevant in Java since (non-boolean) assignments do not implicitly cast to a boolean value. In C, they had to be avoided by using more suitable code constructs.
C/C++'s warning "comparison of signed and unsigned values" is not
relevant in Java since Java does not employ unsigned numbers.
One may have noticed that the author suggested typically not to use
unsigned values in C at all.
Helper tools
[Helper tools (such as the lint checker) should be employed as
much as possible, to enhance code quality.]
[A further (however not automated) tool can be seen in code reviews by
humans.]
Lint
A check tool similar to C's lint is (currently) not
known in Java. Lint is the traditional Unix development tool,
originally designed for an older, less type-checking version of the C
language (K&R1).
Most of lint's facilities are not needed in Java. These include e.g. the checks that compensated for insufficient type checking in older C compilers. Another lint feature, a check for inter-module incompatibilities, is not needed since Java does not support implicit or separate explicit function or data declarations. Also, Java allows fewer casts and even fewer implicit casts, which are, by nature, problematic.
Nothing speaks against incorporating lint's stricter syntax
checks into the (Java) compilers.
Also, things such as detecting dead (unreachable) code are assumed to be
performed by current Java compilers.
Metrics
Metrics may be less important in a strict object oriented language such
as Java, since the desired level of modularization may be determined by
class design already.
Part of commenting required by metrics may be enforced with
javadoc requirements (e.g. a javadoc string per method, etc).
Assertions
[An alternative to runtime checkers are assertions. They allow
checking functions' preconditions and postconditions (and more)
at runtime.]
In Java, assertions are typically replaced by argument checking
and possibly throwing IllegalArgumentExceptions or other
RuntimeExceptions. These are easy to use, since they need not
be declared in the method's signature.
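A minimal sketch of such argument checking is shown here. The class PreconditionDemo and its method average are hypothetical; the point is that the unchecked exception appears in no throws clause.

```java
// Hypothetical utility class; names are illustrative only.
class PreconditionDemo {
    // Precondition: values must be non-null and contain at least one element.
    // IllegalArgumentException is a RuntimeException, so no throws clause is needed.
    static int average(int[] values) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must contain at least one element");
        }
        long sum = 0; // long avoids overflow while summing ints
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return (int) (sum / values.length);
    }
}
```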
Runtime checkers
The runtime checkers that check for array boundaries, dynamic
data boundaries, missing memory deallocation, function/system calls with
bad arguments, etc., are typically not needed in Java, since most of
this is caught explicitly by the Java runtime environment, which of
course makes Java per se more robust.
Code complexity
[One of the aims of coding guidelines may be to keep code
complexity as low as possible (complex meaning: hard to
read, error prone).]
The object oriented design approach is one means to do this.
The thorough use of fine-granular methods, possibly by
implementing many small Java interfaces, may lead to more
readable and more robust implementations. Further, this approach tends
to produce more (and possibly better) abstractions, which
further enhances code quality.
Nested expressions
Java is, like C, an orthogonal language that allows expressions to be
chained and nested, with the known problems: code can become
unreadable, source code debugging does not show intermediate
data, etc. Therefore, nested expressions should be avoided in both
languages.
[Temporary variables should not be avoided in order to
enhance performance. Temporary variables can be optimized away by the
compiler.]
Redundancy
[Code fragments should not be repeated. Redundant code is harder to
maintain and increases the probability of introducing defects.]
[Implementing more general functions/classes might be considered
instead.]
Determinism
[Deterministic code should be written. Searching for bugs that
show non-deterministic symptoms is an unpleasant task, at
best.]
C's explicit initialization of variables or buffers may be replaced by Java's implicit initialization.
The null assignment often applied to freed C pointers of course has
different semantics in Java. Java does not require memory to be freed
explicitly (since it is garbage collected); however, object references
must be released explicitly to let the objects be garbage collected.
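The implicit initialization and the release of references can be sketched as follows. The class and its members are hypothetical; the sketch relies on Java's default values for fields and array elements.

```java
// Hypothetical class; all names are made up for illustration.
class DefaultInitDemo {
    int m_counter;      // fields start at 0 without an initializer
    Object m_reference; // reference fields start at null

    // Array elements are zeroed by the runtime; no memset-like call is needed.
    static int[] newBuffer(int size) {
        return new int[size];
    }

    // Dropping the reference makes the object eligible for garbage
    // collection once no other reference to it remains.
    void release() {
        m_reference = null;
    }
}
```

Note that this applies to fields and array elements only. Local variables are not implicitly initialized; the compiler rejects code that reads a local variable before it has been assigned.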
Modularity
[Modularity is an important key for code maintainability and handling
complexity.]
The module abstraction layer used in C to decrease problem
complexity is replaced by Java's object oriented approach, i.e.
its class concept and class inheritance. Class design
determines the modularization.
Interfaces
Java does not separate interface definition from
implementation (for concrete classes), so the corresponding C
guidelines for C header files do not apply here.
The tendency of designing one interface file for each C source file or each C++ class is well enforced in Java by typically having exactly one (public) class per source file.
However, Java does not allow facades to be created on top of
existing interface definitions unless proxy code or additional
facade interfaces are involved. Java's interfaces are well
suited to fulfill the facade pattern. So the counterpart of C's
'external header' typically consists of one or more interfaces.
Header files
The explicit requirement of C header files to be included by both the
implementing module and the calling module(s) is not present in Java,
since the interface and implementation are not separated.
Having no header files seems not only easier in maintenance but also
avoids C's nasty bugs in case of an altered implementation file not
including its out-of-sync header file.
C's file scope is replaced by Java's class scope, so
the file scope access modifier (static) is not used in that
context in Java.
C/C++'s include guards are not needed either.
Resources
[Resources are data-only modules. Samples are X11 bitmap
files.]
The Java VM may allow copy-on-demand on any data to share it
between threads (or other entities), without requiring an explicit
const modifier.
Code order
Java code order (i.e. order of function members in a source file) is
determined by interface order. That one is probably guided by
readability, grouping of semantics, etc.
The 'Pascal style' local C function usage (definition before use, no declaration) made it possible to avoid the overhead of declarations. This overhead is not present in Java anyway.
[Error handling should typically be done without delay.]
Conditional compiling
Java does not provide conditional compiling. Note that the
opinions about conditional compiling in C differed (i.e. some did not
use it at all).
For platform specific stuff, the alternative of choice in Java may be to
provide different implementations for an interface (implementation
polymorphism).
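A sketch of this alternative follows, assuming a hypothetical PathStyle interface with one implementation per platform family. Where C code might select a variant with #ifdef, the selection here happens at run time.

```java
// Hypothetical interface; one implementation per platform convention.
interface PathStyle {
    String join(String directory, String file);
}

class SlashPathStyle implements PathStyle {
    public String join(String directory, String file) {
        return directory + "/" + file;
    }
}

class BackslashPathStyle implements PathStyle {
    public String join(String directory, String file) {
        return directory + "\\" + file;
    }
}

class PathStyles {
    // Chooses an implementation for a given separator character.
    static PathStyle forSeparator(char separator) {
        if (separator == '\\') {
            return new BackslashPathStyle();
        }
        return new SlashPathStyle();
    }

    // Chooses the implementation matching the platform the VM runs on.
    static PathStyle current() {
        return forSeparator(java.io.File.separatorChar);
    }
}
```

Callers only depend on the PathStyle interface, so platform specific code stays confined to the implementing classes.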
Code nesting
[Block constructs should not be nested too deeply. Nested loops tend to
get hard to read. Not more than about two levels of nesting should be
used.]
The use of throw statements may reduce nesting and make
code more readable.
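The effect on nesting can be sketched with two variants of the same hypothetical method: one that nests its checks and one that leaves early via throw. All names are made up for illustration.

```java
// Hypothetical class comparing nested checks with early throws.
class NestingDemo {
    // Nested variant: each check adds one indentation level,
    // and failure is signalled by the number -1.
    static int parsePortNested(String text) {
        int port = -1;
        if (text != null) {
            if (text.length() > 0) {
                port = Integer.parseInt(text);
            }
        }
        return port;
    }

    // Flat variant: a failed check leaves the method at once.
    static int parsePort(String text) {
        if (text == null) {
            throw new IllegalArgumentException("text is null");
        }
        if (text.length() == 0) {
            throw new IllegalArgumentException("text is empty");
        }
        return Integer.parseInt(text);
    }
}
```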
Scope
Java additionally provides class scope, class hierarchy
(protected) scope and package (default) scope. C's
global scope, function scope and block scope
are provided as well; file scope is replaced by class scope, since Java
source files often (and preferably) consist of exactly one class.
[Choosing scope is an important micro architecture instrument.
Generally, scope should be chosen as narrow as possible.]
Scope of functions
In Java, function member scope is defined by a class'
(public) interface. Function members that are not
part of that public interface (e.g. 'helper'
functions) should be private, protected or of default access,
but not public.
[Narrow scopes encapsulate code.]
Scope of variables
[The variable scope should be as small as possible. The opinions about
using variable declarations in C block scope differed.]
Java, like C++ but unlike C, lets local variables be declared anywhere in the code (not just at the beginning of a block), which leads, together with the rule 'declaration is initialization', to an implicit innermost-possible-block variable declaration.
[Global (application) scope should generally be avoided.]
Public member variables should be avoided. An even stricter
approach is to only use private member variables. This
approach would correspond to avoiding C's global scope.
Scope of types
Types (classes) can have global scope or be limited to package
scope. These two scopes would correspond to the proposed global and
file scope in C.
Scope of macros
Macros are not supported by Java.
Error prone constructs
C (unlike some other languages) offers some error prone
constructs.
Java does not experience problems with explicit casts, array sizes,
buffer sizes or macros, and has fewer problems with error checking
(through the use of exceptions).
Explicit casts
C compilers do not generate errors or warnings on semantically false
explicit casts. Java forbids some of these casts and throws
ClassCastExceptions on others, so problems rarely arise in Java
from explicit casts.
[Explicit casts should be used as rarely as possible. In object
oriented programming, casts can be seen as indication of a design error
in some cases.]
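Both behaviors can be sketched briefly. The class CastDemo is hypothetical; the cast that the compiler rejects is shown in a comment only, since it would not compile.

```java
// Hypothetical class demonstrating how Java treats a false explicit cast.
class CastDemo {
    // Returns true if the semantically false downcast was rejected at run time.
    static boolean badCastIsCaught() {
        Object value = "not a number";
        // Integer n = (Integer) "not a number"; would be rejected by the
        // compiler, because String and Integer are unrelated classes.
        try {
            Integer number = (Integer) value; // compiles, but fails at run time
            return number == null;            // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```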
Type size
In C, it was advantageous to know the integer size
(int could e.g. be 16 bits, 32 bits, 64 bits), which
depended on processor, operating system and compiler. Java on
the other hand defines the number types byte, short, int
and long to be 8, 16, 32 and 64 bits wide.
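The fixed widths can be checked against the range constants of the standard wrapper classes. This is a minimal sketch; the class name TypeSizeDemo is made up for illustration.

```java
// Hypothetical class checking the ranges implied by the fixed bit widths.
class TypeSizeDemo {
    // Each maximum equals 2^(bits - 1) - 1 for 8, 16, 32 and 64 bits.
    static boolean sizesAsSpecified() {
        return Byte.MAX_VALUE == 127
            && Short.MAX_VALUE == 32767
            && Integer.MAX_VALUE == 2147483647
            && Long.MAX_VALUE == 9223372036854775807L;
    }
}
```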
[Signed types (in contrast to unsigned types) were
often enough for the problem domain.]
Array size
Array size references could be done in a non-robust manner in
the C programming language. The robust sizeof constructs
are not needed in Java, since arrays carry an (implicit)
length attribute.
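A short sketch of the length attribute in use; the class and method names are hypothetical.

```java
// Hypothetical class; the method needs no separate size argument.
class ArrayLengthDemo {
    static int sum(int[] values) {
        int total = 0;
        // The array carries its own length, so no sizeof idiom is required.
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
}
```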
Buffer sizes
The problem of buffer function arguments lacking their size does
not arise in Java, since buffers implicitly carry their length.
In case of out-of-bound writes, appropriate exceptions are
thrown, which technically allows the problem to be handled. In garbage
collected systems (such as Lisp or Java), dynamic data is often
returned instead of buffer arguments being used.
C suffered from the problem of buffer overflows that corrupted other data, stack frames or memory heap management data, and that were sometimes hard to detect and locate.
[Buffer overflows are security problems.]
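The following sketch shows an out-of-bound write being caught. The class BoundsDemo is hypothetical, and catching the exception locally is only done here to make the behavior visible.

```java
// Hypothetical class; writes one element past the end of a buffer.
class BoundsDemo {
    // Returns true if the runtime rejected the out-of-bound write.
    // The size argument is assumed to be non-negative.
    static boolean overflowIsCaught(int size) {
        byte[] buffer = new byte[size];
        try {
            buffer[size] = 1; // valid indices are 0 to size - 1
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true; // no neighbouring data was overwritten
        }
    }
}
```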
Macro parameters
Java does not support macros and their call-by-name
semantics. So Java does not suffer from operator precedence
problems associated with this, or unwanted multiple
side effects.
Macro side effects
Java does not support macros.
Sign extension
C's typical sign extension problem with converting
char to int is not so much a problem in Java.
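A short sketch of the widening conversions involved, using the value 0xe8 as an example. Java's char is an unsigned 16 bit type, so widening it to int does not sign extend; byte, however, is signed and still does. The class name is hypothetical.

```java
// Hypothetical class showing widening of char and byte to int.
class SignExtensionDemo {
    static int widenChar() {
        char c = '\u00e8'; // char is unsigned
        return c;          // 232, no sign extension
    }

    static int widenByte() {
        byte b = (byte) 0xe8; // byte is signed
        return b;             // -24, sign extended
    }

    static int widenByteMasked() {
        byte b = (byte) 0xe8;
        return b & 0xff;      // 232, masking removes the extended sign bits
    }
}
```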
Error checking
[Missing error checks will lead to bugs.]
Exceptions theoretically make it easier to implement error
checking code.
Sequence points
The missing sequence points problem and the associated
undefined behavior, as expressed in C statements like
*p++ = *p++ = 0; seems not so much
present in object oriented languages and in Java especially.
[It would be nice if undefined behavior through missing sequence point
definition was generally diagnosed by compilers.]
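The Java language specification defines that operands are evaluated from left to right, so a Java statement of similar shape has one defined result. A minimal sketch follows (class name hypothetical); such a construct remains poor style even where it is well defined.

```java
// Hypothetical class demonstrating Java's defined evaluation order.
class EvaluationOrderDemo {
    // Returns { a[0], i } after the assignment below.
    static int[] run() {
        int i = 0;
        int[] a = new int[3];
        // The index on the left is evaluated first (0, then i becomes 1),
        // the right-hand side second (value 1, then i becomes 2).
        a[i++] = i++;
        return new int[] { a[0], i };
    }
}
```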
Optimizer errors
[Sometimes hard-to-track errors originate from errors in the compiler
optimizer step.]
[Most of the development shall be done without optimization. When (and
if) switching to optimized release code, test cases must be run to check
integrity.]
Style
Numbers
[Numbers (numeric constants) should typically not appear
hardcoded in source code. (Few of the numbers make an exception,
e.g. 0 or 1.) Most of the values can be made configurable
(maybe with reasonable defaults) or can be deduced from other
quantities.]
Numerical error return values (in C often -1) are not so much used in Java, which rather throws exceptions in error cases.
Numbers should especially not be hardcoded if they impose limits on something (e.g. on input sizes). This is of lesser importance in Java since (fixed) buffers, as often used in C, are not so common. Java uses dynamic data more heavily than C.
[Hidden dependencies among constants should not be generated.
Constants should be defined by means of constants they derive from.
(Compilers are quite able to do arithmetics at compile time.)]
Unsigned numbers
In C one may consider not using unsigned values at all. Java
does not support unsigned numbers.
C did not provide runtime checks on underflows, so programs did not become more robust by using unsigned values.
[Again: the problem domain should be known.]
Longs, shorts
Using longs was often an issue on 16-bit systems, to get (at
least) 32 bits. In Java, longs are 64 bits.
Sometimes the assumption was that C implementations define a
long to be exactly 32 bits, which is however not
defined by the C language standard. Java, on the other hand,
defines the bit sizes exactly.
Problems with data structure alignment are not present in Java
since object data should not and cannot be copied directly.
Floats
[One may possibly avoid floating point numbers, since the
problem domain often is of discrete nature.]
[Many problems are solvable without using floats. E.g. a typical
hashtable high-water-mark of 0.75 may be expressed by a
ratio and handled by integer arithmetic:
if (4 * items > 3 * size).]
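The ratio check can be sketched as a small method. The class and method names are hypothetical, and the widening to long is an addition that guards against int overflow for very large tables.

```java
// Hypothetical class; tests items / size > 3 / 4 without floating point.
class LoadFactorDemo {
    static boolean exceedsHighWaterMark(int items, int size) {
        // Multiplying both sides by 4 * size removes the division;
        // long arithmetic keeps the products from overflowing.
        return 4L * items > 3L * size;
    }
}
```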
[Avoid single precision float. Use double.]
Parameter types
C/C++ knew the notation char* argv[] instead of the
preferred char** argv. Java does not use the
dereference operator *.
Variable arguments
Variable arguments are not supported in Java. They are not type
safe. Java and C++ have better ways for typical variable arguments
applications: overloaded toString methods respectively ostream shift
operators.
Portability types
Type aliases (by C's typedef) are not an issue in
Java. If a type deserves a name of its own, it should get its own
reference type, possibly one that only delegates to the original type.
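A minimal sketch of such a delegating reference type, assuming a hypothetical FileOffset class that plays the role a typedef such as off_t plays in C.

```java
// Hypothetical type that would be a typedef of an integer type in C.
final class FileOffset {
    private final long m_value;

    FileOffset(long value) {
        m_value = value;
    }

    long toLong() {
        return m_value;
    }

    // Delegates the comparison to the underlying primitive.
    boolean isBefore(FileOffset other) {
        return m_value < other.m_value;
    }
}
```

Unlike a typedef, the class is a distinct type, so a FileOffset cannot be confused with an arbitrary long at a call site.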
Standard library
The large Java class library takes the place of C's (and C++'s) standard
library. Code implemented there will also be portable and may
also be optimized.
[Use stuff offered from libraries.]
NULL macro
Java has a built-in null literal instead of C's (optional)
NULL macro.
Java is clear about the usage of null; C was not clear
about NULL and therefore needed guidelines about that
issue.
Register
The altruistic register keyword does not make so much sense
in Java, which is compiled to JVM bytecode.
Auto
[Do not use auto.]
Goto
[Do not use goto. Gotos may lead to a confusing
program flow. Appreciate also break and continue
instead of goto.]
Multiple returns
[The opinions differed about using multiple return
statements in a function. The author sees multiple return statements as
a good micro design construct.]
Java additionally offers exceptions for erroneous control flow.
Obscurities
[Avoid the use of the logic operators && and || as standalone
statements. Do not use f() && g(); instead of if (f()) g();.]
[Do not overuse the comma operator.]
Topics left out
[The style topics intentionally left out in the original paper
were such things as the pros and cons of the ternary operator
or the pros and cons of code optimization.]
Indentation
[Indentation style may be considered as not so important.
Filters or editor facilities could be used to adjust source code.]
[However, independently of the indent style used, it should be
consistent and may be independent of tabulator
settings.]
Tabulators
[Making no assumptions about tabulator settings restricts one to
using either only tabulators or only blanks to indent, but not both.
One tab per indentation level may make some sense.]
Braces
[Opening and closing braces can either appear on a line of their own or
on the preceding line. Closing braces placed right after the last
statement (Lisp style) are rarely seen.]
[The use of the above styles can differ between code and data and can differ between top level code braces and function level code braces.]
[Braces may or may not be omitted in control blocks if the block covers
one or zero statements.
It is more consistent and more robust to generally use braces with the
control constructs.]
Labels
[Switch labels can either appear adjusted to the outer block indent
level or to the inner.]
[Goto labels can be adjusted to the left margin, one indent level less than the next statement, or on the same level as the next statement.]
[Label placement should be consistent.]
Blanks
[Only few C tokens (identifiers, operators, etc.) really
require a blank as delimiter (e.g. else if and int i).
However, one blank is typically used between
two tokens, to make source code more readable. Exceptions are
the high precedence and unary operators, parentheses/brackets/braces
and comma/semicolon.]
[Often, there is an intentional distinction between control constructs
(if (...)) and function calls (f(...))
concerning their spacing.]
[Be consistent about using blanks between tokens.]
Comments
[A programmer may decide to create only comments on lines of their own.]
[Some evident information might be included in source files: a short description of functionality, revision date (maybe generated by version control), author(s), maybe a copyright and maybe revision history.]
[Do not comment the obvious. Do not use comments as substitutes for speaking identifier names.]
Tool directive comments (e.g. Lint's /*ARGSUSED*/ or /*EMPTY*/)
do not apply in Java.
Java introduced the javadoc comments, used to comment classes'
and methods' semantics. The format of these comments is somewhat
specified, and additionally defines a set of applicable attributes
(version, author, etc).
Some of the javadoc ideas have been adapted to C++ and may also be
supported in a C context.
Block comments
[As blocks represent a functional entity, they are candidates
for comments.]
[Commenting the closing brace of a block was a topic of discussion. Note
that a suitable editor typically lets the user jump to the corresponding
opening brace and is able to highlight both braces or whole blocks.
The same applies for #endif.]
Conclusions
Unlike C, Java is a programming language that enforces
some syntactical issues that make the need for guidelines less
vital than in the C programming language.
However, where Java did not improve upon C, both languages typically
share some guidelines, especially style guidelines.
References
[BK/DR,1988] The C Programming Language, Brian W. Kernighan, Dennis M. Ritchie, Prentice Hall, 2nd Ed. 1988, ISBN 0-13-110362-8
[BS,1997] The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley, 3rd Ed. 1997, ISBN 0-201-88954-4
[EL,1998] C programming language coding guidelines, Eric Laroche, 1998, URL http://www.lrdev.com/lr/c/ccgl.html
[JG/BJ/GS/GB,2000] The Java Language Specification, James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Addison-Wesley, 2nd Ed. 2000, ISBN 0-201-31008-2