A critical discussion of the author's C programming language coding guidelines concerning their applicability in the Java programming language

Eric Laroche
March 17 2002

Copyright © 2002 Eric Laroche
All rights reserved

The author's C programming language coding guidelines [EL,1998] can be found at <URL:http://www.lrdev.com/lr/c/ccgl.html>.

Introduction

This paper discusses the author's C programming guidelines with respect to their applicability or non-applicability in the Java programming language. The original paper gave a comprehensive insight into those guidelines: many aspects needed to define them were discussed thoroughly, and much of the background behind the dos and don'ts of C coding was provided.

As in the original paper [EL,1998], some of the focus goes to the abstractions and considerations that lead to coding guidelines.

Considerations from the original paper that are also valid for Java, either because of similarities between C and Java or because they are language independent, are typically kept in brackets []. This paper follows the original paper's structure.

Aim of coding

General coding aims do not depend on the programming language used, so it is not surprising that C and Java share the same aims.

[Software implementations should have error free design, error free algorithms and an error free code architecture.]

[Software defects can be a security problem.]

Software defects are a security problem in Java too. However, one of the more common flaws in C and C++, the buffer overflow, is caught in Java: it is either handled properly or ends in a graceful process termination triggered by an uncaught exception. The latter can of course be exploited for denial of service, but that is much less of a threat than the classical buffer overflow exploits.

[Then, software engineering aims are maintainability (ease of expanding or correcting code) and reusability.]

The fact that all Java code is class method code, and that class design is typically considered library design, may implicitly lead to better code reusability. It also implies that any Java code fragment is part of a kind of software library (and hence follows somewhat stricter design requirements).

With an object oriented design approach, one generally tends not to make a distinction between small projects, which could be done without much software engineering consideration, and larger ones which definitely require such.

In addition to (easy) reusability there is the easy usability of classes and interfaces (say, libraries), since it is a Java pattern to offer a large number of classes and interfaces.

Aim of the original paper

[The intention of the original paper was to summarize considerations made about fulfilling the coding aims mentioned above and making (C) code less defective, more robust (against changes in code architecture) and more readable (for easier maintenance).]

[Design issues were not covered much, since they are often independent of the programming language used. Nor were the efficiency and economy of a programming language or runtime environment covered. The ability of runtime environments to compile functions (possibly from virtual machine bytecode into native binary code, maybe in a hotspot manner) was however mentioned.]

Since runtime efficiency was not part of the considerations, the difference between C/C++'s (optionally) non-virtual method calls and Java's enforced virtual ones, as well as dynamic linking issues, are not considered here.

Considered C language definition

The C programming language definition considered in the original paper was ISO/IEC 9899:1990. (Older names were ANSI-C or K&R2.) [BK/DR,1988]

The compared Java version is the Java Language Specification Second Edition. [JG/BJ/GS/GB,2000]

Internationalization

Coding character set

[One should limit code to the ASCII character set. Comments, strings and character literals should not contain non-ASCII characters.]

The Unicode escape notation should be used for non-ASCII characters, e.g. '\u00e8'. In C, the '\xe8' notation was to be used.
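As a small illustration (the class and constant names are hypothetical, not taken from the original paper), a non-ASCII character can be written with a Unicode escape so that the source file itself stays pure ASCII:

```java
// Hypothetical example: the source stays ASCII-only,
// the non-ASCII character is written as a Unicode escape.
class UnicodeEscapeSample {
    // U+00E8 is LATIN SMALL LETTER E WITH GRAVE.
    static final char E_GRAVE = '\u00e8';

    // Escapes work inside string literals as well.
    static final String WORD = "tr\u00e8s";

    public static void main(String[] args) {
        // Print the numeric value of the character and the string length.
        System.out.println((int) E_GRAVE);
        System.out.println(WORD.length());
    }
}
```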

Coding language

[English is still the language best understood by most software engineers, and should exclusively be used.]

[Locales typically control a program's user interface customization to a local language.]

Names

[Names are used to identify functions, variables, types, members, macros, etc.]

In Java, functions are manifested as class members (which would by the way not affect their naming, considering typical C++ naming rules not mentioned here), and additionally there is the package construct.

[There were many styles of choosing identifier names in C. The important rule was to be consistent about naming.]

Java introduced a (strict) naming scheme. It builds on mixed-case names and distinguishes types (classes) from variables, methods and packages by letting type names begin with an upper case letter.

[The smaller the scope of a name, the less important a good choice of the name is.]

That is still true for the choice of Java variable names. Member naming should however be strict, even for private members. Parameter names seem less critical, since a parameter's semantics are often adequately defined by the member name and the parameter type (provided those follow the same strict naming conventions, i.e. are self-explanatory).

In Java, the absence of restrictions on name lengths (imposed e.g. by C linkers or by older, narrow terminals; see e.g. the samples in the C specification) allows consistent, non-abbreviated naming. The importance of not abbreviating lies in the intuitive grasp of both a name and its semantics, which lets interfaces be understood more quickly. This is especially important with the many classes and interfaces seen in Java.

There have been further approaches to distinguish data member names from local variable and parameter names. One approach often seen is to prefix data member names with m_. Such a prefix allows unprefixed names to be used in the same class context without (superficial) ambiguities, and makes uses of class state (data members) quick to spot. (Note that otherwise, the underscore character is not seen in names.)

[The naming should still be in English only, with internationalization done at a totally different location.]

Name styles

[In C, other schemes than the Java naming scheme were used too (use of underscore characters, use of natural language context, etc).]

Type prefixes were sometimes used in C with primitive types, e.g. sz (zero terminated string), p (pointer), etc. These are of less use in an object oriented approach, and in Java especially.

Package prefixes used in function names, such as t_ (in the TLI API) are not needed in Java or other package/namespace using programming languages.

Name restrictions

Problems with case-insensitive linkers or restrictions on the name length are not known to be present in Java. However, it is imaginable that for really small environments (e.g. small embedded systems) names could be shortened (but it is assumed that that step takes place after bytecode compiling, therefore not considered a Java language relevant issue).

Namespace

[A program's identifier namespace is not partitioned in the C programming language, unlike in Lisp, Java or C++ (3rd Ed.).]

In C, libraries tended to include package membership information in their names (e.g. t_open, t_bind, etc. of the TLI network interface library) to avoid name collisions. However, it was nearly impossible to find short unique package prefixes.

Java does not suffer from the problem of finding unique package names, since there is a proposed unique package naming scheme, based on another unique hierarchical namespace, namely the internet domain name scheme.

File names

In C, the naming of source files, header files, libraries, directories and projects was not a programming language issue. In Java, source file names are bound to class names and directory names are bound to package names. Header files are not present. However, library (jar) naming does not (yet) seem to be specified.

Since file names are bound to class names and directory names to package names, the character set allowed for file and directory names is essentially limited to letters, digits and the underscore. C file name policies were (a little bit) more open (see the original paper for the suggested character set).

Compiler warnings

[It is demanded that the sources compile without warnings at the highest compiler warning level.]

This of course holds true for Java as well.

Hardware-near C warnings such as "possible bad alignment" are of no relevance in Java.

Style warnings

[Some compiler warnings revealed bad coding style rather than errors or incompatibilities. These bad styles should be avoided.]

The warning "assignment in conditional expression" is not so relevant in Java, since (non-boolean) assignments are not implicitly converted to a boolean value. In C, such assignments had to be avoided by using more suitable code constructs.
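A minimal sketch of this point, with hypothetical class and method names: an int assignment used as a condition is rejected by the compiler, while an assignment of boolean type still compiles and therefore remains a possible pitfall.

```java
// Hypothetical example of assignments in conditions.
class AssignmentInConditionSample {
    static String describe(int n) {
        // if (n = 0) { ... }  // rejected by the compiler: int is not boolean
        if (n == 0) {
            return "zero";
        }
        return "non-zero";
    }

    static boolean pitfall(boolean flag) {
        // Compiles, because the assignment expression has type boolean;
        // the incoming value of flag is overwritten before it is tested.
        if (flag = true) {
            return true;
        }
        return false;
    }
}
```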

C/C++'s warning "comparison of signed and unsigned values" is not relevant in Java since Java does not employ unsigned numbers. One may have noticed that the author suggested typically not to use unsigned values in C at all.

Helper tools

[Helper tools (such as the lint checker) should be employed as much as possible, to enhance code quality.]

[A further (however not automated) tool can be seen in code reviews by humans.]

Lint

A check tool similar to C's lint is (currently) not known in Java. Lint is the traditional Unix development tool, originally designed for an older, less type-checking version of the C language (K&R1).

Most of lint's facilities are not needed in Java. These include e.g. compensating for the insufficient type checking of older C compilers. Another lint feature, a check for inter-module incompatibilities, is not needed since Java does not support implicit declarations or separate explicit declarations of functions or data. Also, Java allows fewer casts and even fewer implicit casts, which are, by nature, problematic.

Nothing speaks against incorporating lint's stricter syntax checks into the (Java) compilers. Also, things such as detecting dead (unreachable) code are assumed to be performed by current Java compilers.

Metrics

Metrics may be less important in a strict object oriented language such as Java, since the desired level of modularization may be determined by class design already.

Part of commenting required by metrics may be enforced with javadoc requirements (e.g. a javadoc string per method, etc).

Assertions

[Assertions are an alternative to runtime checkers. They allow functions' preconditions and postconditions (and more) to be checked at runtime.]

In Java, assertions are typically replaced by argument checking and possibly throwing IllegalArgumentExceptions or other RuntimeExceptions. These are easy to use, since they need not be declared in the method's signature.
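The following sketch shows such an argument check. The class, the method and its contract are hypothetical and serve only as illustration; note that no throws clause is needed, since IllegalArgumentException is a RuntimeException.

```java
// Hypothetical example of precondition checking with unchecked exceptions.
class ArgumentCheckSample {
    private static final int PERCENT = 100;

    // Returns part/whole as a percentage, rounded toward zero.
    static int percent(int part, int whole) {
        if (whole <= 0) {
            throw new IllegalArgumentException("whole must be positive: " + whole);
        }
        if (part < 0 || part > whole) {
            throw new IllegalArgumentException("part out of range: " + part);
        }
        // long arithmetic avoids overflow of the intermediate product
        return (int) ((long) PERCENT * part / whole);
    }
}
```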

Runtime checkers

The runtime checkers that check for array boundaries, dynamic data boundaries, missing memory deallocation, function/system calls with bad arguments, etc., are typically not needed in Java, since most of this is caught explicitly by the Java runtime environment, which of course makes Java per se more robust.

Code complexity

[One of the aims of coding guidelines may be to keep code complexity as low as possible (complex meaning: hard to read, error prone).]

The object oriented design approach is one means to do this.

The thorough use of fine-granular methods, possibly by implementing many small Java interfaces, may lead to more readable and more robust implementations. Further, this approach tends to produce more (and possibly better) abstractions, which further enhances code quality.

Nested expressions

Java is, like C, an orthogonal language that allows expressions to be chained and nested, with the known problems: code can become unreadable, source code debugging does not show intermediate data, etc. Therefore, nested expressions should be avoided in both languages.

[Temporary variables should not be avoided in order to enhance performance. Temporary variables can be optimized away by the compiler.]

Redundancy

[Code fragments should not be repeated. Redundant code is harder to maintain and increases the probability of introducing defects.]

[Implementing more general functions/classes might be considered instead.]

Determinism

[Deterministic code should be written. Searching for bugs that show non-deterministic symptoms is an unpleasant task, at best.]

C's explicit initialization of variables or buffers may be replaced by Java's implicit initialization.

The often used null assignment to freed C pointers of course has different semantics in Java. Java does not require memory to be freed explicitly (since it is garbage collected). However, object references that are no longer needed must be released explicitly to let the objects be garbage collected.
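A typical case where a reference has to be released explicitly is a container that keeps its own array of slots. The following is a minimal sketch with hypothetical names, not a replacement for the library's collection classes:

```java
// Hypothetical fixed-capacity stack, illustrating explicit reference release.
class ReferenceReleaseSample {
    private final Object[] slots;
    private int count = 0;

    ReferenceReleaseSample(int capacity) {
        slots = new Object[capacity];
    }

    void push(Object item) {
        if (count == slots.length) {
            throw new IllegalStateException("stack full");
        }
        slots[count++] = item;
    }

    Object pop() {
        if (count == 0) {
            throw new IllegalStateException("stack empty");
        }
        Object item = slots[--count];
        // Without this assignment the array would keep the object reachable,
        // and the garbage collector could not reclaim it.
        slots[count] = null;
        return item;
    }

    int size() {
        return count;
    }
}
```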

Modularity

[Modularity is an important key for code maintainability and handling complexity.]

The module abstraction layer used in C to decrease problem complexity is replaced by Java's object oriented approach, i.e. its class concept and class inheritance. Class design determines the modularization.

Interfaces

Java does not separate interface definition from implementation (for concrete classes), so the corresponding C guidelines for C header files do not apply here.

The tendency of designing one interface file for each C source file or each C++ class is well enforced in Java by typically having exactly one (public) class per source file.

However, Java does not allow facades to be created on top of existing interface definitions, unless proxy code or additional facade interfaces are involved. Java's interfaces are well suited to implement the facade pattern. So the counterpart of C's 'external header' typically consists of one or more Java interfaces.

Header files

The explicit requirement of C header files to be included by both the implementing module and the calling module(s) is not present in Java, since the interface and implementation are not separated. Having no header files not only seems easier to maintain but also avoids C's nasty bugs in case of an altered implementation file not including its out-of-sync header file.

C's file scope is replaced by Java's class scope, so the file scope access modifier (static) is not used in that context in Java.

C/C++'s include guards are not needed either.

Resources

[Resources are data-only modules. Samples are X11 bitmap files.]

A Java VM may allow copy-on-demand on any data to share it between threads (or other entities), without requiring an explicit const modifier.

Code order

Java code order (i.e. order of function members in a source file) is determined by interface order. That one is probably guided by readability, grouping of semantics, etc.

The 'Pascal style' local C function usage (definition before use, no declaration) made it possible to omit the declaration's overhead. This overhead is not present in Java anyway.

[Error handling should typically be done without delay.]

Conditional compiling

Java does not provide conditional compiling. Note that the opinions about conditional compiling in C differed (i.e. some did not use it at all).

For platform specific stuff, the alternative of choice in Java may be to provide different implementations for an interface (implementation polymorphism).
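As a sketch of this idea, with all names hypothetical, a platform dependent detail is hidden behind an interface and the implementation is selected at runtime instead of by an #ifdef:

```java
// Hypothetical interface with one implementation per platform family.
interface PathJoiner {
    String join(String directory, String file);
}

class SlashPathJoiner implements PathJoiner {
    public String join(String directory, String file) {
        return directory + "/" + file;
    }
}

class BackslashPathJoiner implements PathJoiner {
    public String join(String directory, String file) {
        return directory + "\\" + file;
    }
}

class PathJoiners {
    // Chooses an implementation from an operating system name.
    static PathJoiner forOsName(String osName) {
        if (osName.toLowerCase().startsWith("windows")) {
            return new BackslashPathJoiner();
        }
        return new SlashPathJoiner();
    }

    // The "os.name" system property is provided by the Java runtime.
    static PathJoiner forCurrentPlatform() {
        return forOsName(System.getProperty("os.name"));
    }
}
```

Callers only see the PathJoiner interface. For this particular detail real code would normally rely on the class library (e.g. java.io.File); the example only illustrates the pattern.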

Code nesting

[Block constructs should not be nested too deeply. Nested loops tend to get hard to read. Not more than about two levels of nesting should be used.]

The use of throw statements may reduce nesting and make code more readable.
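The effect can be sketched as follows (hypothetical class, method and limits): each failed check leaves the method immediately with a throw, so the regular path stays at a single nesting level instead of sitting inside several nested if/else blocks.

```java
// Hypothetical example: early throws keep the main path unnested.
class GuardClauseSample {
    private static final int MIN_PORT = 1;
    private static final int MAX_PORT = 65535;

    static int parsePort(String text) {
        if (text == null) {
            throw new IllegalArgumentException("port text is null");
        }
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + text);
        }
        if (port < MIN_PORT || port > MAX_PORT) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }
}
```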

Scope

Java additionally provides class scope, class hierarchy (protected) scope and package (default) scope. C's global scope, function scope and block scope are provided as well. File scope is replaced by class scope, since Java source files often (and preferably) consist of exactly one class.

[Choosing scope is an important micro architecture instrument. Generally, scope should be chosen as narrow as possible.]

Scope of functions

In Java, function member scope is defined by a class' (public) interface. Function members that are not part of that public interface (e.g. 'helper' functions) should be private, protected or of default access, but not public.

[Narrow scopes encapsulate code.]

Scope of variables

[The variable scope should be as small as possible. The opinions about using variable declarations in C block scope differed.]

Java, like C++ but unlike C, lets local variables be declared anywhere in the code (not just at the beginning of a block). Together with the rule 'declaration is initialization', this leads to variables implicitly being declared in the innermost possible block.

[Global (application) scope should generally be avoided.]

Public member variables should be avoided. An even stricter approach is to only use private member variables. This approach would correspond to avoiding C's global scope.

Scope of types

Types (classes) can have global scope or be limited to package scope. These two scopes would correspond to the proposed global and file scope in C.

Scope of macros

Macros are not supported by Java.

Error prone constructs

C keeps (unlike some other languages) some error prone constructs ready.

Java does not experience problems with explicit casts, array sizes, buffer sizes or macros, and has fewer problems with error checking (through the use of exceptions).

Explicit casts

C compilers do not generate errors or warnings on semantically wrong explicit casts. Java forbids some casts at compile time and throws ClassCastExceptions on others at runtime, so problems rarely arise from explicit casts in Java.
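Both cases can be sketched briefly (class and method names are hypothetical): a cast between unrelated classes is rejected by the compiler, and a cast that is only wrong for the actual object fails at runtime with a ClassCastException.

```java
// Hypothetical example of checked casts.
class CastSample {
    // Reports whether casting the given value to String fails at runtime.
    static boolean castToStringFails(Object value) {
        // Integer n = (Integer) "text";  // rejected at compile time
        try {
            String text = (String) value; // checked at runtime
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```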

[Explicit casts should be used as rarely as possible. In object oriented programming, casts can be seen as indication of a design error in some cases.]

Type size

In C, it was advantageous to know the integer size (int could e.g. be 16 bits, 32 bits, 64 bits), which depended on processor, operating system and compiler. Java on the other hand defines the number types byte, short, int and long to be 8, 16, 32 and 64 bits wide.

[Signed types (in contrast to unsigned types) were often enough for the problem domain.]

Array size

In the C programming language, array sizes could be referenced in a non-robust manner. The robust sizeof constructs are not needed in Java, since arrays carry an (implicit) length attribute.
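A minimal sketch (hypothetical names): the method needs no separate size parameter and no sizeof-style computation, because the array itself reports its length.

```java
// Hypothetical example: the array carries its own length.
class ArrayLengthSample {
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
}
```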

Buffer sizes

The problem of buffer function arguments lacking their size does not arise in Java, since buffers implicitly carry their length. In case of out-of-bound writes, appropriate exceptions are thrown, which technically allows the problem to be handled. In garbage collected systems (such as Lisp or Java), dynamic data is often returned instead of buffer arguments being used.
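The behaviour on an out-of-bound write can be sketched as follows (hypothetical names). The write is refused with an ArrayIndexOutOfBoundsException and neighbouring data stays intact:

```java
// Hypothetical example: out-of-bound writes raise an exception.
class BoundsCheckSample {
    // Tries to store value at index; reports whether the write succeeded.
    static boolean tryWrite(byte[] buffer, int index, byte value) {
        try {
            buffer[index] = value;
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }
}
```

Catching the exception this close to the write is done here only to make the behaviour visible; ordinary code would rather check the index against buffer.length beforehand.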

C suffered from the problem of buffer overflows, which corrupted other data, stack frames or memory heap management data, and which were sometimes hard to detect and locate.

[Buffer overflows are security problems.]

Macro parameters

Java does not support macros and their call-by-name semantics. So Java does not suffer from operator precedence problems associated with this, or unwanted multiple side effects.

Macro side effects

Java does not support macros.

Sign extension

C's typical sign extension problem with converting char to int is not so much a problem in Java.
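One plausible reading, not spelled out in the original text, is that Java's char is an unsigned 16 bit type, so widening a char to int never yields a negative value. Java's byte, however, is signed and is sign-extended when widened. The following sketch (hypothetical names) shows both cases and the usual masking idiom:

```java
// Hypothetical example of widening conversions to int.
class SignExtensionSample {
    // char is unsigned: the result is never negative.
    static int widenChar(char c) {
        return c;
    }

    // byte is signed: values with the high bit set become negative.
    static int widenByte(byte b) {
        return b;
    }

    // Masking with 0xff yields the unsigned value 0..255.
    static int unsignedByte(byte b) {
        return b & 0xff;
    }
}
```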

Error checking

[Missing error checks will lead to bugs.]

Exceptions theoretically make it easier to implement error checking code.

Sequence points

The missing sequence points problem and the associated undefined behavior, as expressed in C statements like *p++ = *p++ = 0;, seem much less present in object oriented languages, and in Java especially.
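The Java Language Specification defines the evaluation order of operands (left to right), so the Java counterpart of the C statement above has a defined result. The sketch below uses hypothetical names; that the result is defined does not make such code readable, and it should still be avoided.

```java
// Hypothetical example: evaluation order is defined in Java.
class EvaluationOrderSample {
    static int[] fill() {
        int[] a = new int[3];
        int i = 0;
        // The left index (0) is evaluated first, then the right-hand
        // assignment stores 7 at index 1, then 7 is stored at index 0.
        a[i++] = a[i++] = 7;
        return a;
    }
}
```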

[It would be nice if undefined behavior through missing sequence point definition was generally diagnosed by compilers.]

Optimizer errors

[Sometimes hard-to-track errors originate from errors in the compiler's optimizer step.]

[Most of the development shall be done without optimization. When (and if) switching to optimized release code, test cases must be run to check integrity.]

Style

Numbers

[Numbers (numeric constants) should typically not appear hardcoded in source code. (Few of the numbers make an exception, e.g. 0 or 1.) Most of the values can be made configurable (maybe with reasonable defaults) or can be deduced from other quantities.]

Numerical error return values (in C often -1) are not so much used in Java, which rather throws exceptions in error cases.

Numbers should especially not be hardcoded if they impose limits on something (e.g. on input sizes). This is of lesser importance in Java, since (fixed) buffers, as often used in C, are not so much present. Java uses dynamic data more heavily than C.

[Hidden dependencies among constants should not be generated. Constants should be defined by means of constants they derive from. (Compilers are quite able to do arithmetics at compile time.)]

Unsigned numbers

In C, one may consider not using unsigned values at all. Java does not support unsigned numbers.

C did not bring runtime checks on underflows, so the programs were not more robust by means of using unsigned values.

[Again: the problem domain should be known.]

Longs, shorts

Using longs was often an issue on 16-bit systems, to get (at least) 32 bits. In Java, longs are 64 bits.

Sometimes the assumption was that C implementations define a long to be exactly 32 bits, which is however not specified by the C language standard. Java, on the other hand, defines the bit sizes exactly.

Problems with data structure alignment are not present in Java since object data should not and cannot be copied directly.

Floats

[One may possibly avoid floating point numbers, since the problem domain often is of discrete nature.]

[Many problems are solvable without using floats. E.g. a typical hashtable high-water-mark of 0.75 may be expressed by a ratio and handled by integer arithmetics: if (4 * items > 3 * size).]
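The integer form of the high-water-mark test can be sketched as follows. The names are hypothetical, and the use of long arithmetic is an addition of this sketch to keep the products from overflowing for large counts:

```java
// Hypothetical example: a 0.75 load factor expressed as the ratio 3/4.
class LoadFactorSample {
    private static final int RATIO_NUMERATOR = 3;
    private static final int RATIO_DENOMINATOR = 4;

    // True if items / size exceeds 3/4, using integer arithmetic only.
    static boolean aboveHighWaterMark(int items, int size) {
        return (long) RATIO_DENOMINATOR * items > (long) RATIO_NUMERATOR * size;
    }
}
```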

[Avoid single precision float. Use double.]

Parameter types

C/C++ knew the notation char* argv[] instead of the preferred char** argv. Java does not use the dereference operator *.

Variable arguments

Variable arguments are not supported in Java. They are not type safe. Java and C++ have better ways to handle typical variable argument applications: overridden toString methods in Java and ostream shift operators in C++.
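What would be a printf-style call in C can be sketched in Java with string concatenation, which relies on each object's toString method. The classes below are hypothetical:

```java
// Hypothetical value class with its own string representation.
class PointSample {
    private final int x;
    private final int y;

    PointSample(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public String toString() {
        return "(" + x + ", " + y + ")";
    }
}

class MessageSample {
    // Concatenation converts each operand in a type safe way;
    // for objects, toString is called implicitly.
    static String describe(PointSample target, int steps) {
        return "moved to " + target + " in " + steps + " steps";
    }
}
```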

Portability types

Type aliases (as created by C's typedef) are not an issue in Java. If a type deserves its own name, it should get its own reference type, possibly one that only delegates to the original type.
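Where C code might have used a typedef for a string type, a Java counterpart could look like the following sketch. The class name and its use are hypothetical; the class merely delegates to the wrapped String.

```java
// Hypothetical named type that delegates to String.
final class CustomerId {
    private final String value;

    CustomerId(String value) {
        if (value == null) {
            throw new IllegalArgumentException("value is null");
        }
        this.value = value;
    }

    public boolean equals(Object other) {
        return other instanceof CustomerId
            && value.equals(((CustomerId) other).value);
    }

    public int hashCode() {
        return value.hashCode();
    }

    public String toString() {
        return value;
    }
}
```

Unlike a typedef, such a type cannot be confused with a plain String by the compiler, so a method expecting a CustomerId will not accept an arbitrary string.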

Standard library

The large Java class library takes the place of C's (and C++'s) standard library. Code implemented there will also be portable and may also be optimized.

[Use stuff offered from libraries.]

NULL macro

Java has a null keyword instead of C's (optional) NULL macro. Java is clear about the usage of null, C was not clear about NULL and therefore needed guidelines about that issue.

Register

The altruistic register keyword does not make much sense in Java, which is compiled to JVM bytecode.

Auto

[Do not use auto.]

Goto

[Do not use goto. Gotos may lead to a confusing program flow. Appreciate also break and continue instead of goto.]

Multiple returns

[The opinions differed about using multiple return statements in a function. The author sees multiple return statements as a good micro design construct.]

Java additionally offers exceptions for erroneous control flow.

Obscurities

[Avoid the use of the logic operators && and || as standalone statements. Do not use f() && g(); instead of if (f()) g();.]

[Do not overuse the comma operator.]

Topics left out

[The style topics intentionally left out in the original paper were such things as the pros and cons of the ternary operator or the pros and cons of code optimization.]

Indentation

[Indentation style may be considered as not so important. Filters or editor facilities could be used to adjust source code.]

[However, independently of the indent style used, it should be consistent and may be independent of tabulator settings.]

Tabulators

[Making no assumptions about tabulator settings restricts to either use only tabulators to indent or only blanks, but not both. One tab per indentation level may make some sense.]

Braces

[Opening and closing braces can either appear on a line of their own or on the preceding line. Closing braces placed right after the last statement (Lisp style) are rarely seen.]

[The use of the above styles can differ between code and data and can differ between top level code braces and function level code braces.]

[Braces may or may not be omitted in control blocks if the block covers one or zero statements. It is more consistent and more robust to generally use braces with the control constructs.]

Labels

[Switch labels can either appear adjusted to the outer block indent level or to the inner.]

[Goto labels can be adjusted to the left margin, one indent level less than the next statement, or on the same level as the next statement.]

[Label placement should be consistent.]

Blanks

[Only few C tokens (identifiers, operators, etc.) really require a blank as delimiter (e.g. else if and int i). However, one blank is typically used between two tokens, to make source code more readable. Exceptions are the high precedence and unary operators, parentheses/brackets/braces and comma/semicolon.]

[Often, there is an intentional distinction between control constructs (if (...)) and function calls (f(...)) concerning their spacing.]

[Be consistent about using blanks between tokens.]

Comments

[A programmer may decide to create only comments on lines of their own.]

[Some evident information might be included in source files: a short description of functionality, revision date (maybe generated by version control), author(s), maybe a copyright and maybe revision history.]

[Do not comment the obvious. Do not use comments as substitutes for self-explanatory identifier names.]

Tool directive comments (e.g. Lint's /*ARGSUSED*/ or /*EMPTY*/) do not apply in Java.

Java introduced javadoc comments, used to document the semantics of classes and methods. The format of these comments is specified to some degree, and it additionally defines a set of applicable attributes (version, author, etc). Some of the javadoc ideas have been adapted to C++ and may also be supported in a C context.

Block comments

[As blocks represent a functional entity, they are candidates for comments.]

[Commenting the closing brace of a block was a topic of discussion. Note that a suitable editor typically lets the user jump to the corresponding opening brace and is able to highlight both braces or whole blocks. The same applies to #endif.]

Conclusions

Unlike C, Java enforces a number of syntactical rules, which makes guidelines less vital than in the C programming language.

However, where Java did not improve upon C, both languages typically share some guidelines, especially style guidelines.

References

[BK/DR,1988] The C Programming Language, Brian W. Kernighan, Dennis M. Ritchie, Prentice Hall, 2nd Ed. 1988, ISBN 0-13-110362-8

[BS,1997] The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley, 3rd Ed. 1997, ISBN 0-201-88954-4

[EL,1998] C programming language coding guidelines, Eric Laroche, 1998, URL http://www.lrdev.com/lr/c/ccgl.html

[JG/BJ/GS/GB,2000] The Java Language Specification, James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Addison-Wesley, 2nd Ed. 2000, ISBN 0-201-31008-2


Eric Laroche, laroche@lrdev.com, Sun Mar 17 2002
URL: <URL:http://www.lrdev.com/lr/java/cdccglj.html>