How to identify the end of lines used in a text file

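It’s pretty easy to find files that contain Windows line endings (CRLF) with GNU grep. Here -l lists only the names of matching files, -P enables Perl-style regular expressions so that \r is understood, and -U keeps grep from stripping CR characters on platforms where it otherwise would:

grep -lUP '\r$' *

And if you need to find files that contain at least one non-empty line with a Unix line ending (a bare LF):

grep -lUP '[^\r]$' *

Note that a file with mixed line endings is listed by both commands.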

But you may need more, for instance to find out whether a file has mixed line endings. In that case, give “file” a try:

$ file mixed unix windows
mixed:   ASCII text, with CRLF, LF line terminators
unix:    ASCII text
windows: ASCII text, with CRLF line terminators

If you need even more information, for instance the number of CRLF and LF line endings in each file, you can use the following C program (eol-id). Its output looks like this:

$ ./eol-id mixed unix windows
mixed LF=3 CRLF=3 VERDICT:MIXED
unix LF=3 VERDICT:LF
windows CRLF=3 VERDICT:CRLF

Here is the code (eol-id.c):

#include <stdio.h>
#include <stdlib.h>

#define CR 0x0D
#define LF 0x0A

int readfile(const char *filename) {
    /* Binary mode: text mode on Windows would translate CRLF to LF. */
    FILE * fptr = fopen(filename, "rb");

    if ( fptr == NULL ) {
        fprintf(stderr, "Failed to open %s\n", filename);
        exit(1);
    }

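    int current;        /* byte just read, or EOF */
    int previous = 0;   /* last byte, reset to 0 once used in a pair */
    long cr = 0;        /* lone CR */
    long lf = 0;        /* lone LF (Unix) */
    long crlf = 0;      /* CR followed by LF (Windows) */
    long lfcr = 0;      /* LF followed by CR */
    int result = 0;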

    do {
        current = fgetc (fptr);
        switch (current) {
            case CR:
                if ( previous == LF ) {
                    lf--;
                    lfcr++;
                    previous = 0;
                }
                else {
                    cr++;
                    previous = current;
                }
                break;
            case LF:
                if ( previous == CR ) {
                    cr--;
                    crlf++;
                    previous = 0;
                }
                else {
                    lf++;
                    previous = current;
                }
                break;
            default:
                previous = current;
                break;
        }
    } while (current != EOF && ! result );

    fclose(fptr);

    printf("%s", filename);

    int n = 0;
    /* Default covers files that contain no line terminator at all. */
    const char *verdict = "NONE";
    if ( lf > 0 ) {
        printf(" LF=%ld", lf);
        verdict = "LF";
        n++;
    }
    if ( crlf > 0 ) {
        printf(" CRLF=%ld", crlf);
        verdict = "CRLF";
        n++;
    }
    if ( lfcr > 0 ) {
        printf(" LFCR=%ld", lfcr);
        verdict = "LFCR";
        n++;
    }
    if ( cr > 0 ) {
        printf(" CR=%ld", cr);
        verdict = "CR";
        n++;
    }
    if ( n > 1 ) {
        verdict = "MIXED";
    }
    printf(" VERDICT:%s\n", verdict);

    return result;
}

int main(int argc, char **argv) {
    int i;
    for ( i = 1 ; i < argc ; i++ )
        readfile(argv[i]);
    return 0;
}
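
To try the program, the three sample files from the example run can be generated with a small helper like the one below. This is only a sketch: the function names (write_sample, write_all_samples) and the line text are made up for illustration, and the contents are an assumption chosen so that the terminator counts match the example (three LF in unix, three CRLF in windows, three of each in mixed). The files are written in binary mode so that no newline translation takes place.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `len` bytes of `content` to `path` in binary mode, so the line
 * terminators land on disk exactly as given. Returns 0 on success. */
int write_sample(const char *path, const char *content, size_t len) {
    assert(path != NULL && content != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(content, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical contents: the words are arbitrary, only the terminators matter. */
static const char UNIX_SAMPLE[]    = "one\ntwo\nthree\n";
static const char WINDOWS_SAMPLE[] = "one\r\ntwo\r\nthree\r\n";
static const char MIXED_SAMPLE[]   = "one\ntwo\r\nthree\nfour\r\nfive\nsix\r\n";

/* Create the files "unix", "windows" and "mixed" in the current directory. */
int write_all_samples(void) {
    if (write_sample("unix", UNIX_SAMPLE, strlen(UNIX_SAMPLE)) != 0)
        return -1;
    if (write_sample("windows", WINDOWS_SAMPLE, strlen(WINDOWS_SAMPLE)) != 0)
        return -1;
    if (write_sample("mixed", MIXED_SAMPLE, strlen(MIXED_SAMPLE)) != 0)
        return -1;
    return 0;
}
```

Calling write_all_samples() from a small main and then running ./eol-id mixed unix windows should report the same counts as the example run, given these assumed contents.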


One Response to How to identify the end of lines used in a text file

  1. Joel says:

    also consider dos2unix and unix2dos tools for conversion

    http://www.linuxcommand.org/man_pages/dos2unix1.html
