Question on fopen()/fread()

Anonymous

Hi All,

I would like to ask some questions about fopen() / fread() in CCS.

I have noticed that when the "mode" parameter of fopen() is set to "r" (reading) without an additional "t" (text) or "b" (binary), the stream is opened in text mode, which is the default under the C standard. In that case fread() reads the file only up to the first 0x1A byte.

stdio.h said:

extern _CODE_ACCESS FILE *fopen(const char *_fname, const char *_mode);
extern _CODE_ACCESS size_t fread(void *_ptr, size_t _size, size_t _count, FILE *_fp);

reading experiment said:

FILE *fp;
int data_array[10000];
int f_size;

fp=fopen("data","r");
f_size=fread(&data_array[0],4,10000,fp); //four bytes per item, 10000 items
fclose(fp);

The file "data" can be as large as 1MB, but the value returned by fread(), which is assigned to f_size, is only a small number (say, 300). Comparing the contents of data_array[] with the original file "data" confirms that only that much was actually read.

Because of the lack of documentation for fopen()/fread(), I turned to the Visual C++ documentation on MSDN and found that:

1. "t" (text) is the default mode for fopen()
2. In "t" mode CTRL + Z (^Z, ASCII 1A) is treated as the end of the file.
3. In "b" (binary mode) there is no such treatment.

fopen() in MSDN said:

t
Open in text (translated) mode. In this mode, CTRL+Z is interpreted as an end-of-file character on input. In files opened for reading/writing with "a+", fopen checks for a CTRL+Z at the end of the file and removes it, if possible. This is done because using fseek and ftell to move within a file that ends with a CTRL+Z, may cause fseek to behave improperly near the end of the file.

Also, in text mode, carriage return–linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return–linefeed combinations on output. When a Unicode stream-I/O function operates in text mode (the default), the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters (as if by a call to the mbtowc function). For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters (as if by a call to the wctomb function).

b
Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.
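For the reading experiment above, a minimal sketch of the binary-mode variant might look like the following. The helper name read_words_binary is hypothetical, and the code assumes a host with 8-bit chars where int32_t is four bytes; it is not taken from the TI RTS. With "rb", a 0x1A byte should come through as ordinary data.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: read up to max_items 32-bit words from `path`
 * in binary mode. Returns the number of complete items read, or 0 if
 * the file cannot be opened. Assumes 8-bit chars (sizeof(int32_t) == 4). */
size_t read_words_binary(const char *path, int32_t *dst, size_t max_items)
{
    /* "b" requests binary mode: no Ctrl-Z or CR+LF translation. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return 0;
    }

    /* fread returns a count of items, not bytes. */
    size_t n = fread(dst, sizeof dst[0], max_items, fp);

    fclose(fp);
    return n;
}
```

Called as read_words_binary("data", data_array, 10000), this should return the full item count even when the file contains 0x1A bytes.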

I have been wondering why, in "t" mode, a 0x1A byte is needed to mark the end of the file. Why is that necessary? Why can't fread() directly query the OS (Windows, etc.) for the size of the file and simply take the returned value on faith? In everyday use, if one wants to know the size of a file on disk, say an .avi movie, one can simply right-click the file name and choose "Properties", and Windows will report its size instantly. Why wouldn't fread() just ask for the same?

Was this behavior (looking for the end of the file in "t" mode) created because the designers of fopen() do not "trust" the OS? When a file is created locally or copied from another place, the OS should know its size precisely and would probably record it in the filesystem. Is it possible for someone to modify the file without using the OS's file access routines? For example, suppose a file whose original size is 100KB has been modified by some means other than the OS routines (magnetic, optical, etc.), so that its last 33KB has been moved to the front and overwrites the original content, and the OS is completely unaware of the change. In that case the recorded file size obviously no longer matches the file's actual state.

Cases like this are rare in everyday use, but they are still possible. Therefore, was it the intent of the designers of fopen()/fread() to let fopen() check the file size itself rather than get this information from the OS?

And what about accessing one OS's disk file from another, incompatible OS? In this case there is no way to ask for the file size directly, and any file access function (similar to fopen()/fread() but in the new OS, probably not written in C) needs to check the file size itself.

I have found that there could be so many different reasons and concerns for why C's fopen()/fread() needs to check for the file end itself. I could not determine which of them was the real intent of the designer.

Could anyone drop a few words on this?

Sincerely,
Zheng

over 15 years ago

0 George Mock over 15 years ago

TI__Guru**** 251550 points

fread returns the number of objects (not bytes) read. It serves as a way to check whether the fread succeeded. If it reads fewer objects than you expect, then you can issue an error diagnostic, throw an exception, or something like that. This return code has nothing to do with the size of the file.
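A minimal sketch of that check, using only standard C calls (fread, ferror). The read_status enum and the read_exact helper are hypothetical names introduced for illustration:

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
typedef enum { READ_OK, READ_SHORT_EOF, READ_ERROR } read_status;

/* Try to read exactly `count` items of `size` bytes each from fp.
 * If `got` is non-NULL, it receives the number of complete items
 * actually read. */
read_status read_exact(void *dst, size_t size, size_t count,
                       FILE *fp, size_t *got)
{
    size_t n = fread(dst, size, count, fp);
    if (got != NULL) {
        *got = n;
    }
    if (n == count) {
        return READ_OK;
    }
    /* Short read: ferror() separates an I/O error from plain
     * end-of-file (or, in text mode, an end-of-file marker). */
    return ferror(fp) ? READ_ERROR : READ_SHORT_EOF;
}
```

A caller that expects 10000 items and gets READ_SHORT_EOF with a much smaller count knows the stream ended early, and can then check whether the file was opened in the intended mode.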

Thanks and regards,

-George

0 Archaeologist over 15 years ago

TI__Guru* 84285 points

Zheng Zhao said:

..

Also, in text mode, carriage return–linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return–linefeed combinations on output.

Why cannot fread() directly query the OS (Windows, etc.) for the size of file and simply take the returned value on faith?

To oversimplify a complicated issue, the raw size of the file on disk might *not* accurately reflect the length in bytes after carriage-return/line-feed (CR+LF) translation. fopen() provides a simplified, abstract interface that tries to smooth over the differences between filesystems, so that a program reading and writing text does not have to know what kind of filesystem is in use. The most notable difference is between MS-DOS and Unix-like systems. In MS-DOS, a text file may end with a Ctrl-Z character (a convention inherited from CP/M), and each newline is represented with a two-character combination (CR+LF). In Unix, files end at the length listed by the filesystem, and newlines are represented with one LF character. See also http://en.wikipedia.org/wiki/Newline
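To make the size mismatch concrete, here is a rough sketch that simulates DOS-style text-mode input translation on an in-memory buffer. The function name dos_text_translate is hypothetical, and this is a simplification for illustration, not the actual code of the Microsoft or TI runtime libraries.

```c
#include <stddef.h>

/* Simulate DOS-style text-mode input translation:
 *   - a CR immediately followed by LF is dropped, leaving a single LF
 *   - the first Ctrl-Z (0x1A) is treated as end of file
 * Writes at most dst_cap bytes to dst and returns the number written. */
size_t dos_text_translate(const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;
    for (size_t i = 0; i < src_len && out < dst_cap; i++) {
        if (src[i] == 0x1A) {
            break;                      /* Ctrl-Z ends the data */
        }
        if (src[i] == '\r' && i + 1 < src_len && src[i + 1] == '\n') {
            continue;                   /* drop CR; LF is copied next */
        }
        dst[out++] = src[i];
    }
    return out;
}
```

For a 12-byte input consisting of "ab", CR+LF, "cd", CR+LF, a 0x1A byte, and "xyz", the function produces 6 bytes ("ab", LF, "cd", LF), so the length seen by the program differs from the raw size on disk.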

You should also be aware that fopen in the TI RTS + CCS debugger API is implemented more in the Unix style, and you may run into problems when trying to read MS-DOS files, particularly if you use lseek or binary-mode fopen on a 16-bit target. There is an app note about this somewhere; I'll see if I can dig it up.

0 Archaeologist over 15 years ago in reply to Archaeologist

TI__Guru* 84285 points

Archaeologist said:

You should also be aware that fopen in the TI RTS + CCS debugger API is implemented more in the Unix style, and you may run into problems when trying to read MS-DOS files, particularly if you use lseek or binary-mode fopen on a 16-bit target. There is an app note about this somewhere; I'll see if I can dig it up.

SPRA757, "Reading and Writing Binary Files on Targets With More Than 8-Bit Chars"

http://www.ti.com/lit/pdf/spra757

http://processors.wiki.ti.com/index.php/Tips_for_using_printf

0 Anonymous over 15 years ago in reply to Archaeologist

Dear Archaeologist,

Many thanks for the answers; they are to the point. I will also look at the notes.

Archaeologist said:

In MS-DOS, a file ends with a Ctrl-Z character, and each newline is represented with a two-character combination (CR+LF). In Unix, files end at the length listed by the filesystem, and newlines are represented with one LF character.

If Unix files don't have a Ctrl-Z at their end, can DOS still recognize them correctly? And if Unix has the file length listed by the filesystem, when someone copies a file from DOS to Unix, does he have to add the file length information to the filesystem manually (or by writing some code)?

Does this introduce a fairly significant incompatibility between MS Dos/Win and the Unix family systems?

As for newlines, I noticed that editors like UltraEdit have an option to accommodate both file formats. This seems a much smaller issue than the end-of-file one.

Sincerely,

Zheng
