Huge text file

ИМХО © (2004-10-17 18:22) [40]

> panov © (17.10.04 17:32) [39]
> Well, you may need to add 2 to the length of each line (if
> the lines end with #13#10)

You were absolutely right about that, thanks.
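The fix from [39] is simple arithmetic: a line's starting offset is the sum of the lengths of all earlier lines plus 2 bytes of #13#10 for each of them. Below is a minimal sketch in Python rather than Delphi. It assumes lengths are byte counts (a single-byte encoding) and that every line ends with CRLF. `line_offsets` is a hypothetical helper name.

```python
def line_offsets(line_lengths, eol_len=2):
    """Return the byte offset at which each line starts.

    line_lengths: byte length of each line WITHOUT its terminator.
    eol_len: 2 for CRLF (#13#10), 1 for a bare LF.
    """
    offsets = []
    pos = 0
    for length in line_lengths:
        offsets.append(pos)
        pos += length + eol_len  # skip the line's text plus its line terminator
    return offsets


# Lines "abc", "" and "hello" stored as b"abc\r\n\r\nhello\r\n"
print(line_offsets([3, 0, 5]))  # [0, 5, 7]
```

Forgetting `eol_len` makes every offset after the first line drift by 2 bytes per line. That matches the symptom fixed in [39].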

May I ask one more question?

What is the better way to access the file now:
1) via TFileStream.Create, then Position and Read
or
2) via AssignFile, Reset(FromF, 1), Seek and BlockRead

???
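Both options come down to the same two steps: set the file position, then read a block of bytes. Below is a language-neutral sketch in Python rather than Delphi. Here `f.seek` plays the role of `Position`/`Seek`, and `f.read` plays the role of `Read`/`BlockRead`. `read_at` is a hypothetical helper name.

```python
import os
import tempfile


def read_at(path, offset, size):
    """Read up to `size` bytes starting at byte `offset` (like Seek + BlockRead)."""
    with open(path, "rb") as f:   # binary mode: no newline translation
        f.seek(offset)            # ~ Stream.Position := offset / Seek(F, offset)
        return f.read(size)       # ~ Stream.Read / BlockRead; shorter at end of file


# Demo on a small CRLF file: the third line ("hello") starts at offset 7
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\r\n\r\nhello\r\n")
    path = tmp.name
print(read_at(path, 7, 5))  # b'hello'
os.remove(path)
```

For many lookups, you would keep the file open and reuse one handle rather than reopening it on each call. The per-call `open` here just keeps the sketch short.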

default © (2004-10-17 18:38) [41]

ИМХО © (17.10.04 18:22) [40]
I looked at the CPU window during the call to Reset. In the end it calls the API function OpenFile, so it is just a wrapper around the API functions. By and large the difference is probably small, so use whichever is more convenient.

VMcL © (2004-10-17 18:41) [42]

>>ИМХО © (17.10.04 18:22) [40]

>what is the better way to access the file now

It doesn't matter. For sequential processing of the file you could also try CreateFile() with the FILE_FLAG_NO_BUFFERING flag.

Anatoly Podgoretsky © (2004-10-17 18:41) [43]

default © (17.10.04 18:38) [41]
You're not quite right. Pascal files are a high-level wrapper over the operating system, and that system is not necessarily Windows.
Besides, they work with the notions of line, type and block rather than with bytes, the way low-level libraries (both the API and TStream) do.

Fay © (2004-10-18 04:15) [44]

2 VMcL © (17.10.04 18:41) [42]
Just curious: why bother? What does that buy you for sequential processing?

VMcL © (2004-10-18 07:14) [45]

>>ИМХО © (17.10.04 18:22) [40]

By the way, I forgot: besides FILE_FLAG_NO_BUFFERING, you can also experiment with FILE_FLAG_SEQUENTIAL_SCAN.

>>Fay © (18.10.04 04:15) [44]

That's hard to answer offhand. I'll just quote an excerpt from a book:
CreateFile Cache Flags
FILE_FLAG_NO_BUFFERING This flag indicates not to use any data buffering when accessing a file. To improve performance, the system caches data to and from disk drives. Normally you do not specify this flag, and the cache manager keeps recently accessed portions of the file system in memory. This way, if you read a couple of bytes from a file and then read a few more bytes, the file's data is most likely loaded in memory, and the disk has to be accessed only once instead of twice, greatly improving performance. However, this process does mean that portions of the file's data are in memory twice: the cache manager has a buffer, and you called some function (such as ReadFile) that copied some of the data from the cache manager's buffer into your own buffer.

When the cache manager is buffering data, it might also read ahead so that the next bytes you're likely to read are already in memory. Again, speed is improved by reading more bytes than necessary from the file. Memory is potentially wasted if you never attempt to read further in the file. (See the FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS flags, discussed next, for more about reading ahead.)

By specifying the FILE_FLAG_NO_BUFFERING flag, you tell the cache manager that you do not want it to buffer any data; you take on this responsibility yourself! Depending on what you're doing, this flag can improve your application's speed and memory usage. Because the file system's device driver is writing the file's data directly into the buffers that you supply, you must follow certain rules:

You must always access the file by using offsets that are exact multiples of the disk volume's sector size. (Use the GetDiskFreeSpace function to determine the disk volume's sector size.)

You must always read/write a number of bytes that is an exact multiple of the sector size.

You must make sure that the buffer in your process's address space begins on an address that is integrally divisible by the sector size.

FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS These flags are useful only if you allow the system to buffer the file data for you. If you specify the FILE_FLAG_NO_BUFFERING flag, both of these flags are ignored.

If you specify the FILE_FLAG_SEQUENTIAL_SCAN flag, the system thinks you are accessing the file sequentially. When you read some data from the file, the system will actually read more of the file's data than the amount you requested. This process reduces the number of hits to the hard disk and improves the speed of your application. If you perform any direct seeks on the file, the system has spent a little extra time and memory caching data that you are not accessing. This is perfectly OK, but if you do it often, you'd be better off specifying the FILE_FLAG_RANDOM_ACCESS flag. This flag tells the system not to pre-read file data.

To manage a file, the cache manager must maintain some internal data structures for the file: the larger the file, the more data structures required. When working with extremely large files, the cache manager might not be able to allocate the internal data structures it requires and will fail to open the file. To access extremely large files, you must open the file using the FILE_FLAG_NO_BUFFERING flag.
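The first two rules in the excerpt (sector-multiple offsets and sizes) usually mean widening an arbitrary request to the enclosing sector-aligned window. You read that whole window, then pick the wanted bytes out of the larger buffer. Below is a minimal sketch of just that arithmetic in Python. It assumes a 512-byte sector; real code would query the volume, e.g. with GetDiskFreeSpace as the excerpt says. `aligned_window` is a hypothetical name. The third rule (buffer address alignment) is not covered here.

```python
def aligned_window(offset, size, sector=512):
    """Return (start, length, skip) for a read with FILE_FLAG_NO_BUFFERING rules.

    start  - offset rounded DOWN to a sector boundary
    length - whole number of sectors so that start + length covers the request
    skip   - position of the requested bytes inside the aligned buffer
    """
    start = (offset // sector) * sector          # round down to a sector multiple
    end = offset + size                          # exclusive end of the request
    end_aligned = -(-end // sector) * sector     # round up (ceiling division)
    return start, end_aligned - start, offset - start


# Want 100 bytes at offset 1000: read 1024 bytes at 512, data begins 488 bytes in
print(aligned_window(1000, 100))  # (512, 1024, 488)
```

The wanted bytes are then `buffer[skip:skip + size]` of the aligned read.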

panov © (2004-10-18 10:08) [46]

>Fay © (18.10.04 04:15) [44]

> Just curious: why bother? What does that buy you for sequential processing?

The system won't write the data into its internal buffers; it will hand it straight to the application. Because of this, reads complete faster (from the application's point of view).

Fay © (2004-10-19 12:11) [47]

2 panov © (18.10.04 10:08) [46]
If memory serves, the system reads somewhat more from the file than was requested.
That should pay off precisely for sequential access.

panov © (2004-10-19 12:26) [48]

>Fay © (19.10.04 12:11) [47]
> If memory serves, the system reads somewhat more from the file than was requested.

No, not "somewhat more": it reads an amount of data proportional to the cluster size.

> That should pay off precisely for sequential access.

Note that the read goes into the system's internal buffers.
When the application requests data, the system reads it and writes it into its own buffer.
Only then does it pass the data on to the application, and that takes some time.

The FILE_FLAG_NO_BUFFERING flag is exactly what lets you skip this intermediate step.

I'll add that extras like "...somewhat more than was requested..." only slow things down during continuous sequential reading.

2 panov © (19.10.04 12:32) [49]

> only slow things down during continuous
> sequential reading

Easy enough to believe, but I have never actually seen that slowdown.

>Fay © (19.10.04 13:21) [50]

Sorry, I phrased that wrong.

When the whole file is read sequentially, buffering will only slow things down.
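For the whole-file sequential case, a common application-side pattern is to read large fixed-size blocks and split them into lines yourself. Any incomplete line is carried over to the next block. Below is a minimal Python sketch that assumes CRLF line ends. `iter_lines` and the 1 MiB default block size are illustrative choices. Note that `buffering=0` only removes Python's own buffer layer; unlike FILE_FLAG_NO_BUFFERING, it does not bypass the operating system's cache.

```python
import os
import tempfile


def iter_lines(path, block_size=1 << 20):
    """Yield lines of a CRLF text file (without terminators), reading big blocks."""
    tail = b""
    with open(path, "rb", buffering=0) as f:  # raw file object, no Python-level buffer
        while True:
            block = f.read(block_size)
            if not block:                     # b"" means end of file
                break
            parts = (tail + block).split(b"\r\n")
            tail = parts.pop()                # last piece may be an incomplete line
            yield from parts
    if tail:
        yield tail                            # final line without a trailing CRLF


# Tiny block size to show lines (and a CRLF pair) split across block boundaries
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first\r\nsecond\r\n\r\nlast")
    path = tmp.name
print(list(iter_lines(path, block_size=4)))  # [b'first', b'second', b'', b'last']
os.remove(path)
```

Carrying `tail` forward also handles a `\r` at the end of one block followed by `\n` at the start of the next, because the pair is rejoined before splitting.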
