Storing and accessing a large multidimensional dynamic array

The question was asked: 6 years 11 months ago   views: 51
0

Please tell me, what is the best way to store a multidimensional dynamic array (from 120 GB up to 8-20 TB in size) and access its elements without loading the entire array into memory?

The main requirement is fast read access to the array elements; nothing else matters. If necessary, the solution can be written in any language, not just C#.

Asked: 19-11-2012 at 22:06:14
Where is the array stored in the first place, given "without loading the entire array into memory"? - 19-11-2012 at 22:07:35
Initially it is created in RAM and populated with data; then, when it no longer fits in OP, I have to save it and somehow work with it from the hard drive. Is that possible? - 19-11-2012 at 22:11:53
By "OP" you mean RAM, right? @Merlin, you are probably not a beginner in programming, but this turns everything upside down: "when it no longer fits in OP, I have to save it and somehow work with it from the hard drive" is absurd! So you intend to store more than 1,000,000 values in the array!? Doesn't that seem strange to you? C# is quite a flexible language and takes on most of the "dirty" work itself, unlike C++, for example. But what you are trying to do is something the C# developers could not have foreseen =) - 19-11-2012 at 22:17:07
@Merlin What does "all of it in RAM at exactly fit" mean? And, by the way, what is an SDD (I don't have one)? So the array does not fit in 128 GB? If so, what hardware are you going to solve the problem on, and in what time frame? And is such a super-mega-project worth taking on at all? (Remember how many hundreds of desktops Google initially ran its MapReduce on?) - 19-11-2012 at 23:09:52
@alexlz you can keep your opinions about the project to yourself. I asked the question quite clearly: "what's the best way to store a multidimensional dynamic array (size from 120 GB up to 8-20 TB) and to access its elements without loading the entire array into memory? The main requirement is fast read access to the array elements; nothing else matters. If necessary, any programming language can be used, not just C#." That is the problem; if you do not see it, please do not indulge in demagoguery. The PC is an ordinary average desktop. - 20-11-2012 at 12:24:44

Answers   1

0

If the array is "dense", then working with it quickly will not be possible (although that depends on what method you use).

A sparse array, on the other hand (one where the vast majority of elements are zeros or some other predefined value), can be stored in a hash table keyed by the index.
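
A minimal in-memory sketch of that idea, in Python rather than C# purely for brevity. The class name `SparseArray` and its `default` parameter are made up for illustration; the hash table is an ordinary `dict` keyed by the index tuple.

```python
class SparseArray:
    """Sparse N-dimensional array: only non-default elements are stored."""

    def __init__(self, default=0):
        self.default = default
        self.cells = {}  # hash table: index tuple -> value

    def __getitem__(self, index):
        # Missing keys are implicitly the predefined (default) value.
        return self.cells.get(index, self.default)

    def __setitem__(self, index, value):
        if value == self.default:
            # Never store the default value, so the table stays small.
            self.cells.pop(index, None)
        else:
            self.cells[index] = value

    def __len__(self):
        return len(self.cells)


a = SparseArray()
a[1000000, 2, 7] = 42
print(a[1000000, 2, 7], a[0, 0, 0], len(a))  # prints: 42 0 1
```

Memory use is proportional to the number of non-default elements, not to the nominal size of the array.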

Answered: 19-11-2012 at 22:28:29
Thank you. And how do I implement storing and accessing the array elements without keeping them all in memory? - 19-11-2012 at 22:40:14
Good question. For example, it can be stored in a DBMS. But most likely you will need to invent a (direct-access) file format that can store a hash table efficiently. The real problem there is handling collisions efficiently. You need to come up with a format in which colliding elements end up in one "record" on disk (in general, so that they can all be read with a single read). - 19-11-2012 at 22:50:09
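
One possible shape for such a format, as a rough sketch only (Python for brevity). It assumes the multidimensional index has already been linearized into a single unsigned 64-bit key and that values are 64-bit floats. The layout (`SLOTS`, `ENTRY`, `HEADER`) and the function names are invented for illustration. All keys that hash to the same bucket live in one fixed-size record, so a lookup costs one seek and one read.

```python
import os
import struct
import tempfile

SLOTS = 4                                # entries per bucket (assumed value)
ENTRY = struct.Struct("<Qd")             # key: uint64, value: float64
HEADER = struct.Struct("<I")             # number of used slots in the bucket
BUCKET_SIZE = HEADER.size + SLOTS * ENTRY.size


def create_table(path, num_buckets):
    """Create a file of empty (zero-filled) buckets."""
    with open(path, "wb") as f:
        f.write(bytes(BUCKET_SIZE * num_buckets))


def _read_bucket(f, key, num_buckets):
    """Read the whole bucket for `key` with a single read."""
    offset = (key % num_buckets) * BUCKET_SIZE
    f.seek(offset)
    raw = f.read(BUCKET_SIZE)
    (used,) = HEADER.unpack_from(raw, 0)
    entries = [ENTRY.unpack_from(raw, HEADER.size + i * ENTRY.size)
               for i in range(used)]
    return offset, entries


def put(f, key, value, num_buckets):
    offset, entries = _read_bucket(f, key, num_buckets)
    entries = [e for e in entries if e[0] != key] + [(key, value)]
    if len(entries) > SLOTS:
        # A real format needs overflow buckets or rehashing here.
        raise OverflowError("bucket full")
    raw = HEADER.pack(len(entries))
    raw += b"".join(ENTRY.pack(k, v) for k, v in entries)
    f.seek(offset)
    f.write(raw.ljust(BUCKET_SIZE, b"\0"))


def get(f, key, num_buckets, default=0.0):
    _, entries = _read_bucket(f, key, num_buckets)
    for k, v in entries:
        if k == key:
            return v
    return default


path = os.path.join(tempfile.mkdtemp(), "table.bin")
create_table(path, num_buckets=8)
with open(path, "r+b") as f:
    put(f, 3, 1.5, 8)
    put(f, 11, 2.5, 8)   # 11 % 8 == 3: collides with key 3, same bucket
    print(get(f, 3, 8), get(f, 11, 8), get(f, 19, 8))  # prints: 1.5 2.5 0.0
```

The bucket size would normally be chosen to match the disk sector or page size. The sketch deliberately leaves out overflow handling and growth.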
@Merlin, as I understand from other comments, you are going to use an SSD. The idea is good, especially for direct-access storage structures; the right approach to the format may differ from the one for an HDD, but as far as I know there are performance pitfalls. It is known that placing Oracle logs on an SSD is not recommended (an HDD is faster). However, I personally have not yet had a chance to experiment with them (SSDs, etc.). - 20-11-2012 at 00:04:13
@avp I do not want to use external databases for now; I will try experimenting with the file format, as you recommend. Please tell me, does this article fit the solution? habrahabr.ru/post/124900 I have never dealt with file formats before. - 20-11-2012 at 00:37:41
I looked at the article on Habr. I don't think that is it. I meant binary file formats (with no cross-platform concerns), i.e. a format that can be read and written quickly (almost without conversion). Something you can work with directly through the system calls read/write/lseek. Of course, the addresses of the parts of the file read into memory will have to be adjusted. Inside the parts, you could store offsets (indexes) and work with those rather than with direct pointers. Actually, in saying all this I probably have in mind more of an implementation in C. - 20-11-2012 at 01:50:59
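
A small sketch of the "offsets instead of pointers" idea. The comment above has C in mind; Python's `os.lseek`, `os.read` and `os.write` wrap the same system calls, so it is used here only for brevity. The record layout (`RECORD`) and the helper names are assumptions made up for this illustration. Each record stores a value plus the file offset of the next record, so the structure stays valid wherever it is read into memory.

```python
import os
import struct
import tempfile

# value: float64, next: int64 file offset of the next record (-1 = end of chain)
RECORD = struct.Struct("<dq")


def append_record(fd, value, next_offset):
    """Write a record at the end of the file and return its offset."""
    offset = os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, RECORD.pack(value, next_offset))
    return offset


def read_record(fd, offset):
    """Seek to `offset` and read back one (value, next_offset) record."""
    os.lseek(fd, offset, os.SEEK_SET)
    return RECORD.unpack(os.read(fd, RECORD.size))


def walk(fd, head):
    """Follow the chain of offsets starting at `head`."""
    values = []
    offset = head
    while offset != -1:
        value, offset = read_record(fd, offset)
        values.append(value)
    return values


path = os.path.join(tempfile.mkdtemp(), "chain.bin")
flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
fd = os.open(path, flags, 0o600)
head = -1
for v in (3.0, 2.0, 1.0):
    head = append_record(fd, v, head)  # each new record points at the previous one
print(walk(fd, head))  # prints: [1.0, 2.0, 3.0]
os.close(fd)
```

Because links are file offsets rather than memory addresses, the bytes on disk can be written and read back unchanged, which is what "almost without conversion" amounts to.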