Below is a transcript of the conversation I had about FileSystems on the Redox-OS Mattermost chat.
amdphreak1:08 AM
I am researching a new option for filesystems that uses the associative model of data (graph database model). This will solve a lot of problems with maintaining packages and user files across multiple physical media, different lib versions, and official and development versions (if there are local development files). But it will necessitate creating new userland tools and a complete redesign of shell interactivity that revolves around unique IDs and finding things via a declarative language (similar to SQL and GraphQL). It will also make searching by metadata and fuzzy search terms extremely painless, and maintaining allocation of files across multiple drives will be much easier. All of this design is predicated on the usefulness of group theory/tagging/bubble clouds, and I am debating what other constructs will be needed to make it intuitive. But it is clear by now that hierarchical filesystems were always a horrible idea, because data is not typically cognitively organized this way.
amdphreak1:15 AM
For example, you would be able to locate a set of files that are candidates for the data you really want by issuing a SELECT command (without the homage to SQL in the actual implementation). This will present the user with the file names of potential options as well as their unique IDs. To process the file in a command, you would recognize which file you wanted from the name and then copy/paste the unique ID into the next command. It would be the search tool’s responsibility to populate a list of diffs between the files in the event there are two files with two different names.
To clarify, it would not diff the actual file contents, only the metadata. It need only locate one property that is different between all results in order to let the user determine which file is the correct one.
The files would maintain consistency across all systems by grouping them by machine ID, and grouping by user ID.
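The meta-diff described above could look roughly like the following Python sketch. The record layout, the field names (`name`, `machine`, `user`, `modified`) and the IDs are made up for illustration; nothing here is an existing Redox API.

```python
from typing import Dict, List

# Hypothetical metadata record: property name -> value (all names are illustrative).
Metadata = Dict[str, str]


def meta_diff(candidates: Dict[str, Metadata]) -> List[str]:
    """Return the metadata keys whose values differ between every pair of candidates.

    `candidates` maps a unique file ID to that file's metadata. File contents
    are never compared, only metadata.
    """
    if not candidates:
        return []
    keys = set()
    for meta in candidates.values():
        keys.update(meta)
    distinguishing = []
    for key in sorted(keys):
        values = [meta.get(key) for meta in candidates.values()]
        # The key tells all results apart only if every candidate has it
        # and no two candidates share the same value.
        if None not in values and len(set(values)) == len(values):
            distinguishing.append(key)
    return distinguishing


# Two search results with the same name but different machine and modification date.
results = {
    "a1f3": {"name": "report.txt", "machine": "laptop", "user": "alice", "modified": "2019-01-04"},
    "9c2e": {"name": "report.txt", "machine": "desktop", "user": "alice", "modified": "2019-02-01"},
}
distinguishing_keys = meta_diff(results)
```

Showing the user only the properties in `distinguishing_keys` next to each ID would be enough to pick the right file without opening either one.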
amdphreak1:24 AM
So you can see that the file metadata effectively makes the files unique. The user can copy files from one disk to another by specifying which disks it should exist on, just like LVM. When a user modifies a file, it gets updated on all disks (buffered, nonblocking, made authoritatively by the fastest disk and recorded to a backlog for delayed processing).
But any system call to the file API will receive only one handle to a file, even though it is on two disks.
The advantages of using the associative data model become apparent when you need to find old junk or specific copies of versions of files. You just type in what you know about the file and then narrow down the results either by recognition or by analyzing the meta-diff.
Transferring files from an old user account to a new user account becomes instantly easy. You just search for files tagged “user such and such” and then copy all of them and re-merge them onto the destination machine and you never once have to touch a folder or re-integrate the files. The system does it for you because it’s building its structure from simple combinations of tags.
It astounds me how slow it is to search for files on Windows if they aren’t already indexed. And this file system idea will end the never-ending battle of deciding what the least unintuitive (read: least shitty) way to organize your files hierarchically is. Perfect hierarchies just don’t exist.
But here’s the good news. This system is backward compatible with existing programs that request files from hierarchical filesystems. The associative data model can have a file that emulates the old file system hierarchies, allowing programmers to gradually transition to the new system. Storing the traditional hierarchy in an XML file or some equivalent solution would work.
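As a minimal sketch of the backward-compatibility idea, assuming the emulated hierarchy is just a table from legacy paths to unique IDs (kept in memory here rather than in an XML file), a tag store could answer both tag queries and old-style path lookups. The class name `TagStore`, its methods, the IDs and the `user:alice` style tags are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


class TagStore:
    """Toy tag-based file catalogue with an emulated hierarchical view."""

    def __init__(self) -> None:
        self.tags: Dict[str, Set[str]] = {}     # unique file ID -> tags
        self.legacy_paths: Dict[str, str] = {}  # emulated path -> unique file ID

    def add(self, file_id: str, tags: Iterable[str], legacy_path: Optional[str] = None) -> None:
        self.tags[file_id] = set(tags)
        if legacy_path is not None:
            self.legacy_paths[legacy_path] = file_id

    def select(self, *wanted: str) -> List[str]:
        """IDs of all files carrying every requested tag."""
        want = set(wanted)
        return sorted(fid for fid, tags in self.tags.items() if want <= tags)

    def resolve(self, path: str) -> str:
        """What a legacy open(path) would do: map the path to a unique ID."""
        return self.legacy_paths[path]

    def listdir(self, directory: str) -> List[str]:
        """Emulate a directory listing from the stored legacy paths."""
        prefix = directory.rstrip("/") + "/"
        names = set()
        for path in self.legacy_paths:
            if path.startswith(prefix):
                # Keep only the first path component below `directory`.
                names.add(path[len(prefix):].split("/", 1)[0])
        return sorted(names)


store = TagStore()
store.add("id1", {"user:alice", "config"}, "/home/alice/.config/app.toml")
store.add("id2", {"user:alice", "photo"}, "/home/alice/pics/cat.png")
store.add("id3", {"user:bob", "photo"})  # no legacy path: only reachable by tags
```

Here `store.select("user:alice")` is the "files tagged user such and such" query used for migration, while `store.resolve(...)` and `store.listdir(...)` are what an unmodified, path-based program would see.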
thomhuds4:11 AM
@jackpot51 what do you think about this? Personally I have no problems with hierarchical structure and would struggle to understand anything more complex as intuitively as I understand hierarchy
fabiao4:53 AM
Are there any working implementations of this as example?
samwisefilmore2:28 PM
I think I’ve seen the blog post you’re most likely getting this from. I’d like to point out that the ideas of tagged filesystems and multi-disk filesystems are completely separable: asking for a file descriptor for a specific resource located someplace on one of the disks is not incompatible with a hierarchical tree structure.
You mentioned at the beginning that such a filesystem would require a fundamental shift in tooling, but I think that an implementation of tagged filesystems is possible without that using schemes on Redox.
I’d also like to see a working implementation of this idea. I think there’s a critical mass of developers who would be interested; somebody needs to do some work to start implementing _something_. Initially, I think that an implementation based on utilizing the `file:` provider’s filesystem as an interface over disks and stuff would be helpful for making development faster, effectively a redirection layer on top of redoxfs.
samwisefilmore2:38 PM
You can ofc go down to a proper filesystem once a POC is created, and whatnot
zen3ger2:51 PM Commented on samwisefilmore’s message:
https://www.nayuki.io/page/designing-better-file-organization-around-tags-not-hierarchies this one?
samwisefilmore2:53 PM
Yes, that one
Fri, Feb 08, 2019
zen3ger9:59 AM
One thing regarding graph-based file systems that is still a question to me is their performance. Comparing relational DBs and graph DBs, the relational DB generally outperforms the graph DB when there is a fixed set of queries and the environment isn’t “too dynamic”. While it would be cool to have a very flexible metadata-based search engine on a file system, that doesn’t necessarily mean it requires a full graph FS. A separate search engine that does tag searches on top of an HFS could have the advantages of both.
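A separate tag search layer of that kind could be as small as an inverted index kept beside the ordinary hierarchy. The sketch below is only an illustration under that assumption: the class `TagIndex`, the `ext:` automatic tag and the example paths are invented here, and the index is held in memory instead of being persisted.

```python
import os
from collections import defaultdict
from typing import Dict, List, Set


class TagIndex:
    """Inverted index (tag -> paths) layered over a normal hierarchical filesystem."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def tag(self, path: str, *tags: str) -> None:
        # Record the file extension as an automatic tag, e.g. "ext:flac".
        ext = os.path.splitext(path)[1].lstrip(".").lower()
        auto = (f"ext:{ext}",) if ext else ()
        for t in tags + auto:
            self._by_tag[t].add(path)

    def search(self, *tags: str) -> List[str]:
        """Paths that carry every requested tag (set intersection)."""
        if not tags:
            return []
        sets = [self._by_tag.get(t, set()) for t in tags]
        return sorted(set.intersection(*sets))


index = TagIndex()
index.tag("/hd/music/live/track01.flac", "music", "live")
index.tag("/hd/music/studio/track01.flac", "music", "studio")
index.tag("/hd/photos/2018/gig.jpg", "live")
live_music = index.search("music", "live")
```

The files stay where the hierarchy put them; only the lookup goes through tags, so the underlying filesystem needs no changes.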
Wed, Feb 13, 2019
amdphreak5:44 PM
I did not get this from a blog. I came up with this idea from scratch. However, the Brian Will videos on YouTube also contain a subset of this idea (circa 2014). And there is a single lonesome research article from 2002 or 2007 (will post if I can re-find it) that has this same idea. And I just found this new video from Linux Conf 2019 (Australia) with a similar idea: https://www.youtube.com/watch?v=wN6IwNriwHc @fabiao
amdphreak5:55 PM
That video has more research on projects that have already implemented this idea.
Look at 6:25
They use RDBMS as their model (Relational model). I’m thinking an Associative model would be better, particularly when you want to get analytics. With this built in, it would make Redox incredibly attractive to analytics programmers who want to deploy websites.
amdphreak6:05 PM Commented on zen3ger’s message:
Thanks for the analysis. I’m not entirely familiar with graph DBs. I know there are a few different implementations of the associative data model, and graph DBs are one kind. Egh. Need to research more.
amdphreak6:37 PM
@thomhuds have you ever had to manage a huge library of files spanning multiple disks? Ever tried to look up a file that you know you have but don’t really remember what it was called or where you put it? This operation takes forever on an HFS, and most of the time is not possible if you have to distinguish between thousands of similarly named files. Have you ever tried to navigate a Windows C:\ file-system structure? Have you ever gotten fed up with Windows putting a bunch of temporary files on an external hard drive and then leaving empty directories there? This habitual lack of cleanliness is a problem with the HFS usage pattern. Have you ever tried to migrate a user’s files and settings for all of their apps from an old OS installation to a new one in any OS? Ever wonder why there is a C:\users\myuser\AppData\Local\Temp folder? Have you ever wanted to just find your files without always having to memorize where you put everything? Human memories are associative, not hierarchical. You can describe your files in heuristic terms: this is what the search function in a FS explorer attempts to do, but it does so inflexibly and poorly. When your brain thinks of a new way to differentiate between files to narrow down the search results, and sort them, you have to manually hard-code that search functionality in the current model, or restructure your entire file-system, which is impossible for developers that wish to keep a stable system. Have you noticed how the Unix file-hierarchy hasn’t changed much in forever due to developers’ fears of change, and the resulting confusion this causes for new users? I’ve noticed a lot of Linuxers don’t have large collections of music or photos or videos, until they’re doing it as a business, and they don’t worry about finding those things by metadata. Metadata is what your brain remembers. Over time you will forget more and more details.
Memorizing locations is inefficient and unsustainable; it is unfriendly to your brain, which is why we invented Google for the internet. People’s personal troves of data are getting much larger, as well. Because memorizing URLs is idiotic, and a hierarchical organization of the internet is unscalable. Hierarchy is unnatural for how we organize memories and therefore how we organized data, but it is natural for how we make decisions and process that data. So naturally, just to be stupid, the software industry habitually stores things in hierarchies and forces you to design software associatively (OO paradigm).
Now to top it all off, have you done those things while managing your storage space? Let’s say you want your video collection on an external hard drive, and all your games on an internal one? But you want your Documents path (C:\users\Documents) to reside on the internal drive, so that all those system-generated settings and files, like game saves, and application stuff (seriously just look at a user’s Documents folder after using a few dozen apps; they clutter it up and make it entirely unusable). You need the ability to isolate space requirements from program behavior requirements.
samwisefilmore8:42 PM
I don’t think anybody is specifically arguing against the need for a relational filesystem, but having an implementation would be more likely to garner attention and serious consideration for its use in redox.
amdphreak8:43 PM
Yeah there are implementation/s. The video above is by someone who found an old implementation and updated it a bit and tested its performance.
Thu, Feb 14, 2019
thomhuds1:21 AM Commented on amdphreak’s message:
(This message is a sentence-by-sentence rebuttal; I apologise if it is hard to follow but I am on mobile and cannot easily make structural changes)
No, because I don’t put related files on different disks, because that doesn’t make any sense.
Yes, and ripgrep made it blindingly easy.
This is only really true for pictures and videos that have not been renamed; personally I go through each file and rename it based on contents before using files like these. This would take a similar amount of effort as tagging each file.
Yes, and it makes a surprising amount of sense given that it’s on Windows.
No, but if I had then that’s Windows’ fault, not the filesystem’s.
…
Yeah. `cp -r ~/.config /mnt/backup/` couldn’t have been easier.
No, it says right there in the name.
I can already do this. You seem to be forgetting directories can have names; by looking at the names I can tell where everything is instantly. I never thought this would be a problem. A directory named “code” is pretty unambiguous.
With tab-completion and GUI file explorers the only fact I need to remember is that my Redox source tree is under /hd/, everything else can be figured out on the fly.
…
I’ve never thought of a better way to structure an existing hierarchy, so I can’t really tell if this is true.
Yes, and I have personally argued against that for a while and even tried to think of a new one.
…
What kind of metadata does my brain remember? To use the image example, if I had one image to select from a large group, I would definitely remember it by how it looked and not time taken etc.
Forgetting details only matters if they’re not written in the file metadata.
(Cut to the next paragraph; what little of this section that is actually about filesystems has already been covered)
Done all what things?
(this sentence only has meaning in conjunction with the next)
(this sentence is incomplete and I don’t understand what its point is)
I don’t understand this sentence – what are program behaviour requirements and how are they not isolated already?