AWS has launched S3 Files – a new offering that lets file-based applications, agents, or tools work “directly” with object storage data in S3 buckets. 

The landmark release (GA across 43 regions) “makes S3 buckets directly accessible as high-performance file systems,” Amazon said on April 7. 

S3 is, in short, no longer just for object storage.

Users can mount S3 buckets inside EC2 VMs, containers, or Lambda functions and access that data through a file system interface: "If you make changes, your changes will be propagated back to S3. As a result, you can work with your objects as files, and your files as objects," AWS VP Andy Warfield said.

AWS also promises “full file system semantics including consistency, locking, and POSIX permissions” via S3 Files.  

That’s a potentially attractive proposition for many – and AWS thinks it’s blazed a welcome trail, describing S3 as the “first and only cloud object store that provides fully-featured, high-performance file system access to your data.”

(Systems typically require separate file and object storage – often resulting in complex pipelines to keep the two in sync, as well as data duplication. S3 Files lets organisations do it all in S3 – a potential game-changer that could reduce data siloes, synchronisation headaches, and manual data moving.)

The new system is built on Amazon’s existing Elastic File System (EFS) but integrates it tightly with S3 and automates the management, routing read requests to the storage layer (file system or bucket) best suited for them: small, random reads of actively used files are served from a high-performance storage layer for low latency, documentation shows.
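As a mental model only – AWS does not publish its routing heuristics, and the size threshold below is entirely made up – the documented behaviour can be sketched as a simple dispatch:

```python
# Illustrative sketch only (not AWS internals): route a read to whichever
# layer the documentation says serves it best. The size threshold is a
# made-up number; AWS does not publish one.
SMALL_READ = 512 * 1024  # hypothetical cut-off for a "small" read, in bytes

def route_read(size_bytes: int, actively_used: bool) -> str:
    """Small, random reads of actively used files go to the low-latency
    high-performance layer; everything else is served from the bucket."""
    if actively_used and size_bytes <= SMALL_READ:
        return "fast-layer"
    return "s3-bucket"

print(route_read(4 * 1024, actively_used=True))      # -> fast-layer
print(route_read(5 * 1024**3, actively_used=True))   # -> s3-bucket
print(route_read(4 * 1024, actively_used=False))     # -> s3-bucket
```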

“Read throughput scales with… connected compute instances and the degree of parallelism within each… maximum per-client read throughput is 3 GiB/s. S3 Files supports up to terabytes per second of aggregate read throughput and up to 250,000 read IOPS per file system,” AWS said. 

“It’s EFS with cold storage on S3, or S3 with EFS write-thru cache,” sniffed one senior software architect, Łukasz Rekucki, on X. “Neat, but nothing new. You still pay extra for the EFS part.” (You can please some of the people some of the time, but you can’t please all of the people…) 

On the compute side, buckets can be mounted on EC2, ECS, EKS, or Lambda. Users can read, write, and edit files with standard NFS operations, and changes sync back to S3 automatically, Amazon said on the product page. Compute must run in the same Amazon VPC as the S3 file system, it added.
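Because the mount exposes a plain POSIX path, no SDK is involved. The sketch below uses ordinary Python file I/O; the `S3_FILES_MOUNT` variable and paths are our own invention, with a temporary directory standing in when no real mount is present:

```python
import os
import tempfile

# Hypothetical mount point for the S3 file system; any POSIX path works,
# so fall back to a temp directory when no real mount is available.
mount = os.environ.get("S3_FILES_MOUNT") or tempfile.mkdtemp()

report = os.path.join(mount, "reports", "q1.csv")
os.makedirs(os.path.dirname(report), exist_ok=True)

# Plain file I/O -- no SDK calls. On a real mount, this write would be
# aggregated and copied back to the bucket as an S3 object.
with open(report, "w") as f:
    f.write("region,revenue\nus-east-1,1200\n")

with open(report) as f:
    first_line = f.readline().strip()

print(first_line)  # -> region,revenue
```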

AWS already had “Mountpoint” – which lets users mount an Amazon S3 bucket as a local file system. But the hyperscaler’s principal developer advocate Sébastien Stormacq said S3 Files provides far lower latency, as well as “intelligent management of the pieces that are brought to the file system layers” and far more fine-grained POSIX permissioning capabilities.

A closer look at the documentation shows that “when you modify a file in the file system, S3 Files waits up to 60 seconds, aggregating any successive changes to the file in that time, before copying to your S3 bucket…”

That’s handy for saving on PUT requests, but a potential deal-breaker for applications that require more real-time syncing – and potentially for deployments in which multiple instances use S3 Files to talk to the same bucket and expect strict "read-after-write" consistency across nodes.
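To see why the window saves PUT requests, here is a toy model – our own sketch, not AWS code – of how successive writes inside a 60-second window collapse into a single copy-back:

```python
# Toy model (not AWS code) of the documented behaviour: successive changes
# to a file within a 60-second window are aggregated into one copy to S3.
WINDOW = 60.0  # seconds S3 Files waits before copying a modified file

def coalesce(events, window=WINDOW):
    """events: list of (timestamp, filename) write events, sorted by time.
    Returns the number of copy-backs (one per file per open window)."""
    deadline = {}  # filename -> time the pending window closes
    copies = 0
    for t, name in events:
        if name not in deadline or t >= deadline[name]:
            copies += 1              # a new window opens -> one future PUT
            deadline[name] = t + window
        # else: the write lands inside the open window and is absorbed
    return copies

# Four edits to one file inside a minute collapse into a single PUT;
# an edit after the window closes triggers a second one.
writes = [(0, "a.txt"), (5, "a.txt"), (20, "a.txt"), (59, "a.txt"), (61, "a.txt")]
print(coalesce(writes))  # -> 2
```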

(Think we're wrong? Let us know by email; comments welcome!)

The behind-the-scenes jiggery-pokery AWS has conjured up to make this happen incurs charges that will also need careful assessment.

AWS said: “Background sync operations incur data access charges: importing data onto the file system's high-performance storage incurs write charges, and exporting changes back to your S3 bucket incurs read charges…”

(High-performance storage is $0.30/GB/month and reads from it are $0.03/GB; writes are $0.06/GB. File reads from S3 buckets are free.)

As AWS-watcher Corey Quinn noted: “One pricing nuance that isn’t obvious from the pricing page: the first time you read a small file, it gets imported onto the fast storage and you pay the $0.06/GB import write charge. 

“The read itself is included in that operation – you’re not paying $0.06 to place it plus $0.03 to read it. So first-read cost for small files is $0.06/GB (double the headline read rate), and subsequent reads of the same cached file are $0.03/GB… Your Parquet files? Still free via S3 GET,” Quinn wrote.

He added: “The pricing is reasonable—you’re charged proportional to what you’re actually using the filesystem for, not for the privilege of having mounted the bucket. But between the 32 KB minimums and the first-read import cost, model your workload’s actual I/O patterns before committing.”
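Quinn's arithmetic can be captured in a few lines. This is our own back-of-envelope model using only the figures quoted above ($0.06/GB first-read import, $0.03/GB cached re-reads, a 32 KB billing minimum) – check AWS's pricing page before relying on it:

```python
# Back-of-envelope read-cost model for S3 Files, using only the rates
# quoted in this article. Not an official calculator.
MIN_BILLED = 32 * 1024       # 32 KB minimum billed size, per Quinn
IMPORT_PER_GB = 0.06         # first read: import write charge (read included)
CACHED_READ_PER_GB = 0.03    # subsequent reads from the fast layer
GB = 1024 ** 3

def read_cost(size_bytes, first_read):
    """Dollar cost of one read, applying the 32 KB billing minimum."""
    billed = max(size_bytes, MIN_BILLED)
    rate = IMPORT_PER_GB if first_read else CACHED_READ_PER_GB
    return billed / GB * rate

# First read of a 1 GiB file costs exactly double a cached re-read.
first = read_cost(GB, first_read=True)    # 0.06
again = read_cost(GB, first_read=False)   # 0.03
print(round(first / again, 2))  # -> 2.0

# A tiny 4 KB file is billed as 32 KB either way.
print(read_cost(4 * 1024, True) == read_cost(32 * 1024, True))  # -> True
```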

But AWS sees a potentially eager market. Per Stormacq:

Whether you’re running production tools that already work with file systems, building agentic AI systems that rely on file-based Python libraries and shell scripts, or preparing datasets for ML training, S3 Files lets these interactive, shared, hierarchical workloads access S3 data directly without choosing between the durability and cost benefits of Amazon S3 and a file system’s interactive capabilities.

To conclude, in brief (from our understanding): When you create an S3 file system, AWS provisions a high-performance file system layer based on EFS infrastructure, with sub-millisecond latencies and high IOPS. This acts as a read cache and a write-back buffer that temporarily holds new data before it is bundled and shipped to S3 – where the "authoritative" version of your data lives – giving you all of S3's established durability without the need to manage EFS backups or replication. As AWS sums up: "S3 Files maintain[s] a view of the objects in your bucket and intelligently translates your file system operations into efficient S3 requests on your behalf.

"Your file-based applications run on your S3 data with no code changes, AI agents persist memory and share state across pipelines, and ML teams run data preparation workloads without duplicating or staging files first."

AWS Distinguished Engineer Andy Warfield has a fantastic 6,000-word blog post about the steps it took to launch S3 Files here.

In it, he notes that authorisation was a "thorny" challenge to crack.

"S3 and file systems think about authorization in very different ways. S3 supports IAM policies scoped to key prefixes—you can say “deny GetObject on anything under /private/”. In fact, you can further constrain those permissions based on things like the network or properties of the request itself. IAM policies are incredibly rich, and also much more expensive to evaluate than file permissions are. 

"File systems have spent years getting things like permission checks off of the data path, often evaluating up front and then using a handle for persistent future access. Files are also a little weird as an entity to wrap authorization policy around, because permissions for a file live in its inode. Hard links allow you to have many [names for] the same [inode], and you also need to think about directory permissions that determine if you can get to a file in the first place. Unless you have a handle on it, in which case it kind of doesn’t matter, even if it’s renamed, moved, and often even deleted.

"... by moving to an explicit boundary, the team got themselves out of having to co-represent both types of permissions on every single object. Instead, permissions could be specified on the mount itself (familiar territory for network file system users) and enforced within the file system, with specific mappings applied across the two worlds.

"This design had another advantage. It preserved IAM policy on S3 as a backstop. You can always disable access at the S3 layer if you need to change a data perimeter, while delegating authorization up to the file layer within each mount. And it left the door open for situations in the future where we might want to explore multiple different mounts over the same data."
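Warfield's point about where permissions live is easy to demonstrate on any POSIX system: a hard link is a second directory entry for the same inode, so a chmod through one name is visible through the other:

```python
import os
import stat
import tempfile

# A hard link is a second name for the same inode, and permission bits
# live on the inode, not on the name.
d = tempfile.mkdtemp()
original = os.path.join(d, "report.txt")
alias = os.path.join(d, "alias.txt")

with open(original, "w") as f:
    f.write("hello")

os.link(original, alias)  # hard link: second directory entry, same inode

same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
print(same_inode)  # -> True

os.chmod(original, 0o400)  # change permissions via one name...
mode_via_alias = stat.S_IMODE(os.stat(alias).st_mode)
print(oct(mode_via_alias))  # -> 0o400  ...and the other name sees it too
```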

NB: AWS has a user guide here; a pricing guide here; and performance specs here.
