All the data is encrypted, so it doesn't matter where they store it. Using multiple service providers (Google, Amazon, their own servers, etc.) might actually be beneficial because it removes a single point of failure, especially if they replicate data in a smart way to take advantage of it. It might also speed up access to data.
Is the assumption that Google is going to store the data, then attempt to decrypt it at some point in the future? Or is the claim that Google's services are less secure than Apple's would be? If neither, then I'm not sure that this is worse at all. Sure, if you believe Google to be actually malicious, then this would be a problem, but I don't think there's a reasonable argument that Google would go to that length of maliciousness.
Well, I don't assume either, really, but I guess GP does. I find it extremely unlikely that Google would ever want to scan this content, and even less likely that they would go to the lengths of decrypting it first. But saying that all is lost if the files can't be protected by encryption is a straw man, as the data is unreachable by Google if hosted elsewhere.
Google -is- a malicious and exploitative surveillance capitalism company preying on people that don't feel like they have a choice.
Inch by inch "don't be evil" has been replaced with "maximize profit".
I have no doubt that if in 20 years decrypting "historical" Apple user data for "training purposes" is legal and will make Google leadership more money, they will pressure ethically flexible engineers to do it for them.
Assume anything profitable that is legally defensible somewhere in the world will be done by every surveillance capitalism company at some point.
Well, you may be right, but then if you are, I’m not sure you can justify leaving your data with Apple either, which is a public company that has the same underlying market incentives. In fact, if that’s what I believed, I doubt I would put any of my data anywhere, and rather just go live in the woods, or more likely, just give up and stop caring. I’m not convinced that Apple will be any more or less malicious than google is/was/will be when considering a 20 year time frame.
No it isn't. Whether Apple stores its data in a contracted data center or one it owns doesn't change the fact that, if you don't believe encryption is effective, then it's a lost cause to worry about the privacy of your data.
I mean, if modern encryption proves broken, the world has much larger problems than open access to the videos you took of that 2017 summer holiday to Tahiti or the password vault containing your keys to log into HN and your Target account.
Of course there is no guarantee: AES256 could be broken tomorrow, or maybe it has already been. What we know is that, extrapolating compute speed from the past decades and even assuming quantum computers become useable in practice, the best algorithms we currently have cannot be brute-forced within the next 50 years.
> AES256 could be broken tomorrow, or maybe it has already been.
This is extremely unlikely.
> What we know is that, extrapolating compute speed from the past decades and even assuming quantum computers become useable in practice, the best algorithms we currently have cannot be brute-forced within the next 50 years.
Quantum computers only offer a quadratic speedup against symmetric ciphers.
AES 256 will survive much longer than the next 50 years against brute force attacks.
It will either be broken spectacularly, using theoretical methods entirely inconceivable today, or live on – brute force is of no concern at all due to the amounts of energy and matter required to perform it against 256 bit keys.
AES-256 was broken in 2011.[1] While only four times faster than brute force and thus not a practical attack, it suggests that compromise is possible. The Snowden documents indicated that the NSA was working on breaking AES-256. It seems unlikely they would waste effort on a task they considered impossible. Whatever they achieve will be achievable by others eventually.
On top of that, no implementation is perfect. Bugs are discovered in cryptographic APIs on a regular basis. Even if your API is perfect, the application calling the API can have bugs that allow compromise.
>>AES 256 will survive much longer than the next 50 years against brute force attacks.
From what I understand it simply can't be broken by brute force, because merely iterating through every possible value of a 256 bit key would require more energy than there is in the universe, and that's without actually testing any of the keys, just having a computer do an i++ through all possible values.
I'm not sure if quantum computing helps here in any way; someone else would need to chime in with details.
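For a sense of scale on the brute-force claim, here is a back-of-envelope sketch using the Landauer limit (the thermodynamic minimum energy per bit operation); the figures below are my own rough numbers, not anything from the thread:

    import math

    k = 1.380649e-23               # Boltzmann constant, J/K
    T = 2.7                        # cosmic microwave background temperature, K (most generous case)
    e_bit = k * T * math.log(2)    # Landauer limit: minimum energy per bit operation, ~2.6e-23 J

    keys = 2 ** 256                # states a 256-bit counter must pass through
    energy = keys * e_bit          # joules just to tick the counter, ignoring the AES work itself

    sun = 2e30 * (3e8) ** 2        # E = mc^2 for the Sun's entire mass, ~1.8e47 J
    print(f"{energy:.1e} J, or about {energy / sun:.1e} Suns converted entirely to energy")

That comes out around 3e54 joules, i.e. on the order of ten million Suns converted wholly into energy just to count, which is why brute force against 256-bit keys is treated as a physics problem rather than an engineering one.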
> iterating through every possible value of a 256 bit key
It's my understanding that when encryption gets "broken", it usually refers to something other than a simple brute force attack. Like, something that would make it so you don't need to run as many iterations or whatever.
I assume this because a brute force attack is something that is always possible from day 1, whereas an encryption scheme being broken is something that happens some time afterwards.
My understanding is that encryption is "broken" any time it becomes feasible for someone to decrypt your data without the key. Brute force attacks are always hypothetically possible, but the encryption isn't broken unless such an attack is feasible.
As a counter-example, DES would count as "unbroken" under your definition. The EFF built a machine in 1998 for under $250,000 that could crack a DES key in under 24 hours. I don't know what that would cost now, but I wouldn't be surprised if a couple of GPUs could get you the same thing today.
The difference is whether such an attack has even a vanishing chance of succeeding. For AES, the hardware just isn't anywhere close to that. Afaik, there isn't anything that could even hypothetically threaten to make brute force attacks on AES feasible on the table today.
I think you're mixing up "weak" and "broken". Out of interest I looked at the Known attacks section of Wikipedia's AES article, and it says in its first sentence: "For cryptographers, a cryptographic "break" is anything faster than a brute-force attack".
DES is both weak and broken, but it could be either without the other.
> I'm not sure if quantum computing helps here in any way
Theoretically a quantum computer can brute-force AES-256 using 2^128 sequential steps using Grover's algorithm (i.e. a quadratic advantage over a classical computer). Parallelization diminishes the advantage, e.g. if you're limited to 2^64 sequential steps, you get a 2^64 speedup over classical, for a cost of 2^192 which is still ridiculously large.
Thus quantum computing is not a relevant threat for AES-256 or most other 256-bit symmetric crypto.
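In symbols, the standard parallel-Grover accounting behind those numbers (a sketch, not a formal bound):

    % Grover's search over N = 2^{256} keys needs roughly sqrt(N) sequential oracle calls.
    \text{Serial depth: } \sqrt{2^{256}} = 2^{128}
    % Splitting the keyspace across p machines only reduces the depth by sqrt(p):
    \text{Depth}(p) = \frac{2^{128}}{\sqrt{p}}, \qquad
    \text{Total work}(p) = p \cdot \frac{2^{128}}{\sqrt{p}} = \sqrt{p}\cdot 2^{128}
    % Capping the depth at 2^{64} sequential steps forces p = 2^{128}, hence
    \text{Total work} = 2^{128} \cdot 2^{64} = 2^{192}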
The current academic consensus is that AES-256 is quantum resistant, as even with Grover's algorithm the attack would still require on the order of 2^128 operations. Generally, symmetric encryption schemes are impacted to a much lesser extent than public-key cryptosystems, with larger key sizes sufficing to ensure quantum resistance. I'd be quite surprised if Apple weren't using AES-256 already.
A minimum of AES-128, which could mean 100% AES-128 or 1% AES-128 from old iOS devices that haven't been updated in years. And even AES-128 would require doing a 2^64 exhaust on a quantum computer, which would be quite a remarkable feat.
Quantum cryptography is much less convenient than conventional asymmetric cryptography and offers few benefits. Doesn't stop people from trying to sell it though...
And for the "encrypted data at rest" scenario we're talking about here, where symmetric encryption suffices, quantum crypto makes no sense whatsoever.
> Unless you have encryption that is guaranteed to never be obsolete it ALWAYS matters where you store the data.
Well, while there is truth to that, it isn't the whole story. There is a time value to all information that must be factored in. If nothing else, think of one overbroad* classification example: battle plans, SECRET; intelligence, TOP SECRET.
* By which I mean there are subtleties and exceptions too numerous to go into here, but the example remains largely illustrative.
The higher the classification, the higher the long-term time value, and the greater the robustness required in one's controls.
In this case (information about or for large numbers of private individuals), today's strong symmetric encryption may well be strong enough for quite some time.
Or it might not be. I'd love to see a detailed risk assessment....
It does seem likely that Google will pay to store every bit of encrypted customer data that is currently stored forever because potentially decades from now it could be cracked and they could access all that amazing ancient private information.
It still puzzles me that people just can't understand the huge chasm between storing and using people's data from your own products, and stealing data from other companies that's stored on your systems in the hope that one day it can be decrypted and used to improve ads.
Google is a huge company; the idea that it would set out to do something that everyone involved would know is directly breaking the law (rather than doing something that's a grey area, or that they know is legal but 'unethical', or could become a PR issue, or destroy trust in them and destroy their product) is fairly unbelievable.
This comes up again and again with stories about Google and (particularly) AWS cloud computing. I hope for better on HN!
I mean, there are these posts on Hacker News claiming to be from former AWS employees saying they stole customer data to launch competing services: https://news.ycombinator.com/item?id=23929959
Maybe the first story is fake and the second is real… but they both point to a “win at all costs” company culture where policies might be violated, even if it threatens trust in the platform and a PR problem when exposed.
But those cases are very different too (I'm not trying to defend what they did). Talking to a startup and then copying their product is douchey but not illegal. Using Amazon.com sales data is like looking at the public "top products in electronics" list and selling your own version of the top sellers - I imagine all large retailers do something similar when deciding which products to make their own versions of.
Hacking the computers, networks, or services of your competitors, even when they're running in your data center, is just bad business.
> from former AWS employees saying they stole customer data to launch competing services
You're reading that a lot differently than I am.
The quoted post says:
>AWS proactively looked at traction of products hosted on its platform, built competing products, and then scraped & targeted customer list of those hosted products
None of that reads, to me, as them having had to use confidential data to do any of these things.
You can identify many organisations that are running on AWS without knowing anything about AWS accounts - blog posts, IP space, public code, social media comments from staff, linkedin and all sorts of other places will often reveal that.
Scraping/finding customer lists can be done using research, too. I've spoken to Account Manager-types at places and they've often used various tools that scrape other public resources to identify customers of competing services.
Swap out company names, and it's effectively what I've seen from a bunch of companies, without it delving into anything unethical.
> It still puzzles me that people just can't understand the huge chasm between storing and using people's data from your own products, and stealing data from other companies that's stored on your systems in the hope that one day it can be decrypted and used to improve ads.
Data is still data.
> Google is a huge company; the idea that it would set out to do something that everyone involved would know is directly breaking the law (rather than doing something that's a grey area, or that they know is legal but 'unethical', or could become a PR issue, or destroy trust in them and destroy their product) is fairly unbelievable.
They do this in Europe by not complying with the GDPR (and they are not the only ones).
> This comes up again and again with stories about Google and (particularly) AWS cloud computing. I hope for better on HN!
This sentiment seems so weird to me. Can you point to a single enterprise offering where Google is collecting data and using it themselves? Or where it would even make sense to do so? Corporate customers pay a lot more per byte than the fleeting value of private data so Google has a strong incentive to never ever touch that data.
Peter's comment comes off as satire to me. A joke about how absurd it would be for google to store 8 exabytes of data for decades because it might one day be useful.
Because it might one day be useful for.. targeting ads? I can almost see the abuse potential, the ads "I know what you did the summer of 1997" will turn us all into mind controlled consumer drones. (those of us who are old enough)
I don't think that is really the point that was being made. As you say, the practical chances of Google storing this data for x years and then committing corporate suicide by decrypting it are minuscule (and presumably Apple agrees).
The point is that there are layers of security, and by moving the data outside of its physical control, Apple has given up one of those layers.
I'm not sure why people are thinking in such a limited fashion. Google is effectively holding onto the data for the US government or whoever else has the capability to access all that data in the future. This is the kind of stuff authoritarian governments dream about. And I'm not saying Google is doing it intentionally, just that they're holding onto the data at all, and at some point there will be somebody who will make use of it.
If that is the argument, does it matter whether it is Apple or Google who is holding the physical data? Apple and Google are both based in the US and beholden to the US government, so from the government's perspective it's just a change of address when they send out a warrant.
It doesn't. I'm not sure why you think I think there's a difference. Any corporation that stores our data long-term is a threat to our privacy and our rights.
> does it matter whether it is Apple or Google who is holding the physical data?
It might matter, yes. A company run by a guy like Eric Schmidt is a lot more likely to play nice with the US government when it comes to privacy compared to a company run by a guy like Cook, who from the outside seems obsessed with user-privacy (as long as China isn't directly involved).
Of course; I just wanted to say that no two big US companies are the same, it highly depends on who leads them. The powers that be who decided that a guy like Schmidt was fit to run a company like Google could do that again.
I mean google is full of employees who threatened to walk out when their employer wanted government defense contracts. Pretty sure google would get a ton of internal pushback if a team wanted to do what you describe.
Google employs the same standard issue tech person you see here on HN.
What if the user encrypted and stored the data on their own storage media? It might actually be beneficial because it eliminates dependence on a third party, i.e., a point of failure. "It might also speed up access to data."
It’s a bit of an open secret among big tech that Apple is a huge user of storage on all major clouds. They manage encryption on their hardware and in return get cheap storage, higher availability and competition from their suppliers. Highly doubt Google has access to any iCloud data other than an estimate of the immense size of the service.
> Highly doubt Google has access to any iCloud data other than an estimate of the immense size of the service.
They do of course have access to the ciphertext and to traffic patterns; in crypto threat models, traffic analysis tells the adversary a wealth of information.
Why would Google even attempt to spy on those patterns? It serves no purpose, and how do you justify employee time spent on such useless things? Not to mention Apple proxies all requests through their own servers, rendering such analysis utterly useless to begin with.
There is one case where the traffic to Google Cloud is systematically not proxied: every time one sends an attachment in iMessage, the file (or the media) is encrypted on device and sent to gcs-{eu,us,asia}-00002(?).content-storage-upload.googleapis.com, and received from gcs-{eu,us,asia}-00002(?).content-storage-download.googleapis.com
This should be pretty visible to Google, the rest of the traffic is handled better.
How do both parties determine the keys used during a conversation?
Are they making heavy use of public key cryptography? If so how? When I send a message to you, do I encrypt it using your public key? What about group messages? Does each conversation get its own key pair?
Also it’s interesting they decided to directly hit up google cloud… you’d think they would wrap it so at minimum they could tweak the underlying infrastructure without requiring every client to update.
> How do both parties determine the keys used during a conversation?
They don’t: public key cryptography is not initially used.
The sender generates a random AES-256 key, applies it in CTR mode and uploads the encrypted blob to GCS.
Every receiving device gets a message with the key, the URI, and the SHA-1 of the blob. These messages are encrypted as usual and sent via APNS (<n>-courier.push.apple.com:5223)
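As a rough illustration of the scheme just described (not Apple's actual code; this is a sketch using the pyca/cryptography library, and the URI and field names are placeholders):

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt_attachment(plaintext: bytes):
        key = os.urandom(32)                    # fresh random AES-256 key per attachment
        counter = os.urandom(16)                # initial counter block for CTR mode
        enc = Cipher(algorithms.AES(key), modes.CTR(counter)).encryptor()
        blob = enc.update(plaintext) + enc.finalize()
        digest = hashlib.sha1(blob).hexdigest() # SHA-1 of the encrypted blob, as described
        return key, counter, blob, digest

    key, counter, blob, digest = encrypt_attachment(b"attachment bytes")

    # Hypothetical stand-ins: the blob would go to the content-storage-upload endpoint,
    # and the small (key, counter, URI, SHA-1) envelope would ride the existing
    # end-to-end-encrypted message channel over APNS.
    uri = "https://gcs-eu-00002.content-storage-upload.googleapis.com/<opaque-id>"
    envelope = {"key": key.hex(), "ctr": counter.hex(), "uri": uri, "sha1": digest}
    print(envelope["sha1"], len(blob))

Each receiving device would then fetch the blob from the download endpoint and reverse the CTR step with the key and counter from the envelope.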
> you’d think they would wrap it so at minimum they could tweak the underlying infrastructure without requiring every client to update
Apple does this: two other endpoints are *.blobstore.apple.com and the Chinese Guizhou-Cloud Big Data.
In my logs blobstore is used less than 1% of the time.
It just means Google may provide access to metadata outside of Apple's control. That metadata could be useful for classifying anomalies on the basis of pattern-of-life analysis, or similar.
Very true when each user accesses their own encrypted data directly. But from what I read here in the comments, Apple is managing encryption on their own hardware, which almost surely means that data is read and written from Apple's machines.
Such aggregation of read and write calls across users makes the risk from traffic-pattern analysis fairly minimal...
Exact byte sizes of files, and of sets of files, in an I/O pattern, combined with timestamps, over many occurrences (i.e. many chances at correlation), are very distinguishing.
If Apple is anywhere near competent, the traffic patterns are useless: all Google can see is 'data stored on that day was accessed on another day'. All traffic flows through Apple servers, and Google has no way of even correlating it to a user.
I would think Apple is smart enough to mix storage blobs, so one blob is not one user. Plus all requests come from Apple datacenters, not user devices.
Let's say Eve was working at Google with access to the ciphertext and I/O traffic, and was a jealous lover of an iPhone user, Alice. They could be interested in profiling IM app usage, learning what sizes of content (pictures?) were stored at what times, to build a profile of Alice's activity when out of sight. They could also be interested in fingerprinting the apps used, so they could be matched to, for example, different dating apps.
It's not hard to imagine something business or government related in place of this of course. And do this analysis in aggregate and follow many people at once.
Apple could have, and for all we know possibly has, implemented countermeasures for many of these cases, e.g. to make it hard to distinguish users within the mass of ciphertext.
I'm just sketching something; I'm not claiming to have developed a vulnerability research result here from my armchair. But there's a lot of terrain between "it's encrypted so there is no attack" and "you haven't presented a fully developed attack", especially for a system we don't know much about.
Anyway, there are many scenarios that come to mind for learning those I/O sizes. Apps probably have I/O fingerprints. Or you could send the set of suspected users differently sized files to probe them. Etc.
Basic batching or partitioning of data before offloading to storage would defeat this analysis, and that’s assuming you can access and query so much data regularly.
I asked for practical examples. I don’t need an in-depth report to see that this doesn’t qualify.
Most of this crypto stuff is completely impractical risk, especially compared to some phishing emails.
In security engineering we have the standard the other way around: the proponents of a design have the burden of arguing that the system is secure beyond reasonable doubt, and of addressing and elucidating any potential cracks in the security design. The subthread was about potential pitfalls even in the presence of encryption.
Because terabytes are a good frame of reference for most computer users. I get paid well to work in the computer space, and if I saw 8 exabytes, I'd have to consult a chart or list just to determine how much that is.
Most people don't know what an exabyte is. That's why you frame the amount of data in an amount of data (terabyte) that people know.
Same reason we say 1,000 km instead of 1 Mm, or give the Sun-to-Earth distance as 150 million km instead of 150 Gm. Some units are usual in some contexts, some are not. In one word: habits.
Those prefixes are bad because they break the 10^(3n) pattern. centi and hecto are also bad for the same reason. Most calculators have an "engineering" mode which only uses the good prefixes, because it's easier to think about prefixes when the ratios between them are consistent.
Centimeters are pretty common. Rulers are marked with cm and mm. We use mostly m and cm inside houses. A door width? 90 cm. A door height 2 meters and 10.
And don't forget cubic centimeters (or cc) for engines. Anything from 2000 cc down to 50 cc is very common east of the Atlantic Ocean.
Then why isn't my hard drive four million kilobytes? We already crossed this bridge with gigabytes, and the course was laid in with terabytes, which I first heard in a business setting over 20 years ago, when that was still a rack of hard disks. Cloud providers and the LHC brought us petabytes, and that was over ten years ago. The "oh, by the way, the next unit is 'exabyte'" conversations started around the same time.
> I’ve been using Mm too, but only when I have a reasonable expectation that it will be understood. IOW, rarely.
Yeah, I mainly only use it in chats (where I can immediately explain it if needed) or in person (where the actual pronunciation of "megametres" makes it pretty obvious). "Mm" and "Gm" being somewhat difficult to google due to "mm" and "gm" being different SI units makes it untenable to just expect people to work it out on their own.
> But stating the Sun–Earth distance as 1 AU is just a tautology, restating the definition, and hence void of information. I suspect you’re jesting. ;-)
To be clear, I don't mean that you'd answer "1 AU" if someone asked you how far away the sun was - obviously that's completely worthless information! The discussion was just on what units and prefixes are used for various magnitudes of measurement, and the AU is a fantastic unit to use for things that would otherwise be measured in hundreds of gigametres. For example, if you had a table containing the distances between various celestial bodies, it would be very convenient to have "Sun <-> Earth: 1 AU" in there along with things like "Sun <-> Mars: 1.5 AU".
> If someone was to ask me how far it was from sun to earth, and I answered 1AU, and they asked what an AU was?
You'd probably do the same thing that you would do if they asked what a km was: keep changing the units until you get one that they recognise, and then now they can understand the original unit you used.
(Although in this specific case given that they're asking about the definition of the unit you'd probably provide that context without needing further prompting in the first place, similar to if they asked about the circumference of the earth and you gave an answer in metres)
Not a joke, the AU is a pretty common unit of measurement when talking about distances too large for metres and too small for light-years, especially since it's based on one of the most widely understood astronomical distances. Of course using it to describe the distance it's derived from is more than a little tautological, but that wasn't too important in the context of discussing magnitude prefixes.
It's roughly the same thing as describing a distant potentially-habitable planet as being "1.5 times the size of Earth" instead of giving its radius in metres. It's just that AUs wrap that "n times the distance between the sun and the earth" into a nice standardised unit.
One of my favourite units used in climate science is the Sverdrup, often used to describe oceanic or river flows, which is a million cubic metres per second, or one cubic hectometre per second. (1)
The Sverdrup unit is tolerated because Harald Sverdrup made many key contributions to the science of Oceanography.
He was also a high-level administrator who had a super-power that few administrators have: he understood science as well as people.
And, to top it all off, the book he coauthored with Johnson and Fleming is widely regarded as the foundational document in the field of modern Oceanography.
There is so much to say about Harald Sverdrup that students quickly become comfortable with the occasional use of "Sv" as an abbreviation for 10^6m^3/s.
Oops, you’re right. I think I accidentally squared 1000 instead of cubing it when converting between km³ and m³. Which only goes to show why units can be hard. In the interest of using coherent units, I wish there were a named unit equalling one cubic metre, all the better for attaching prefixes to.
people deal with social media images and text more often than buying hard drives -- and that's (generally) handled in kilobytes.
my take: it's because numbers with lots of digits sell clicks more easily -- up until scientific notation is needed, and at that point the general readership can't fathom the number and generally doesn't care.
in other words, I bet 8,000,000 terabytes produces more clicks than '8 exabytes' and '6.4 x 10^19 bits' would have.
If I had been born a new-age journalist I'd have gone with 8,000,000,000,000 megabytes -- but only if that fit in the link/URL slug and headline for maximum click-bait exposure.
Is 1TB/person really that common? I mean, my iPhone has 64GB of storage, mostly used for storing photos, which I might want to backup. Other than that, what else do I put on iCloud?
I can't tell why this is on the front page of HN. Is the implication that Google would spy on Apple customer's data stored on Google Cloud? Google is evil/dumb/both at times but that's completely outlandish.
I do find it interesting that Apple could be one of Google's biggest Cloud clients though, that's very surprising.
Apple has built significant brand trust and brand loyalty on the premise that Apple's business model is built around devices and services, and that your data is not something Apple is interested in exploiting or monetizing; Google has a reputation for an antithetical business model in terms of user data.
Google and Amazon recoup their investment in processes, tools, and cultivating the institutional knowledge to run massive data center infrastructure by marketing cloud services. Maybe Apple calculated that they would need to do the same to “level up” their data center infra but decided that doesn’t align with their business strategy.
An 8TB hard drive is about 200 dollars nowadays, so this amount of data is about 200m dollars' worth of retail hard drives.
Of course the calculation is super inaccurate: it doesn't take redundancy into consideration, nor the discounted prices someone like Google pays for hardware, nor the discount Apple gets as a big customer. But even if the real spend were of a comparable scale, with Apple paying Google billions per year to handle its data storage, that doesn't sound particularly newsworthy to me; it seems pretty price efficient, even.
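Spelling out that retail estimate as a back-of-envelope (8 EB taken as 8 million TB, $200 per 8 TB drive, no redundancy):

    total_tb = 8_000_000            # 8 EB expressed in TB
    drive_tb, drive_usd = 8, 200    # one retail 8 TB drive at ~$200

    drives = total_tb // drive_tb   # 1,000,000 drives
    print(drives, f"${drives * drive_usd:,}")   # $200,000,000 at retail, before any redundancy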
The article claims that Apple is on track to spend around $300 million on GCP storage services this year. With redundancies, server costs, and DC costs, the upfront price could easily touch 3 to 6 billion, or 10 to 20 years of GCP storage costs.
> With redundancies, server costs, and DC costs, the upfront price could easily touch 3 to 6 billion, or 10 to 20 years of GCP storage costs.
Surely it didn't cost GOOG $6B upfront worth of infrastructure to get $300M of annual revenue from Apple. Yes, there's economies of scale, but I think you overstate the case.
Short answer: the cost of just the hard drives needed to replace an $89MM S3 storage bill was about $45 million. That's not including bandwidth, racks, datacenter space, administration, etc.
The secret sauce here is the opportunity cost of running one's own storage at that scale.
At that scale it's less about sourcing the hard drives (yes, it's a problem, but now you're a big customer so you can ask for specialised things, and a 50% discount).
The big problem is the hosting: finding and building a number of datacentres local to where your customers are, powering them, and connecting them in a reliable way.
But even more, you need to make a storage interface that doesn't suck. I am currently working with a homegrown S3-"like" interface. It is a massive pain in the arse: it doesn't scale[1], and it has an entirely new set of words to describe standard things (trying to overwrite a file? it doesn't tell you that, it just says "predicate failed").
[1] Large files transfer very fast, faster than S3. However, all the tools are written single-threaded. Add to this that each operation takes at least 0.7-1.5 seconds, and shit gets slow very quickly. There is little to no documentation, and the API is odd.
In short, much as it annoys me to say this: for general-purpose storage, buy over build.
Is that stuff even stored on spinning rust these days? I mean solid state might be more expensive upfront but I imagine at scale the energy consumption required to power and cool millions of mechanical devices becomes a big issue.
Not to mention the random-access performance of spinning disks ain't too great either.
>8TB hard drive is about 200 dollars nowadays, so this amount of data is about 1.6B dollars worthy of retail hard drives.
External hard drives can be obtained on sale for $15-16/TB. They can then be shucked to get internal drives. I suspect bulk buyers can get pricing equal to or cheaper than this, rather than the retail of $25/TB.
Edit: I googled it and apparently it's an economy-of-scale thing. People are more likely to buy external hard drives, a much more mass-market thing; Costco and Walmart sell externals, hence downward pressure on prices compared to internal drives.
Do external drives even have performance/longevity specs? The specs I see on them are pretty basic (eg. capacity, interface, operating temperature), and considering that they're marketed to consumers that's not surprising. As for actual performance/longevity, my understanding is that they're just whatever drives the manufacturer has available. For low capacities they're going to be consumer drives and for higher capacities they'll usually be NAS/datacenter drives.
At the scale these data centers operate at, I would imagine that google / FB / AWS gets these storage devices custom built to whatever spec and tolerances they need.
Apple finally realized they weren’t very good at hosting stuff. They never were.
Plus, it’s a no brainer leveraging Google and AWS. Their global footprint and expertise alone is worth it. Also, $300 million a year is a drop in the bucket to Apple’s bottom line. They probably make that alone from the millions of adapters they sell each year.
It's not just $300m. It's $300m for storage alone. I'm sure they spend more on the GCP platform.
They also use AWS, which is over $350m a year. That contract ends in a year or two.
Afaik, they don't run on azure anymore.
It's not hard to believe they're spending $1B+ on external cloud partners.
At $57B net income (on $274B revenue), a high-margin space such as cloud services could make sense, although if they're not selling it to the public it would only account for 1-2% of revenue.
Apple likes to learn from companies... Soon there will be an Apple cloud using low-energy ARM (M1X) servers. Let's call it the "Digital Apple Tree". Maybe a small reference to the DAT recorder, as Apple likes music. Of course this last part is just made up ;-)
Everyone is acting like that $300MM expense is being paid by Apple.
I'm paying Apple $120/year for 2TB of iCloud storage. I might not be doing the math right but at $300 million/year for 8 million TB, that's $75/year in google expense for the $120 I'm paying Apple.
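The arithmetic does check out as a back-of-envelope, taking the article's $300M/year and the 8 EB figure at face value:

    gcp_spend = 300e6               # article: ~$300M/year to Google
    stored_tb = 8e6                 # ~8 EB = 8 million TB

    per_tb_year = gcp_spend / stored_tb     # $37.50 per TB per year
    print(per_tb_year, per_tb_year * 2)     # $75/year for a 2 TB quota vs the $120/year iCloud fee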
The average user with 2TB of quota probably isn't using anywhere near that much. Their usage is more likely to be closer to the next smallest quota tier.
I'm amazed any HN readers could believe Google would want to know what is stored, even if they could actually read it. That is completely useless to them. What is interesting and useful is who is connecting from where, when, and with what. Yet again it feels like Snowden is forgotten. Metadata is what matters in high-volume data, not content.
As far as I know they don't, but for argument's sake let's say that they do. A lot of smart people have worked on Gmail, so any scanning of mail would be for useful metadata, or, if they scan all the text, to provide services from context (find something with a date, ask if it should be put in your calendar, etc.). Almost all email is worthless information, and I doubt even the NSA, with 100 times the budget it has today, could learn anything worthwhile from scanning everything, even if "everything" is only everything at Gmail. At best they would end up with data worse than what they get from metadata, and more likely with the need for thousands of people looking at mails that match wordlists (hate USA, bombmaking, president's route...) and not learning anything useful at all.
It would be interesting to know how that data breaks down -- like photos vs videos vs software backups vs saved movies, etc. I assume that most people stream movies, TV shows, and music these days, and that few people save them to disk. And even for those that do, Apple could deduplicate (i.e., store exactly one copy of Men in Black even if 100,000 people saved it). Software installs are easily deduplicated as well.
So that leaves personal photos and videos. Apple has about 1 billion active users, so a back of the napkin calculation suggests that each user is storing 8GB of data. That actually sounds quite low. That's 1600 5MB photos without even talking about the crazy storage that personal videos would use.
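For reference, the napkin math (assuming the 8 EB figure spreads over roughly 1 billion users):

    total_bytes = 8e18              # 8 exabytes stored with Google
    users = 1e9                     # ~1 billion active Apple users

    per_user_gb = total_bytes / users / 1e9         # 8 GB per user
    print(per_user_gb, per_user_gb * 1e9 / 5e6)     # ~1600 photos at 5 MB each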
The article talks about how much Apple stores on Google's servers, but doesn't say how much they store on their own servers or with other third parties. Maybe they have multiple times as much storage on their own servers.
iMessage is probably secretly insane: A lot of people's iClouds are stuffed with it, so it probably has random not-worth-saving content going back several years that people don't know how to clean out.
That's 1 million 8TB hard drives. You can probably put at least 100 of them in a single rack, which means 10,000 racks; put in a 100x100 grid, that means a single 300m x 300m factory hall. Treat that as an upper bound; it is probably possible to pack the drives more tightly.
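Spelled out (my assumption: roughly a 3 m x 3 m floor cell per rack, aisles included):

    drives = 8_000_000 // 8         # 1,000,000 x 8 TB drives for 8 EB
    racks = drives // 100           # 100 drives per rack -> 10,000 racks
    side = int(racks ** 0.5)        # 100 x 100 grid
    print(racks, f"{side * 3} m x {side * 3} m")    # ~3 m of floor per rack cell -> 300 m x 300 m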
Note: Apple charges $1/month for 50 GB of storage. That's a ~84% margin Apple is making on its cloud storage service. Though granted, they do give everyone 5 GB of storage for free, so the real margin will be lower (and they will still be generating a profit so long as 1 out of 5 users is paying for iCloud storage).
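Where the ~84% presumably comes from (a sketch using the imputed $300M / 8 EB cost from the article; Google's actual pricing to Apple is unknown):

    revenue_gb_month = 1 / 50                   # $1/month for 50 GB of iCloud
    cost_gb_month = 300e6 / (8e9 * 12)          # $300M/yr spread over 8 billion GB -> ~$0.0031/GB/month

    print(f"{1 - cost_gb_month / revenue_gb_month:.0%}")    # ~84%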
I don't see why this is news, given that it's been common knowledge to anyone who has something like Little Snitch running and traces their network requests to all sorts of online services.
On my phone I keep all my personal stuff in git with the remote set to my desktop (it has a public IP) over ssh. My photo stream is just rsynced occasionally over ssh.
Not having to deal with this is arguably even more convenient provided your OS vendor doesn't intentionally get in the way.
Non-tech people aren't ever going to be able to set something like that up on their own and keep it working. I don't even want to mess with stuff like that. I do that all day at work; I have neither the time nor the motivation to worry about messing with more tech after work.
Also, I would suggest not port forwarding ssh directly to an internal machine. Use a VPN and then ssh.
What does Apple sell wrt data storage that Google doesn't? The integration into apple's product line? Seems a bit mealy mouthed to me. Both companies sell storage to consumers.
I disagree - the average person is not capable of securely managing their own data, which is why they pick a consumer brand like Apple to do it for them. Providing an option to do so would likely result in exposure of personal data at significant scale from botched DIY attempts.
To test this theory, simply ask an average iPhone or Android user what encryption is.
What you are saying here is the equivalent of me buying a plane ticket but also having the option to fly the plane.
If you want to manage your own data - use something like a PinePhone - the barrier to entry is high enough that people who are capable of managing their own data securely can use the device and achieve the data sovereignty outcome they require.
That said, I believe Apple should build and manage their own infrastructure. They have the resources and capabilities to do so. The longer Apple sticks with GCP, the larger the inertia of moving away and the less leverage they have when it comes to negotiating to maintain their high standard of privacy commitments.
Most normal users may not be able to secure things effectively, and they can still use the existing iCloud / Google Play infrastructure. That doesn't mean that other users, who want to manage their own data for one reason or another, shouldn't get the opportunity.
It's far more likely that building and maintaining this feature is not worth the development time for Apple and Google product teams at this point, since the possible market is small.
As far as I know, you can manage your own data with Apple devices if you want to.
By that I mean, storage in iCloud is optional. You can back up Macs locally pretty easily with Time Machine and you can back up iPhones to Macs locally as well (which then gets backed up via Time Machine). You can encrypt Time Machine and iPhone backups too if you want.
Is it as easy as using iCloud for backups, photo syncing, etc.? No, but that’s because iCloud is not just dumb storage, it’s a hosted software application (several, really).
The Files API on iOS is open for anyone to use. You can run your own Nextcloud server on your own hardware and it integrates into the OS just like Google Drive or a plugged-in USB drive would.
You can't do device backups to, or sync Photos.app with, a random cloud. The "integration" here is just that occasionally you can save to and load from your server. That's all.
I don't think you can do a full device backup to anything but iCloud, and I agree that should change, but your own storage has the same access as Google does on iOS. You can have your own photos app which automatically uploads everything in the camera roll and can delete photos to keep them in sync with your cloud.
The conversation is about cloud though. Local backups are nice but that means connecting the phone daily to my turned-on computer instead of just plugging it into the wall every night.
Yes there are plenty of photos apps but they’re no Apple Photos and they can’t upload in the background unless you open them regularly. Unfortunately they’re not as convenient. Chances are that they’re one-way syncs too, like Google Photos.
iOS is opening up to 3rd-party services quite a bit in recent releases. The Files app and the ability to set default apps are nice improvements which have come recently, as is allowing 3rd parties into the Find My network.
I think we could see full device backups to other cloud services later. I guess the problem is they would want to provide a specific API for 3rd parties to implement, rather than trying to drop a 256GB zip file on Google Drive with no differential backups.
Backups are so personal, complex, and niche that I doubt we’ll ever see Apple opening that up to third parties. Ever. It just won’t happen. Normal people don’t care about backups and Apple makes money through iCloud backups.
iCloud is a tool that lets my parents do just that. This means I don't have to do it for them and no one has to worry about losing years of photographs due to a device failure.
edit: I guess my point is that different people have different computer skills and I think it's great that the tools now exist so people on the lower end of that spectrum can sort this out themselves without having to ask for help.
I've moved everything to the cloud. I have nothing running at home except networking gear, and a small "server" that pulls nightly backups from the clouds to a local USB drive.
In theory I could probably do without the local server, looking at Apple/Microsoft/Google data redundancy (Microsoft is multi-geo; I can't figure out what Apple is).
Sadly I need to guarantee that some random account closure doesn't remove all my data, so the backup server stays for now. The way cloud prices are going, it will only be a question of months or years before it's cheaper and easier to just use two cloud services, one for main storage and one for backup, and with projects like the Data Transfer Project [1], you don't even need to download the files first.
Managing 8 exabytes reliably is a massive pain in the arse. With no redundancy, and nothing but hard drives, 8 exabytes burns several megawatts.
It's a million drives. Given the failure rate I had when I was looking after 5 PB (about 8k HDDs), you'd be looking at at least 100-400 failed drives a week.
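For scale, assuming a typical annualized failure rate of 0.5-2% (my assumption, roughly in line with published drive-stats reports):

    drives = 1_000_000
    for afr in (0.005, 0.01, 0.02):             # annualized failure rate
        print(f"AFR {afr:.1%}: ~{drives * afr / 52:.0f} failed drives/week")
    # roughly 96, 192, and 385 per week -- the same ballpark as 100-400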
But how do you detect that? How do you schedule replacements? What's your Hamming factor for redundancy? How do you optimise your storage: for speed, power, geography, or redundancy?
What about capacity management? The lead time on growth would be large.
I was a sysadmin for a large VFX place, so I did a lot of storage admin, but unless it was a core business function, at that scale I'd buy over build that shit.
Flip side: you have no idea if Google is actually making a profit on this workload. It might literally be cheaper for Apple to use GCP for storage here than to do it themselves.
Either A) it's cheaper economically, or B) Apple makes enough on its products that the loss is negligible compared to the work and complexity involved.