How Free Tools and Services Use Your Data and What to Watch For
Last updated: April 14, 2026 · Applies to: streaming sites, AI tools, free apps, browser extensions, online learning platforms
The Real Cost of "Free"
In 2025, Cisco's annual benchmark study found that 64% of people worried about inadvertently sharing sensitive information with generative AI tools — yet nearly half admitted to inputting personal or non-public data into them anyway. That gap between concern and behavior is not limited to AI. It applies to every free service we use: the streaming site where we watch movies, the music app running in the background, the browser extension checking our grammar, the online learning platform where we take courses, and the phone app we use to make free calls.
Every one of these services costs real money to build and operate. Servers, content licensing, engineering teams, and especially the computational resources behind AI models are expensive. When a service is offered for free, the money has to come from somewhere. Sometimes it comes from reasonable sources: OpenAlex is funded by a nonprofit and releases its entire scholarly dataset under a CC0 license. Zotero is open-source and stores your data locally. Ad-supported streaming services like Tubi and Pluto TV make money by showing you commercials, just like broadcast television always has. But other services extract far more value from your data than most users realize.
We regularly review and recommend free tools and services across our curated lists — covering everything from free streaming sites for movies and TV and music streaming services to AI-powered research tools and online learning platforms. This companion article explains how free tools use your data, what the real risks are, and how to protect yourself. All information below has been verified as of April 2026.
Six Ways Free Services Make Money from You
Not all monetization is the same. Some models are transparent and benign; others are opaque and extractive. Understanding the spectrum — and recognizing which model each service uses — helps you make informed decisions about what to trust with your data, your content, and your attention.
1. Freemium upsells. The core product is free with limitations; revenue comes from users who upgrade to paid tiers. This is the dominant model across virtually every category of free software. In AI research tools, Elicit offers free paper extractions with paid systematic reviews; Grammarly provides free grammar checking with paid AI writing at $12/month. In streaming, services like Spotify and YouTube Music offer ad-supported free tiers alongside premium subscriptions. Online learning platforms like Coursera and edX let you audit courses for free and charge for certificates. In all these cases, the free tier serves as a product demo: your actual content is not the revenue source — your potential willingness to pay is. Privacy risk: Low.
2. Advertising. The service shows ads, and revenue depends on user attention. This is the primary model for free streaming sites, free music services, many browser-based tools, and most free mobile apps. The privacy impact varies widely depending on how ads are targeted. Contextual ads — showing a cooking ad on a food video — are relatively benign. Behavioral targeting, where ads follow you based on your browsing history, viewing habits, location, and device information, is more invasive. Free streaming sites tend to rely heavily on this model: the content is the bait, and your viewing data is the product that gets sold to advertisers. Privacy risk: Low to moderate, but can be high if combined with aggressive third-party tracking.
3. Aggregated data and analytics. The company collects usage data across all users and sells aggregated, supposedly anonymized insights to third parties. A streaming service might sell data about viewing trends by region; a learning platform might sell engagement benchmarks to educational publishers; a productivity tool might sell anonymized usage patterns to enterprise clients. Individual identity is ostensibly removed, but research has repeatedly shown that re-identification is sometimes possible by combining supposedly anonymous datasets with other data sources. Privacy risk: Moderate.
4. AI model training. Your inputs — prompts, uploads, edits, interactions — are used to improve the company's AI models. The improved models are then sold via API access, enterprise licenses, or premium features. This is the model that raises the most concern for professionals, researchers, and anyone who creates original content. According to Incogni's 2026 AI privacy ranking, platforms like Meta AI and Google Gemini provide no clear mechanism for opting out of training, while Mistral AI, OpenAI (ChatGPT), and xAI (Grok) offer the strongest opt-out controls. This model does not apply only to AI chatbots: any service with AI features — a grammar checker, a music recommendation engine, a "smart" photo editor — may be feeding your interactions back into model training. Privacy risk: High.
5. Third-party data sharing or selling. The company shares or sells individual-level user data to data brokers, advertisers, or other companies. This is the most opaque model and the hardest for users to detect. It is often disclosed in privacy policies under broad language about "business partners" or "service providers." Incogni's 2026 analysis of 442 AI-powered Chrome extensions found that personally identifiable information — names, addresses, email addresses — is the most frequently collected data type across most extension categories. Free mobile apps are particularly prone to this: a flashlight app that requests access to your contacts and location is a classic example of a product that exists primarily to harvest data. Privacy risk: High.
6. Open-source and grant-funded. The tool is funded by academic grants, nonprofit donations, community contributions, or optional paid add-ons, with little or no commercial incentive to monetize user data. Zotero (funded partly by the Andrew W. Mellon Foundation), OpenAlex (built by the nonprofit OurResearch), and Semantic Scholar (funded by the Allen Institute for AI) all fall into this category. Obsidian is closed-source but fits the same pattern: it is free for personal use, stores files locally, and earns revenue from optional sync and publish services. In the streaming and media world, the Internet Archive and Project Gutenberg are the closest equivalents. Privacy risk: Very low.
Most services use a combination of these models. A free music streaming app might be ad-supported (model 2), sell aggregated listening data to record labels (model 3), and offer a premium subscription (model 1) — all at the same time. A free AI writing assistant might use freemium upsells while also using your free-tier inputs to train its models. The question to always ask is: which combination applies to this specific service, and am I comfortable with it?
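One way to reason about a combination of models is to take the highest risk among them. The sketch below is illustrative only: the model names and the `combined_risk` helper are hypothetical, the risk labels are copied from the six models above, and treating advertising at its upper bound ("moderate") is an assumption.

```python
from enum import IntEnum


class Risk(IntEnum):
    VERY_LOW = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Risk labels taken from the six models above. Advertising is listed as
# "low to moderate"; using the upper bound here is an assumption.
MODEL_RISK = {
    "freemium": Risk.LOW,
    "advertising": Risk.MODERATE,
    "aggregated_data": Risk.MODERATE,
    "ai_training": Risk.HIGH,
    "third_party_sharing": Risk.HIGH,
    "open_source_or_grant": Risk.VERY_LOW,
}


def combined_risk(models):
    """Return the highest risk among the monetization models a service uses."""
    if not models:
        raise ValueError("at least one monetization model is required")
    return max(MODEL_RISK[m] for m in models)


# The two combinations described in the paragraph above.
music_app = ["advertising", "aggregated_data", "freemium"]
writing_assistant = ["freemium", "ai_training"]

print(combined_risk(music_app).name)          # MODERATE
print(combined_risk(writing_assistant).name)  # HIGH
```

Taking the maximum rather than an average reflects the idea that one extractive practice is not offset by several benign ones.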
What Makes AI Tools Different
Traditional software — a PDF reader, a video player, a file converter — typically processes your data and discards it. AI-powered tools can operate fundamentally differently, which is why they deserve a separate discussion even in a guide about free services generally.
The training data problem. Large language models and other AI systems require enormous datasets to learn. When you use a free AI tool, your inputs may become part of that training pipeline: your prompts, uploaded documents, and generated content can be absorbed into the model's learned patterns. Once incorporated into model weights through training, your data cannot simply be "deleted" the way you delete a file from a server. It has been transformed into statistical patterns that influence how the model responds to future queries from anyone. This is not hypothetical: Incogni's researchers found that all investigated AI platforms collect user data from "publicly accessible sources," and most also process direct user interactions.
Free-tier vs. paid-tier: a two-tiered privacy system. There is often a stark difference in how the same tool handles data depending on your plan. OpenAI states that data from enterprise accounts is not used for model training, is encrypted using AES-256, and is covered by SOC 2 compliance. Free-tier users get none of these contractual guarantees. Google's NotebookLM states that it does not use uploaded data for model training, but on personal (non-Workspace) accounts, human reviewers may see your queries, uploads, and responses if you provide feedback. Grammarly allows any user (free or paid) to opt out of product improvement and training in account settings, but users are opted in by default. This two-tiered pattern is not limited to AI: many streaming services offer an ad-free tier that also reduces data collection, and learning platforms often provide stronger data protections to institutional subscribers.
The opt-out landscape in 2026. Incogni's comprehensive AI privacy ranking, which evaluated nine major platforms across 11 weighted criteria, found wide disparities. Mistral AI (Le Chat) ranked as the least privacy-invasive, followed by ChatGPT and Grok. The most privacy-invasive were Meta AI, Google Gemini, and Microsoft Copilot — all products of the largest tech companies. Key findings: Gemini, DeepSeek, Pi AI, and Meta AI do not appear to let users opt out of prompt-based training; ChatGPT was the most transparent about training data use and had the clearest privacy policy; and mobile apps consistently collect more data than their desktop counterparts, including sensitive data like precise location and contact details.
Data Handling Comparison: 20 Popular AI Research Tools
To make the question of how free tools use your data concrete, the following table maps the data handling practices of the 20 tools reviewed in our AI research tools guide. It covers where your data is processed, whether it is used for AI training, and whether you can opt out. These are the same questions worth asking about any free service you use, whether it is a streaming site, a learning platform, or a mobile app.
| Tool | Data processing | Used for AI training? | Opt-out available? | Primary monetization |
|---|---|---|---|---|
| Semantic Scholar | Cloud (Allen Institute) | No (search-only, no user uploads) | n/a | Grant-funded (nonprofit) |
| Consensus | Cloud | Not stated for queries | Not specified | Freemium |
| Elicit | Cloud | Not stated for queries | Not specified | Freemium |
| Connected Papers | Cloud (no uploads) | No (no user data ingested) | n/a | Freemium |
| ResearchRabbit | Cloud | Not stated | Not specified | Committed free; premium upcoming |
| OpenAlex | Open API / local download | No (open data, CC0 license) | n/a | Nonprofit (OurResearch) |
| NotebookLM | Cloud (Google) | No (per Google's stated policy) | n/a — but feedback on personal accounts may be reviewed by humans | Freemium (Google Workspace) |
| Obsidian | Local-first (your device) | No | n/a | Optional paid sync/publish |
| Notion AI | Cloud (AWS) | Not for customer content (per policy) | n/a | Freemium + AI add-on |
| Grammarly | Cloud (AWS US) | Yes, by default (de-identified samples) | Yes — toggle off "Product Improvement and Training" in settings | Freemium (subscriptions) |
| QuillBot | Cloud | Not clearly stated | Not specified | Freemium |
| Paperpal | Cloud | Not clearly stated | Not specified | Freemium |
| Jenni AI | Cloud | Not clearly stated | Not specified | Freemium |
| Zotero | Local-first (optional cloud sync) | No | n/a | Open-source (grants + paid storage) |
| Mendeley | Cloud (Elsevier) | Not stated for personal library content | Not specified | Freemium (Elsevier ecosystem) |
| Scite.ai | Cloud | Not stated for user queries | Not specified | Freemium |
| Julius AI | Cloud (sandboxed per user) | Not stated; data deleted on user action | Data deletion available | Freemium |
| Napkin.ai | Cloud | Not clearly stated | Not specified | Freemium |
| BioRender | Cloud | Not stated | Not specified | Freemium |
| MindTheGraph | Cloud | Not stated | Not specified | Freemium |
Key takeaway: Only a handful of tools in this list are fully local-first (Obsidian, Zotero) or fully open/no-upload (Semantic Scholar, OpenAlex, Connected Papers). These present very low data privacy risk, because your content either stays on your device or is never uploaded. NotebookLM and Grammarly deserve credit for clear, specific privacy disclosures, even though their defaults differ. The majority fall into a gray zone where training-data policies are either vague or not publicly documented, which itself is a data point worth noting.
How to Read a Privacy Policy in 5 Minutes
Most people do not read privacy policies, and for good reason — they are long, dense, and written in legal language designed to protect the company, not inform the user. Incogni's research specifically called out Microsoft, Meta, and Google for privacy documents that attempt to cover all products under a single umbrella, making it difficult to understand how any specific service handles your data. But you do not need to read every word. The following five sections contain the information that matters most, and learning to scan for them takes only a few minutes.
| Section to find | What to look for | Example of a good practice |
|---|---|---|
| "What we collect" | Scope of collection. Does the service collect only what it needs (account email, basic usage metrics), or everything it can (precise location, contact lists, device identifiers, content of your documents or viewing history)? | Grammarly explicitly lists what it collects (username, email, language preferences) and what it excludes by design (passwords, payment fields, sensitive form data). |
| "How we use your data" | Watch for "improve our services," "develop new features," or "train our models" — common euphemisms for using your data beyond the immediate service. For streaming services, look for "personalization" language that may indicate behavioral profiling. | Google's NotebookLM states directly that it does not train models on uploaded Workspace user data. |
| "Sharing" / "Third parties" | Vague language like "trusted partners" or "affiliates" without specific names. Check whether data is shared within a corporate family — if the service is owned by a large conglomerate, your data may flow to products you never signed up for. | Grammarly states it restricts AI service providers from training their models on customer content, and explicitly says it does not sell user content. |
| "Data retention" | How long data is kept after you stop using the service. For AI tools, ask whether "deletion" means removal from training datasets or only from active storage. For streaming services, check whether viewing history is retained after account cancellation. | Julius AI states that data is completely erased from servers when you delete it in the app. Grammarly deletes data within 30 days of account deletion. |
| "Your rights" / "Your choices" | Can you download your data? Request deletion? Opt out of training, personalization, or targeted advertising? The presence of these controls — and how easy they are to find — tells you a lot about how seriously a company takes privacy. | Grammarly provides a downloadable Personal Data Report within hours of request and offers per-feature opt-out toggles in account settings. |
Shortcut: The community project Terms of Service; Didn't Read (ToS;DR) crowd-sources plain-language ratings of privacy policies and terms of service for hundreds of popular services. If reading the full policy feels overwhelming, their letter-grade summaries can give you a quick sense of where a service stands.
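If you would rather scan a policy mechanically, the phrases quoted in the table can be turned into a simple keyword search. This is a minimal sketch, not a legal analysis: the `scan_policy` function and the phrase list are hypothetical, the sample text is invented, and a match only marks a passage worth reading closely.

```python
import re

# Phrases quoted in this guide as signals worth a closer look.
WATCH_PHRASES = {
    "use": [
        "improve our services",
        "develop new features",
        "train our models",
        "personalization",
    ],
    "sharing": [
        "trusted partners",
        "affiliates",
        "business partners",
        "service providers",
    ],
}


def scan_policy(text):
    """Return {category: [(phrase, snippet), ...]} for each watch phrase found."""
    hits = {}
    for category, phrases in WATCH_PHRASES.items():
        for phrase in phrases:
            for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
                # Keep roughly 40 characters of context on each side.
                start = max(0, match.start() - 40)
                end = min(len(text), match.end() + 40)
                snippet = " ".join(text[start:end].split())
                hits.setdefault(category, []).append((phrase, snippet))
    return hits


# Invented sample text, not taken from any real policy.
sample = (
    "We may use your content to improve our services and train our models. "
    "We share information with trusted partners and affiliates."
)

for category, found in scan_policy(sample).items():
    for phrase, snippet in found:
        print(f"[{category}] {phrase!r}: ...{snippet}...")
```

To use it on a real policy, paste the policy text into a string (or read it from a saved file) and pass it to `scan_policy`.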
Red Flags vs. Reasonable Trade-Offs
Not every data practice is predatory. The key is distinguishing between trade-offs you can live with and practices that should make you look for alternatives.
Reasonable trade-offs include basic analytics to improve the product (crash reports, feature usage counts), anonymized and aggregated usage data for benchmarking, a freemium model with usage limits, contextual advertising based on the type of content rather than personal profiling, and an opt-in request to contribute anonymized data. This describes most of the services we review — from Elicit's limited free extractions and Consensus's daily search cap to ad-supported streaming sites where you watch a commercial break in exchange for free content. The key word is proportionality: the data collected should be proportional to the service provided.
Yellow flags (proceed with caution) include using free-tier inputs for AI model training unless the user opts out; Grammarly's approach is transparent but still requires user action. Other yellow flags are sharing data within a corporate family of products (relevant for services owned by Google, Meta, Elsevier, Amazon, or other conglomerates), retaining data for longer than 90 days after account deletion, requiring permissions that exceed what the service needs (a streaming app requesting microphone access, a grammar checker requesting your contacts), and vague language about "business partners" without naming them.
Red flags (strongly consider alternatives) include offering no opt-out for AI training data; Incogni identified Meta AI and Gemini as platforms where this is the case. Other red flags are selling or sharing individual-level data with third-party data brokers, no mechanism for data deletion, a privacy policy that claims ownership of content you create, collecting precise location, biometrics, or keystroke patterns without clear justification, mobile apps that collect significantly more data than their web versions, and a history of changing privacy policies retroactively to expand data usage rights. Free streaming sites from unknown operators that require no login but embed dozens of tracking scripts are a particularly common source of red flags such as undisclosed sharing with data brokers.
Your Rights Under the GDPR and the EU AI Act
Depending on where you live, you have legal rights over your data even when using free services.
If you are in the EU/EEA, the GDPR gives you the right to access your data and learn how it is used, the right to have it deleted, the right to object to profiling, and the right to data portability. These apply regardless of whether a service is free or paid, and any company offering services to EU residents must comply, even if headquartered elsewhere. The EU AI Act, the world's first comprehensive AI regulation, adds requirements for transparency, human oversight, and accountability in AI systems. General-purpose AI model obligations are already in force, and high-risk AI system requirements are scheduled to become enforceable in August 2026.
Important caveat for 2026: The European Commission's "Digital Omnibus" proposals, introduced in late 2025, are seeking to modify both the GDPR and the AI Act. One contentious change would introduce AI training as a "legitimate interest" under the GDPR, potentially making it easier for companies to use personal data for model training without explicit consent. The European Data Protection Board has cautioned that these changes must not weaken fundamental rights. These proposals remain under legislative debate as of April 2026.
If you are in the United States, there is no federal equivalent to the GDPR, but California's CCPA/CPRA gives residents the right to know what data is collected, to delete it, and to opt out of its sale. Colorado's Algorithmic Accountability Law, effective February 2026, specifically addresses AI systems used in high-stakes decisions. Several other states have enacted varying levels of privacy protection.
Practical reality: Laws are only as effective as their enforcement. For everyday users, the most reliable protection is still being selective about which services you use and what data you share — rather than relying on after-the-fact legal remedies.
How to Protect Yourself: A Practical Checklist
You do not need to be a privacy expert to use free services responsibly. The following steps cover the most important bases, whether you are signing up for a streaming site, installing a browser extension, or trying an AI writing tool.
Before signing up: Search for "[service name] privacy" or "[service name] data collection" to surface any analyses or controversies. For AI tools, check our comparison table above. For streaming sites, check whether the service is legitimate and licensed — unlicensed sites often have the most aggressive tracking. Look for opt-out mechanisms before you start using the service — it is easier to configure settings on a fresh account than to retroactively undo data collection.
Account setup: Use a dedicated email address for free service sign-ups rather than your primary personal or institutional email. Avoid signing in with Google, Facebook, or Apple when possible — social login links your activity across platforms and grants the service access to profile information. Disable optional data sharing, analytics participation, and training data contributions in settings. For Grammarly specifically, turn off "Product Improvement and Training" in account settings. For streaming and music apps, review and disable "personalized ads" or "ad personalization" toggles. Enable two-factor authentication wherever available.
During use: Do not upload sensitive, proprietary, or pre-publication data to free-tier cloud tools unless you have verified their data handling. Use "Temporary Chat" or "Privacy Mode" features when available — ChatGPT and Grok both offer modes where conversations are not saved or used for training. For streaming services, consider using a browser with built-in tracker blocking. Review and clear old conversations, uploads, and viewing histories periodically. Be cautious with browser extensions: Incogni found that programming assistants posed the highest privacy risk among AI Chrome extensions, while audiovisual generators posed the lowest.
When you stop using a service: Request data deletion through settings or support — do not just stop logging in, as many services retain your data indefinitely unless you explicitly request removal. Download any data you want to keep first. Grammarly deletes data within 30 days of account deletion; other services may take longer or never delete automatically.
For sensitive work, use local-first tools. For researchers and professionals working with pre-publication results, patient data, proprietary formulations, or any information that absolutely must stay private, we recommend tools that either keep your data on your own hardware or never ask you to upload it: Semantic Scholar for discovery (a cloud search service, but no login required and no data uploaded), Obsidian for notes (all files stored locally), and Zotero for citations (open-source, local-first). For a broader privacy perspective, see our guide to staying anonymous online with antidetect browsers.
Special Considerations for Researchers and Students
If you are using free tools in an academic or scientific context, the privacy stakes are higher than for casual personal use.
Unpublished research data. Uploading draft manuscripts, experimental data, or grant proposals to a free AI tool that uses inputs for model training means fragments of your pre-publication work could theoretically appear in the model's outputs. The probability of a direct, attributable leak is low, but for competitive fields where priority matters, even a small risk may be unacceptable. NotebookLM's approach of grounding responses only in uploaded documents is helpful, but on personal accounts, human reviewers may see your content if you provide feedback. For the strongest guarantees, use the Google Workspace version or choose a fully local tool.
Institutional policies. Many universities and research institutions have developed AI usage policies. The University of Pittsburgh, for example, advises that any information not already public should not be entered into a free generative AI platform. Check with your institution's IT security or research compliance office before uploading sensitive data. Some institutions have negotiated enterprise licenses with stronger privacy guarantees than public free tiers.
Ethics board considerations. If your research involves human subjects data, uploading it to a third-party AI tool may violate your IRB or ethics committee approval. Most informed consent forms do not cover having participant data processed by external AI systems. The consequences — retraction, loss of funding, institutional sanctions — are severe enough that the cautious approach is always justified.
A balanced perspective. None of this means you should avoid free tools entirely. The tools we review across our curated lists range from fully open and local (Zotero, Obsidian, OpenAlex, Project Gutenberg) to cloud-based with clear privacy commitments (NotebookLM, Grammarly with opt-out, legitimate ad-supported streaming). The key is matching the sensitivity of your data to the privacy profile of the service. Use cloud-based services for work and content that is already public or non-sensitive, and local-first tools for everything else.
This guide will be updated periodically as privacy policies, regulations, and service features change. All information was verified as of April 2026. If you spot an outdated claim or have a suggestion, feel free to let us know.
Frequently Asked Questions
Do free streaming sites sell my data to advertisers?
Ad-supported streaming services generate revenue by showing you ads, and many use your viewing habits, location, and device information to target those ads. Legitimate free services like Tubi, Pluto TV, and Crackle are transparent about this in their privacy policies. The risk increases with lesser-known or unlicensed sites, which may embed aggressive tracking scripts or share data with third-party brokers without clear disclosure.
Do free AI tools like ChatGPT use my inputs to train their models?
It depends on the tool and the plan. Many free-tier AI tools use your inputs for model training by default. ChatGPT allows you to opt out in settings or use a Temporary Chat mode. Incogni's 2026 ranking found that Mistral AI, ChatGPT, and Grok offer the strongest opt-out controls, while Meta AI and Gemini provide no clear opt-out mechanism. Enterprise plans typically include a contractual commitment that your data is not used for training; consumer paid plans vary, so check the settings and policy for your tier.
Are free browser extensions safe to use?
It varies enormously. Incogni's 2026 analysis of 442 AI-powered Chrome extensions found that personally identifiable information is the most frequently collected data type across most categories. Programming assistants posed the highest privacy risk, while audiovisual generators posed the lowest. Before installing any extension, check what permissions it requests and what data the developer declares collecting in the Chrome Web Store listing.
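For readers comfortable with a little scripting, the permission check can be partly automated by reading an extension's `manifest.json`. This is a rough sketch under stated assumptions: the `flag_permissions` helper and the example manifest are hypothetical, and the set of permissions treated as "broad" is an illustrative selection, not an official risk classification.

```python
import json

# Permissions treated as broad in this sketch. The selection is an assumption
# made for illustration, not an official risk classification.
BROAD_PERMISSIONS = {
    "tabs", "history", "cookies", "webRequest", "clipboardRead", "bookmarks",
}
# Host patterns that grant access to every site you visit.
BROAD_HOSTS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}


def flag_permissions(manifest_json):
    """Return a sorted list of broad permissions and host patterns in a manifest."""
    manifest = json.loads(manifest_json)
    requested = set()
    # Manifest V3 lists host patterns under "host_permissions";
    # Manifest V2 mixes them into "permissions".
    for key in ("permissions", "optional_permissions", "host_permissions"):
        requested.update(p for p in manifest.get(key, []) if isinstance(p, str))
    return sorted(requested & (BROAD_PERMISSIONS | BROAD_HOSTS))


# A made-up manifest for a hypothetical extension.
example_manifest = """
{
  "manifest_version": 3,
  "name": "Hypothetical Grammar Helper",
  "version": "1.0",
  "permissions": ["storage", "tabs", "clipboardRead"],
  "host_permissions": ["<all_urls>"]
}
"""

print(flag_permissions(example_manifest))
# ['<all_urls>', 'clipboardRead', 'tabs']
```

A flagged permission is not proof of misuse. It is a prompt to ask whether the extension's stated purpose actually needs that level of access.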
What should I look for in a privacy policy before using a free service?
Focus on five areas: what data is collected, whether your content or inputs are used for AI training or product improvement, whether data is shared with or sold to third parties, how long data is retained after you stop using the service, and whether you can request deletion. Also check whether free and paid tiers differ in data handling — the difference is often significant.
Does the GDPR protect me when I use free tools and services?
If you are in the EU or EEA, the GDPR gives you rights regardless of whether a service is free or paid, including the right to access your data, have it deleted, and object to profiling. The EU AI Act adds transparency requirements for AI systems. However, enforcement varies, and many services bury compliance mechanisms in complex settings. Practical prevention — being selective about what you share — remains your most reliable protection.
Is it safe to upload unpublished research to free AI tools?
Exercise caution. Many free AI tools process uploads through cloud-based models, and some may use those inputs for training. NotebookLM states it does not use uploads for training, but on personal accounts, human reviewers may see your content if you provide feedback. For sensitive or pre-publication data, prioritize local-first tools like Obsidian and Zotero, or paid enterprise plans with contractual no-training guarantees.
What does it mean when a service says it collects "anonymized" or "aggregated" data?
Anonymized data has had personally identifiable information removed. Aggregated data combines information from many users into statistical summaries. In theory, both protect your identity. In practice, research has shown that supposedly anonymized datasets can sometimes be re-identified by combining them with other sources. Grammarly's approach of offering an explicit opt-out alongside its anonymization claims is more trustworthy than anonymization claims alone.
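A toy example makes the re-identification risk concrete. The sketch below links an "anonymized" dataset to a named one through shared attributes (ZIP code, birth year, gender). All records are invented, and the `reidentify` function is a hypothetical illustration of the general linkage idea, not a description of any specific study or service.

```python
# "Anonymized" viewing records: names removed, quasi-identifiers kept.
anonymized_views = [
    {"zip": "15213", "birth_year": 1988, "gender": "F", "watched": "medical documentary"},
    {"zip": "15213", "birth_year": 1975, "gender": "M", "watched": "cooking show"},
    {"zip": "94110", "birth_year": 1988, "gender": "F", "watched": "true crime series"},
]

# A separate dataset that includes names (made up for this example).
public_records = [
    {"name": "Alice Example", "zip": "15213", "birth_year": 1988, "gender": "F"},
    {"name": "Bob Example", "zip": "15213", "birth_year": 1975, "gender": "M"},
    {"name": "Carol Example", "zip": "60601", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(record):
    """Build a comparison key from the quasi-identifier fields."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def reidentify(anonymous, named):
    """Link records whose quasi-identifiers match exactly one named person."""
    by_key = {}
    for person in named:
        by_key.setdefault(key(person), []).append(person["name"])
    linked = []
    for row in anonymous:
        names = by_key.get(key(row), [])
        if len(names) == 1:  # a unique match is what enables re-identification
            linked.append((names[0], row["watched"]))
    return linked


for name, watched in reidentify(anonymized_views, public_records):
    print(f"{name} -> {watched}")
# Alice Example -> medical documentary
# Bob Example -> cooking show
```

The more attributes a dataset retains, the more likely each combination is unique, which is why removing names alone is a weak guarantee.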
Can I delete data that a free tool has already used to train its AI model?
This is one of the hardest challenges in AI privacy. While the GDPR grants a right to deletion, removing specific data from a trained AI model is technically complex — information is transformed into model weights that cannot be simply extracted. Some companies retrain models without deleted data, but this is expensive and not universally practiced. The EU's current proposals would require removal only if it does not require "disproportionate efforts" — a vaguely defined term that privacy advocates have criticized.