Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hidden renderer for fetching backup search results #27285

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

DJAndries
Copy link
Collaborator

@DJAndries DJAndries commented Jan 21, 2025

Resolves brave/brave-browser#43399

Security review: https://github.com/brave/reviews/issues/1843

Submitter Checklist:

  • I confirm that no security/privacy review is needed and no other type of reviews are needed, or that I have requested them
  • There is a ticket for my issue
  • Used Github auto-closing keywords in the PR description above
  • Wrote a good PR/commit description
  • Squashed any review feedback or "fixup" commits before merge, so that history is a record of what happened in the repo, not your PR
  • Added appropriate labels (QA/Yes or QA/No; release-notes/include or release-notes/exclude; OS/...) to the associated issue
  • Checked the PR locally:
    • npm run test -- brave_browser_tests, npm run test -- brave_unit_tests wiki
    • npm run presubmit wiki, npm run gn_check, npm run tslint
  • Ran git rebase master (if needed)

Reviewer Checklist:

  • A security review is not needed, or a link to one is included in the PR description
  • New files have MPL-2.0 license header
  • Adequate test coverage exists to prevent regressions
  • Major classes, functions and non-trivial code blocks are well-commented
  • Changes in component dependencies are properly reflected in gn
  • Code follows the style guide
  • Test plan is specified in PR before merging

After-merge Checklist:

Test Plan:

@diracdeltas diracdeltas requested a review from bridiver January 21, 2025 17:07
@DJAndries DJAndries requested a review from a team as a code owner January 22, 2025 03:15
@DJAndries DJAndries force-pushed the gmix-fix branch 3 times, most recently from 0566421 to f112abc Compare January 22, 2025 18:12
components/brave_search/browser/backup_results_service.cc Outdated Show resolved Hide resolved
browser/brave_search/backup_results_service_impl.cc Outdated Show resolved Hide resolved
pending_request->timeout_timer.Stop();
pending_request->shared_url_loader_factory =
pending_request->otr_profile->GetDefaultStoragePartition()
->GetURLLoaderFactoryForBrowserProcess();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do we really need to create OTR profile for each new request? Creating This seems a bit overkill, because even if nothing is stored on disk, the CPU usage can still be high. A lot of KeyedServices are created for each profile.
  2. Do we really need to use OTR profile for these SimpleURLLoader requests? Can we use use a single OTR profile for cookie fetch and then use the cookie in the normal profile?

Copy link
Collaborator Author

@DJAndries DJAndries Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to create OTR profile for each new request? Creating This seems a bit overkill, because even if nothing is stored on disk, the CPU usage can still be high. A lot of KeyedServices are created for each profile.

Creating a fresh OTR profile makes it easy to ensure that state is not shared between individual requests, which is a desirable property. Are there a lot of KeyedServices that are created specifically for an OTR profile (not just redirected to the service for the original profile)?

Do we really need to use OTR profile for these SimpleURLLoader requests? Can we use use a single OTR profile for cookie fetch and then use the cookie in the normal profile?

We need the cookie state for subsequent requests. I'm not sure how we would use those cookies when making a request with the normal profile, unless we stored them in the normal profile CookieManager or we override the cookie header. Using an OTR profile for the SimpleURLLoader seems more convenient in this respect

@iefremov
Copy link
Contributor

iefremov commented Jan 23, 2025

To summarize, I'd say we can continue with this approach, given that for now this machinery has limited usage (google search results) and the number of potentially created OTR profiles / renderers is estimates as low. Also having in mind the high priority of the problem. So once the lifetime issues found for now are solved, I'm ok to merge.

In the follow-ups we have to investigate the possibility of

  • not creating extra OTR profiles (just using storage partitions instead?),
  • simpler sandboxing to potentially avoid crafting the throttle code
  • reusing of spawned renderers (i.e. have at maximum one hidden renderer)

Copy link
Contributor

[puLL-Merge] - brave/brave-core@27285

Description

This PR adds a new backup results service to Brave Search, allowing it to fetch search results from a backup provider (Google) for use in fallback mixing and Web Discovery Project. The service handles cookie management and rendering through isolated OTR (Off The Record) profiles.

Possible Issues

  1. Complex state management with multiple OTR profiles could lead to memory leaks if profiles aren't properly cleaned up
  2. Potential race conditions in handling multiple concurrent backup result requests

Security Hotspots

  1. URL validation in IsBackupResultURLAllowed is critical to prevent SSRF attacks - currently properly restricted to specific Google domains
  2. Cookie handling needs careful review since it's dealing with externally provided cookie headers
Changes

Changes

By filename:

browser/brave_search/:

  • Added new BackupResultsService implementation for fetching search results from backup providers
  • Implemented navigation throttle to control web contents navigation
  • Added service factory for proper profile management
  • Added comprehensive browser tests for the new functionality

components/brave_search/browser/:

  • Added URL allowlist validation
  • Implemented core backup results service interfaces
  • Added mojo interfaces for browser-renderer communication

browser/extensions/:

  • Added new Web Discovery API for extension integration
  • Implemented backup results retrieval functionality

chromium_src/chrome/browser/profiles/:

  • Added support for search backup results OTR profiles
sequenceDiagram
    participant E as Extension
    participant BS as BackupResultsService
    participant OTR as OTR Profile
    participant WC as WebContents
    participant BP as Backup Provider

    E->>BS: FetchBackupResults(url)
    BS->>OTR: Create Profile
    BS->>WC: Create WebContents
    WC->>BP: Initial Request
    BP-->>WC: Response with Redirect
    WC->>BP: Follow Redirect
    BP-->>WC: Final Results
    WC-->>BS: Extract HTML
    BS-->>E: Return Results
    BS->>OTR: Cleanup Profile
Loading

request->timeout_timer.Start(
FROM_HERE, kTimeout,
base::BindOnce(&BackupResultsServiceImpl::CleanupAndDispatchResult,
base::Unretained(this), request, std::nullopt));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:

- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated


Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml


Cc @thypon @goodov @iefremov

pending_request->simple_url_loader->DownloadToString(
pending_request->shared_url_loader_factory.get(),
base::BindOnce(&BackupResultsServiceImpl::HandleURLLoaderResponse,
base::Unretained(this), pending_request),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:

- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated


Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml


Cc @thypon @goodov @iefremov

@ShivanKaul
Copy link
Collaborator

not creating extra OTR profiles (just using storage partitions instead?),

That sounds like a great approach to me.

request->timeout_timer.Start(
FROM_HERE, kTimeout,
base::BindOnce(&BackupResultsServiceImpl::CleanupAndDispatchResult,
base::Unretained(this), request, std::nullopt));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:

- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated


Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml


Cc @thypon @goodov @iefremov

pending_request->simple_url_loader->DownloadToString(
pending_request->shared_url_loader_factory.get(),
base::BindOnce(&BackupResultsServiceImpl::HandleURLLoaderResponse,
base::Unretained(this), pending_request),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:

- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated


Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml


Cc @thypon @goodov @iefremov

}

std::string potential_tld = base::JoinString(
std::vector<std::string>(google_it + 1, host_parts.end()), ".");
Copy link
Member

@goodov goodov Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @fmarier.

I think this all can be replaced with a call to GetDomainAndRegistry and splitting the output on the first . symbol. The part before the dot is either google or not, the rest is either an acceptable domain or not.

doing all these joins and splits should not be necessary here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add hidden renderer for fetching backup search results
8 participants