Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix domain_is_in? for deeply nested subdomain #270

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 11 additions & 4 deletions lib/valid_email2/address.rb
Original file line number Diff line number Diff line change
Expand Up @@ -29,11 +29,12 @@ def self.permitted_multibyte_characters_regex=(val)
@permitted_multibyte_characters_regex = val
end

def initialize(address, dns_timeout = 5, dns_nameserver = nil)
def initialize(address, dns_timeout = 5, dns_nameserver = nil, domain_hierarchy_max_depth: 3)
@parse_error = false
@raw_address = address
@dns_timeout = dns_timeout
@dns_nameserver = dns_nameserver
@domain_hierarchy_max_depth = domain_hierarchy_max_depth

begin
@address = Mail::Address.new(address)
Expand Down Expand Up @@ -119,10 +120,16 @@ def domain_is_in?(domain_list)
address_domain = address.domain.downcase
return true if domain_list.include?(address_domain)

i = address_domain.index('.')
return false unless i
Copy link
Collaborator

@phoet phoet Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmhmmm, i guess it has always been truethy? i mean domains typically contain a dot. 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@micke was it on purpose, that in any case, the TLD so com from example.com was checked against the list? i guess we can speed up the lib significantly by getting rid of this.

Copy link
Contributor Author

@cm-dyoshikawa cm-dyoshikawa Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me as well, the TLD check seems unnecessary.

Shall I remove the TLD check while we have this opportunity? I think it's possible.

Copy link
Contributor Author

@cm-dyoshikawa cm-dyoshikawa Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation might look something like this:

      domain_hierarchy_level = 0
      while (i = address_domain.index('.')) && domain_hierarchy_level < @domain_hierarchy_max_depth
        address_domain = address_domain[(i + 1)..-1]
        # Skip checking TLD (if there's no more dot, it's considered as TLD)
        break unless address_domain.include?('.')
        return true if domain_list.include?(address_domain)
        domain_hierarchy_level += 1
      end

This implementation includes this discussion.

Copy link
Collaborator

@phoet phoet Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have not tested this thoroughly, but maybe you get the basic idea here and can finish it off

    def domain_is_in?(domain_list, max_depth = 3)
      address_domain = address.domain.downcase
      return true if domain_list.include?(address_domain)

      tokens = address_domain.split('.')
      return false if tokens.length < 3

      limited_sub_domain_part = tokens.reverse.first(max_depth).reverse.join('.')

      domain_list.include?(limited_sub_domain_part)
    end

there is no iteration over parts, so this is faster and we get rid of the while.

Copy link
Contributor Author

@cm-dyoshikawa cm-dyoshikawa Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so there are such cases. If that's the case, the current implementation that allows users to select max_depth optionally doesn't seem like a bad idea (Default is 3 ).

I'll wait for your judgment regarding the current implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've only recently joined as a maintainer, so i'd wait a bit for @micke to add some context information before changing any of the semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll also wait for @micke . Thanks.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it was introduced in #137, i honestly don't know why it's there 😅 This sounds good to me!

Copy link
Contributor Author

@cm-dyoshikawa cm-dyoshikawa Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@micke Thank you for your comment. I guess #137 overlooked deeply nested subdomains case.
So, again, I believe this PR will improve the disposable_domain search function.

Could you please review this again, thanks.

tokens = address_domain.split('.')
return false if tokens.length < 3

domain_list.include?(address_domain[(i + 1)..-1])
max_depth = [@domain_hierarchy_max_depth, tokens.length].min
(2..max_depth).each do |depth|
partial_domain = tokens.last(depth).join('.')
return true if domain_list.include?(partial_domain)
end

return false
end

def mx_server_is_in?(domain_list)
Expand Down
55 changes: 51 additions & 4 deletions spec/address_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,53 @@
end
end

describe "#disposable_domain?" do
phoet marked this conversation as resolved.
Show resolved Hide resolved
let(:domain_hierarchy_max_depth) { 10 }

context "when the disposable domain does not have subdomains" do
let(:disposable_domain) { ValidEmail2.disposable_emails.find { |domain| domain.count(".") == 1 } }

it "is true if the domain is in the disposable_emails list" do
address = described_class.new("foo@#{disposable_domain}", nil, nil, domain_hierarchy_max_depth:)
expect(address.disposable_domain?).to eq true
end

it "is true if the domain is a subdomain of a disposable domain" do
address = described_class.new("foo@sub.#{disposable_domain}", nil, nil, domain_hierarchy_max_depth:)
expect(address.disposable_domain?).to eq true
end

it "is true if the domain is a deeply nested subdomain of a disposable domain" do
address = described_class.new("[email protected].#{disposable_domain}", nil, nil, domain_hierarchy_max_depth:)
expect(address.disposable_domain?).to eq true
end

it "is false if the domain is not in the disposable_emails list" do
address = described_class.new("[email protected]", nil, nil, domain_hierarchy_max_depth:)
expect(address.disposable_domain?).to eq false
end
end

context "when the disposable domain has subdomains" do
let(:disposable_domain) { ValidEmail2.disposable_emails.find { |domain| domain.count(".") > 1 } }

it "is true if the domain is in the disposable_emails list" do
address = described_class.new("foo@#{disposable_domain}", nil, nil, domain_hierarchy_max_depth:)
expect(address.disposable_domain?).to eq true
end

it "is true if the domain is a subdomain of a disposable domain" do
address = described_class.new("foo@sub.#{disposable_domain}", nil, nil, domain_hierarchy_max_depth:)
expect(address.disposable_domain?).to eq true
end

it "is true if the domain is a deeply nested subdomain of a disposable domain" do
address = described_class.new("[email protected].#{disposable_domain}", nil, nil, domain_hierarchy_max_depth:)
expect(address.disposable_domain?).to eq true
end
end
end

describe "caching" do
let(:email_address) { "[email protected]" }
let(:email_instance) { described_class.new(email_address) }
Expand Down Expand Up @@ -181,7 +228,7 @@
email_instance.valid_strict_mx?
end

it "calls the the MX servers lookup" do
it "calls the the MX servers lookup" do
Comment on lines -184 to +231
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been automatically fixed by my code editor. Removing unnecessary spaces.

email_instance.valid_strict_mx?

expect(Resolv::DNS).to have_received(:open).once
Expand Down Expand Up @@ -308,8 +355,8 @@
stub_const("ValidEmail2::DnsRecordsCache::MAX_CACHE_SIZE", 0)
end

it "prunes the cache" do
expect(dns_records_cache_instance).to receive(:prune_cache).once
it "prunes the cache" do
expect(dns_records_cache_instance).to receive(:prune_cache).once

email_instance.valid_mx?
end
Expand All @@ -331,7 +378,7 @@
end
end
end
end
end
end
end
end