Try speeding up weeder with a bloom filter #186

Open: wants to merge 1 commit into master
5 changes: 5 additions & 0 deletions src/Weeder.hs
@@ -52,6 +52,7 @@ import Data.Set ( Set )
import qualified Data.Set as Set
import Data.Tree (Tree)
import qualified Data.Tree as Tree
+import Data.BloomFilter.Hash

-- generic-lens
import Data.Generics.Labels ()
@@ -139,6 +140,10 @@ instance Show Declaration where
declarationStableName


+-- TODO maybe we can make this faster by only hashing the location.
+instance Hashable Declaration where
+  hashIO32 d s = hashIO32 (declarationStableName d) s
Comment on lines +143 to +145
Contributor Author

Hashing the whole declarationStableName String is definitely not fast, and could be why this approach doesn't work.

Collaborator

Could be worth trying the Uniquable instance of Module and OccName to produce hashes.

{-
Note [The Unique of an OccName]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
They are efficient, because FastStrings have unique Int# keys.  We assume
this key is less than 2^24, and indeed FastStrings are allocated keys
sequentially starting at 0.

So we can make a Unique using
        mkUnique ns key  :: Unique
where 'ns' is a Char representing the name space.  This in turn makes it
easy to build an OccEnv.
-}

(Or, maybe even better: the Uniquable instance of the original Names from which nameToDeclaration derives the declarations.)
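To make the suggestion concrete, here is a minimal pure-Haskell sketch of why Unique-based hashing is cheap. It does not touch the GHC API: `packUnique`, `unpackUnique` and `combineKeys` are hypothetical helpers, the 24-bit layout only follows the key bound stated in the quoted Note (GHC's real `mkUnique` bit layout may differ between versions), and the mixing step uses FNV-style constants rather than a vetted hash.

```haskell
import Data.Bits (shiftL, shiftR, xor, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word32)

-- | Hypothetical stand-in for GHC's mkUnique: the name-space Char goes in the
-- bits above a key assumed to be below 2^24, as the Note describes.
packUnique :: Char -> Int -> Int
packUnique ns key = (ord ns `shiftL` 24) .|. (key .&. 0xFFFFFF)

-- | Inverse of 'packUnique': recover the name space and the key.
unpackUnique :: Int -> (Char, Int)
unpackUnique u = (chr (u `shiftR` 24), u .&. 0xFFFFFF)

-- | Hypothetical hash for a declaration, given the integer keys of its
-- module's Unique and its OccName's Unique. It costs two xors and two
-- multiplies, with no String allocation, unlike hashing declarationStableName.
combineKeys :: Word32 -> Int -> Int -> Word32
combineKeys salt moduleKey occKey =
  step (step (salt `xor` 2166136261) moduleKey) occKey
  where
    -- xor in the (truncated) key, then multiply by the 32-bit FNV prime
    step h k = (h `xor` fromIntegral k) * 16777619
```

Assuming the two integer keys can be read off the Uniquable instances mentioned above, a Hashable Declaration instance could pass its salt through to something like `combineKeys` instead of hashing a String.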


declarationStableName :: Declaration -> String
declarationStableName Declaration { declModule, declOccName } =
  let
19 changes: 18 additions & 1 deletion src/Weeder/Run.hs
@@ -17,6 +17,7 @@ import Data.Function ( (&) )
import Data.Set ( Set )
import qualified Data.Set as Set
import qualified Data.Map.Strict as Map
+import qualified Data.BloomFilter.Easy as BloomFilter

-- ghc
import GHC.Plugins
@@ -117,7 +118,9 @@ runWeeder weederConfig@Config{ rootPatterns, typeClassRoots, rootInstances, root
-- We only care about dead declarations if they have a span assigned,
-- since they don't show up in the output otherwise
dead =
-  outputableDeclarations analysis Set.\\ reachableSet
+  fastSpecialSetDifference
+    (outputableDeclarations analysis)
+    reachableSet

warnings =
Map.unionsWith (++) $
@@ -173,3 +176,17 @@ runWeeder weederConfig@Config{ rootPatterns, typeClassRoots, rootInstances, root
displayDeclaration :: Declaration -> String
displayDeclaration d =
  moduleNameString ( moduleName ( declModule d ) ) <> "." <> occNameString ( declOccName d )
+
+-- | This computes the same as (Set.\\) but is faster by assuming that the set difference will be small.
+-- I.e. there will be more non-weeds than weeds.
+--
+fastSpecialSetDifference :: Set Declaration -> Set Declaration -> Set Declaration
+fastSpecialSetDifference allDecls usedDecls =
+  let bloom = BloomFilter.easyList 0.05 (Set.toList usedDecls)
+      -- The elem docs say:
+      -- @
+      -- If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.
+      -- @
+      -- I.e. if some declaration is a weed, it will definitely show up in the result, but also some weeds will show up in the result.
+      -- So we need to do another set difference afterwards, but with a much smaller set.
+  in Set.difference (Set.filter (not . (`BloomFilter.elem` bloom)) allDecls) usedDecls
Comment on lines +186 to +192
Contributor Author

I'm about 50% sure I got this backward in some way, so that might be part of the issue as well.
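On the direction question: with a Bloom filter built from usedDecls, "not present" is the only definite answer, so the filter can only certify weeds; a "maybe present" answer still needs an exact check. The version above drops every declaration the filter reports as maybe present, so a weed that hits a false positive is lost, and the final Set.difference removes nothing (no used declaration survives the filter). Below is a hedged, self-contained sketch with the directions the other way round. It swaps in a toy Bloom filter over Int keys instead of the bloomfilter package; `ToyBloom`, `bloomFromList`, `mayContain` and `bloomDifference` are hypothetical names, and the Int keys stand in for declarations identified by, e.g., a Unique's key.

```haskell
import Data.Bits (shiftR, xor, (.|.))
import qualified Data.IntSet as IntSet
import qualified Data.Set as Set

-- | Toy Bloom filter: number of bit positions, plus the set bits.
-- A real implementation would use a packed bit array; an IntSet keeps this short.
data ToyBloom = ToyBloom !Int !IntSet.IntSet

-- | Integer mixer used to derive hash values from a key.
mix :: Int -> Int
mix x0 = x3
  where
    x1 = (x0 `xor` (x0 `shiftR` 16)) * 0x45d9f3b
    x2 = (x1 `xor` (x1 `shiftR` 16)) * 0x45d9f3b
    x3 = x2 `xor` (x2 `shiftR` 16)

-- | Three bit positions per key via double hashing: h1 + i * h2 (mod m).
positions :: Int -> Int -> [Int]
positions m x = [ (h1 + i * h2) `mod` m | i <- [0 .. 2] ]
  where
    h1 = mix x
    h2 = mix (x `xor` 0x5bd1e995) .|. 1

bloomFromList :: Int -> [Int] -> ToyBloom
bloomFromList m xs = ToyBloom m (IntSet.fromList (concatMap (positions m) xs))

-- | False means "definitely absent"; True means "possibly present".
mayContain :: Int -> ToyBloom -> Bool
mayContain x (ToyBloom m bits) = all (`IntSet.member` bits) (positions m x)

-- | Same result as Set.difference allDecls usedDecls, for any filter size m > 0.
bloomDifference :: Int -> Set.Set Int -> Set.Set Int -> Set.Set Int
bloomDifference m allDecls usedDecls = Set.filter isWeed allDecls
  where
    bloom = bloomFromList m (Set.toList usedDecls)
    -- Definitely absent from the used set: certainly a weed, no exact lookup.
    -- Possibly present: may be a false positive, so confirm with the exact set.
    isWeed d = not (mayContain d bloom) || Set.notMember d usedDecls
```

Because most declarations are not weeds, most of them land in the "possibly present" branch and still pay for Set.notMember, on top of the hashing. So even with the directions fixed, this is unlikely to beat Set.\\ for Weeder's workload; the filter only saves work for the elements it rejects.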

1 change: 1 addition & 0 deletions weeder.cabal
@@ -40,6 +40,7 @@ library
, text ^>= 2.0.1 || ^>= 2.1
, toml-reader ^>= 0.2.0.0
, transformers ^>= 0.5.6.2 || ^>= 0.6
+, bloomfilter ^>= 2.0.1.2
hs-source-dirs: src
exposed-modules:
Weeder