Commit
added styles and finished first draft
Showing 86 changed files with 355 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
\documentclass[11pt]{article} | ||
\usepackage[a4paper, hmargin={2.8cm, 2.8cm}, vmargin={2.5cm, 2.5cm}]{geometry} % Geometry package: controls the margins, among other things % | ||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% | ||
\usepackage[babel, lille]{ku-forside} % KU front page | ||
% | ||
% Mini manual for the ku-forside package: | ||
% | ||
% Language options: da, en | ||
% babel loads the babel package with the chosen language | ||
% Faculty options: farma, hum, jur, ku, life, nat, samf, sund, teo | ||
% Colour options: sh (black and white), farve (colour) | ||
% Front page options: lille, stor, titelside | ||
% titelside is identical to the design at ku.dk/designmanual | ||
% lille gives a small logo together with the title on the first page | ||
% stor gives a large logo together with the title on the first page | ||
% | ||
% The default is [da,nat,farve,titelside] | ||
% | ||
% E.g. \usepackage[babel, lille, jur, sh, en]{ku-forside} gives a small black-and-white logo for the Faculty of Law and loads the babel package with English as the language. | ||
|
||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% | ||
%%% Title %%% | ||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% | ||
\titel{Test} % | ||
\undertitel{Test test} % | ||
\opgave{Overspringshandling} % Only available with 'titelside' | ||
\forfatter{Navnet}% | ||
\dato{\today}% | ||
\vejleder{Doktoren} % Only available with 'titelside' | ||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% | ||
%%% The document begins here %%% | ||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% | ||
\begin{document} | ||
\maketitle % CREATES THE TITLE | ||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% | ||
%%%%%%%%%%%%%%%%%%%% TEXT BEGINS HERE %%%%%%%%%%%%%%%%%%%%%%% | ||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% | ||
\section{Reserving on an aggregated level} | ||
\begin{equation} | ||
R=\sum_{i=2}^{n}R_{i}=\sum_{i=2}^{n}D_{i,n+1-i}\left(\prod_{j=n+1-i}^{n-1}\hat{f}_{j}-1\right) | ||
\label{(a0)} | ||
\end{equation} | ||
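As a small illustration of the formula above, take $n=3$. The interpretation used here is an assumption, since this draft does not yet define the symbols: $D_{ij}$ is read as the cumulative claims of accident year $i$ at development year $j$, and $\hat{f}_{j}$ as the estimated development factor from development year $j$ to $j+1$. Under that reading the sum expands to | ||
\[ | ||
R=R_{2}+R_{3}=D_{22}\left(\hat{f}_{2}-1\right)+D_{31}\left(\hat{f}_{1}\hat{f}_{2}-1\right), | ||
\] | ||
so each accident year contributes its latest observed cumulative amount, projected through the remaining development factors, minus the amount already observed. | ||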
|
||
|
||
\end{document} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
\section{Introduction} | ||
The \textit{Approximate Similarity Search Problem} concerns efficiently finding a set $A$ from a corpus $\mathcal{F}$ that is approximately similar to a query set $Q$ with respect to the \textit{Jaccard Similarity} metric $J(A,Q) = \frac{|A\cap Q|}{|A\cup Q|}$\cite{dahlgaard2017fast}\cite{fast-similarity-search}. Practical applications include searching through large corpora of high-dimensional text documents, for example in plagiarism detection or website duplication checking\cite{vassilvitskii2018}. The main bottleneck in this problem is the \textit{curse of dimensionality}. A trivial algorithm can solve this problem in $O(nd|Q|)$ time, but algorithms whose query time is linear in the dimensionality of the corpus scale poorly when working with high-dimensional datasets. Text documents are especially bad in this regard since they are often encoded using \textit{$w$-shingles} ($w$ contiguous words), which \citet{li2011hashing} shows can easily reach a dimensionality upwards of $d=2^{83}$ using just $5$-shingles.\\ | ||
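As a small worked example of the similarity measure (the sets are made up for illustration), take $A=\{1,2,3\}$ and $Q=\{2,3,4\}$. Then | ||
\[ | ||
J(A,Q)=\frac{|A\cap Q|}{|A\cup Q|}=\frac{|\{2,3\}|}{|\{1,2,3,4\}|}=\frac{2}{4}=\frac{1}{2}. | ||
\] | ||
At the extremes, identical non-empty sets have similarity $1$ and disjoint sets have similarity $0$.\\ | ||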
The classic solution to this problem is the MinHash algorithm presented by \citet{broder1997minhash} to perform website duplication checking for the AltaVista search engine. It preprocesses the data once using hashing, which allows efficient querying in $O(n + |Q|)$ time, a significant improvement that is independent of the dimensionality of the corpus. | ||
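The key property behind MinHash can be stated in one line. As a sketch, assume that $h$ is a uniformly random permutation of the universe (the symbols $h$ and $k$ are introduced here for illustration only). The element with the smallest hash value in $A\cup Q$ is then equally likely to be any element of the union, and the two minima agree exactly when that element lies in $A\cap Q$, so | ||
\[ | ||
\Pr\left[\min_{a\in A}h(a)=\min_{q\in Q}h(q)\right]=\frac{|A\cap Q|}{|A\cup Q|}=J(A,Q). | ||
\] | ||
Averaging the indicator of this event over $k$ independent choices of $h$ therefore gives an unbiased estimate of $J(A,Q)$. | ||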
Many refinements have since been presented that improve processing time, query time and space efficiency. Notable examples include (but are not limited to) \textit{b-bit minwise hashing}\cite{ping2011theory}, \textit{fast similarity sketching}\cite{dahlgaard2017fast} and \textit{parallel bit-counting}\cite{fast-similarity-search} (the latter of which is the main focus of this project). | ||
These contributions have brought the query time down to sublinear time while keeping a constant error probability.\\ | ||
The addition of parallel bit-counting for querying | ||
Many refinements have since been presented that improve processing time, query time and space efficiency. Notable examples include (but are not limited to) the use of \textit{tensoring}\cite{andoni2006efficient}, \textit{b-bit minwise hashing}\cite{ping2011theory} and \textit{fast similarity sketching}\cite{dahlgaard2017fast}. Simple applications of these techniques lead to efficient querying with a constant error probability. If one wishes to achieve an even better error probability such as $\varepsilon = o(1)$, it is standard practice within the field to use $O(\log_2(1/\varepsilon))$ independent data structures and return the best result, resulting in a query time of $O(\log(1/\varepsilon) (n^\rho + |Q|))$. Recent advances by \citet{fast-similarity-search} show that it is possible to achieve an even better query time by sampling these data structures from one large sketch. The similarity between these sub-sketches and a query set needs to be evaluated efficiently when querying, which requires efficient computation of the cardinality of a bit-string. To do this, \citet{fast-similarity-search} presents a general parallel bit-counting algorithm that computes the cardinality of a list of bit-strings in amortized sub-linear time thanks to word-parallelism. This brings the query time down to $O((\frac{n\log_2 w}{w})^\rho \log(1/\varepsilon) + |Q|)$.\\ | ||
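The number of independent data structures quoted above follows from a short calculation. As a sketch, assume that each data structure fails to return a sufficiently similar set with probability at most $\frac{1}{2}$, independently of the others (the constant $\frac{1}{2}$ and the symbol $k$ are assumptions made for illustration). With $k$ data structures, the probability that all of them fail is at most $(\frac{1}{2})^{k}$, and | ||
\[ | ||
\left(\frac{1}{2}\right)^{k}\le\varepsilon \quad\Longleftrightarrow\quad k\ge\log_2(1/\varepsilon), | ||
\] | ||
which is why $O(\log_2(1/\varepsilon))$ repetitions suffice to push the error probability down to $\varepsilon$.\\ | ||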
The main focus of this project is to analyse, prove correct, implement and evaluate this parallel bit-counting technique. The analysis will be based on the original paper \cite{fast-similarity-search}, but with some modifications that resolve issues with the original algorithm. It will also include a pseudo-code implementation of the algorithm, since the original paper only describes it through recurrences. This leads to a proof of correctness that differs slightly from the one presented in the paper, and a run-time analysis that does indeed show the sub-linear run time as claimed.\\ | ||
This theoretical analysis will be backed up by a real-life implementation that can be benchmarked to help demonstrate this sub-linear run time in practice. Finally, the results and methods will be reflected upon to support the final conclusions. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
% KU-forside package. Front pages for assignments written at the University of Copenhagen | ||
% Written by Christian Aastrup. The front-page design follows the one at http://www.ku.dk/designmanual | ||
% | ||
\ProvidesPackage{ku-forside}[2007/07/07 v1.0 Frontpages with University of Cph. logos] | ||
% | ||
% Defines the default SPROG/AFDELING/FARVE (language/faculty/colour) | ||
\def\SPROG{da}\def\FARVE{farve}\def\AFDELING{nat}\def\FORSIDE{titelside} | ||
% | ||
% Turns the SPROG (language) options into 'if's | ||
\newif\if@en \newif\if@da | ||
% | ||
% Turns the AFDELING (faculty) options into 'if's | ||
\newif\if@ku \newif\if@farma \newif\if@hum | ||
\newif\if@jur \newif\if@life \newif\if@nat | ||
\newif\if@samf \newif\if@sund \newif\if@teo | ||
% | ||
% Turns the FARVE (colour) options into 'if's | ||
\newif\if@farve \newif\if@sh | ||
% | ||
% Turns the FORSIDE (front page) options into 'if's | ||
\newif\if@titelside \newif\if@stor \newif\if@lille | ||
% | ||
\newif\if@babel \DeclareOption{babel}{\@babeltrue} | ||
% | ||
% Declares the languages as options in the package call | ||
\DeclareOption{en}{\@entrue} \DeclareOption{da}{\@datrue} | ||
% | ||
% Declares the faculties as options in the package call | ||
\DeclareOption{ku}{\@kutrue} \DeclareOption{farma}{\@farmatrue} \DeclareOption{hum}{\@humtrue} | ||
\DeclareOption{jur}{\@jurtrue} \DeclareOption{life}{\@lifetrue} \DeclareOption{nat}{\@nattrue} | ||
\DeclareOption{samf}{\@samftrue} \DeclareOption{sund}{\@sundtrue} \DeclareOption{teo}{\@teotrue} | ||
% | ||
% Declares the colours as options in the package call | ||
\DeclareOption{farve}{\@farvetrue} \DeclareOption{sh}{\@shtrue} | ||
% | ||
% Declares the front page options as options in the package call | ||
\DeclareOption{lille}{\@lilletrue} \DeclareOption{stor}{\@stortrue} | ||
\DeclareOption{titelside}{\@titelsidetrue} | ||
% | ||
\ProcessOptions\relax | ||
% | ||
% Defines what happens when the language options are TRUE | ||
\if@en \def\SPROG{en} \fi \if@da \def\SPROG{da} \fi | ||
% | ||
% Defines what happens when the faculty options are TRUE | ||
\if@ku \def\AFDELING{ku} \fi \if@farma \def\AFDELING{farma} \fi \if@hum \def\AFDELING{hum} \fi | ||
\if@jur \def\AFDELING{jur} \fi \if@life \def\AFDELING{life} \fi \if@nat \def\AFDELING{nat} \fi | ||
\if@samf \def\AFDELING{samf} \fi \if@sund \def\AFDELING{sund} \fi \if@teo \def\AFDELING{teo} \fi | ||
% | ||
% Defines what happens when the colour options are TRUE | ||
\if@sh \def\FARVE{sh} \fi \if@farve \def\FARVE{farve} \fi | ||
% | ||
% Defines what happens when the various front page options are TRUE | ||
\if@stor \def\FORSIDE{stor} \fi \if@lille \def\FORSIDE{lille} \fi | ||
\if@titelside \def\FORSIDE{titelside} \fi | ||
% | ||
\def\OPGAVE{$\backslash$opgave$\{\ldots\}$} | ||
\def\FORFATTER{$\backslash$forfatter$\{\ldots\}$ or $\backslash$author$\{\ldots\}$ } | ||
\def\TITEL{$\backslash$titel$\{\ldots\}$ or $\backslash$title$\{\ldots\}$} | ||
\def\UNDERTITEL{$\backslash$undertitel$\{\ldots\}$} | ||
\def\VEJLEDER{$\backslash$vejleder$\{\ldots\}$} | ||
\def\AFLEVERINGSDATO{$\backslash$dato$\{\ldots\}$ or $\backslash$date$\{\ldots\}$} | ||
% | ||
\renewcommand{\author}[1]{\def\FORFATTER{#1}} | ||
\renewcommand{\title}[1]{\def\TITEL{#1}} | ||
\renewcommand{\date}[1]{\def\AFLEVERINGSDATO{#1}} | ||
% | ||
\newcommand{\opgave}[1]{\def\OPGAVE{#1}} | ||
\newcommand{\forfatter}[1]{\def\FORFATTER{#1}} | ||
\newcommand{\titel}[1]{\def\TITEL{#1}} | ||
\newcommand{\undertitel}[1]{\def\UNDERTITEL{#1}} | ||
\newcommand{\vejleder}[1]{\def\VEJLEDER{#1}} | ||
\newcommand{\dato}[1]{\def\AFLEVERINGSDATO{#1}} | ||
% | ||
% Packages required to set up the front page % | ||
% | ||
%\RequirePackage[OT2,OT4]{fontenc} | ||
\RequirePackage{eso-pic,graphicx,fix-cm,ae,aecompl,ifthen} % | ||
\RequirePackage[usenames]{color} % | ||
%% BABEL option: checks the declared language and sets up the babel package accordingly %% | ||
\if@babel | ||
\ifthenelse{\equal{\SPROG}{en}}{\RequirePackage[danish,english]{babel}}{} % English hyphenation, heading and chapter structure % | ||
\ifthenelse{\equal{\SPROG}{da}}{\RequirePackage[english,danish]{babel}}{} % Danish hyphenation, heading and chapter structure % | ||
% Note that both languages are loaded. The order matters, since the last language is the one the document is generally typeset in. % | ||
% The other language's hyphenation etc. can be selected by writing \selectlanguage{language} in the body text % | ||
\fi | ||
% | ||
%% THE FRONT PAGE IS DEFINED: % | ||
% | ||
% Option: titelside | ||
\ifthenelse{\equal{\FORSIDE}{titelside}}{ | ||
\def\tyk{\fontfamily{phv}\fontseries{bx}\selectfont} %Bold extended % | ||
\def\tynd{\fontfamily{phv}\fontseries{sb}\selectfont} % Semi-bold % | ||
\def\maketitle{\thispagestyle{empty} % | ||
\AddToShipoutPicture*{\put(0,0){\includegraphics*[viewport=0 0 700 600]{\AFDELING-\FARVE}}}% % | ||
\AddToShipoutPicture*{\put(0,602){\includegraphics*[viewport=0 600 700 1600]{\AFDELING-\FARVE}}}% % | ||
\AddToShipoutPicture*{\put(0,0){\includegraphics*{\AFDELING-\SPROG}}}% % | ||
\AddToShipoutPicture*{\put(50,583.5){\fontsize{20 pt}{22 pt} \tyk \OPGAVE }} % % | ||
\AddToShipoutPicture*{\put(50,555.3){\fontsize{14 pt}{16 pt} \tynd \FORFATTER }} % % | ||
\AddToShipoutPicture*{\put(50,499){\fontsize{22 pt}{24 pt} \tynd \TITEL }} % % | ||
\AddToShipoutPicture*{\put(50,480.5){\fontsize{14 pt}{16 pt} \tynd \UNDERTITEL }} % % | ||
\AddToShipoutPicture*{\put(50,92){\fontsize{11 pt}{12 pt} \tynd \VEJLEDER }} % % | ||
\AddToShipoutPicture*{\put(50,66.7){\fontsize{11 pt}{12 pt} \tynd \AFLEVERINGSDATO }} % % | ||
\phantom{Usynlig, men nødvendig} % | ||
\newpage \noindent}}{} % | ||
% Option: lille | ||
\ifthenelse{\equal{\FORSIDE}{lille}}{ | ||
\def\maketitle{\thispagestyle{plain} | ||
\AddToShipoutPicture*{\put(035,613){\includegraphics*[viewport=0 600 700 1600, scale=0.88]{\AFDELING-\FARVE}}}% The image is used | ||
\AddToShipoutPicture*{\put(-010,613){\includegraphics*[viewport=0 600 420 1600, scale=0.88]{\AFDELING-\FARVE}}}% three times to | ||
\AddToShipoutPicture*{\put(400,613){\includegraphics*[viewport=0 600 420 1600, scale=0.88]{\AFDELING-\FARVE}}}% make the rule long. | ||
\AddToShipoutPicture*{\put(79,755){\large{\textbf{\TITEL}}}}% | ||
\AddToShipoutPicture*{\put(79,733){\UNDERTITEL}}% | ||
\AddToShipoutPicture*{\put(79,715){\tiny{\emph{\FORFATTER}}}}% | ||
\AddToShipoutPicture*{\put(79,702){\tiny{\AFLEVERINGSDATO}}}% | ||
\phantom{Usynlig, men nødvendig} | ||
\vspace*{3.2cm} % | ||
\noindent}}{} % | ||
% Option: stor | ||
\ifthenelse{\equal{\FORSIDE}{stor}}{ | ||
\def\maketitle{\thispagestyle{plain} | ||
\AddToShipoutPicture*{\put(0,602){\includegraphics*[viewport=156 649 700 1600, scale=1.4]{\AFDELING-\FARVE}}} % % | ||
\AddToShipoutPicture*{\put(79,755){\LARGE{\textbf{\TITEL}}}}% | ||
\AddToShipoutPicture*{\put(79,723){\Large{\UNDERTITEL}}}% | ||
\AddToShipoutPicture*{\put(79,695){\normalsize{\emph{\FORFATTER}}}}% | ||
\AddToShipoutPicture*{\put(79,670){\footnotesize{\AFLEVERINGSDATO}}}% | ||
\phantom{Usynlig, men nødvendig} | ||
\vspace*{5cm} % | ||
\noindent}}{} |
Binary file not shown.
Binary file not shown.