@@ -1,5 +1,18 @@
{
  "files.watcherExclude": {
    "datasets/**": true
  },
  "cSpell.ignoreWords": [
    "aix",
    "check",
    "commit",
    "compat",
    "data",
    "hdfs",
    "mockito",
    "mode",
    "output",
    "stream",
    "test"
  ]
}

@@ -1,2 +1,112 @@
# averloc (AdVERsarial Learning On Code)

Repository for Semantic Robustness of Models on Source Code.

## Directory Structure

In this repository, we have the following directories:

### `./datasets`

**Note:** the datasets are all much too large to be included in this GitHub repo. This is simply the
structure as it would exist on disk once our framework is set up.

```bash
./datasets
  + ./raw                # The four datasets in "raw" form
  + ./normalized         # The four datasets in the "normalized" JSON-lines representation
  + ./preprocess
    + ./tokens           # The four datasets in a representation suitable for token-level models
    + ./ast-paths        # The four datasets in a representation suitable for code2seq
  + ./transformed        # The four datasets transformed via our code-transformation framework
    + ./normalized       # Transformed datasets normalized back into the JSON-lines representation
    + ./preprocessed     # Transformed datasets preprocessed into:
      + ./tokens         # ... a representation suitable for token-level models
      + ./ast-paths      # ... a representation suitable for code2seq
  + ./adversarial        # Datasets in the format < source, target, transformed-variant #1, #2, ..., #K >
    + ./tokens           # ... in a token-level representation
    + ./ast-paths        # ... in an ast-paths representation
```
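
The `normalized` representation is JSON-lines, i.e. one JSON object (one example) per line. As a rough
sketch, a single record might be inspected as below; the file name and field names are illustrative
assumptions, not the framework's actual schema:

```bash
# Hypothetical inspection of one normalized example; the path and the keys are assumptions.
head -n 1 datasets/normalized/sri/py150/train.jsonl
# => {"source": "<function body tokens>", "target": "<function name>"}
```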

### `./models`

We have two Machine Learning on Code models. Both of them are trained on the Code Summarization task. The
seq2seq model has been modified to include an adversarial training loop and a way to compute Integrated Gradients.
The code2seq model has been modified to include an adversarial training loop and emit attention weights.

```bash
./models
  + ./code2seq          # code2seq model implementation
  + ./pytorch-seq2seq   # seq2seq model implementation
```
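
As background on the attribution method mentioned above: Integrated Gradients attributes a prediction of a
model $F$ to the components of an input $x$ by integrating gradients along the straight-line path from a
baseline $x'$ to $x$. The standard formulation is given below for reference; the baseline choice and the
numerical approximation used in `./models/pytorch-seq2seq` are not described here, so treat this as
background rather than a description of the implementation:

$$\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$

In practice the integral is approximated with a Riemann sum over a small number of interpolation steps.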

### `./results`

This directory stores results that are small enough to be checked into GitHub. In addition, a few utility scripts
live here.

### `./scratch`

This directory contains exploratory data analysis and evaluations that did not fit into the overall workflow of our
code-transformation + adversarial training framework. For instance, HTML-based visualizations of Integrated Gradients
and attention exist in this directory.

### `./scripts`

In this directory there are a large number of scripts for doing various chores related to running and maintaining
this code transformation infrastructure.

### `./tasks`

This directory houses the implementations of various pieces of our core framework:

```bash
./tasks
  + ./astor-apply-transforms
  + ./depth-k-test-seq2seq
  + ./download-c2s-dataset
  + ./download-csn-dataset
  + ./extract-adv-dataset-c2s
  + ./extract-adv-dataset-tokens
  + ./generate-baselines
  + ./integrated-gradients-seq2seq
  + ./normalize-raw-dataset
  + ./preprocess-dataset-c2s
  + ./preprocess-dataset-tokens
  + ./spoon-apply-transforms
  + ./test-model-code2seq
  + ./test-model-seq2seq
  + ./train-model-code2seq
  + ./train-model-seq2seq
```

### `./vendor`

This directory contains dependencies in the form of git submodules.

### `Makefile`

We have one overarching `Makefile` that can be used to drive a number of the data generation, training, testing, and evaluation tasks.

```
download-datasets                (DS-1)  Downloads all prerequisite datasets
normalize-datasets               (DS-2)  Normalizes all downloaded datasets
extract-ast-paths                (DS-3)  Generate preprocessed data in a form usable by code2seq style models.
extract-tokens                   (DS-3)  Generate preprocessed data in a form usable by seq2seq style models.
apply-transforms-c2s-java-med    (DS-4)  Apply our suite of transforms to code2seq's java-med dataset.
apply-transforms-c2s-java-small  (DS-4)  Apply our suite of transforms to code2seq's java-small dataset.
apply-transforms-csn-java        (DS-4)  Apply our suite of transforms to CodeSearchNet's java dataset.
apply-transforms-csn-python      (DS-4)  Apply our suite of transforms to CodeSearchNet's python dataset.
apply-transforms-sri-py150       (DS-4)  Apply our suite of transforms to SRI Lab's py150k dataset.
extract-transformed-ast-paths    (DS-6)  Extract preprocessed representations (ast-paths) from our transformed (normalized) datasets
extract-transformed-tokens       (DS-6)  Extract preprocessed representations (tokens) from our transformed (normalized) datasets
extract-adv-datasets-tokens      (DS-7)  Extract preprocessed adversarial datasets (representations: tokens)
do-integrated-gradients-seq2seq  (IG)    Do IG for our seq2seq model
docker-cleanup                   (MISC)  Cleans up old and out-of-sync Docker images.
submodules                       (MISC)  Ensures that submodules are set up.
help                             (MISC)  This help.
test-model-code2seq              (TEST)  Tests the code2seq model on a selected dataset.
test-model-seq2seq               (TEST)  Tests the seq2seq model on a selected dataset.
train-model-code2seq             (TRAIN) Trains the code2seq model on a selected dataset.
train-model-seq2seq              (TRAIN) Trains the seq2seq model on a selected dataset.
```
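
Going by the DS-numbered stages above, a data-generation-to-training pass might be driven as in the sketch
below. This is an illustrative sequence, not a verified recipe; in particular, how the TRAIN/TEST targets
select a dataset is not shown in the help text, so that part is an assumption.

```bash
# Illustrative pipeline run using the targets listed above (ordering follows the DS-n labels).
make submodules                   # (MISC) ensure vendored submodules are checked out
make download-datasets            # DS-1: fetch the raw datasets
make normalize-datasets           # DS-2: normalize into the JSON-lines representation
make extract-tokens               # DS-3: token-level representation for seq2seq-style models
make extract-ast-paths            # DS-3: ast-paths representation for code2seq-style models
make apply-transforms-sri-py150   # DS-4: apply the transform suite to one dataset (example)
make extract-transformed-tokens   # DS-6: re-extract tokens from the transformed datasets
make extract-adv-datasets-tokens  # DS-7: build the adversarial (token) datasets
make train-model-seq2seq          # TRAIN: dataset selection mechanism not shown above (assumption)
```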

@@ -0,0 +1,40 @@
#!/bin/bash

# Collects attacked-model metrics into CSV rows (one row per model variant).
# Usage: <this-script> <results-root>; expects metrics files laid out as
#   <results-root>/<dataset>/<model>/<attack>/attacked_metrics.txt

for MODEL in normal adversarial-one-step adversarial-all; do

  # Row prefix: LaTeX-style labels for the model-variant column.
  if [ "${MODEL}" = "normal" ]; then
    FULL_STR='code2seq,Natural,'
  elif [ "${MODEL}" = "adversarial-one-step" ]; then
    FULL_STR=',Adv-${\seqs^1}$,'
  else
    FULL_STR=',Adv-${\seqs^{1,5}}$,'
  fi

  for DATASET in c2s/java-small csn/java sri/py150 csn/python; do

    THE_PATH_NORM="${1}/${DATASET}/${MODEL}/normal/attacked_metrics.txt"
    THE_PATH_ONE="${1}/${DATASET}/${MODEL}/just-one-step-attacks/attacked_metrics.txt"
    THE_PATH_ALL="${1}/${DATASET}/${MODEL}/all-attacks/attacked_metrics.txt"

    # Pull the exact_match score out of each metrics file, defaulting to 0.0 when missing.
    F1_NORM=0.0
    if [ -f "${THE_PATH_NORM}" ]; then
      F1_NORM=$(grep -Po 'exact_match"?: \d+.\d+' "${THE_PATH_NORM}" | awk '{ print $2 }')
    fi

    F1_ONE=0.0
    if [ -f "${THE_PATH_ONE}" ]; then
      F1_ONE=$(grep -Po 'exact_match"?: \d+.\d+' "${THE_PATH_ONE}" | awk '{ print $2 }')
    fi

    F1_ALL=0.0
    if [ -f "${THE_PATH_ALL}" ]; then
      F1_ALL=$(grep -Po 'exact_match"?: \d+.\d+' "${THE_PATH_ALL}" | awk '{ print $2 }')
    fi

    FULL_STR+=$(printf %.2f,%.2f,%.2f, ${F1_NORM} ${F1_ONE} ${F1_ALL})
  done

  # Drop the trailing comma and emit the CSV row.
  echo "${FULL_STR::-1}"
done
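
Assuming the collector above is saved as, say, `results/collect-attacked-f1.sh` (the file name is
hypothetical; only the single positional argument is implied by the script itself), it would be pointed at
the root of a results tree and its CSV rows captured:

```bash
# Hypothetical invocation; the results directory layout is an assumption based on the paths
# the script constructs (<root>/<dataset>/<model>/<attack>/attacked_metrics.txt).
bash results/collect-attacked-f1.sh path/to/results > code2seq-attacked.csv
```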

@@ -0,0 +1,31 @@
public static void quoteHtmlChars(
    OutputStream output, byte[] buffer, int off, int len
) throws IOException {
  System.out.println("usedMergeTransitionUgiSetup");
  System.out.println("statusPipelineUrl");
  System.out.println("shuffleTrashApplicationsCodeRestartSplitAllocatedMaximumCorrupt");
  for (int i = off; i < (off + len); i++) {
    switch (buffer[i]) {
      case '&' :
        output.write(ampBytes);
        break;
      case '<' :
        output.write(ltBytes);
        break;
      case '>' :
        output.write(gtBytes);
        break;
      case '\'' :
        output.write(aposBytes);
        break;
      case '"' :
        output.write(quotBytes);
        break;
      default :
        output.write(buffer, i, 1);
    }
  }
  System.out.println("validMinBalancerUserSkipSyncCodecRename");
  System.out.println("namenodeFsSlashJarFirstHosts");
  System.out.println("badHostsCounterEncryptionEntitiesRenderSortIdentifier");
}

@@ -0,0 +1,25 @@
public static void quoteHtmlChars(
    OutputStream output, byte[] buffer, int off, int len
) throws IOException {
  for (int i = off; i < (off + len); i++) {
    switch (buffer[i]) {
      case '&' :
        output.write(ampBytes);
        break;
      case '<' :
        output.write(ltBytes);
        break;
      case '>' :
        output.write(gtBytes);
        break;
      case '\'' :
        output.write(aposBytes);
        break;
      case '"' :
        output.write(quotBytes);
        break;
      default :
        output.write(buffer, i, 1);
    }
  }
}

@@ -0,0 +1,18 @@
public void testCheckCommitAixCompatMode() throws IOException {
  DFSClient dfsClient = Mockito.mock(DFSClient.class);
  Nfs3FileAttributes attr = new Nfs3FileAttributes();
  HdfsDataOutputStream fos = Mockito.mock(HdfsDataOutputStream.class);
  // Last argument "true" here to enable AIX compatibility mode.
  OpenFileCtx ctx = new OpenFileCtx(fos, attr, "/dumpFilePath",
      dfsClient, new IdUserGroup(new NfsConfiguration()), 1 == 1);
  // Test fall-through to pendingWrites check in the event that commitOffset
  // is greater than the number of bytes we've so far flushed.
  Mockito.when(fos.getPos()).thenReturn(((long) (2)));
  COMMIT_STATUS status = ctx.checkCommitInternal(5, null, 1, attr, 0 != 0);
  Assert.assertTrue(status == COMMIT_STATUS.COMMIT_FINISHED);
  // Test the case when we actually have received more bytes than we're trying
  // to commit.
  Mockito.when(fos.getPos()).thenReturn(((long) (10)));
  status = ctx.checkCommitInternal(5, null, 1, attr, 1 != 1);
  Assert.assertTrue(status == COMMIT_STATUS.COMMIT_DO_SYNC);
}

@@ -0,0 +1,18 @@
public void testCheckCommitAixCompatMode() throws IOException {
  DFSClient dfsClient = Mockito.mock(DFSClient.class);
  Nfs3FileAttributes attr = new Nfs3FileAttributes();
  HdfsDataOutputStream fos = Mockito.mock(HdfsDataOutputStream.class);
  // Last argument "true" here to enable AIX compatibility mode.
  OpenFileCtx ctx = new OpenFileCtx(fos, attr, "/dumpFilePath",
      dfsClient, new IdUserGroup(new NfsConfiguration()), true);
  // Test fall-through to pendingWrites check in the event that commitOffset
  // is greater than the number of bytes we've so far flushed.
  Mockito.when(fos.getPos()).thenReturn(((long) (2)));
  COMMIT_STATUS status = ctx.checkCommitInternal(5, null, 1, attr, false);
  Assert.assertTrue(status == COMMIT_STATUS.COMMIT_FINISHED);
  // Test the case when we actually have received more bytes than we're trying
  // to commit.
  Mockito.when(fos.getPos()).thenReturn(((long) (10)));
  status = ctx.checkCommitInternal(5, null, 1, attr, false);
  Assert.assertTrue(status == COMMIT_STATUS.COMMIT_DO_SYNC);
}

@@ -0,0 +1,8 @@
def delete_record(scheme_job_scheduler, response_master_running_report_args):
    """Delete a record and it's persistent identifiers."""
    response_master_running_report_args.delete()
    PersistentIdentifier.query.filter_by(object_type='rec', object_uuid=
        response_master_running_report_args.id).update({
        PersistentIdentifier.status: PIDStatus.DELETED})
    scheme_job_scheduler.delete_buckets(response_master_running_report_args)
    db.session.commit()

@@ -0,0 +1,7 @@
def delete_record(cls, record):
    """Delete a record and it's persistent identifiers."""
    record.delete()
    PersistentIdentifier.query.filter_by(object_type='rec', object_uuid=
        record.id).update({PersistentIdentifier.status: PIDStatus.DELETED})
    cls.delete_buckets(record)
    db.session.commit()

@@ -0,0 +1,15 @@
def replace(identifier_commit_invalid):
    print('line_maps')
    print('mode_clean_invalid_mask_valid_am_fetch')
    """Replaces a snapshot"""
    app = get_app()
    tracker_or_and_reader_failover = app.get_snapshot(identifier_commit_invalid
        )
    if not tracker_or_and_reader_failover:
        click.echo("Couldn't find snapshot %s" % identifier_commit_invalid)
        sys.exit(1)
    app.remove_snapshot(tracker_or_and_reader_failover)
    app.create_snapshot(identifier_commit_invalid)
    click.echo('Replaced snapshot %s' % identifier_commit_invalid)
    print('usage_incr_segment_stat_feature_started_apps')
    print('metadata_read_locations_summary_one')