change parse_name_df to parse_affil_df (titipata#67)
Author information was incorrectly saved to the affiliation Parquet file; write parse_affil_df there instead of parse_name_df.
tanganyao authored and titipata committed Aug 21, 2019
1 parent c7c51f6 commit 219786b
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion scripts/pubmed_oa_spark.py
@@ -106,7 +106,8 @@ def process_file(date_update, fraction=0.01):
 filter(lambda x: x is not None).\
 flatMap(lambda xs: [x for x in xs])
 parse_affil_df = parse_affil_rdd.toDF()
-parse_name_df.write.parquet(os.path.join(save_dir, 'pubmed_oa_affiliation_%s.parquet' % date_update_str),
+# change to parse_affil_df
+parse_affil_df.write.parquet(os.path.join(save_dir, 'pubmed_oa_affiliation_%s.parquet' % date_update_str),
 mode='overwrite')
 print('Finished parsing Pubmed Open-Access subset')
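
The bug fixed here came from a DataFrame variable and its output file name being chosen separately. One way to reduce that risk is to derive the path from the same key that selects the DataFrame. The following is only a minimal sketch and not part of this commit: `parquet_path` and `write_tables` are hypothetical helpers, and the `pubmed_oa_<kind>_<date>.parquet` pattern is assumed from the affiliation file name shown in the diff.

```python
import os


def parquet_path(save_dir, kind, date_update_str):
    """Build the output path for one table (hypothetical helper).

    The naming pattern is assumed from 'pubmed_oa_affiliation_%s.parquet'.
    """
    return os.path.join(save_dir, 'pubmed_oa_%s_%s.parquet' % (kind, date_update_str))


def write_tables(tables, save_dir, date_update_str):
    """Write each DataFrame to the path derived from its own key.

    `tables` maps a kind such as 'affiliation' to a Spark DataFrame,
    or to any object exposing `.write.parquet(path, mode=...)`.
    Returns a dict of kind -> path that was written.
    """
    written = {}
    for kind, df in tables.items():
        path = parquet_path(save_dir, kind, date_update_str)
        # Same call as in the script: overwrite any existing output.
        df.write.parquet(path, mode='overwrite')
        written[kind] = path
    return written
```

Inside `process_file`, this would be called as `write_tables({'affiliation': parse_affil_df}, save_dir, date_update_str)`, so the affiliation DataFrame can only end up in the affiliation file.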
