Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated deprecated dependency request-promise-native #576

Merged

Conversation

yashkohli88
Copy link
Contributor

@yashkohli88 yashkohli88 commented May 15, 2024

The crawler code has been updated to replace the deprecated dependency, request-promise-native, with axios. The following tasks have been performed:

  • A fetch.js file was created in the lib folder. This file contains a function which takes request parameters as input, uses the axios library to initiate HTTP requests, and returns responses. It accepts method, URL, responseType ('json', 'text', 'stream'), headers, and data as request parameters.
  • This function was integrated into the files in lib/providers/fetch folder wherever the request-promise-native library was previously invoked. Various files and functions were updated to accommodate this change.
  • All replaced statements were individually tested, and an integration test suite was also used to ensure that no functionality was broken due to these changes.
  • Test files were updated where stubs had been declared using request-promise-native.
  • The request-promise-native library was removed from the package.json file.
  • A test file for fetch.js, named fetchTests.js, was created to test the function with various scenarios. Please note that an internet connection is necessary to run these test cases.

Future scope -

  • Use this same file and function to replace request-promise-native dependency in service repo as well
  • Replace request and request-retry library with axios and axios-retry
  • Refactor the code to declare the headers in centralised fetch.js file

This PR is related to the issue #555

Comment on lines 30 to 32
const response = await callFetch(options)
if (response.status !== 200)
this.logger.info(`Failure Firing webhook failed: ${response.status} ${response.statusText}`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could not be tested as I was not able to find the correct flow to hit this function.

Copy link
Collaborator

@qtomlinson qtomlinson Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setup to test the webhookDeltaStore locally:

  • in .env file, set the following env variables:

    • CRAWLER_STORE_PROVIDER=cdDispatch+cd(file)+webhook
    • CRAWLER_WEBHOOK_URL="http://service:4000/webhook"
    • CRAWLER_WEBHOOK_TOKEN="Bearer github_token_here"
    • WEBHOOK_CRAWLER_SECRET="Bearer github_token_here"
  • trigger a harvest in the crawler. Upon completion of harvest the highlighted code will be executed.

  • To ensure the message has been correctly transmitted to service, can also set a break point at handleCrawlerCall in service. Verify that the response status is set to 200.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @qtomlinson for taking out time to help me with testing this portion of the code change. I followed these steps and I am able to test the remaining portion as well.

Comment on lines 89 to 94
async _getSourceArchive(dir, url, podspec) {
const archive = path.join(dir.name, `${podspec.name}-${podspec.version}.archive`)
const output = path.join(dir.name, `${podspec.name}-${podspec.version}`)
const response = await callFetch({ url: url, responseType: 'stream' })
return new Promise((resolve, reject) => {
request({ url: url, json: false, encoding: null }).pipe(
response.pipe(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could not be tested as I was not able to find the pod component without github url detail which would have triggered this function.

Copy link
Collaborator

@qtomlinson qtomlinson Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above code path can be hit using one of these coordinates. Please also consider adding one of these to integration test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @qtomlinson for the list of coordinates. It was very helpful and this functionality has been covered in the testing now. We can add one of the version of pod/cocoapods/-/xcbeautify coordinate to our integration test cases as the size of this particular component is least out of all the coordinates that I tested.

Comment on lines 71 to 78
const response = await callFetch({
url: `https://crates.io${version.dl_path}`,
responseType: 'stream',
headers: {
'User-Agent': 'clearlydefined.io crawler ([email protected])',
Accept: 'text/html'
}
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header 'Accept: text/html' is added to make sure we get the crate package in the stream response. If we do not add this header in the request the response is coming out to be a json containing the download URL for that crate. request-promise-native was redirecting and getting crate package as response by default but while using axios we have to specify this header to directly get the crate package. @lumaxis I request your opinion on this approach.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajhenry Your feedback is much appreciated!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me as we're essentially mimicking what a web crawler would request 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the confirmation!

@qtomlinson qtomlinson requested a review from elrayle July 4, 2024 15:52
@qtomlinson qtomlinson marked this pull request as draft July 5, 2024 16:23
@qtomlinson qtomlinson marked this pull request as ready for review July 23, 2024 22:39
Copy link
Contributor

@lumaxis lumaxis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Should also make it fairly easy to move to fetch eventually, if we want to 🚀

@qtomlinson qtomlinson merged commit adc6832 into clearlydefined:master Aug 2, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants