Merge pull request #214 from tgxn/develop
Develop
tgxn authored Feb 26, 2025
2 parents 71a52cd + bd7d453 commit b3078c7
Showing 19 changed files with 506 additions and 267 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/publish-pages.yaml
@@ -122,7 +122,7 @@ jobs:
cache-name: cache-pages-yarn
with:
path: ./pages/node_modules/
key: cache-pages-yarn-${{ hashFiles('pages/package-lock.json') }}
key: cache-pages-yarn-${{ hashFiles('pages/yarn.lock') }}

- name: Install Dependencies
if: steps.cache-pages-yarn.outputs.cache-hit != 'true'
62 changes: 24 additions & 38 deletions README.md
@@ -6,14 +6,20 @@ Data Dumps: https://data.lemmyverse.net/

This project provides a simple way to explore Lemmy Instances and Communities.

![List of Communities](./docs/images/communities.png)
![List of Communities](./docs/images/0.10.0-communities.png)

The project consists of four modules:
## Project Structure

1. Crawler (NodeJS, Redis) `/crawler`
2. Frontend (ReactJS, MUI Joy, TanStack) `/frontend`
3. Deploy (Amazon CDK v2) `/cdk`
4. Data Site (GitHub Pages) `/pages`
The project consists of the following modules:

| Module Description | Path | Readme |
| --------------------------------------------- | ----------- | ------------------------------ |
| Crawler _(NodeJS, Redis)_ | `/crawler` | [README](./crawler/README.md) |
| Frontend _(ReactJS, MUI Joy, TanStack)_ | `/frontend` | [README](./frontend/README.md) |
| Deployment _(Amazon CDK v2)_ | `/cdk` | [README](./cdk/README.md) |
| Data Dump Site _(ReactJS, MUI, GitHub Pages)_ | `/pages` | [README](./pages/README.md) |

Each module has its own README with more details.

## FAQ

@@ -36,11 +42,12 @@ Additionally, instance tags and trust data are fetched from [Fediseer](https://gu

The NSFW filter is a client-side filter that filters out NSFW communities and instances from results by default.
The "NSFW Toggle" checkbox has three states that you can toggle through:
| State | Filter | Value |
| --- | --- | --- |
| Default | Hide NSFW | false |
| One Click | Include NSFW | null |
| Two Clicks | NSFW Only | true |

| State | Filter | Value |
| ---------- | ------------ | ----- |
| Default | Hide NSFW | false |
| One Click | Include NSFW | null |
| Two Clicks | NSFW Only | true |

When you try to switch to a non-SFW state, a popup will appear to confirm your choice. You can save your response in your browser's cache, and it will be remembered.
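The table above maps each click to a stored filter value; a minimal TypeScript sketch of that tri-state cycle (an assumed implementation for illustration, not the actual frontend code):

```typescript
// Tri-state NSFW filter value, matching the table above:
// false = Hide NSFW, null = Include NSFW, true = NSFW Only.
type NSFWFilter = boolean | null;

// Cycle through the three states: Hide NSFW → Include NSFW → NSFW Only.
function nextNSFWState(current: NSFWFilter): NSFWFilter {
  if (current === false) return null; // Default → Include NSFW
  if (current === null) return true; // Include NSFW → NSFW Only
  return false; // NSFW Only → back to Default
}

// Apply the filter value to a single item.
function matchesFilter(isNSFW: boolean, filter: NSFWFilter): boolean {
  if (filter === null) return true; // Include NSFW: show everything
  return isNSFW === filter; // false: hide NSFW; true: NSFW only
}
```
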

@@ -75,47 +82,26 @@ You can also download [Latest ZIP](https://nightly.link/tgxn/lemmy-explorer/work
- `instances.full.json` - list of all instances
- `overview.json` - metadata and counts

## Crawler

[Crawler README](./crawler/README.md)

## Frontend

[Frontend README](./frontend/README.md)

## Data Site

[Data Site README](./pages/README.md)

## Deploy

The deploy is an Amazon CDK v2 project that deploys the crawler and frontend to AWS.

`config.example.json` has the configuration for the deploy.

Then run `cdk deploy --all` to deploy the frontend to AWS.
## Awesome Lemmy Links

## Similar Sites
### General

- https://browse.feddit.de/
- https://join-lemmy.org/instances
- https://github.com/maltfield/awesome-lemmy-instances
- https://lemmymap.feddit.de/
- https://browse.toast.ooo/
- https://lemmyfind.quex.cc/

## Lemmy Stats Pages
### Lemmy Stats Pages

- https://lemmy.fediverse.observer/dailystats
- https://the-federation.info/platform/73
- https://fedidb.org/software/lemmy
- https://fedidb.org/current-events/threadiverse

## Thanks / Related Lemmy Tools
### Thanks / Related Lemmy Tools

- https://github.com/db0/fediseer
- https://github.com/LemmyNet/lemmy-stats-crawler

# Credits

Logo made by Andy Cuccaro (@andycuccaro) under the CC-BY-SA 4.0 license.
- Logo made by Andy Cuccaro (@andycuccaro) under the CC-BY-SA 4.0 license.
- Lemmy Developers and Community for creating [Lemmy](https://github.com/LemmyNet).
6 changes: 5 additions & 1 deletion cdk/README.md
@@ -1,6 +1,10 @@
# Lemmy Explorer Deployment (Amazon CDK v2)

This is a CDK v2 project for deploying the Lemmy Explorer to AWS.
The deployment is an Amazon CDK v2 project that deploys the Lemmy Explorer frontend to AWS.

`config.example.json` contains the configuration for the deploy; rename it to `config.json` and fill in the values.

Then run `cdk deploy --all` (or `yarn deploy`) to deploy the frontend to AWS.

## Deployment

Expand Down
4 changes: 2 additions & 2 deletions crawler/src/lib/crawlStorage.ts
@@ -131,12 +131,12 @@ export class CrawlStorage {
}

async getAttributesWithScores(baseUrl: string, attributeName: string): Promise<any> {
const start = Date.now() - this.attributeMaxAge;
// const start = Date.now() - this.attributeMaxAge;
const end = Date.now();

const keys = await this.client.zRangeByScoreWithScores(
`attributes:instance:${baseUrl}:${attributeName}`,
start,
0, //start,
end,
);
return keys;
20 changes: 19 additions & 1 deletion crawler/src/output/file_writer.ts
@@ -142,10 +142,28 @@ export default class OutputFileWriter {
await this.writeJsonFile(`${this.publicDataFolder}/tags.meta.json`, JSON.stringify(fediTags));
}

async storeMetricsSeries(data: { versions: any }) {
await this.writeJsonFile(`${this.publicDataFolder}/metrics.series.json`, JSON.stringify(data));
}
/**
* this method is used to store the instance metrics data
*/
public async storeInstanceMetricsData(instanceBaseUrl: String, data: any) {

public async storeInstanceMetricsData(
instanceBaseUrl: string,
data: {
instance: any[];
communityCount: number;
users: any[];
communities: any[];
posts: any[];
comments: any[];
versions: any[];
usersActiveDay: any[];
usersActiveMonth: any[];
usersActiveWeek: any[];
},
) {
await mkdir(this.metricsPath, {
recursive: true,
});
151 changes: 151 additions & 0 deletions crawler/src/output/output.ts
@@ -367,6 +367,12 @@ export default class CrawlOutput {
const returnInstanceArray = await this.getInstanceArray();
await this.fileWriter.storeInstanceData(returnInstanceArray);

// VERSIONS DATA
await this.outputAttributeHistory(
returnInstanceArray.map((i) => i.baseurl),
"version",
);

const returnCommunityArray = await this.getCommunityArray(returnInstanceArray);
await this.fileWriter.storeCommunityData(returnCommunityArray);

@@ -518,8 +524,21 @@
private async generateInstanceMetrics(instance, storeCommunityData) {
// get timeseries
const usersSeries = await storage.instance.getAttributeWithScores(instance.baseurl, "users");
const usersActiveDaySeries = await storage.instance.getAttributeWithScores(
instance.baseurl,
"users_active_day",
);
const usersActiveMonthSeries = await storage.instance.getAttributeWithScores(
instance.baseurl,
"users_active_month",
);
const usersActiveWeekSeries = await storage.instance.getAttributeWithScores(
instance.baseurl,
"users_active_week",
);
const postsSeries = await storage.instance.getAttributeWithScores(instance.baseurl, "posts");
const commentsSeries = await storage.instance.getAttributeWithScores(instance.baseurl, "comments");
const communitiesSeries = await storage.instance.getAttributeWithScores(instance.baseurl, "communities");
const versionSeries = await storage.instance.getAttributeWithScores(instance.baseurl, "version");

// generate array with time -> value
@@ -529,6 +548,28 @@
value: item.value,
};
});

const usersActiveDay = usersActiveDaySeries.map((item) => {
return {
time: item.score,
value: item.value,
};
});

const usersActiveMonth = usersActiveMonthSeries.map((item) => {
return {
time: item.score,
value: item.value,
};
});

const usersActiveWeek = usersActiveWeekSeries.map((item) => {
return {
time: item.score,
value: item.value,
};
});

const posts = postsSeries.map((item) => {
return {
time: item.score,
@@ -541,6 +582,14 @@
value: item.value,
};
});

const communities = communitiesSeries.map((item) => {
return {
time: item.score,
value: item.value,
};
});

const versions = versionSeries.map((item) => {
return {
time: item.score,
@@ -555,6 +604,10 @@
posts,
comments,
versions,
usersActiveDay,
usersActiveMonth,
usersActiveWeek,
communities,
});
}

@@ -895,6 +948,104 @@ export default class CrawlOutput {
return instanceErrors;
}

/// VERSION HISTORY

private async outputAttributeHistory(
countInstanceBaseURLs: string[],
metricToAggregate: string,
): Promise<any> {
// this function needs to output an aggregated array of versions, and be able to show change over time
// this will be used to show version history on the website

// basically, it creates a snapshot every 12 hours, and calculates the total at that point in time
// maybe it should use a floating window, so that it can show the change over time

// load all versions for all instances
let aggregateDataObject: {
time: number;
value: string;
}[] = [];

console.log("countInstanceBaseURLs", countInstanceBaseURLs.length);

for (const baseURL of countInstanceBaseURLs) {
const attributeData = await storage.instance.getAttributeWithScores(baseURL, metricToAggregate);
// console.log("MM attributeData", attributeData);

if (attributeData) {
for (const metricEntry of attributeData) {
const time = metricEntry.score;
const value = metricEntry.value;

aggregateDataObject.push({ time, value });
}
}
}

console.log("aggregateDataObject", aggregateDataObject.length);

// console.log("aggregateDataObject", aggregateDataObject);

const snapshotWindow = 12 * 60 * 60 * 1000; // 12 hours
const totalWindows = 600; // 600 snapshots

// generate sliding window of x hours, look backwards
const currentTime = Date.now();

const buildWindowData = {};

let currentWindow = 0;
// let countingData = true;
while (currentWindow <= totalWindows) {
// console.log("currentWindow", currentWindow);
const windowOffset = currentWindow * snapshotWindow;

// get this
const windowStart = currentTime - windowOffset;
const windowEnd = windowStart - snapshotWindow;

// filter data (note: only the windowStart bound is applied, so each
// snapshot counts all entries up to that point; windowEnd is unused)
const windowData = aggregateDataObject.filter((entry) => {
// console.log("entry.time", entry.time, windowStart, windowEnd);
return entry.time < windowStart;
});
console.log("currentWindow", currentWindow, windowStart, windowEnd, windowData.length);

// // stop if no data
// if (windowData.length === 0) {
// countingData = false;
// break;
// }

// console.log("windowData", windowData);

// count data
const countData = {};
windowData.forEach((entry) => {
if (!countData[entry.value]) {
countData[entry.value] = 1;
} else {
countData[entry.value]++;
}
});

// console.log("countData", countData);

// store data
buildWindowData[windowStart] = countData;

currentWindow++;
}

console.log("buildWindowData", buildWindowData);

await this.fileWriter.storeMetricsSeries({
versions: buildWindowData,
});

// throw new Error("Not Implemented");
}
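The counting step inside the loop above reduces each snapshot window to a value → count map; extracted as a standalone sketch (same logic, hypothetical helper name):

```typescript
interface SeriesEntry {
  time: number; // epoch milliseconds (the Redis score)
  value: string; // e.g. a version string
}

// Count occurrences of each value among entries recorded before
// windowStart. Like the loop above, only the upper bound is applied,
// so the counts are cumulative up to that point in time.
function countWindow(entries: SeriesEntry[], windowStart: number): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    if (entry.time < windowStart) {
      counts[entry.value] = (counts[entry.value] ?? 0) + 1;
    }
  }
  return counts;
}
```
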

// FEDIVERSE

private async outputFediverseData(outputInstanceData): Promise<IFediverseDataOutput[]> {
Binary file added docs/images/0.10.0-communities.png
File renamed without changes
1 change: 1 addition & 0 deletions frontend/package.json
@@ -32,6 +32,7 @@
"@tanstack/react-query-devtools": "^4.29.23",
"@uidotdev/usehooks": "^2.0.1",
"axios": "^1.4.0",
"d3-scale": "^4.0.2",
"masonic": "^3.7.0",
"moment": "^2.29.4",
"notistack": "^3.0.1",