phase#1: input/output #1

Open · 34 of 43 tasks
Soltaniant opened this issue Aug 20, 2022 · 0 comments
Labels: roadmap (steps to achieve a phase in details)

Soltaniant commented Aug 20, 2022
Phase#1: Input/Output

Goals: we have a single-page website where you can add a source node and a destination node to a pipeline and connect them together. In a source node, you upload a file as your dataset; in a destination node, you choose the file format in which you want the final result to be provided. After creating this basic pipeline, you can execute it and download the result from the destination node.

Smaller Steps:

  • Upload a CSV file to our database, with its content correctly imported.
  • Execute the generated pipeline (and check the correctness of the result).
  • Download the exact imported file, in the same CSV format.
  • Testing the input/output process, either in this project or in a separate one, is needed. Now and then we will reuse this test for newer file formats and for checking the correctness of the output; a round-trip sketch follows this list.
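
The first three steps amount to a round-trip guarantee: what goes in as CSV must come back out as the same CSV. A minimal xUnit sketch of that check, assuming a hypothetical CsvExporter as the inverse of IParser.Parse (neither signature is fixed yet):

```csharp
using System.Data;
using Xunit;

public class RoundTripTests
{
    [Fact]
    public void ImportedCsvDownloadsUnchanged()
    {
        var original = "id,name\n1,Alice\n2,Bob\n";

        // Parse into the common DataTable representation (IParser.Parse).
        DataTable table = new CSVParser().Parse(original);

        // CsvExporter is a hypothetical inverse of Parse; in the real
        // pipeline this round trip also passes through the database.
        string exported = new CsvExporter().Export(table);

        Assert.Equal(original, exported);
    }
}
```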

More Ideas:

You are free to add your ideas without considering whether they are possible or not. Just let your mind fly!

  • Add support for multiple outputs
  • Add code documentation for architectural topics
  • Convert uploaded data to JSON and then to a DataTable (it seems that a JSON file can include the data types as well); a sketch follows this list.
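
On that last idea: Json.NET can already deserialize a JSON array of records straight into a System.Data.DataTable, inferring column types from the values. A small sketch:

```csharp
using System.Data;
using Newtonsoft.Json;

class JsonToDataTableDemo
{
    static void Main()
    {
        // Each object becomes a row; property names become columns,
        // and Json.NET infers the column types from the values.
        string json = @"[
            { ""id"": 1, ""name"": ""Alice"", ""score"": 4.5 },
            { ""id"": 2, ""name"": ""Bob"",   ""score"": 3.9 }
        ]";

        DataTable table = JsonConvert.DeserializeObject<DataTable>(json);

        foreach (DataColumn col in table.Columns)
            System.Console.WriteLine($"{col.ColumnName}: {col.DataType}");
    }
}
```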

Further Tasks (Nice To Have):

  • Handle errors and exceptions for all APIs, both in the frontend and the backend.
  • Support more file formats as input:
    • JSON
    • Text
    • XML
  • Support more file formats as output:
    • JSON
    • Text
    • XML
  • Review and reformat our teammates' code.
  • Testing process: for each phase we can also open a testing issue, where we thoroughly test the processes in different situations and write down the scenarios as well. Like a todo list (in Markdown), we can check off whether each test scenario works or not.
  • Merging the frontend and backend results for testing might need one person from each side to check for correctness and failures.

Conclusion for first phase:

It would be nice to have a meeting to discuss the pros and cons of the current phase and review the plans for the next one.

Implementation

Interfaces and abstracts

  • Node

    General abstract class for all nodes. We can also use the template method pattern for default functions.

    • Execute(ExecutionType : enum, Nodes : Dictionary) => string : returns the node's query string
    • Id : string
  • ProcessNode : Node

    All functional nodes that manipulate the source data live here. Each one has its own execution method and some private properties.

    • previousNodes : list of Ids
  • IParser

    This parser converts raw source data into a single common data type suitable for importing into the database.

    • Parse(rawData) : DataTable
  • IDatabase
    • ImportDataTable(table : DataTable, tableName : string)

      The details of this method are not clear yet, though the functionality is. (The implementer might need to check and think more about it.)

    • RunQuery(queryString : string) => TempTable
    • CreateTable(name : string) => string
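
Spelled out in C#, the interfaces above might look roughly like this; a sketch in which only the listed names and signatures come from the spec (TempTable is still undefined):

```csharp
using System.Collections.Generic;
using System.Data;

public abstract class Node
{
    public string Id { get; set; }

    // Returns the query string this node contributes to the pipeline.
    public abstract string Execute(ExecutionType type, Dictionary<string, Node> nodes);
}

public abstract class ProcessNode : Node
{
    // Ids of the nodes feeding into this one.
    public List<string> PreviousNodes { get; set; } = new List<string>();
}

public interface IParser
{
    // Converts raw source data into a DataTable ready for import.
    DataTable Parse(string rawData);
}

public interface IDatabase
{
    void ImportDataTable(DataTable table, string tableName);
    TempTable RunQuery(string queryString); // TempTable is not yet specified
    string CreateTable(string name);
}
```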

Classes

  • SourceNode : Node

    • tableName : string
  • DestinationNode : Node

    • previousNodes : list of Ids
    • tableName
  • Pipeline

    After execution, the output result is stored in a database table whose name comes from the destination node. (While creating a destination node, an API is called and a table is created; the table name is then returned in the response and stored in the newly created node on the frontend.)

    • Nodes : Dictionary
    • Execute(Nodes : Dictionary) : Dictionary (id -> output)
    • Preview()
  • CSVParser : IParser (sketched after this list)

  • PostgresqlDatabase : IDatabase

  • CustomDeserializer (json to Pipeline)

    While deserializing the JSON received from the API, we need to convert it to the Pipeline class; we might need such a class here, though it might not be needed if a simpler method exists.
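
As one concrete starting point from the list above, CSVParser.Parse could begin as a naive line splitter. A sketch only: it ignores quoted fields and keeps every column as a string:

```csharp
using System;
using System.Data;

public class CSVParser : IParser
{
    public DataTable Parse(string rawData)
    {
        var table = new DataTable();
        var lines = rawData.Split(new[] { "\r\n", "\n" },
                                  StringSplitOptions.RemoveEmptyEntries);

        // The first line holds the column headers.
        foreach (var header in lines[0].Split(','))
            table.Columns.Add(header.Trim());

        // Remaining lines become rows; every value stays a string for now.
        for (int i = 1; i < lines.Length; i++)
            table.Rows.Add(lines[i].Split(','));

        return table;
    }
}
```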

Services and Controllers:

  • DataInventoryService: handles data-related requests for uploading data or connection strings
    • MapToParser(info) => IParser
    • AddSource(dataset : File, datasetName : string) => tableName : string
    • AddDestination(datasetName : string) => tableName : string
  • DataInventoryController: routes the requests and calls the appropriate functions (sketched after this list)
    • AddSource(dataset : File, datasetName : string)
    • AddDestination(datasetName : string)
  • PipelineService: handles all pipeline-related services, from executing to previewing
    • Preview(pipeline : json, id : string) => after/before : Tuple
    • Execute(pipeline : json) => Dictionary (id -> output result)
  • PipelineController: routes the requests and calls the appropriate functions
    • Preview(pipeline : json, id : string)
    • Execute(pipeline : json)
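
In ASP.NET Core terms these controllers would stay thin wrappers over the services. A sketch of DataInventoryController with the service injected; the routes and attributes here are assumptions, not decisions:

```csharp
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class DataInventoryController : ControllerBase
{
    // Concrete service type from the Services list above (shape assumed).
    private readonly DataInventoryService _service;

    public DataInventoryController(DataInventoryService service) => _service = service;

    // Receives the uploaded dataset and returns the created table's name.
    [HttpPost("source")]
    public ActionResult<string> AddSource(IFormFile dataset, string datasetName)
        => _service.AddSource(dataset, datasetName);

    // Creates the destination table and returns its name.
    [HttpPost("destination")]
    public ActionResult<string> AddDestination([FromBody] string datasetName)
        => _service.AddDestination(datasetName);
}
```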

Enums

  • ExecutionType

    A pipeline can be executed under many different conditions, and we have a separate type for each. Here are some of them:

    • FullExecution
    • Heading
    • Preview
    • Validation
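
In code this is a plain enum, taken directly from the list above:

```csharp
// One member per execution condition; more can be added in later phases.
public enum ExecutionType
{
    FullExecution,
    Heading,
    Preview,
    Validation
}
```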

API Calls:

  • Execute([from body] pipeline : Pipeline) => Dictionary (id : string -> tablename : string)
  • AddSourceByFile() => tableName : string

    The file content is attached to the request body for this API; the format and the filename are figured out by the server itself. For uploading a file and its API in Angular, see this link. This API returns the name of the database table where the uploaded data is stored; the source node must store this name for further use!

  • AddDestination([from body] datasetName : string) => tableName : string
  • Download(tableName : string, fileFormat : string)
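
The Download call maps naturally onto a file-returning action. A sketch, assuming a hypothetical TableExporter that renders a stored table in the requested format (phase 1 only needs CSV):

```csharp
using Microsoft.AspNetCore.Mvc;

public class DownloadController : ControllerBase
{
    // Streams the stored result table back in the requested file format.
    [HttpGet("download")]
    public IActionResult Download(string tableName, string fileFormat)
    {
        // TableExporter is hypothetical; phase 1 only supports "csv".
        byte[] content = TableExporter.Export(tableName, fileFormat);
        return File(content, "text/csv", $"{tableName}.{fileFormat}");
    }
}
```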