Transform and convert between html, json, python dict, command line, and dir path.
pip3 install xdict
ReadMeDetailed/
├── console_color.md
├── CrtableReadMe
│ └── crtable.md
├── hdict_cmdline.md
├── hdict_object.md
├── hdict_xml.md
├── jprint.md
├── ltdict.md
├── TestReadMe
│ └── genrand.md
├── tuple_list.md
└── utils.md
1. Definitions:
1.1. list_tuple_dict
for example:
array = ['a','b','c','d','e'];
the corresponding list_tuple_dict of this array is: {
0: 'a',
1: 'b',
2: 'c',
3: 'd',
4: 'e'
}
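The mapping above can be reproduced in a few lines. This is only a minimal sketch: the helper names list_to_ltdict and ltdict_to_list are hypothetical and are not taken from the xdict API.

```python
def list_to_ltdict(seq):
    """Map each index of a list or tuple to its element (hypothetical helper)."""
    return {index: value for index, value in enumerate(seq)}


def ltdict_to_list(ltdict):
    """Rebuild a list from a list_tuple_dict, ordered by integer key."""
    return [ltdict[index] for index in sorted(ltdict)]


array = ['a', 'b', 'c', 'd', 'e']
ltdict = list_to_ltdict(array)
print(ltdict)
print(ltdict_to_list(ltdict) == array)
```

Sorting the keys on the way back keeps the element order stable even if the list_tuple_dict was built out of order.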
1.2. html_dict
{ # an <html_dict> is a dict with the following format:
0 : <element>,
1 : <element>,
......,
n : <element>
}
element
{ # each <element> is a dict with the following format:
'depth': 2,
'orig_obj_path': ['html','body','Error'],
'siblings_seq': 1,
'path': [0,'children',1,'children',1],
'attrib': {
'type': 'NoneType'
},
'tag': 'Error',
'breadth': 2,
'text': "Timeout!!!",
'children': <html_dict>
}
1.2.1 the meaning of each key in element:
take a simple html structure as an example:
<html>
<head>
<meta>
</meta>
</head>
<body>
<Success>
</Success>
<Error type="NoneType">
Timeout!!!
</Error>
</body>
</html>
'depth': the hierarchy level of the element (the root is level 0)
html depth = 0;
head depth = 1, body depth = 1;
meta depth = 2, Success depth = 2, Error depth = 2;
'breadth': the position of the element within its hierarchy level, counted across all elements at that depth
html breadth = 0;
head breadth = 0, body breadth = 1;
meta breadth = 0, Success breadth = 1, Error breadth = 2;
'siblings_seq': the position of the element among its siblings
html siblings_seq = 0;
head siblings_seq = 0, body siblings_seq = 1;
meta siblings_seq = 0, Success siblings_seq = 0, Error siblings_seq = 1;
'orig_obj_path': the absolute path made of tags, split into an array, like the ancestors-and-self chain in html
Error orig_obj_path = ['html','body','Error']
'path': the absolute path made of siblings_seq values and 'children' keys, split into an array
html path = [0];
head path = [0,'children',0], body path = [0,'children',1];
meta path = [0,'children',0,'children',0], Success path = [0,'children',1,'children',0], Error path = [0,'children',1,'children',1];
'tag': the tag name; when an html_dict is converted back to a dict, the tag is used as the key
'attrib': a dict that stores the attributes, like html attributes; when a dict is converted to an html_dict, the data type of the original value is stored in element['attrib']['type']
'children': like the descendants tree in html; if children = {}, the element is a non-recursive leaf: the string format of the leaf value is stored in element['text'] and the type of the leaf value is stored in element['attrib']['type']
'text': like the text in html; if children = {}, the element is a non-recursive leaf: the string format of the leaf value is stored in element['text'] and the type of the leaf value is stored in element['attrib']['type']; if children != {}, element['text'] is None.
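To make 'path' and 'orig_obj_path' concrete, the sketch below walks a hand-written html_dict for the html structure above. Only 'tag' and 'children' are filled in, and the helper names element_at and tags_along are hypothetical; they are not part of xdict.

```python
# A reduced html_dict for the html/head/body example (only 'tag' and 'children').
hdict = {
    0: {'tag': 'html', 'children': {
        0: {'tag': 'head', 'children': {
            0: {'tag': 'meta', 'children': {}},
        }},
        1: {'tag': 'body', 'children': {
            0: {'tag': 'Success', 'children': {}},
            1: {'tag': 'Error', 'children': {}},
        }},
    }},
}


def element_at(hdict, path):
    """Follow a 'path' list (siblings_seq values and 'children' keys) to an element."""
    node = hdict
    for step in path:
        node = node[step]
    return node


def tags_along(hdict, path):
    """Collect the tag of every element visited along 'path'."""
    tags = []
    node = hdict
    for step in path:
        node = node[step]
        if step != 'children':  # integer steps land on an element
            tags.append(node['tag'])
    return tags


error_path = [0, 'children', 1, 'children', 1]
print(element_at(hdict, error_path)['tag'])
print(tags_along(hdict, error_path))
```

Collecting the tags along a 'path' gives the same list that the element stores in 'orig_obj_path'.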
list/array and tuple do not exist in html_dict; when converting a dict to an html_dict, a list/array or tuple is converted to the following format (element['attrib']['type'] indicates the original type):
the original key:value pair:
"Pair": ('swimming','cadence')
key = "Pair"
value = ('swimming','cadence')
the corresponding element of the key:value pair:
{
'depth': 3,
'orig_obj_path': ['Value', 'RuleInfo', 'Rule', 'Pair'],
'siblings_seq': 0,
'path': [0, 'children', 2, 'children', 1, 'children', 0],
'attrib': {
'type': 'tuple' #-----indicates the type of the value in the original key:value pair
},
'tag': 'Pair', #----the key name in the original key:value pair
'breadth': 0,
'text': None,
'children': {
0: {
'depth': 4,
'orig_obj_path': ['Value', 'RuleInfo', 'Rule', 'Pair', 0],
'siblings_seq': 0,
'path': [0, 'children', 2, 'children', 1, 'children', 0, 'children', 0],
'attrib': {
'type': 'str' #-----the type of 'swimming', i.e. ('swimming','cadence')[0]
},
'tag': 0,
'breadth': 0,
'text': 'swimming', #------the 0th in the original value tuple
'children': {}
},
1: {
'depth': 4,
'orig_obj_path': ['Value', 'RuleInfo', 'Rule', 'Pair', 1],
'siblings_seq': 1,
'path': [0, 'children', 2, 'children', 1, 'children', 0, 'children', 1],
'attrib': {
'type': 'str'
},
'tag': 1,
'breadth': 1,
'text': 'cadence',#------the 1st in the original value tuple
'children': {}
}
}
}
when converting an html_dict back to a dict, list/array and tuple are converted back to list_tuple_dict
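The conversion of a sequence value can be sketched as follows. This is an illustration under stated assumptions, not the xdict implementation: pair_to_element and children_to_ltdict are hypothetical names, and the position keys ('depth', 'breadth', 'path', 'orig_obj_path') are left out for brevity.

```python
def pair_to_element(key, value):
    """Turn one key:value pair whose value is a list or tuple into an element."""
    children = {}
    for index, item in enumerate(value):
        children[index] = {
            'tag': index,                             # the position becomes the tag
            'siblings_seq': index,
            'attrib': {'type': type(item).__name__},  # e.g. 'str'
            'text': str(item),                        # string format of the leaf value
            'children': {},
        }
    return {
        'tag': key,
        'attrib': {'type': type(value).__name__},     # 'tuple' or 'list'
        'text': None,                                 # non-leaf elements carry no text
        'children': children,
    }


def children_to_ltdict(element):
    """Convert the children back into a list_tuple_dict keyed by position."""
    return {seq: child['text'] for seq, child in element['children'].items()}


element = pair_to_element('Pair', ('swimming', 'cadence'))
print(element['attrib']['type'])
print(children_to_ltdict(element))
```

The sketch returns the stored text as is; restoring each leaf to the type recorded in element['attrib']['type'] is not shown.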
1.3. structure_description_dict
for example, the original_dict:
original_dict = {
'html': {
'head': {
'meta': None
} ,
'body': {
'Success': None,
'Error': 'Timeout!!!'
}
}
}
the corresponding html:
<html>
<head>
<meta>
</meta>
</head>
<body>
<Success>
</Success>
<Error>
Timeout!!!
</Error>
</body>
</html>
the corresponding structure_description_dict is a two-dimensional dict with the format:
{
0: { 0:<description>,1:<description>......L1:<description>},
1: {0:<description>,1:<description>......L2:<description>},
......
n: {0:<description>,1:<description>......Ln:<description>}
}
the description is a dict with the following format:
{
'orig_obj_path': ['html', 'body', 'Error'],#the same meaning as orig_obj_path explained in 1.2.1
'breadth_path': [0, 1, 0], #the array left after removing every 'children' from hdict_path
'hdict_path': [0, 'children', 1, 'children', 0],#the same meaning as path explained in 1.2.1
'leaf':True,
'leaf_sons':0,
'non_leaf_sons':0,
'leaf_descendants':0,
'non_leaf_descendants':0,
'hdict_lsib_path': [], #the hdict_path of the left sibling
'hdict_lcin_path': [0,'children',0,'children',0], #the hdict_path of the left cousin (non-common parent)
'hdict_rsib_path': [0, 'children', 1, 'children', 1], #the hdict_path of the right sibling
'hdict_rcin_path': [], #the hdict_path of the right cousin (non-common parent)
'orig_lsib_path': [], #the orig_obj_path of the left sibling
'orig_lcin_path': ['html','head','meta'], #the orig_obj_path of the left cousin (non-common parent)
'orig_rsib_path': ['html','body','Success'], #the orig_obj_path of the right sibling
'orig_rcin_path': [], #the orig_obj_path of the right cousin (non-common parent)
}
{
0: { # this means the 0th hierarchy
0: { # this means the 0th element in the 0th hierarchy
'orig_obj_path': ['html'],
'breadth_path': [0],
'hdict_path': [0],
'leaf':False,
'leaf_sons':0,
'non_leaf_sons':2,
'leaf_descendants':3,
'non_leaf_descendants':2,
'hdict_lsib_path':[],
'hdict_rsib_path':[],
'hdict_lcin_path':[],
'hdict_rcin_path':[],
'orig_lsib_path':[],
'orig_rsib_path':[],
'orig_lcin_path':[],
'orig_rcin_path':[]
}
},
1: { # this means the 1st hierarchy
0: { # this means the 0th element in the 1st hierarchy
'orig_obj_path': ['html', 'head'],
'breadth_path': [0, 0],
'hdict_path': [0, 'children', 0],
'leaf':False,
'leaf_sons':1,
'non_leaf_sons':0,
'leaf_descendants':1,
'non_leaf_descendants':0,
'hdict_lsib_path':[],
'hdict_rsib_path':[0,'children',1],
'hdict_lcin_path':[],
'hdict_rcin_path':[],
'orig_lsib_path':[],
'orig_rsib_path':['html','body'],
'orig_lcin_path':[],
'orig_rcin_path':[],
},
1: { # this means the 1st element in the 1st hierarchy
'orig_obj_path': ['html', 'body'],
'breadth_path': [0, 1],
'hdict_path': [0, 'children', 1],
'leaf':False,
'leaf_sons':2,
'non_leaf_sons':0,
'leaf_descendants':2,
'non_leaf_descendants':0,
'hdict_lsib_path':[0, 'children', 0],
'hdict_rsib_path':[],
'hdict_lcin_path':[],
'hdict_rcin_path':[],
'orig_lsib_path':['html','head'],
'orig_rsib_path':[],
'orig_lcin_path':[],
'orig_rcin_path':[]
}
},
2: { # this means the 2nd hierarchy
0: { # this means the 0th element in the 2nd hierarchy
'orig_obj_path': ['html', 'head', 'meta'],
'breadth_path': [0, 0, 0],
'hdict_path': [0, 'children', 0, 'children', 0],
'leaf':True,
'leaf_sons':0,
'non_leaf_sons':0,
'leaf_descendants':0,
'non_leaf_descendants':0,
'hdict_lsib_path':[],
'hdict_rsib_path':[],
'hdict_lcin_path':[],
'hdict_rcin_path':[0,'children',1,'children',0],
'orig_lsib_path':[],
'orig_rsib_path':[],
'orig_lcin_path':[],
'orig_rcin_path':['html','body','Error']
},
1: { # this means the 1st element in the 2nd hierarchy
'orig_obj_path': ['html', 'body', 'Error'],
'breadth_path': [0, 1, 0],
'hdict_path': [0, 'children', 1, 'children', 0],
'leaf':True,
'leaf_sons':0,
'non_leaf_sons':0,
'leaf_descendants':0,
'non_leaf_descendants':0,
'hdict_lsib_path':[],
'hdict_rsib_path':[0,'children',1,'children',1],
'hdict_lcin_path':[0,'children',0,'children',0],
'hdict_rcin_path':[],
'orig_lsib_path':[],
'orig_rsib_path':['html','body','Success'],
'orig_lcin_path':['html','head','meta'],
'orig_rcin_path':[]
},
2: { # this means the 2nd element in the 2nd hierarchy
'orig_obj_path': ['html', 'body', 'Success'],
'breadth_path': [0, 1, 1],
'hdict_path': [0, 'children', 1, 'children', 1],
'leaf':True,
'leaf_sons':0,
'non_leaf_sons':0,
'leaf_descendants':0,
'non_leaf_descendants':0,
'hdict_lsib_path':[0, 'children', 1, 'children', 0],
'hdict_rsib_path':[],
'hdict_lcin_path':[],
'hdict_rcin_path':[],
'orig_lsib_path':['html','body','Error'],
'orig_rsib_path':[],
'orig_lcin_path':[],
'orig_rcin_path':[]
}
}
}
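The four counters in each description can be checked against original_dict with a short recursive walk. This is a sketch only: count_sons is a hypothetical helper, and it assumes that a value counts as a leaf whenever it is not a non-empty dict.

```python
original_dict = {
    'html': {
        'head': {'meta': None},
        'body': {'Success': None, 'Error': 'Timeout!!!'},
    }
}


def count_sons(value):
    """Count leaf/non-leaf sons and descendants below one value."""
    counts = {'leaf_sons': 0, 'non_leaf_sons': 0,
              'leaf_descendants': 0, 'non_leaf_descendants': 0}
    if not isinstance(value, dict):
        return counts  # a leaf has no sons and no descendants
    for child in value.values():
        if isinstance(child, dict) and child:
            sub = count_sons(child)
            counts['non_leaf_sons'] += 1
            counts['non_leaf_descendants'] += 1 + sub['non_leaf_descendants']
            counts['leaf_descendants'] += sub['leaf_descendants']
        else:
            counts['leaf_sons'] += 1
            counts['leaf_descendants'] += 1
    return counts


print(count_sons(original_dict['html']))
print(count_sons(original_dict['html']['body']))
```

For 'html' this yields 0 leaf sons, 2 non-leaf sons, 3 leaf descendants and 2 non-leaf descendants, the same values as the description at hierarchy 0 above.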
1.4 paths_relations_dict:
the paths_relations_dict is a two-dimensional dict with the format:
{
'h:o': { # key = tuple(hdict_path) : value = orig_obj_path
(0,): ['html'],
(0, 'children', 0): ['html', 'head'],
(0, 'children', 1): ['html', 'body'],
(0, 'children', 0, 'children', 0): ['html', 'head', 'meta'],
(0, 'children', 1, 'children', 0): ['html', 'body', 'Error'],
(0, 'children', 1, 'children', 1): ['html', 'body', 'Success']
},
'b:h': {# key = tuple(breadth_path) : value = hdict_path
(0,): [0],
(0,0): [0, 'children', 0],
(0,1): [0, 'children', 1],
(0,0,0): [0, 'children', 0, 'children', 0],
(0,1,1): [0, 'children', 1, 'children', 0],
(0,1,2): [0, 'children', 1, 'children', 1]
},
'o:h': {# key = tuple(orig_obj_path) : value = hdict_path
('html',): [0],
('html', 'head'): [0, 'children', 0],
('html', 'body'): [0, 'children', 1],
('html', 'head', 'meta'): [0, 'children', 0, 'children', 0],
('html', 'body', 'Error'): [0, 'children', 1, 'children', 0],
('html', 'body', 'Success'): [0, 'children', 1, 'children', 1]
},
'h:b': {# key = tuple(hdict_path) : value = breadth_path
(0,): (0,),
(0, 'children', 0): (0,0),
(0, 'children', 1): (0,1),
(0, 'children', 0, 'children', 0): (0,0,0),
(0, 'children', 1, 'children', 0): (0,1,1),
(0, 'children', 1, 'children', 1): (0,1,2)
}
}
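The 'h:o' and 'o:h' tables are inverse lookups of each other. The sketch below builds both from a list of (hdict_path, orig_obj_path) pairs; the variable names are illustrative and this is not how xdict necessarily builds its prdict. Paths are converted with tuple() because a Python list is not hashable and cannot be a dict key.

```python
# (hdict_path, orig_obj_path) pairs taken from the example above.
pairs = [
    ([0], ['html']),
    ([0, 'children', 0], ['html', 'head']),
    ([0, 'children', 1], ['html', 'body']),
    ([0, 'children', 0, 'children', 0], ['html', 'head', 'meta']),
    ([0, 'children', 1, 'children', 0], ['html', 'body', 'Error']),
    ([0, 'children', 1, 'children', 1], ['html', 'body', 'Success']),
]

prdict = {
    'h:o': {tuple(hpath): opath for hpath, opath in pairs},
    'o:h': {tuple(opath): hpath for hpath, opath in pairs},
}

# A one-element key needs the trailing comma: ('html',) is a tuple, ('html') is a str.
print(prdict['o:h'][('html',)])
print(prdict['h:o'][(0, 'children', 1)])
```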
2. Abbreviations:
2.1 list_tuple_dict : ltdict
2.2 html_dict: hdict
2.3 structure_description_dict: sdict
2.4 paths_relations_dict: prdict
2.5 separator of command line: cmd_sp
command line: cmd_str
command line in list pattern: cmd_pl
cmdlines_ltdict: clt
cmdlines_deep_ltdict: cdlt
cmdlines_full_ltdict: cflt