num-utils - Goals for this project
-----------------------------------
(NOTE: This document does not describe the current feature set; please read
the README file and man pages for that.)
This document outlines criteria that the num-utils should eventually meet. It
describes a general interface that each utility should follow and guides the
project toward version 1.0.
The initial releases of the num-utils will be small and have far fewer features
than described below. I originally wrote the num-utils with many of the
features shown below, but everything was rather disorganized and some of the
programs simply processed the data incorrectly. I decided to go back to the
beginning, write down all the features that I would like to have and think are
useful, and create a development cycle. I'll release each version to the
public for review, comments and contributions. If you are interested in
helping out in any way, please contact me at [email protected].
Any contributions are welcome.
I've put an 'X' at the beginning of the line for features that have been
completed.
-= All numeric utilities =-
1. All num utils should process data either from STDIN or from one or more
files specified as command-line arguments.
Examples of STDIN usage:
    cat file | num-util
    num-util < file
    num-util
    [Program will wait for data on STDIN until Ctrl-D is pressed]
Examples of file argument usage:
    num-util file
    num-util file1 file2 file3 ... fileN
    num-util -[options] file1 file2 file3
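A minimal sketch of this input convention, written in Python purely for illustration (the num-utils themselves are Perl). `input_lines` is a hypothetical helper name, not part of the project. Python's standard `fileinput` module reads the named files in order and treats the name `-` as STDIN, which covers both usage styles above:

```python
import fileinput
from typing import Iterator, Sequence


def input_lines(paths: Sequence[str]) -> Iterator[str]:
    """Hypothetical helper: yield lines from every file in `paths`, or STDIN if none."""
    # fileinput reads "-" as standard input, so an empty argument list
    # falls back to STDIN and reads until EOF (Ctrl-D at a terminal).
    with fileinput.input(files=list(paths) or ["-"]) as stream:
        for line in stream:
            yield line.rstrip("\n")
```

A utility would call `input_lines(sys.argv[1:])` after removing its options from the argument list.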
2. All num utils should support the following options, which are standard
across all the utilities:
X -h -- Provide helpful usage information.
X -V -- Be verbose. Send verbose information to STDERR.
X -d -- Debug information for developers. This implies verbose and more.
  -q -- Quiet mode. Don't print any warnings about file read errors.
  -i -- Output only the integer portion of numeric values.
  -I -- Output only the decimal (fractional) portion of numeric values.
  -a -- Don't treat '-' characters as negative signs.
        Think of -a as 'absolute value'.
  -d -- Don't treat the '.' character as a decimal point.
  -N -- Don't process complete numbers; process each numeric character
        individually.
  -C -- Commify numbers in output. Instead of printing out '1000000', it
        should print out '1,000,000'.
  -b <n>
     -- Use base <n> when dealing with numbers.
  -- -- Don't process any more arguments as options.
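The -i, -I and -C output transformations can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; it assumes -i truncates toward zero and that -I keeps the sign of the original number, neither of which this document specifies.

```python
from decimal import ROUND_DOWN, Decimal


def integer_part(text: str) -> Decimal:
    """-i (hypothetical): the integer portion of a number, truncated toward zero."""
    return Decimal(text).to_integral_value(rounding=ROUND_DOWN)


def decimal_part(text: str) -> Decimal:
    """-I (hypothetical): the fractional portion of a number, keeping its sign."""
    return Decimal(text) - integer_part(text)


def commify(text: str) -> str:
    """-C (hypothetical): put a ',' between each group of three integer digits."""
    sign = "-" if text.startswith("-") else ""
    whole, dot, fraction = text.lstrip("-").partition(".")
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])  # peel off three digits from the right
        whole = whole[:-3]
    groups.insert(0, whole)
    return sign + ",".join(groups) + dot + fraction
```

Using `Decimal` rather than floats keeps the digits exactly as they were written in the input.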
3. All num utils should provide user documentation in the form of man pages.
- Since the programs are written in Perl, I plan to embed POD documentation
in the programs themselves, so that it can be exported to man pages, info
docs or HTML.
4. All num utils should provide summarized help with the -h option similar to
the following:
    ----------------------------------------------
    numutil: process numbers in such a way......
    ----------------------------------------------
    Usage: numutil [options] [file args]
           STDIN | numutil [options]
           numutil [options] < file
    Options:
      -h  Help: You're looking at it.
      -V  Increase verbosity.
      -d  Don't treat the '.' character as a decimal point.
5. All numeric utilities should abstract their functionality as much as
possible so that a numeric processing module can be written to make things
consistent among all numeric utility programs.
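One piece such a shared module would likely contain is a number extractor that honors the -a and "don't treat '.' as a decimal point" options. The Python sketch below is illustrative only; `extract_numbers` and its parameters are hypothetical. Note that with negatives enabled, text like "10-20" is read as 10 and -20.

```python
import re
from decimal import Decimal
from typing import List


def extract_numbers(text: str, negatives: bool = True, decimals: bool = True) -> List[Decimal]:
    """Hypothetical shared helper: every number found in `text`, in order.

    negatives=False mimics -a ('-' is never a sign); decimals=False mimics
    the option that stops '.' from being read as a decimal point.
    """
    pattern = r"\d+"
    if decimals:
        pattern += r"(?:\.\d+)?"  # optional fractional part
    if negatives:
        pattern = "-?" + pattern  # optional leading minus sign
    return [Decimal(match) for match in re.findall(pattern, text)]
```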
6. The num-utils set of utilities should be available in the following forms
for download:
o tar format, both gz and bz2
o rpm package
o Gentoo ebuild tree
o deb package
7. The Makefile should be set up to create the different package formats
listed above, as well as the man pages from the POD (perldoc) documentation
in each utility's source code.
8. Numeric expressions (regular-expression syntax plus operators for numbers):
  ..  -- range operator
  i   -- increment operator
  f   -- factor operator
  m   -- multiple operator
  ,   -- expression separator
  []  -- character grouping
  +   -- quantifier (1 or more)
  *   -- quantifier (0 or more)
  ?   -- quantifier (0 or 1)
  {}  -- quantifier (a specific number of times)
-- Some goals sent in by [email protected] (thanks) --
- Support for locale differences in the use of '.' and ','. In other parts
of the world a comma is used as the decimal point and a period as the
thousands separator (e.g. 1.000.000,50 means one million and five tenths).
- Some of the utilities should be able to recurse through sub-directories,
where appropriate.
- Support for input and output of numbers in scientific notation
(e.g. 1.3e-7 for 0.00000013).
- A utility for sorting numbers in mixed notations. For instance, it is hard
to sort numbers that are in scientific notation or commified alongside ones
that aren't. Maybe this calls for a numsort utility.
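The locale, scientific-notation and numsort ideas can all be built on one parser that normalizes a number's text before converting it. This Python sketch is illustrative; `parse_number`, `numsort` and the `decimal_mark`/`group_mark` parameters are hypothetical names, and the sketch assumes the caller already knows which convention the input uses.

```python
from decimal import Decimal
from typing import List


def parse_number(text: str, decimal_mark: str = ".", group_mark: str = ",") -> Decimal:
    """Hypothetical: parse plain, commified or scientific-notation text."""
    # Drop the grouping marks first, then normalize the decimal mark to '.'.
    cleaned = text.replace(group_mark, "").replace(decimal_mark, ".")
    return Decimal(cleaned)  # Decimal accepts forms like '1.3e-7' directly


def numsort(values: List[str], decimal_mark: str = ".", group_mark: str = ",") -> List[str]:
    """Hypothetical numsort: order numeric strings by value, whatever their notation."""
    return sorted(values, key=lambda v: parse_number(v, decimal_mark, group_mark))
```

For the European example above, call `parse_number("1.000.000,50", decimal_mark=",", group_mark=".")`.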
-= Individual program goals =-
-- numsum --
This program adds up all the numbers it encounters. The numbers can be one
per line, several per line separated by spaces, or several per line separated
by anything else.
Eventually, it would be nice if numsum could do things like sum up individual
rows or columns of input separately. There might be other types of summing
functions that would be useful when dealing with textual input.
Examples
$ cat numbers
1
2
3
4
$ cat numbers | numsum
10
$ numsum numbers
10
Advanced Examples
$ cat columns
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
$ cat columns | numsum -c
15 40 65 90 115
$ cat columns | numsum -r (add up the rows)
55
60
65
70
75
$
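The three summing modes shown in these examples can be sketched in Python as follows. The function names are hypothetical, and the -c/-r versions assume every field is a well-formed number (`sep=None` splits on runs of white space, matching the default described below).

```python
import re
from decimal import Decimal
from typing import Iterable, List, Optional

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def numsum(lines: Iterable[str]) -> Decimal:
    """Default mode: add up every number, however the numbers are separated."""
    return sum((Decimal(n) for line in lines for n in NUMBER.findall(line)), Decimal(0))


def numsum_columns(lines: Iterable[str], sep: Optional[str] = None) -> List[Decimal]:
    """-c (hypothetical sketch): one total per column."""
    totals: List[Decimal] = []
    for line in lines:
        for i, field in enumerate(line.split(sep)):
            if i == len(totals):
                totals.append(Decimal(0))  # first time this column appears
            totals[i] += Decimal(field)
    return totals


def numsum_rows(lines: Iterable[str], sep: Optional[str] = None) -> List[Decimal]:
    """-r (hypothetical sketch): one total per input line."""
    return [sum((Decimal(f) for f in line.split(sep)), Decimal(0)) for line in lines]
```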
Usage options (in addition to the standard options)
-a -- Add all numbers in the file, not just the first ones found on each
line.
X -c -- Treat each line as a set of columns separated by white space, or a
string if the -s option is used. Sum up the values in each column
and print out the result of each column separated by the separation
character. This is shown above in the advanced examples section.
X -s <string> [for columns]
-- Use <string> as the separator between each column. This is allowed
to be more than one character and possibly even a number.
X -r -- Treat each line (by default) as a row of numbers to sum up. The
results of the sums of each row will be printed on separate lines.
-s <string> [for rows]
-- When used with the -r flag, this will specify the separator for
rows. By default it is the new line character. It could be a
character, set of characters or even a number.
-t <x>
-- When used, it will add up every <x>th number and at the end
print out <x> rows showing the sums of each <x>th number. This
would be useful, for example, if you had data for each day of the
month and wanted to sum up the data for each weekday.
So you might do something like this:
$ cat monthly-data | numsum -t 7
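The -t behavior amounts to bucketing numbers by their position modulo <x>. A minimal Python sketch, with a hypothetical function name:

```python
from decimal import Decimal
from typing import Iterable, List, Union


def numsum_every(numbers: Iterable[Union[int, str, Decimal]], x: int) -> List[Decimal]:
    """-t <x> (hypothetical sketch): sums[i] totals the items at positions i, i+x, i+2x, ..."""
    sums = [Decimal(0)] * x
    for position, value in enumerate(numbers):
        sums[position % x] += Decimal(value)
    return sums
```

With one value per day and x = 7, `sums[0]` holds the total for every day that falls on the same weekday as the first day of the data.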
Options that might be included eventually.
X -x <n> -- Where <n> is some number. This would be a shortcut for adding up
all the numbers in the <n>th column of the input. By default, the
columns would be determined by white space, but could also be
determined by the -s flag. This must be used with the -c or -r flag.
So you can do something like:
$ numsum -c -x 10 access_log
to get the total bytes transferred in an access_log.
Maybe this could also handle comma-separated values, so
1,5,10 would sum up the 1st, 5th and 10th columns.
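A Python sketch of the proposed -x option, including the comma-separated column list. The function name is hypothetical. Skipping non-numeric fields is an assumption the document does not make; it is included because web server access logs commonly write '-' when no bytes were sent.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Iterable, Optional


def numsum_select(lines: Iterable[str], columns: str, sep: Optional[str] = None) -> Dict[int, Decimal]:
    """-x "1,5,10" (hypothetical sketch): total only the listed 1-based columns."""
    wanted = [int(c) for c in columns.split(",")]
    totals = {c: Decimal(0) for c in wanted}
    for line in lines:
        fields = line.split(sep)
        for c in wanted:
            if c > len(fields):
                continue  # short line: this column is missing
            try:
                totals[c] += Decimal(fields[c - 1])
            except InvalidOperation:
                pass  # assumption: skip non-numeric fields such as '-'
    return totals
```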
-- numgrep --
This program is the numeric equivalent of the unix grep utility. numgrep
searches textual input for numbers matching the expression specified on
the command line. The main power of numgrep is its ability to search for
ranges of numbers, such as all numbers between 1 and 100. Normal unix grep
and regular expressions make even this simple task awkward, but numgrep's
numeric matching expressions make it possible to match numbers in ways not
previously possible.
A few examples of numgrep's usage:
X o Search for all numbers between 1 and 100 in the file data.txt.
numgrep /1..100/ data.txt
X o Search for numbers between 1 and 37, even numbers between 50 and 58, and
the numbers 79, 86 and 94.
numgrep /1..37,5[2468],79,86,94/ data.txt
X o search for numbers from -10 to 10.
numgrep /-10..10/ data.txt
X o search for numbers that are multiples of 7
numgrep /m7/ data.txt
o search for numbers that are multiples of 7, 12 and 22
numgrep /m7m12m22/ data.txt
o search for numbers that are factors of 1024 and multiples of 12 or numbers
that are factors of 2333 and multiples of 9
numgrep /f1024m12,f2333m9/ data.txt
o search for numbers that are in the set 1, 4, 7, 10, 13 and 16
numgrep /1..16i3/ data.txt
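A Python sketch of how a single number could be tested against these expressions. This is not numgrep's actual matcher; the names are hypothetical and the parsing rules are assumptions. Ranges (with an optional `i` increment) and chained `f`/`m` conditions are handled numerically, and any other term (such as `79` or `5[2468]`) is matched as a regular expression against the number's digits. Because the sketch splits on every ',', a `{m,n}` quantifier is not supported.

```python
import re
from decimal import Decimal

RANGE = re.compile(r"(-?\d+(?:\.\d+)?)\.\.(-?\d+(?:\.\d+)?)(?:i(\d+(?:\.\d+)?))?")
MULT_FACTOR = re.compile(r"(?:[fm]\d+)+")


def term_matches(term: str, n: Decimal) -> bool:
    """Test n against one comma-free term such as 1..16i3, m7m12 or 5[2468]."""
    rng = RANGE.fullmatch(term)
    if rng:
        low, high = Decimal(rng.group(1)), Decimal(rng.group(2))
        step = rng.group(3)
        in_range = low <= n <= high
        # With an increment, only low, low+step, low+2*step, ... match.
        return in_range and (step is None or (n - low) % Decimal(step) == 0)
    if MULT_FACTOR.fullmatch(term):
        # Chained conditions must all hold: f1024m12 = factor of 1024 AND multiple of 12.
        for op, arg in re.findall(r"([fm])(\d+)", term):
            k = Decimal(arg)
            if op == "m" and n % k != 0:
                return False
            if op == "f" and (n == 0 or k % n != 0):
                return False
        return True
    # Anything else is a regex over the digits; '.' is escaped to stay literal.
    return re.fullmatch(term.replace(".", r"\."), str(n)) is not None


def numgrep_match(expression: str, value) -> bool:
    """True if value matches any comma-separated term of /expression/."""
    n = Decimal(str(value))
    return any(term_matches(t, n) for t in expression.strip("/").split(","))
```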
Usage options (in addition to the standard options)
-l -- Instead of keeping the numbers in their textual context, print
them out one number per line.
-R -- Invert the sense of matching, to select non-matching numbers.
-r -- Recurse sub-directories.
-p <file>
-- Obtain the patterns from file <file>
-f -- Suppress normal output and just print the names of the files
that contain a match one per line.
-F -- Suppress normal output and print the names of the files for which
no output would have been printed.
-n -- Print the line number in front of each line that contains a match.
-c -- Don't save non-numeric context. This will cause numgrep to discard
all non-matching/non-numeric text around the numbers that match.
Eventual options
-A [n] -- Print [n] lines of context out after the matching line.
The default is 2.
-B [n] -- Print [n] lines of context out before the matching line.
The default is 2.
Feature submitted by [email protected]:
- Intervals: I may be biased due to my work, but usually when you
determine a quantity with only finite precision, people write
something like 1.234 +/- 0.056 or 1.234(56) (the last form is more
common, at least in physics journals; the interpretation is that the
digits in brackets are taken to line up with the last digit shown,
left-padded with zeroes and a decimal point in the correct place). The
natural representation of both of these is as intervals, i.e. as sets
of two numbers representing lower and upper bounds. Thus, the given
number should be turned into something like 1.178..1.290, and there
should be a conversion function that turns it back into either of the
other two.
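The conversions described in this suggestion can be sketched in Python with exact decimal arithmetic. All function names are hypothetical, and the sketch assumes the uncertainty digits are a plain integer written directly after the value.

```python
import re
from decimal import Decimal
from typing import Tuple

PAREN = re.compile(r"(-?\d+(?:\.\d+)?)\((\d+)\)")  # e.g. 1.234(56)
PLUS_MINUS = re.compile(r"(-?\d+(?:\.\d+)?)\s*\+/-\s*(\d+(?:\.\d+)?)")  # e.g. 1.234 +/- 0.056


def to_interval(text: str) -> Tuple[Decimal, Decimal]:
    """Hypothetical: turn either uncertainty notation into (lower, upper) bounds."""
    text = text.strip()
    match = PAREN.fullmatch(text)
    if match:
        value = Decimal(match.group(1))
        # Bracketed digits line up with the last digit shown, so scale them
        # by the value's exponent: 56 with 1.234 (exponent -3) -> 0.056.
        error = Decimal(match.group(2)).scaleb(value.as_tuple().exponent)
    else:
        match = PLUS_MINUS.fullmatch(text)
        if not match:
            raise ValueError(f"unrecognized uncertainty notation: {text!r}")
        value, error = Decimal(match.group(1)), Decimal(match.group(2))
    return value - error, value + error


def to_plus_minus(lower: Decimal, upper: Decimal) -> str:
    """Hypothetical: convert bounds back to the value +/- error form."""
    return f"{(lower + upper) / 2} +/- {(upper - lower) / 2}"


def to_parenthesized(lower: Decimal, upper: Decimal) -> str:
    """Hypothetical: convert bounds back to the value(digits) form."""
    value, error = (lower + upper) / 2, (upper - lower) / 2
    digits = error.scaleb(-value.as_tuple().exponent)
    return f"{value}({digits})"
```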
-- average --
This program finds the average of all numbers encountered. By default it
finds the arithmetic mean of all the numbers. Meaning (;-) that it adds up all
the values and divides that sum by the number of values encountered. It should
eventually offer more sophisticated calculations like finding the median and
mode values.
Examples of usage:
$ cat numbers
4
8
9
20
99
$ average numbers
28
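Python's standard `statistics` module already provides each of these calculations, so a sketch of the averaging modes is short. The `average` function and its flags are hypothetical stand-ins for the -m, -M and -l options described below.

```python
import statistics
from typing import Sequence, Union

Number = Union[int, float]


def average(numbers: Sequence[Number], mode: bool = False,
            median: bool = False, lower: bool = False) -> Number:
    """Hypothetical: mean by default; mode=-m, median=-M, lower=-l."""
    if mode:
        # On Python 3.8+, if several values tie, the first one seen is returned.
        return statistics.mode(numbers)
    if median:
        return statistics.median_low(numbers) if lower else statistics.median(numbers)
    return statistics.mean(numbers)
```

For the example file above, `average([4, 8, 9, 20, 99])` gives 28.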
Usage options:
X -m -- Print out the mode value of all the numbers entered. The mode is
the most frequently occurring value in the set.
X -M -- Print out the median of the set of numbers entered. The median is
the middle value of all the numbers encountered. So if the numbers
88, 12, 2, 1, 9, 100 and 1000 are encountered, the median of that
set is 12. Illustrated:
    1 2 9 12 88 100 1000
          ^^
X -l -- For sets with an even number of values, use the lower of the two
middle values as the median.
-a -- average all numbers in the file, not just the first ones found on
each line.
-c -- Treat each line as a set of columns separated by white space, or a
string if the -s option is used. Average the values in each column
and print out the result of each column separated by the separation
character, in the same way as numsum's -c option.
-s <string> [for columns]
-- Use <string> as the separator between each column. This is allowed
to be more than one character and possibly even a number.
-r -- Treat each line (by default) as a row of numbers to average. The
averages of each row will be printed on separate
lines.
-s <string> [for rows]
-- When used with the -r flag, this will specify the separator for
rows. By default it is the new line character. It could be a
character, set of characters or even a number.
Options that might be included eventually.
-n <n> -- Where <n> is some number. This would be a shortcut for averaging
all the numbers in the <n>th column of the input. By default, the
columns would be determined by white space, but could also be
determined by the -s flag. This must be used with the -c or -r flags.
-- normalize --
This program will scale a group of numbers into the range 0..1 (by default)
according to their initial values. You can change the range using the -R option.
Usage options:
X -R <range> -- This is for specifying a range to normalize for instead of 0..1
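The document does not say how values are distributed over the range. The Python sketch below assumes a linear min-max mapping (smallest value to the low end, largest to the high end), and `normalize` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def normalize(values: Sequence[float], target: Tuple[float, float] = (0.0, 1.0)) -> List[float]:
    """Hypothetical: linearly map min(values) to target[0] and max(values) to target[1]."""
    low, high = min(values), max(values)
    new_low, new_high = target
    if high == low:
        # Assumption: with no spread, every value maps to the bottom of the range.
        return [float(new_low)] * len(values)
    span = new_high - new_low
    return [new_low + (v - low) * span / (high - low) for v in values]
```

An -R 0..100 option would correspond to `normalize(values, (0, 100))`.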
-- round --
This utility will round each number encountered up or down depending on its
decimal value or its relation to a number. It will probably also deal with
finding the floor or ceiling values of numbers. It should also be able to
round to multiples of certain numbers.
Options:
X -n <n> -- Round to the nearest multiple of <n>. Instead of just rounding
decimal numbers, you can also round to a multiple of any number. So
if you set <n> to 1000 and you encounter the number 6777, it will
round that number to 7000. If you set <n> to 3 and encounter the
number 7, it will round it to 6.
X -c -- Find the ceiling of each number encountered. Round up.
X -f -- Find the floor of each number. Round down.
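Rounding to the nearest multiple of <n> is a division, a rounding step and a multiplication. This Python sketch uses hypothetical names and assumes halves round up (toward positive infinity). With floats, values that sit exactly on a half may still round unexpectedly because of binary representation.

```python
import math


def round_to_multiple(x: float, n: float) -> float:
    """-n <n> (hypothetical sketch): nearest multiple of n, halves rounding up."""
    return math.floor(x / n + 0.5) * n


def ceiling(x: float) -> int:
    """-c: round up."""
    return math.ceil(x)


def floor(x: float) -> int:
    """-f: round down."""
    return math.floor(x)
```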
-- range --
Print out a range of numbers for use in for loops and such.
o Do zero or character padding with the -p option. So 1 becomes 001 if the
range has an upper limit of 3 digits.
o Accept ranges in the following formats:
n1..n2 (ex. 1..100 ; All integers from 1 to 100)
n1..n2,n3..n4 (ex. 1..10,50..100 ; All integers from 1 to 10 and
from 50 to 100)
n1..n2i2 (ex. 2..20i2 ; All even numbers from 2 to 20)
(ex. 1..19i2 ; All odd numbers from 1 to 19)
(ex. 3..21i3 ; 3, 6, 9, 12, 15, 18, 21)
n1.d..n2.di0.1
(ex. 1.1..2.0i0.1 ; 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8,
1.9, 2.0)
Or any combination of the above.
Usage options (in addition to the standard options)
-p <prefix> -- Put the following prefix before each number.
-s <suffix> -- Put the following suffix after each number.
X -n <separator> -- Use <separator> as a character to separate each number.
By default, a space is used. Use the sequence \n to specify
a newline.
X -N -- Shortcut for using a newline separator.
X -e <set>
-- Exclude the numbers in <set> from the output. This is useful when
you want a complex range that leaves out certain numbers.
<set> is a list of numbers separated by a ','.
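A Python sketch of range expansion, including -e exclusion and zero padding. The names are hypothetical, and descending ranges such as 10..1 are not handled. `Decimal` is used so that 1.1..2.0i0.1 steps exactly and stops on 2.0 instead of drifting the way float addition can.

```python
import re
from decimal import Decimal
from typing import Iterable, List

TERM = re.compile(r"(-?\d+(?:\.\d+)?)\.\.(-?\d+(?:\.\d+)?)(?:i(\d+(?:\.\d+)?))?")


def expand(spec: str, exclude: Iterable[str] = ()) -> List[Decimal]:
    """Hypothetical: expand specs like 1..10,50..100 or 1.1..2.0i0.1, minus -e values."""
    skip = {Decimal(e) for e in exclude}
    out: List[Decimal] = []
    for part in spec.split(","):
        match = TERM.fullmatch(part)
        if not match:
            out.append(Decimal(part))  # a lone number such as 7
            continue
        start, stop = Decimal(match.group(1)), Decimal(match.group(2))
        step = Decimal(match.group(3) or "1")
        value = start
        while value <= stop:
            out.append(value)
            value += step
    return [v for v in out if v not in skip]


def zero_pad(values: List[Decimal]) -> List[str]:
    """Hypothetical: pad with zeros to the widest value, so 1 becomes 001 up to 100."""
    width = max(len(str(v)) for v in values)
    return [str(v).zfill(width) for v in values]
```

Joining the result with `" ".join(str(v) for v in values)` gives the default space-separated output.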
-- random --
Print out a random number or numbers. This program takes a range of
numbers much like the range command does, except that instead of printing out
all the numbers in that range, it prints out a random number from that
range.
Ex:
$ random 1..100
37
$ random 1.0..100.0
56.9
$ random 1.000..100.000
42.397
$ random 2..100i2 [only pick among the even integers from 2 to 100]
68
Usage options (in addition to the standard options)
-n <n> -- Generate <n> random numbers separated by a space.
-s <c> -- Use <c> as the separation character.
-N -- Shortcut for using a newline separation character.
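A Python sketch of picking a random value from a range spec. The name is hypothetical, and it assumes a uniform choice among the values the range would produce. When no increment is given, it steps by the finest precision written, which reproduces the pattern in the examples above, where 1.000..100.000 yields three decimal places.

```python
import random
import re
from decimal import Decimal

SPEC = re.compile(r"(-?\d+(?:\.\d+)?)\.\.(-?\d+(?:\.\d+)?)(?:i(\d+(?:\.\d+)?))?")


def random_from_range(spec: str, rng: random.Random) -> Decimal:
    """Hypothetical: pick one value uniformly from a spec such as 1..100 or 2..100i2."""
    match = SPEC.fullmatch(spec)
    if not match:
        raise ValueError(f"unsupported range: {spec!r}")
    start, stop = Decimal(match.group(1)), Decimal(match.group(2))
    if match.group(3):
        step = Decimal(match.group(3))
    else:
        # No increment: step by the finest precision written in the spec.
        places = max(0, -start.as_tuple().exponent, -stop.as_tuple().exponent)
        step = Decimal(1).scaleb(-places)
    steps = int((stop - start) / step)  # number of whole steps that fit
    return start + step * rng.randint(0, steps)
```

The -n <n> option would amount to `[random_from_range(spec, rng) for _ in range(n)]`.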
Feature submitted by [email protected]:
- You should specify the distribution you are generating;
probably, this is just an equal distribution. What would come in handy
quite often is to have a way of producing pseudo-random numbers that
realize a chosen distribution with a given set of parameters. But that
should probably be a different program than random, e.g. distribution,
as the name is already taken. So, e.g., ``distribution --Gaussian 3.0
4.0 100'' should produce 100 numbers distributed according to a Gauss
distribution with mean 3 and variance 4.
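A Python sketch of the suggested Gaussian mode, using the standard library's `random.gauss`. The `gaussian` function is hypothetical, standing in for the proposed distribution program. Because `random.gauss` expects a standard deviation, the variance of 4 in the example becomes a sigma of 2.

```python
import random
from typing import List, Optional


def gaussian(mean: float, variance: float, count: int, seed: Optional[int] = None) -> List[float]:
    """Hypothetical: `count` normally distributed numbers with the given mean and variance."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # random.gauss takes a standard deviation, not a variance
    return [rng.gauss(mean, sigma) for _ in range(count)]
```

`gaussian(3.0, 4.0, 100)` corresponds to the ``distribution --Gaussian 3.0 4.0 100'' example.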
-- bound --
This program is for finding the maximum, minimum and surrounding numbers in a
set. Options let you choose how many numbers are returned, so you can show
lists of the top maximum and minimum values. You can also show the numbers
that occur around the boundary numbers.
Ex:
$ cat numbers
1 10 8 100 15 1000
2 9 5 15 27 12 136
$ bound -u numbers (upper)
1000
$ bound -l numbers (lower)
1
Top 4 upper numbers sorted
$ bound -u -n 4 numbers
1000
136
100
27
Bottom 4 lower numbers sorted
$ bound -l -n 4 numbers
1
2
5
8
Context around lowest number. Show 2 numbers of context.
$ bound -l -c 2 numbers
1 10 8
Find the 6 closest numbers to the number 75. They will print out
in order of closeness.
$ bound -f 75 -n 6 numbers
100
27
15
15
136
12
Usage options (in addition to the standard options)
-u -- Return the upper bound number in the set (the maximum number)
-l -- Return the lower bound number in the set (the minimum number)
-n <n> -- Return the top or bottom <n> numbers or <n> numbers around
a number.
-c <n> -- In addition to the number returned, show <n> numbers on both sides
of the number.
-f <n> -- Find the number <n> or the closest number to <n>.
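The bound operations reduce to sorting with different keys. This Python sketch uses hypothetical names and treats the example file as one flat list of numbers. Because Python's sort is stable, ties in closeness keep their input order.

```python
from typing import List, Sequence


def upper(numbers: Sequence[int], n: int = 1) -> List[int]:
    """-u [-n N] (hypothetical sketch): the N largest values, largest first."""
    return sorted(numbers, reverse=True)[:n]


def lower(numbers: Sequence[int], n: int = 1) -> List[int]:
    """-l [-n N] (hypothetical sketch): the N smallest values, smallest first."""
    return sorted(numbers)[:n]


def closest(numbers: Sequence[int], target: int, n: int = 1) -> List[int]:
    """-f TARGET [-n N] (hypothetical sketch): the N values nearest TARGET."""
    return sorted(numbers, key=lambda x: abs(x - target))[:n]


def context(numbers: Sequence[int], index: int, width: int) -> List[int]:
    """-c WIDTH (hypothetical sketch): the value at `index` plus WIDTH neighbours per side."""
    return list(numbers[max(0, index - width): index + width + 1])
```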
-- numprocess -- (maybe a name like mutate, alter, or process would be better)
This program mutates numbers as it encounters them. It should do the
following operations:
o Add/Subtract a value to a number
o Multiply/Divide a number by a factor
o Raise a number to a power (includes the concept of roots)
o Round a number up or down
o Do any mathematical function to a number.
o Quantify a number. So 1024 bytes is 1KB and so on.
Ex:
Add 1 to each number
$ numprocess /+1/
Multiply each number by 8 and then divide by 5
$ numprocess /*8,%5/
Convert from Fahrenheit to Celsius
$ numprocess /-32,*5,%9/
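A Python sketch of applying such an expression step by step, left to right. The names are hypothetical, and only the four operators used in these examples are supported. As in the examples, '%' is taken to mean division, not modulo.

```python
import operator
from typing import Callable, Dict

# '%' divides in the examples above; it is not the modulo operator here.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "%": operator.truediv,
}


def numprocess(expression: str, value: float) -> float:
    """Hypothetical: apply each comma-separated step of /expression/ to value."""
    for step in expression.strip("/").split(","):
        symbol, operand = step[0], float(step[1:])
        value = OPERATIONS[symbol](value, operand)
    return value
```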
Usage options (in addition to the standard options)
-- interval --
This program calculates and displays the interval between one number
and the next.
I'd like to add a feature to this program that lets you specify a boundary
number: if the interval goes negative, the program subtracts the previous
number from the boundary and adds the new number to that to get the interval.
This is useful when calculating intervals between numbers that have an upper
bound, like 32-bit integers. I will use it myself for bandwidth counters.
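A Python sketch of the proposed boundary handling, with a hypothetical function name. When a difference goes negative, it adds the distance from the previous reading up to the boundary to the new reading. For a 32-bit counter the boundary would be 2**32.

```python
from typing import List, Optional, Sequence


def intervals(values: Sequence[int], boundary: Optional[int] = None) -> List[int]:
    """Hypothetical: differences between consecutive values, with wrap-around handling."""
    result: List[int] = []
    for previous, current in zip(values, values[1:]):
        difference = current - previous
        if difference < 0 and boundary is not None:
            # The counter wrapped: distance from previous up to the boundary,
            # plus however far the new reading has climbed since.
            difference = (boundary - previous) + current
        result.append(difference)
    return result
```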
=== Roadmap to 1.0 ===
- Fix as many bugs as I can along the way.
- Feature freeze at 0.8
-- 0.6 --
- Aim for a June 20th/21st, 2008 release
- Fix reported bugs
- Fix spelling mistakes in documentation
- Rename all programs to have the 'num' prefix.
- Create a numformat utility that takes the formatting goals away from numprocess and puts them into their own utility.
- Implement 25% of user feature requests
- Investigate feature requests from users and try to implement the most feasible ones.
- Set up CVS, git or Subversion for num-utils
- Make official dpkg for Debian and Ubuntu, and ebuild for Gentoo
- Create independent wiki or portal for num-utils
-- 0.7 --
- Aim for an October 15th, 2008 release
- Fix reported bugs
- Implement 25% of user feature requests
- Add support for arbitrary number sizes through bignum
- Add processing for different numeric bases
- Add scientific notation processing and output
-- 0.8 --
- Aim for a January 31st, 2009 release
- Implement 25% of user feature requests
- Add regex support to numgrep so that regular expressions can be used alongside numeric expressions.
-- 0.9 --
- Aim for an April 15th, 2009 release
- Feature freeze
- Fix all bugs
- Rigorous testing
-- 1.0 --
- June 18th, 2009 release
- Fix any remaining bugs