Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vc cc improvements #16

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Conversation

bulk88
Copy link
Contributor

@bulk88 bulk88 commented Mar 10, 2015

fix a syntax error with VC 6, make a "release" optimized VC 6 option, VC 2005 and newer warnings fixes, a slight perf improvement on VC 2005 and newer as compared to the previous situation, but VC 2003 and earlier builds will always be faster than a VC 2005 or newer build of dmake. The vast majority of stdio is not available in _nolock versions thanks to MS so only fputc got optimized to a nolock version, its still something.

bulk88 added 7 commits March 10, 2015 02:45
unix/dcache:
- make extern.h be first for Win32 warnings disable

unix/runargv.c:
- extern.h includes signal.h, specifying signal.h before extern.h causes
  the Win32 specific defines which disable warnings to be skipped, the
  reason for signal.h before extern.h is before git history and is
  therefore unknown

win95/switchar.c:
- all 3 includes are in extern.h
In MS and Mingw's headers, there is no define from fputc to _fputc_nolock.
This has to be done manually.

mingw 4.6.3 has _fputc_nolock in its headers so use it.
VC6 doesn't understand ULL suffix (only UI64), >= VC 2003 understand ULL,
to make things simple, just drop the suffix. This was a syntax error on
VC6.

Previously a VC6 build was built with -Od, this is inefficient, and not
needed since the vast majority of dmake users will never C debug it. make
mk.bat similar to mk70.bat. I dont think anyone wants the COFF and CV
symbol formats, they bloat dmake.exe by 10s of KBs. My VC 2003 C debugger
understood the VC 6 PDB without a problem but not the legacy symbol
formats.

64 bit seems to work, no fixes or anything special needs to be done, just
use the right vcvars*.bat file to select the right cl.exe
the Win32 implementation of readdir (FindNextFile) gives the mtime unlike
on unix, so no need to chdir, so a relative stat() call later works, since
no stat call is done. This saves 2 chdirs per directory enumerated and
cached and there are about 3-5 dirs for a Perl ExtUtils::MakeMaker build
and 5-8 dirs for Perl core build enumerated.On VC 2003, each chdir is
22us wall time with children according to my C profiler. On VC >= 2005
chdir is more expensive since a number of locks are acquired since MS
dropped the single threaded static libc. The vast majority of chdirs come
from finished_child, not CacheStat, for example, in parallel building
perl interp, for me 317 calls chdirs dropped to 302 after this patch.
In Def_recipe dont assign NULL to a calloc-ed struct and don't check for
rcp being NULL, as it was already checked for NULL in Add_recipe_to_list
which is the only caller of Def_recipe.

In Add_recipe_to_list, remove the NULL check, all callers of
Add_recipe_to_list will never pass a NULL, the callers are Parse and
Parse_rule_def.

In DmStrDup, don't compute strlen twice, once explicitly, the other time
inside strcpy. Anyways, memcpy is more deterministic to the CPU as to end
of the copying loop, than strcpy.

In Parse_macro, reduce the liveness of var cp, the newly created var sp
will replace var cp's C stack or non-vol reg slot instead of both sp and
cp being saved around the DmStrDup call.
Previously, if none of the chars are found, another loop (strlen())
through the string must be done. With strcspn, another loop isn't done,
just 1 add instruction.
see explanation in commit
"use strpbrk instead of textbook inefficient roll your own version"
@bulk88
Copy link
Contributor Author

bulk88 commented Mar 11, 2015

added more comits, here is a before (at commit "eliminate chdir from CacheStat on Win32") and
after (at commit "DmStrSpn: use strspn instead of self rolling outselves"). Most notable, 3537077-3343092=193985 calls to strchr were removed because of the DmStrSpn change, but in the after DmStrSpn and strspn only executed, 186193 times, which means 193985-186193=7792 times the original self rolled rolled version called strchr more than once, and did more than 1 pass through the string. There is some time savings in DmStrPbrk, which went from 0.48 us to 0.46 us in "average with children", this is because of the strlen removal, although I have no data from the profiler on how often that old strlen branch executed in DmStrPbrk (strlen is an "intrinsic" and is inlined and is not profilable). DmStrSpn actually stayed the same time wise at 0.42 us. Note this profiler stops inlining of most of dmake's code. In a non-instrumented -O2 dmake DmStrPbrk and DmStrSpn are always inlined, since those null checks on its args are often optimized away by VC.
more strings b4
more strings after

bulk88 added 5 commits March 11, 2015 07:35
Remove_file and the "!" operator are always inlined into the caller into
a test asm op then conditional jump on VC 2003 -O2.
fclose contains a fflush or equivelent call per posix/C. Calling fflush
after every line will mean <=80 byte writes to the disk/FS driver/kernel.
Calling fflush after every write of a line console is understandable, for
a disk it isn't. By removing/merging the fflush with fclose, in most cases
the entire temp file will be buffered in stdio in user mode and sent to the
kernel only when the file is closed. Move the error check to fclose. On
VS >= 2005, this also means that the MT lock will be aquired once for the
fclose, not once (or more) for fflush then again for fclose.

My test workload for dmake (a perl build) makes 3 temp files per
"dmake all". The number of fflush calls was reduced from 140 to 137 with
this patch. The number of WriteFile calls stayed the same at 145, probably
since the temp files are 1 line long. The majority of fflush calls are
writing to console in my workload.
I didn't realize dmstr2 existed when I created STRINGIFY and DM_StGiFy, so
remove my API names in favor dmake's older (and more authoritative) names.
commit "Stop tracking autoconf-generated files." discontinued distributing
configure
the winnt bat file builds are generally broken, but sychronize them to
avoid more rotting
@bulk88 bulk88 force-pushed the vc_cc_improvements branch from 72edc11 to 68d9de8 Compare March 18, 2015 03:55
done with cygwin autoconf, no gcc installed, vcvars32.bat env vars set from
VC 6, macros that were undefed were not included in this commit, also
autoconf undefed HAVE_UTIME_NULL, so leave the former hand set
HAVE_UTIME_NULL on.
@bulk88 bulk88 force-pushed the vc_cc_improvements branch from 68d9de8 to f6abcc2 Compare March 18, 2015 04:41
bulk88 added 2 commits March 18, 2015 00:49
The error I (bulk88) saw was

checking the operating system... configure: error: MINGW32_NT-5.1 operating syst
em is not suitable to build dmake!
*returned to prompt*
In commit
"dont call fflush for each line written when generating a tmp file" I
stopped flushing the fd of a temp file after every line, but that
introduced a new bug, since Close_temp was not always closing a fd (IDK if
that is intentional or not) even though it looked like it did. Change
Close_temp to fflush the fd if it wasn't fclose-d. The bug presented itself
as a temp file being empty when GCC read it, but examining the temp showed
it was filled correctly because it was filled as dmake was exiting after
processing the GCC non-zero exit.
@mohawk2 mohawk2 force-pushed the master branch 10 times, most recently from a5fd843 to 920ca5e Compare April 13, 2019 01:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant