
Better CI: modular build system #62

Closed
GreyCat opened this issue Dec 9, 2016 · 27 comments

@GreyCat
Member

GreyCat commented Dec 9, 2016

I have yet another huge, but pretty helpful proposal. The proposal is to get ourselves a better CI.

Problems with current CI

There is one main huge problem: it's monolithic:

  1. Check out everything
  2. Build compiler
  3. Build test .ksy files => target language code
  4. For every target language:
    4.1. (If it's a compiled language) Build:
    4.1.1. Target language code compiled from the tests
    4.1.2. KS runtime
    4.1.3. Actual test specs (i.e. stuff with "assert equals") + test runner (sometimes)
    4.2. Run tests (doing assertions), generate some sort of report
  5. Aggregate all the reports, generate CI report page
  6. Upload and update CI report page on our website

This leads us to:

  • Steadily increasing build times. Everything builds sequentially, so every new language means more time to build. Originally, we had 3.5 minutes per CI iteration; now we're steadily approaching 8-9 minutes per build.
  • We're actually abusing Travis infrastructure a lot here. Generally, in Travis, a project is supposed to use just one programming language (they supply heavily modified build environments with all the stuff pre-installed), so we resort to lots of hacks and intricate installs (which also take precious time) to work around that.
  • We're unable to do multiple checks properly. For example:
    • Testing C++ code is a huge deal actually. To do it properly, we need to run at least against 3-4 major compilers / OS combinations (i.e. gcc / clang / MSVC + Linux / Windows / OS X). This is possible to do with Travis's environment matrix + AppVeyor's Windows builds, but it really needs to be modular for that.
    • Testing ksc properly: we're currently not testing that Linux vs Windows builds of the compiler produce the same results, not to mention that we don't actually test JVM vs JS builds of the compiler. We don't test OS X at all. That, again, would need some interaction between Linux (Travis) and Windows (AppVeyor) builds.
  • A change in tests requires us to rebuild everything starting from the compiler — although in practice, it would be much faster to just fetch the pre-built compiler from the previous iteration and reuse it.

etc, etc. So, bottom line: monolithic = bad, modular = good.

@GreyCat GreyCat self-assigned this Dec 9, 2016
@KOLANICH

KOLANICH commented Dec 9, 2016

#63

@GreyCat
Member Author

GreyCat commented Dec 10, 2016

@KOLANICH Sorry, I don't understand you. Right now I've just described the current state of things; there are no "build packages" right now. Besides, GitHub "Releases" stuff is not based upon uploads anyway — they're generated automatically from repo tags and are source-only.

@LogicAndTrick
Collaborator

FYI you can attach binaries to a release by editing a tag in the releases page.

@KOLANICH

KOLANICH commented Dec 10, 2016

@KOLANICH Sorry, I don't understand you. Right now I've just described the current state of things; there are no "build packages" right now.

I was a bit wrong. I have created another issue, #63, but it is closely related to this one, because you can build the modules separately (with every module having its own Travis build script) and then fetch the results from the Releases pages and reuse them. There are some problems with module dependencies, but they can be solved by putting the dependency description into a separate repo.
  1. The Travis build script fetches the dependencies repo.
  2. It builds and tests its targets.
  3. If there were no errors in the previous step, it builds the packages and uploads them.
  4. It makes a dummy push into every repo dependent on the built repo, so that they are rebuilt and retested with Travis (a sketch follows).
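
For illustration, step 4 could be as simple as an empty commit pushed from the build script (a sketch; the repo name, token variable and git identity are placeholders, not anything actually set up):

```
#!/bin/sh -e
# Hypothetical sketch of the "dummy push" trigger: an empty commit in a
# dependent repo makes Travis rebuild and retest it.
DEP=kaitai_struct_python_runtime   # placeholder dependent repo
git clone "https://$GITHUB_TOKEN@github.com/kaitai-io/$DEP.git"
cd "$DEP"
git -c user.name=ci-bot -c user.email=ci@example.org \
    commit --allow-empty -m "Rebuild: upstream $TRAVIS_REPO_SLUG changed"
git push origin HEAD
```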

FYI you can attach binaries to a release by editing a tag in the releases page.

I propose to do it automatically on every successful Travis build.
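
For what it's worth, a rough sketch of such an upload via GitHub's release assets API (the artifact name is made up, and looking up $RELEASE_ID via the API is omitted):

```
# Hypothetical sketch: attach a build artifact to an existing GitHub release.
curl -H "Authorization: token $GITHUB_TOKEN" \
     -H "Content-Type: application/octet-stream" \
     --data-binary @kaitai-struct-compiler.zip \
     "https://uploads.github.com/repos/kaitai-io/kaitai_struct_compiler/releases/$RELEASE_ID/assets?name=kaitai-struct-compiler.zip"
```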

@GreyCat
Member Author

GreyCat commented Dec 10, 2016

because you can build the modules separately (with every module having its own Travis build script) and then fetch the results from the Releases pages and reuse them.

Sorry, I don't quite understand most of what you're mentioning in this paragraph. What are the "modules" and "dependencies" you're talking about? Why is that a problem in the first place?

FYI you can attach binaries to a release by editing a tag in the releases page.
I propose to do it automatically on every successful Travis build.

This is pretty much pointless. Travis does mostly unstable builds, and releases are for stable (tagged) builds. While it is possible to attach only "tagged" build files to releases, it is probably pointless anyway, as lots of release artifacts (.deb repo files, Ruby .gem, Python packages, etc.) must be published in designated places, and we already do all that.

@KOLANICH

KOLANICH commented Dec 10, 2016

Travis does mostly unstable builds and releases are for stable

GH Releases are for whatever the repo owner wants.

Why is that a problem in the first place?

The problem is stated in the first post of this issue. The solution is to separate the KS compiler from the KS runtimes, put them into separate repos, and build and test them separately.

What are the "modules" and "dependencies" you're talking about?

So a module is a separate git repo with a standalone part of KS; dependencies describe what depends on what. A runtime library depends on the compiler: if the compiler changes its interface, the runtime library also needs to be changed. So every compiler change requires running tests for every runtime library against the updated version of the compiler. We don't want to store this data in the compiler repo, so we should create a separate repo for the dependency description. When a runtime is updated, you only need to recheck that runtime: you can take a prebuilt and tested compiler binary and use the runtime with it without retesting the compiler.

@ghost

ghost commented Feb 20, 2017

I have found an interesting example - the .travis.yml of the ANTLR project. Like Kaitai Struct, the ANTLR project has runtime libraries for different languages (C#, C++, Go, Java, JavaScript, Python 2 and 3, Swift). The ANTLR tool generates parsers and lexers, and the generated parsers and lexers then use the runtime libraries (the same principle as in Kaitai Struct). The .travis.yml calls scripts from the .travis directory. Maybe it can help somehow.

@GreyCat
Member Author

GreyCat commented Apr 7, 2017

The time has (kind of) come: given that we'll need a pretty sophisticated system to test writes (for #27), I've decided to take a few first steps.

Initially I had this idea of the workflow:

CI flow graph

I've started from running the actual tests:

  • Our main process now publishes compiled target language tests in a distinct repo for that purpose
  • As soon as any commit lands there, Travis launches several jobs in parallel according to its .travis.yml. It looks like this.
  • So far, I've implemented support for 4 languages: C++, Java, Python, Ruby.

To add new languages, the following is needed:

  • Add a ./prepare-$TARGET script that will, at the very least, download the language-specific KS runtime (probably to runtime/$TARGET) and, probably, install some dependencies (see the sketch after this list)
  • Add one or more relevant environments to the matrix section of .travis.yml
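
For illustration, a minimal sketch of what a ./prepare-python could look like (the exact steps are an assumption, not the actual script):

```
#!/bin/sh -e
# prepare-python (hypothetical sketch): fetch the language-specific KS runtime
# into runtime/python and install test dependencies.
git clone --depth 1 https://github.com/kaitai-io/kaitai_struct_python_runtime runtime/python
pip install --user pytest
```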

The output is not saved anywhere so far. The next step is, obviously, publishing test artifacts, i.e. JUnit XML-style or whatever reports they can provide.

Even this PoC Travis run uncovered a few problems with our current build:

  • Our C++ tests are not compatible with (probably older) clang, due to usage of the 0.25d double literal
  • We broke Ruby 1.8 compatibility with overly liberal usage of 1.9+ features in debug mode
  • We broke Java JDK6 compatibility when we switched to java.nio for input

@GreyCat
Member Author

GreyCat commented Apr 7, 2017

Tried to get Appveyor to build C++ using MSVC compiler: https://ci.appveyor.com/project/GreyCat/ci-targets

Wow, how naive I am. Right now it fails to run due to Boost (and Boost.Test) + zlib being unavailable. Is there a simple way to install boost / zlib on Windows?

@LogicAndTrick Probably running C# on several .NET platforms on Windows would be possible now — wanna take a look? I can add you to the AppVeyor account.

@LogicAndTrick
Collaborator

Sure, I'll see if I can get something running when I have time.

@GreyCat
Member Author

GreyCat commented Apr 8, 2017

@LogicAndTrick I've tried to add you by e-mail. Hopefully you'll receive some invitation or something?..

@LogicAndTrick
Collaborator

Looks like a few versions of Boost are installed in the AppVeyor image: https://www.appveyor.com/docs/build-environment/#boost
You might need to set up an environment variable to point to one of those paths. I don't know about zlib though.
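
Something along these lines might work (a sketch; the Boost path is one of the versions listed in the AppVeyor docs, and it assumes CMake's standard BOOST_ROOT hint is enough for our setup):

```
# Hypothetical sketch: point CMake's FindBoost at a preinstalled AppVeyor Boost
export BOOST_ROOT="/c/Libraries/boost_1_63_0"
cmake -DBOOST_ROOT="$BOOST_ROOT" .
```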

@koczkatamas
Member

Judging from appveyor config files on the internet (e.g. https://github.com/libgd/libgd/blob/master/appveyor.yml) we may have to install zlib manually.

(It's a bit weird though: a lot of projects use zlib and it doesn't change that much, so you wouldn't have to keep N versions. Maybe it's worth asking the appveyor guys to put it into the base image?)

@GreyCat
Member Author

GreyCat commented Apr 8, 2017

I guess zlib is not that big of an issue (it's very small anyway), and, besides, we might want to test it on Windows with zlib disabled.

@LogicAndTrick
Collaborator

Alright, I've been experimenting with this; my scripts are not very good, but they kind of work. Is this enough to start you off or do you need more info? I'm not really confident with this stuff so I'm probably doing some things wrong:

  • appveyor.yml file - This seems like a better way to manage the AV script, similar to Travis
    • The environment matrix seems very similar to Travis, it creates an isolated job for each language/platform
    • I was too lazy to mess with the tests repo so it's missing a cd tests at the end of the install script
  • run-csharp-dotnet-framework - uses Microsoft's msbuild and csc tools
    • Will need to be moved into the tests repo once you're happy with it
    • Seems that MSYS running in AV doesn't have the msbuild executable on the path, so it's hard-coded to the install location. I think if it was a powershell or cmd script, it would know about those variables. Could always add the directory onto the path, I'm not confident enough with bash to know how to do that properly...
  • run-csharp-mono - uses Mono's xbuild and mcs tools
    • Same problem with the mono tools referred to by path. Could be a problem if Mono is updated on AV, but that hasn't happened in 2 years so it doesn't seem to change very often
  • Example build results
  • Test results are generated in the same place, but in platform subfolders (test_out/csharp_mono/TestResult.xml and test_out/csharp_dotnetframework/TestResult.xml)
    • I assume you will want to add something to publish these results (AppVeyor artifact maybe?)
  • AppVeyor has built-in support for NUnit but I haven't tried it. Not sure if you can get the AV test report and get the xml file without having to run the tests twice (once using AV and once manually to get xml)
  • C++ will run too (but doesn't work right now because of the missing cd tests in the install script)
  • I can't seem to find a standalone csc type tool with dotnet core, so I didn't include it. Surely there's a way to do partial compilation, I will need to investigate more.
  • Currently the mono script will not work on Linux because of the hard-coded path names. Should be fixed if the mono tools are put on the path
  • Mono script reports an error on the truncate command but I think this is because mcs reports both absolute and relative paths - the truncate works on the absolute path and then the error comes from the relative path. (Not sure if this is a big deal or not)

@GreyCat
Member Author

GreyCat commented Apr 9, 2017

@LogicAndTrick Thanks for all that investigation, it will certainly help!

appveyor.yml file - This seems like a better way to manage the AV script, similar to Travis

run-csharp-dotnet-framework - uses Microsoft's msbuild and csc tools
run-csharp-mono - uses Mono's xbuild and mcs tools

Cool :) The only thing probably worth moving to prepare-* scripts is the nuget restore ... stuff, as it is technically initialization, not a test run.

My idea is that run-* scripts should be perfectly usable on normal developers' boxes, not only on CI servers. If it needs any per-installation configuration, we can always do it in something like a config file. "Normal" (i.e. usable by a developer) installation will use one config and CI run will just use another one (for example, to reference specific paths in AV images).
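
For instance, a run-* script could source an optional, untracked config file for per-machine overrides (a sketch; the file name, variable and solution name are hypothetical):

```
#!/bin/sh -e
# Hypothetical sketch: defaults work on a developer box with tools on the PATH;
# a CI-specific ./config can override them with absolute image paths.
[ -f ./config ] && . ./config
MSBUILD="${MSBUILD:-msbuild}"
"$MSBUILD" kaitai_struct_csharp_tests.sln
```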

C++ will run too (but doesn't work right now because of the missing cd tests in the install script)

I actually doubt that. Your version doesn't differ much from what I've launched, and it fails, being unable to find Boost and Boost.Test in CMake setup.

I assume you will want to add something to publish these results (AppVeyor artifact maybe?)

Yeah, it's the common next step for all CIs (both Travis and AppVeyor). I was thinking of two obvious choices:

  • publish it into yet another GitHub repo
  • publish it somewhere at BinTray

Then yet another Travis job should trigger, pick up these artifacts and aggregate them to update the CI page. Both these choices are actually pretty messy :( Registering yet another dozen repositories just for the sake of storing test results feels a lot like abuse of GitHub (and it's tons of work too). BinTray uses an extremely complex API, both for publishing and retrieving files, which is a major turnoff for me.

Any other ideas?

@LogicAndTrick
Collaborator

Could you use one repo with a branch for each target? A little messy, but it means you don't have to have a separate repo for each language.
As for the reference paths, maybe some environment variables? e.g. MONO_INSTALL_DIRECTORY or something?

@GreyCat
Member Author

GreyCat commented Apr 10, 2017

I've tried doing a Bintray upload, and, after some experimenting, I'm tempted to say that it's mostly useless for these purposes: https://travis-ci.org/kaitai-io/ci_targets/jobs/220439314

  • It's very slow when uploading many files. Every file upload literally takes several seconds (and probably takes a heavy toll on API usage). Uploading multi-file testing artifacts, like Java's, for example, takes forever.
  • It is very inconvenient to download them all back for further usage. Again, it's not really good at handling multiple files, etc.
  • It has problems with files with spaces and that kind of minor stuff.

Could you use one repo with a branch for each target?

Yeah, I think that should work! I'll try it next.

As for the reference paths, maybe some environment variables?

Yeah, exactly :) Basically, that's what these config files are doing.

@LogicAndTrick
Collaborator

I guess in this case these are variables that could change depending on the user's setup. Is that still okay to put in the config file, and expect the user to modify it if they need to? Right now the config variables are well-known (relative) paths, so they don't ever need to be changed.

I was thinking something like this (pseudocode):

# User modifies these if they want
MONO_INSTALL_LOCATION="/c/Program Files (x86)/Mono/bin"
MSBUILD_INSTALL_LOCATION="/c/Program Files (x86)/MSBuild/14.0/Bin"

export PATH="$PATH:$MONO_INSTALL_LOCATION:$MSBUILD_INSTALL_LOCATION"
# Scripts reference xbuild/msbuild/etc. from the path
# If they are on the path already they'll "just work" even if the install locations are different from above

It feels a bit flimsy, but is there another way to do it? I don't think AppVeyor environment variables (Windows) will flow through to the MSYS environment so I'm not sure if there's a way to do it via the AV config.

@GreyCat
Member Author

GreyCat commented Jan 28, 2018

I've taken another stab at this issue, and I found out that, actually, there's a whole world of different CIs out there which support modular workflows/pipelines.

We have about a dozen or so repositories, and they all should be built, tested and deployed in a complex manner. This implies an interesting difference: it would be highly beneficial for us to have the CI configuration not stored in a repository (along with the code), akin to a .travis.yml file, but instead set up externally.

When orchestrating a complex flow/pipeline, there are a couple of key questions:

  • Is it possible to do flow with multiple repositories?
  • Is it possible to do complex flow with steps like a -> {b c d} -> e (usually CI guys call it "fanning in" / "fanning out")?
  • Is it possible to run the flow partially, triggering relevant parts of it from a commit into one of the affected repositories (i.e. fix C# runtime → rerun C# tests only → update the CI summary)?
  • How does a flow step pass data to other steps:
    • build artifacts
    • and some signal to trigger further steps
  • Is it possible to host artifacts inside the CI system, or should we publish them to some external service (e.g. GitHub, npm, etc.)?
  • Would it be possible to process pull requests / do test builds somehow? Can we isolate deployment secrets enough?

I'm currently checking out:

Self-hosted:

Things I've checked out that probably do not satisfy the criteria outlined above:

  • CircleCI — looks promising, but seems to be centered on a "one repository" model :(
  • CodeFresh — one config per repo, very basic flows
  • SemaphoreCI — one config per repo, very basic flows (parallelism of tests)
  • GitLab CI — again, repo-centered, and offers basic flow features
  • Buildkite — repo-centered, novel "bring your own worker node" idea, but probably not the best fit for us

Ideally, I'd still like to stick to hosted infrastructure that someone else would support. But, if all else fails, I'm probably ok with hosting our own CI at some sort of generic server(s).

@arekbulski
Member

There is a drawback: you need to build the compiler and example schemas on each CI server instead of once. So each build gets shorter, but they add up to more in sum total.

@GreyCat
Member Author

GreyCat commented Jan 28, 2018

Um, you've commented on some sort of earlier plans?

@GreyCat GreyCat added this to the v0.9 milestone Feb 2, 2018
@GreyCat
Member Author

GreyCat commented Nov 8, 2018

Ok, returning to this, this time trying to complete it.

What's done already

  • First step(s) that build the compiler and use it to compile test .ksy files → target language code
  • Second step is the ci_targets repo, which gets a new set of sources in target languages and kicks off compilation and validation for many different languages in parallel; on completion, they report their results to ci_artifacts.
  • Third step is the ci_artifacts repo, which has many different branches (so they can be updated in parallel, independently of each other); a sketch of the reporting step follows this list.
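
Roughly, the reporting step of each ci_targets job boils down to something like this (a sketch; the branch layout and auth handling are simplified):

```
#!/bin/sh -e
# Sketch: publish one target's test results to its own branch of ci_artifacts,
# so all targets can report in parallel without stepping on each other.
TARGET=ruby/2.3
git clone -b "$TARGET" --depth 1 \
    "https://$GITHUB_TOKEN@github.com/kaitai-io/ci_artifacts.git"
mkdir -p "ci_artifacts/test_out/$TARGET"
cp -r "test_out/$TARGET/." "ci_artifacts/test_out/$TARGET/"
cd ci_artifacts
git add -A
git -c user.name=ci-bot -c user.email=ci@example.org \
    commit -m "Test results for $TARGET"
git push origin "$TARGET"
```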

What's left to be done

Can anyone lend a hand with HTML+JS (Vue, jQuery, whatever you prefer) here?

@GreyCat
Member Author

GreyCat commented Dec 26, 2018

Ok, a very rough new CI page aggregating everything is implemented at http://kaitai.io/ci/ci.html — and we already support quite a few new & old combinations. Please take a look and tell me what you think of it.

Obviously, missing stuff is:

  • proper success percentage calculation
  • ability to filter columns, as it is quite obvious that there are more columns than a typical screen would allow us to fit — for now, one can open a JS Console (F12) and enter something like this to select columns:
app.__vue__.gridColumns = ["name", "cpp_stl_11/gcc4.8_linux", "cpp_stl_11/clang7.3_osx", "cpp_stl_11/clang3.5_linux", "ruby/2.3"]

If anyone proficient in (or willing to learn) Vue wants to help, I'd be most grateful ;)

@GreyCat
Member Author

GreyCat commented Dec 28, 2018

Language conversion to the new CI, a checklist for me to track:

  • construct2
  • construct3
  • csharp
  • cpp_stl
  • go
  • java
  • javascript
  • lua
  • perl
  • php
  • python2
  • python3
  • ruby
  • rust

@GreyCat
Member Author

GreyCat commented Mar 10, 2019

csharp/mono5.18.0 and lua/5.3 were ported to the new CI system this morning. This also paves the way for more well-rounded testing of C# on other systems (i.e. on Windows: .NET Core, .NET Standard, regular .NET, etc.).

Unfortunately, we'll be most likely dropping Construct support eventually, as the project itself seems to be abandoned :(

Need to double-check what's going on with go, and, phew, this looks like it's almost done.

@GreyCat
Member Author

GreyCat commented Mar 10, 2019

Ok, go has successfully joined the company. While it's still cosmetically clumsy, I guess we can consider this task done.

@GreyCat GreyCat closed this as completed Mar 10, 2019
generalmimon added a commit to kaitai-io/kaitai_struct_tests that referenced this issue Mar 18, 2024
This drops support for Boost 1.62, but I don't see much value in
supporting this broken version specifically, given that none of the
`cpp_stl` targets use it at the moment. As of March 2024, we're using
these versions:

* Boost 1.54.0 - in `{clang3.4,gcc4.8}-linux-x86_64` targets, since they
  both run on Ubuntu 14.04 (see
  https://github.com/kaitai-io/kaitai_struct_docker_images/blob/ef0ad6e3/src/cpp_stl/gcc4.8-linux-x86_64/Dockerfile#L1),
  which comes with this version of Boost:
  https://launchpad.net/ubuntu/trusty/+source/boost-defaults
* Boost 1.71.0 - in `msvc141-windows-x64`
  (https://ci.appveyor.com/project/kaitai-io/ci-targets/builds/49319128/job/282j67vaxhklm2o4?fullLog=true#L120)
* Boost 1.74.0 - in `{clang11,gcc11}-linux-x86_64`, since they both run
  on Ubuntu 22.04 (see
  https://github.com/kaitai-io/kaitai_struct_docker_images/blob/ef0ad6e3/src/cpp_stl/gcc11-linux-x86_64/Dockerfile#L1),
  which comes with this version of Boost:
  https://launchpad.net/ubuntu/jammy/+source/boost-defaults
* Boost 1.84.0 - in `clang14-macos-x86_64`

On top of that, the documented workaround for the bug (`--log_sink`
being broken in Boost v1.62) in the code inadvertently changed the log
format to JUnit (`--logger=JUNIT,...`), which is apparently why
additional code had to be added in 5d2125c to support the JUnit test
result format in addition to the Boost-specific XML format.

This part is definitely unnecessary: there's no reason why we should
support two log formats; we should choose one. Even in Boost 1.62 the
XML format was definitely available as well, though it would have to be
specified as `--logger=XML,...` instead of `--log_format=XML` (see
https://web.archive.org/web/20221205203459/https://svn.boost.org/trac10/ticket/12507).
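
For example, with Boost 1.62+ the Boost-native XML report could have been
requested like this (the test binary name is just an example):

```
# Boost.Test 1.62+ --logger syntax: <format>,<log level>,<sink file>
./cpp_stl_tests --logger=XML,all,test_out/cpp_stl_11/results.xml
```
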
It looks like the reason the workaround uses JUnit is that the
`--logger=JUNIT,...` option was copied from the [Stack Overflow
answer](https://stackoverflow.com/a/39999085/487064) mentioned in the
code comment without thinking.

Fun fact: apparently, there hasn't been a single run using Boost 1.62
since December 2018, because there isn't any occurrence of `<testcase`
(typical for JUnit format) in the
https://github.com/kaitai-io/ci_artifacts repo under
`test_out/cpp_stl{_98,_11}/`. You can check this by cloning
https://github.com/kaitai-io/ci_artifacts locally and running this
command:

```
git log --stat -G '<testcase' --all -- test_out/cpp_stl{_98,_11}/
```

This yields no results, meaning that this workaround predates the
migration of C++ to the "new" modular CI at ci.kaitai.io
(kaitai-io/kaitai_struct#62) and has never
been used since then.