Test suite failures #2

Open
pipcet opened this issue Mar 9, 2017 · 5 comments

@pipcet
Owner

pipcet commented Mar 9, 2017

make check output is currently:

# of expected passes		41
# of unexpected failures	4
# of unsupported tests		3

with the following failures:

FAIL: nm --size-sort
FAIL: strip with saving a symbol
FAIL: simple objcopy of executable
FAIL: strip executable with saving a symbol

These need fixing.

@pipcet
Owner Author

pipcet commented Mar 9, 2017

Now that I've made DejaGnu recognize our targets as ELF, the output is:

# of expected passes		135
# of unexpected failures	10
# of unsupported tests		7

with these failures:

FAIL: strip with saving a symbol
FAIL: simple objcopy of executable
FAIL: strip executable with saving a symbol
FAIL: merge notes section (32-bits)
FAIL: unordered .debug_info references to .debug_ranges
FAIL: objdump -WL
FAIL: readelf -r
FAIL: readelf -wi: missing: .*DW_TAG_compile_unit.*
FAIL: readelf --debug-dump=loc
FAIL: readelf -wiaoRlL

@pipcet
Owner Author

pipcet commented Mar 11, 2017

Okay, I finally discovered there are separate test suites for binutils, gas, and ld.

The gas testsuite passes now, except for some expected and harmless failures.

The binutils testsuite has a single failure, the "simple objcopy" test (see #4).

The ld testsuite has numerous failures, many related to two things:

  • we don't have a .text section, and we can't have one either: .text is expected to be both the section where code goes and the section whose offsets describe function entry points. Those are two separate sections for us.
  • dynamic libraries need to be turned into WebAssembly modules before the test case can find them.

The current stats are:

		=== ld Summary ===

# of expected passes		506
# of unexpected failures	110
# of expected failures		4
# of unresolved testcases	3
# of unsupported tests		108

@pipcet
Owner Author

pipcet commented Mar 11, 2017

ld test failures, with comments:

FAIL: Handle no DWARF information [.text]
FAIL: Run with libdwarf1.so first [broken -feliminate-dwarf2-dups in gcc?]
FAIL: -Bsymbolic-functions [discarding .text section]
FAIL: Build pr20995-2.so [discarding .text sections]
FAIL: pr20995 [discarding .text section]
FAIL: pr20995-2 [discarding .text section]
FAIL: ld-elf/eh-frame-hdr [discarding .text section]
FAIL: ld-elf/eh5 [discarding .text section]
FAIL: ld-elf/empty [no entry point]
FAIL: ld-elf/empty2 [no entry point]
FAIL: ld-elf/endsym [investigate, looks harmless]
FAIL: ld-elf/group8a [no gc-sections]
FAIL: ld-elf/group8b [no gc-sections]
FAIL: ld-elf/group9a [no gc-sections]
FAIL: ld-elf/group9b [no gc-sections]
FAIL: ld-elf/linkonce1 [investigate, looks like a bug]
FAIL: ld-elf/linkonce2 [ditto]
FAIL: ld-elf/merge [looks like overalignment]
FAIL: ld-elf/orphan3 [discarding .foo section]
FAIL: ld-elf/pr12851 [no gc-sections]
FAIL: ld-elf/pr14156a [discarding .init section]
FAIL: ld-elf/pr14156b [discarding .fini section]
FAIL: ld-elf/pr14926 [discarding .text section]
FAIL: --gc-sections on tls variable [no gc-sections]
FAIL: PR ld/13195 [no gc-sections]
FAIL: readelf version information [segfault! investigate!]
FAIL: PR ld/20828 dynamic symbols with section GC (auxiliary shared library) [segfault!]
FAIL: PR ld/20828 dynamic symbols with section GC (plain) [segfault?]
FAIL: PR ld/20828 dynamic symbols with section GC (version script) [segfault?]
FAIL: PR ld/20828 dynamic symbols with section GC (versioned shared library) [segfault!]
FAIL: PR ld/20828 dynamic symbols with section GC (versioned) [segfault?]
FAIL: PR ld/20828 forcibly exported symbol version without section GC [segfault!]
FAIL: PR ld/20828 forcibly exported symbol version with section GC [segfault!]
FAIL: Build rdynamic-1 [-rdynamic unrecognized by gcc]
FAIL: Build dynamic-1 [no gc-sections]
FAIL: Build libpr19073.so [discarding .text section]
FAIL: Run with libdata1.so [libdata1.so failed to build]
FAIL: Check --gc-section [no gc-sections]
FAIL: Check --gc-section/-q [no gc-sections]
FAIL: Check --gc-section/-r/-e [no gc-sections]
FAIL: Check --gc-section/-r/-u [no gc-sections]
FAIL: --gc-sections with multiple debug sections for a function section [no gc-sections]
FAIL: --gc-sections with __gxx_personality [no gc-sections]
FAIL: --gc-sections with .text._init [no gc-sections]
FAIL: --gc-sections with --defsym [no gc-sections]
FAIL: --gc-sections with KEEP [no gc-sections]
FAIL: --gc-sections with __start_SECTIONNAME [no gc-sections]
FAIL: LTO 6 [no _etext]
FAIL: PR ld/12760 [bug: .wasm.payload.code doesn't translate to FILE:LINE]
FAIL: PR ld/13229 [investigate]
FAIL: PR ld/13244 [investigate: hidden symbols broken?]
FAIL: LTO 11 [investigate: "main" doesn't pull in liblto-11.a]
FAIL: Run pr20276 [investigate. symbol size changed. overaligned?]
FAIL: Run pr20267a [investigate. symbol size changed.]
FAIL: Run pr20267b [investigate. symbol size changed.]
FAIL: PR ld/12942 (1) [investigate: reference to link_error()]
FAIL: LTO 8 [investigate. hidden symbols broken?]
FAIL: Build pr20103a [investigate. "main" doesn't pull in library]
FAIL: Build pr20103b [investigate. "main" doesn't pull in library]
FAIL: Build pr20103c [investigate. "main" doesn't pull in library]
FAIL: PR ld/20103 (-O2 -flto tmpdir/thinpr20103a.a tmpdir/thinpr20103b.a tmpdir/thinpr20103c.a) (1) [investigate. "main" doesn't pull in library]
FAIL: PR ld/20103 (-O2 -flto tmpdir/fatpr20103a.a tmpdir/fatpr20103b.a tmpdir/fatpr20103c.a) (1) [investigate. "main" doesn't pull in library]
FAIL: PR ld/20103 (-O2 tmpdir/fatpr20103a.a tmpdir/fatpr20103b.a tmpdir/fatpr20103c.a) (1) [investigate. "main" doesn't pull in library]
FAIL: Build pr20103d [investigate. "main" doesn't pull in library]
FAIL: PR ld/20103 (-O2 tmpdir/thinpr20103a.a tmpdir/thinpr20103b.a tmpdir/thinpr20103c.a) (1) [investigate. "main" doesn't pull in library]
FAIL: plugin claimfile lost symbol [investigate. _main not defined]
FAIL: plugin claimfile replace symbol [investigate. _main not defined]
FAIL: plugin claimfile resolve symbol [investigate. _main not defined]
FAIL: plugin claimfile replace file [investigate. _main not defined]
FAIL: load plugin with source [investigate. _main not defined]
FAIL: plugin claimfile lost symbol with source [investigate. _main not defined]
FAIL: plugin claimfile replace symbol with source [investigate. _main not defined]
FAIL: plugin claimfile resolve symbol with source [investigate. _main not defined]
FAIL: plugin claimfile replace file with source [investigate. _main not defined]
FAIL: plugin error [investigate. _main not defined]
FAIL: plugin warning [investigate. _main not defined]
FAIL: plugin set symbol visibility [investigate. _main not defined]
FAIL: plugin set symbol visibility with source [investigate. _main not defined]
FAIL: plugin ignore lib [investigate. _main not defined]
FAIL: plugin claimfile replace lib [investigate. _main not defined]
FAIL: plugin ignore lib with source [investigate. _main not defined]
FAIL: plugin claimfile replace lib with source [investigate. _main not defined]
FAIL: plugin with empty archive [investigate. _main not defined]
FAIL: plugin 2 with source lib [investigate. _main not defined]
FAIL: load plugin 2 with source [investigate. _main not defined]
FAIL: load plugin 2 with source and -r [investigate. _main not defined]
FAIL: plugin 3 with source lib [investigate. _main not defined]
FAIL: load plugin 3 with source [investigate. _main not defined]
FAIL: load plugin 3 with source and -r [investigate. _main not defined]
FAIL: PR ld/20070 [investigate. _main not defined]
FAIL: NOCROSSREFS 1 [investigate. segfault rather than error message!]
FAIL: NOCROSSREFS 2 [???]
FAIL: NOCROSSREFS 3 [investigate. segfault!]
FAIL: NOCROSSREFS_TO 1 [investigate]
FAIL: NOCROSSREFS_TO 2 [investigate]
FAIL: NOCROSSREFS_TO 4 [???]
FAIL: Preserve default . = 0 [gc-sections]
FAIL: Preserve explicit . = 0 [gc-sections]
FAIL: selective1 [gc-sections]
FAIL: selective2 [gc-sections]
FAIL: selective3 [gc-sections]
FAIL: --entry foo archive [no entry]
FAIL: --entry foo -u foo archive [no entry]
FAIL: --entry foo [no entry]
FAIL: --entry foo -u foo [no entry]
FAIL: Check require-defined can require a symbol from an object [gc-section]
FAIL: Check require-defined can require a symbol from an archive [no entry?]
FAIL: Check require-defined can require two symbols [no gc-sections]
FAIL: undefined function [investigate]
FAIL: undefined line [investigate]

@pipcet
Owner Author

pipcet commented Mar 11, 2017

Progress:

		=== ld Summary ===

# of expected passes		555
# of unexpected failures	55
# of expected failures		10
# of unresolved testcases	3
# of unsupported tests		108

pipcet pushed a commit that referenced this issue Mar 14, 2017
Commit d7e7473 ("Eliminate make_cleanup_ui_file_delete / make
ui_file a class hierarchy") introduced a problem when using "layout
regs", that leads gdb to crash when issuing:

./gdb ./a.out -ex 'layout regs' -ex start

From the backtrace, it's caused by this 'delete' on tui_restore_gdbout():

 (gdb) bt
 #0  0x00007ffff6b962b2 in free () from /lib64/libc.so.6
 #1  0x000000000059fa47 in tui_restore_gdbout (ui=0x22997b0) at ../../gdb/tui/tui-regs.c:714
 #2  0x0000000000619996 in do_my_cleanups (pmy_chain=pmy_chain@entry=0x1e08320 <cleanup_chain>, old_chain=old_chain@entry=0x235b4b0) at ../../gdb/common/cleanups.c:154
 #3  0x0000000000619b1d in do_cleanups (old_chain=old_chain@entry=0x235b4b0) at ../../gdb/common/cleanups.c:176
 #4  0x000000000059fb0d in tui_register_format (frame=frame@entry=0x22564e0, regnum=regnum@entry=0) at ../../gdb/tui/tui-regs.c:747
 #5  0x000000000059ffeb in tui_get_register (data=0x2434d18, changedp=0x0, regnum=0, frame=0x22564e0) at ../../gdb/tui/tui-regs.c:768
 #6  tui_show_register_group (refresh_values_only=<optimized out>, frame=0x22564e0, group=0x1e09250 <general_group>) at ../../gdb/tui/tui-regs.c:287
 #7  tui_show_registers (group=0x1e09250 <general_group>) at ../../gdb/tui/tui-regs.c:156
 #8  0x00000000005a07cf in tui_check_register_values (frame=frame@entry=0x22564e0) at ../../gdb/tui/tui-regs.c:496
 #9  0x00000000005a3e65 in tui_check_data_values (frame=frame@entry=0x22564e0) at ../../gdb/tui/tui-windata.c:232
 #10 0x000000000059cf65 in tui_refresh_frame_and_register_information (registers_too_p=1) at ../../gdb/tui/tui-hooks.c:156
 #11 0x00000000006d5c05 in generic_observer_notify (args=0x7fffffffdbe0, subject=<optimized out>) at ../../gdb/observer.c:167
 #12 observer_notify_normal_stop (bs=<optimized out>, print_frame=print_frame@entry=1) at ./observer.inc:61
 #13 0x00000000006a6409 in normal_stop () at ../../gdb/infrun.c:8364
 #14 0x00000000006af8f5 in fetch_inferior_event (client_data=<optimized out>) at ../../gdb/infrun.c:3990
 #15 0x000000000066f0fd in gdb_wait_for_event (block=block@entry=0) at ../../gdb/event-loop.c:859
 #16 0x000000000066f237 in gdb_do_one_event () at ../../gdb/event-loop.c:322
 #17 0x000000000066f386 in gdb_do_one_event () at ../../gdb/event-loop.c:353
 #18 0x00000000007411bc in wait_sync_command_done () at ../../gdb/top.c:570
 #19 0x0000000000741426 in maybe_wait_sync_command_done (was_sync=0) at ../../gdb/top.c:587
 #20 execute_command (p=<optimized out>, p@entry=0x7fffffffe43a "start", from_tty=from_tty@entry=1) at ../../gdb/top.c:676
 #21 0x00000000006c2048 in catch_command_errors (command=0x741200 <execute_command(char*, int)>, arg=0x7fffffffe43a "start", from_tty=1) at ../../gdb/main.c:376
 #22 0x00000000006c2b60 in captured_main_1 (context=0x7fffffffde70) at ../../gdb/main.c:1119
 #23 captured_main (data=0x7fffffffde70) at ../../gdb/main.c:1140
 #24 gdb_main (args=args@entry=0x7fffffffdf90) at ../../gdb/main.c:1158
 #25 0x0000000000408cf5 in main (argc=<optimized out>, argv=<optimized out>) at ../../gdb/gdb.c:32
 (gdb) f 1
 #1  0x000000000059fa47 in tui_restore_gdbout (ui=0x22997b0) at ../../gdb/tui/tui-regs.c:714
 714	  delete gdb_stdout;

The problem is simply that the commit mentioned above made the ui_file
that gdb_stdout is temporarily set to be a stack-allocated
string_file, while before it used to be a heap-allocated ui_file.  The
fix is simply to remove the now-incorrect delete.
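
The bug class described above can be illustrated with a minimal C++ sketch. This is not GDB's code: current_out, default_out and format_value are hypothetical stand-ins for gdb_stdout and the TUI register-formatting path, and std::ostringstream plays the role of string_file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for gdb_stdout: a global pointer to the current
// output stream.  The pointer does not own what it points to.
static std::ostringstream default_out;
static std::ostream *current_out = &default_out;

// Format a value into a string by temporarily redirecting current_out
// to a stack-allocated stream, then restoring the previous stream.
std::string
format_value (int value)
{
  std::ostringstream capture;   // stack-allocated, like string_file
  std::ostream *saved = current_out;
  current_out = &capture;

  *current_out << "r0=" << value;

  // Restore only.  Calling "delete current_out" here would be undefined
  // behavior, because "capture" was never allocated with new.
  current_out = saved;
  assert (current_out != &capture);
  return capture.str ();
}
```

The sketch ignores exception safety; a scoped restore object would be the more robust choice if the formatting step could throw.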

New test included, which exercises enabling all TUI layouts, with and
without execution.  (This particular crash only triggers with
execution.)

gdb/ChangeLog:
2017-03-07  Pedro Alves  <[email protected]>

	* tui/tui-regs.c (tui_restore_gdbout): Don't delete gdb_stdout.

gdb/testsuite/ChangeLog:
2017-03-07  Pedro Alves  <[email protected]>

	* gdb.base/tui-layout.c: New file.
	* gdb.base/tui-layout.exp: New file.
@pipcet
Owner Author

pipcet commented Mar 14, 2017

Okay, passing all tests now, though this required some massaging of test conditions and some hacks for dynrelro support...which doesn't even make sense as long as there is no read-only memory in wasm.

pipcet pushed a commit that referenced this issue Apr 30, 2017
I built GDB with asan and ran the test cases hook-stop.exp and threadapply.exp,
and got the following asan errors:

=================================================================
==2291==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000999c4 at pc 0x000000826022 bp 0x7ffd28a8ff70 sp 0x7ffd28a8ff60
READ of size 4 at 0x6160000999c4 thread T0
    #0 0x826021 in release_stop_context_cleanup ../../binutils-gdb/gdb/infrun.c:8203
    #1 0x72798a in do_my_cleanups ../../binutils-gdb/gdb/common/cleanups.c:154
    #2 0x727a32 in do_cleanups(cleanup*) ../../binutils-gdb/gdb/common/cleanups.c:176
    #3 0x826895 in normal_stop() ../../binutils-gdb/gdb/infrun.c:8381
    #4 0x815208 in fetch_inferior_event(void*) ../../binutils-gdb/gdb/infrun.c:4011
    #5 0x868aca in inferior_event_handler(inferior_event_type, void*) ../../binutils-gdb/gdb/inf-loop.c:44
....
0x6160000999c4 is located 68 bytes inside of 568-byte region [0x616000099980,0x616000099bb8)
freed by thread T0 here:
    #0 0x7fb0bc1312ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
    #1 0xb8c62f in xfree(void*) ../../binutils-gdb/gdb/common/common-utils.c:100
    #2 0x83df67 in free_thread ../../binutils-gdb/gdb/thread.c:207
    #3 0x83dfd2 in init_thread_list() ../../binutils-gdb/gdb/thread.c:223
    #4 0x805494 in kill_command ../../binutils-gdb/gdb/infcmd.c:2595
....

Detaching from program: /home/yao.qi/SourceCode/gnu/build-with-asan/gdb/testsuite/outputs/gdb.threads/threadapply/threadapply, process 2399
=================================================================
==2387==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000a98c0 at pc 0x00000083fd28 bp 0x7ffd401c3110 sp 0x7ffd401c3100
READ of size 4 at 0x6160000a98c0 thread T0
    #0 0x83fd27 in thread_alive ../../binutils-gdb/gdb/thread.c:741
    #1 0x844277 in thread_apply_all_command ../../binutils-gdb/gdb/thread.c:1804
....

0x6160000a98c0 is located 64 bytes inside of 568-byte region [0x6160000a9880,0x6160000a9ab8)
freed by thread T0 here:
    #0 0x7f59a7e322ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
    #1 0xb8c62f in xfree(void*) ../../binutils-gdb/gdb/common/common-utils.c:100
    #2 0x83df67 in free_thread ../../binutils-gdb/gdb/thread.c:207
    #3 0x83dfd2 in init_thread_list() ../../binutils-gdb/gdb/thread.c:223

This patch fixes the issue by deleting the thread_info object only if it
is deletable; otherwise, it is marked as exited (by set_thread_exited).
The function set_thread_exited is factored out of delete_thread_1.  This
patch also makes the "refcount" field private and adds the methods incref
and decref.  Additionally, we stop using "ptid_t" in
"struct current_thread_cleanup" to reference threads; instead we use
"thread_info" directly.  Due to this change, we don't need
restore_current_thread_ptid_changed anymore.
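
The "delete if deletable, otherwise mark as exited" rule can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not GDB's thread.c: thread_sketch and clear_thread_list are hypothetical names, and the real thread_info carries far more state.

```cpp
#include <cassert>
#include <list>

// Hypothetical, simplified thread object; not GDB's real thread_info.
class thread_sketch
{
public:
  void incref () { ++m_refcount; }

  void decref ()
  {
    assert (m_refcount > 0);
    --m_refcount;
  }

  // Safe to delete only when nothing else still references the object.
  bool deletable () const { return m_refcount == 0; }

  bool exited = false;

private:
  int m_refcount = 0;
};

// Tear down a thread list: free unreferenced threads, and only mark
// referenced ones as exited so that outstanding pointers stay valid.
// Returns the number of objects actually deleted.
int
clear_thread_list (std::list<thread_sketch *> &threads)
{
  int deleted = 0;
  for (thread_sketch *tp : threads)
    {
      if (tp->deletable ())
        {
          delete tp;
          ++deleted;
        }
      else
        tp->exited = true;
    }
  threads.clear ();
  return deleted;
}
```

In this sketch, whoever holds the remaining reference is responsible for calling decref and freeing the object once it becomes deletable.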

gdb:

2017-04-10  Yao Qi  <[email protected]>

	PR gdb/19942
	* gdbthread.h (thread_info::deletable): New method.
	(thread_info::incref): New method.
	(thread_info::decref): New method.
	(thread_info::refcount): Move it to private.
	* infrun.c (save_stop_context): Call inc_refcount.
	(release_stop_context_cleanup): Likewise.
	* thread.c (set_thread_exited): New function.
	(init_thread_list): Delete "tp" only if it is deletable, otherwise
	call set_thread_exited.
	(delete_thread_1): Call set_thread_exited.
	(current_thread_cleanup) <inferior_pid>: Remove.
	<thread>: New field.
	(restore_current_thread_ptid_changed): Removed.
	(do_restore_current_thread_cleanup): Adjust.
	(restore_current_thread_cleanup_dtor): Don't call
	find_thread_ptid.
	(set_thread_refcount): Use dec_refcount.
	(make_cleanup_restore_current_thread): Adjust.
	(thread_apply_all_command): Call inc_refcount.
	(_initialize_thread): Don't call
	observer_attach_thread_ptid_changed.
pipcet pushed a commit that referenced this issue Apr 30, 2017
Change

  if (PIC)
    {
      #1
    }
  else
    {
      #2
      if (VxWorks)
        {
          #3
        }
    }
  #4
  if (VxWorks && !PIC)
    {
      #5
    }

to

  #4
  if (PIC)
    {
      #1
    }
  else
    {
      #2
      if (VxWorks)
        {
          #3
          #5
        }
    }

	* elf32-i386.c (elf_i386_finish_dynamic_sections): Simplify
	VxWorks for non-PIC.
pipcet pushed a commit that referenced this issue May 31, 2017
This patch adds one unit test for the gdbarch methods register_to_value and
value_to_register.  The test passes different combinations of {regnum, type}
to gdbarch_register_to_value and gdbarch_value_to_register.  In order
to do the test, add a new function create_new_frame to create a fake
frame.  It can be improved once frame_info is converted to a class.

In order to isolate regcache (from target_ops operations on writing
registers, like target_store_registers), the sub-class of regcache in the
test overrides raw_write.  Also, in order to get the right regcache from
get_thread_arch_aspace_regcache, the sub-class of regcache inserts itself
into current_regcache.

Suppose I incorrectly modify the size of the buffer as below,

@@ -1228,7 +1228,7 @@ ia64_register_to_value (struct frame_info *frame, int regnum,
                        int *optimizedp, int *unavailablep)
 {
   struct gdbarch *gdbarch = get_frame_arch (frame);
-  gdb_byte in[MAX_REGISTER_SIZE];
+  gdb_byte in[1];

   /* Convert to TYPE.  */
   if (!get_frame_register_bytes (frame, regnum, 0,

build GDB with "-fsanitize=address", and run unittest.exp; asan then detects
the error:

==2302==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff98193870 at pc 0xbd55ea bp 0x7fff981935a0 sp 0x7fff98193598
WRITE of size 16 at 0x7fff98193870 thread T0
    #0 0xbd55e9 in frame_register_unwind(frame_info*, int, int*, int*, lval_type*, unsigned long*, int*, unsigned char*) /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1119
    #1 0xbd58c8 in frame_register(frame_info*, int, int*, int*, lval_type*, unsigned long*, int*, unsigned char*) /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1147
    #2 0xbd6e25 in get_frame_register_bytes(frame_info*, int, unsigned long, int, unsigned char*, int*, int*) /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1427
    #3 0x70080a in ia64_register_to_value /home/yao/SourceCode/gnu/gdb/git/gdb/ia64-tdep.c:1236
    #4 0xbf570e in gdbarch_register_to_value(gdbarch*, frame_info*, int, type*, unsigned char*, int*, int*) /home/yao/SourceCode/gnu/gdb/git/gdb/gdbarch.c:2619
    #5 0xc05975 in register_to_value_test /home/yao/SourceCode/gnu/gdb/git/gdb/gdbarch-selftests.c:131

Or, even if GDB is not built with asan, GDB just crashes.

*** stack smashing detected ***: ./gdb terminated
Aborted (core dumped)

gdb:

2017-05-24  Yao Qi  <[email protected]>

	* Makefile.in (SFILES): Add gdbarch-selftests.c.
	(COMMON_OBS): Add gdbarch-selftests.o.
	* frame.c [GDB_SELF_TESTS] (create_new_frame): New function.
	* frame.h [GDB_SELF_TESTS] (create_new_frame): Declare.
	* gdbarch-selftests.c: New file.
	* regcache.h (regcache) <~regcache>: Mark it virtual if
	GDB_SELF_TEST.
	<raw_write>: Likewise.
pipcet pushed a commit that referenced this issue Aug 27, 2017
In some cases we've been replacing heap-allocated gdb_byte buffers
managed with xmalloc/make_cleanup(xfree) with gdb::vector<gdb_byte>.
That usually pessimizes the code a little bit because std::vector
value-initializes elements (which for gdb_byte means
zero-initialization), while if you're creating a temporary buffer,
you're most certainly going to fill it in with some data.  An
alternative is to use

  unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);

but it looks like that's not very popular.

Recently, a use of obstacks in dwarf2read.c was replaced with
std::vector<gdb_byte> and that as well introduced a pessimization for
always memsetting the buffer when it's guaranteed that the zeros will
be overwritten immediately.  (see dwarf2read.c change in this patch to
find it.)

So here's a different take at addressing this issue "by design":

#1 - Introduce default_init_allocator<T>

I.e., a custom allocator that does default construction using default
initialization, meaning, no more zero initialization.  That's the
default_init_allocator<T> class added in this patch.

See "Notes" at
<http://en.cppreference.com/w/cpp/container/vector/resize>.
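
For readers unfamiliar with the idiom, here is a minimal sketch of such an allocator, modeled on the well-known adaptor pattern from the cppreference notes. It is not the code added by this patch: default_init_alloc_sketch and byte_vector_sketch are hypothetical names, and unsigned char stands in for gdb_byte.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Sketch of an allocator adaptor that default-initializes elements
// instead of value-initializing them.
template<typename T, typename A = std::allocator<T>>
class default_init_alloc_sketch : public A
{
  using traits = std::allocator_traits<A>;

public:
  template<typename U>
  struct rebind
  {
    using other = default_init_alloc_sketch<
      U, typename traits::template rebind_alloc<U>>;
  };

  using A::A;

  // No-argument construction: default-initialize, which performs no
  // zeroing for trivial types such as unsigned char.
  template<typename U>
  void construct (U *ptr)
  {
    ::new (static_cast<void *> (ptr)) U;
  }

  // Construction with arguments: forward to the underlying allocator.
  template<typename U, typename... Args>
  void construct (U *ptr, Args &&... args)
  {
    traits::construct (static_cast<A &> (*this), ptr,
                       std::forward<Args> (args)...);
  }
};

// Hypothetical counterpart of the byte_vector typedef.
using byte_vector_sketch
  = std::vector<unsigned char, default_init_alloc_sketch<unsigned char>>;
```

With this adaptor, "byte_vector_sketch buf (n)" has n elements whose values are indeterminate until written, so the caller is expected to fill the buffer before reading from it.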

#2 - Introduce def_vector<T>

I.e., a convenience typedef, because typing the allocator is annoying:

  template<typename T>
  using def_vector = std::vector<T, gdb::default_init_allocator<T>>;

#3 - Introduce byte_vector

Because gdb_byte vectors will be the common thing, add a convenience
"byte_vector" typedef:

  using byte_vector = def_vector<gdb_byte>;

which is really the same as:

  std::vector<gdb_byte, gdb::default_init_allocator<gdb_byte>>;

The intent then is to make "gdb::byte_vector" be the go-to for dynamic
byte buffers.  So the less friction, the better.

#4 - Adjust current code to use it.

To set the example going forward.  Replace std::vector uses and also
unique_ptr<gdb_byte[]> uses.

One nice thing is that with this allocator, for changes like these:

  -std::unique_ptr<gdb_byte[]> buf (new gdb_byte[some_size]);
  +gdb::byte_vector buf (some_size);
   fill_with_data (buf.data (), buf.size ());

the generated code is the same as before.  I.e., the compiler
de-structures the vector and gets rid of the unused "reserved vs size"
related fields.

The other nice thing is that it's easier to write
  gdb::byte_vector buf (size);
than
  std::unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);
or even (C++14):
  auto buf = std::make_unique<gdb_byte[]> (size); // zero-initializes...

#5 - Suggest s/std::vector<gdb_byte>/gdb::byte_vector/ going forward.

Note that this commit actually fixes a couple of bugs where the current
code is incorrectly using "std::vector::reserve(new_size)" and then
accessing the vector's internal buffer beyond the vector's size: see
dwarf2loc.c and charset.c.  That's undefined behavior and may trigger
debug mode assertion failures.  With default_init_allocator,
"resize()" behaves like "reserve()" performance wise, in that it
leaves new elements with unspecified values, but, it does that safely
without triggering undefined behavior when you access those values.
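
The reserve-versus-resize distinction can be shown with a plain std::vector. This is only a sketch: fill_buffer is a hypothetical helper, and the standard allocator used here still zero-fills on resize, unlike the allocator introduced by this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy LEN bytes from SRC into BUF.  The vector is resized first, so
// every byte written through data () lies within [0, size ()).
// Using reserve () instead would only grow the capacity: size () would
// stay unchanged and the writes would fall outside the vector's
// elements, which is undefined behavior.
void
fill_buffer (std::vector<unsigned char> &buf, const void *src,
             std::size_t len)
{
  buf.resize (len);
  assert (buf.size () == len);
  if (len != 0)
    std::memcpy (buf.data (), src, len);
}
```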

gdb/ChangeLog:
2017-06-14  Pedro Alves  <[email protected]>

	* ada-lang.c: Include "common/byte-vector.h".
	(ada_value_primitive_packed_val): Use gdb::byte_vector.
	* charset.c (wchar_iterator::iterate): Resize the vector instead
	of reserving it.
	* charset.h: Include "common/def-vector.h".
	(wchar_iterator::m_out): Now a gdb::def_vector<gdb_wchar_t>.
	* cli/cli-dump.c: Include "common/byte-vector.h".
	(dump_memory_to_file, restore_binary_file): Use gdb::byte_vector.
	* common/byte-vector.h: New file.
	* common/def-vector.h: New file.
	* common/default-init-alloc.h: New file.
	* dwarf2loc.c: Include "common/byte-vector.h".
	(rw_pieced_value): Use gdb::byte_vector, and resize the vector
	instead of reserving it.
	* dwarf2read.c: Include "common/byte-vector.h".
	(data_buf::m_vec): Now a gdb::byte_vector.
	* gdb_regex.c: Include "common/def-vector.h".
	(compiled_regex::compiled_regex): Use gdb::def_vector<char>.
	* mi/mi-main.c: Include "common/byte-vector.h".
	(mi_cmd_data_read_memory): Use gdb::byte_vector.
	* printcmd.c: Include "common/byte-vector.h".
	(print_scalar_formatted): Use gdb::byte_vector.
	* valprint.c: Include "common/byte-vector.h".
	(maybe_negate_by_bytes, print_decimal_chars): Use
	gdb::byte_vector.
pipcet pushed a commit that referenced this issue Aug 27, 2017
Ref: https://sourceware.org/ml/gdb-patches/2017-07/msg00162.html

Debugging x86-64 GNU/Linux programs currently crashes GDB in
tdesc_use_registers during gdbarch initialization:

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000001093eaf in htab_remove_elt_with_hash (htab=0x2ef9fa0, element=0x26af960, hash=557151073) at src/libiberty/hashtab.c:728
  728       if (*slot == HTAB_EMPTY_ENTRY)
  (top-gdb) p slot
  $1 = (void **) 0x0
  (top-gdb) bt
  #0  0x0000000001093eaf in htab_remove_elt_with_hash (htab=0x2ef9fa0, element=0x26af960, hash=557151073) at src/libiberty/hashtab.c:728
  #1  0x0000000001093e79 in htab_remove_elt (htab=0x2ef9fa0, element=0x26af960) at src/libiberty/hashtab.c:714
  #2  0x00000000009121b0 in tdesc_use_registers (gdbarch=0x3001240, target_desc=0x2659cb0, early_data=0x2881cb0)
      at src/gdb/target-descriptions.c:1328
  #3  0x000000000047c93e in i386_gdbarch_init (info=..., arches=0x0) at src/gdb/i386-tdep.c:8634
  #4  0x0000000000818d5f in gdbarch_find_by_info (info=...) at src/gdb/gdbarch.c:5394
  #5  0x00000000007198a8 in set_gdbarch_from_file (abfd=0x2f48250) at src/gdb/arch-utils.c:618
  #6  0x00000000007f21cb in exec_file_attach (filename=0x7fffffffddb0 "/home/pedro/gdb/tests/threads", from_tty=1) at src/gdb/exec.c:380
  #7  0x0000000000865c18 in catch_command_errors_const (command=0x7f1d83 <exec_file_attach(char const*, int)>, arg=0x7fffffffddb0 "/home/pedro/gdb/tests/threads",
      from_tty=1) at src/gdb/main.c:403
  #8  0x00000000008669cf in captured_main_1 (context=0x7fffffffd860) at src/gdb/main.c:1035
  #9  0x0000000000866de2 in captured_main (data=0x7fffffffd860) at src/gdb/main.c:1142
  #10 0x0000000000866e24 in gdb_main (args=0x7fffffffd860) at src/gdb/main.c:1160
  #11 0x000000000041312d in main (argc=3, argv=0x7fffffffd968) at src/gdb/gdb.c:32

The direct cause of the crash is that we tried to remove an element
from the hash which supposedly exists, but does not.  (htab_remove_elt
shouldn't really crash in this case, but that's secondary.)

The real problem is that early_data passed to tdesc_use_registers
includes regs from a target description that is not the target_desc,
which violates its assumptions.  The registers in question are the
fs_base/gs_base registers, added by amd64_init_abi:

      tdesc_numbered_register (feature, tdesc_data_segments,
		       AMD64_FSBASE_REGNUM, "fs_base");
      tdesc_numbered_register (feature, tdesc_data_segments,
		       AMD64_GSBASE_REGNUM, "gs_base");

and that happens because amd64_linux_init_abi uses amd64_init_abi as
helper, but they don't coordinate on which fallback tdesc to use.

amd64_init_abi does:

  if (! tdesc_has_registers (tdesc))
    tdesc = tdesc_amd64;

and then adds the fs_base/gs_base registers of the "tdesc_amd64" tdesc
to the tdesc_arch_data.

After amd64_init_abi returns, amd64_linux_init_abi does:

  if (! tdesc_has_registers (tdesc))
    tdesc = tdesc_amd64_linux;
  tdep->tdesc = tdesc;

and we end up with tdesc_amd64_linux installed in tdep->tdesc.

The fix is to make sure that amd64_linux_init_abi and amd64_init_abi
agree on default tdesc, by adding a "default tdesc" parameter to
amd64_init_abi, instead of having amd64_init_abi hardcode a default.
With this, amd64_init_abi creates the fs_base/gs_base registers using
the tdesc_amd64_linux tdesc.
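
The shape of the fix, a helper that takes its fallback from the caller instead of hardcoding one, can be sketched as follows. All names here (tdesc_sketch, tdep_sketch, generic_init_abi, linux_init_abi, linux_default) are hypothetical; GDB's real target descriptions and init_abi functions have different signatures.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; GDB's real target descriptions are richer.
struct tdesc_sketch
{
  std::string name;
  bool has_registers;
};

struct tdep_sketch
{
  // Description finally installed.
  const tdesc_sketch *tdesc = nullptr;
  // Description the fs_base/gs_base-like registers were created from.
  const tdesc_sketch *segment_regs_owner = nullptr;
};

// Generic helper: the caller supplies the fallback description, so the
// helper and its caller cannot disagree about which one is in use.
void
generic_init_abi (tdep_sketch &tdep, const tdesc_sketch *tdesc,
                  const tdesc_sketch *default_tdesc)
{
  assert (default_tdesc != nullptr);
  if (tdesc == nullptr || !tdesc->has_registers)
    tdesc = default_tdesc;
  tdep.tdesc = tdesc;
  tdep.segment_regs_owner = tdesc;
}

static const tdesc_sketch linux_default {"linux", true};

// OS-specific caller: passes its own default, then reads the final
// description back from tdep rather than recomputing a fallback.
const tdesc_sketch *
linux_init_abi (tdep_sketch &tdep, const tdesc_sketch *tdesc)
{
  generic_init_abi (tdep, tdesc, &linux_default);
  return tdep.tdesc;
}
```

The invariant the sketch preserves is that the registers always come from the same description that ends up installed, which is the assumption the crash above violated.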

Tested on x86-64 GNU/Linux, -m64.  I don't have an x32 setup handy.

Thanks to John Baldwin, Yao Qi and Simon Marchi for the investigation.

gdb/ChangeLog:
2017-07-13  Pedro Alves  <[email protected]>

	* amd64-darwin-tdep.c (x86_darwin_init_abi_64): Pass tdesc_amd64
	as default tdesc.
	* amd64-dicos-tdep.c (amd64_dicos_init_abi):
	* amd64-fbsd-tdep.c (amd64fbsd_init_abi):
	* amd64-linux-tdep.c (amd64_linux_init_abi): Pass
	tdesc_amd64_linux as default tdesc.  Get final tdesc from the
	tdep.
	(amd64_x32_linux_init_abi): Pass tdesc_x32_linux as default tdesc.
	Get final tdesc from the tdep.
	* amd64-nbsd-tdep.c (amd64nbsd_init_abi): Pass tdesc_amd64 as
	default tdesc.
	* amd64-obsd-tdep.c (amd64obsd_init_abi): Likewise.
	* amd64-sol2-tdep.c (amd64_sol2_init_abi): Likewise.
	* amd64-tdep.c (amd64_init_abi): Add 'default_tdesc' parameter.
	Use it as default tdesc.
	(amd64_x32_init_abi): Add 'default_tdesc' parameter, and pass it
	down to amd64_init_abi.  No longer handle fallback tdesc here.
	* amd64-tdep.h (tdesc_x32): Declare.
	(amd64_init_abi, amd64_x32_init_abi): Add 'default_tdesc'
	parameter.
	* amd64-windows-tdep.c (amd64_windows_init_abi): Pass tdesc_amd64
	as default tdesc.
pipcet pushed a commit that referenced this issue Aug 27, 2017
PR 21555 is caused by an exception thrown during prologue analysis when
re-setting a breakpoint.

(gdb) bt
 #0  memory_error_message (err=TARGET_XFER_E_IO, gdbarch=0x153db50, memaddr=93824992233232) at ../../binutils-gdb/gdb/corefile.c:192
 #1  0x00000000005718ed in memory_error (err=TARGET_XFER_E_IO, memaddr=memaddr@entry=93824992233232) at ../../binutils-gdb/gdb/corefile.c:220
 #2  0x00000000005719d6 in read_memory_object (object=object@entry=TARGET_OBJECT_CODE_MEMORY, memaddr=93824992233232, memaddr@entry=1, myaddr=myaddr@entry=0x7fffffffd0a0 "P\333S\001", len=len@entry=1) at ../../binutils-gdb/gdb/corefile.c:259
 #3  0x0000000000571c6e in read_code (len=1, myaddr=0x7fffffffd0a0 "P\333S\001", memaddr=<optimized out>) at ../../binutils-gdb/gdb/corefile.c:287
 #4  read_code_unsigned_integer (memaddr=memaddr@entry=93824992233232, len=len@entry=1, byte_order=byte_order@entry=BFD_ENDIAN_LITTLE)                          at ../../binutils-gdb/gdb/corefile.c:362
 #5  0x000000000041d4a0 in amd64_analyze_prologue (gdbarch=gdbarch@entry=0x153db50, pc=pc@entry=93824992233232, current_pc=current_pc@entry=18446744073709551615, cache=cache@entry=0x7fffffffd1e0) at ../../binutils-gdb/gdb/amd64-tdep.c:2310
 #6  0x000000000041e404 in amd64_skip_prologue (gdbarch=0x153db50, start_pc=93824992233232) at ../../binutils-gdb/gdb/amd64-tdep.c:2459
 #7  0x000000000067bfb0 in skip_prologue_sal (sal=sal@entry=0x7fffffffd4e0) at ../../binutils-gdb/gdb/symtab.c:3628
 #8  0x000000000067c4d8 in find_function_start_sal (sym=sym@entry=0x1549960, funfirstline=1) at ../../binutils-gdb/gdb/symtab.c:3501
 #9  0x000000000060999d in symbol_to_sal (result=result@entry=0x7fffffffd5f0, funfirstline=<optimized out>, sym=sym@entry=0x1549960) at ../../binutils-gdb/gdb/linespec.c:3860
....
 #16 0x000000000054b733 in location_to_sals (b=b@entry=0x15792d0, location=0x157c230, search_pspace=search_pspace@entry=0x1148120, found=found@entry=0x7fffffffdc64) at ../../binutils-gdb/gdb/breakpoint.c:14211
 #17 0x000000000054c1f5 in breakpoint_re_set_default (b=0x15792d0) at ../../binutils-gdb/gdb/breakpoint.c:14301
 #18 0x00000000005412a9 in breakpoint_re_set_one (bint=bint@entry=0x15792d0) at ../../binutils-gdb/gdb/breakpoint.c:14412

This problem can be fixed in one of two ways:

 - make every prologue analyzer not throw exceptions, or
 - catch the exception thrown from gdbarch_skip_prologue.

I chose the latter because the former requires fixing *every* prologue
analyzer so that it does not throw.
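
As a rough illustration of the second option, here is a minimal sketch
of a non-throwing wrapper.  The names addr_t, prologue_analyzer and
skip_prologue_noexcept_sketch are hypothetical stand-ins, not GDB's
actual declarations, and falling back to the function's start address
is an assumption made for this example:

```cpp
#include <cstdint>
#include <exception>
#include <functional>
#include <stdexcept>

using addr_t = std::uint64_t;

// Hypothetical stand-in for an architecture-specific prologue
// analyzer.  It may throw, e.g. when target memory is unreadable.
using prologue_analyzer = std::function<addr_t (addr_t)>;

// Call ANALYZE, but swallow any exception it throws and fall back to
// START_PC (an assumption of this sketch) so the caller can carry on.
addr_t
skip_prologue_noexcept_sketch (const prologue_analyzer &analyze,
			       addr_t start_pc) noexcept
{
  try
    {
      return analyze (start_pc);
    }
  catch (const std::exception &)
    {
      return start_pc;
    }
}
```

With a wrapper of this shape, a caller such as a breakpoint re-set
routine gets a usable address even when the analyzer cannot read
memory.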

This error can be reproduced by changing reread.exp.  The test reread.exp
already tests that breakpoints can be reset correctly after the
executable is re-read.  This patch extends this test by compiling the test C
file with and without -fPIE.

(gdb) run ^M
The program being debugged has been started already.^M
Start it from the beginning? (y or n) y^M
x86_64/gdb/testsuite/outputs/gdb.base/reread/reread' has changed; re-reading symbols.
Error in re-setting breakpoint 1: Cannot access memory at address 0x555555554790^M
Error in re-setting breakpoint 2: Cannot access memory at address 0x555555554790^M
Starting program: /scratch/yao/gdb/build-git/x86_64/gdb/testsuite/outputs/gdb.base/reread/reread ^M
This is foo^M
[Inferior 1 (process 27720) exited normally]^M
(gdb) FAIL: gdb.base/reread.exp: opts= "-fPIE" "ldflags=-pie" : run to foo() second time (the program exited)

This patch doesn't re-indent the code, to keep the patch simple.

gdb:

2017-07-25  Yao Qi  <[email protected]>

	PR gdb/21555
	* arch-utils.c (gdbarch_skip_prologue_noexcept): New function.
	* arch-utils.h (gdbarch_skip_prologue_noexcept): Declare.
	* infrun.c: Include arch-utils.h
	(handle_step_into_function): Call gdbarch_skip_prologue_noexcept.
	(handle_step_into_function_backward): Likewise.
	* symtab.c (skip_prologue_sal): Likewise.

gdb/testsuite:

2017-07-25  Yao Qi  <[email protected]>

	PR gdb/21555
	* gdb.base/reread.exp: Wrap the whole test with two kinds of
	compilation flags, with -fPIE and without -fPIE.
pipcet pushed a commit that referenced this issue Aug 27, 2017
…ping

(Ref: https://sourceware.org/ml/gdb/2017-06/msg00020.html)

Assuming int_t is a typedef to int:

 typedef int int_t;

gdb currently loses this expression's typedef:

 (gdb) p (int_t) 0
 $1 = 0
 (gdb) whatis $1
 type = int

or:

 (gdb) whatis (int_t) 0
 type = int

or, to get "whatis" out of the way:

 (gdb) maint print type (int_t) 0
 ...
 name 'int'
 code 0x8 (TYPE_CODE_INT)
 ...

This prevents a type printer for "int_t" kicking in, with e.g.:

 (gdb) p (int_t) 0

From the manual, we can see that the "whatis (int_t) 0" command
invocation should have printed "type = int_t":

 If @var{arg} is a variable or an expression, @code{whatis} prints its
 literal type as it is used in the source code.  If the type was
 defined using a @code{typedef}, @code{whatis} will @emph{not} print
 the data type underlying the @code{typedef}.
 (...)
 If @var{arg} is a type name that was defined using @code{typedef},
 @code{whatis} @dfn{unrolls} only one level of that @code{typedef}.

That one-level stripping is currently done here, in
gdb/eval.c:evaluate_subexp_standard, handling OP_TYPE:

...
     else if (noside == EVAL_AVOID_SIDE_EFFECTS)
	{
	  struct type *type = exp->elts[pc + 1].type;

	  /* If this is a typedef, then find its immediate target.  We
	     use check_typedef to resolve stubs, but we ignore its
	     result because we do not want to dig past all
	     typedefs.  */
	  check_typedef (type);
	  if (TYPE_CODE (type) == TYPE_CODE_TYPEDEF)
	    type = TYPE_TARGET_TYPE (type);
	  return allocate_value (type);
	}

However, this stripping is reachable in both:

 #1 - (gdb) whatis (int_t)0     # ARG is an expression with a cast to
                                # typedef type.
 #2 - (gdb) whatis int_t        # ARG is a type name.

while only case #2 should strip the typedef.  Removing that code from
evaluate_subexp_standard is part of the fix.  Instead, we make the
"whatis" command implementation itself strip one level of typedefs
when the command argument is a type name.
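
The intended split between the two cases can be sketched as follows.
This is a deliberately simplified model, not GDB's struct type;
type_sketch, whatis_type_name and whatis_expression are hypothetical
names introduced only for this example:

```cpp
#include <memory>
#include <string>

// Hypothetical, much simplified model of a debugger type.
struct type_sketch
{
  std::string name;
  // Non-null only for typedefs: the type this typedef names.
  std::shared_ptr<const type_sketch> target;

  bool is_typedef () const { return target != nullptr; }
};

// Case #2, "whatis TYPENAME": unroll exactly one typedef level.
std::string
whatis_type_name (const type_sketch &t)
{
  return t.is_typedef () ? t.target->name : t.name;
}

// Case #1, "whatis EXPRESSION": report the type as written, with no
// typedef stripping at all.
std::string
whatis_expression (const type_sketch &expr_type)
{
  return expr_type.name;
}
```

In this model, a typedef of a typedef of int reports the middle
typedef for the type-name case, never int, which matches the "one
level" wording in the manual.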

We then run into another problem, also fixed by this commit:
value_cast always drops any typedefs of the destination type.

With all that fixed, "whatis (int_t) 0" now works as expected:

 (gdb) whatis int_t
 type = int
 (gdb) whatis (int_t)0
 type = int_t

value_cast has many different exit/conversion paths, for handling many
different kinds of casts/conversions, and most of them had to be
tweaked to construct the value of the right "to" type.  The new tests
try to exercise most of it, by trying casting of many different
combinations of types.  With:

 $ make check TESTS="*/whatis-ptype*.exp */gnu_vector.exp */dfp-test.exp"

... due to combinatorial explosion, the testsuite results for the
tests above alone grow like:

 - # of expected passes            246
 + # of expected passes            3811

You'll note that the tests exposed one GCC buglet, filed here:

  Missing DW_AT_type in DW_TAG_typedef of "typedef of typedef of void"
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81267

gdb/ChangeLog:
2017-08-21  Pedro Alves  <[email protected]>

	* eval.c (evaluate_subexp_standard) <OP_TYPE>: Don't dig past
	typedefs.
	* typeprint.c (whatis_exp): If handling "whatis", and expression
	is OP_TYPE, strip one typedef level.  Otherwise don't strip
	typedefs here.
	* valops.c (value_cast): Save "to" type before resolving
	stubs/typedefs.  Use that type as resulting value's type.

gdb/testsuite/ChangeLog:
2017-08-21  Pedro Alves  <[email protected]>

	* gdb.base/dfp-test.c
	(d32_t, d64_t, d128_t, d32_t2, d64_t2, d128_t2, v_d32_t, v_d64_t)
	(v_d128_t, v_d32_t2, v_d64_t2, v_d128_t2): New.
	* gdb.base/dfp-test.exp: Add whatis/ptype/cast tests.
	* gdb.base/gnu_vector.exp: Add whatis/ptype/cast tests.
	* gdb.base/whatis-ptype-typedefs.c: New.
	* gdb.base/whatis-ptype-typedefs.exp: New.
	* gdb.python/py-prettyprint.c (int_type, int_type2): New typedefs.
	(an_int, an_int_type, an_int_type2): New globals.
	* gdb.python/py-prettyprint.exp (run_lang_tests): Add tests
	involving typedefs and cast expressions.
	* gdb.python/py-prettyprint.py (class pp_int_typedef): New.
	(lookup_typedefs_function): New.
	(typedefs_pretty_printers_dict): New.
	(top level): Register lookup_typedefs_function in
	gdb.pretty_printers.
pipcet pushed a commit that referenced this issue Jun 21, 2020
Since the multi-target patch, the run command fails on Solaris with an
assertion failure even for a trivial program:

$ ./gdb -D ./data-directory ./hello
GNU gdb (GDB) 10.0.50.20200106-git
[...]
Reading symbols from ./hello...
(gdb) run
Starting program: /vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello
/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:336: internal-error:
thread_info::thread_info(inferior*, ptid_t): Assertion `inf_ != NULL'
failed.

Here's the start of the corresponding stack trace:

#0  internal_error (
    file=file@entry=0x966150
"/vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c", line=line@entry=336,
fmt=0x9ddb94 "%s: Assertion `%s' failed.")
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c:51
#1  0x0000000000ef81f4 in thread_info::thread_info (this=0x1212020,
    inf_=<optimized out>, ptid_=...)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:344
#2  0x0000000000ef82cd in new_thread (inf=inf@entry=0x0, ptid=...)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:239
#3  0x0000000000efac3c in add_thread_silent (
    targ=targ@entry=0x11b0940 <the_procfs_target>, ptid=...)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/thread.c:304
#4  0x0000000000d90692 in procfs_target::create_inferior (
    this=0x11b0940 <the_procfs_target>,
    exec_file=0x13dbef0
"/vol/obj/gnu/gdb/gdb/reghunt/no-resync/122448/gdb/hello", allargs="",
env=0x13c48f0, from_tty=<optimized out>)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/gdbsupport/ptid.h:47
#5  0x0000000000c84e64 in run_command_1 (args=<optimized out>, from_tty=1,
    run_how=run_how@entry=RUN_NORMAL)
    at /vol/gcc-9/include/c++/9.1.0/bits/basic_string.h:263
#6  0x0000000000c85007 in run_command (args=<optimized out>,
    from_tty=<optimized out>)
    at /vol/src/gnu/gdb/hg/master/reghunt/gdb/infcmd.c:687

Looking closer, I found that in add_thread_silent as called from
procfs.c (procfs_target::create_inferior) find_inferior_ptid returns
NULL.  The all_inferiors (targ) iterator comes up empty.

Going from there, I see that in add_thread_silent

m_target_stack = {m_top = file_stratum, m_stack = {0x20190e0
<the_dummy_target>, 0x200b8c0 <exec_ops>, 0x0, 0x0, 0x0, 0x0, 0x0}}}

i.e. the_procfs_target is missing compared to the_amd64_linux_nat_target
on Linux/x86_64.

Moving the push_target call earlier allows debugging to get over the
initial assertion failure.  I run instead into

procfs: couldn't find pid 0 in procinfo list.

which is fixed by

	https://sourceware.org/pipermail/gdb-patches/2020-June/169674.html

Both patches tested together on amd64-pc-solaris2.11.

	PR gdb/25939
	* procfs.c (procfs_target::procfs_init_inferior): Move push_target
	call ...
	(procfs_target::create_inferior): ... here.
pipcet pushed a commit that referenced this issue Jun 29, 2020
Some C/C++ testcases unconditionally pass -Wno-foo as additional
options to disable some warning.  That is OK with GCC, because GCC
accepts -Wno-foo silently even if it doesn't support -Wfoo.  This is a
feature which allows disabling warnings with newer compilers without
breaking builds with older compilers.  Clang however warns about
unknown -Wno-foo by default, unless you pass
-Wno-unknown-warning-option as well:

 $ gcc -Wno-foo test.c
 * nothing, compiles successfully *

 $ clang -Wno-foo test.c
 warning: unknown warning option '-Wno-foo' [-Wunknown-warning-option]

This commit adds -Wno-unknown-warning-option centrally in gdb_compile, so
that individual testcases don't have to worry about breaking older
Clangs.

IOW, this avoids this problematic scenario:

#1 - A testcase compiles successfully with Clang version X.
#2 - Clang version "X + 1" adds a new warning, enabled by default,
     which breaks the test.
#3 - We add -Wno-newwarning to the testcase, fixing the testcase with
     clang "X + 1".
#4 - Now building the test with Clang version X no longer works, due
     to "unknown warning option".
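
The central handling can be modelled like this.  The real change
lives in the Tcl procedure gdb_compile; the C++ function below,
compile_options_sketch, and its is_clang parameter are hypothetical
and only illustrate the idea of appending the flag in one place:

```cpp
#include <string>
#include <vector>

// Return OPTIONS, with -Wno-unknown-warning-option appended when the
// compiler is Clang, so that an unknown -Wno-foo option passed by a
// testcase does not produce a warning on older Clang versions.
std::vector<std::string>
compile_options_sketch (std::vector<std::string> options, bool is_clang)
{
  if (is_clang)
    options.push_back ("-Wno-unknown-warning-option");
  return options;
}
```

GCC needs no such flag, since it accepts unknown -Wno-foo options
silently, so the options are returned unchanged in that case.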

gdb/testsuite/ChangeLog:
2020-06-24  Pedro Alves  <[email protected]>

	* lib/gdb.exp (gdb_compile): Update intro comment.  If C/C++ with
	Clang, add "-Wno-unknown-warning-option" to the options.
pipcet pushed a commit that referenced this issue Jul 8, 2020
When building gdb with CFLAGS=-std=gnu17 and CXXFLAGS=-std=gnu++17 and running
test-case gdb.tui/new-layout.exp, we run into:
...
UNRESOLVED: gdb.tui/new-layout.exp: left window box after shrink (ll corner)
FAIL: gdb.tui/new-layout.exp: right window box after shrink (ll corner)
...

In a minimal form, we run into an abort when issuing a winheight command:
...
$ gdb -tui -ex "winheight src - 5"
   <tui stuff>
Aborted (core dumped)
$
...
with this backtrace at the abort:
...
\#0  0x0000000000438db0 in std::char_traits<char>::length (__s=0x0)
     at /usr/include/c++/9/bits/char_traits.h:335
\#1  0x000000000043b72e in std::basic_string_view<char, \
   std::char_traits<char> >::basic_string_view (this=0x7fffffffd4f0, \
   __str=0x0) at /usr/include/c++/9/string_view:124
\#2  0x000000000094971b in tui_partial_win_by_name (name="src")
     at src/gdb/tui/tui-win.c:663
...
due to a NULL comparison which constructs a string_view object from NULL:
...
   657  /* Answer the window represented by name.  */
   658  static struct tui_win_info *
   659  tui_partial_win_by_name (gdb::string_view name)
   660  {
   661    struct tui_win_info *best = nullptr;
   662
   663    if (name != NULL)
...

In gdbsupport/gdb_string_view.h, we either use:
- gdb's copy of libstdc++-v3/include/experimental/string_view, or
- the standard implementation of string_view, when built with C++17 or later
  (which in gcc's case comes from libstdc++-v3/include/std/string_view)

In the first case, there's support for constructing a string_view from a NULL
pointer:
...
      /*constexpr*/ basic_string_view(const _CharT* __str)
      : _M_len{__str == nullptr ? 0 : traits_type::length(__str)},
        _M_str{__str}
      { }
...
but in the second case, there's not:
...
      __attribute__((__nonnull__)) constexpr
      basic_string_view(const _CharT* __str) noexcept
      : _M_len{traits_type::length(__str)},
        _M_str{__str}
      { }
...

Fix this by removing the NULL comparison altogether.
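
A minimal sketch of name matching that never compares the
string_view against NULL is shown below.  It assumes C++17
std::string_view rather than gdb::string_view, and
partial_name_matches is a hypothetical helper, not the actual body of
tui_partial_win_by_name:

```cpp
#include <string_view>

// Hypothetical helper: true when NAME is a prefix of WINDOW_NAME.
bool
partial_name_matches (std::string_view name, std::string_view window_name)
{
  // There is no "name != NULL" test here.  A string_view is a value,
  // not a pointer, and comparing it against NULL would construct a
  // second string_view from a null pointer.
  return window_name.substr (0, name.size ()) == name;
}
```

If a caller needs to reject a missing name, testing name.empty () is
one way to do so that does not depend on the null-pointer constructor.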

Built on x86_64-linux with CFLAGS=-std=gnu17 and CXXFLAGS=-std=gnu++17, and
tested.

gdb/ChangeLog:

2020-07-06  Tom de Vries  <[email protected]>

	PR tui/26205
	* tui/tui-win.c (tui_partial_win_by_name): Don't test for NULL name.
pipcet pushed a commit that referenced this issue Jul 18, 2020
This started with me running into the bug described in python/22748,
in summary, if the frame sniffing code accessed any registers within
an inline frame then GDB would crash with this error:

  gdb/frame.c:579: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

The problem is that, when in the Python unwinder I write this:

  pending_frame.read_register ("register-name")

This is translated internally into a call to `value_of_register',
which in turn becomes a call to `value_of_register_lazy'.

Usually this isn't a problem, `value_of_register_lazy' requires the
next frame (more inner) to have a valid frame_id, which will be the
case (if we're sniffing frame #1, then frame #0 will have had its
frame-id figured out).

Unfortunately if frame #0 is inline within frame #1, then the frame-id
for frame #0 can't be computed until we have the frame-id for #1.  As
a result we can't create a lazy register for frame #1 when frame #0 is
inline.

Initially I proposed a solution in line with that proposed in bugzilla,
changing value_of_register to avoid creating a lazy register value.
However, when this was discussed on the mailing list I got this reply:

  https://sourceware.org/pipermail/gdb-patches/2020-June/169633.html

Which led me to look at these two patches:

  [1] https://sourceware.org/pipermail/gdb-patches/2020-April/167612.html
  [2] https://sourceware.org/pipermail/gdb-patches/2020-April/167930.html

When I considered patches [1] and [2] I saw that all of the issues
being addressed here were related, and that there was a single
solution that could address all of these issues.

First I wrote the new test gdb.opt/inline-frame-tailcall.exp, which
shows that [1] and [2] regress the inline tail-call unwinder, the
reason for this is that these two patches replace a call to
gdbarch_unwind_pc with a call to get_frame_register, however, this is
not correct.  The previous call to gdbarch_unwind_pc takes THIS_FRAME
and returns the $pc value in the previous frame.  In contrast
get_frame_register takes THIS_FRAME and returns the value of the $pc
in THIS_FRAME; these calls are not equivalent.

The reason these patches appear to (or do) fix the regressions listed in
[1] is that the tail call sniffer depends on identifying the address
of a caller and a callee, GDB then looks for a tail-call sequence that
takes us from the caller address to the callee, if such a series is
found then tail-call frames are added.

The bug that was being hit, and which was addressed in patch [1], is that
in order to find the address of the caller, GDB ended up creating a
lazy register value for an inline frame with no frame-id.  The
solution in patch [1] is to instead take the address of the callee and
treat this as the address of the caller.  Getting the address of the
callee works, but we then end up looking for a tail-call series from
the callee to the callee, which obviously doesn't return any sane
results, so we don't insert any tail call frames.

The original patch [1] did cause some breakage, so patch [2] undid
patch [1] in all cases except those where we had an inline frame with
no frame-id.  It just so happens that there were no tests that fitted
this description _and_ which required tail-call frames to be
successfully spotted, as a result patch [2] appeared to work.

The new test, inline-frame-tailcall.exp, exposes the flaw in patch [2].

This commit undoes patch [1] and [2], and replaces them with a new
solution, which is also different to the solution proposed in the
python/22748 bug report.

In this solution I propose that we introduce some special case logic
to value_of_register_lazy.  To understand what this logic is we must
first look at how inline frames unwind registers, this is very simple,
they do this:

  static struct value *
  inline_frame_prev_register (struct frame_info *this_frame,
                              void **this_cache, int regnum)
  {
    return get_frame_register_value (this_frame, regnum);
  }

And remember:

  struct value *
  get_frame_register_value (struct frame_info *frame, int regnum)
  {
    return frame_unwind_register_value (frame->next, regnum);
  }

So in all cases, unwinding a register in an inline frame just asks the
next frame to unwind the register, this makes sense, as an inline
frame doesn't really exist, when we unwind a register in an inline
frame, we're really just asking the next frame for the value of the
register in the previous, non-inline frame.

So, if we assume that we only get into the missing frame-id situation
when we try to unwind a register from an inline frame during the frame
sniffing process, then we can change value_of_register_lazy to not
create lazy register values for an inline frame.

Imagine this stack setup, where #1 is inline within #2.

  #3 -> #2 -> #1 -> #0
        \______/
         inline

Now when trying to figure out the frame-id for #1, we need to compute
the frame-id for #2.  If the frame sniffer for #2 causes a lazy
register read in #2, either due to a Python Unwinder, or for the
tail-call sniffer, then we call value_of_register_lazy passing in
frame #2.

In value_of_register_lazy, we grab the next frame, which is #1, and we
used to then ask for the frame-id of #1, which was not computed, and
this was our bug.

Now, I propose we spot that #1 is an inline frame, and so lookup the
next frame of #1, which is #0.  As #0 is not inline it will have a
valid frame-id, and so we create a lazy register value using #0 as the
next-frame-id.  This will give us the exact same result we had
previously (thanks to the code we inspected above).
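
The frame walk described above can be sketched as follows.
frame_sketch and next_frame_for_lazy_register are hypothetical names
for this example, and the sketch leaves out GDB's sentinel frame
handling:

```cpp
#include <cassert>

// Hypothetical, simplified frame record.  NEXT points at the next
// (more inner) frame, or is null for the innermost frame.
struct frame_sketch
{
  int level;
  bool is_inline;
  const frame_sketch *next;
};

// Starting from FRAME's next frame, skip over inline frames and
// return the first non-inline frame.  Its frame-id is the one a lazy
// register value would record as the next-frame-id.
const frame_sketch *
next_frame_for_lazy_register (const frame_sketch *frame)
{
  assert (frame != nullptr);

  const frame_sketch *next = frame->next;
  while (next != nullptr && next->is_inline)
    next = next->next;
  return next;
}
```

For the stack pictured above, asking on behalf of #2 skips the inline
frame #1 and yields #0, while asking on behalf of #3 yields #2 as
before.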

Encoding into value_of_register_lazy the knowledge that reading an
inline frame register will always just forward to the next frame
feels.... not ideal, but this seems like the cleanest solution to this
recursive frame-id computation/sniffing issue that appears to crop
up.

The following two commits are fully reverted with this commit, these
correspond to patches [1] and [2] respectively:

  commit 5939967
  Date:   Tue Apr 14 17:26:22 2020 -0300

      Fix inline frame unwinding breakage

  commit 991a3e2
  Date:   Sat Apr 25 00:32:44 2020 -0300

      Fix remaining inline/tailcall unwinding breakage for x86_64

gdb/ChangeLog:

	PR python/22748
	* dwarf2/frame-tailcall.c (dwarf2_tailcall_sniffer_first): Remove
	special handling for inline frames.
	* findvar.c (value_of_register_lazy): Skip inline frames when
	creating lazy register values.
	* frame.c (frame_id_computed_p): Delete definition.
	* frame.h (frame_id_computed_p): Delete declaration.

gdb/testsuite/ChangeLog:

	PR python/22748
	* gdb.opt/inline-frame-tailcall.c: New file.
	* gdb.opt/inline-frame-tailcall.exp: New file.
	* gdb.python/py-unwind-inline.c: New file.
	* gdb.python/py-unwind-inline.exp: New file.
	* gdb.python/py-unwind-inline.py: New file.
pipcet pushed a commit that referenced this issue Jan 23, 2021
When GDB is waiting trying to connect to a remote target and it receives
a SIGWINCH (terminal gets resized), the blocking system call gets
interrupted and we abort.

For example, I connect to some port (on which nothing listens):

    (gdb) tar rem :1234
    ... GDB blocks here, resize the terminal ...
    :1234: Interrupted system call.

The backtrace where GDB is blocked while waiting for the connection to
establish is:

    #0  0x00007fe9db805b7b in select () from /usr/lib/libc.so.6
    #1  0x000055f2472e9c42 in gdb_select (n=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7ffe8fafe050) at /home/simark/src/binutils-gdb/gdb/posix-hdep.c:31
    #2  0x000055f24759c212 in wait_for_connect (sock=-1, polls=0x7ffe8fafe300) at /home/simark/src/binutils-gdb/gdb/ser-tcp.c:147
    #3  0x000055f24759d0e8 in net_open (scb=0x62500015b900, name=0x6020000601d8 ":1234") at /home/simark/src/binutils-gdb/gdb/ser-tcp.c:356
    #4  0x000055f2475a0395 in serial_open_ops_1 (ops=0x55f24892ca60 <tcp_ops>, open_name=0x6020000601d8 ":1234") at /home/simark/src/binutils-gdb/gdb/serial.c:244
    #5  0x000055f2475a01d6 in serial_open (name=0x6020000601d8 ":1234") at /home/simark/src/binutils-gdb/gdb/serial.c:231
    #6  0x000055f2474d5274 in remote_serial_open (name=0x6020000601d8 ":1234") at /home/simark/src/binutils-gdb/gdb/remote.c:5019
    #7  0x000055f2474d7025 in remote_target::open_1 (name=0x6020000601d8 ":1234", from_tty=1, extended_p=0) at /home/simark/src/binutils-gdb/gdb/remote.c:5571
    #8  0x000055f2474d47d5 in remote_target::open (name=0x6020000601d8 ":1234", from_tty=1) at /home/simark/src/binutils-gdb/gdb/remote.c:4898
    #9  0x000055f24776379f in open_target (args=0x6020000601d8 ":1234", from_tty=1, command=0x611000042bc0) at /home/simark/src/binutils-gdb/gdb/target.c:242

Fix that by using interruptible_select in wait_for_connect, instead of
gdb_select.  Resizing the terminal now no longer aborts the connection.
It is still possible to interrupt the connection using ctrl-c.
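
For readers unfamiliar with the failure mode, the general pattern of
restarting select after a signal looks roughly like this.  It is a
generic POSIX sketch, not GDB's interruptible_select (which also has
to keep ctrl-c working); wait_readable_retrying_eintr is a
hypothetical name, and restarting with the full timeout after each
interruption is a simplification:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <unistd.h>

// Wait until FD is readable or TIMEOUT_MS elapses.  Returns select's
// result: 1 if readable, 0 on timeout, -1 on an error other than
// EINTR.
int
wait_readable_retrying_eintr (int fd, int timeout_ms)
{
  assert (fd >= 0 && fd < FD_SETSIZE);

  int res;
  do
    {
      // Rebuild the arguments on every attempt; their contents are
      // not reliable after select has failed.
      fd_set readfds;
      FD_ZERO (&readfds);
      FD_SET (fd, &readfds);

      struct timeval tv;
      tv.tv_sec = timeout_ms / 1000;
      tv.tv_usec = (timeout_ms % 1000) * 1000;

      res = select (fd + 1, &readfds, nullptr, nullptr, &tv);
    }
  while (res == -1 && errno == EINTR);

  return res;
}
```

A signal such as SIGWINCH arriving during the wait then only causes
another trip around the loop instead of surfacing as "Interrupted
system call".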

gdb/ChangeLog:

	* ser-tcp.c (wait_for_connect): Use interruptible_select instead
	of gdb_select.

Change-Id: Ie25577bd1e5699e4847b6b53fdfa10b8c0dc5c89
pipcet pushed a commit that referenced this issue Jan 30, 2021
When running test-case gdb.arch/i386-gnu-cfi.exp with target board unix/-m32, I get:
...
(gdb) up 3^M
79      abort.c: No such file or directory.^M
(gdb) FAIL: gdb.arch/i386-gnu-cfi.exp: shift up to the modified frame
...

The preceding backtrace looks like this:
...
(gdb) bt^M
 #0  0xf7fcf549 in __kernel_vsyscall ()^M
 #1  0xf7ce8896 in __libc_signal_restore_set (set=0xffffc3bc) at \
     ../sysdeps/unix/sysv/linux/internal-signals.h:104^M
 #2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:47^M
 #3  0xf7cd0314 in __GI_abort () at abort.c:79^M
 #4  0x0804919f in gate (gate=0x8049040 <abort@plt>, data=0x0) at gate.c:3^M
 #5  0x08049176 in main () at i386-gnu-cfi.c:27^M
...
with function gate at position #4, while on another system where the test passes,
I see instead function gate at position #3.

Fix this by capturing the position of function gate in the backtrace, and
using that in the rest of the test instead of hardcoded constant 3.
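
The idea of looking up the frame number rather than hardcoding it can
be sketched as below.  The real test is written in Tcl/Expect; this
C++ function, frame_position, and its regular expression are
hypothetical and only cover backtrace lines shaped like the ones
above:

```cpp
#include <regex>
#include <sstream>
#include <string>

// Return the frame number of the first backtrace line whose function
// name is FUNCTION, or -1 if there is none.
int
frame_position (const std::string &backtrace, const std::string &function)
{
  // Matches e.g. "#4  0x0804919f in gate (" and "#2  __GI_raise (".
  static const std::regex frame_re
    (R"re(^\s*#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*\()re");

  std::istringstream lines (backtrace);
  std::string line;
  while (std::getline (lines, line))
    {
      std::smatch m;
      if (std::regex_search (line, m, frame_re) && m[2].str () == function)
	return std::stoi (m[1].str ());
    }
  return -1;
}
```

The returned number can then be passed to "frame N", which does not
depend on how many libc frames sit between abort and gate.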

Tested on x86_64-linux.

gdb/testsuite/ChangeLog:

2021-01-28  Tom de Vries  <[email protected]>

	* gdb.arch/i386-gnu-cfi.exp: Capture the position of function gate
	in the backtrace, and use that in the rest of the test instead of
	hardcoded constant 3.  Use "frame" instead of "up" for robustness.
pipcet pushed a commit that referenced this issue Feb 3, 2021
Attaching in non-stop mode currently misbehaves, like so:

 (gdb) attach 1244450
 Attaching to process 1244450
 [New LWP 1244453]
 [New LWP 1244454]
 [New LWP 1244455]
 [New LWP 1244456]
 [New LWP 1244457]
 [New LWP 1244458]
 [New LWP 1244459]
 [New LWP 1244461]
 [New LWP 1244462]
 [New LWP 1244463]
 No unwaited-for children left.

At this point, GDB's stopped/running thread state is out of sync with
the inferior:

(gdb) info threads
  Id   Target Id                     Frame
* 1    LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()
  2    LWP 1244453 "attach-non-stop" (running)
  3    LWP 1244454 "attach-non-stop" (running)
  4    LWP 1244455 "attach-non-stop" (running)
  5    LWP 1244456 "attach-non-stop" (running)
  6    LWP 1244457 "attach-non-stop" (running)
  7    LWP 1244458 "attach-non-stop" (running)
  8    LWP 1244459 "attach-non-stop" (running)
  9    LWP 1244461 "attach-non-stop" (running)
  10   LWP 1244462 "attach-non-stop" (running)
  11   LWP 1244463 "attach-non-stop" (running)
(gdb)
(gdb) interrupt -a
(gdb)
*nothing*

The problem is that attaching installs an inferior continuation,
called when the target reports the initial attach stop, here, in
inf-loop.c:inferior_event_handler:

      /* Do all continuations associated with the whole inferior (not
	 a particular thread).  */
      if (inferior_ptid != null_ptid)
	do_all_inferior_continuations (0);

However, currently in non-stop mode, inferior_ptid is still null_ptid
when we get here.

If you try to do "set debug infrun 1" to debug the problem, however,
then the attach completes correctly, with GDB reporting a stop for
each thread.

The bug is that we're missing a switch_to_thread/context_switch call
when handling the initial stop, here:

  if (stop_soon == STOP_QUIETLY_NO_SIGSTOP
      && (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP
	  || ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))
    {
      stop_print_frame = true;
      stop_waiting (ecs);
      ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
      return;
    }

Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that one does
call context_switch.

And the reason "set debug infrun 1" "fixes" it, is that the debug path
has a switch_to_thread call.

This patch fixes it by moving the main context_switch call earlier.
It also removes the:

   if (ecs->ptid != inferior_ptid)

check at the same time because:

 #1 - that is half of what context_switch already does

 #2 - deprecated_context_hook is only used in Insight, and all it does
      is set an int.  It won't care if we call it when the current
      thread hasn't actually changed.

A testcase exercising this will be added in a following patch.

gdb/ChangeLog:

	PR gdb/27055
	* infrun.c (handle_signal_stop): Move main context_switch call
	earlier, before STOP_QUIETLY_NO_SIGSTOP.
pipcet pushed a commit that referenced this issue Feb 3, 2021
With "target extended-remote" + "maint set target-non-stop", attaching
hangs like so:

 (gdb) attach 1244450
 Attaching to process 1244450
 [New Thread 1244450.1244450]
 [New Thread 1244450.1244453]
 [New Thread 1244450.1244454]
 [New Thread 1244450.1244455]
 [New Thread 1244450.1244456]
 [New Thread 1244450.1244457]
 [New Thread 1244450.1244458]
 [New Thread 1244450.1244459]
 [New Thread 1244450.1244461]
 [New Thread 1244450.1244462]
 [New Thread 1244450.1244463]
 * hang *

Attaching to the hung GDB shows that GDB is busy in an infinite loop
in stop_all_threads:

 (top-gdb) bt
 #0  stop_all_threads () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:4755
 #1  0x000055555597b424 in stop_waiting (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:7738
 #2  0x0000555555976fba in handle_signal_stop (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5868
 #3  0x0000555555975f6a in handle_inferior_event (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5527
 #4  0x0000555555971da4 in fetch_inferior_event () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:3910
 #5  0x00005555559540b2 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/pedro/gdb/binutils-gdb/src/gdb/inf-loop.c:42
 #6  0x000055555597e825 in infrun_async_inferior_event_handler (data=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:9162
 #7  0x0000555555687d1d in check_async_event_handlers () at /home/pedro/gdb/binutils-gdb/src/gdb/async-event.c:328
 #8  0x0000555555e48284 in gdb_do_one_event () at /home/pedro/gdb/binutils-gdb/src/gdbsupport/event-loop.cc:216
 #9  0x00005555559e7512 in start_event_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:347
 #10 0x00005555559e765d in captured_command_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:407
 #11 0x00005555559e8f80 in captured_main (data=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1239
 #12 0x00005555559e8ff2 in gdb_main (args=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1254
 #13 0x0000555555627c86 in main (argc=12, argv=0x7fffffffdc88) at /home/pedro/gdb/binutils-gdb/src/gdb/gdb.c:32

The problem is that the remote sends stops for all the threads:

 Packet received: l/home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/attach-non-stop/attach-non-stop
 Sending packet: $vStopped#55...Packet received: T0006:f06e25edec7f0000;07:f06e25edec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd2f;core:15;
 Sending packet: $vStopped#55...Packet received: T0006:f0dea5f0ec7f0000;07:f0dea5f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd27;core:4;
 Sending packet: $vStopped#55...Packet received: T0006:f0ee25f1ec7f0000;07:f0ee25f1ec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd26;core:5;
 Sending packet: $vStopped#55...Packet received: T0006:f0bea5efec7f0000;07:f0bea5efec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd29;core:1;
 Sending packet: $vStopped#55...Packet received: T0006:f0ce25f0ec7f0000;07:f0ce25f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd28;core:a;
 Sending packet: $vStopped#55...Packet received: T0006:f07ea5edec7f0000;07:f07ea5edec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2e;core:f;
 Sending packet: $vStopped#55...Packet received: T0006:f0ae25efec7f0000;07:f0ae25efec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd2a;core:6;
 Sending packet: $vStopped#55...Packet received: T0006:0000000000000000;07:c0e8a381fe7f0000;10:bf43b4f1ec7f0000;thread:p12fd22.12fd22;core:2;
 Sending packet: $vStopped#55...Packet received: T0006:f0fea5f1ec7f0000;07:f0fea5f1ec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd25;core:8;
 Sending packet: $vStopped#55...Packet received: T0006:f09ea5eeec7f0000;07:f09ea5eeec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2b;core:b;
 Sending packet: $vStopped#55...Packet received: OK

But then wait_one never consumes them, always hitting this path:

 4473          if (nfds == 0)
 4474            {
 4475              /* No waitable targets left.  All must be stopped.  */
 4476              return {NULL, minus_one_ptid, {TARGET_WAITKIND_NO_RESUMED}};
 4477            }

Resulting in GDB constantly calling target_stop to stop threads, but
the remote target never reporting back the stops to infrun.

That TARGET_WAITKIND_NO_RESUMED path shown above is always taken
because here, in wait_one too, just above:

 4428          for (inferior *inf : all_inferiors ())
 4429            {
 4430              process_stratum_target *target = inf->process_target ();
 4431              if (target == NULL
 4432                  || !target->is_async_p ()
                           ^^^^^^^^^^^^^^^^^^^^^
 4433                  || !target->threads_executing)
 4434                continue;

... the remote target is not async.

And in turn that happened because extended_remote_target::attach
misses enabling async in the target-non-stop path.
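
The filtering that leads to the "no waitable targets" result can be
modelled in a few lines.  target_sketch and count_waitable_targets
are hypothetical names; the real loop works on inferiors and their
process_stratum_target objects:

```cpp
#include <vector>

// Hypothetical, simplified view of a target as wait_one sees it.
struct target_sketch
{
  bool is_async;
  bool threads_executing;
};

// Count the targets that would be polled for events.  A target that
// is not async is skipped even if it still has threads executing.
int
count_waitable_targets (const std::vector<target_sketch> &targets)
{
  int n = 0;
  for (const target_sketch &t : targets)
    if (t.is_async && t.threads_executing)
      ++n;
  return n;
}
```

With a single remote target that is not async, the count is zero, so
the pending stop replies are never consumed; once async is enabled the
same target is counted and polled.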

A testcase exercising this will be added in a following patch.

gdb/ChangeLog:

	* remote.c (extended_remote_target::attach): Set target async in
	the target-non-stop path too.
pipcet pushed a commit that referenced this issue Mar 28, 2021
…PR gdb/27147)

PR 27147 shows that on sparc64, GDB is unable to properly unwind:

Expected result (from GDB 9.2):

    #0  0x0000000000108de4 in puts ()
    #1  0x0000000000100950 in hello () at gdb-test.c:4
    #2  0x0000000000100968 in main () at gdb-test.c:8

Actual result (from GDB latest git):

    #0  0x0000000000108de4 in puts ()
    #1  0x0000000000100950 in hello () at gdb-test.c:4
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The first failing commit is 5b6d1e4 ("Multi-target support").  The change
in behavior is caused by the following (thanks to Andrew Burgess for finding
this):

 - inferior_ptid is no longer set on entry of target_ops::wait, whereas
   it was set to something valid previously
 - deep down in linux_nat_target::wait (see stack trace below), we fetch
   the registers of the event thread
 - on sparc64, fetching registers involves reading memory (in
   sparc_supply_rwindow, see stack trace below)
 - reading memory (target_ops::xfer_partial) relies on inferior_ptid
   being set to the thread from which we want to read memory

This is where things go wrong:

    #0  linux_nat_target::xfer_partial (this=0x10000fa2c40 <the_sparc64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, annex=0x0, readbuf=0x7feffe3b000 "", writebuf=0x0, offset=8791798050744, len=8, xfered_len=0x7feffe3ae88) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3697
    #1  0x00000100007f5b10 in raw_memory_xfer_partial (ops=0x10000fa2c40 <the_sparc64_linux_nat_target>, readbuf=0x7feffe3b000 "", writebuf=0x0, memaddr=8791798050744, len=8, xfered_len=0x7feffe3ae88) at /home/simark/src/binutils-gdb/gdb/target.c:912
    #2  0x00000100007f60e8 in memory_xfer_partial_1 (ops=0x10000fa2c40 <the_sparc64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, readbuf=0x7feffe3b000 "", writebuf=0x0, memaddr=8791798050744, len=8, xfered_len=0x7feffe3ae88) at /home/simark/src/binutils-gdb/gdb/target.c:1043
    #3  0x00000100007f61b4 in memory_xfer_partial (ops=0x10000fa2c40 <the_sparc64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, readbuf=0x7feffe3b000 "", writebuf=0x0, memaddr=8791798050744, len=8, xfered_len=0x7feffe3ae88) at /home/simark/src/binutils-gdb/gdb/target.c:1072
    #4  0x00000100007f6538 in target_xfer_partial (ops=0x10000fa2c40 <the_sparc64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, annex=0x0, readbuf=0x7feffe3b000 "", writebuf=0x0, offset=8791798050744, len=8, xfered_len=0x7feffe3ae88) at /home/simark/src/binutils-gdb/gdb/target.c:1129
    #5  0x00000100007f7094 in target_read_partial (ops=0x10000fa2c40 <the_sparc64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, annex=0x0, buf=0x7feffe3b000 "", offset=8791798050744, len=8, xfered_len=0x7feffe3ae88) at /home/simark/src/binutils-gdb/gdb/target.c:1375
    #6  0x00000100007f721c in target_read (ops=0x10000fa2c40 <the_sparc64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, annex=0x0, buf=0x7feffe3b000 "", offset=8791798050744, len=8) at /home/simark/src/binutils-gdb/gdb/target.c:1415
    #7  0x00000100007f69d4 in target_read_memory (memaddr=8791798050744, myaddr=0x7feffe3b000 "", len=8) at /home/simark/src/binutils-gdb/gdb/target.c:1218
    #8  0x0000010000758520 in sparc_supply_rwindow (regcache=0x10000fea4f0, sp=8791798050736, regnum=-1) at /home/simark/src/binutils-gdb/gdb/sparc-tdep.c:1960
    #9  0x000001000076208c in sparc64_supply_gregset (gregmap=0x10000be3190 <sparc64_linux_ptrace_gregmap>, regcache=0x10000fea4f0, regnum=-1, gregs=0x7feffe3b230) at /home/simark/src/binutils-gdb/gdb/sparc64-tdep.c:1974
    #10 0x0000010000751b64 in sparc_fetch_inferior_registers (regcache=0x10000fea4f0, regnum=80) at /home/simark/src/binutils-gdb/gdb/sparc-nat.c:170
    #11 0x0000010000759d68 in sparc64_linux_nat_target::fetch_registers (this=0x10000fa2c40 <the_sparc64_linux_nat_target>, regcache=0x10000fea4f0, regnum=80) at /home/simark/src/binutils-gdb/gdb/sparc64-linux-nat.c:38
    #12 0x00000100008146ec in target_fetch_registers (regcache=0x10000fea4f0, regno=80) at /home/simark/src/binutils-gdb/gdb/target.c:3287
    #13 0x00000100006a8c5c in regcache::raw_update (this=0x10000fea4f0, regnum=80) at /home/simark/src/binutils-gdb/gdb/regcache.c:584
    #14 0x00000100006a8d94 in readable_regcache::raw_read (this=0x10000fea4f0, regnum=80, buf=0x7feffe3b7c0 "") at /home/simark/src/binutils-gdb/gdb/regcache.c:598
    #15 0x00000100006a93b8 in readable_regcache::cooked_read (this=0x10000fea4f0, regnum=80, buf=0x7feffe3b7c0 "") at /home/simark/src/binutils-gdb/gdb/regcache.c:690
    #16 0x00000100006b288c in readable_regcache::cooked_read<unsigned long, void> (this=0x10000fea4f0, regnum=80, val=0x7feffe3b948) at /home/simark/src/binutils-gdb/gdb/regcache.c:777
    #17 0x00000100006a9b44 in regcache_cooked_read_unsigned (regcache=0x10000fea4f0, regnum=80, val=0x7feffe3b948) at /home/simark/src/binutils-gdb/gdb/regcache.c:791
    #18 0x00000100006abf3c in regcache_read_pc (regcache=0x10000fea4f0) at /home/simark/src/binutils-gdb/gdb/regcache.c:1295
    #19 0x0000010000507920 in save_stop_reason (lp=0x10000fc5b10) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:2612
    #20 0x00000100005095a4 in linux_nat_filter_event (lwpid=520983, status=1407) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3050
    #21 0x0000010000509f9c in linux_nat_wait_1 (ptid=..., ourstatus=0x7feffe3c8f0, target_options=...) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3194
    #22 0x000001000050b1d0 in linux_nat_target::wait (this=0x10000fa2c40 <the_sparc64_linux_nat_target>, ptid=..., ourstatus=0x7feffe3c8f0, target_options=...) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3432
    #23 0x00000100007f8ac0 in target_wait (ptid=..., status=0x7feffe3c8f0, options=...) at /home/simark/src/binutils-gdb/gdb/target.c:2000
    #24 0x00000100004ac17c in do_target_wait_1 (inf=0x1000116d280, ptid=..., status=0x7feffe3c8f0, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3464
    #25 0x00000100004ac3b8 in operator() (__closure=0x7feffe3c678, inf=0x1000116d280) at /home/simark/src/binutils-gdb/gdb/infrun.c:3527
    #26 0x00000100004ac7cc in do_target_wait (wait_ptid=..., ecs=0x7feffe3c8c8, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3540
    #27 0x00000100004ad8c4 in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:3880
    #28 0x0000010000485568 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/simark/src/binutils-gdb/gdb/inf-loop.c:42
    #29 0x000001000050d394 in handle_target_event (error=0, client_data=0x0) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:4060
    #30 0x0000010000ab5c8c in handle_file_event (file_ptr=0x10001207270, ready_mask=1) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:575
    #31 0x0000010000ab6334 in gdb_wait_for_event (block=0) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:701
    #32 0x0000010000ab487c in gdb_do_one_event () at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:212
    #33 0x0000010000542668 in start_event_loop () at /home/simark/src/binutils-gdb/gdb/main.c:348
    #34 0x000001000054287c in captured_command_loop () at /home/simark/src/binutils-gdb/gdb/main.c:408
    #35 0x0000010000544e84 in captured_main (data=0x7feffe3d188) at /home/simark/src/binutils-gdb/gdb/main.c:1242
    #36 0x0000010000544f2c in gdb_main (args=0x7feffe3d188) at /home/simark/src/binutils-gdb/gdb/main.c:1257
    #37 0x00000100000c1f14 in main (argc=4, argv=0x7feffe3d548) at /home/simark/src/binutils-gdb/gdb/gdb.c:32

There is a target_read_memory call in sparc_supply_rwindow, whose return
value is not checked.  That call fails, because inferior_ptid does not
contain a valid ptid, and uninitialized buffer contents are used.
Ultimately it results in a corrupt stop_pc.

target_ops::fetch_registers can be (and should remain, in my opinion)
independent of inferior_ptid, because the ptid of the thread from which
to fetch registers can be obtained from the regcache.  In other words,
implementations of target_ops::fetch_registers should not rely on
inferior_ptid having a sensible value on entry.

The sparc64_linux_nat_target::fetch_registers case is special, because it calls
a target method that is dependent on the inferior_ptid value
(target_read_memory, and ultimately target_ops::xfer_partial).  So I would
say it's the responsibility of sparc64_linux_nat_target::fetch_registers to set
up inferior_ptid correctly prior to calling target_read_memory.

This patch makes sparc64_linux_nat_target::fetch_registers (and
store_registers, since it works the same) temporarily set inferior_ptid.  If we
ever make target_ops::xfer_partial independent of inferior_ptid, setting
inferior_ptid won't be necessary, we'll simply pass down the ptid as a
parameter in some way.
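
The "temporarily set, then restore" pattern described above can be sketched
with a small RAII guard.  This is a self-contained illustration, not GDB's
code: ptid_model, current_ptid_model, scoped_ptid_switch, read_memory_model
and fetch_registers_model are hypothetical stand-ins for ptid_t,
inferior_ptid, and a register fetch that reads memory through the global.

```cpp
// Hypothetical stand-in for ptid_t; a pid of 0 plays the role of null_ptid.
struct ptid_model
{
  long pid;
};

// Hypothetical stand-in for the inferior_ptid global.
static ptid_model current_ptid_model = {0};

// RAII guard: set the global on construction, restore it on destruction.
class scoped_ptid_switch
{
public:
  explicit scoped_ptid_switch (ptid_model new_ptid)
    : m_saved (current_ptid_model)
  {
    current_ptid_model = new_ptid;
  }

  ~scoped_ptid_switch () { current_ptid_model = m_saved; }

  scoped_ptid_switch (const scoped_ptid_switch &) = delete;
  scoped_ptid_switch &operator= (const scoped_ptid_switch &) = delete;

private:
  ptid_model m_saved;
};

// Models a memory read that only works when the global names a thread.
// Returns false on failure, so callers can check instead of using an
// uninitialized buffer.
static bool
read_memory_model (long *out)
{
  if (current_ptid_model.pid == 0)
    return false;

  *out = current_ptid_model.pid * 10;
  return true;
}

// Models fetch_registers: take the ptid from the regcache (here, a
// parameter), switch the global for the duration of the read, and
// propagate the result of the read to the caller.
static bool
fetch_registers_model (ptid_model regcache_ptid, long *out)
{
  scoped_ptid_switch guard (regcache_ptid);
  return read_memory_model (out);
}
```

The guard restores the previous value on every exit path, so the caller's
global state is unchanged after the fetch.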

I chose to set/restore inferior_ptid in sparc_fetch_inferior_registers, because
I am not convinced that doing so in an inner location (in sparc_supply_rwindow
for instance) would always be correct.  We have access to the ptid in
sparc_supply_rwindow (from the regcache), so we _could_ set inferior_ptid
there.  However, I don't want to just set inferior_ptid, as that would leave it
out of sync with `current_thread ()` and `current_inferior ()`.  It's
preferable to use switch_to_thread instead, as that switches all the global
"current" stuff in a coherent way.  But doing so requires a `thread_info *`,
and getting a `thread_info *` from a ptid requires a `process_stratum_target
*`.  We could use `current_inferior()->process_target()` in
sparc_supply_rwindow for this (using target_read_memory uses the current
inferior's target stack anyway).  However, sparc_supply_rwindow is also used in
the context of BSD uthreads, where a thread stratum target defines threads.  I
presume the ptid in the regcache would be the ptid of the uthread, defined by
the thread stratum target (bsd_uthread_target).  Using
`current_inferior()->process_target()` would look up a ptid defined by the
thread stratum target using the process stratum target.  I don't think it would
give good results.  So I prefer playing it safe and looking up the thread
earlier, in sparc_fetch_inferior_registers.

I added some assertions (in sparc_supply_rwindow and others) to verify
that the regcache's ptid matches inferior_ptid.  That verifies that the
caller has properly set the correct global context.  This would have
caught (through a failed assertion) the current problem.

gdb/ChangeLog:

	PR gdb/27147
	* sparc-nat.h (sparc_fetch_inferior_registers,
	sparc_store_inferior_registers): Add process_stratum_target
	parameter, update callers.
	* sparc-nat.c (sparc_fetch_inferior_registers,
	sparc_store_inferior_registers): Add process_stratum_target
	parameter.  Switch current thread before calling
	sparc_supply_gregset / sparc_collect_rwindow.
	(sparc_store_inferior_registers): Likewise.
	* sparc-obsd-tdep.c (sparc32obsd_supply_uthread): Add assertion.
	(sparc32obsd_collect_uthread): Likewise.
	* sparc-tdep.c (sparc_supply_rwindow, sparc_collect_rwindow):
	Add assertion.
	* sparc64-obsd-tdep.c (sparc64obsd_collect_uthread,
	sparc64obsd_supply_uthread): Add assertion.

Change-Id: I16c658cd70896cea604516714f7e2428fbaf4301
pipcet pushed a commit that referenced this issue Mar 28, 2021
Running gdb-sigterm.exp against gdbserver with "maint set target-non-stop
on", runs into this:

  [infrun] fetch_inferior_event: exit
  [infrun] fetch_inferior_event: enter
  /home/pedro/gdb/binutils-gdb/src/gdb/thread.c:72: internal-error: thread_info* inferior_thread(): Assertion `current_thread_ != nullptr' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.

  This is a bug, please report it.  For instructions, see:
  <https://www.gnu.org/software/gdb/bugs/>.

  FAIL: gdb.base/gdb-sigterm.exp: expect eof #2 (GDB internal error)
  Resyncing due to internal error.
  ERROR: : spawn id exp9 not open
      while executing
  "expect {
  -i exp9 -timeout 10
	      -re "Quit this debugging session\\? \\(y or n\\) $" {
		  send_gdb "n\n" answer
		  incr count
	      }
	      -re "Create ..."
      ("uplevel" body line 1)
      invoked from within
  "uplevel $body" NONE : spawn id exp9 not open
  ERROR: Could not resync from internal error (timeout)
  gdb.base/gdb-sigterm.exp: expect eof #2: stepped 0 times
  UNRESOLVED: gdb.base/gdb-sigterm.exp: 50 SIGTERM passes

The assertion fails here:

  ...
  #5  0x000055af4b4a7164 in internal_error (file=0x55af4b5e5de8 "/home/pedro/gdb/binutils-gdb/src/gdb/thread.c", line=72, fmt=0x55af4b5e5ce9 "%s: Assertion `%s' failed.") at /home/pedro/gdb/binutils-gdb/src/gdbsupport/errors.cc:55
  #6  0x000055af4b25fc43 in inferior_thread () at /home/pedro/gdb/binutils-gdb/src/gdb/thread.c:72
  #7  0x000055af4b26177e in any_thread_of_inferior (inf=0x55af4cf874f0) at /home/pedro/gdb/binutils-gdb/src/gdb/thread.c:638
  #8  0x000055af4b26eec8 in kill_or_detach (inf=0x55af4cf874f0, from_tty=0) at /home/pedro/gdb/binutils-gdb/src/gdb/top.c:1665
  #9  0x000055af4b26f37f in quit_force (exit_arg=0x0, from_tty=0) at /home/pedro/gdb/binutils-gdb/src/gdb/top.c:1767
  #10 0x000055af4b2f72a7 in quit () at /home/pedro/gdb/binutils-gdb/src/gdb/utils.c:633
  #11 0x000055af4b2f730b in maybe_quit () at /home/pedro/gdb/binutils-gdb/src/gdb/utils.c:657
  #12 0x000055af4b1adb74 in ser_base_wait_for (scb=0x55af4d02e460, timeout=0) at /home/pedro/gdb/binutils-gdb/src/gdb/ser-base.c:236
  #13 0x000055af4b1adf0f in do_ser_base_readchar (scb=0x55af4d02e460, timeout=0) at /home/pedro/gdb/binutils-gdb/src/gdb/ser-base.c:365
  #14 0x000055af4b1ae06d in generic_readchar (scb=0x55af4d02e460, timeout=0, do_readchar=0x55af4b1adeb1 <do_ser_base_readchar(serial*, int)>) at /home/pedro/gdb/binutils-gdb/src/gdb/ser-base.c:444
  ...

The bug is that any_thread_of_inferior incorrectly assumes that
there's always a selected thread.  This fixes it.
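
The shape of the fix can be sketched as follows.  This is a toy model under
stated assumptions, not the actual thread.c code: thread_model,
inferior_model, selected_thread_model and any_thread_of_inferior_model are
hypothetical names, and a null pointer stands for "no selected thread".

```cpp
#include <vector>

// Hypothetical stand-in for thread_info.
struct thread_model
{
  int inferior_num;
  int id;
};

// Hypothetical stand-in for an inferior and its thread list.
struct inferior_model
{
  int num;
  std::vector<thread_model> threads;
};

// Hypothetical stand-in for the selected thread; null means none.
static thread_model *selected_thread_model = nullptr;

// Return some thread of INF, preferring the selected thread when it
// belongs to INF.  Returns null if INF has no threads.
static thread_model *
any_thread_of_inferior_model (inferior_model &inf)
{
  // Check that a thread is selected before looking at it, instead of
  // assuming one always exists.
  if (selected_thread_model != nullptr
      && selected_thread_model->inferior_num == inf.num)
    return selected_thread_model;

  if (!inf.threads.empty ())
    return &inf.threads.front ();

  return nullptr;
}
```

With no thread selected, the function falls through to the inferior's own
thread list rather than tripping an assertion.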

gdb/ChangeLog:

	* thread.c (any_thread_of_inferior): Check if there's a selected
	thread before calling inferior_thread().

Change-Id: Ica4b9ec746121a7a7c22bef09baea72103b3853d
pipcet pushed a commit that referenced this issue Mar 28, 2021
When testing with "maint set target-non-stop on",
gdb.server/bkpt-other-inferior.exp sometimes fails like so:

 (gdb) inferior 2
 [Switching to inferior 2 [process 368191] (<noexec>)]
 [Switching to thread 2.1 (Thread 368191.368191)]
 [remote] Sending packet: $m7ffff7fd0100,1#5b
 [remote] Packet received: 48
 [remote] Sending packet: $m7ffff7fd0100,1#5b
 [remote] Packet received: 48
 [remote] Sending packet: $m7ffff7fd0100,9#63
 [remote] Packet received: 4889e7e8e80c000049
 #0  0x00007ffff7fd0100 in ?? ()
 (gdb) PASS: gdb.server/bkpt-other-inferior.exp: inf 2: switch to inferior
 break -q main
 Breakpoint 2 at 0x1138: file /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.server/server.c, line 21.
 (gdb) PASS: gdb.server/bkpt-other-inferior.exp: inf 2: set breakpoint
 delete breakpoints
 Delete all breakpoints? (y or n) y
 (gdb) [remote] wait: enter
 [remote] wait: exit
 FAIL: gdb.server/bkpt-other-inferior.exp: inf 2: delete all breakpoints in delete_breakpoints (timeout)
 ERROR: breakpoints not deleted
 Remote debugging from host ::1, port 55876
 monitor exit

The problem is here:

 (gdb) [remote] wait: enter

The testcase isn't expecting any output after the prompt.

Why is that "[remote] wait" output?  What happens is that "delete
breakpoints" queries the user, and `query` disables/reenables target
async, which results in the remote target's async event handler ending
up marked:

 (top-gdb) bt
 #0  mark_async_event_handler (async_handler_ptr=0x556bffffffff) at ../../src/gdb/async-event.c:295
 #1  0x0000556bf71b711f in infrun_async (enable=1) at ../../src/gdb/infrun.c:119
 #2  0x0000556bf7471387 in target_async (enable=1) at ../../src/gdb/target.c:3684
 #3  0x0000556bf748a0bd in gdb_readline_wrapper_cleanup::~gdb_readline_wrapper_cleanup (this=0x7ffe3cf30eb0, __in_chrg=<optimized out>) at ../../src/gdb/top.c:1074
 #4  0x0000556bf74874e2 in gdb_readline_wrapper (prompt=0x556bfa17da60 "Delete all breakpoints? (y or n) ") at ../../src/gdb/top.c:1096
 #5  0x0000556bf75111c5 in defaulted_query(const char *, char, typedef __va_list_tag __va_list_tag *) (ctlstr=0x556bf7717f34 "Delete all breakpoints? ", defchar=0 '\000', args=0x7ffe3cf31020) at ../../src/gdb/utils.c:893
 #6  0x0000556bf751166f in query (ctlstr=0x556bf7717f34 "Delete all breakpoints? ") at ../../src/gdb/utils.c:985
 #7  0x0000556bf6f11404 in delete_command (arg=0x0, from_tty=1) at ../../src/gdb/breakpoint.c:13500
 ...

... which then later results in a target_wait call:

 (top-gdb) bt
 #0  remote_target::wait_ns (this=0x7ffe3cf30f80, ptid=..., status=0xde530314f0802800, options=...) at ../../src/gdb/remote.c:7937
 #1  0x0000556bf7369dcb in remote_target::wait (this=0x556bfa0b2180, ptid=..., status=0x7ffe3cf31568, options=...) at ../../src/gdb/remote.c:8173
 #2  0x0000556bf745e527 in target_wait (ptid=..., status=0x7ffe3cf31568, options=...) at ../../src/gdb/target.c:2000
 #3  0x0000556bf71be686 in do_target_wait_1 (inf=0x556bfa1573d0, ptid=..., status=0x7ffe3cf31568, options=...) at ../../src/gdb/infrun.c:3463
 #4  0x0000556bf71be88b in <lambda(inferior*)>::operator()(inferior *) const (__closure=0x7ffe3cf31320, inf=0x556bfa1573d0) at ../../src/gdb/infrun.c:3526
 #5  0x0000556bf71bebcd in do_target_wait (wait_ptid=..., ecs=0x7ffe3cf31540, options=...) at ../../src/gdb/infrun.c:3539
 #6  0x0000556bf71bf97b in fetch_inferior_event () at ../../src/gdb/infrun.c:3879
 #7  0x0000556bf71a27f8 in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:42
 #8  0x0000556bf71cc8b7 in infrun_async_inferior_event_handler (data=0x0) at ../../src/gdb/infrun.c:9220
 #9  0x0000556bf6ecb80f in check_async_event_handlers () at ../../src/gdb/async-event.c:327
 #10 0x0000556bf76b011a in gdb_do_one_event () at ../../src/gdbsupport/event-loop.cc:216
 ...

... which returns TARGET_WAITKIND_IGNORE.

Fix this by only enabling remote output around setting the breakpoint.

gdb/testsuite/ChangeLog:

	* gdb.server/bkpt-other-inferior.exp: Only enable remote output
	around setting the breakpoint.

Change-Id: I2fd152fd9c46b1c5e7fa678cc4d4054dac0b2bd4
pipcet pushed a commit that referenced this issue Jun 17, 2021
When building with AddressSanitizer, sim/m32c fails with:

./opc2c -l r8c.out /home/simark/src/binutils-gdb/sim/m32c/r8c.opc > r8c.c
sim_log: r8c.out

=================================================================
==3919390==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 4 byte(s) in 1 object(s) allocated from:
        #0 0x7ffff7677459 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:145
        #1 0x55555555b3df in main /home/simark/src/binutils-gdb/sim/m32c/opc2c.c:658
        #2 0x7ffff741fb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Fix the leak in main by removing the vlist variable, which seems unused.
pipcet pushed a commit that referenced this issue Jun 17, 2021
This commit replaces this patch:

  https://sourceware.org/pipermail/gdb-patches/2021-January/174933.html

which was itself a replacement for this patch:

  https://sourceware.org/pipermail/gdb-patches/2020-July/170335.html

The motivation behind the original patch can be seen in the new test,
which currently gives a GDB session like this:

  (gdb) ptype var8
  type = Type type6
      PTR TO -> ( Type type2 :: ptr_1 )
      PTR TO -> ( Type type2 :: ptr_2 )
  End Type type6
  (gdb) ptype var8%ptr_2
  type = PTR TO -> ( Type type2
      integer(kind=4) :: spacer
      Type type1, allocatable :: t2_array(:)	<------ Issue #1
  End Type type2 )
  (gdb) ptype var8%ptr_2%t2_array
  Cannot access memory at address 0x38		<------ Issue #2
  (gdb)

Issue #1: Here we see the abstract dynamic type, rather than the
resolved concrete type.  Though in some cases the user might be
interested in the abstract dynamic type, I think that in most cases
showing the resolved concrete type will be of more use.  Plus, the
user can always figure out the dynamic type (by source code inspection
if nothing else) given the concrete type, but it is much harder to
figure out the concrete type given only the dynamic type.

Issue #2: In this example, GDB evaluates the expression in
EVAL_AVOID_SIDE_EFFECTS mode (due to ptype).  The value returned for
var8%ptr_2 will be a non-lazy, zero value of the correct dynamic
type.  However, when GDB asks about the type of t2_array this requires
GDB to access the value of var8%ptr_2 in order to read the dynamic
properties.  As this value was forced to zero (thanks to the use of
EVAL_AVOID_SIDE_EFFECTS) then GDB ends up accessing memory at a base
of zero plus some offset.

Both this patch, and my previous two attempts, have all tried to
resolve this problem by stopping EVAL_AVOID_SIDE_EFFECTS replacing the
result value with a zero value in some cases.

This new patch is influenced by how Ada handles its tagged types.
There are plenty of examples in ada-lang.c, but one specific case is
ada_structop_operation::evaluate.  When GDB spots that we are dealing
with a tagged (dynamic) type, and we're in EVAL_AVOID_SIDE_EFFECTS
mode, then GDB re-evaluates the child operation in EVAL_NORMAL mode.

This commit handles two cases like this specifically for Fortran, a
new fortran_structop_operation, and the already existing
fortran_undetermined, which is where we handle array accesses.

In these two locations we spot when we are dealing with a dynamic type
and re-evaluate the child operation in EVAL_NORMAL mode so that we
are able to access the dynamic properties of the type.
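
The re-evaluation idea can be sketched with a toy evaluator.  Everything here
is hypothetical (eval_mode_model, value_model, evaluate_child_model,
evaluate_structop_model) and only models the control flow described above,
not GDB's expression classes.

```cpp
// Hypothetical stand-in for enum noside.
enum eval_mode_model
{
  MODEL_EVAL_NORMAL,
  MODEL_EVAL_AVOID_SIDE_EFFECTS
};

// Hypothetical stand-in for struct value.
struct value_model
{
  bool type_is_dynamic;      // Resolving the type needs the object's address.
  bool is_zero_placeholder;  // True for the fake zero value.
  unsigned long address;
};

// Child operation: in AVOID_SIDE_EFFECTS mode it returns a zero
// placeholder of the right type; in NORMAL mode it returns the real value.
static value_model
evaluate_child_model (eval_mode_model mode, unsigned long real_address,
                      bool dynamic)
{
  if (mode == MODEL_EVAL_AVOID_SIDE_EFFECTS)
    return value_model{dynamic, true, 0};

  return value_model{dynamic, false, real_address};
}

// Parent operation: evaluate the child in the requested mode, and if the
// result has a dynamic type while side effects are being avoided,
// evaluate the child again in NORMAL mode to obtain a usable address.
static value_model
evaluate_structop_model (eval_mode_model mode, unsigned long real_address,
                         bool dynamic)
{
  value_model child = evaluate_child_model (mode, real_address, dynamic);

  if (mode == MODEL_EVAL_AVOID_SIDE_EFFECTS && child.type_is_dynamic)
    child = evaluate_child_model (MODEL_EVAL_NORMAL, real_address, dynamic);

  return child;
}
```

Only dynamic types pay for the second evaluation; everything else keeps the
zero placeholder, so the evaluator's existing assumptions still hold.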

The rest of this commit message is my attempt to record why my
previous patches failed.

To understand my second patch, and why it failed, let's consider two
expressions, this Fortran expression:

  (gdb) ptype var8%ptr_2%t2_array	--<A>
  Operation: STRUCTOP_STRUCT		--(1)
   Operation: STRUCTOP_STRUCT		--(2)
    Operation: OP_VAR_VALUE		--(3)
     Symbol: var8
     Block: 0x3980ac0
    String: ptr_2
   String: t2_array

And this C expression:

  (gdb) ptype ptr && ptr->a == 3	--<B>
  Operation: BINOP_LOGICAL_AND		--(4)
   Operation: OP_VAR_VALUE		--(5)
    Symbol: ptr
    Block: 0x45a2a00
   Operation: BINOP_EQUAL		--(6)
    Operation: STRUCTOP_PTR		--(7)
     Operation: OP_VAR_VALUE		--(8)
      Symbol: ptr
      Block: 0x45a2a00
     String: a
    Operation: OP_LONG			--(9)
     Type: int
     Constant: 0x0000000000000003

In expression <A> we should assume that t2_array is of dynamic type.
Nothing has dynamic type in expression <B>.

This is how GDB currently handles expression <A>.  In all cases,
EVAL_AVOID_SIDE_EFFECTS or EVAL_NORMAL, an OP_VAR_VALUE operation
always returns the real value of the symbol; this is not forced to a
zero value even in EVAL_AVOID_SIDE_EFFECTS mode.  This means that (3),
(5), and (8) will always return a real lazy value for the symbol.

However a STRUCTOP_STRUCT will always replace its result with a
non-lazy, zero value with the same type as its result.  So (2) will
lookup the field ptr_2 and create a zero value with that type.  In
this case the type is a pointer to a dynamic type.

Then, when we evaluate (1) to figure out the resolved type of
t2_array, we need to read the type's dynamic properties.  These
properties are stored in memory relative to the object's base address,
and the base address is in var8%ptr_2, which we already figured out
has the value zero.  GDB then evaluates the DWARF expressions that
take the base address, add an offset and dereference.  GDB then ends
up trying to access addresses like 0x16, 0x8, etc.

To fix this, I proposed changing STRUCTOP_STRUCT so that instead of
returning a zero value we instead returned the actual value
representing the structure's field in the target.  My thinking was
that GDB would not try to access the value's contents unless it needed
it to resolve a dynamic type.  This belief was incorrect.

Consider expression <B>.  We already know that (5) and (8) will return
real values for the symbols being referenced.  The BINOP_LOGICAL_AND
operation (4) will evaluate both of its children in
EVAL_AVOID_SIDE_EFFECTS mode in order to get the types; this is required
for C++ operator lookup.  This means that even if the value of (5)
would result in the BINOP_LOGICAL_AND returning false (say, ptr is
NULL), we still evaluate (6) in EVAL_AVOID_SIDE_EFFECTS mode.

Operation (6) will evaluate both children in EVAL_AVOID_SIDE_EFFECTS
mode.  Operation (9) is easy, it just returns a value with the constant
packed into it, but (7) is where the problem lies.  Currently in GDB
this STRUCTOP_PTR will always return a non-lazy zero value of the
correct type.

When the results of (7) and (9) are back in the BINOP_EQUAL
operation (6), the two values are passed to value_equal which performs
the comparison and returns a result.  Note, the two things compared
here are the immediate value (9), and a non-lazy zero value from (7).

However, with my proposed patch operation (7) no longer returns a zero
value, instead it returns a lazy value representing the actual value
in target memory.  When we call value_equal in (6) this code causes
GDB to try and fetch the actual value from target memory.  If `ptr` is
NULL then this will cause GDB to access some invalid address at an
offset from zero.  This will most likely fail, and cause GDB to throw
an error instead of returning the expected type.

And so, we can now describe the problem that we're facing.  The way
GDB's expression evaluator is currently written we assume, when in
EVAL_AVOID_SIDE_EFFECTS mode, that any value returned from a child
operation can safely have its content read without throwing an
error.  If child operations start returning real values (instead of
the fake zero values), then this is simply not true.

If we wanted to work around this then we would need to rewrite almost
all operations (I would guess) so that EVAL_AVOID_SIDE_EFFECTS mode
does not cause evaluation of an operation to try and read the value of
a child operation.  As an example, consider this current GDB code from
eval.c:

  struct value *
  eval_op_equal (struct type *expect_type, struct expression *exp,
  	       enum noside noside, enum exp_opcode op,
  	       struct value *arg1, struct value *arg2)
  {
    if (binop_user_defined_p (op, arg1, arg2))
      {
        return value_x_binop (arg1, arg2, op, OP_NULL, noside);
      }
    else
      {
        binop_promote (exp->language_defn, exp->gdbarch, &arg1, &arg2);
        int tem = value_equal (arg1, arg2);
        struct type *type = language_bool_type (exp->language_defn,
  					      exp->gdbarch);
        return value_from_longest (type, (LONGEST) tem);
      }
  }

We could change this function to be this:

  struct value *
  eval_op_equal (struct type *expect_type, struct expression *exp,
  	       enum noside noside, enum exp_opcode op,
  	       struct value *arg1, struct value *arg2)
  {
    if (binop_user_defined_p (op, arg1, arg2))
      {
        return value_x_binop (arg1, arg2, op, OP_NULL, noside);
      }
    else
      {
        struct type *type = language_bool_type (exp->language_defn,
  					      exp->gdbarch);
        if (noside == EVAL_AVOID_SIDE_EFFECTS)
  	  return value_zero (type, VALUE_LVAL (arg1));
        else
  	{
  	  binop_promote (exp->language_defn, exp->gdbarch, &arg1, &arg2);
  	  int tem = value_equal (arg1, arg2);
  	  return value_from_longest (type, (LONGEST) tem);
  	}
      }
  }

Now we don't call value_equal unless we really need to.  However, we
would need to make the same, or similar change to almost all
operations, which would be a big task, and might not be a direction we
wanted to take GDB in.

So, for now, I'm proposing we go with the more targeted, Fortran
specific solution, that does the minimal required in order to
correctly resolve the dynamic types.

gdb/ChangeLog:

	* f-exp.h (class fortran_structop_operation): New class.
	* f-exp.y (exp): Create fortran_structop_operation instead of the
	generic structop_operation.
	* f-lang.c (fortran_undetermined::evaluate): Re-evaluate
	expression as EVAL_NORMAL if the result type was dynamic so we can
	extract the actual array bounds.
	(fortran_structop_operation::evaluate): New function.

gdb/testsuite/ChangeLog:

	* gdb.fortran/dynamic-ptype-whatis.exp: New file.
	* gdb.fortran/dynamic-ptype-whatis.f90: New file.
pipcet pushed a commit that referenced this issue Jun 17, 2021
While working on some changes to 'info sources' I ran into a situation
where I was seeing the same source files reported twice in the output
of the 'info sources' command when using either the .gdb_index or the
.debug_names index.

I traced the problem back to some caching in
dwarf2_base_index_functions::map_symbol_filenames; when called, GDB
caches the set of filenames, but filenames are not removed as the
index entries are expanded into full symtabs.  As a result we can end
up seeing filenames reported both from a full symtab _and_ from
a (stale) previously cached index entry.

Now, obviously, when seeing a problem like this the "correct" fix is
to remove the stale entries from the cache, however, I ran a few
experiments to see why this wasn't really hitting us anywhere, and, as
far as I can tell, ::map_symbol_filenames is only called from three
places:

  1. The MI command -file-list-exec-source-files,
  2. The 'info sources' command, and
  3. Filename completion

However, the result of this "bug" is that we will see duplicate
filenames, and readline's completion mechanism already removes
duplicates, so for case #3 we will never see any problems.

Cases #1 and #2 are basically the same, and in each case, to see a
problem we need to craft the test in a particular way: start up
ensuring we have some unexpanded symtabs, then run one of the
commands to populate the cache, then expand one of the symtabs, and
list the sources again.  At this point you'll see duplicate entries in
the results.  Hardly surprising we haven't randomly hit this situation
in testing.

So, considering that use cases #1 and #2 are certainly not "high
performance" code (i.e. I don't think these justify the need for
caching) this leaves use case #3.  Does this use justify the need for
caching?  Well the psymbol_functions::map_symbol_filenames function
doesn't seem to do any extra caching, and within
dwarf2_base_index_functions::map_symbol_filenames, the only expensive
bit appears to be the call to dw2_get_file_names, and this already
does its own caching via this_cu->v.quick->file_names.

The upshot of all this analysis was that I'm not convinced the need
for the additional caching is justified, and so, I propose that to fix
the bug in GDB, I just remove the extra caching (for now).

If we later find that the caching _was_ useful, then we can
reintroduce it, but add it back such that it doesn't reintroduce this
bug.

As I was changing dwarf2_base_index_functions::map_symbol_filenames I
replaced the use of htab_up with std::unordered_set.
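
The idea of using a function-local set instead of a long-lived cache can be
sketched like this.  It is only an illustration: index_only_filenames_model
is a hypothetical helper, and it assumes the function's job is to report
filenames from unexpanded index entries while skipping names already covered
by expanded symtabs.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Return the filenames that should be reported from the index: those of
// unexpanded entries that are neither duplicates nor already covered by
// an expanded symtab.  The set lives only for this call, so nothing can
// go stale when more symtabs are expanded later.
static std::vector<std::string>
index_only_filenames_model (const std::vector<std::string> &expanded,
                            const std::vector<std::string> &unexpanded)
{
  // Seed the set with names that the expanded symtabs already provide.
  std::unordered_set<std::string> seen (expanded.begin (), expanded.end ());

  std::vector<std::string> result;
  for (const std::string &name : unexpanded)
    {
      // insert () reports whether the name was new to the set.
      if (seen.insert (name).second)
        result.push_back (name);
    }

  return result;
}
```

Because the set is rebuilt from the current state on each call, expanding a
symtab between two calls cannot leave a stale entry behind.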

Tested using target_boards cc-with-debug-names and dwarf4-gdb-index.

gdb/ChangeLog:

	* dwarf2/read.c: Add 'unordered_set' include.
	(dwarf2_base_index_functions::map_symbol_filenames): Replace
	'visited' hash table with 'qfn_cache' unordered_set.  Remove use
	of per_bfd->filenames_cache cache, and use function local
	filenames_cache instead.  Reindent.
	* dwarf2/read.h (struct dwarf2_per_bfd) <filenames_cache>: Delete.

gdb/testsuite/ChangeLog:

	* gdb.base/info_sources.exp: Add new tests.
pipcet pushed a commit that referenced this issue Jun 17, 2021
When loading the debug info package
libLLVM.so.10-10.0.1-lp152.30.4.x86_64.debug from openSUSE Leap 15.2, we
run into a dwarf error:
...
$ gdb -q -batch libLLVM.so.10-10.0.1-lp152.30.4.x86_64.debug
Dwarf Error: Cannot not find DIE at 0x18a936e7 \
  [from module libLLVM.so.10-10.0.1-lp152.30.4.x86_64.debug]
...
The DIE @ 0x18a936e7 does in fact exist, and is part of a CU @ 0x18a23e52.
No error message is printed when using -readnow.

What happens is the following:
- a dwarf2_per_cu_data P is created for the CU.
- a dwarf2_cu A is created for the same CU.
- another dwarf2_cu B is created for the same CU.
- the dwarf2_cu B is set in per_objfile->m_dwarf2_cus, such that
  per_objfile->get_cu (P) returns B.
- P->load_all_dies is set to 1.
- all dies are read into the A->partial_dies htab
- dwarf2_cu A is destroyed.
- we try to find the partial_die for the DIE @ 0x18a936e7 in B->partial_dies.
  We can't find it, but do not try to load all dies, because P->load_all_dies
  is already set to 1.
- an error message is generated.
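
The sequence above can be reproduced with a toy model in which the "all DIEs
loaded" flag lives on the shared per-CU object while the DIE table lives on
each CU object.  The names per_cu_model, cu_model, load_all_dies_model and
find_partial_die_model are hypothetical; this is a sketch of the failure
mode, not the reader's real data structures.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for dwarf2_per_cu_data (P in the list above).
struct per_cu_model
{
  bool load_all_dies = false;
};

// Hypothetical stand-in for dwarf2_cu (A and B in the list above).
struct cu_model
{
  per_cu_model *per_cu;
  std::unordered_set<unsigned long> partial_dies;
};

// Read every DIE offset into CU's own table and record that fact on the
// shared per-CU object.
static void
load_all_dies_model (cu_model &cu, const std::vector<unsigned long> &offsets)
{
  cu.per_cu->load_all_dies = true;
  cu.partial_dies.insert (offsets.begin (), offsets.end ());
}

// Look up OFFSET in CU's table.  On a miss, reload only if the shared
// flag says the DIEs were never fully loaded; otherwise report failure.
static bool
find_partial_die_model (cu_model &cu, unsigned long offset,
                        const std::vector<unsigned long> &offsets)
{
  if (cu.partial_dies.count (offset) != 0)
    return true;

  if (!cu.per_cu->load_all_dies)
    {
      load_all_dies_model (cu, offsets);
      return cu.partial_dies.count (offset) != 0;
    }

  // The flag claims everything is loaded, but it was loaded into a
  // different cu_model: this is the error case.
  return false;
}
```

In this model the lookup through the second CU object fails even though the
DIE exists, because the flag and the table it describes are owned by
different objects.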

The question is why we're creating dwarf2_cu A and B for the same CU.

The dwarf2_cu A is created here:
...
 (gdb) bt
 #0  dwarf2_cu::dwarf2_cu (this=0x79a9660, per_cu=0x23c0b30,
     per_objfile=0x1ad01b0) at dwarf2/cu.c:38
 #1  0x0000000000675799 in cutu_reader::cutu_reader (this=0x7fffffffd040,
     this_cu=0x23c0b30, per_objfile=0x1ad01b0, abbrev_table=0x0,
     existing_cu=0x0, skip_partial=false) at dwarf2/read.c:6487
 #2  0x0000000000676eb3 in process_psymtab_comp_unit (this_cu=0x23c0b30,
      per_objfile=0x1ad01b0, want_partial_unit=false,
      pretend_language=language_minimal) at dwarf2/read.c:7028
...

And the dwarf2_cu B is created here:
...
 (gdb) bt
 #0  dwarf2_cu::dwarf2_cu (this=0x885e8c0, per_cu=0x23c0b30,
     per_objfile=0x1ad01b0) at dwarf2/cu.c:38
 #1  0x0000000000675799 in cutu_reader::cutu_reader (this=0x7fffffffcc50,
     this_cu=0x23c0b30, per_objfile=0x1ad01b0, abbrev_table=0x0,
     existing_cu=0x0, skip_partial=false) at dwarf2/read.c:6487
 #2  0x0000000000678118 in load_partial_comp_unit (this_cu=0x23c0b30,
     per_objfile=0x1ad01b0, existing_cu=0x0) at dwarf2/read.c:7436
 #3  0x000000000069721d in find_partial_die (sect_off=(unknown: 0x18a55054),
     offset_in_dwz=0, cu=0x0) at dwarf2/read.c:19391
 #4  0x000000000069755b in partial_die_info::fixup (this=0x9096900,
     cu=0xa6a85f0) at dwarf2/read.c:19512
 #5  0x0000000000697586 in partial_die_info::fixup (this=0x8629bb0,
     cu=0xa6a85f0) at dwarf2/read.c:19516
 #6  0x00000000006787b1 in scan_partial_symbols (first_die=0x8629b40,
     lowpc=0x7fffffffcf58, highpc=0x7fffffffcf50, set_addrmap=0, cu=0x79a9660)
     at dwarf2/read.c:7563
 #7  0x0000000000678878 in scan_partial_symbols (first_die=0x796ebf0,
     lowpc=0x7fffffffcf58, highpc=0x7fffffffcf50, set_addrmap=0, cu=0x79a9660)
     at dwarf2/read.c:7580
 #8  0x0000000000676b82 in process_psymtab_comp_unit_reader
     (reader=0x7fffffffd040, info_ptr=0x7fffc1b3f29b, comp_unit_die=0x6ea90f0,
     pretend_language=language_minimal) at dwarf2/read.c:6954
 #9  0x0000000000676ffd in process_psymtab_comp_unit (this_cu=0x23c0b30,
     per_objfile=0x1ad01b0, want_partial_unit=false,
     pretend_language=language_minimal) at dwarf2/read.c:7057
...

So in frame #9, a cutu_reader is created with dwarf2_cu A.  Then a fixup takes
us to the following CU @ 0x18aa33d6, in frame #5.  And a similar fixup in
frame #4 takes us back to CU @ 0x18a23e52.  At that point, there's no
information available that we're already trying to read that CU, and we end up
creating another cutu_reader with dwarf2_cu B.

It seems that there are two related problems:
- creating two dwarf2_cu's is not optimal
- the suboptimal case is not handled correctly

This patch addresses the last problem, by moving the load_all_dies flag from
dwarf2_per_cu_data to dwarf2_cu, such that it is paired with the partial_dies
field, which ensures that the two can be kept in sync.
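
The failure mode and the fix can be modelled with a small sketch.  This is
not GDB code: `per_cu_data`, `cu`, `lookup` and the DIE offsets below are
simplified, hypothetical stand-ins, and the only point is to show why a
"loaded everything" flag has to live next to the table it describes.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for dwarf2_per_cu_data: one object per CU,
// shared by every reader of that CU.
struct per_cu_data {
  bool load_all_dies = false;  // old home of the flag
};

// Hypothetical stand-in for dwarf2_cu: one object per reader.
struct cu {
  std::unordered_set<std::uint64_t> partial_dies;  // DIE offsets read so far
  bool load_all_dies = false;                      // new home of the flag
};

// Made-up DIE offsets that "exist" in the CU.
static const std::vector<std::uint64_t> all_dies = {0x10, 0x20, 0x30};

// Look up a DIE; if it is missing and the flag says we have not loaded
// everything yet, load everything into this cu and retry.
bool lookup(cu &c, bool &load_all_flag, std::uint64_t off) {
  if (c.partial_dies.count(off) != 0)
    return true;
  if (!load_all_flag) {
    load_all_flag = true;
    c.partial_dies.insert(all_dies.begin(), all_dies.end());
    return c.partial_dies.count(off) != 0;
  }
  return false;  // flag claims everything is loaded, so give up
}

// Old scheme: the flag is on the shared per-CU object.
bool old_scheme_finds_die() {
  per_cu_data p;
  cu a, b;
  lookup(a, p.load_all_dies, 0x20);         // fills A, sets the flag on P
  return lookup(b, p.load_all_dies, 0x20);  // B is empty but the flag is set
}

// New scheme: each cu carries its own flag next to its own table.
bool new_scheme_finds_die() {
  cu a, b;
  lookup(a, a.load_all_dies, 0x20);
  return lookup(b, b.load_all_dies, 0x20);  // B loads its own DIEs
}
```

In this model the old scheme reports the DIE as missing for B even though
it exists, while the new scheme finds it, because B's flag and B's table
can no longer disagree.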

Tested on x86_64-linux.

gdb/ChangeLog:

2021-05-27  Tom de Vries  <[email protected]>

	PR symtab/27898
	* dwarf2/cu.c (dwarf2_cu::dwarf2_cu): Add load_all_dies init.
	* dwarf2/cu.h (dwarf2_cu): Add load_all_dies field.
	* dwarf2/read.c (load_partial_dies, find_partial_die): Update.
	* dwarf2/read.h (dwarf2_per_cu_data::dwarf2_per_cu_data): Remove
	load_all_dies init.
	(dwarf2_per_cu_data): Remove load_all_dies field.
pipcet pushed a commit that referenced this issue Jun 17, 2021
Building GDB with current git (future 13) Clang runs into these two
issues:

#1:

 src/gdb/symtab.h:1139:3: error: definition of implicit copy assignment operator for 'symbol' is deprecated because it has a user-declared copy constructor [-Werror,-Wdeprecated-copy]
   symbol (const symbol &) = default;
   ^

#2:

 src/gdb/dwarf2/read.c:834:23: error: definition of implicit copy constructor for 'partial_die_info' is deprecated because it has a user-declared copy assignment operator [-Werror,-Wdeprecated-copy]
     partial_die_info& operator=(const partial_die_info& rhs) = delete;
		       ^

Fix them by adding the explicit defaulted versions of copy ctor and
copy-assign op appropriately.
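
As an illustration of the pattern (a sketch with made-up types, not the
actual `symbol` or `partial_die_info` definitions): when one of the two
copy operations is user-declared, relying on the implicit definition of
the other is deprecated, and declaring it explicitly as defaulted keeps
the behaviour while silencing -Wdeprecated-copy.

```cpp
#include <cassert>

// Hypothetical analogue of case #1: the copy ctor is user-declared, so
// the copy assignment operator is now declared explicitly as well.
struct symbol_like {
  int value = 0;
  symbol_like() = default;
  symbol_like(const symbol_like &) = default;
  symbol_like &operator=(const symbol_like &) = default;  // added
};

// Hypothetical analogue of case #2: copy assignment is user-declared
// (deleted), so the copy ctor is now declared explicitly as well.
struct die_like {
  int offset = 0;
  die_like() = default;
  die_like(const die_like &) = default;  // added
  die_like &operator=(const die_like &) = delete;
};

// Exercises the defaulted copy assignment operator.
int copy_assign_value(int v) {
  symbol_like a;
  a.value = v;
  symbol_like b;
  b = a;
  return b.value;
}

// Exercises the defaulted copy constructor.
int copy_construct_offset(int o) {
  die_like a;
  a.offset = o;
  die_like b(a);
  return b.offset;
}
```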

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	* dwarf2/read.c (struct partial_die_info): Add defaulted copy
	ctor.
	* symtab.h (struct symbol): Add defaulted copy assignment
	operator.
pipcet pushed a commit that referenced this issue Jun 17, 2021
… when attaching / handling a fork child

When trying to attach to a pthread process on a Linux system with glibc 2.33,
we get:

    $ ./gdb -q -nx --data-directory=data-directory -p 1472010
    Attaching to process 1472010
    [New LWP 1472013]
    [New LWP 1472014]
    [New LWP 1472015]
    Error while reading shared library symbols for /usr/lib/libpthread.so.0:
    Cannot find user-level thread for LWP 1472015: generic error
    0x00007ffff6d3637f in poll () from /usr/lib/libc.so.6
    (gdb)

When attaching to a process (or handling a fork child, an operation very
similar to attaching), GDB reads the shared library list from the
process.  For each shared library (if "set auto-solib-add" is on), it
reads its symbols and calls the "new_objfile" observable.

The libthread-db code monitors this observable, and if it sees an
objfile named somewhat like "libpthread.so" go by, it tries to load
libthread_db.so in the GDB process itself.  libthread_db knows how to
navigate libpthread's data structures to get information about the
existing threads.

To locate these data structures, libthread_db calls ps_pglobal_lookup
(implemented in proc-service.c), passing in a symbol name and expecting
an address in return.

Before glibc 2.33, libthread_db always asked for symbols found in
libpthread.  There was no ordering problem: since we were always trying
to load libthread_db in reaction to processing libpthread (and reading
in its symbols) and libthread_db only asked for symbols from libpthread, the
requested symbols could always be found.  Starting with glibc 2.33,
libthread_db now asks for a symbol name that can be found in
/lib/ld-linux-x86-64.so.2 (_rtld_global).  And the ordering in which GDB
reads the shared libraries from the inferior when attaching is
unfortunate, in that libpthread is processed before ld-linux.  So when
loading libthread_db in reaction to processing libpthread, and
libthread_db requests the symbol that is from ld-linux, GDB is not yet
able to supply it.

That problematic symbol lookup happens in the thread_from_lwp function,
when we call td_ta_map_lwp2thr_p, and an exception is thrown at this
point:

    #0  0x00007ffff6681012 in __cxxabiv1::__cxa_throw (obj=0x60e000006100, tinfo=0x555560033b50 <typeinfo for gdb_exception_error>, dest=0x55555d9404bc <gdb_exception_error::~gdb_exception_error()>) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:78
    #1  0x000055555e5d3734 in throw_it(return_reason, errors, const char *, typedef __va_list_tag __va_list_tag *) (reason=RETURN_ERROR, error=GENERIC_ERROR, fmt=0x55555f0c5360 "Cannot find user-level thread for LWP %ld: %s", ap=0x7fffffffaae0) at /home/simark/src/binutils-gdb/gdbsupport/common-exceptions.cc:200
    #2  0x000055555e5d37d4 in throw_verror (error=GENERIC_ERROR, fmt=0x55555f0c5360 "Cannot find user-level thread for LWP %ld: %s", ap=0x7fffffffaae0) at /home/simark/src/binutils-gdb/gdbsupport/common-exceptions.cc:208
    #3  0x000055555e0b0ed2 in verror (string=0x55555f0c5360 "Cannot find user-level thread for LWP %ld: %s", args=0x7fffffffaae0) at /home/simark/src/binutils-gdb/gdb/utils.c:171
    #4  0x000055555e5e898a in error (fmt=0x55555f0c5360 "Cannot find user-level thread for LWP %ld: %s") at /home/simark/src/binutils-gdb/gdbsupport/errors.cc:43
    #5  0x000055555d06b4bc in thread_from_lwp (stopped=0x617000035d80, ptid=...) at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:418
    #6  0x000055555d07040d in try_thread_db_load_1 (info=0x60c000011140) at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:912
    #7  0x000055555d071103 in try_thread_db_load (library=0x55555f0c62a0 "libthread_db.so.1", check_auto_load_safe=false) at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1014
    #8  0x000055555d072168 in try_thread_db_load_from_sdir () at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1091
    #9  0x000055555d072d1c in thread_db_load_search () at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1146
    #10 0x000055555d07365c in thread_db_load () at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1203
    #11 0x000055555d07373e in check_for_thread_db () at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1246
    #12 0x000055555d0738ab in thread_db_new_objfile (objfile=0x61300000c0c0) at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1275
    #13 0x000055555bd10740 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x616000068d88: 0x55555d073745 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/10.2.0/bits/invoke.h:60
    #14 0x000055555bd02096 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x616000068d88: 0x55555d073745 <thread_db_new_objfile(objfile*)>) at /usr/include/c++/10.2.0/bits/invoke.h:153
    #15 0x000055555bce0392 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffb4a0: 0x61300000c0c0) at /usr/include/c++/10.2.0/bits/std_function.h:291
    #16 0x000055555d3595c0 in std::function<void (objfile*)>::operator()(objfile*) const (this=0x616000068d88, __args#0=0x61300000c0c0) at /usr/include/c++/10.2.0/bits/std_function.h:622
    #17 0x000055555d356b7f in gdb::observers::observable<objfile*>::notify (this=0x555566727020 <gdb::observers::new_objfile>, args#0=0x61300000c0c0) at /home/simark/src/binutils-gdb/gdb/../gdbsupport/observable.h:106
    #18 0x000055555da3f228 in symbol_file_add_with_addrs (abfd=0x61200001ccc0, name=0x6190000d9090 "/usr/lib/libpthread.so.0", add_flags=..., addrs=0x7fffffffbc10, flags=..., parent=0x0) at /home/simark/src/binutils-gdb/gdb/symfile.c:1131
    #19 0x000055555da3f763 in symbol_file_add_from_bfd (abfd=0x61200001ccc0, name=0x6190000d9090 "/usr/lib/libpthread.so.0", add_flags=<error reading variable: Cannot access memory at address 0xffffffffffffffb0>, addrs=0x7fffffffbc10, flags=<error reading variable: Cannot access memory at address 0xffffffffffffffc0>, parent=0x0) at /home/simark/src/binutils-gdb/gdb/symfile.c:1167
    #20 0x000055555d95f9fa in solib_read_symbols (so=0x6190000d8e80, flags=...) at /home/simark/src/binutils-gdb/gdb/solib.c:681
    #21 0x000055555d96233d in solib_add (pattern=0x0, from_tty=0, readsyms=1) at /home/simark/src/binutils-gdb/gdb/solib.c:987
    #22 0x000055555d93646e in enable_break (info=0x608000008f20, from_tty=0) at /home/simark/src/binutils-gdb/gdb/solib-svr4.c:2238
    #23 0x000055555d93cfc0 in svr4_solib_create_inferior_hook (from_tty=0) at /home/simark/src/binutils-gdb/gdb/solib-svr4.c:3049
    #24 0x000055555d96610d in solib_create_inferior_hook (from_tty=0) at /home/simark/src/binutils-gdb/gdb/solib.c:1195
    #25 0x000055555cdee318 in post_create_inferior (from_tty=0) at /home/simark/src/binutils-gdb/gdb/infcmd.c:318
    #26 0x000055555ce00e6e in setup_inferior (from_tty=0) at /home/simark/src/binutils-gdb/gdb/infcmd.c:2439
    #27 0x000055555ce59c34 in handle_one (event=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:4887
    #28 0x000055555ce5cd00 in stop_all_threads () at /home/simark/src/binutils-gdb/gdb/infrun.c:5064
    #29 0x000055555ce7f0da in stop_waiting (ecs=0x7fffffffd170) at /home/simark/src/binutils-gdb/gdb/infrun.c:8006
    #30 0x000055555ce67f5c in handle_signal_stop (ecs=0x7fffffffd170) at /home/simark/src/binutils-gdb/gdb/infrun.c:6062
    #31 0x000055555ce63653 in handle_inferior_event (ecs=0x7fffffffd170) at /home/simark/src/binutils-gdb/gdb/infrun.c:5727
    #32 0x000055555ce4f297 in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4105
    #33 0x000055555cdbe3bf in inferior_event_handler (event_type=INF_REG_EVENT) at /home/simark/src/binutils-gdb/gdb/inf-loop.c:42
    #34 0x000055555d018047 in handle_target_event (error=0, client_data=0x0) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:4060
    #35 0x000055555e5ea77e in handle_file_event (file_ptr=0x60600008b1c0, ready_mask=1) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:575
    #36 0x000055555e5eb09c in gdb_wait_for_event (block=0) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:701
    #37 0x000055555e5e8d19 in gdb_do_one_event () at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:212
    #38 0x000055555dd6e0d4 in wait_sync_command_done () at /home/simark/src/binutils-gdb/gdb/top.c:528
    #39 0x000055555dd6e372 in maybe_wait_sync_command_done (was_sync=0) at /home/simark/src/binutils-gdb/gdb/top.c:545
    #40 0x000055555d0ec7c8 in catch_command_errors (command=0x55555ce01bb8 <attach_command(char const*, int)>, arg=0x7fffffffe28d "1472010", from_tty=1, do_bp_actions=false) at /home/simark/src/binutils-gdb/gdb/main.c:452
    #41 0x000055555d0f03ad in captured_main_1 (context=0x7fffffffdd10) at /home/simark/src/binutils-gdb/gdb/main.c:1149
    #42 0x000055555d0f1239 in captured_main (data=0x7fffffffdd10) at /home/simark/src/binutils-gdb/gdb/main.c:1232
    #43 0x000055555d0f1315 in gdb_main (args=0x7fffffffdd10) at /home/simark/src/binutils-gdb/gdb/main.c:1257
    #44 0x000055555bb70cf9 in main (argc=7, argv=0x7fffffffde88) at /home/simark/src/binutils-gdb/gdb/gdb.c:32

The exception is caught here:

    #0  __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x60e0000060e0) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_catch.cc:84
    #1  0x000055555d95fded in solib_read_symbols (so=0x6190000d8e80, flags=...) at /home/simark/src/binutils-gdb/gdb/solib.c:689
    #2  0x000055555d96233d in solib_add (pattern=0x0, from_tty=0, readsyms=1) at /home/simark/src/binutils-gdb/gdb/solib.c:987
    #3  0x000055555d93646e in enable_break (info=0x608000008f20, from_tty=0) at /home/simark/src/binutils-gdb/gdb/solib-svr4.c:2238
    #4  0x000055555d93cfc0 in svr4_solib_create_inferior_hook (from_tty=0) at /home/simark/src/binutils-gdb/gdb/solib-svr4.c:3049
    #5  0x000055555d96610d in solib_create_inferior_hook (from_tty=0) at /home/simark/src/binutils-gdb/gdb/solib.c:1195
    #6  0x000055555cdee318 in post_create_inferior (from_tty=0) at /home/simark/src/binutils-gdb/gdb/infcmd.c:318
    #7  0x000055555ce00e6e in setup_inferior (from_tty=0) at /home/simark/src/binutils-gdb/gdb/infcmd.c:2439
    #8  0x000055555ce59c34 in handle_one (event=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:4887
    #9  0x000055555ce5cd00 in stop_all_threads () at /home/simark/src/binutils-gdb/gdb/infrun.c:5064
    #10 0x000055555ce7f0da in stop_waiting (ecs=0x7fffffffd170) at /home/simark/src/binutils-gdb/gdb/infrun.c:8006
    #11 0x000055555ce67f5c in handle_signal_stop (ecs=0x7fffffffd170) at /home/simark/src/binutils-gdb/gdb/infrun.c:6062
    #12 0x000055555ce63653 in handle_inferior_event (ecs=0x7fffffffd170) at /home/simark/src/binutils-gdb/gdb/infrun.c:5727
    #13 0x000055555ce4f297 in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4105
    #14 0x000055555cdbe3bf in inferior_event_handler (event_type=INF_REG_EVENT) at /home/simark/src/binutils-gdb/gdb/inf-loop.c:42
    #15 0x000055555d018047 in handle_target_event (error=0, client_data=0x0) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:4060
    #16 0x000055555e5ea77e in handle_file_event (file_ptr=0x60600008b1c0, ready_mask=1) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:575
    #17 0x000055555e5eb09c in gdb_wait_for_event (block=0) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:701
    #18 0x000055555e5e8d19 in gdb_do_one_event () at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:212
    #19 0x000055555dd6e0d4 in wait_sync_command_done () at /home/simark/src/binutils-gdb/gdb/top.c:528
    #20 0x000055555dd6e372 in maybe_wait_sync_command_done (was_sync=0) at /home/simark/src/binutils-gdb/gdb/top.c:545
    #21 0x000055555d0ec7c8 in catch_command_errors (command=0x55555ce01bb8 <attach_command(char const*, int)>, arg=0x7fffffffe28d "1472010", from_tty=1, do_bp_actions=false) at /home/simark/src/binutils-gdb/gdb/main.c:452
    #22 0x000055555d0f03ad in captured_main_1 (context=0x7fffffffdd10) at /home/simark/src/binutils-gdb/gdb/main.c:1149
    #23 0x000055555d0f1239 in captured_main (data=0x7fffffffdd10) at /home/simark/src/binutils-gdb/gdb/main.c:1232
    #24 0x000055555d0f1315 in gdb_main (args=0x7fffffffdd10) at /home/simark/src/binutils-gdb/gdb/main.c:1257
    #25 0x000055555bb70cf9 in main (argc=7, argv=0x7fffffffde88) at /home/simark/src/binutils-gdb/gdb/gdb.c:32

Catching the exception at this point means that the thread_db_info
object for this inferior will be left in place, despite the failure to
load libthread_db.  This means that there won't be further attempts at
loading libthread_db, because thread_db_load will think that
libthread_db is already loaded for this inferior and will always exit
early.  To fix this, add a try/catch around calling try_thread_db_load_1
in try_thread_db_load, such that if some exception is thrown while
trying to load libthread_db, we reset / delete the thread_db_info for
that inferior.  That alone makes attach work fine again, because
check_for_thread_db is called again in the thread_db_inferior_created
observer (that happens after we learned about all shared libraries and
their symbols), and libthread_db is successfully loaded then.
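
The shape of that try/catch can be sketched as follows.  Everything here
is a hypothetical stand-in (`thread_db_state`, `loaded`, `try_load`,
`already_loaded`); the real functions have different signatures.  The
sketch only shows the idea of dropping the per-inferior record when the
inner load throws, so that a later attempt is not short-circuited.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>

// Hypothetical per-inferior record, playing the role of thread_db_info.
struct thread_db_state {};

// Records keyed by a made-up inferior id.
static std::map<int, thread_db_state> loaded;

// What an early-exit check would consult before trying to load again.
bool already_loaded(int inferior) {
  return loaded.count(inferior) != 0;
}

// Register the record, run the inner load step, and undo the
// registration if that step throws.
bool try_load(int inferior, const std::function<void()> &load_1) {
  loaded.emplace(inferior, thread_db_state{});
  try {
    load_1();
    return true;
  } catch (const std::exception &) {
    loaded.erase(inferior);  // leave no stale record behind
    return false;
  }
}
```

With this shape, a first attempt that fails (say, because a symbol is not
available yet) leaves `already_loaded` false, so a second attempt made
once all libraries are known can still succeed.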

When attaching, I think that the inferior_created observer is a good
place to try to load libthread_db: it is called once everything has
stabilized, when we learned about all shared libraries.

The only problem then is that when we first try (and fail) to load
libthread_db, in reaction to learning about libpthread, we show this
warning:

    warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

This is misleading, because we do succeed in loading it later.  So when
attaching, I think we shouldn't try to load libthread_db in reaction to
the new_objfile events; we should wait until we have learned about all
shared libraries (using the inferior_created observable).  To do so, add
an `in_initial_library_scan` flag to struct inferior.  This flag is used
to postpone loading libthread_db if we are attaching or handling a fork
child.
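
A rough model of how such a flag gates the observers is below.  Only the
flag name comes from this patch; `inferior_like`, `on_new_objfile`,
`on_inferior_created` and `attach` are hypothetical helpers invented for
the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for struct inferior.
struct inferior_like {
  bool in_initial_library_scan = false;
  int load_attempts = 0;  // how many times we tried to load libthread_db
};

// Called once per shared library whose symbols were just read.
void on_new_objfile(inferior_like &inf) {
  if (inf.in_initial_library_scan)
    return;  // postpone: not all libraries are known yet
  ++inf.load_attempts;
}

// Called once the initial library list has been fully processed.
void on_inferior_created(inferior_like &inf) {
  ++inf.load_attempts;
}

// Model of attaching to a process that has n_libs shared libraries.
int attach(inferior_like &inf, int n_libs) {
  inf.in_initial_library_scan = true;
  for (int i = 0; i < n_libs; ++i)
    on_new_objfile(inf);
  inf.in_initial_library_scan = false;
  on_inferior_created(inf);
  return inf.load_attempts;
}
```

In this model, attaching triggers exactly one load attempt regardless of
the number of libraries, while a library loaded later (after the scan)
still triggers an attempt of its own.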

When debugging remotely with GDBserver, the same problem happens, except
that the qSymbol mechanism (allowing the remote side to ask GDB for
symbol values) is involved.  The fix there is the same idea: we make
GDB wait until all shared libraries and their symbols are known before
sending out a qSymbol packet.  This way, we never present the remote
side a state where libpthread.so's symbols are known but ld-linux's
symbols aren't.

gdb/ChangeLog:

	* inferior.h (class inferior) <in_initial_library_scan>: New.
	* infcmd.c (post_create_inferior): Set in_initial_library_scan.
	* infrun.c (follow_fork_inferior): Likewise.
	* linux-thread-db.c (try_thread_db_load): Catch exception thrown
	by try_thread_db_load_1.
	(thread_db_load): Return early if in_initial_library_scan is
	set.
	* remote.c (remote_new_objfile): Return early if
	in_initial_library_scan is set.

Change-Id: I7a279836cfbb2b362b4fde11b196b4aab82f5efb
pipcet pushed a commit that referenced this issue Jun 17, 2021
One consequence of changing libpthread_name_p() in solib.c to (also)
match libc is that the symbols for libc will now be loaded by
solib_add() in solib.c.  I think this is mostly harmless because
we'll likely want these symbols to be loaded anyway, but it did cause
two failures in gdb.base/print-symbol-loading.exp.

Specifically...

1)

sharedlibrary .*
(gdb) PASS: gdb.base/print-symbol-loading.exp: shlib off: load shared-lib

now looks like this:

sharedlibrary .*
Symbols already loaded for /lib64/libc.so.6
(gdb) PASS: gdb.base/print-symbol-loading.exp: shlib off: load shared-lib

2)

sharedlibrary .*
Loading symbols for shared libraries: .*
(gdb) PASS: gdb.base/print-symbol-loading.exp: shlib brief: load shared-lib

now looks like this:

sharedlibrary .*
Loading symbols for shared libraries: .*
Symbols already loaded for /lib64/libc.so.6
(gdb) PASS: gdb.base/print-symbol-loading.exp: shlib brief: load shared-lib

Fixing case #2 ended up being easier than #1.  #1 had been using
gdb_test_no_output to correctly match this no-output case.  I
ended up replacing it with gdb_test_multiple, matching the exact
expected output for each of the two now acceptable cases.

For case #2, I simply added an optional non-capturing group
for the potential new output.
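
The testsuite itself is written in Tcl, but the idea of the optional
non-capturing group can be shown with std::regex as well.  The pattern
and the helper name `matches_brief_output` below are illustrative
assumptions, not the expression used in print-symbol-loading.exp.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Accept the "Loading symbols..." line, optionally followed by one
// "Symbols already loaded for ..." line.  (?:...)? makes the second
// line optional without creating a capture group.
bool matches_brief_output(const std::string &out) {
  static const std::regex re(
      "Loading symbols for shared libraries: .*\n"
      "(?:Symbols already loaded for .*\n)?");
  return std::regex_match(out, re);
}
```

Both the old and the new output are accepted by this pattern, while an
unrelated extra line still causes a mismatch.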

gdb/testsuite/ChangeLog:

	* gdb.base/print-symbol-loading.exp (proc test_load_shlib):
	Allow "Symbols already loaded for..." messages.
pipcet pushed a commit that referenced this issue Jun 30, 2021
When loading a mach-o (macOS) executable and trying to set a breakpoint,
a GDB built with ASan or -D_GLIBCXX_DEBUG will crash with an
out-of-bound vector access.  This can be reproduced on Linux using the
repro files in bug 28017 [1]:

    $ ./gdb -nx --data-directory=data-directory -q repro/test -ex "b main" -batch
    /usr/include/c++/11.1.0/debug/vector:445:
    In function:
        std::__debug::vector<_Tp, _Allocator>::const_reference
        std::__debug::vector<_Tp,
        _Allocator>::operator[](std::__debug::vector<_Tp,
        _Allocator>::size_type) const [with _Tp = long unsigned int; _Allocator
        = std::allocator<long unsigned int>; std::__debug::vector<_Tp,
        _Allocator>::const_reference = const long unsigned int&;
        std::__debug::vector<_Tp, _Allocator>::size_type = long unsigned int]

    Error: attempt to subscript container with out-of-bounds index 13, but
    container only holds 13 elements.

    Objects involved in the operation:
        sequence "this" @ 0x0x61300000a590 {
          type = std::__debug::vector<unsigned long, std::allocator<unsigned long> >;
        }

The out-of-bound access happens here:

    #0  0x00007ffff6405d22 in raise () from /usr/lib/libc.so.6
    #1  0x00007ffff63ef862 in abort () from /usr/lib/libc.so.6
    #2  0x00007ffff664e21e in __gnu_debug::_Error_formatter::_M_error() const [clone .cold] from /usr/lib/libstdc++.so.6
    #3  0x000055555699e5ff in std::__debug::vector<unsigned long, std::allocator<unsigned long> >::operator[] (this=0x61300000a590, __n=13) at /usr/include/c++/11.1.0/debug/vector:445
    #4  0x0000555556a58c17 in objfile::section_offset (this=0x61300000a4c0, section=0x55555bbe4ac0 <_bfd_std_section>) at /home/simark/src/binutils-gdb/gdb/objfiles.h:644
    #5  0x0000555556a58cac in obj_section::offset (this=0x62100016d2a8) at /home/simark/src/binutils-gdb/gdb/objfiles.h:838
    #6  0x0000555556a58cfa in obj_section::addr (this=0x62100016d2a8) at /home/simark/src/binutils-gdb/gdb/objfiles.h:850
    #7  0x000055555779f5f7 in sort_cmp (sect1=0x62100016d2a8, sect2=0x62100016d170) at /home/simark/src/binutils-gdb/gdb/objfiles.c:902
    #8  0x00005555577aae35 in __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(obj_section const*, obj_section const*)>::operator()<obj_section**, obj_section**> (this=0x7fffffffa9e0, __it1=0x60c000015970, __it2=0x60c000015940) at /usr/include/c++/11.1.0/bits/predefined_ops.h:158
    #9  0x00005555577aa2b8 in std::__insertion_sort<obj_section**, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(obj_section const*, obj_section const*)> > (__first=0x60c000015940, __last=0x60c0000159c0, __comp=...) at /usr/include/c++/11.1.0/bits/stl_algo.h:1826
    #10 0x00005555577a8e26 in std::__final_insertion_sort<obj_section**, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(obj_section const*, obj_section const*)> > (__first=0x60c000015940, __last=0x60c0000159c0, __comp=...) at /usr/include/c++/11.1.0/bits/stl_algo.h:1871
    #11 0x00005555577a723c in std::__sort<obj_section**, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(obj_section const*, obj_section const*)> > (__first=0x60c000015940, __last=0x60c0000159c0, __comp=...) at /usr/include/c++/11.1.0/bits/stl_algo.h:1957
    #12 0x00005555577a50f4 in std::sort<obj_section**, bool (*)(obj_section const*, obj_section const*)> (__first=0x60c000015940, __last=0x60c0000159c0, __comp=0x55555779f4e7 <sort_cmp(obj_section const*, obj_section const*)>) at /usr/include/c++/11.1.0/bits/stl_algo.h:4875
    #13 0x00005555577a147e in update_section_map (pspace=0x61200001d2c0, pmap=0x6030000d40b0, pmap_size=0x6030000d40b8) at /home/simark/src/binutils-gdb/gdb/objfiles.c:1165
    #14 0x00005555577a19a0 in find_pc_section (pc=0x100003fa0) at /home/simark/src/binutils-gdb/gdb/objfiles.c:1212
    #15 0x00005555576dd39e in lookup_minimal_symbol_by_pc_section (pc_in=0x100003fa0, section=0x0, prefer=lookup_msym_prefer::TEXT, previous=0x0) at /home/simark/src/binutils-gdb/gdb/minsyms.c:750
    #16 0x00005555576de552 in lookup_minimal_symbol_by_pc (pc=0x100003fa0) at /home/simark/src/binutils-gdb/gdb/minsyms.c:986
    #17 0x0000555557d44b54 in find_pc_sect_line (pc=0x100003fa0, section=0x62100016d170, notcurrent=0) at /home/simark/src/binutils-gdb/gdb/symtab.c:3163
    #18 0x0000555557d489fa in find_function_start_sal_1 (func_addr=0x100003fa0, section=0x62100016d170, funfirstline=true) at /home/simark/src/binutils-gdb/gdb/symtab.c:3650
    #19 0x0000555557d49015 in find_function_start_sal (sym=0x621000191670, funfirstline=true) at /home/simark/src/binutils-gdb/gdb/symtab.c:3706
    #20 0x0000555557485283 in symbol_to_sal (result=0x7fffffffbb30, funfirstline=1, sym=0x621000191670) at /home/simark/src/binutils-gdb/gdb/linespec.c:4460
    #21 0x00005555574728c2 in convert_linespec_to_sals (state=0x7fffffffc390, ls=0x7fffffffc3e0) at /home/simark/src/binutils-gdb/gdb/linespec.c:2335
    #22 0x0000555557475a8e in parse_linespec (parser=0x7fffffffc360, arg=0x60200007a550 "main", match_type=symbol_name_match_type::WILD) at /home/simark/src/binutils-gdb/gdb/linespec.c:2716
    #23 0x0000555557479027 in event_location_to_sals (parser=0x7fffffffc360, location=0x606000097be0) at /home/simark/src/binutils-gdb/gdb/linespec.c:3173
    #24 0x00005555574798f7 in decode_line_full (location=0x606000097be0, flags=1, search_pspace=0x0, default_symtab=0x0, default_line=0, canonical=0x7fffffffcca0, select_mode=0x0, filter=0x0) at /home/simark/src/binutils-gdb/gdb/linespec.c:3253
    #25 0x0000555556b4949f in parse_breakpoint_sals (location=0x606000097be0, canonical=0x7fffffffcca0) at /home/simark/src/binutils-gdb/gdb/breakpoint.c:9134
    #26 0x0000555556b6ce95 in create_sals_from_location_default (location=0x606000097be0, canonical=0x7fffffffcca0, type_wanted=bp_breakpoint) at /home/simark/src/binutils-gdb/gdb/breakpoint.c:13819
    #27 0x0000555556b645a6 in bkpt_create_sals_from_location (location=0x606000097be0, canonical=0x7fffffffcca0, type_wanted=bp_breakpoint) at /home/simark/src/binutils-gdb/gdb/breakpoint.c:12631
    #28 0x0000555556b4badf in create_breakpoint (gdbarch=0x621000152d10, location=0x606000097be0, cond_string=0x0, thread=0, extra_string=0x0, force_condition=false, parse_extra=1, tempflag=0, type_wanted=bp_breakpoint, ignore_count=0, pending_break_support=AUTO_BOOLEAN_AUTO, ops=0x55555bd728a0 <bkpt_breakpoint_ops>, from_tty=0, enabled=1, internal=0, flags=0) at /home/simark/src/binutils-gdb/gdb/breakpoint.c:9410
    #29 0x0000555556b4d3b1 in break_command_1 (arg=0x7fffffffe291 "", flag=0, from_tty=0) at /home/simark/src/binutils-gdb/gdb/breakpoint.c:9590
    #30 0x0000555556b4dc1b in break_command (arg=0x7fffffffe28d "main", from_tty=0) at /home/simark/src/binutils-gdb/gdb/breakpoint.c:9660
    #31 0x0000555556d24ca9 in do_const_cfunc (c=0x61100003a240, args=0x7fffffffe28d "main", from_tty=0) at /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:102
    #32 0x0000555556d2fcd3 in cmd_func (cmd=0x61100003a240, args=0x7fffffffe28d "main", from_tty=0) at /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:2160
    #33 0x0000555557e84e93 in execute_command (p=0x7fffffffe290 "n", from_tty=0) at /home/simark/src/binutils-gdb/gdb/top.c:674
    #34 0x00005555575a9933 in catch_command_errors (command=0x555557e84043 <execute_command(char const*, int)>, arg=0x7fffffffe28b "b main", from_tty=0, do_bp_actions=true) at /home/simark/src/binutils-gdb/gdb/main.c:523
    #35 0x00005555575a9fdb in execute_cmdargs (cmdarg_vec=0x7fffffffd910, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffd5b0) at /home/simark/src/binutils-gdb/gdb/main.c:618
    #36 0x00005555575ad48a in captured_main_1 (context=0x7fffffffdd00) at /home/simark/src/binutils-gdb/gdb/main.c:1322
    #37 0x00005555575ada9c in captured_main (data=0x7fffffffdd00) at /home/simark/src/binutils-gdb/gdb/main.c:1343
    #38 0x00005555575adb31 in gdb_main (args=0x7fffffffdd00) at /home/simark/src/binutils-gdb/gdb/main.c:1368
    #39 0x000055555681e179 in main (argc=8, argv=0x7fffffffde78) at /home/simark/src/binutils-gdb/gdb/gdb.c:32

The section being dealt with at that moment is the special *COM*
section:

    (top-gdb) p section.name
    $1 = 0x55555a1bbe60 "*COM*"
    (top-gdb) p section
    $2 = (bfd_section *) 0x55555bbe4ac0 <_bfd_std_section>

I'm not too sure what this section is for, but this is one of four
special BFD sections that GDB puts after the regular sections in the
objfile::sections and objfile::section_offsets lists.  You can check
gdb_bfd_section_index to see how they are handled.
gdb_bfd_count_sections returns "+ 4" to account for those sections.

The problem is that macho_symfile_offsets uses bfd_count_sections
instead of gdb_bfd_count_sections when allocating the
objfile::section_offsets vector.  The vector will therefore contain,
say, 13 elements instead of 17.  When trying to access the section
offset of the *COM* section, the first after the regular sections, we
access section_offsets[13], which is out of bounds.

Fix that by using gdb_bfd_count_sections instead of bfd_count_sections.
I'm fairly confident that this is correct, as this is what
default_symfile_offsets does.
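
The arithmetic behind the out-of-bounds index can be sketched like this.
The helper names are hypothetical and only mirror the "+ 4" convention
described above; they are not the BFD or GDB functions themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of special BFD sections placed after the regular ones,
// as described in the text.
constexpr std::size_t special_sections = 4;

// Hypothetical analogue of gdb_bfd_count_sections.
std::size_t count_with_specials(std::size_t regular) {
  return regular + special_sections;
}

// The first special section is indexed right after the regular ones.
std::size_t first_special_index(std::size_t regular) {
  return regular;
}

// Whether an index may be used to subscript the offsets vector.
bool index_in_bounds(const std::vector<unsigned long> &offsets,
                     std::size_t idx) {
  return idx < offsets.size();
}
```

With 13 regular sections, a vector sized to 13 cannot hold index 13,
whereas one sized with `count_with_specials (13)` has 17 elements and can.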

With this patch, the command shown above terminates normally:

    $ ./gdb -nx --data-directory=data-directory -q repro/test -ex "b main" -batch
    Breakpoint 1 at 0x100003fad: file test.c, line 2.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28017

gdb/ChangeLog:

	PR gdb/28017
	* machoread.c (macho_symfile_offsets): Use
	gdb_bfd_count_sections to allocate objfile::section_offsets.

Change-Id: Ic3a56f46f7232e9f24581f8255fc1ab981935450
pipcet pushed a commit that referenced this issue Jul 1, 2021
Currently, on GNU/Linux, if you try to access memory and you have a
running thread selected, GDB fails the memory accesses, like:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 Cannot access memory at address 0x555555558010

Or:

 (gdb) b main
 Breakpoint 2 at 0x55555555524d: file access-mem-running.c, line 59.
 Warning:
 Cannot insert breakpoint 2.
 Cannot access memory at address 0x55555555524d

This patch removes this limitation.  It teaches the native Linux
target to read/write memory even if the target is running.  And it
does this without temporarily stopping threads.  We now get:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 $1 = 123
 (gdb) b main
 Breakpoint 2 at 0x555555555259: file access-mem-running.c, line 62.

(The scenarios above work correctly with current GDBserver, because
GDBserver temporarily stops all threads in the process whenever GDB
wants to access memory (see prepare_to_access_memory /
done_accessing_memory).  Freezing the whole process makes sense when
we need to be sure that we have a consistent view of memory and don't
race with the inferior changing it at the same time as GDB is
accessing it.  But I think that's a too-heavy hammer for the default
behavior.  I think that ideally, whether to stop all threads or not
should be policy decided by gdb core, probably best implemented by
exposing something like gdbserver's prepare_to_access_memory /
done_accessing_memory to gdb core.)

Currently, if we're accessing (reading/writing) just a few bytes, then
the Linux native backend does not try accessing memory via
/proc/<pid>/mem and goes straight to ptrace
PTRACE_PEEKTEXT/PTRACE_POKETEXT.  However, ptrace always fails when
the ptracee is running.  So the first step is to prefer
/proc/<pid>/mem even for small accesses.  Without further changes
however, that may cause a performance regression, due to constantly
opening and closing /proc/<pid>/mem for each memory access.  So the
next step is to keep the /proc/<pid>/mem file open across memory
accesses.  If we have this, then it doesn't make sense anymore to even
have the ptrace fallback, so the patch disables it.

I've made it such that GDB only ever has one /proc/<pid>/mem file open
at any time.  As long as a memory access hits the same inferior
process as the previous access, then we reuse the previously open
file.  If however, we access memory of a different process, then we
close the previous file and open a new one for the new process.
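
As a rough illustration of the "one open file at a time" policy, here is a
minimal sketch, not the patch's actual code.  The names ProcMemCache and
read_memory are hypothetical, and the sketch opens the file read-only for
simplicity.  It assumes a Linux host with /proc mounted.

```cpp
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical one-entry cache for the most recently opened /proc/<pid>/mem.
struct ProcMemCache {
  pid_t pid = -1;
  int fd = -1;

  ProcMemCache() = default;
  ProcMemCache(const ProcMemCache &) = delete;
  ProcMemCache &operator=(const ProcMemCache &) = delete;
  ~ProcMemCache() { reset(); }

  // Close the cached file, if any (e.g. on detach, exit, or exec).
  void reset() {
    if (fd >= 0)
      ::close(fd);
    fd = -1;
    pid = -1;
  }

  // Return a descriptor for WANT, reusing the open file when the pid matches
  // and otherwise closing the old file and opening a new one.
  int get(pid_t want) {
    if (fd >= 0 && pid == want)
      return fd;
    reset();
    char path[64];
    std::snprintf(path, sizeof path, "/proc/%d/mem", static_cast<int>(want));
    fd = ::open(path, O_RDONLY | O_CLOEXEC);
    if (fd >= 0)
      pid = want;
    return fd;
  }
};

// Read LEN bytes at ADDR in process PID.  Returns the number of bytes read,
// 0 at end of file, or -1 on error.
ssize_t read_memory(ProcMemCache &cache, pid_t pid, std::uintptr_t addr,
                    void *buf, size_t len) {
  int fd = cache.get(pid);
  if (fd < 0)
    return -1;
  return ::pread(fd, buf, len, static_cast<off_t>(addr));
}
```

Two consecutive reads from the same pid share one descriptor; a read from a
different pid closes it first, so at most one file is ever open.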

If we wanted, we could keep one /proc/<pid>/mem file open per
inferior, and never close them (unless the inferior exits or execs).
However, having seen bfd patches recently about hitting too many open
file descriptors, I kept the logic to have only one file open tops.
Also, we need to handle memory accesses for processes for which we
don't have an inferior object, for when we need to detach a
fork-child, and we'd probably want to handle caching the open file for
that scenario (no inferior for process) too, which would probably end
up meaning caching for last non-inferior process, which is very much
what I'm proposing anyhow.  So always having one file open likely ends
up a smaller patch.

The next step is handling the case of GDB reading/writing memory
through a thread that is running and exits.  The access should not
result in a user-visible failure if the inferior/process is still
alive.

Once we manage to open a /proc/<lwpid>/mem file, then that file is
usable for memory accesses even if the corresponding lwp exits and is
reaped.  I double checked that trying to open the same
/proc/<lwpid>/mem path again fails because the lwp is really gone so
there's no /proc/<lwpid>/ entry on the filesystem anymore, but the
previously open file remains usable.  It's only when the whole process
execs that we need to reopen a new file.

When the kernel destroys the whole address space, i.e., when the
process exits or execs, the reads/writes fail with 0 aka EOF, in which
case there's nothing else to do than returning a memory access
failure.  Note this means that when we get an exec event, we need to
reopen the file, to access the process's new address space.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening it for exits before we open it and before we reap the
LWP (i.e., the LWP is a zombie), the open fails with EACCES.  The patch
handles this by just looking for another thread until it finds one
whose /proc/<pid>/mem file it can open successfully.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening has exited and we already reaped it, which is the case
if the selected thread is in THREAD_EXIT state, the open fails with
ENOENT.  The patch handles this the same way as a zombie race
(EACCES), instead of checking upfront whether we're accessing a
known-exited thread, because that would result in more complicated
code, because we also need to handle accessing lwps that are not
listed in the core thread list, and it's the core thread list that
records the THREAD_EXIT state.

The patch includes two testcases:

#1 - gdb.base/access-mem-running.exp

  This is conceptually the simplest - it is single-threaded, and has
  GDB read and write memory while the program is running.  It also
  tests setting a breakpoint while the program is running, and checks
  that the breakpoint is hit immediately.

#2 - gdb.threads/access-mem-running-thread-exit.exp

  This one is more elaborate, as it continuously spawns short-lived
  threads in order to exercise accessing memory just while threads are
  exiting.  It also spawns two different processes and alternates
  accessing memory between the two processes to exercise reopening
  the /proc file frequently.  This also ends up exercising GDB reading
  from an exited thread frequently.  I confirmed by putting abort()
  calls in the EACCES/ENOENT paths added by the patch that we do hit
  all of them frequently with the testcase.  It also exits the
  process's main thread (i.e., the main thread becomes a zombie), to
  make sure accessing memory in such a corner-case scenario works now
  and in the future.

The tests fail on GNU/Linux native before the code changes, and pass
after.  They pass against current GDBserver, again because GDBserver
supports memory access even if all threads are running, by
transparently pausing the whole process.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR mi/15729
	PR gdb/13463
	* linux-nat.c (linux_nat_target::detach): Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_handle_extended_wait) <PTRACE_EVENT_EXEC>: Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_nat_target::mourn_inferior): Close the /proc/<pid>/mem file
	if it was open for this process.
	(linux_nat_target::xfer_partial): Adjust.  Do not fall back to
	inf_ptrace_target::xfer_partial for memory accesses.
	(last_proc_mem_file): New.
	(maybe_close_proc_mem_file): New.
	(linux_proc_xfer_memory_partial_pid): New, with bits factored out
	from linux_proc_xfer_partial.
	(linux_proc_xfer_partial): Delete.
	(linux_proc_xfer_memory_partial): New.

gdb/testsuite/ChangeLog
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR mi/15729
	PR gdb/13463
	* gdb.base/access-mem-running.c: New.
	* gdb.base/access-mem-running.exp: New.
	* gdb.threads/access-mem-running-thread-exit.c: New.
	* gdb.threads/access-mem-running-thread-exit.exp: New.

Change-Id: Ib3c082528872662a3fc0ca9b31c34d4876c874c9
pipcet pushed a commit that referenced this issue Jul 2, 2021
This makes the simulator work the same regardless of the target (bare
metal m32r-elf or Linux m32r-linux-gnu) by unifying the traps code.
It was mostly already the same with the only difference being support
for trap #2 reserved for Linux syscalls.  We can move that logic to
runtime by checking the current environment operating mode instead.
pipcet pushed a commit that referenced this issue Jul 5, 2021
When loading a file using the file command on macOS, we get:

    $ ./gdb -nx --data-directory=data-directory -q -ex "file ./test"
    Reading symbols from ./test...
    Reading symbols from /Users/smarchi/build/binutils-gdb/gdb/test.dSYM/Contents/Resources/DWARF/test...
    /Users/smarchi/src/binutils-gdb/gdb/thread.c:72: internal-error: struct thread_info *inferior_thread(): Assertion `current_thread_ != nullptr' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    Quit this debugging session? (y or n)

The backtrace is:

    * frame #0: 0x0000000101fcb826 gdb`internal_error(file="/Users/smarchi/src/binutils-gdb/gdb/thread.c", line=72, fmt="%s: Assertion `%s' failed.") at errors.cc:52:3
      frame #1: 0x00000001018a2584 gdb`inferior_thread() at thread.c:72:3
      frame #2: 0x0000000101469c09 gdb`get_current_regcache() at regcache.c:421:31
      frame #3: 0x00000001015f9812 gdb`darwin_solib_get_all_image_info_addr_at_init(info=0x0000603000006d00) at solib-darwin.c:464:34
      frame #4: 0x00000001015f7a04 gdb`darwin_solib_create_inferior_hook(from_tty=1) at solib-darwin.c:515:5
      frame #5: 0x000000010161205e gdb`solib_create_inferior_hook(from_tty=1) at solib.c:1200:3
      frame #6: 0x00000001016d8f76 gdb`symbol_file_command(args="./test", from_tty=1) at symfile.c:1650:7
      frame #7: 0x0000000100abab17 gdb`file_command(arg="./test", from_tty=1) at exec.c:555:3
      frame #8: 0x00000001004dc799 gdb`do_const_cfunc(c=0x000061100000c340, args="./test", from_tty=1) at cli-decode.c:102:3
      frame #9: 0x00000001004ea042 gdb`cmd_func(cmd=0x000061100000c340, args="./test", from_tty=1) at cli-decode.c:2160:7
      frame #10: 0x00000001018d4f59 gdb`execute_command(p="t", from_tty=1) at top.c:674:2
      frame #11: 0x0000000100eee430 gdb`catch_command_errors(command=(gdb`execute_command(char const*, int) at top.c:561), arg="file ./test", from_tty=1, do_bp_actions=true)(char const*, int), char const*, int, bool) at main.c:523:7
      frame #12: 0x0000000100eee902 gdb`execute_cmdargs(cmdarg_vec=0x00007ffeefbfeba0 size=1, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x00007ffeefbfec20) at main.c:618:9
      frame #13: 0x0000000100eed3a4 gdb`captured_main_1(context=0x00007ffeefbff780) at main.c:1322:3
      frame #14: 0x0000000100ee810d gdb`captured_main(data=0x00007ffeefbff780) at main.c:1343:3
      frame #15: 0x0000000100ee8025 gdb`gdb_main(args=0x00007ffeefbff780) at main.c:1368:7
      frame #16: 0x00000001000044f1 gdb`main(argc=6, argv=0x00007ffeefbff8a0) at gdb.c:32:10
      frame #17: 0x00007fff20558f5d libdyld.dylib`start + 1

The solib_create_inferior_hook call in symbol_file_command was added by
commit ea142fb ("Fix breakpoints on file reloads for PIE
binaries").  It causes solib_create_inferior_hook to be called while
the inferior is not running, which darwin_solib_create_inferior_hook
does not expect.  darwin_solib_get_all_image_info_addr_at_init, in
particular, assumes that there is a current thread, as it tries to get
the current thread's regcache.

Fix it by adding a target_has_execution check and returning early.  Note
that there is a similar check in svr4_solib_create_inferior_hook.

gdb/ChangeLog:

	* solib-darwin.c (darwin_solib_create_inferior_hook): Return
	early if no execution.

Change-Id: Ia11dd983a1e29786e5ce663d0fcaa6846dc611bb
pipcet pushed a commit that referenced this issue Jul 16, 2021
Commit 408f668 ("detach in all-stop
with threads running") regressed "detach" with "target remote":

 (gdb) detach
 Detaching from program: target:/any/program, process 3671843
 Detaching from process 3671843
 Ending remote debugging.
 [Inferior 1 (process 3671843) detached]
 In main
 terminate called after throwing an instance of 'gdb_exception_error'
 Aborted (core dumped)

Here's the exception above being thrown:

 (top-gdb) bt
 #0  throw_error (error=TARGET_CLOSE_ERROR, fmt=0x555556035588 "Remote connection closed") at src/gdbsupport/common-exceptions.cc:222
 #1  0x0000555555bbaa46 in remote_target::readchar (this=0x555556a11040, timeout=10000) at src/gdb/remote.c:9440
 #2  0x0000555555bbb9e5 in remote_target::getpkt_or_notif_sane_1 (this=0x555556a11040, buf=0x555556a11058, forever=0, expecting_notif=0, is_notif=0x0) at src/gdb/remote.c:9928
 #3  0x0000555555bbbda9 in remote_target::getpkt_sane (this=0x555556a11040, buf=0x555556a11058, forever=0) at src/gdb/remote.c:10030
 #4  0x0000555555bc0e75 in remote_target::remote_hostio_send_command (this=0x555556a11040, command_bytes=13, which_packet=14, remote_errno=0x7fffffffcfd0, attachment=0x0, attachment_len=0x0) at src/gdb/remote.c:12137
 #5  0x0000555555bc1b6c in remote_target::remote_hostio_close (this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12455
 #6  0x0000555555bc1bb4 in remote_target::fileio_close (During symbol reading: .debug_line address at offset 0x64f417 is 0 [in module build/gdb/gdb]
 this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12462
 #7  0x0000555555c9274c in target_fileio_close (fd=3, target_errno=0x7fffffffcfd0) at src/gdb/target.c:3365
 #8  0x000055555595a19d in gdb_bfd_iovec_fileio_close (abfd=0x555556b9f8a0, stream=0x555556b11530) at src/gdb/gdb_bfd.c:439
 #9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556b9f8a0) at src/bfd/opncls.c:599
 #10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556b9f8a0) at src/bfd/opncls.c:847
 #11 0x0000555555e0a27a in bfd_close (abfd=0x555556b9f8a0) at src/bfd/opncls.c:814
 #12 0x000055555595a9d3 in gdb_bfd_close_or_warn (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:626
 #13 0x000055555595ad29 in gdb_bfd_unref (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:715
 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573
 #15 0x0000555555ae955a in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:377
 #16 0x000055555572b7c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:155
 #17 0x00005555557263c3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555556bf0588, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
 #18 0x0000555555ae745e in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
 #19 0x0000555555ae747e in std::shared_ptr<objfile>::~shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
 #20 0x0000555555b1c1dc in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x5555564cdd60, __p=0x555556bf0580) at /usr/include/c++/9/ext/new_allocator.h:153
 #21 0x0000555555b1bb1d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=..., __p=0x555556bf0580) at /usr/include/c++/9/bits/alloc_traits.h:497
 #22 0x0000555555b1b73e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/stl_list.h:1921
 #23 0x0000555555b1afeb in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/list.tcc:158
 #24 0x0000555555b19576 in program_space::remove_objfile (this=0x5555564cdd20, objfile=0x555556515540) at src/gdb/progspace.c:210
 #25 0x0000555555ae4502 in objfile::unlink (this=0x555556515540) at src/gdb/objfiles.c:487
 #26 0x0000555555ae5a12 in objfile_purge_solibs () at src/gdb/objfiles.c:875
 #27 0x0000555555c09686 in no_shared_libraries (ignored=0x0, from_tty=1) at src/gdb/solib.c:1236
 #28 0x00005555559e3f5f in detach_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:2769

So frame #28 already detached the remote process, and then we're
purging the shared libraries.  GDB had opened remote shared libraries
via the target: sysroot, so it tries closing them.  GDBserver is
tearing down already, so remote communication breaks down and we close
the remote target and throw TARGET_CLOSE_ERROR.

Note frame #14:

 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573

That's a dtor, thus noexcept.  That's the reason for the
std::terminate.
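
The "dtor, thus noexcept" point can be checked in isolation.  The sketch
below uses hypothetical stand-ins (Objfile, close_remote_file), not GDB's
real types; it only shows that a destructor without an exception
specification is implicitly noexcept, so an exception escaping it ends in
std::terminate.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a close routine that can fail by throwing.
inline void close_remote_file() {
  throw std::runtime_error("Remote connection closed");
}

struct Objfile {
  // No exception specification is written here, yet since C++11 this
  // destructor is implicitly noexcept.  If it ever ran, the exception
  // escaping it would call std::terminate instead of propagating.
  ~Objfile() { close_remote_file(); }
};

// The trait confirms the implicit noexcept without running the destructor.
static_assert(std::is_nothrow_destructible<Objfile>::value,
              "destructors are noexcept by default");
```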

Stepping back a bit, why do we still have open remote files if we've
managed to detach already and we're debugging with "target remote"?
The reason is that commit 408f668
makes detach_command hold a reference to the target, so the remote
target won't be finally closed until frame #28 returns.  It's closing
the target that invalidates target file I/O handles.

This commit fixes the issue by not relying on target_close to
invalidate the target file I/O handles, and instead invalidating them
immediately in remote_unpush_target.  So when GDB purges the solibs,
and we end up in target_fileio_close (frame #7 above), there's nothing
to do, and we don't try to talk with the remote target anymore.
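
The idea of invalidating handles eagerly, so that a later close has nothing
to send, can be sketched as follows.  All names here (Target, FileioTable,
closes_sent) are hypothetical and chosen for illustration; GDB's real
handle table is more involved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical target that counts how many close requests reached it.
struct Target {
  int closes_sent = 0;
};

// A file I/O handle remembers which target owns it; nullptr means the
// handle has been invalidated.
struct FileioHandle {
  Target *target;
  int target_fd;
};

struct FileioTable {
  std::vector<FileioHandle> handles;

  int open(Target *t, int target_fd) {
    handles.push_back({t, target_fd});
    return static_cast<int>(handles.size()) - 1;
  }

  // Called as soon as the target is unpushed, not when it is finally closed.
  void invalidate_target(Target *t) {
    for (FileioHandle &h : handles)
      if (h.target == t)
        h.target = nullptr;
  }

  // Closing an invalidated handle is a no-op: nothing is sent to the target.
  void close(int fd) {
    FileioHandle &h = handles.at(static_cast<std::size_t>(fd));
    if (h.target != nullptr) {
      h.target->closes_sent++;
      h.target = nullptr;
    }
  }
};
```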

The regression isn't seen when testing with
--target_board=native-gdbserver, because that does "set sysroot" to
disable the "target:" sysroot, for test run speed reasons.  So this
commit adds a testcase that explicitly tests detach with "set sysroot
target:".

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR gdb/28080
	* remote.c (remote_unpush_target): Invalidate file I/O target
	handles.
	* target.c (fileio_handles_invalidate_target): Make extern.
	* target.h (fileio_handles_invalidate_target): Declare.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR gdb/28080
	* gdb.base/detach-sysroot-target.exp: New.
	* gdb.base/detach-sysroot-target.c: New.

Reported-By: Jonah Graham <[email protected]>

Change-Id: I851234910172f42a1b30e731161376c344d2727d
pipcet pushed a commit that referenced this issue Jul 16, 2021
…080)

Before PR gdb/28080 was fixed by the previous patch, GDB was crashing
like this:

 (gdb) detach
 Detaching from program: target:/any/program, process 3671843
 Detaching from process 3671843
 Ending remote debugging.
 [Inferior 1 (process 3671843) detached]
 In main
 terminate called after throwing an instance of 'gdb_exception_error'
 Aborted (core dumped)

Here's the exception above being thrown:

 (top-gdb) bt
 #0  throw_error (error=TARGET_CLOSE_ERROR, fmt=0x555556035588 "Remote connection closed") at src/gdbsupport/common-exceptions.cc:222
 #1  0x0000555555bbaa46 in remote_target::readchar (this=0x555556a11040, timeout=10000) at src/gdb/remote.c:9440
 #2  0x0000555555bbb9e5 in remote_target::getpkt_or_notif_sane_1 (this=0x555556a11040, buf=0x555556a11058, forever=0, expecting_notif=0, is_notif=0x0) at src/gdb/remote.c:9928
 #3  0x0000555555bbbda9 in remote_target::getpkt_sane (this=0x555556a11040, buf=0x555556a11058, forever=0) at src/gdb/remote.c:10030
 #4  0x0000555555bc0e75 in remote_target::remote_hostio_send_command (this=0x555556a11040, command_bytes=13, which_packet=14, remote_errno=0x7fffffffcfd0, attachment=0x0, attachment_len=0x0) at src/gdb/remote.c:12137
 #5  0x0000555555bc1b6c in remote_target::remote_hostio_close (this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12455
 #6  0x0000555555bc1bb4 in remote_target::fileio_close (During symbol reading: .debug_line address at offset 0x64f417 is 0 [in module build/gdb/gdb]
 this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12462
 #7  0x0000555555c9274c in target_fileio_close (fd=3, target_errno=0x7fffffffcfd0) at src/gdb/target.c:3365
 #8  0x000055555595a19d in gdb_bfd_iovec_fileio_close (abfd=0x555556b9f8a0, stream=0x555556b11530) at src/gdb/gdb_bfd.c:439
 #9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556b9f8a0) at src/bfd/opncls.c:599
 #10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556b9f8a0) at src/bfd/opncls.c:847
 #11 0x0000555555e0a27a in bfd_close (abfd=0x555556b9f8a0) at src/bfd/opncls.c:814
 #12 0x000055555595a9d3 in gdb_bfd_close_or_warn (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:626
 #13 0x000055555595ad29 in gdb_bfd_unref (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:715
 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573
 #15 0x0000555555ae955a in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:377
 #16 0x000055555572b7c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:155
 #17 0x00005555557263c3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555556bf0588, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
 #18 0x0000555555ae745e in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
 #19 0x0000555555ae747e in std::shared_ptr<objfile>::~shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
 #20 0x0000555555b1c1dc in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x5555564cdd60, __p=0x555556bf0580) at /usr/include/c++/9/ext/new_allocator.h:153
 #21 0x0000555555b1bb1d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=..., __p=0x555556bf0580) at /usr/include/c++/9/bits/alloc_traits.h:497
 #22 0x0000555555b1b73e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/stl_list.h:1921
 #23 0x0000555555b1afeb in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/list.tcc:158
 #24 0x0000555555b19576 in program_space::remove_objfile (this=0x5555564cdd20, objfile=0x555556515540) at src/gdb/progspace.c:210
 #25 0x0000555555ae4502 in objfile::unlink (this=0x555556515540) at src/gdb/objfiles.c:487
 #26 0x0000555555ae5a12 in objfile_purge_solibs () at src/gdb/objfiles.c:875
 #27 0x0000555555c09686 in no_shared_libraries (ignored=0x0, from_tty=1) at src/gdb/solib.c:1236
 #28 0x00005555559e3f5f in detach_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:2769

Note frame #14:

 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573

That's a dtor, thus noexcept.  That's the reason for the
std::terminate.

The previous patch fixed things such that the exception above isn't
thrown anymore.  However, it's possible that e.g., the remote
connection drops just while a user types "nosharedlibrary", or some
other reason that leads to objfile::~objfile, and then we end up with the
same std::terminate problem.

Also notice that frames #9-#11 are BFD frames:

 #9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556bc27e0) at src/bfd/opncls.c:599
 #10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556bc27e0) at src/bfd/opncls.c:847
 #11 0x0000555555e0a27a in bfd_close (abfd=0x555556bc27e0) at src/bfd/opncls.c:814

BFD is written in C, so throwing exceptions across such frames may
either fail to clean up properly or abort if BFD is not compiled
with -fasynchronous-unwind-tables (x86-64 enables it by default, but not
all GCC ports do).

Thus frame #8 seems like a good place to swallow exceptions.  More so
since in this spot we already ignore target_fileio_close return
errors.  That's what this commit does.  Without the previous fix, we'd
see:

 (gdb) detach
 Detaching from program: target:/any/program, process 2197701
 Ending remote debugging.
 [Inferior 1 (process 2197701) detached]
 warning: cannot close "target:/lib64/ld-linux-x86-64.so.2": Remote connection closed

Note it prints a warning, which would still be a regression compared
to GDB 10, if it weren't for the previous fix.
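
Swallowing the exception at the boundary with C code can be sketched like
this.  The names iovec_close_cb and target_close_may_throw are hypothetical,
and the warning text is illustrative only; the point is that nothing is
allowed to propagate into the C caller.

```cpp
#include <cstdio>
#include <exception>
#include <stdexcept>

// Hypothetical close routine that throws when the connection is gone.
int target_close_may_throw(bool connection_lost) {
  if (connection_lost)
    throw std::runtime_error("Remote connection closed");
  return 0;
}

// Callback of the kind handed to a C library.  It must never let an
// exception escape, so any failure is downgraded to a warning.
extern "C" int iovec_close_cb(void *stream) noexcept {
  bool connection_lost = *static_cast<bool *>(stream);
  try {
    (void) target_close_may_throw(connection_lost);
  } catch (const std::exception &e) {
    std::fprintf(stderr, "warning: cannot close file: %s\n", e.what());
  }
  // Close errors were already being ignored at this spot, so report success.
  return 0;
}
```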

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR gdb/28080
	* gdb_bfd.c (gdb_bfd_close_warning): New.
	(gdb_bfd_iovec_fileio_close): Wrap target_fileio_close in
	try/catch and print warning on exception.
	(gdb_bfd_close_or_warn): Use gdb_bfd_close_warning.

Change-Id: Ic7a26ddba0a4444e3377b0e7c1c89934a84545d7
pipcet pushed a commit that referenced this issue Jul 31, 2021
Simon Marchi tried gdb on OpenBSD, and it immediately segfaults when
running a program.  Simon tracked down the problem to x86_dr_low.get_status
being nullptr at this point:

    (lldb) print x86_dr_low.get_status
    (unsigned long (*)()) $0 = 0x0000000000000000
    (lldb) bt
    * thread #1, stop reason = step over
      * frame #0: 0x0000033b64b764aa gdb`x86_dr_stopped_data_address(state=0x0000033d7162a310, addr_p=0x00007f7ffffc5688) at x86-dregs.c:645:12
        frame #1: 0x0000033b64b766de gdb`x86_dr_stopped_by_watchpoint(state=0x0000033d7162a310) at x86-dregs.c:687:10
        frame #2: 0x0000033b64ea5f72 gdb`x86_stopped_by_watchpoint() at x86-nat.c:206:10
        frame #3: 0x0000033b64637fbb gdb`x86_nat_target<obsd_nat_target>::stopped_by_watchpoint(this=0x0000033b65252820) at x86-nat.h:100:12
        frame #4: 0x0000033b64d3ff11 gdb`target_stopped_by_watchpoint() at target.c:468:46
        frame #5: 0x0000033b6469b001 gdb`watchpoints_triggered(ws=0x00007f7ffffc61c8) at breakpoint.c:4790:32
        frame #6: 0x0000033b64a8bb8b gdb`handle_signal_stop(ecs=0x00007f7ffffc61a0) at infrun.c:6072:29
        frame #7: 0x0000033b64a7e3a7 gdb`handle_inferior_event(ecs=0x00007f7ffffc61a0) at infrun.c:5694:7
        frame #8: 0x0000033b64a7c1a0 gdb`fetch_inferior_event() at infrun.c:4090:5
        frame #9: 0x0000033b64a51921 gdb`inferior_event_handler(event_type=INF_REG_EVENT) at inf-loop.c:41:7
        frame #10: 0x0000033b64a827c9 gdb`infrun_async_inferior_event_handler(data=0x0000000000000000) at infrun.c:9384:3
        frame #11: 0x0000033b6465bd4f gdb`check_async_event_handlers() at async-event.c:335:4
        frame #12: 0x0000033b65070917 gdb`gdb_do_one_event() at event-loop.cc:216:10
        frame #13: 0x0000033b64af0db1 gdb`start_event_loop() at main.c:421:13
        frame #14: 0x0000033b64aefe9a gdb`captured_command_loop() at main.c:481:3
        frame #15: 0x0000033b64aed5c2 gdb`captured_main(data=0x00007f7ffffc6470) at main.c:1353:4
        frame #16: 0x0000033b64aed4f2 gdb`gdb_main(args=0x00007f7ffffc6470) at main.c:1368:7
        frame #17: 0x0000033b6459d787 gdb`main(argc=5, argv=0x00007f7ffffc6518) at gdb.c:32:10
        frame #18: 0x0000033b6459d521 gdb`___start + 321

On BSDs, get_status is set in _initialize_x86_bsd_nat, but only if
HAVE_PT_GETDBREGS is defined.  PT_GETDBREGS doesn't exist on OpenBSD, so
get_status (and the other fields of x86_dr_low) are left as nullptr.

OpenBSD doesn't support getting or setting the x86 debug registers, so
fix by omitting debug register support entirely on OpenBSD:

- Change x86bsd_nat_target to only inherit from x86_nat_target if
  PT_GETDBREGS is supported.

- Don't include x86-nat.o and nat/x86-dregs.o for OpenBSD/amd64.  They
  were already omitted for OpenBSD/i386.
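
One way to picture the conditional inheritance in the first bullet is the
sketch below.  The class names are simplified, hypothetical stand-ins, and
the actual patch may structure this differently; only the HAVE_PT_GETDBREGS
macro name comes from the text above.

```cpp
#include <type_traits>

// Hypothetical plain BSD native target.
struct BsdNatTarget {
  virtual ~BsdNatTarget() = default;
};

// Hypothetical mixin that adds x86 debug-register support on top of a base.
template <typename Base>
struct X86NatTarget : Base {
  bool stopped_by_watchpoint() { return false; }
};

// Only pull in the debug-register layer when the platform can read the
// debug registers; otherwise inherit from the plain target directly.
#ifdef HAVE_PT_GETDBREGS
using X86BsdBase = X86NatTarget<BsdNatTarget>;
#else
using X86BsdBase = BsdNatTarget;
#endif

struct X86BsdNatTarget : X86BsdBase {};
```

With the macro undefined, as on OpenBSD, no watchpoint method is inherited,
so nothing can reach an unset get_status pointer.
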
pipcet pushed a commit that referenced this issue Aug 7, 2021
In PR28004 the following warning / Internal error is reported:
...
$ gdb -q -batch \
    -iex "set sysroot $(pwd -P)/repro" \
    ./repro/gdb \
    ./repro/core \
    -ex bt
  ...
 Program terminated with signal SIGABRT, Aborted.
 #0  0x00007ff8fe8e5d22 in raise () from repro/usr/lib/libc.so.6
 [Current thread is 1 (LWP 1762498)]
 #1  0x00007ff8fe8cf862 in abort () from repro/usr/lib/libc.so.6
 warning: (Internal error: pc 0x7ff8feb2c21d in read in psymtab, \
           but not in symtab.)
 warning: (Internal error: pc 0x7ff8feb2c218 in read in psymtab, \
           but not in symtab.)
  ...
 #2  0x00007ff8feb2c21e in __gnu_debug::_Error_formatter::_M_error() const \
   [clone .cold] (warning: (Internal error: pc 0x7ff8feb2c21d in read in \
   psymtab, but not in symtab.)

) from repro/usr/lib/libstdc++.so.6
...

The warning is about the following:
- in find_pc_sect_compunit_symtab we try to find the address
  (0x7ff8feb2c218 / 0x7ff8feb2c21d) in the symtabs.
- that fails, so we try again in the partial symtabs.
- we find a matching partial symtab
- however, the partial symtab has a full symtab, so
  we should have found a matching symtab in the first step.

The addresses are:
...
(gdb) info sym 0x7ff8feb2c218
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] in \
  section .text of repro/usr/lib/libstdc++.so.6
(gdb) info sym 0x7ff8feb2c21d
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] + 5 in \
  section .text of repro/usr/lib/libstdc++.so.6
...
which correspond to unrelocated addresses 0x9c218 and 0x9c21d:
...
$ nm -C  repro/usr/lib/libstdc++.so.6.0.29 | grep 000000000009c218
000000000009c218 t __gnu_debug::_Error_formatter::_M_error() const \
  [clone .cold]
...
which belong to function __gnu_debug::_Error_formatter::_M_error() in
/build/gcc/src/gcc/libstdc++-v3/src/c++11/debug.cc.

The partial symtab that is found for the addresses is instead the one for
/build/gcc/src/gcc/libstdc++-v3/src/c++98/bitmap_allocator.cc, which is
incorrect.

This happens as follows.

The bitmap_allocator.cc CU has DW_AT_ranges at .debug_rnglist offset 0x4b50:
...
    00004b50 0000000000000000 0000000000000056
    00004b5a 00000000000a4790 00000000000a479c
    00004b64 00000000000a47a0 00000000000a47ac
...

When reading the first range 0x0..0x56, it doesn't trigger the "start address
of zero" complaint here:
...
      /* A not-uncommon case of bad debug info.
         Don't pollute the addrmap with bad data.  */
      if (range_beginning + baseaddr == 0
          && !per_objfile->per_bfd->has_section_at_zero)
        {
          complaint (_(".debug_rnglists entry has start address of zero"
                       " [in module %s]"), objfile_name (objfile));
          continue;
        }
...
because baseaddr != 0, which seems incorrect given that when loading the
shared library individually in gdb (and consequently baseaddr == 0), we do see
the complaint.

Consequently, we run into this case in dwarf2_get_pc_bounds:
...
  if (low == 0 && !per_objfile->per_bfd->has_section_at_zero)
    return PC_BOUNDS_INVALID;
...
which then results in this code in process_psymtab_comp_unit_reader being
called with cu_bounds_kind == PC_BOUNDS_INVALID, which sets the set_addrmap
argument to 1:
...
      scan_partial_symbols (first_die, &lowpc, &highpc,
                            cu_bounds_kind <= PC_BOUNDS_INVALID, cu);
...
and consequently, the CU addrmap gets built using address info from the
functions.

During that process, addrmap_set_empty is called with a range that includes
0x9c218 and 0x9c21d:
...
(gdb) p /x start
$7 = 0x9989c
(gdb) p /x end_inclusive
$8 = 0xb200d
...
but it's called for a function at DIE 0x54153 with DW_AT_ranges at 0x40ae:
...
    000040ae 00000000000b1ee0 00000000000b200e
    000040b9 000000000009989c 00000000000998c4
    000040c3 <End of list>
...
and neither range includes 0x9c218 and 0x9c21d.

This is caused by this code in partial_die_info::read:
...
            if (dwarf2_ranges_read (ranges_offset, &lowpc, &highpc, cu,
                                    nullptr, tag))
             has_pc_info = 1;
...
which pretends that the function is located at addresses 0x9989c..0xb200d,
which is indeed not the case.
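
The effect of collapsing a discontiguous range list into a single low/high
pair can be reproduced with the two ranges quoted above.  This is an
illustrative sketch with hypothetical helper names, not the code in
dwarf2_ranges_read.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A half-open address range [low, high).
struct Range {
  std::uint64_t low;
  std::uint64_t high;
};

// Collapse a non-empty list of ranges into one range spanning all of them.
Range bounding_range(const std::vector<Range> &ranges) {
  Range r = ranges.front();
  for (const Range &x : ranges) {
    r.low = std::min(r.low, x.low);
    r.high = std::max(r.high, x.high);
  }
  return r;
}

// True if PC falls inside at least one of the individual ranges.
bool any_contains(const std::vector<Range> &ranges, std::uint64_t pc) {
  return std::any_of(ranges.begin(), ranges.end(), [pc](const Range &x) {
    return x.low <= pc && pc < x.high;
  });
}
```

For the ranges 0xb1ee0..0xb200e and 0x9989c..0x998c4, the bounding range is
0x9989c..0xb200e, which covers 0x9c218 although neither range contains it.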

This patch fixes the first problem encountered: fix the "start address of
zero" complaint warning by removing the baseaddr part from the condition.
Same for dwarf2_ranges_process.
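
The difference between the two conditions can be seen in a small sketch.
The function names are hypothetical.  The base address 0x7ff8fea90000 is
derived from the addresses above (0x7ff8feb2c218 - 0x9c218).

```cpp
#include <cstdint>

// Condition before the patch: the relocated start address is compared with
// zero, so a nonzero base address hides a zero start address.
bool old_skips_range(std::uint64_t range_beginning, std::uint64_t baseaddr,
                     bool has_section_at_zero) {
  return range_beginning + baseaddr == 0 && !has_section_at_zero;
}

// Condition after the patch: only the unrelocated start address matters.
bool new_skips_range(std::uint64_t range_beginning, bool has_section_at_zero) {
  return range_beginning == 0 && !has_section_at_zero;
}
```

With range_beginning 0 and that base address, the old check lets the bad
range through and the new one skips it.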

The effect is that:
- the complaint is triggered, and
- the warning / Internal error is no longer triggered.

This does not fix the observed problem in partial_die_info::read, which is
filed as PR28200.

Tested on x86_64-linux.

Co-Authored-By: Simon Marchi <[email protected]>

gdb/ChangeLog:

2021-07-29  Simon Marchi  <[email protected]>
	    Tom de Vries  <[email protected]>

	PR symtab/28004
	* gdb/dwarf2/read.c (dwarf2_rnglists_process, dwarf2_ranges_process):
	Fix zero address complaint.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range-shlib.c: New test.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range.c: New test.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range.exp: New file.
pipcet pushed a commit that referenced this issue Aug 16, 2021
While working on the testsuite, I ended up noticing that GDB fails to
produce a full backtrace from a thread waiting in pthread_join.  When
selecting the waiting thread and using the 'bt' command, the following
result can be observed:

	(gdb) bt
	#0  0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0
	#1  0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0
	Backtrace stopped: frame did not save the PC

On my platform, I do not have debug symbols for glibc, so I need to rely
on prologue analysis in order to unwind the stack.

Here is what the function prologue looks like:

	(gdb) disassemble __pthread_clockjoin_ex
	Dump of assembler code for function __pthread_clockjoin_ex:
	   0x0000003ff7fc42de <+0>:     addi    sp,sp,-144
	   0x0000003ff7fc42e0 <+2>:     sd      s5,88(sp)
	   0x0000003ff7fc42e2 <+4>:     auipc   s5,0xd
	   0x0000003ff7fc42e6 <+8>:     ld      s5,-2(s5) # 0x3ff7fd12e0
	   0x0000003ff7fc42ea <+12>:    ld      a5,0(s5)
	   0x0000003ff7fc42ee <+16>:    sd      ra,136(sp)
	   0x0000003ff7fc42f0 <+18>:    sd      s0,128(sp)
	   0x0000003ff7fc42f2 <+20>:    sd      s1,120(sp)
	   0x0000003ff7fc42f4 <+22>:    sd      s2,112(sp)
	   0x0000003ff7fc42f6 <+24>:    sd      s3,104(sp)
	   0x0000003ff7fc42f8 <+26>:    sd      s4,96(sp)
	   0x0000003ff7fc42fa <+28>:    sd      s6,80(sp)
	   0x0000003ff7fc42fc <+30>:    sd      s7,72(sp)
	   0x0000003ff7fc42fe <+32>:    sd      s8,64(sp)
	   0x0000003ff7fc4300 <+34>:    sd      s9,56(sp)
	   0x0000003ff7fc4302 <+36>:    sd      a5,40(sp)

As far as prologue analysis is concerned, the most interesting part is
done at address 0x0000003ff7fc42ee (<+16>): 'sd ra,136(sp)'. This stores
the RA (return address) register on the stack, which is the information
we are looking for in order to identify the caller.

In the current implementation of the prologue scanner, GDB stops when
hitting 0x0000003ff7fc42e6 (<+8>) because it does not know what to do
with the 'ld' instruction.  GDB thinks it has reached the end of the
prologue but has not yet reached the important part, which explains
GDB's inability to unwind past this point.

The section of the prologue starting at <+4> until <+12> is used to load
the stack canary[1], which will then be placed on the stack at <+36> at
the end of the prologue.

In order to have the prologue properly handled, this commit proposes to
add support for the ld instruction in the RISC-V prologue scanner.
I guess riscv32 would use lw in such a situation, so this patch also adds
support for this instruction.
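
To make the effect concrete, here is a toy model in Python of a prologue
scanner that stops at the first instruction it does not recognize.  It is
only a sketch: it works on mnemonic strings rather than encoded
instructions, and `scan_prologue` and the `before`/`after` sets are
hypothetical names, not the actual code of GDB's RISC-V prologue analyzer.

```python
# Toy model of a prologue scanner; NOT GDB's actual implementation.
# Instructions are (mnemonic, operands) pairs taken from the disassembly above.
PROLOGUE = [
    ("addi", "sp,sp,-144"),
    ("sd", "s5,88(sp)"),
    ("auipc", "s5,0xd"),
    ("ld", "s5,-2(s5)"),
    ("ld", "a5,0(s5)"),
    ("sd", "ra,136(sp)"),
    ("sd", "s0,128(sp)"),
]


def scan_prologue(insns, known):
    """Return the stack offset where ra is saved, or None if the scan
    stops (on an unrecognized mnemonic) before finding it."""
    for mnemonic, operands in insns:
        if mnemonic not in known:
            return None  # scanner gives up: treated as end of prologue
        if mnemonic == "sd" and operands.startswith("ra,"):
            # operands look like "ra,136(sp)"; extract the offset
            return int(operands.split(",")[1].split("(")[0])
    return None


# Hypothetical "before" and "after" sets of recognized mnemonics.
before = {"addi", "sd", "auipc"}
after = before | {"ld", "lw"}

print(scan_prologue(PROLOGUE, before))  # None
print(scan_prologue(PROLOGUE, after))   # 136
```

With the smaller set the scan ends at the first 'ld' and never sees the
store of ra; once 'ld' is recognized, the scan continues to <+16>.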

With this patch applied, gdb is now able to unwind past pthread_join:

	(gdb) bt
	#0  0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0
	#1  0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0
	#2  0x0000002aaaaaa88e in bar() ()
	#3  0x0000002aaaaaa8c4 in foo() ()
	#4  0x0000002aaaaaa8da in main ()

I have had a look to see if I could reproduce this easily, but in my
simple testcases using '-fstack-protector-all', the canary is loaded
after the RA register is saved.  I do not have a reliable way of
generating a prologue similar to the problematic one so I forged one
instead.

The testsuite has been run on riscv64 ubuntu 21.01 with no regression
observed.

[1] https://en.wikipedia.org/wiki/Buffer_overflow_protection#Canaries
pipcet pushed a commit that referenced this issue Sep 4, 2021
The original reproducer for PR28030 required use of a specific
compiler version - gcc-c++-11.1.1-3.fc34 is mentioned in the PR,
though it seems probable that other gcc versions might be able to
reproduce the bug as well.  This commit introduces a test case which,
using the DWARF assembler, provides a reproducer which is independent
of the compiler version.  (Well, it'll work with whatever compilers
the DWARF assembler works with.)

To the best of my knowledge, it's also the first test case which uses
the DWARF assembler to provide debug info for a shared object.  That
being the case, I provided more than the usual commentary which should
allow this case to be used as a template when a combo shared
library / DWARF assembler test case is required in the future.

I provide some details regarding the bug in a comment near the
beginning of locexpr-dml.exp.

This problem was difficult to reproduce; I found myself constantly
referring to the backtrace while trying to figure out what (else) I
might be missing while trying to create a reproducer.  Below is a
partial backtrace which I include for posterity.

 #0  internal_error (
    file=0xc50110 "/ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c", line=5575,
    fmt=0xc520c0 "Unexpected type field location kind: %d")
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdbsupport/errors.cc:51
 #1  0x00000000006ef0c5 in copy_type_recursive (objfile=0x1635930,
    type=0x274c260, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5575
 #2  0x00000000006ef382 in copy_type_recursive (objfile=0x1635930,
    type=0x274ca10, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5602
 #3  0x0000000000a7409a in preserve_one_value (value=0x24269f0,
    objfile=0x1635930, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2529
 #4  0x000000000072012a in gdbscm_preserve_values (
    extlang=0xc55720 <extension_language_guile>, objfile=0x1635930,
    copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/guile/scm-value.c:94
 #5  0x00000000006a3f82 in preserve_ext_lang_values (objfile=0x1635930,
    copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/extension.c:568
 #6  0x0000000000a7428d in preserve_values (objfile=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2579
 #7  0x000000000082d514 in objfile::~objfile (this=0x1635930,
    __in_chrg=<optimized out>)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:549
 #8  0x0000000000831cc8 in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x1654580)
    at /usr/include/c++/11/bits/shared_ptr_base.h:348
 #9  0x00000000004e6617 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x1654580) at /usr/include/c++/11/bits/shared_ptr_base.h:168
 #10 0x00000000004e1d2f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x190bb88, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:705
 #11 0x000000000082feee in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x190bb80, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:1154
 #12 0x000000000082ff0a in std::shared_ptr<objfile>::~shared_ptr (
    this=0x190bb80, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr.h:122
 #13 0x000000000085ed7e in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x114bc00,
    __p=0x190bb80) at /usr/include/c++/11/ext/new_allocator.h:168
 #14 0x000000000085e88d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=...,
    __p=0x190bb80) at /usr/include/c++/11/bits/alloc_traits.h:531
 #15 0x000000000085e50c in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x114bc00, __position=
  std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930})
    at /usr/include/c++/11/bits/stl_list.h:1925
 #16 0x000000000085df0e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x114bc00, __position=
  std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930})
    at /usr/include/c++/11/bits/list.tcc:158
 #17 0x000000000085c748 in program_space::remove_objfile (this=0x114bbc0,
    objfile=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/progspace.c:210
 #18 0x000000000082d3ae in objfile::unlink (this=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:487
 #19 0x000000000082e68c in objfile_purge_solibs ()
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:875
 #20 0x000000000092dd37 in no_shared_libraries (ignored=0x0, from_tty=1)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/solib.c:1236
 #21 0x00000000009a37fe in target_pre_inferior (from_tty=1)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/target.c:2496
 #22 0x00000000007454d6 in run_command_1 (args=0x0, from_tty=1,
    run_how=RUN_NORMAL)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/infcmd.c:437

I'll note a few points regarding this backtrace:

Frame #1 is where the internal error occurs.  It's caused by an
unhandled case for FIELD_LOC_KIND_DWARF_BLOCK.  The fix for this bug
adds support for this case.

Frame #22 - it's a partial backtrace - shows that GDB is attempting to
(re)run the program.  You can see the exact command sequence that was
used for reproducing this problem in the PR (at
https://sourceware.org/bugzilla/show_bug.cgi?id=28030), but in a
nutshell, after starting the program and advancing to the appropriate
source line, GDB was asked to step into libstdc++; a "finish" command
was issued, returning a value.  The fact that a value was returned is
very important.  GDB was then used to step back into libstdc++.  A
breakpoint was set on a source line in the library after which a "run"
command was issued.

Frame #19 shows a call to objfile_purge_solibs.  It's aptly named.

Frame #7 is a call to the destructor for one of the objfile solibs; it
turned out to be the one for libstdc++.

Frames #6 thru #3 show various value preservation frames.  If you look
at preserve_values() in gdb/value.c, the value history is preserved
first, followed by internal variables, followed by values for the
extension languages (python and guile).
pipcet pushed a commit that referenced this issue Oct 12, 2021
…es.exp

When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have:
...
 (gdb) bt^M
 #0  0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #1  0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #2  0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #3  0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #4  0x0000000000000001 in ?? ()^M
 #5  0x00007fffffffdaac in ?? ()^M
 #6  0x0000000000000000 in ?? ()^M
 (gdb) FAIL: gdb.base/break-probes.exp: ensure using probes
...

The test-case intends to emit an UNTESTED in this case, but fails to do so
because it tries to do it in a regexp clause in a gdb_test_multiple, which
doesn't trigger.  Instead, a default clause triggers which produces the FAIL.

Also the use of UNTESTED is not appropriate, and we should use UNSUPPORTED
instead.

Fix this by silencing the FAIL, and emitting an UNSUPPORTED after the
gdb_test_multiple:
...
 if { ! $using_probes } {
+    unsupported "probes not present on this system"
     return -1
 }
...

Tested on x86_64-linux.
pipcet pushed a commit that referenced this issue Oct 12, 2021
When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have:
...
 (gdb) run^M
 Starting program: break-probes^M
 Stopped due to shared library event (no libraries added or removed)^M
 (gdb) bt^M
 #0  0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #1  0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #2  0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #3  0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #4  0x0000000000000001 in ?? ()^M
 #5  0x00007fffffffdaac in ?? ()^M
 #6  0x0000000000000000 in ?? ()^M
 (gdb) UNSUPPORTED: gdb.base/break-probes.exp: probes not present on this system
...

Using the backtrace, the test-case tries to establish that we're stopped in
dl_main, which is used as proof that we're using probes.

However, the backtrace only shows an address, because:
- the dynamic linker contains no minimal symbols and no debug info, and
- gdb is built without --with-separate-debug-dir so it can't find the
  corresponding .debug file, which does contain the minimal symbols and
  debug info.

Fix this by instead printing the pc and grepping for the value in the
info probes output:
...
(gdb) p /x $pc^M
$1 = 0x7ffff7dd6e12^M
(gdb) info probes^M
Type Provider Name           Where              Object                      ^M
  ...
stap rtld     init_start     0x00007ffff7dd6e12 /lib64/ld-linux-x86-64.so.2 ^M
  ...
(gdb)
...
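
The check described above can be sketched in Python as follows.  This is
only an illustration of the idea (the test-case itself is written for
DejaGnu); `probe_addresses` and `stopped_at_probe` are hypothetical helper
names, and the sample text is the session output quoted above with the
elided rows left out.

```python
import re

# Sample text copied from the session above (elided probe rows omitted).
INFO_PROBES = """\
Type Provider Name           Where              Object
stap rtld     init_start     0x00007ffff7dd6e12 /lib64/ld-linux-x86-64.so.2
"""


def probe_addresses(info_probes_output):
    """Collect every hexadecimal address that appears in the output."""
    return {int(m, 16) for m in re.findall(r"0x[0-9a-fA-F]+", info_probes_output)}


def stopped_at_probe(pc_output, info_probes_output):
    """pc_output is the reply to 'p /x $pc', e.g. '$1 = 0x7ffff7dd6e12'."""
    pc = int(pc_output.split("=")[1].strip(), 16)
    return pc in probe_addresses(info_probes_output)


print(stopped_at_probe("$1 = 0x7ffff7dd6e12", INFO_PROBES))  # True
```

Comparing the addresses as integers sidesteps the difference in zero
padding between the two commands' output.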

Tested on x86_64-linux.
pipcet pushed a commit that referenced this issue Oct 12, 2021
When running test-case gdb.base/break-interp.exp on ubuntu 18.04.5, we have:
...
 (gdb) bt^M
 #0  0x00007eff7ad5ae12 in ?? () from break-interp-LDprelinkNOdebugNO^M
 #1  0x00007eff7ad71f50 in ?? () from break-interp-LDprelinkNOdebugNO^M
 #2  0x00007eff7ad59128 in ?? () from break-interp-LDprelinkNOdebugNO^M
 #3  0x00007eff7ad58098 in ?? () from break-interp-LDprelinkNOdebugNO^M
 #4  0x0000000000000002 in ?? ()^M
 #5  0x00007fff505d7a32 in ?? ()^M
 #6  0x00007fff505d7a94 in ?? ()^M
 #7  0x0000000000000000 in ?? ()^M
 (gdb) FAIL: gdb.base/break-interp.exp: ldprelink=NO: ldsepdebug=NO: \
         first backtrace: dl bt
...

Using the backtrace, the test-case tries to establish that we're stopped in
dl_main.

However, the backtrace only shows an address, because:
- the dynamic linker contains no minimal symbols and no debug info, and
- gdb is built without --with-separate-debug-dir so it can't find the
  corresponding .debug file, which does contain the minimal symbols and
  debug info.

As in "[gdb/testsuite] Improve probe detection in gdb.base/break-probes.exp",
fix this by doing info probes and grepping for the address.

Tested on x86_64-linux.
pipcet pushed a commit that referenced this issue Oct 12, 2021
The gdb.multi/multi-term-settings.exp testcase sometimes fails like so:

 Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.multi/multi-term-settings.exp ...
 FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT)

It's easier to reproduce if you stress the machine at the same time, like e.g.:

  $ stress -c 24

Looking at gdb.log, we see:

 (gdb) attach 60422
 Attaching to program: build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings, process 60422
 [New Thread 60422.60422]
 Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
 Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
 Reading symbols from /lib64/ld-linux-x86-64.so.2...
 (No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
 0x00007f2fc2485334 in __GI___clock_nanosleep (clock_id=<optimized out>, clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffe23126940, rem=rem@entry=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
 78	../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: inf2: attach
 set schedule-multiple on
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: set schedule-multiple on
 info inferiors
   Num  Description       Connection                         Executable
   1    process 60404     1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings
 * 2    process 60422     1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: info inferiors
 pid=60422, count=46
 pid=60422, count=47
 pid=60422, count=48
 pid=60422, count=49
 pid=60422, count=50
 pid=60422, count=51
 pid=60422, count=52
 pid=60422, count=53
 pid=60422, count=54
 pid=60422, count=55
 pid=60422, count=56
 pid=60422, count=57
 pid=60422, count=58
 pid=60422, count=59
 pid=60422, count=60
 pid=60422, count=61
 pid=60422, count=62
 pid=60422, count=63
 pid=60422, count=64
 pid=60422, count=65
 pid=60422, count=66
 pid=60422, count=67
 pid=60422, count=68
 pid=60422, count=69
 pid=60404, count=54
 pid=60404, count=55
 pid=60404, count=56
 pid=60404, count=57
 pid=60404, count=58
 PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: continue
 Quit
 (gdb) FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT)

If you look at the testcase's sources, you'll see that the intention
is to resume the program with "continue", wait to see a few of those
"pid=..., count=..." lines, and then interrupt the program with
Ctrl-C.  But somehow, that resulted in GDB printing "Quit", instead of
the Ctrl-C stopping the program with SIGINT.

Here's what is happening:

 #1 - those "pid=..., count=..." lines we see above weren't actually
      output by the inferior after it has been continued (see #1).
      Note that "inf1_how" and "inf2_how" are "attach".  What happened
      is that those "pid=..., count=..." lines were output by the
      inferiors _before_ they were attached to.  We see them at that
      point instead of earlier, because that's where the testcase
      reads from the inferiors' spawn_ids.

 #2 - The testcase mistakenly thinks those "pid=..., count=..." lines
      happened after the continue was processed by GDB, meaning it has
      waited enough, and so sends the Ctrl-C.  GDB hasn't yet passed
      the terminal to the inferior, so the Ctrl-C results in that
      Quit.

The fix here is twofold:

 #1 - flush inferior output right after attaching

 #2 - consume the "Continuing" printed by "continue", indicating the
      inferior has the terminal.  This is the same as done throughout
      the testsuite to handle this exact problem of sending Ctrl-C too
      soon.
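
As a rough illustration of why waiting for "Continuing" matters, here is a
toy Python model.  It is not the testcase's expect code: it merges GDB and
inferior output into one list for simplicity, `count_lines_after_continue`
is a hypothetical name, and the final "count=70" line is made up.

```python
# Toy model of the ordering problem; names and the last log line are made up.
def count_lines_after_continue(output_lines):
    """Count 'pid=..., count=...' lines seen only after 'Continuing.'."""
    seen_continuing = False
    counted = 0
    for line in output_lines:
        if line.startswith("Continuing"):
            seen_continuing = True
        elif seen_continuing and line.startswith("pid="):
            counted += 1
    return counted


stale = ["pid=60422, count=46", "pid=60422, count=47"]  # printed before attach
log = stale + ["Continuing.", "pid=60422, count=70"]
print(count_lines_after_continue(log))  # 1
```

Lines printed before the attach are ignored, so they can no longer be
mistaken for evidence that the inferior is running with the terminal.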

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	* gdb.multi/multi-term-settings.exp (create_inferior): Flush
	inferior output.
	(coretest): Use $gdb_test_name.  After issuing "continue", wait
	for "Continuing".

Change-Id: Iba7671dfe1eee6b98d29cfdb05a1b9aa2f9defb9
pipcet pushed a commit that referenced this issue Dec 23, 2021
On openSUSE Tumbleweed with glibc-debuginfo installed I get:
...
 (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
 where^M
 #0  print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
 #1  0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
 #2  0x00007ffff7d56b37 in start_thread (arg=<optimized out>) \
                          at pthread_create.c:435^M
 #3  0x00007ffff7ddb640 in clone3 () \
                          at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81^M
 (gdb) PASS: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...
while without debuginfo installed I get instead:
...
 (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
 where^M
 #0  print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
 #1  0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
 #2  0x00007ffff7d56b37 in start_thread () from /lib64/libc.so.6^M
 #3  0x00007ffff7ddb640 in clone3 () from /lib64/libc.so.6^M
 (gdb) FAIL: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...

The problem is that the regexp used:
...
  "\(from .*libpthread\|at pthread_create\|in pthread_create\)"
...
expects the 'from' part to match libpthread, but in glibc 2.34 libpthread has
been merged into libc.

Fix this by updating the regexp.
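
The mismatch can be reproduced outside the testsuite with a short Python
sketch.  `OLD` is the regexp quoted above, rewritten in Python syntax;
`NEW` is only a hypothetical update (the actual regexp used by the fix is
not shown here), and the with-debuginfo frame is joined onto one line.

```python
import re

OLD = re.compile(r"(from .*libpthread|at pthread_create|in pthread_create)")
# Hypothetical update: also accept start_thread coming from libc itself.
NEW = re.compile(
    r"(from .*libpthread|from .*libc\.|at pthread_create|in pthread_create)"
)

with_debuginfo = (
    "#2  0x00007ffff7d56b37 in start_thread (arg=<optimized out>) "
    "at pthread_create.c:435"
)
without_debuginfo = "#2  0x00007ffff7d56b37 in start_thread () from /lib64/libc.so.6"

print(bool(OLD.search(with_debuginfo)))     # True
print(bool(OLD.search(without_debuginfo)))  # False
print(bool(NEW.search(without_debuginfo)))  # True
```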

Tested on x86_64-linux.
pipcet pushed a commit that referenced this issue Dec 23, 2021
This commit fixes Bug 28308, titled "Strange interactions with
dprintf and break/commands":

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28308

Since creating that bug report, I've found a somewhat simpler way of
reproducing the problem.  I've encapsulated it into the GDB test case
which I've created along with this bug fix.  The name of the new test
is gdb.base/dprintf-execution-x-script.exp.  I'll demonstrate the
problem using this test case, though for brevity, I've placed all
relevant files in the same directory and have renamed the files to all
start with 'dp-bug' instead of 'dprintf-execution-x-script'.

The script file, named dp-bug.gdb, consists of the following commands:

dprintf increment, "dprintf in increment(), vi=%d\n", vi
break inc_vi
commands
  continue
end
run

Note that the final command in this script is 'run'.  When 'run' is
instead issued interactively, the bug does not occur.  So, let's look
at the interactive case first in order to see the correct/expected
output:

$ gdb -q -x dp-bug.gdb dp-bug
... eliding buggy output which I'll discuss later ...
(gdb) run
Starting program: /mesquite2/sourceware-git/f34-master/bld/gdb/tmp/dp-bug
vi=0
dprintf in increment(), vi=0

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=1
dprintf in increment(), vi=1

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=2
dprintf in increment(), vi=2

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=3
[Inferior 1 (process 1539210) exited normally]

In this run, in which 'run' was issued from the gdb prompt (instead
of at the end of the script), there are three dprintf messages along
with three 'Breakpoint 2' messages.  This is the correct output.

Now let's look at the output that I snipped above; this is the output
when 'run' is issued from the script loaded via GDB's -x switch:

$ gdb -q -x dp-bug.gdb dp-bug
Reading symbols from dp-bug...
Dprintf 1 at 0x40116e: file dprintf-execution-x-script.c, line 38.
Breakpoint 2 at 0x40113a: file dprintf-execution-x-script.c, line 26.
vi=0
dprintf in increment(), vi=0

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	dprintf-execution-x-script.c: No such file or directory.
vi=1

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=2

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=3
[Inferior 1 (process 1539175) exited normally]

In the output shown above, only the first dprintf message is printed.
The 2nd and 3rd dprintf messages are missing!  However, all three
'Breakpoint 2...' messages are still printed.

Why does this happen?

bpstat_do_actions_1() in gdb/breakpoint.c contains the following
comment and code near the start of the function:

  /* Avoid endless recursion if a `source' command is contained
     in bs->commands.  */
  if (executing_breakpoint_commands)
    return 0;

  scoped_restore save_executing
    = make_scoped_restore (&executing_breakpoint_commands, 1);

Also, as described by this comment prior to the 'async' field
in 'struct ui' in top.h, the main UI starts off in sync mode
when processing command line arguments:

  /* True if the UI is in async mode, false if in sync mode.  If in
     sync mode, a synchronous execution command (e.g, "next") does not
     return until the command is finished.  If in async mode, then
     running a synchronous command returns right after resuming the
     target.  Waiting for the command's completion is later done on
     the top event loop.  For the main UI, this starts out disabled,
     until all the explicit command line arguments (e.g., `gdb -ex
     "start" -ex "next"') are processed.  */

This combination of things, the state of the static global
'executing_breakpoint_commands' plus the state of the async
field in the main UI causes this behavior.

This is a backtrace after hitting the dprintf breakpoint for
the second time when doing 'run' from the script file, i.e.
non-interactively:

Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffc2b8)
    at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431
4431	  if (executing_breakpoint_commands)

 #0  bpstat_do_actions_1 (bsp=0x7fffffffc2b8)
     at gdb/breakpoint.c:4431
 #1  0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x1538090)
     at gdb/breakpoint.c:13048
 #2  0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x137f450, ws=0x7fffffffc718,
     stop_chain=0x1538090) at gdb/breakpoint.c:5498
 #3  0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffc6f0)
     at gdb/infrun.c:6172
 #4  0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffc6f0)
     at gdb/infrun.c:5662
 #5  0x0000000000763cd5 in fetch_inferior_event ()
     at gdb/infrun.c:4060
 #6  0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT)
     at gdb/inf-loop.c:41
 #7  0x00000000007a702f in handle_target_event (error=0, client_data=0x0)
     at gdb/linux-nat.c:4207
 #8  0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0)
     at gdbsupport/event-loop.cc:701
 #9  0x0000000000b8d032 in gdb_wait_for_event (block=0)
     at gdbsupport/event-loop.cc:597
 #10 gdb_do_one_event () at gdbsupport/event-loop.cc:212
 #11 0x00000000009d19b6 in wait_sync_command_done ()
     at gdb/top.c:528
 #12 0x00000000009d1a3f in maybe_wait_sync_command_done (was_sync=0)
     at gdb/top.c:545
 #13 0x00000000009d2033 in execute_command (p=0x7fffffffcb18 "", from_tty=0)
     at gdb/top.c:676
 #14 0x0000000000560d5b in execute_control_command_1 (cmd=0x13b9bb0, from_tty=0)
     at gdb/cli/cli-script.c:547
 #15 0x000000000056134a in execute_control_command (cmd=0x13b9bb0, from_tty=0)
     at gdb/cli/cli-script.c:717
 #16 0x00000000004c3bbe in bpstat_do_actions_1 (bsp=0x137f530)
     at gdb/breakpoint.c:4469
 #17 0x00000000004c3d40 in bpstat_do_actions ()
     at gdb/breakpoint.c:4533
 #18 0x00000000006a473a in command_handler (command=0x1399ad0 "run")
     at gdb/event-top.c:624
 #19 0x00000000009d182e in read_command_file (stream=0x113e540)
     at gdb/top.c:443
 #20 0x0000000000563697 in script_from_file (stream=0x113e540, file=0x13bb0b0 "dp-bug.gdb")
     at gdb/cli/cli-script.c:1642
 #21 0x00000000006abd63 in source_gdb_script (extlang=0xc44e80 <extension_language_gdb>, stream=0x113e540,
     file=0x13bb0b0 "dp-bug.gdb") at gdb/extension.c:188
 #22 0x0000000000544400 in source_script_from_stream (stream=0x113e540, file=0x7fffffffd91a "dp-bug.gdb",
     file_to_open=0x13bb0b0 "dp-bug.gdb")
     at gdb/cli/cli-cmds.c:692
 #23 0x0000000000544557 in source_script_with_search (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1, search_path=0)
     at gdb/cli/cli-cmds.c:750
 #24 0x00000000005445cf in source_script (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1)
     at gdb/cli/cli-cmds.c:759
 #25 0x00000000007cf6d9 in catch_command_errors (command=0x5445aa <source_script(char const*, int)>,
     arg=0x7fffffffd91a "dp-bug.gdb", from_tty=1, do_bp_actions=false)
     at gdb/main.c:523
 #26 0x00000000007cf85d in execute_cmdargs (cmdarg_vec=0x7fffffffd1b0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND,
     ret=0x7fffffffd18c) at gdb/main.c:615
 #27 0x00000000007d0c8e in captured_main_1 (context=0x7fffffffd3f0)
     at gdb/main.c:1322
 #28 0x00000000007d0eba in captured_main (data=0x7fffffffd3f0)
     at gdb/main.c:1343
 #29 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0)
     at gdb/main.c:1368
 #30 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508)
     at gdb/gdb.c:32

There are two frames for bpstat_do_actions_1(), one at frame #16 and
the other at frame #0.  The one at frame #16 is processing the actions
for Breakpoint 2, which is a 'continue'.  The one at frame #0 is attempting
to process the dprintf breakpoint action.  However, at this point,
the value of 'executing_breakpoint_commands' is 1, forcing an early
return, i.e. prior to executing the command(s) associated with the dprintf
breakpoint.

For the sake of comparison, this is what the stack looks like when hitting
the dprintf breakpoint for the second time when issuing the 'run'
command from the GDB prompt.

Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffccd8)
    at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431
4431	  if (executing_breakpoint_commands)

 #0  bpstat_do_actions_1 (bsp=0x7fffffffccd8)
     at gdb/breakpoint.c:4431
 #1  0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x16b0290)
     at gdb/breakpoint.c:13048
 #2  0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x13f0e60, ws=0x7fffffffd138,
     stop_chain=0x16b0290) at gdb/breakpoint.c:5498
 #3  0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffd110)
     at gdb/infrun.c:6172
 #4  0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffd110)
     at gdb/infrun.c:5662
 #5  0x0000000000763cd5 in fetch_inferior_event ()
     at gdb/infrun.c:4060
 #6  0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT)
     at gdb/inf-loop.c:41
 #7  0x00000000007a702f in handle_target_event (error=0, client_data=0x0)
     at gdb/linux-nat.c:4207
 #8  0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0)
     at gdbsupport/event-loop.cc:701
 #9  0x0000000000b8d032 in gdb_wait_for_event (block=0)
     at gdbsupport/event-loop.cc:597
 #10 gdb_do_one_event () at gdbsupport/event-loop.cc:212
 #11 0x00000000007cf512 in start_event_loop ()
     at gdb/main.c:421
 #12 0x00000000007cf631 in captured_command_loop ()
     at gdb/main.c:481
 #13 0x00000000007d0ebf in captured_main (data=0x7fffffffd3f0)
     at gdb/main.c:1353
 #14 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0)
     at gdb/main.c:1368
 #15 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508)
     at gdb/gdb.c:32

This relatively short backtrace is due to the current UI's async field
being set to 1.

Yet another thing to be aware of regarding this problem is the
difference in the way that commands associated with dprintf breakpoints
versus regular breakpoints are handled.  While they both use a command
list associated with the breakpoint, regular breakpoints will place
the commands to be run on the bpstat chain constructed in
bpstat_stop_status().  These commands are run later on.  For dprintf
breakpoints, commands are run via the 'after_condition_true' function
pointer directly from bpstat_stop_status().  (The 'commands' field in
the bpstat is cleared in dprintf_after_condition_true().  This
prevents the dprintf commands from being run again later on when other
commands on the bpstat chain are processed.)

Another thing that I noticed is that dprintf breakpoints are the only
type of breakpoint which use 'after_condition_true'.  This suggests
that one possible way of fixing this problem, that of making dprintf
breakpoints work more like regular breakpoints, probably won't work.
(I must admit, however, that my understanding of this code isn't
complete enough to say why.  I'll trust that whoever implemented it
had a good reason for doing it this way.)

The comment referenced earlier regarding 'executing_breakpoint_commands'
states that the reason for checking this variable is to avoid
potential endless recursion when a 'source' command appears in
bs->commands.  We know that a dprintf command is constrained to either
1) execution of a GDB printf command, 2) an inferior function call of
a printf-like function, or 3) execution of an agent-printf command.
Therefore, infinite recursion due to a 'source' command cannot happen
when executing commands upon hitting a dprintf breakpoint.

I chose to fix this problem by having dprintf_after_condition_true()
directly call execute_control_commands().  This means that it no
longer goes through bpstat_do_actions_1(), and so avoids the
infinite recursion check for potential 'source' commands on the
command chain.  I think it simplifies this code a little bit too, a
definite bonus.
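
The interaction between the guard and the fix can be modelled with a few
lines of Python.  This is a toy sketch, not GDB's code: `do_actions`
stands in for bpstat_do_actions_1(), and `dprintf_hit_via_guard` and
`dprintf_hit_direct` are hypothetical names for the behavior before and
after the change.

```python
# Toy model of the recursion guard; NOT GDB's actual code.
executing_breakpoint_commands = False
printed = []


def do_actions(commands):
    """Mimics the early return in bpstat_do_actions_1()."""
    global executing_breakpoint_commands
    if executing_breakpoint_commands:
        return  # guard: skip everything while already running commands
    executing_breakpoint_commands = True
    try:
        for command in commands:
            command()
    finally:
        executing_breakpoint_commands = False


def dprintf_hit_via_guard():
    # Old behavior: the dprintf command goes through the guarded path.
    do_actions([lambda: printed.append("dprintf")])


def dprintf_hit_direct():
    # Models the fix: run the dprintf command without consulting the guard.
    printed.append("dprintf")


# A breakpoint 'continue' command that, in sync mode, reaches a dprintf hit
# before the outer do_actions() has returned.
do_actions([dprintf_hit_via_guard])
print(printed)  # []
do_actions([dprintf_hit_direct])
print(printed)  # ['dprintf']
```

The first call loses the dprintf output because the nested call returns
early; the second call prints it because the guard is never consulted.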

Summary:

	* breakpoint.c (dprintf_after_condition_true): Don't call
	bpstat_do_actions_1().  Call execute_control_commands()
	instead.
pipcet pushed a commit that referenced this issue Dec 23, 2021
The immediate form of MSR has a 4-bit immediate field (in CRm).
However, many forms of MSR require a smaller immediate.  These cases
are identified by value in operand_general_constraint_met_p,
but they're now the common case rather than the exception.

This patch therefore adds the maximum value to the sys_reg
description and gets the range from there.  It also enforces
the minimum of 0, which avoids a situation in which:

  msr dit, #2

would give the expected:

  Error: immediate value out of range 0 to 1

whereas:

  msr dit, #-1

would give:

  Error: immediate value out of range 0 to 15

(from the later UIMM4 checking).

Also:

- we were reporting the first error above against the wrong operand
- TCO takes a single-bit immediate, but we previously allowed
  all 16 values.
  [https://developer.arm.com/documentation/ddi0596/2021-09/Base-Instructions/MSR--immediate---Move-immediate-value-to-Special-Register-?lang=en]
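
A minimal sketch of the table-driven check described above, not the
opcodes implementation.  The PstateField struct, the table and
check_msr_immediate are hypothetical names; the table is an
illustrative subset, with DAIFSet included as an example of a field
that uses the full 4-bit range.

```cpp
#include <string>

// Hypothetical, simplified stand-in for an aarch64_pstatefields entry.
struct PstateField {
  const char *name;
  int max_value;  // largest immediate the field accepts
};

// Illustrative subset: DIT and TCO take a single-bit immediate,
// DAIFSet takes the full 4-bit immediate.
static const PstateField pstate_fields[] = {
    {"dit", 1},
    {"tco", 1},
    {"daifset", 15},
};

// Returns an empty string if IMM is valid for FIELD, otherwise an
// error message.  The lower bound of 0 is checked here as well, so
// the reported range is the same for both kinds of failure.
std::string check_msr_immediate(const std::string &field, int imm) {
  for (const PstateField &f : pstate_fields) {
    if (field != f.name)
      continue;
    if (imm < 0 || imm > f.max_value)
      return "immediate value out of range 0 to " +
             std::to_string(f.max_value);
    return "";
  }
  return "unknown field (not part of this sketch)";
}
```

With this shape, both "dit, #2" and "dit, #-1" report the range
0 to 1, because the bounds come from the same table entry.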

opcodes/
	* aarch64-opc.h (F_REG_MAX_VALUE, F_GET_REG_MAX_VALUE): New macros.
	* aarch64-opc.c (operand_general_constraint_met_p): Read the
	maximum MSR immediate value from aarch64_pstatefields.
	(aarch64_pstatefields): Add the maximum immediate value
	for each register.

gas/
	* testsuite/gas/aarch64/sysreg-4.s: Use an immediate value of 1
	rather than 8 for the TCO test.
	* testsuite/gas/aarch64/sysreg-4.d: Update accordingly.
	* testsuite/gas/aarch64/armv8_2-a-illegal.l: Fix operand number
	in MSR immediate error messages.
	* testsuite/gas/aarch64/diagnostic.l: Likewise.
	* testsuite/gas/aarch64/pan-illegal.l: Likewise.
	* testsuite/gas/aarch64/ssbs-illegal1.l: Likewise.
	* testsuite/gas/aarch64/illegal-sysreg-4b.s,
	* testsuite/gas/aarch64/illegal-sysreg-4b.d,
	* testsuite/gas/aarch64/illegal-sysreg-4b.l: New test.
pipcet pushed a commit that referenced this issue Dec 23, 2021
On Fedora 35,

$ readelf -d /usr/bin/npc

caused readelf to run out of stack since load_separate_debug_info
returned the input main file as the separate debug info:

(gdb) bt
 #0  load_separate_debug_info (
    main_filename=main_filename@entry=0x510f50 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo",
    xlink=xlink@entry=0x4e5180 <debug_displays+4480>,
    parse_func=parse_func@entry=0x431550 <parse_gnu_debuglink>,
    check_func=check_func@entry=0x432ae0 <check_gnu_debuglink>,
    func_data=func_data@entry=0x7fffffffdb60, file=file@entry=0x51d430)
    at /export/gnu/import/git/sources/binutils-gdb/binutils/dwarf.c:11057
 #1  0x000000000043328d in check_for_and_load_links (file=0x51d430,
    filename=0x510f50 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo")
    at /export/gnu/import/git/sources/binutils-gdb/binutils/dwarf.c:11381
 #2  0x00000000004332ae in check_for_and_load_links (file=0x51b070,
    filename=0x518dd0 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo")

Return NULL if the separate debug info is the same as the input main
file to avoid infinite recursion.
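
The shape of the fix can be sketched as follows.  This is a simplified
model, not the dwarf.c code: LinkMap, separate_debug_file and
load_links are hypothetical names, and an empty string plays the role
of the NULL return.

```cpp
#include <map>
#include <string>

// Hypothetical model: maps a file to the separate debug info file
// that it points at.
using LinkMap = std::map<std::string, std::string>;

// Returns the separate debug info file for MAIN_FILENAME, or an empty
// string if there is none or if it is the main file itself.
std::string separate_debug_file(const LinkMap &links,
                                const std::string &main_filename) {
  LinkMap::const_iterator it = links.find(main_filename);
  if (it == links.end() || it->second == main_filename)
    return "";
  return it->second;
}

// Follows debug links recursively and returns how many separate files
// were loaded.  Without the self-check above, a file that links to
// itself would recurse until the stack runs out.
int load_links(const LinkMap &links, const std::string &filename) {
  const std::string debug_file = separate_debug_file(links, filename);
  if (debug_file.empty())
    return 0;
  return 1 + load_links(links, debug_file);
}
```

Note that this sketch only guards against a file that resolves to
itself, which is the case described above; it does not detect longer
cycles.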

	PR binutils/28679
	* dwarf.c (load_separate_debug_info): Don't return the input
	main file.
pipcet pushed a commit that referenced this issue Dec 23, 2021
While working on another patch relating to remote targets, I wanted to
test with 'maint set target-async off' in place.  Unfortunately I ran
into some problems.  This commit is an attempt to fix one of the
issues I hit.

In my particular case I was actually running with:

  maint set target-async off
  maint set target-non-stop off

that is, we're telling GDB to force the targets to operate in
non-async mode, and in all-stop mode.  Here's my GDB session showing
the problem:

  (gdb) maintenance set target-async off
  (gdb) maintenance set target-non-stop off
  (gdb) target extended-remote :54321
  Remote debugging using :54321
  (gdb) attach 2365960
  Attaching to process 2365960
  No unwaited-for children left.
  (gdb)

Notice the 'No unwaited-for children left.' error; this is the
problem.  There's no reason why GDB should not be able to attach to
the process.

The problem is this:

  1. The user runs 'attach PID' and this sends GDB into attach_command
  in infcmd.c.  From here we call the ::attach method on the attach
  target, which will be the extended_remote_target.

  2. In extended_remote_target::attach, we attach to the remote target
  and get the first reply (which is a stop packet).  We put off
  processing the stop packet until the end of ::attach.  We set up the
  inferior and thread to represent the process we attached to, and
  download the target description.  Finally, we process the initial
  stop packet.

  If '!target_is_non_stop_p ()' and '!target_can_async_p ()', which is
  the case for us given the maintenance commands we used, we cache the
  stop packet within the remote_state::buf for later processing.

  3. Back in attach_command, if 'target_is_non_stop_p ()' then we
  request that the target stops.  This will either process any cached
  stop replies, or request that the target stops, and process the stop
  replies.  However, this code is not what we use due to non-stop mode
  being disabled.  So, we skip to the next step which is to call
  validate_exec_file.

  4. Calling validate_exec_file can cause packets to be sent to the
  remote target, and replies received.  The first path I hit is the
  call to target_pid_to_exec_file, which calls
  remote_target::pid_to_exec_file, which can then try to read the
  executable from the remote.  Sending and receiving packets will make
  use of the remote_state::buf object.

  5. The attempt to attach continues, but the damage is already done...

So, the problem is that, in step #2 we cache a stop reply in the
remote_state::buf, and then in step #4 we reuse the remote_state::buf
object, discarding any cached stop reply.  As a result, the initial
stop, which is sent when GDB first attaches to the target, is lost.

This problem can clearly be seen, I feel, by looking at the
remote_state::cached_wait_status flag.  This flag tells GDB if there
is a wait status cached in remote_state::buf.  However, in
remote_target::putpkt_binary and remote_target::getpkt_or_notif_sane_1
this flag is just set back to 0, which immediately discards any
cached data.
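
To make the failure mode concrete, here is a heavily simplified model
of the old scheme.  It is a sketch under stated assumptions, not GDB's
remote.c: OldRemoteState and the three functions are hypothetical, and
only the names buf and cached_wait_status come from the description
above.

```cpp
#include <string>

// Hypothetical model of the old caching scheme.
struct OldRemoteState {
  std::string buf;                  // shared packet buffer
  bool cached_wait_status = false;  // true if BUF holds a stop reply
};

// Models the end of attach: the first stop reply is left unprocessed
// in the shared buffer and the flag is set.
void cache_initial_stop(OldRemoteState &rs, const std::string &packet) {
  rs.buf = packet;
  rs.cached_wait_status = true;
}

// Models any later packet exchange: the flag is silently cleared and
// the shared buffer is reused for the new reply.
void exchange_packet(OldRemoteState &rs, const std::string &reply) {
  rs.cached_wait_status = false;
  rs.buf = reply;
}

// Models the later wait: returns the cached stop reply, or an empty
// string if it has been lost.
std::string take_cached_stop(OldRemoteState &rs) {
  if (!rs.cached_wait_status)
    return "";
  rs.cached_wait_status = false;
  return rs.buf;
}
```

In this model, calling exchange_packet between cache_initial_stop and
take_cached_stop loses the stop reply, which corresponds to steps 2
and 4 above.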

I don't know if this scheme ever made sense.  Looking at commit
2d717e4, where the cached_wait_status flag was added, it appears
that there was nothing between where the stop was cached and where
the stop was consumed.  So I suspect there never was a situation
where we ended up in putpkt_binary or getpkt_or_notif_sane_1 and
needed to clear the flag; maybe the clearing was added "just in
case".  Whatever the history, I claim that clearing this flag is
no longer a good idea.

So, my first step toward fixing this issue was to replace the two
instances of 'rs->cached_wait_status = 0;' in ::putpkt_binary and
::getpkt_or_notif_sane_1 with 'gdb_assert (rs->cached_wait_status ==
0);'.  This, at least, would show me when GDB was doing something
dangerous, and indeed, this assert is now hit in my test case above.

I did play with using some kind of scoped restore to back up and
restore the remote_state::buf object in all the places within remote.c
that I was hitting where the ::buf was being corrupted.  The first
problem with this is that the place where the ::cached_wait_status
flag is reset is _not_ where ::buf is corrupted.  For the
::putpkt_binary case, by the time we get to the method the buffer has
already been corrupted in many cases, so we end up needing to add the
scoped save/restore within the callers, which means we need the
save/restore in _lots_ of places.

Plus, using this save/restore model feels like the wrong solution.  I
don't think that it's obvious that the buffer might be holding cached
data, and I think it would be too easy for new corruptions of the
buffer to be introduced, which could easily go unnoticed for a long
time.

So, I really wanted a solution that didn't require us to cache data in
the ::buf object.

Luckily, I think we already have such a solution in place: the
remote_state::stop_reply_queue.  It seems like this does exactly the
same task, just in a slightly different way.  With the
::stop_reply_queue, the stop packets are processed upon receipt and
the stop_reply object is added to the queue.  With the ::buf cache
solution, the unprocessed stop reply is cached in the ::buf, and
processed later.

So, finally, in this commit, I propose to remove the
remote_state::cached_wait_status flag and to stop using the ::buf to
cache stop replies.  Instead, stop replies will now always be stored
in the ::stop_reply_queue.
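
The difference can be sketched with a small model.  This is not the
remote.c implementation: StopReply, QueueRemoteState and the functions
below are hypothetical, and the "Txx" parsing is a deliberately
minimal placeholder that assumes a well-formed packet.

```cpp
#include <deque>
#include <string>

// Hypothetical parsed form of a stop reply.
struct StopReply {
  int signal;
};

// Hypothetical model: the packet buffer is scratch space only, and
// parsed stop replies live in their own queue.
struct QueueRemoteState {
  std::string buf;
  std::deque<StopReply> stop_reply_queue;
};

// Parses the packet on receipt and queues the result.  Assumes a
// packet of the form "Txx", where xx is a hex signal number.
void push_stop_reply(QueueRemoteState &rs, const std::string &packet) {
  StopReply reply;
  reply.signal = std::stoi(packet.substr(1, 2), nullptr, 16);
  rs.stop_reply_queue.push_back(reply);
}

// Models any later packet exchange: only the scratch buffer changes.
void exchange_packet(QueueRemoteState &rs, const std::string &reply) {
  rs.buf = reply;
}

// Pops the oldest queued stop reply into OUT; returns false if the
// queue is empty.
bool take_stop_reply(QueueRemoteState &rs, StopReply &out) {
  if (rs.stop_reply_queue.empty())
    return false;
  out = rs.stop_reply_queue.front();
  rs.stop_reply_queue.pop_front();
  return true;
}
```

Because the stop reply is parsed before anything else touches the
buffer, later packet exchanges cannot discard it.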

There are two places where we use the ::buf to hold a cached stop
reply.  The first is in the ::attach method, and the second is in
remote_target::start_remote.  The second of these cases is far
less problematic, as after caching the stop reply in ::buf we call the
global start_remote function, which does very little work before
calling normal_stop, which processes the cached stop reply.  However,
my plan is to switch both users over to using ::stop_reply_queue so
that the old (unsafe) ::cached_wait_status mechanism can be completely
removed.

The next problem is that the ::stop_reply_queue is currently only used
for async-mode, and so, in remote_target::push_stop_reply, where we
push stop_reply objects into the ::stop_reply_queue, we currently also
mark the async event token.  I've modified this so we only mark the
async event token if 'target_is_async_p ()' - note, _is_, not _can_
here. The ::push_stop_reply method is called in places where async
mode has been temporarily disabled, but, when async mode is switched
back on (see remote_target::async) we will mark the event token if
there are events in the queue.
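
A small model of this rule, offered as a sketch rather than the actual
remote.c code.  AsyncQueueModel and its fields are hypothetical, and
what happens to the token when async mode is switched off is left out
of the model.

```cpp
#include <cstddef>

// Hypothetical model of the queue length, the async state and the
// async event token.
struct AsyncQueueModel {
  std::size_t queued_events = 0;
  bool async_enabled = false;
  bool event_token_marked = false;
};

// Models pushing a stop reply: the event is always queued, but the
// token is only marked if async mode is currently on ("is", not "can").
void push_stop_reply(AsyncQueueModel &m) {
  ++m.queued_events;
  if (m.async_enabled)
    m.event_token_marked = true;
}

// Models switching async mode: when it is turned on and events are
// already queued, the token is marked so that they get processed.
void set_async(AsyncQueueModel &m, bool enable) {
  m.async_enabled = enable;
  if (enable && m.queued_events > 0)
    m.event_token_marked = true;
}
```

An event pushed while async mode is off stays in the queue unmarked,
and is picked up once set_async (m, true) is called.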

Another change of interest is in remote_target::remote_interrupt_as.
Previously this code checked ::cached_wait_status, but didn't check
for events in the ::stop_reply_queue.  Now that ::cached_wait_status
has been removed we now check the queue length instead, which should
have the same result.

Finally, in remote_target::wait_as, I've tried to merge the processing
of the ::stop_reply_queue with how we used to handle the
::cached_wait_status flag.

Currently, when processing the ::stop_reply_queue we call
process_stop_reply and immediately return.  However, when handling
::cached_wait_status we run through the whole of ::wait_as, and return
at the end of the function.

If we consider a standard stop packet, the two differences I see are:

  1. Resetting of the remote_state::waiting_for_stop_reply flag; this
  is not currently done when processing a stop from the
  ::stop_reply_queue.

  2. The final return value has the possibility of being adjusted at
  the end of ::wait_as, as well as there being calls to
  record_currthread, none of which are done if we process a stop from
  the ::stop_reply_queue.

After discussion on the mailing list:

  https://sourceware.org/pipermail/gdb-patches/2021-December/184535.html

it was suggested that, when an event is pushed into the
::stop_reply_queue, the ::waiting_for_stop_reply flag is never going
to be set.  As a result, we don't need to worry about the first
difference.  I have, however, added a gdb_assert to validate the
assumption that the flag is never going to be set.  If in future the
situation ever changes, then we should find out pretty quickly.

As for the second difference, I have resolved this by having all stop
packets taken from the ::stop_reply_queue pass through the return
value adjustment code at the end of ::wait_as.

An example of a test that reveals the benefits of this commit is:

  make check-gdb \
    RUNTESTFLAGS="--target_board=native-extended-gdbserver \
                  GDBFLAGS='-ex maint\ set\ target-async\ off \
                            -ex maint\ set\ target-non-stop\ off' \
                  gdb.base/attach.exp"

For testing I've been running tests on x86-64 GNU/Linux, with target
boards unix, native-gdbserver, and native-extended-gdbserver.
For each board I've run with the default GDBFLAGS, as well as with:

  GDBFLAGS='-ex maint\ set\ target-async\ off \
            -ex maint\ set\ target-non-stop\ off' \

Though running with the above GDBFLAGS is clearly a lot more unstable
both before and after my patch, I'm not seeing any consistent new
failures with my patch, except with the native-extended-gdbserver
board, where I am seeing new failures, but only because more tests are
now running.  For that configuration alone I see the number of
unresolved tests go down by 49, the number of passes go up by 446, and
the number of failures increase by 144.  All of the new failures are
in tests that did not run before, as far as I can tell.